Feature Scaling - Standardization vs Normalization
- Why are these so important?
The quality of a machine learning model is decided by the quality of the data we feed it.
Every value in every column has an impact on the model, and the greater the difference between the magnitudes of the features, the lower the model's accuracy tends to be.
Let's understand this with an example.
Consider a dataset with two major features: Age and Income.
Here, Age ranges from 0 to 100, whereas Income ranges from 0 to roughly 1–10 lakh.
Income is therefore about 10,000 times larger than Age, so the two features vary greatly in magnitude. When we do further analysis, such as multivariate linear regression, the attribute Income will intrinsically influence the result more because of its larger values. But that doesn't necessarily mean it is a more important predictor than Age. Therefore, the range of all features should be scaled so that each feature contributes approximately proportionately to the final distance.
To make all the features uniform in magnitude, Data Transformation or Feature Scaling is essential.
In this article, we will discuss the what and why of feature scaling, the techniques used to achieve it, their usefulness, and Python snippets that implement them.
Contents
+ Feature Scaling
+ Normalization
+ Standardization
+ Implementation
+ When to use what?
What is Feature Scaling?
Feature Scaling is a technique to standardize the independent features present in the data to a fixed range. It is performed during data pre-processing to handle highly varying magnitudes, values, or units. If feature scaling is not done, a machine learning algorithm tends to weigh larger values more heavily and treat smaller values as less important, regardless of the units of those values.
However, not every dataset requires feature scaling. It is needed only when the features have very different ranges.
This can be achieved using two widely used techniques.
- Normalization
- Standardization
Normalization
Normalization (also called Min-Max scaling) is the simplest method: it rescales a feature's values into the range [0, 1] or [−1, 1]. Selecting the target range depends on the nature of the data.
The normalized form of each feature can be calculated as follows:
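For a feature X, the Min-Max normalized value is

$$X_{\text{norm}} = \frac{X - X_{\min}}{X_{\max} - X_{\min}}$$

where X_min and X_max are the minimum and maximum values of that feature. This maps the smallest value to 0 and the largest to 1.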
Standardization
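Standardization (also called Z-score normalization) rescales a feature so that it has a mean of 0 and a standard deviation of 1. The standardized form of each feature can be calculated as follows:

$$X_{\text{std}} = \frac{X - \mu}{\sigma}$$

where μ is the mean of the feature and σ is its standard deviation. Unlike Min-Max normalization, standardization does not bound values to a fixed range.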
Implementation
About the dataset:
This data set is the result of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. The analysis determined the quantities of 13 constituents found in each of the three types of wines.
Importing dataset using pandas
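The original loading snippet is not reproduced here; as a minimal sketch, the same UCI Wine data ships with scikit-learn and can be wrapped in a pandas DataFrame (the variable name df and the use of load_wine are assumptions, not the author's exact code):

```python
import pandas as pd
from sklearn.datasets import load_wine

# Load the UCI Wine dataset bundled with scikit-learn
wine = load_wine()

# Wrap the 13 constituent measurements in a DataFrame
df = pd.DataFrame(wine.data, columns=wine.feature_names)

print(df.head())
```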
The head of the DataFrame shows the 13 constituent columns, including alcohol and malic_acid.
Let us now look at the alcohol and malic acid columns in more detail.
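A sketch of the summary statistics that back the observations below (df is the DataFrame assumed in the loading snippet above):

```python
# Select the two columns of interest and summarize them
features = df[['alcohol', 'malic_acid']]
print(features.describe())
```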
Output: the summary statistics (count, mean, standard deviation, min, max) for the two features.
Observations:
- The feature alcohol lies roughly in [11, 15].
- The feature malic acid lies roughly in [0.5, 6].
- When we run a machine learning algorithm in which distances matter (like KNN), the model will be biased toward the feature alcohol. Hence, feature scaling is necessary for this dataset.
Let us now perform normalization and standardization and compare the results.
Applying Normalization
Now, let us apply normalization to each of the features and observe the changes.
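A minimal sketch using scikit-learn's MinMaxScaler (the variable names features and df_norm are assumptions carried over from the snippets above):

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Rescale each feature to the [0, 1] range
min_max = MinMaxScaler()
df_norm = pd.DataFrame(min_max.fit_transform(features),
                       columns=features.columns)

print(df_norm.describe())
```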
Let's now perform standardization using StandardScaler().
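A matching sketch with StandardScaler (again assuming the features DataFrame from above):

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Center each feature at 0 with unit variance
standard = StandardScaler()
df_std = pd.DataFrame(standard.fit_transform(features),
                      columns=features.columns)

# Means should be ~0 and standard deviations ~1 after standardization
print(df_std.mean().round(3))
print(df_std.std().round(3))
```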
Visualisation of Standardizing vs Normalizing
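One way to produce such a comparison is with kernel density plots of the raw, normalized, and standardized features (a sketch assuming the features, df_norm, and df_std DataFrames from the snippets above; the original figure may have been generated differently):

```python
import matplotlib.pyplot as plt
import seaborn as sns

fig, axes = plt.subplots(1, 3, figsize=(15, 4))
frames = [features, df_norm, df_std]
titles = ['Raw data', 'Min-Max normalized', 'Standardized']

for ax, frame, title in zip(axes, frames, titles):
    # Plot the density of each feature on the same axis
    sns.kdeplot(x=frame['alcohol'], ax=ax, label='alcohol')
    sns.kdeplot(x=frame['malic_acid'], ax=ax, label='malic_acid')
    ax.set_title(title)
    ax.legend()

plt.tight_layout()
plt.show()
```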
Observations:
- In the raw data, the feature alcohol lies in [11, 15] and the feature malic lies in [0.5, 6].
- In the normalized data, both alcohol and malic lie in [0, 1].
- In the standardized data, feature alcohol and malic are centered at 0.
Elephant in the Room
- Comparing Normalization and Standardization
- When to use What?
- Experimenting with multiple scaling methods can dramatically increase your score on classification tasks, even when your hyperparameters are tuned. So you should treat the scaling method as an important hyperparameter of your model.
- Scaling methods affect different classifiers differently. Distance-based classifiers like SVM, KNN, and MLPs (neural networks) benefit dramatically from scaling, but even trees (CART, RF), which are agnostic to some scaling methods, can benefit from others.
- Knowing the underlying math behind models and preprocessing methods is the best way to understand the results (for example, how trees work and why some scaling methods don't affect them). It can also save you a lot of time, such as knowing not to bother applying StandardScaler when your model is a Random Forest.
- Preprocessing methods like PCA, which are known to benefit from scaling, do benefit from it. When they don't, it might be due to a bad setting of PCA's number-of-components parameter, outliers in the data, or a bad choice of scaling method.