Feature Scaling - Standardization vs Normalization
- Why are these so important?
The quality of a Machine Learning is decided by the quality of data we are providing to it.
Each and every value of each and every column has an impact on the model. The greater the difference between the magnitudes of values of the features , lower will be the accuracy of model.
Lets understand it with an example.
Lets consider a data set with two major features : Age and Income.
Here, what is happening is that the Age is ranging from 0–100, whereas the Income ranges from 0 to about 1-10 Lac.
It can be observed that the Income is about 10,000 times larger than age. So, these two features vary greatly in magnitude. When we do further analysis, like multivariate linear regression, the attributed income will intrinsically influence the result more due to its larger value. But this doesn’t necessarily mean that it is more important factor for prediction than age. Therefore, the range of all features should be scaled so that each feature contributes approximately proportionately to the final distance.
To make all the features uniform in magnitude, Data Transformation or Feature Scaling is essential.
In this article, we will be discussing what, why of feature scaling, the techniques to achieve feature scaling, it’s usefulness, and python snippet to achieve feature scaling using these techniques.