Monday, July 26, 2021

Standardization vs Normalization

Feature Scaling - Standardization vs Normalization

- Why are these so important?

The quality of a Machine Learning model is decided by the quality of the data we provide to it.

Each and every value of each and every column has an impact on the model. The greater the difference between the magnitudes of the features' values, the lower the accuracy of the model tends to be.

Let's understand it with an example.

Let's consider a data set with two major features: Age and Income.

Here, Age ranges from 0 to 100, whereas Income ranges from 0 to about 1-10 Lac.

It can be observed that Income is about 10,000 times larger than Age, so these two features vary greatly in magnitude. When we do further analysis, like multivariate linear regression, the attribute Income will intrinsically influence the result more due to its larger values. But this doesn't necessarily mean that it is a more important predictor than Age. Therefore, the range of all features should be scaled so that each feature contributes approximately proportionately to the final distance.
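As an illustrative example: if two customers are aged 30 and 60 with incomes of 5,00,000 and 5,01,000, the Euclidean distance between them is √(30² + 1,000²) ≈ 1,000.4, so the result is driven almost entirely by the income difference, even though the age difference is proportionally far larger.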

To make all the features uniform in magnitude, Data Transformation or Feature Scaling is essential.

In this article, we will discuss the what and why of feature scaling, the techniques used to achieve it, its usefulness, and Python snippets that implement these techniques.

Contents

  • Feature Scaling
  • Normalization
  • Standardization
  • Implementation
  • When to use what?




What is Feature Scaling?

Feature Scaling is a technique to standardize the independent features present in the data to a fixed range. It is performed during data pre-processing to handle highly varying magnitudes, values, or units. If feature scaling is not done, a machine learning algorithm tends to treat larger values as more important and smaller values as less important, regardless of the units of the values.

However, not every dataset requires feature scaling; it is needed only when features have different ranges.

This can be achieved using two widely used techniques.

  1. Normalization
  2. Standardization



Normalization

Normalization (also called Min-Max normalization) is the simplest method and consists of rescaling the range of features to [0, 1] or [−1, 1]. Selecting the target range depends on the nature of the data.

Normalized form of each feature can be calculated as follows:
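In formula form, the normalized value of a feature X is:

X_norm = (X − X_min) / (X_max − X_min)

where X_min and X_max are the minimum and maximum values of that feature.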





Standardization

In machine learning, we handle various types of data, e.g. audio signals and pixel values for image data, and this data can include multiple dimensions. Feature standardization makes the values of each feature in the data have zero mean (by subtracting the mean in the numerator) and unit variance. This method is widely used in many machine learning algorithms (e.g., support vector machines, logistic regression, and artificial neural networks). The general method of calculation is to determine the distribution mean and standard deviation for each feature, subtract the mean from each value of the feature, and then divide the mean-subtracted values by the feature's standard deviation.
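Written as a formula, the standardized value of a feature X is:

X_std = (X − μ) / σ

where μ is the mean and σ is the standard deviation of that feature.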




Implementation

- Let's understand Feature Scaling with the help of a dataset.

The dataset that we will be using is Wine classifier data from Kaggle.

About the dataset:

This data set is the result of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. The analysis determined the quantities of 13 constituents found in each of the three types of wines.


Importing dataset using pandas
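A minimal sketch of loading the data with pandas (the CSV file name here is assumed; use the name of the file downloaded from Kaggle):

import pandas as pd

# Load the wine dataset downloaded from Kaggle (file name assumed)
df = pd.read_csv('wine_data.csv')

# Preview the first few rows
df.head()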


Calling df.head() displays the first few rows of the data.

Now understanding the Alcohol and Malic columns in detail.
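A sketch of this inspection, assuming the columns are named 'Alcohol' and 'Malic' (the exact column names in the Kaggle file may differ):

import seaborn as sns
import matplotlib.pyplot as plt

# Summary statistics of the Alcohol column (column name assumed)
print(df['Alcohol'].describe())

# Distribution of the Alcohol column
sns.histplot(df['Alcohol'], kde=True)
plt.title('Distribution of Alcohol')
plt.show()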






From the summary statistics, we can clearly see that the alcohol values have a minimum around 11.03 and a maximum around 14.83.
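The same inspection can be repeated for the Malic column (column name assumed, as above):

# Summary statistics of the Malic column
print(df['Malic'].describe())

# Distribution of the Malic column
sns.histplot(df['Malic'], kde=True)
plt.title('Distribution of Malic')
plt.show()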






Here we can see that the values of the malic feature lie in the range of 1–6, with a few outliers.


Observations:

  • Feature alcohol ranges between [11,15]
  • Feature malic ranges in [0.5,6]
  • When we perform a machine learning algorithm (like KNN) in which distances are considered, the model will be biased towards the feature alcohol. Hence, feature scaling is necessary for this data set.

Let us now perform normalization and standardization and compare the results.



Applying Normalization

Now, let us apply normalization to each of the features and observe the changes.



Normalization can be done using MinMaxScaler.


Applying normalization on features -> alcohol and malic:
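A sketch using scikit-learn's MinMaxScaler on the two columns (column names assumed, as above):

from sklearn.preprocessing import MinMaxScaler

# Fit the scaler and rescale both columns to [0, 1]
norm = MinMaxScaler()
df_norm = df[['Alcohol', 'Malic']].copy()
df_norm[['Alcohol', 'Malic']] = norm.fit_transform(df_norm[['Alcohol', 'Malic']])

# The minimum is now 0 and the maximum is now 1 for each feature
df_norm.describe()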





Applying Standardization

Let's perform standardization now using StandardScaler().



Standard Scaling the features -> Alcohol and Malic:
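A sketch of the standardization step, reusing the same columns:

from sklearn.preprocessing import StandardScaler

# Fit the scaler and transform both columns to zero mean and unit variance
std = StandardScaler()
df_std = df[['Alcohol', 'Malic']].copy()
df_std[['Alcohol', 'Malic']] = std.fit_transform(df_std[['Alcohol', 'Malic']])

# The mean is now (approximately) 0 and the standard deviation 1 for each feature
df_std.describe()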






Visualisation of Standardizing vs Normalizing 
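One way to visualise the effect, assuming df, df_norm, and df_std from the snippets above, is to plot the feature distributions side by side:

import matplotlib.pyplot as plt
import seaborn as sns

fig, (ax1, ax2, ax3) = plt.subplots(ncols=3, figsize=(15, 4))

# Raw features
ax1.set_title('Before Scaling')
sns.kdeplot(df['Alcohol'], ax=ax1)
sns.kdeplot(df['Malic'], ax=ax1)

# Min-max normalized features
ax2.set_title('After Min-Max Normalization')
sns.kdeplot(df_norm['Alcohol'], ax=ax2)
sns.kdeplot(df_norm['Malic'], ax=ax2)

# Standardized features
ax3.set_title('After Standardization')
sns.kdeplot(df_std['Alcohol'], ax=ax3)
sns.kdeplot(df_std['Malic'], ax=ax3)

plt.show()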





Observations:

  • In the raw data, feature alcohol lies in [11,15] and feature malic lies in [0,6].
  • In the normalized data, feature alcohol lies in [0,1] and feature malic lies in [0,1].
  • In the standardized data, feature alcohol and malic are centered at 0.



Elephant in the Room

- Comparing Normalization and Standardization

- When to use What?

“ Normalization or Standardization?” — There is no obvious answer to this question: it really depends on the application.
  • Experimenting with multiple scaling methods can dramatically increase your score on classification tasks, even when your hyperparameters are tuned. So you should treat the scaling method as an important hyperparameter of your model (see the sketch after this list).
  • Scaling methods affect different classifiers differently. Distance-based classifiers like SVM, KNN, and MLPs (neural networks) benefit dramatically from scaling. But even trees (CART, RF), which are agnostic to some of the scaling methods, can benefit from others.
  • Knowing the underlying math behind models and preprocessing methods is the best way to understand the results (for example, how trees work and why some of the scaling methods don't affect them). It can also save you a lot of time if you know not to apply StandardScaler when your model is a Random Forest.
  • Preprocessing methods like PCA, which are known to benefit from scaling, generally do benefit from it. When they don't, it might be due to a bad setting of PCA's number-of-components parameter, outliers in the data, or a poor choice of scaling method.
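A minimal sketch of treating the scaler as a hyperparameter with scikit-learn, using the built-in wine data for illustration (the classifier choice and the grid of scalers are assumptions):

from sklearn.datasets import load_wine
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_wine(return_X_y=True)

# The 'scaler' step is searched over like any other hyperparameter;
# 'passthrough' means no scaling at all.
pipe = Pipeline([('scaler', StandardScaler()), ('clf', KNeighborsClassifier())])
param_grid = {'scaler': [StandardScaler(), MinMaxScaler(), 'passthrough']}

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)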




Thanks for the read. Let's connect on LinkedIn.

