Normalization

Normalization is a technique often applied as part of data preparation for machine learning. The goal of normalization is to change the values of numeric columns in the dataset to a common scale, without distorting differences in the ranges of values or losing information.

Min-Max Normalization

Min-Max normalization is one of the most common ways to normalize data. For every feature, the minimum value of that feature gets transformed into a 0, the maximum value gets transformed into a 1, and every other value gets transformed into a decimal between 0 and 1.

Process:

x_normalized = (x - min(x)) / (max(x) - min(x))

Where:

  • x_normalized is the normalized value of the feature.
  • x is the original value of the feature.
  • min(x) is the minimum value of the feature across the dataset.
  • max(x) is the maximum value of the feature across the dataset.

Example:

Sample Input: 10 25 30
Sample Output: 0 0.75 1
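
As a minimal sketch of this transformation (assuming NumPy is installed; the array name data is only illustrative), the sample input above can be normalized like this:

import numpy as np

# Sample input feature values
data = np.array([10.0, 25.0, 30.0])

# Min-max normalization: rescale every value into the [0, 1] range
x_normalized = (data - data.min()) / (data.max() - data.min())
print(x_normalized)  # [0.   0.75 1.  ]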

Unit Normalization

Unit normalization consists of dividing every entry in a column (feature) by the column's magnitude (Euclidean norm), so that the resulting feature vector has length 1 and is known as a unit vector.

Process:

x_normalized = x / ||x||

Where:

  • x_normalized is the normalized value of the feature.
  • x is the original value of the feature.
  • ||x|| is the magnitude of the feature, calculated as ||x|| = sqrt(x1^2 + x2^2 + ... + xn^2).
  • x1, x2, ..., xn are the original values of the feature.

Example:

Sample Input: 10 25 30
Sample Output: 0.248 0.620 0.744
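
The same idea in code, as a hedged sketch (again assuming NumPy; np.linalg.norm computes the Euclidean magnitude defined above):

import numpy as np

data = np.array([10.0, 25.0, 30.0])

# Unit normalization: divide by the Euclidean norm so the feature vector has length 1
magnitude = np.linalg.norm(data)   # sqrt(10^2 + 25^2 + 30^2)
x_normalized = data / magnitude
print(x_normalized.round(3))  # [0.248 0.62  0.744]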

Mean Normalization

Mean normalization centers the data on the mean and scales it by the range, so that the normalized values of the feature sum to 0.

Process:

x_normalized = (x - mean(x)) / (max(x) - min(x))

Where:

  • x_normalized is the normalized value of the feature.
  • x is the original value of the feature.
  • mean(x) is the mean of the feature across the dataset.
  • min(x) is the minimum value of the feature across the dataset.
  • max(x) is the maximum value of the feature across the dataset.

Example:

Sample Input: 10 25 30
Sample Output: -0.583 0.167 0.417
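
A minimal sketch of mean normalization for the sample input (assuming NumPy; variable names are illustrative):

import numpy as np

data = np.array([10.0, 25.0, 30.0])

# Mean normalization: center on the mean, then scale by the range
x_normalized = (data - data.mean()) / (data.max() - data.min())
print(x_normalized.round(3))  # [-0.583  0.167  0.417]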

Mean-Std Normalization

The data can be normalized by subtracting the mean (µ) of each feature and dividing by its standard deviation (σ). This way, each feature has a mean of 0 and a standard deviation of 1, which often speeds up the convergence of gradient-based learning algorithms.

Process:

x_normalized = (x - mean(x)) / std(x)

Where:

  • x_normalized is the normalized value of the feature.
  • x is the original value of the feature.
  • mean(x) is the mean of the feature across the dataset.
  • std(x) is the standard deviation of the feature across the dataset.

Example:

Sample Input: 10 25 30
Sample Output: -1.121 0.320 0.801
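
A minimal sketch of mean-std normalization (standardization). The sample output above corresponds to the sample standard deviation (dividing by n - 1); NumPy's std defaults to the population standard deviation, so ddof=1 is passed explicitly here:

import numpy as np

data = np.array([10.0, 25.0, 30.0])

# Mean-std normalization: zero mean, unit standard deviation
# ddof=1 selects the sample standard deviation to match the example above
x_normalized = (data - data.mean()) / data.std(ddof=1)
print(x_normalized.round(3))  # [-1.121  0.32   0.801]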

Last Updated 2023-10-09 18:18:15 +0530