Normalization

Normalization is a technique often applied as part of data preparation for machine learning. The goal of normalization is to change the values of numeric columns in the dataset to a common scale, without distorting differences in the ranges of values or losing information.

Min-Max Normalization

Min-Max normalization is one of the most common ways to normalize data. For every feature, the minimum value of that feature gets transformed into a 0, the maximum value gets transformed into a 1, and every other value gets transformed into a decimal between 0 and 1.

Process:

x_normalized = (x - min(x)) / (max(x) - min(x))

Where:

  • x_normalized is the normalized value of the feature.
  • x is the original value of the feature.
  • min(x) is the minimum value of the feature across the dataset.
  • max(x) is the maximum value of the feature across the dataset.

Example:

Sample Input:  10 25 30
Sample Output: 0 0.75 1
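The formula above can be sketched in plain Python (the function name is illustrative; it assumes a non-constant feature, so max(x) > min(x)):

```python
def min_max_normalize(values):
    """Rescale values linearly so min -> 0 and max -> 1."""
    lo, hi = min(values), max(values)
    # Assumes the feature is not constant; hi == lo would divide by zero.
    return [(v - lo) / (hi - lo) for v in values]

print(min_max_normalize([10, 25, 30]))  # [0.0, 0.75, 1.0]
```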

Unit Normalization

Unit normalization divides each value in a feature vector by the vector's magnitude (Euclidean norm), producing a vector of length 1 known as a unit vector.

Process:

x_normalized = x / ||x||

Where:

  • x_normalized is the normalized value of the feature.
  • x is the original value of the feature.
  • ||x|| is the magnitude of the feature vector, calculated as ||x|| = sqrt(x1^2 + x2^2 + … + xn^2).
  • x1, x2, …, xn are the original values of the feature.

Example:

Sample Input:  10 25 30
Sample Output: 0.248 0.620 0.744
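A minimal sketch of unit normalization with the standard library (the function name is illustrative; it assumes a non-zero vector):

```python
import math

def unit_normalize(values):
    """Divide each value by the vector's Euclidean magnitude (L2 norm)."""
    magnitude = math.sqrt(sum(v * v for v in values))
    # Assumes a non-zero vector; a zero vector has no unit-vector form.
    return [v / magnitude for v in values]

result = unit_normalize([10, 25, 30])
print([round(v, 3) for v in result])  # [0.248, 0.62, 0.744]
```

The result always has magnitude 1, which is why this is also called scaling to the unit vector.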

Mean Normalization

Mean normalization centers the data by subtracting the mean of the feature and scales it by the range (max - min), so that the transformed values sum to 0.

Process:

x_normalized = (x - mean(x)) / (max(x) - min(x))

Where:

  • x_normalized is the normalized value of the feature.
  • x is the original value of the feature.
  • mean(x) is the mean of feature across the dataset.
  • min(x) is the minimum value of the feature across the dataset.
  • max(x) is the maximum value of the feature across the dataset.

Example:

Sample Input:  10 25 30
Sample Output: -0.583 0.167 0.417
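A plain-Python sketch of the formula above (the function name is illustrative; it assumes a non-constant feature):

```python
def mean_normalize(values):
    """Center on the mean and scale by the range (max - min)."""
    mu = sum(values) / len(values)
    lo, hi = min(values), max(values)
    # Assumes the feature is not constant; hi == lo would divide by zero.
    return [(v - mu) / (hi - lo) for v in values]

result = mean_normalize([10, 25, 30])
print([round(v, 3) for v in result])  # [-0.583, 0.167, 0.417]
```

Note that the outputs sum to 0 (up to floating-point error), which is the defining property of mean normalization.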

Mean-Std Normalization

The data can be normalized by subtracting the mean (µ) of each feature and dividing by its standard deviation (σ). Each feature then has a mean of 0 and a standard deviation of 1, which often speeds up convergence during training.

Process:

x_normalized = (x - mean(x)) / std(x)

Where

  • x_normalized is the normalized value of the feature.
  • x is the original value of the feature.
  • mean(x) is the mean of feature across the dataset.
  • std(x) is the standard deviation of the feature across the dataset.

Example:

Sample Input:  10 25 30
Sample Output: -1.121 0.320 0.801 (using the sample standard deviation, with an n - 1 denominator)
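A sketch using the standard library's statistics module (the function name is illustrative):

```python
from statistics import mean, stdev

def standardize(values):
    """Subtract the mean and divide by the standard deviation (z-score)."""
    mu = mean(values)
    # stdev() is the sample standard deviation (n - 1 denominator),
    # which is what the worked example above uses.
    sigma = stdev(values)
    return [(v - mu) / sigma for v in values]

result = standardize([10, 25, 30])
print([round(v, 3) for v in result])  # [-1.121, 0.32, 0.801]
```

If the population standard deviation (n denominator) were used instead, the outputs would differ slightly; both conventions appear in practice.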

Robust Normalization

Robust scaling is a normalization technique that scales features using the median and interquartile range (IQR), making it far less sensitive to outliers than standard scaling methods. It centers the normalized data on the median and scales it by the IQR; because both statistics are barely affected by extreme values, the structure of the data is preserved without letting outliers skew the scaling.

Process:

x_robust = (x - median(x)) / IQR(x)

Where

  • x = original feature value
  • median(x) = median of the feature values
  • IQR(x) = interquartile range (75th percentile - 25th percentile)
  • x_robust = normalized value after robust scaling

Example

Suppose the feature values are: [10, 25, 30, 1000 (outlier)]

  • Median = 27.5
  • IQR = 20 (taking the 25th percentile as 15 and the 75th percentile as 35; exact quartile estimates depend on the interpolation convention, so library implementations may give somewhat different values)

Normalized values would be:

Input Value:     10      25      30     1000
Output (Robust): -0.875  -0.125  0.125  48.625

Notice that the outlier (1000) has a large scaled value, but the rest are kept in a reasonable range without distortion.
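A sketch with the standard library (the function name is illustrative; statistics.quantiles requires Python 3.8+). Because quartile estimates depend on the interpolation convention, the outputs below differ numerically from the worked example, but the behavior is the same: the outlier stays large while the other values remain in a narrow range.

```python
from statistics import median, quantiles

def robust_scale(values):
    """Center on the median and scale by the interquartile range (IQR)."""
    med = median(values)
    # quantiles() with n=4 returns the three cut points Q1, Q2, Q3.
    # method="inclusive" matches linear interpolation over the data range,
    # the same convention as numpy's default percentile calculation.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return [(v - med) / (q3 - q1) for v in values]

result = robust_scale([10, 25, 30, 1000])
print([round(v, 3) for v in result])  # [-0.07, -0.01, 0.01, 3.871]
```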

Real-World Benefit of Robust Normalization

In real-world data, such as sensor readings, financial transactions, or health metrics, outliers are common. Using robust normalization prevents these extreme values from dominating the model training, resulting in:

  • More stable and reliable models
  • Better generalization to unseen data
  • Improved performance when the dataset has noisy or extreme values

Last Updated 2025-08-11 15:44:23 +0530 IST