Normalization
Normalization is a technique often applied as part of data preparation for machine learning. The goal of normalization is to change the values of numeric columns in the dataset to a common scale, without distorting differences in the ranges of values or losing information.
Min-Max Normalization
Min-Max normalization is one of the most common ways to normalize data. For every feature, the minimum value of that feature gets transformed into a 0, the maximum value gets transformed into a 1, and every other value gets transformed into a decimal between 0 and 1.
Process:
x_normalized = (x - min(x)) / (max(x) - min(x))
Where:
- x_normalized is the normalized value of the feature.
- x is the original value of the feature.
- min(x) is the minimum value of the feature across the dataset.
- max(x) is the maximum value of the feature across the dataset.
Example:
| Sample Input | 10 | 25 | 30 |
|---|---|---|---|
| Sample Output | 0 | 0.75 | 1 |
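The min-max formula above can be sketched in plain Python (the function name `min_max_normalize` is illustrative, not from any particular library):

```python
def min_max_normalize(values):
    """Scale values so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

print(min_max_normalize([10, 25, 30]))  # [0.0, 0.75, 1.0]
```

Note that this sketch assumes max(x) != min(x); a constant feature would cause a division by zero.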
Unit Normalization
Unit normalization consists of dividing every entry in a column (feature) by the column's magnitude (Euclidean norm), so that the resulting feature vector has length 1 and is known as a unit vector.
Process:
x_normalized = x / ||x||
Where:
- x_normalized is the normalized value of the feature.
- x is the original value of the feature.
- ||x|| is the magnitude (Euclidean norm) of the feature, calculated as
- ||x|| = sqrt(x1^2 + x2^2 + … + xn^2)
- x1, x2, …, xn are the original values of the feature.
Example:
| Sample Input | 10 | 25 | 30 |
|---|---|---|---|
| Sample Output | 0.248 | 0.620 | 0.744 |
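A minimal Python sketch of unit normalization (the function name is illustrative); the magnitude is computed over the whole column, so the output vector has Euclidean length 1:

```python
import math

def unit_normalize(values):
    """Divide each entry by the Euclidean norm of the whole column."""
    magnitude = math.sqrt(sum(v * v for v in values))
    return [v / magnitude for v in values]

result = unit_normalize([10, 25, 30])
print([round(v, 3) for v in result])  # matches the table: 0.248, 0.62, 0.744
```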
Mean Normalization
Mean normalization centers the data on the mean of the feature (so the transformed values sum to 0) and scales it by the range of the feature.
Process:
x_normalized = (x - mean(x)) / (max(x) - min(x))
Where:
- x_normalized is the normalized value of the feature.
- x is the original value of the feature.
- mean(x) is the mean of feature across the dataset.
- min(x) is the minimum value of the feature across the dataset.
- max(x) is the maximum value of the feature across the dataset.
Example:
| Sample Input | 10 | 25 | 30 |
|---|---|---|---|
| Sample Output | -0.583 | 0.167 | 0.417 |
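Mean normalization can be sketched the same way (function name illustrative); a useful sanity check is that the transformed values sum to 0:

```python
def mean_normalize(values):
    """Center on the mean and scale by the range; results sum to 0."""
    mean = sum(values) / len(values)
    spread = max(values) - min(values)
    return [(v - mean) / spread for v in values]

result = mean_normalize([10, 25, 30])
print([round(v, 3) for v in result])  # rounds to -0.583, 0.167, 0.417
```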
Mean-Std Normalization
The data can also be normalized by subtracting the mean (µ) of each feature and dividing by its standard deviation (σ). This technique is also known as standardization or z-score normalization. After the transformation, each feature has a mean of 0 and a standard deviation of 1, which often leads to faster convergence for gradient-based optimizers.
Process:
x_normalized = (x - mean(x)) / std(x)
Where
- x_normalized is the normalized value of the feature.
- x is the original value of the feature.
- mean(x) is the mean of feature across the dataset.
- std(x) is the standard deviation of the feature across the dataset.
Example:
| Sample Input | 10 | 25 | 30 |
|---|---|---|---|
| Sample Output | -1.121 | 0.320 | 0.801 |

(This example uses the sample standard deviation, i.e. the n - 1 denominator.)
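A minimal sketch of mean-std normalization (function name illustrative). The `sample` flag selects the n - 1 (sample) denominator, which is what the worked example above uses:

```python
import math

def standardize(values, sample=True):
    """Subtract the mean, then divide by the standard deviation.

    sample=True uses the n - 1 (sample) denominator; sample=False
    uses the population denominator n.
    """
    mean = sum(values) / len(values)
    n = len(values) - 1 if sample else len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]

result = standardize([10, 25, 30])
print([round(v, 3) for v in result])  # rounds to -1.121, 0.320, 0.801
```

Note that scikit-learn's StandardScaler uses the population (n) denominator by default, so its output would differ slightly from this example.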
Robust Normalization
Robust Scaler is a normalization technique that scales features using the median and interquartile range (IQR), making it less sensitive to outliers than standard scaling methods. It centers the normalized data around the median and scales it by the IQR, which preserves the structure of data with extreme values without letting those values skew the scaling.
Process:
x_robust = (x - median(x)) / IQR(x)
Where
- x = original feature value
- median(x) = median of the feature values
- IQR(x) = interquartile range (75th percentile - 25th percentile)
- x_robust = normalized value after robust scaling
Example
Suppose the feature values are: [10, 25, 30, 1000 (outlier)]
- Median = 27.5
- IQR = 20 (taking the 25th percentile as 15 and the 75th percentile as 35; note that the exact quartile estimates for a small sample depend on the interpolation method used)
Normalized values would be:
| Input Value | 10 | 25 | 30 | 1000 |
|---|---|---|---|---|
| Output (Robust) | -0.875 | -0.125 | 0.125 | 48.625 |
Notice that the outlier (1000) has a large scaled value, but the rest are kept in a reasonable range without distortion.
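The robust formula can be sketched in Python as follows. Because quartile estimates for small samples depend on the interpolation method, this sketch takes the median and IQR as explicit parameters (matching the worked example above); in practice they would be estimated from the data, e.g. with numpy.percentile:

```python
def robust_normalize(values, median, iqr):
    """Center on the median and scale by the interquartile range (IQR)."""
    return [(v - median) / iqr for v in values]

result = robust_normalize([10, 25, 30, 1000], median=27.5, iqr=20)
print(result)  # [-0.875, -0.125, 0.125, 48.625]
```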
Real-Time benefit of Robust Normalization
In real-world data, such as sensor readings, financial transactions, or health metrics, outliers are common. Using robust normalization prevents these extreme values from dominating the model training, resulting in:
- A more stable and reliable model
- Better generalization to unseen data
- Improved performance when the dataset has noisy or extreme values
Last Updated 2025-08-11 15:44:23 +0530 IST