Normalization
Normalization is a technique often applied as part of data preparation for machine learning. The goal of normalization is to change the values of numeric columns in the dataset to a common scale, without distorting differences in the ranges of values or losing information.
Min-Max Normalization
Min-Max normalization is one of the most common ways to normalize data. For every feature, the minimum value of that feature gets transformed into a 0, the maximum value gets transformed into a 1, and every other value gets transformed into a decimal between 0 and 1.
Process:
x_normalized = (x - min(x)) / (max(x) - min(x))
Where:
- x_normalized is the normalized value of the feature.
- x is the original value of the feature.
- min(x) is the minimum value of the feature across the dataset.
- max(x) is the maximum value of the feature across the dataset.
Example:
| Sample Input | 10 | 25 | 30 |
|---|---|---|---|
| Sample Output | 0 | 0.75 | 1 |
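The formula above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the product's implementation: the function name `min_max_normalize` is hypothetical, and mapping a constant feature to 0.0 is an assumed convention, since the formula is undefined when max(x) equals min(x).

```python
def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # Assumption: a constant feature has no range to scale, so return zeros.
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


print(min_max_normalize([10, 25, 30]))  # [0.0, 0.75, 1.0]
```

The printed values match the sample output in the example table.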
Unit Normalization
Unit normalization divides every entry in a column (feature) by the column's magnitude, so that the resulting feature is a unit vector with a length of 1.
Process:
x_normalized = x / ||x||
Where:
- x_normalized is the normalized value of the feature.
- x is the original value of the feature.
- ||x|| is the magnitude of the feature, calculated as:
- ||x|| = sqrt(x1^2 + x2^2 + ... + xn^2)
- x1, x2, ..., xn are the original values of the feature.
Example:
| Sample Input | 10 | 25 | 30 |
|---|---|---|---|
| Sample Output | 0.248 | 0.620 | 0.744 |
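As a minimal sketch of the calculation above, the following Python computes the magnitude and divides each value by it. The function name `unit_normalize` is hypothetical, and returning zeros for an all-zero feature is an assumed convention, since the formula is undefined when the magnitude is 0.

```python
import math


def unit_normalize(values):
    """Divide each value by the Euclidean magnitude of the whole feature."""
    magnitude = math.sqrt(sum(v * v for v in values))
    if magnitude == 0:
        # Assumption: an all-zero feature cannot be scaled, so return zeros.
        return [0.0 for _ in values]
    return [v / magnitude for v in values]


result = unit_normalize([10, 25, 30])
print([round(v, 3) for v in result])  # [0.248, 0.62, 0.744]
```

After normalization, the squares of the values sum to 1, which is what gives the feature a length of 1.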
Mean Normalization
This transformer centers each value on the feature's mean and divides it by the feature's range, so that the normalized values sum to 0.
Process:
x_normalized = (x - mean(x)) / (max(x) - min(x))
Where:
- x_normalized is the normalized value of the feature.
- x is the original value of the feature.
- mean(x) is the mean of the feature across the dataset.
- min(x) is the minimum value of the feature across the dataset.
- max(x) is the maximum value of the feature across the dataset.
Example:
| Sample Input | 10 | 25 | 30 |
|---|---|---|---|
| Sample Output | -0.583 | 0.166 | 0.416 |
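The calculation can be sketched in Python as follows. The function name `mean_normalize` is hypothetical, and the handling of a constant feature is an assumption. Note that the sketch rounds to three decimals, while the example table appears to truncate (0.166 and 0.416 rather than 0.167 and 0.417).

```python
def mean_normalize(values):
    """Center values on their mean, then scale by the range (max - min)."""
    mean = sum(values) / len(values)
    span = max(values) - min(values)
    if span == 0:
        # Assumption: a constant feature has no range to scale, so return zeros.
        return [0.0 for _ in values]
    return [(v - mean) / span for v in values]


result = mean_normalize([10, 25, 30])
print([round(v, 3) for v in result])  # [-0.583, 0.167, 0.417]
```

Because every value is shifted by the same mean, the normalized values sum to 0 up to floating-point error.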
Mean-Std Normalization
The data can be normalized by subtracting the mean (µ) of each feature and dividing by its standard deviation (σ). Each normalized feature then has a mean of 0 and a standard deviation of 1, which often helps gradient-based training converge faster.
Process:
x_normalized = (x - mean(x)) / std(x)
Where:
- x_normalized is the normalized value of the feature.
- x is the original value of the feature.
- mean(x) is the mean of the feature across the dataset.
- std(x) is the standard deviation of the feature across the dataset.
Example:
| Sample Input | 10 | 25 | 30 |
|---|---|---|---|
| Sample Output | -1.120 | 0.320 | 0.800 |
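A minimal Python sketch of this calculation is shown below. The function name `mean_std_normalize` is hypothetical. The sketch assumes the sample standard deviation (n - 1 denominator, `statistics.stdev`), because that is the variant consistent with the example values; the population standard deviation would give different results. The sketch rounds to three decimals, while the example table appears to truncate.

```python
import statistics


def mean_std_normalize(values):
    """Subtract the mean and divide by the sample standard deviation."""
    mean = statistics.mean(values)
    # Assumption: sample standard deviation (n - 1), which needs >= 2 values.
    std = statistics.stdev(values)
    if std == 0:
        # Assumption: a constant feature has no spread, so return zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


result = mean_std_normalize([10, 25, 30])
print([round(v, 3) for v in result])  # [-1.121, 0.32, 0.801]
```

The normalized values have a mean of 0 and a sample standard deviation of 1, up to floating-point error.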
Last Updated 2023-10-09 18:18:15 +0530