Imputation

Imputation is a technique for replacing missing data in a dataset with substitute values, so that most of the dataset's information is retained instead of being discarded along with the incomplete rows or columns.

KNN Imputation

KNN imputation uses the k-nearest neighbours method: each missing value is replaced with the mean of that feature over the n_neighbors nearest neighbours found in the training set. By default, n_neighbors is set to 5 and the Euclidean distance metric is used to find the nearest neighbours.
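
The idea can be sketched in plain Python as follows. This is a minimal illustration, not the product's implementation: the function name knn_impute is hypothetical, missing values are represented as None, and the way the distance skips coordinates that are missing in either row (and rescales by the share of coordinates used) is an assumption, since the text above only specifies a Euclidean metric.

```python
import math

def knn_impute(rows, n_neighbors=5):
    """Fill None entries with the mean of that column over the nearest rows."""

    def distance(a, b):
        # Euclidean distance over the columns present in both rows,
        # rescaled by the share of columns that could be compared (assumption).
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        squared = sum((x - y) ** 2 for x, y in shared)
        return math.sqrt(squared * len(a) / len(shared))

    imputed = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors are the other rows in which column j is observed.
            donors = [
                (distance(row, other), other[j])
                for k, other in enumerate(rows)
                if k != i and other[j] is not None
            ]
            donors = [d for d in donors if d[0] != math.inf]
            donors.sort(key=lambda d: d[0])
            nearest = [v for _, v in donors[:n_neighbors]]
            if nearest:
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed

data = [
    [1, 2, None],
    [3, 4, 3],
    [None, 6, 5],
    [8, 8, 7],
]
filled = knn_impute(data, n_neighbors=2)
```

With n_neighbors=2, the gap in the first row is filled with the mean of the third-column values of its two closest rows, and the gap in the third row is filled the same way from the first column.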

MissForest Imputation

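MissForest imputation first fills all missing values with the mean (for numeric variables) or the mode (for categorical variables). Then, for each variable with missing values, a random forest model is trained on the rows where that variable is observed and is used to predict the missing entries. In the standard MissForest algorithm, this cycle is repeated until the imputed values stabilise.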
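
The procedure can be sketched for an all-numeric table as shown here. This is a simplified illustration under stated assumptions, not the product's implementation: it relies on NumPy and scikit-learn's RandomForestRegressor, the function name missforest_like_impute and its n_iter parameter are hypothetical, categorical variables are not handled, and a fixed number of passes stands in for a convergence check.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def missforest_like_impute(X, n_iter=5, random_state=0):
    """Illustrative MissForest-style imputation for a numeric 2-D array."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = X.copy()

    # Step 1: initial fill with the mean of each column's observed values.
    col_means = np.nanmean(X, axis=0)
    for j in range(X.shape[1]):
        filled[missing[:, j], j] = col_means[j]

    # Step 2: for each column with gaps, train on the rows where it is
    # observed and predict the rows where it is missing.
    cols_with_gaps = [j for j in range(X.shape[1]) if missing[:, j].any()]
    for _ in range(n_iter):  # fixed number of passes (simplification)
        for j in cols_with_gaps:
            other_cols = [c for c in range(X.shape[1]) if c != j]
            observed_rows = ~missing[:, j]
            model = RandomForestRegressor(n_estimators=50, random_state=random_state)
            model.fit(filled[observed_rows][:, other_cols], filled[observed_rows, j])
            filled[missing[:, j], j] = model.predict(filled[missing[:, j]][:, other_cols])
    return filled

X = np.array([
    [1.0, 2.0, np.nan],
    [2.0, np.nan, 6.0],
    [3.0, 6.0, 9.0],
    [4.0, 8.0, 12.0],
    [np.nan, 10.0, 15.0],
])
result = missforest_like_impute(X)
```

Observed entries are left untouched; only the positions that were missing in X receive predicted values.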

Mean Imputation

Mean imputation replaces the null values in a feature with the mean of that feature's observed values across the whole dataset.

Median Imputation

Median imputation replaces the null values in a feature with the median of that feature's observed values across the whole dataset.

Mode Imputation

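Mode imputation replaces the null values in a feature with the mode (the most frequent value) of that feature across the whole dataset, which makes it applicable to categorical features as well as numeric ones.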
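
Mean, median, and mode imputation differ only in the statistic used as the fill value, which the following minimal sketch tries to show using Python's standard statistics module. The helper name impute and the sample columns are hypothetical, and missing values are represented as None.

```python
from statistics import mean, median, mode

def impute(values, strategy):
    """Replace None entries with a statistic of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = strategy(observed)  # computed once over the whole column
    return [fill if v is None else v for v in values]

ages = [20, None, 30, 20, None, 50]
colours = ["red", None, "blue", "red"]

ages_mean = impute(ages, mean)        # gaps filled with 30
ages_median = impute(ages, median)    # gaps filled with 25.0
ages_mode = impute(ages, mode)        # gaps filled with 20
colours_mode = impute(colours, mode)  # gap filled with "red"
```

Because the fill value comes only from the observed entries, the existing values in each column stay unchanged.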

Group-By Imputation

Group-by imputation takes the following three inputs for replacing null values:
- Columns to Impute: The columns that contain null values to be replaced.
- Group-By Columns: The columns by which the rows are grouped when calculating the replacement values.
- Aggregator: The aggregate function to apply, such as mean, median, minimum, or maximum.
Using these inputs, group-by imputation computes the chosen aggregate of each column to impute within every group defined by the group-by columns, and fills each null value with the aggregate of the group its row belongs to.
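
The three inputs map onto a small function as in the sketch below. It is only an illustration: the function name group_by_impute, its parameter names, and the sample rows are hypothetical, rows are plain dictionaries, and missing values are represented as None.

```python
from collections import defaultdict
from statistics import mean

def group_by_impute(rows, impute_col, group_cols, aggregator=mean):
    """Fill None in impute_col with the aggregate of its group's observed values."""
    # Collect the observed values of impute_col for every group.
    groups = defaultdict(list)
    for row in rows:
        if row[impute_col] is not None:
            key = tuple(row[c] for c in group_cols)
            groups[key].append(row[impute_col])
    fill_values = {key: aggregator(vals) for key, vals in groups.items()}

    result = []
    for row in rows:
        new_row = dict(row)
        if new_row[impute_col] is None:
            key = tuple(row[c] for c in group_cols)
            # A group with no observed values has nothing to aggregate,
            # so the gap is left as None.
            new_row[impute_col] = fill_values.get(key)
        result.append(new_row)
    return result

rows = [
    {"city": "Pune", "salary": 40},
    {"city": "Pune", "salary": None},
    {"city": "Pune", "salary": 60},
    {"city": "Delhi", "salary": 70},
    {"city": "Delhi", "salary": None},
]
filled_mean = group_by_impute(rows, "salary", ["city"])
filled_max = group_by_impute(rows, "salary", ["city"], aggregator=max)
```

Here the missing Pune salary is filled from the Pune rows only, and the missing Delhi salary from the Delhi rows only; swapping the aggregator changes the fill value but not the grouping.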

Last Updated 2023-06-15 17:14:14 +0530
