Imputation
Imputation is a technique for replacing missing values in a dataset with substitute values, so that most of the dataset's data and information is retained rather than discarded.
-
KNN Imputation
This imputation uses the k-nearest neighbours method: each missing value is replaced with the mean of that feature's values across the n_neighbors nearest neighbours found in the training set. By default, n_neighbors is set to 5, and the Euclidean distance metric is used to find the nearest neighbours.
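The description above closely matches scikit-learn's KNNImputer, so the following is a minimal sketch assuming scikit-learn (this page does not confirm which library is used). It sets n_neighbors=2 only because the toy array has four rows; the default is 5. Note that scikit-learn's default metric, "nan_euclidean", is a Euclidean distance that skips coordinates missing in either row and rescales the result.

```python
import numpy as np
from sklearn.impute import KNNImputer

X = np.array([
    [1.0, 2.0, np.nan],
    [3.0, 4.0, 3.0],
    [np.nan, 6.0, 5.0],
    [8.0, 8.0, 7.0],
])

# Each NaN is replaced by the mean of that column over the 2 nearest
# rows (by NaN-aware Euclidean distance) that have a value in that column.
imputer = KNNImputer(n_neighbors=2)
X_filled = imputer.fit_transform(X)
print(X_filled)
```

Here the missing value in row 0 becomes (3 + 5) / 2 = 4.0, taken from rows 1 and 2, and the missing value in row 2 becomes (3 + 8) / 2 = 5.5, taken from rows 1 and 3.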
-
MissForest Imputation
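MissForest first fills all missing values with an initial estimate: the mean for numerical variables and the mode for categorical variables. Then, for each variable with missing values, a random forest model is trained on the rows where that variable is observed and used to predict the rows where it is missing.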
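The two steps above can be sketched as follows. This is a minimal sketch, not the product's implementation. It assumes all columns are numerical, so the initial estimate is the column mean, and it uses scikit-learn's RandomForestRegressor. The function name missforest_impute is hypothetical. The original MissForest algorithm repeats the per-variable pass until a stopping criterion is met; for simplicity, this sketch runs a fixed number of passes (n_iter, also an assumption).

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def missforest_impute(df: pd.DataFrame, n_iter: int = 3, random_state: int = 0) -> pd.DataFrame:
    """Hypothetical MissForest sketch for numerical columns only."""
    missing = df.isna()
    # Step 1: initial estimate, using the mean of each column's observed values.
    filled = df.fillna(df.mean())
    # Visit columns with missing values, fewest missing first.
    cols = [c for c in missing.sum().sort_values().index if missing[c].any()]
    for _ in range(n_iter):  # fixed pass count instead of a stopping criterion
        for col in cols:
            observed = ~missing[col]
            features = filled.drop(columns=col)
            # Step 2: train on rows where `col` is observed ...
            model = RandomForestRegressor(n_estimators=100, random_state=random_state)
            model.fit(features[observed], filled.loc[observed, col])
            # ... and predict the rows where it was originally missing.
            filled.loc[~observed, col] = model.predict(features[~observed])
    return filled


rng = np.random.default_rng(0)
x = rng.normal(size=50)
df = pd.DataFrame({
    "a": x,
    "b": 2 * x + rng.normal(scale=0.1, size=50),
    "c": rng.normal(size=50),
})
df.loc[[3, 10, 20], "b"] = np.nan
df.loc[[5, 15], "a"] = np.nan

result = missforest_impute(df)
print(result.loc[[3, 10, 20], "b"])
```

Only the originally missing cells are overwritten, so observed values pass through unchanged.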
-
Mean Imputation
Mean Imputation replaces the null values in a feature with the mean of that feature's non-null values across the whole dataset.
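A minimal pandas sketch of mean imputation, assuming a DataFrame of numerical columns. The column names are illustrative. DataFrame.mean() skips NaN by default, so each column is filled with the mean of its observed values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [25.0, np.nan, 35.0, 40.0],
    "income": [50.0, 60.0, np.nan, 80.0],
})

# Fill each column's NaNs with that column's mean (NaNs are ignored in the mean).
filled = df.fillna(df.mean())
print(filled)
```

Here the missing age becomes (25 + 35 + 40) / 3 ≈ 33.33 and the missing income becomes (50 + 60 + 80) / 3 ≈ 63.33.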
-
Median Imputation
Median Imputation replaces the null values in a feature with the median of that feature's non-null values across the whole dataset.
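A minimal pandas sketch comparing median and mean imputation on a column with an outlier. The column name and values are illustrative. It shows why the median is often preferred for skewed data: a single extreme value pulls the mean far away from the typical values but barely moves the median.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"salary": [1.0, 2.0, 3.0, 100.0, np.nan]})

# Both statistics are computed over the non-null values 1, 2, 3 and 100.
mean_filled = df.fillna(df.mean())      # fills with 26.5
median_filled = df.fillna(df.median())  # fills with 2.5
print(mean_filled["salary"].iloc[4], median_filled["salary"].iloc[4])
```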
-
Mode Imputation
Mode Imputation replaces the null values in a feature with the mode (most frequent value) of that feature's non-null values across the whole dataset.
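A minimal pandas sketch of mode imputation. Unlike the mean and median, the mode also works for categorical columns. The column names are illustrative. DataFrame.mode() returns one row per tied mode, sorted, so the sketch takes the first row. This page does not say which value is chosen when there is a tie, so that choice is an assumption.

```python
import pandas as pd

df = pd.DataFrame({
    "colour": ["red", "blue", "red", None, "green"],
    "size": [1.0, 2.0, 2.0, None, 3.0],
})

# Most frequent non-null value per column; on ties, take the first (smallest) mode.
modes = df.mode().iloc[0]
filled = df.fillna(modes)
print(filled)
```

Row 3 is filled with "red" for colour and 2.0 for size.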
-
Group-By Imputation
Group-By Imputation takes the following three inputs:
- Columns to Impute: Columns that contain null values to be replaced.
- Group-By Columns: Columns used to group the rows when calculating the replacement values.
- Aggregator: The aggregate function to use, such as mean, median, minimum or maximum.
Using these inputs, Group-By Imputation groups the rows by the Group-By columns, computes the aggregate of each column to impute within each group, and fills each null value with the aggregate value of the group that its row belongs to.
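The group-by procedure can be sketched with pandas groupby and transform. This is a minimal sketch, not the product's implementation. The function group_by_impute and its parameter names are hypothetical, chosen to mirror the three inputs listed above. The aggregator is passed as a pandas aggregation name ("mean", "median", "min" or "max"). A null in a group that has no observed values for that column stays null.

```python
import numpy as np
import pandas as pd


def group_by_impute(df, columns_to_impute, group_by_columns, aggregator="mean"):
    """Hypothetical sketch: fill nulls with a per-group aggregate."""
    filled = df.copy()
    for col in columns_to_impute:
        # Aggregate per group, broadcast back to every row of that group.
        group_values = df.groupby(group_by_columns)[col].transform(aggregator)
        filled[col] = df[col].fillna(group_values)
    return filled


df = pd.DataFrame({
    "city": ["A", "A", "A", "B", "B"],
    "price": [10.0, np.nan, 20.0, 100.0, np.nan],
})

result = group_by_impute(df, ["price"], ["city"], aggregator="mean")
print(result)
```

With aggregator="mean", the null in city A becomes 15.0 (the mean of 10 and 20) and the null in city B becomes 100.0. With aggregator="max", the null in city A would become 20.0 instead.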
Last Updated 2023-06-15 17:14:14 +0530