Imputers

Imputation is a technique for replacing missing values in a dataset with substitute values, so that rows containing gaps can be kept rather than discarded and most of the dataset's information is retained.

  1. KNN Imputation

    This imputation uses the k-nearest neighbours method: each missing value is replaced with the mean of that feature across the n_neighbors nearest neighbours found in the training set. By default, n_neighbors is set to 5, and the Euclidean distance metric is used to find the nearest neighbours.

  2. MissForest Imputation

    MissForest first fills all missing data using the mean (numeric variables) or the mode (categorical variables). Then, for each variable with missing values, a random forest model is trained on the observed part of that variable and used to predict its missing part.

  3. Mean Imputation

    Mean Imputation replaces the null values in each feature with the mean of that feature's non-null values across the whole dataset.

  4. Median Imputation

    Median Imputation replaces the null values in each feature with the median of that feature's non-null values across the whole dataset.

  5. Mode Imputation

    Mode Imputation replaces the null values in each feature with the mode (most frequent value) of that feature's non-null values across the whole dataset.

  6. Group-By Imputation

    Group-By Imputation takes the following three inputs:

    • Columns to Impute: Columns that contain null values to be replaced.
    • Group-By Columns: Columns used to group the rows when calculating the replacement values.
    • Aggregator: The aggregate function to use, such as mean, median, minimum or maximum.

    Using these inputs, Group-By Imputation groups the rows by the Group-By Columns, computes the chosen aggregate of each Column to Impute within each group, and fills each null value with the aggregate value of the group its row belongs to.
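
The KNN Imputation described in item 1 can be sketched in plain Python as follows. This is a minimal illustration, not this product's implementation: the function names knn_impute and euclidean_on_shared are hypothetical, rows are assumed to be lists with None marking a missing value, and, because the text does not say how distances are measured when a row has its own gaps, the sketch assumes Euclidean distance over the coordinates observed in both rows.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]  # None marks a missing value


def euclidean_on_shared(a: Row, b: Row) -> float:
    """Euclidean distance over coordinates observed (not None) in both rows.

    Assumption: rows sharing no observed coordinate are treated as infinitely far apart.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared))


def knn_impute(rows: List[Row], train: List[Row], n_neighbors: int = 5) -> List[Row]:
    """Fill each gap with the mean of that feature over the n_neighbors nearest training rows."""
    imputed = []
    for row in rows:
        new_row = list(row)
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors: training rows in which feature j is observed.
            donors = [t for t in train if t[j] is not None]
            if not donors:
                raise ValueError(f"feature {j} has no observed values in the training set")
            donors.sort(key=lambda t: euclidean_on_shared(row, t))
            nearest = donors[:n_neighbors]
            # Replace the gap with the mean of feature j over the nearest donors.
            new_row[j] = sum(t[j] for t in nearest) / len(nearest)
        imputed.append(new_row)
    return imputed


train = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [10.0, 20.0]]
print(knn_impute([[2.1, None]], train, n_neighbors=2))  # [[2.1, 5.0]]
```

With n_neighbors=2 the two closest training rows on the observed feature are [2.0, 4.0] and [3.0, 6.0], so the gap becomes (4.0 + 6.0) / 2 = 5.0.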
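
The MissForest Imputation described in item 2 can be sketched with NumPy and scikit-learn's RandomForestRegressor. This is a sketch under stated assumptions, not this product's implementation: missforest_impute is a hypothetical name, the data is assumed to be all numeric (so only the mean initial fill applies, and categorical columns with a mode fill and a classifier are left out), and the column order and stopping rule are taken from the original MissForest paper rather than from this document.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def missforest_impute(X, max_iter=10, n_estimators=100, random_state=0):
    """MissForest-style imputation for an all-numeric 2-D array with np.nan as gaps.

    Assumes every column has at least one observed value.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    X_imp = X.copy()

    # Initial guess: column means of the observed values (numeric data only).
    col_means = np.nanmean(X, axis=0)
    X_imp[missing] = np.take(col_means, np.nonzero(missing)[1])

    # Visit columns from fewest to most missing values, as in the original paper.
    order = np.argsort(missing.sum(axis=0))
    prev_change = np.inf
    for _ in range(max_iter):
        X_prev = X_imp.copy()
        for j in order:
            miss_j = missing[:, j]
            if not miss_j.any():
                continue
            others = np.delete(X_imp, j, axis=1)
            model = RandomForestRegressor(n_estimators=n_estimators, random_state=random_state)
            # Train on rows where column j was observed, then predict its missing rows.
            model.fit(others[~miss_j], X_imp[~miss_j, j])
            X_imp[miss_j, j] = model.predict(others[miss_j])
        # Assumed stopping rule (from the paper): stop once the relative change between
        # iterations grows, and return the previous imputation.
        change = np.sum((X_imp - X_prev) ** 2) / np.sum(X_imp ** 2)
        if change > prev_change:
            return X_prev
        prev_change = change
    return X_imp
```

Observed entries are never overwritten; only the originally missing cells are refined on each pass, each time using the latest estimates of the other columns as predictors.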
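
Mean, Median and Mode Imputation (items 3 to 5) differ only in the statistic used to compute the fill value, so one helper can illustrate all three. This is a minimal sketch using Python's standard statistics module; impute_column is a hypothetical name, None is assumed to mark a null value, and ties in the mode are broken by statistics.mode (first value encountered, Python 3.8 and later), which may differ from this product's tie-breaking rule.

```python
import statistics
from typing import Any, Callable, List, Optional, Sequence


def impute_column(values: Sequence[Optional[Any]], stat: Callable) -> List[Any]:
    """Replace every None in one feature with `stat` computed over its non-null values."""
    observed = [v for v in values if v is not None]
    fill = stat(observed)
    return [fill if v is None else v for v in values]


ages = [10, None, 20, 60, None, 10]  # observed values: 10, 20, 60, 10
print(impute_column(ages, statistics.mean))    # [10, 25, 20, 60, 25, 10]
print(impute_column(ages, statistics.median))  # [10, 15.0, 20, 60, 15.0, 10]
print(impute_column(ages, statistics.mode))    # [10, 10, 20, 60, 10, 10]

# The mode also works for categorical values, where a mean or median is undefined.
print(impute_column(["red", None, "blue", "red"], statistics.mode))  # ['red', 'red', 'blue', 'red']
```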
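
The Group-By Imputation steps in item 6 can be sketched in plain Python. The function group_by_impute and its parameters columns_to_impute, group_by_columns and aggregator are hypothetical names mirroring the three inputs listed above, not this product's API; rows are assumed to be dictionaries with None marking a null value, and a group with no observed values is assumed to leave its gaps unfilled.

```python
import statistics
from collections import defaultdict
from typing import Any, Dict, List, Sequence

# Hypothetical aggregator names for the functions listed in the text.
AGGREGATORS = {"mean": statistics.mean, "median": statistics.median, "minimum": min, "maximum": max}


def group_by_impute(rows: List[Dict[str, Any]], columns_to_impute: Sequence[str],
                    group_by_columns: Sequence[str], aggregator: str = "mean") -> List[Dict[str, Any]]:
    agg = AGGREGATORS[aggregator]

    def group_key(row: Dict[str, Any]) -> tuple:
        return tuple(row[c] for c in group_by_columns)

    # Collect the non-null values of each column to impute, per group.
    observed = defaultdict(list)
    for row in rows:
        for col in columns_to_impute:
            if row[col] is not None:
                observed[(group_key(row), col)].append(row[col])
    fills = {key: agg(values) for key, values in observed.items()}

    # Fill each null with its group's aggregate; the input rows are not modified.
    result = []
    for row in rows:
        new_row = dict(row)
        for col in columns_to_impute:
            if new_row[col] is None:
                # Assumption: a group with no observed values leaves the gap as None.
                new_row[col] = fills.get((group_key(row), col))
        result.append(new_row)
    return result


rows = [
    {"city": "Pune", "temp": 30.0},
    {"city": "Pune", "temp": None},
    {"city": "Pune", "temp": 34.0},
    {"city": "Delhi", "temp": 40.0},
    {"city": "Delhi", "temp": None},
]
filled = group_by_impute(rows, ["temp"], ["city"], "mean")
print([r["temp"] for r in filled])  # [30.0, 32.0, 34.0, 40.0, 40.0]
```

The Pune gap receives the Pune mean (32.0) rather than the overall mean, which is the point of grouping: the fill value reflects rows that are similar on the Group-By Columns.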

Last Updated 2023-10-08 10:48:45 +0530