Operations in QuickML
Data preprocessing is the step in which data is transformed, or encoded, into a form the machine can parse. In other words, after preprocessing, the features of the data can be easily interpreted by the algorithm.
- Encoding
- Feature Engineering
- Imputation
- Normalization
- Transformers
Encoding
Encoding is a technique for converting categorical (discrete) variables into numerical (continuous) values so they can be fit easily to a machine learning model.

Ordinal Encoder
An ordinal encoding involves mapping each unique label to an integer value. This type of encoding is only appropriate if there is a known relationship between the categories: if the data is ordered, we can use ordinal encoding.
Example:
For the temperature values Low, Normal, and High, we can use ordinal encoding. After encoding, the data will look like 0, 1, 2 (0 → Low temp, 2 → High temp). Ordinal encoding uses a single column of integers to represent the classes. An optional mapping dict can be passed in; in that case, we use the knowledge that there is some true order to the classes themselves. Otherwise, the classes are assumed to have no true order, and integers are selected at random.
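The idea can be sketched in plain Python (the mapping dict and function name below are hypothetical illustrations, not QuickML's API):

```python
# Minimal ordinal-encoding sketch: map ordered categories to integers.
# The mapping dict encodes the known order Low < Normal < High.
temperature_order = {"Low": 0, "Normal": 1, "High": 2}

def ordinal_encode(values, mapping):
    """Replace each category with its integer from the mapping."""
    return [mapping[v] for v in values]

encoded = ordinal_encode(["Low", "High", "Normal", "Low"], temperature_order)
print(encoded)  # [0, 2, 1, 0]
```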
One-Hot Encoding
We use this categorical data-encoding technique when the features are nominal (do not have any order). In one-hot encoding, we create a new variable for each level of a categorical feature. Each category is mapped to a binary variable containing either 0 or 1, where 0 represents the absence and 1 represents the presence of that category. If the categorical feature is not ordinal (i.e., the data has no order) and the number of categories is small, one-hot encoding can be applied effectively.
Sample input:

color
blue
red
green

Sample output:

color_blue  color_red  color_green
1           0          0
0           1          0
0           0          1
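The sample above can be reproduced with a minimal sketch (plain Python for illustration; not QuickML's API):

```python
# One-hot encoding sketch: one binary column per category level.
def one_hot_encode(values):
    # Preserve first-seen order of the levels for readable output.
    levels = sorted(set(values), key=values.index)
    return [[1 if v == level else 0 for level in levels] for v in values]

rows = one_hot_encode(["blue", "red", "green"])
print(rows)  # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```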
James-Stein Encoder
For each feature value, the James-Stein estimator returns a weighted average of:
- The mean target value for the observed feature value.
- The mean target value (regardless of the feature value).
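A simplified sketch of this weighted average, assuming a fixed shrinkage weight B (the full James-Stein estimator derives B from the group and global target variances; this is an illustration, not QuickML's implementation):

```python
# Simplified James-Stein-style encoding sketch: shrink each category's
# target mean toward the global target mean. The fixed weight B is an
# assumption; the real estimator computes it from the variances.
def james_stein_encode(categories, targets, B=0.5):
    global_mean = sum(targets) / len(targets)
    sums, counts = {}, {}
    for c, y in zip(categories, targets):
        sums[c] = sums.get(c, 0) + y
        counts[c] = counts.get(c, 0) + 1
    means = {c: sums[c] / counts[c] for c in sums}
    # Weighted average of the per-category mean and the global mean.
    return [(1 - B) * means[c] + B * global_mean for c in categories]

enc = james_stein_encode(["a", "a", "b", "b"], [1, 0, 1, 1])
print(enc)  # [0.625, 0.625, 0.875, 0.875]
```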

Label Encoding
This is used to convert a categorical target column into a numerical column by assigning a unique integer label to each category in the categorical variable. It's important to note that this encoding introduces an ordering among the categorical values, which may not be useful in every case. It is appropriate for ordinal variables, where there is an inherent order or ranking among the categories.
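A minimal plain-Python sketch of label encoding (for illustration only):

```python
# Label-encoding sketch: assign a unique integer to each target category,
# in order of first appearance.
def label_encode(labels):
    mapping = {}
    for label in labels:
        if label not in mapping:
            mapping[label] = len(mapping)
    return [mapping[label] for label in labels], mapping

encoded, mapping = label_encode(["cat", "dog", "cat", "bird"])
print(encoded)  # [0, 1, 0, 2]
```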

Leave-One-Out Encoder
Leave-one-out encoding essentially calculates the mean of the target variable over all the records that share the same value of the categorical feature in question. The encoding algorithm differs slightly between the training and test data sets: for the training data set, the record under consideration is left out of the mean, hence the name leave-one-out.
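The training-set behavior can be sketched as follows (a plain-Python illustration; the fallback for singleton categories is an assumption):

```python
# Leave-one-out target-encoding sketch (training mode): for each row, use
# the mean target of all OTHER rows sharing the same category value.
def leave_one_out_encode(categories, targets):
    global_mean = sum(targets) / len(targets)
    sums, counts = {}, {}
    for c, y in zip(categories, targets):
        sums[c] = sums.get(c, 0) + y
        counts[c] = counts.get(c, 0) + 1
    out = []
    for c, y in zip(categories, targets):
        if counts[c] > 1:
            out.append((sums[c] - y) / (counts[c] - 1))  # exclude current row
        else:
            out.append(global_mean)  # singleton category: assumed fallback
    return out

enc = leave_one_out_encode(["a", "a", "a", "b"], [1, 0, 1, 1])
print(enc)  # [0.5, 1.0, 0.5, 0.75]
```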

Target Encoding
In target encoding, we calculate the mean of the target variable for each category and replace each categorical value with that mean. In the case of a categorical target variable, the posterior probability of the target replaces each category. Any non-categorical columns are automatically dropped by the target encoder model.
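For a numeric target, the replacement can be sketched in plain Python (illustration only, not QuickML's API):

```python
# Target (mean) encoding sketch: replace each category with the mean of
# the target over the rows having that category.
def target_encode(categories, targets):
    sums, counts = {}, {}
    for c, y in zip(categories, targets):
        sums[c] = sums.get(c, 0) + y
        counts[c] = counts.get(c, 0) + 1
    means = {c: sums[c] / counts[c] for c in sums}
    return [means[c] for c in categories]

enc = target_encode(["a", "a", "b", "b"], [1, 0, 1, 1])
print(enc)  # [0.5, 0.5, 1.0, 1.0]
```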
Count Encoder
Count encoding is based on replacing categories with their counts computed on the train set. The counts may be the same for some categories, which can result in a collision: two categories encoded as the same value. A count encoder is therefore best used when the counts of the categories are distinct.
Sample input:  10 10 20 30 30 30
Sample output: 2 2 1 3 3 3
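The sample above follows directly from counting occurrences (plain-Python sketch for illustration):

```python
# Count-encoding sketch: replace each value with how often it appears in
# the training column.
from collections import Counter

def count_encode(values):
    counts = Counter(values)
    return [counts[v] for v in values]

enc = count_encode([10, 10, 20, 30, 30, 30])
print(enc)  # [2, 2, 1, 3, 3, 3]
```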
Backward Difference Encoding
In backward difference coding, the mean of the dependent variable for a level is compared with the mean of the dependent variable for the prior level. This type of coding may be useful for a nominal or an ordinal variable.
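The standard contrast matrix for this scheme can be sketched as below (assuming the conventional fraction-of-k scaling; library conventions can differ):

```python
# Backward-difference contrast matrix sketch for k ordered levels:
# column j compares level j+1 with level j. Row i holds the coded values
# for level i, expressed as exact fractions of k.
from fractions import Fraction

def backward_difference_matrix(k):
    # Entry for level i (1-based) in column j (1..k-1):
    #   -(k - j)/k when i <= j, else j/k.
    return [[Fraction(-(k - j), k) if i <= j else Fraction(j, k)
             for j in range(1, k)] for i in range(1, k + 1)]

M = backward_difference_matrix(4)
```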

Helmert Encoding
The mean of the dependent variable for a level is compared to the mean of the dependent variable over all previous levels. This comparison does not make much sense for a nominal variable, such as race.
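An unscaled sketch of the (reverse) Helmert contrast matrix, where each level is contrasted with the levels before it (scaling conventions vary between libraries; this is an illustration):

```python
# (Reverse) Helmert contrast sketch, unscaled: column j compares level
# j+1 with the mean of the previous levels. Level i <= j codes as -1,
# level j+1 codes as j, and later levels code as 0.
def helmert_matrix(k):
    return [[(-1 if i <= j else (j if i == j + 1 else 0))
             for j in range(1, k)] for i in range(1, k + 1)]

M = helmert_matrix(3)
print(M)  # [[-1, -1], [1, -1], [0, 2]]
```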

CatBoost Encoding
CatBoost is a target-based categorical encoder. It replaces a categorical feature with the average value of the target for that category in the training data set, combined with the target probability over the entire data set. Naive target statistics of this kind introduce target leakage, because the target is used to construct the feature that predicts it; CatBoost counters this by computing each row's statistic only from the rows that precede it in a permutation of the data.
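The ordered (prefix-only) statistic can be sketched as follows (a simplified illustration; the `prior` value and the smoothing weight of 1 are assumptions, not CatBoost's exact defaults):

```python
# Ordered target-statistic sketch (CatBoost-style): encode each row using
# only the target values of EARLIER rows with the same category, smoothed
# with a prior. Using only the prefix avoids leaking the row's own target.
def catboost_encode(categories, targets, prior):
    sums, counts = {}, {}
    out = []
    for c, y in zip(categories, targets):
        s, n = sums.get(c, 0), counts.get(c, 0)
        out.append((s + prior) / (n + 1))  # statistic from the prefix only
        sums[c] = s + y                    # update AFTER encoding the row
        counts[c] = n + 1
    return out

enc = catboost_encode(["a", "a", "a"], [1, 0, 1], prior=0.5)
print(enc)  # [0.5, 0.75, 0.5]
```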
Last Updated: 2023-10-08 10:48:45 +0530