ML Algorithms in QuickML

ML algorithms are programs that can learn from data and improve from experience, without any external intervention. The following algorithms and operations are all available in QuickML as stages that can be configured in one or more pipeline executions.

The most widely used algorithms in the data science domain are:

  1. Classification algorithms
  2. Regression algorithms

Classification Algorithms

Classification is the task of predicting a discrete class label. QuickML features several classification algorithms, including:

AdaBoost Classification

AdaBoost begins by fitting a classifier on the original dataset and then fits additional copies of the classifier on the same dataset, with the weights of the training instances adjusted according to the error of the current prediction, so that subsequent classifiers focus more on difficult cases. In effect, AdaBoost builds a series of small, one-level decision trees (stumps), adapting each tree to predict the difficult cases missed by the previous trees and combining all trees into a single model.

Hyper Parameters:

Parameter Description Data Type Possible Values Default Values
base_estimator The base estimator from which the boosted ensemble is built. If None, the base estimator is a DecisionTreeClassifier initialized with max_depth=1. object Any classification model except the KNN classification model None
n_estimators (number of estimators) The maximum number of estimators at which boosting is terminated. In case of a perfect fit, the learning procedure is stopped early. int [1, 500] 50
learning_rate Weight applied to each classifier at each boosting iteration. A higher learning rate increases the contribution of each classifier. float (0.0, +Inf) 1.0
algorithm If ‘SAMME.R’ then use the SAMME.R real boosting algorithm. base_estimator must support calculation of class probabilities. If ‘SAMME’ then use the SAMME discrete boosting algorithm. The SAMME.R algorithm typically converges faster than SAMME, achieving a lower test error with fewer boosting iterations. string {‘SAMME’, ‘SAMME.R’} ’SAMME.R’
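
QuickML configures this stage through the pipeline builder rather than code. For reference, a minimal sketch of an equivalent model in Python, assuming scikit-learn's AdaBoostClassifier (which exposes the hyperparameters listed above); the dataset and values are illustrative only:

```python
# Minimal sketch (assumption: scikit-learn backing); not QuickML pipeline code.
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier

X, y = load_iris(return_X_y=True)
# The default base estimator is a depth-1 decision tree (a "stump").
clf = AdaBoostClassifier(n_estimators=50, learning_rate=1.0)
clf.fit(X, y)
print(clf.predict(X[:3]))
```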

CatBoost Classification

CatBoost is based on gradient-boosted decision trees. During training, a set of decision trees is built consecutively. Each successive tree is built with reduced loss compared to the previous trees. The number of trees is controlled by the starting parameters.

This classifier has a much lower prediction time compared to the other classifiers.

Hyper Parameters:

Parameter Description Data Type Possible Values Default Values
learning_rate Used for reducing the gradient step. float (0,1] 0.03
l2_leaf_reg (l2_leaf_regularization) Coefficient at the L2 regularization term of the cost function. float [0,+inf) 3.0
rsm (random subspace method) The percentage of features to use at each split selection, when features are selected over again at random float (0,1] None
loss_function The metric to use in training. The specified value also determines the machine learning problem to solve. Some metrics support optional parameters. string {'Logloss', 'CrossEntropy', 'MultiClass', 'MultiClassOneVsAll'} 'MultiClass'
nan_mode The method for processing missing values in the input dataset. string {'Forbidden', 'Min', 'Max'} Min
leaf_estimation_method The method used to calculate the values in leaves. string {"Newton", "Gradient"} None
score_function The score type used to select the next split during the tree construction. string {L2, Cosine} Cosine
max_depth Maximum depth of the tree. int [1,+Inf) None
n_estimators (number of estimators) The maximum number of trees that can be built when solving machine learning problems. When using other parameters that limit the number of iterations, the final number of trees may be less than the number specified in this parameter. int [1, 500] None
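
For reference, a minimal sketch assuming the catboost Python package's CatBoostClassifier; the toy data and parameter values are illustrative, not QuickML defaults:

```python
# Minimal sketch (assumption: catboost package); not QuickML pipeline code.
from catboost import CatBoostClassifier

X = [[1, 4], [2, 5], [3, 6], [6, 1], [7, 2], [8, 3]]
y = [0, 0, 0, 1, 1, 1]
model = CatBoostClassifier(learning_rate=0.03, l2_leaf_reg=3.0,
                           loss_function="MultiClass", n_estimators=50,
                           verbose=False)
model.fit(X, y)
print(model.predict([[5, 5]]))
```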

Decision-Tree Classification

Decision tree builds classification or regression models in the form of a tree structure. It breaks down a dataset into smaller and smaller subsets, while at the same time an associated decision tree is incrementally developed.

Decision trees can handle both categorical and numerical data. When predicting the output value for a set of features, the tree predicts the output based on the subset that the set of features falls into.

Hyper Parameters

Parameter Description Data Type Possible Values Default Values
criterion The function to measure the quality of a split. string {“gini”, “entropy”} ”gini”
splitter The strategy used to choose the split at each node. string {“best”, “random”} ”best”
max_depth The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples. int (0, +Inf) None
min_samples_split The minimum number of samples required to split an internal node. int or float [2, +Inf) or (0, 1.0] 2
min_samples_leaf The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches. int or float [1, +Inf) or (0, 0.5] 1
min_weight_fraction_leaf The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. float [0, 0.5] 0
max_features The number of features to consider when looking for the best split int, float or string (0, n_features] or { “sqrt”, “log2”} None
max_leaf_nodes Grow a tree with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity. If None then unlimited number of leaf nodes. int (1, +Inf) None
min_impurity_decrease A node will be split if this split induces a decrease of the impurity greater than or equal to this value. float [0, +Inf) 0.0
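
For reference, a minimal sketch assuming scikit-learn's DecisionTreeClassifier, which uses the same hyperparameter names as the table above; values are illustrative only:

```python
# Minimal sketch (assumption: scikit-learn backing); not QuickML pipeline code.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(criterion="gini", max_depth=3, min_samples_split=2)
clf.fit(X, y)
print(clf.predict(X[:3]))
```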

GB Classification

Gradient-boosting classification calculates the difference between the current prediction and the known correct target value. This difference is called the residual. The gradient-boosting classifier then trains a weak model (a decision tree) that maps the features to that residual. The residual predicted by the weak model is added to the existing model's output, nudging the model towards the correct target. Repeating this step multiple times improves the overall model prediction.

Hyper Parameters:

Parameter Description Data Type Possible Values Default Values
loss The loss function to be optimized. ‘deviance’ refers to deviance (= logistic regression) for classification with probabilistic outputs. string {'deviance', 'exponential'} 'deviance'
learning_rate Learning rate shrinks the contribution of each tree by learning_rate. There is a trade-off between learning_rate and n_estimators. float (0.0, +Inf) 0.1
n_estimators
(number of estimators)
The number of boosting stages to perform. int [1, 500] 100
criterion The function to measure the quality of a split. string {'friedman_mse', 'mse', 'mae'} ’friedman_mse’
subsample The fraction of samples to be used for fitting the individual base learners. float (0.0, 1.0] 1.0
max_depth The maximum depth of the individual regression estimators. int (0, +Inf) None
min_samples_split The minimum number of samples required to split an internal node int or float [2, +Inf) or (0, 1.0] 2
min_samples_leaf The minimum number of samples required to be at a leaf node. int or float [1, +Inf) or (0, 0.5] 1
min_weight_fraction_leaf The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. float [0, 0.5] 0
max_features The number of features to consider when looking for the best split int, float or string (0, n_features] or { “sqrt”, “log2”} None
max_leaf_nodes Grow trees with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity. int (1, +Inf) None
min_impurity_decrease A node will be split if this split induces a decrease of the impurity greater than or equal to this value. float [0, +Inf) 0.0
init An estimator object that is used to compute the initial predictions. object or string estimator (Any classification model except SVM classification and catboost) or ‘zero’ None
warm_start When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just erase the previous solution. bool True or False False
tol (tolerance) Tolerance for the early stopping. When the loss is not improving by at least tol for n_iter_no_change iterations (if set to a number), the training stops. float [0.0, +Inf) 1e-4
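
For reference, a minimal sketch assuming scikit-learn's GradientBoostingClassifier: each boosting stage fits a small tree to the residuals of the current model. Values are illustrative only:

```python
# Minimal sketch (assumption: scikit-learn backing); not QuickML pipeline code.
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier

X, y = load_iris(return_X_y=True)
clf = GradientBoostingClassifier(learning_rate=0.1, n_estimators=100,
                                 subsample=1.0, max_depth=3)
clf.fit(X, y)
print(clf.predict(X[:3]))
```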

KNN Classification

KNN works by finding the distances between a query (data instance) and all the examples in the data, selecting the specified number of examples (K) closest to the query, and then voting for the most frequent label in that neighborhood.

Hyper Parameters:

Parameter Description Data Type Possible Values Default Values
n_neighbors (number of neighbours) Number of neighbors to use by default for kneighbors queries. int [1, n] (n = total number of records in the dataset) 5
weights Weight function used in prediction. string {‘uniform’, ‘distance’} ’uniform’
algorithm Algorithm used to compute the nearest neighbors. string {‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’} ’auto’
leaf_size Leaf size passed to BallTree or KDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. int (1, +Inf) 30
p Power parameter for the Minkowski metric. int [1,3] 2
metric Metric to use for distance computation. Default is “minkowski”, which results in the standard Euclidean distance when p = 2. string {‘cityblock’, ‘cosine’, 'euclidean', 'l1', 'l2', 'manhattan', 'nan_euclidean', ’minkowski’} ’minkowski’
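
For reference, a minimal sketch assuming scikit-learn's KNeighborsClassifier: the K nearest neighbors vote on the class of each query point. Values are illustrative only:

```python
# Minimal sketch (assumption: scikit-learn backing); not QuickML pipeline code.
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
clf = KNeighborsClassifier(n_neighbors=5, weights="uniform",
                           metric="minkowski", p=2)  # p=2 -> Euclidean distance
clf.fit(X, y)
print(clf.predict(X[:3]))
```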

LGBM Classification

LGBM works by starting with an initial estimate that is updated using the output of each tree. The learning parameter controls the magnitude of this change in the estimates. It can be used on any data and provides a high degree of accuracy, as it contains many built-in preprocessing steps.

The LightGBM algorithm grows trees vertically, meaning it grows leaf-wise, while other algorithms grow level-wise. LightGBM chooses the leaf with the maximum loss reduction to grow. When growing the same leaf, it can reduce loss more than a level-wise algorithm.

Hyper Parameters:

Parameter Description Data Type Possible Values Default Values
boosting_type Method of boosting. string {‘gbdt’, ‘dart’, ‘goss’ } 'gbdt'
num_leaves Maximum tree leaves for base learners. int (1, +Inf) 31
max_depth Maximum tree depth for base learners, <= 0 means no limit. int (-Inf, +Inf) -1
learning_rate Boosting learning rate. You can use callbacks parameter of fit method to shrink/adapt learning rate in training using reset_parameter callback. float (0.0, +Inf) 0.1
n_estimators (number of estimators) Number of boosted trees to fit. int [1, 500] 100
subsample_for_bin Number of samples for constructing bins. int (0, +Inf) 200000
min_split_gain Minimum loss reduction required to make a further partition on a leaf node of the tree. float [0.0, +Inf) 0.0
min_child_weight Minimum sum of instance weight (Hessian) needed in a child (leaf). float [0.0, +Inf) 1e-3
min_child_samples Minimum number of data needed in a child (leaf). int [0, +Inf) 20
subsample Subsample ratio of the training instance. float (0.0, 1.0] 1.0
subsample_freq (subsample_frequency) Frequency of subsampling; <= 0 means subsampling is disabled. int (-Inf, +Inf) 0
colsample_bytree (column sample by tree) Subsample ratio of columns when constructing each tree. float (0.0, 1.0] 1.0
reg_alpha (alpha) L1 regularization term on weights. float (0.0, +Inf) 0.0
reg_lambda (lambda) L2 regularization term on weights. float (0.0, +Inf) 0.0
importance_type The type of feature importance to be filled into feature_importances_. If ‘split’, the result contains the number of times the feature is used in the model. If ‘gain’, the result contains the total gain of the splits that use the feature. string { ‘gain’, 'split'} 'split'
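
For reference, a minimal sketch assuming the lightgbm Python package's LGBMClassifier; parameter values are illustrative only:

```python
# Minimal sketch (assumption: lightgbm package); not QuickML pipeline code.
from lightgbm import LGBMClassifier
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
clf = LGBMClassifier(boosting_type="gbdt", num_leaves=31,
                     learning_rate=0.1, n_estimators=100)
clf.fit(X, y)
print(clf.predict(X[:3]))
```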

Logistic Regression

When the target is a binary value, we can use logistic regression. It maps the predicted value to a probability between 0 and 1.

Hyper Parameters:

Parameter Description Data Type Possible Values Default Values
penalty Specify the norm of the penalty: 'none' (no penalty is added), 'l2' (add an L2 penalty term; the default choice), 'l1' (add an L1 penalty term), 'elasticnet' (both L1 and L2 penalty terms are added). string { ‘l1’, ‘l2’, ‘elasticnet’, ‘none’} ’l2’
dual Dual or primal formulation. Dual formulation is only implemented for l2 penalty with liblinear solver. bool True or False False
tol (tolerance) Tolerance for stopping criteria. float [0.0, +Inf) 1e-4
C Inverse of regularization strength; must be a positive float. float [0.0, +Inf) 1.0
solver Algorithm to use in the optimization problem. string { ‘newton-cg’, ‘lbfgs’, ‘liblinear’, ‘sag’, ‘saga’ } ’lbfgs’
fit_intercept Specifies if a constant (a.k.a. bias or intercept) should be added to the decision function. bool True or False True
l1_ratio The Elastic-Net mixing parameter, with 0 <= l1_ratio <= 1. Only used if penalty='elasticnet'. float [0, 1] None
multi_class If the option chosen is ‘ovr’, then a binary problem is fit for each label. For ‘multinomial’ the loss minimised is the multinomial loss fit across the entire probability distribution, even when the data is binary. string {'auto', 'ovr', 'multinomial'} 'auto'
intercept_scaling Useful only when the solver ‘liblinear’ is used and self.fit_intercept is set to True. The intercept becomes intercept_scaling * synthetic_feature_weight. float (0, +Inf) 1.0
Note: Each value of the "solver" parameter supports only some of the values of the "penalty" parameter. The penalties supported by each solver are listed below:
  • ‘newton-cg’ - [‘l2’, ‘none’]
  • ‘lbfgs’ - [‘l2’, ‘none’]
  • ‘liblinear’ - [‘l1’, ‘l2’]
  • ‘sag’ - [‘l2’, ‘none’]
  • ‘saga’ - [‘elasticnet’, ‘l1’, ‘l2’, ‘none’]
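
For reference, a minimal sketch assuming scikit-learn's LogisticRegression; note the solver/penalty pairing described above (‘lbfgs’ supports the ‘l2’ penalty). Values are illustrative only:

```python
# Minimal sketch (assumption: scikit-learn backing); not QuickML pipeline code.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
# max_iter is not listed in the table above; it is only raised here so the
# toy example converges without warnings.
clf = LogisticRegression(penalty="l2", C=1.0, solver="lbfgs", max_iter=1000)
clf.fit(X, y)
print(clf.predict_proba(X[:1]))  # class-membership probabilities in [0, 1]
```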

Naive Bayes Classification

Naive Bayes is a classifier based on Bayes' theorem. It predicts membership probabilities for each class, i.e. the probability that a given record or data point belongs to a particular class. The class with the highest probability is considered the most likely class.
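
For reference, a minimal sketch. The specific Naive Bayes variant used by QuickML is not stated here; scikit-learn's GaussianNB is assumed purely for illustration:

```python
# Minimal sketch (assumption: GaussianNB variant); not QuickML pipeline code.
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
clf = GaussianNB()
clf.fit(X, y)
print(clf.predict_proba(X[:1]))  # per-class membership probabilities
print(clf.predict(X[:1]))        # class with the highest probability
```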

Random-Forest Classification

A random forest is a classification algorithm consisting of many decision trees. It uses bagging and feature randomness when building the individual trees to try to create an uncorrelated forest of trees whose prediction by committee is more accurate than that of any individual tree.

Bagging is an ensemble meta-estimator that fits base classifiers/regressors on random subsets of the original dataset, then aggregates their individual predictions (either by voting or by averaging) to form a final prediction.

Hyper Parameters:

Parameter Description Data Type Possible Values Default Values
n_estimators (number of estimators) The number of trees in the forest. int [1, 500] 100
criterion The function to measure the quality of a split. string {“gini”, “entropy”} ”gini”
max_depth The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples. int (0, +Inf) None
min_samples_split The minimum number of samples required to split an internal node int or float [2, +Inf) or (0, 1.0] 2
min_samples_leaf The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches. int or float [1, +Inf) or (0, 0.5] 1
min_weight_fraction_leaf The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. float [0, 0.5] 0.0
max_features The number of features to consider when looking for the best split int, float or string (0, n_features] or { “sqrt”} None
max_leaf_nodes Grow trees with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity. int (1, +Inf) None
min_impurity_decrease A node will be split if this split induces a decrease of the impurity greater than or equal to this value. float [0, +Inf) 0.0
bootstrap Whether bootstrap samples are used when building trees. If False, the whole dataset is used to build each tree. bool True or False True
oob_score(out of bag score) Whether to use out-of-bag samples to estimate the generalization score. Only available if bootstrap=True. bool True or False False
warm_start When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new forest. bool True or False False
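
For reference, a minimal sketch assuming scikit-learn's RandomForestClassifier: bagging plus per-split feature randomness, with the trees voting on the final class. Values are illustrative only:

```python
# Minimal sketch (assumption: scikit-learn backing); not QuickML pipeline code.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
clf = RandomForestClassifier(n_estimators=100, criterion="gini",
                             bootstrap=True, oob_score=True)
clf.fit(X, y)
print(clf.oob_score_)      # out-of-bag estimate of generalization accuracy
print(clf.predict(X[:3]))
```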

SVM Classification

SVM, or Support Vector Machine, is a linear model for classification and regression problems. It can solve linear and non-linear problems and works well for many practical problems. The idea of SVM is simple: the algorithm creates a line or a hyperplane that separates the data into classes.

Hyper Parameters:

Parameter Description Data Type Possible Values Default Values
C Regularization parameter. The strength of the regularization is inversely proportional to C. float [0.0, +Inf) 1.0
kernel Specifies the kernel type to be used in the algorithm. If none is given, ‘rbf’ will be used. string {‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’} ’rbf’
degree Degree of the polynomial kernel function (‘poly’). Ignored by all other kernels. int [0, +Inf) 3
gamma Kernel coefficient for ‘rbf’, ‘poly’ and ‘sigmoid’. string or float {‘scale’, ‘auto’} or (0.0, +Inf) ’scale’
coef0 Independent term in kernel function. It is only significant in ‘poly’ and ‘sigmoid’. float (-Inf, +Inf) 0.0
shrinking Whether to use the shrinking heuristic. bool True or False True
probability Whether to enable probability estimates. bool True or False False
tol (tolerance) Tolerance for stopping criterion. float [0.0, +Inf) 1e-3
decision_function_shape Whether to return a one-vs-rest (‘ovr’) decision function of shape (n_samples, n_classes) as all other classifiers, or the original one-vs-one (‘ovo’) decision function of libsvm which has shape (n_samples, n_classes * (n_classes - 1) / 2). string {‘ovo’, ‘ovr’} ’ovr’
break_ties If true, decision_function_shape='ovr', and number of classes > 2, predict will break ties according to the confidence values of decision_function; otherwise the first class among the tied classes is returned. bool True or False False
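
For reference, a minimal sketch assuming scikit-learn's SVC, which uses the same hyperparameter names as the table above; values are illustrative only:

```python
# Minimal sketch (assumption: scikit-learn backing); not QuickML pipeline code.
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
clf = SVC(C=1.0, kernel="rbf", gamma="scale", decision_function_shape="ovr")
clf.fit(X, y)
print(clf.predict(X[:3]))
```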

XGB Classification

XGBoost is an optimized distributed gradient-boosting library designed to be highly efficient, flexible, and portable. It implements machine-learning algorithms under the gradient-boosting framework. It provides parallel tree boosting to solve many data science problems quickly and accurately. It uses L1 and L2 regularization and is fast to train.

Each model provides fit (to train the model), predict (to predict on new data), get metrics (to obtain the model's accuracy and other metrics), and feature_importances (the importance of each input feature for the prediction).

The basic working principles of AdaBoost, CatBoost, Decision Tree, Gradient Boosting (GB), LGBM, Random Forest, SVM, and XGB are almost identical for both regression and classification.

Hyper Parameters:

Parameter Description Data Type Possible Values Default Values
booster Decides which booster to use. string {‘gbtree', 'gblinear', 'dart' } ’gbtree’
learning_rate Step size shrinkage used in the update to prevent overfitting. After each boosting step, we can directly get the weights of new features, and eta shrinks the feature weights to make the boosting process more conservative. float [0,1] 0.1
n_estimators (number of estimators) Number of trees to fit. int [1, 500] 100
objective The learning objective to optimize; the default, "binary:logistic", performs logistic regression for binary classification. string See the list below the table. "binary:logistic"
subsample The fraction of the training instances sampled for each tree. float (0,1] 1
max_depth Maximum depth of a tree. int (0, +Inf) 3
max_delta_step If the value is set to 0, it means there is no constraint. If it is set to a positive value, it can help making the update step more conservative. Usually this parameter is not needed, but it might help in logistic regression when class is extremely imbalanced. int or float [0, +Inf) 0
colsample_bytree (column sample by tree) The fraction of columns randomly sampled when constructing each tree. float (0.0, 1.0] 1.0
colsample_bylevel (column sample by level) It is the subsample ratio of columns for each level. Subsampling occurs once for every new depth level reached in a tree. Columns are subsampled from the set of columns chosen for the current tree. float (0.0, 1.0] 1.0
min_child_weight Minimum sum of instance weight (Hessian) needed in a child. int [0, +Inf) 1
reg_alpha (alpha) L1 regularization term on weights. float [0.0, +Inf) 0.0
reg_lambda (lambda) L2 regularization term on weights. float [0.0, +Inf) 0.0
scale_pos_weight (scale positive weight) Control the balance of positive and negative weights, useful for unbalanced classes. int [0, +Inf) 1

Possible values for the “objective” parameter:

{binary:logistic, binary:logitraw, binary:hinge, multi:softmax, multi:softprob}
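
For reference, a minimal sketch assuming the xgboost Python package's XGBClassifier with the default "binary:logistic" objective on a toy binary dataset; values are illustrative only:

```python
# Minimal sketch (assumption: xgboost package); not QuickML pipeline code.
from xgboost import XGBClassifier

X = [[1, 4], [2, 5], [3, 6], [6, 1], [7, 2], [8, 3]]
y = [0, 0, 0, 1, 1, 1]
clf = XGBClassifier(booster="gbtree", learning_rate=0.1, n_estimators=50,
                    max_depth=3, objective="binary:logistic")
clf.fit(X, y)
print(clf.predict([[5, 5]]))
print(clf.feature_importances_)  # importance of each input feature
```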

Last Updated 2023-06-15 17:14:14 +0530