ML Algorithms in QuickML
ML algorithms are programs that can learn from data and improve from experience, without any external intervention. The following algorithms and operations are all available in QuickML as stages that can be configured in one or more pipeline executions.
The most widely used algorithms in the data science domain are:
 Classification algorithms
 Regression algorithms
Classification Algorithms
Classification is the task of predicting a discrete class label. QuickML features several classification algorithms, including:
AdaBoost Classification
AdaBoost begins by fitting a classifier on the original dataset, then fits additional copies of the classifier on the same dataset. After each round, the weights of incorrectly classified instances are adjusted according to the error of the current prediction, so subsequent classifiers focus more on difficult cases. By default, AdaBoost builds a series of small, one-level decision trees (decision stumps), adapting each tree to predict the difficult cases missed by the previous trees, and combines all trees into a single model.
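The reweighting loop described above can be sketched in plain Python. This is a minimal illustration of classic two-class discrete AdaBoost with one-level stumps on a one-dimensional toy dataset. The function names, the toy data, and the way `learning_rate` scales each stump's vote are assumptions made for illustration; this is not QuickML's implementation.

```python
import math

def stump_predict(x, threshold, polarity):
    """One-level tree: predict `polarity` at or below the threshold."""
    return polarity if x <= threshold else -polarity

def fit_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best stump."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            error = sum(
                w for x, y, w in zip(xs, ys, weights)
                if stump_predict(x, threshold, polarity) != y
            )
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best

def adaboost_fit(xs, ys, n_estimators=3, learning_rate=1.0):
    """Labels must be +1 or -1. Returns a list of (alpha, threshold, polarity)."""
    weights = [1.0 / len(xs)] * len(xs)
    ensemble = []
    for _ in range(n_estimators):
        error, threshold, polarity = fit_stump(xs, ys, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = learning_rate * 0.5 * math.log((1 - error) / error)
        ensemble.append((alpha, threshold, polarity))
        if error <= 1e-10:
            break  # perfect fit: stop early
        # Raise the weight of misclassified samples and lower the rest.
        weights = [
            w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
            for x, y, w in zip(xs, ys, weights)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def adaboost_predict(ensemble, x):
    """Weighted vote of all stumps."""
    score = sum(a * stump_predict(x, t, p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1

# Toy data that no single stump can separate.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
model = adaboost_fit(xs, ys, n_estimators=3)
print([adaboost_predict(model, x) for x in xs])
```

On this toy dataset a single stump misclassifies some points, while the weighted vote of three stumps reproduces every training label.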
Hyper Parameters:
Parameter  Description  Data Type  Possible Values  Default Values 

base_estimator  The base estimator from which the boosted ensemble is built. If none, then the base estimator is DecisionTreeClassifier initialized with max_depth=1.  object  Any classification model except KNN Classification model  None 
n_estimators (number of estimators)  The maximum number of estimators at which boosting is terminated. In case of perfect fit, the learning procedure is stopped early.  int  [1, 500]  50 
learning_rate  Weight applied to each classifier at each boosting iteration. A higher learning rate increases the contribution of each classifier.  float  (0.0, +Inf)  1.0 
algorithm  If ‘SAMME.R’ then use the SAMME.R real boosting algorithm. base_estimator must support calculation of class probabilities. If ‘SAMME’ then use the SAMME discrete boosting algorithm. The SAMME.R algorithm typically converges faster than SAMME, achieving a lower test error with fewer boosting iterations.  string  {‘SAMME’, ‘SAMME.R’}  ’SAMME.R’ 
CatBoost Classification
CatBoost is based on gradient-boosted decision trees. During training, a set of decision trees is built consecutively, and each successive tree is built to reduce the loss left by the previous trees. The number of trees is controlled by the starting parameters.
CatBoost models typically take less time to generate predictions than the other classifiers.
Hyper Parameters:
Parameter  Description  Data Type  Possible Values  Default Values 

learning_rate  Used for reducing the gradient step.  float  (0,1]  0.03 
l2_leaf_reg (l2_leaf_regularization)  Coefficient at the L2 regularization term of the cost function.  float  [0,+inf)  3.0 
rsm (random subspace method)  The percentage of features to use at each split selection, when features are selected over again at random  float  (0,1]  None 
loss_function  The metric to use in training. The specified value also determines the machine learning problem to solve. Some metrics support optional parameters.  string  {'Logloss', 'CrossEntropy', 'MultiClass', 'MultiClassOneVsAll'}  'MultiClass' 
nan_mode  The method for processing missing values in the input dataset.  string  {'Forbidden', 'Min', 'Max'}  Min 
leaf_estimation_method  The method used to calculate the values in leaves.  string  {"Newton", "Gradient"}  None 
score_function  The score type used to select the next split during the tree construction.  string  {L2, Cosine}  Cosine 
max_depth  Maximum depth of the tree.  int  [1,+Inf)  None 
n_estimators (number of estimators)  The maximum number of trees that can be built when solving machine learning problems. When using other parameters that limit the number of iterations, the final number of trees may be less than the number specified in this parameter.  int  [1, 500]  None 
DecisionTree Classification
Decision tree builds classification or regression models in the form of a tree structure. It breaks down a dataset into smaller and smaller subsets, while at the same time an associated decision tree is incrementally developed.
Decision trees can handle both categorical and numerical data. When predicting the output value for a set of features, the tree predicts the output based on the subset (leaf) that the set of features falls into.
Hyper Parameters:
Parameter  Description  Data Type  Possible Values  Default Values 

criterion  The function to measure the quality of a split.  string  {“gini”, “entropy”}  ”gini” 
splitter  The strategy used to choose the split at each node.  string  {“best”, “random”}  ”best” 
max_depth  The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.  int  (0, +Inf)  None 
min_samples_split  The minimum number of samples required to split an internal node.  int or float  [2, +Inf) or (0, 1.0]  2 
min_samples_leaf  The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches.  int or float  [1, +Inf) or (0, 0.5]  1 
min_weight_fraction_leaf  The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node.  float  [0, 0.5]  0 
max_features  The number of features to consider when looking for the best split  int, float or string  (0, n_features] or { “sqrt”, “log2”}  None 
max_leaf_nodes  Grow a tree with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity. If None, then the number of leaf nodes is unlimited.  int  (1, +Inf)  None 
min_impurity_decrease  A node will be split if this split induces a decrease of the impurity greater than or equal to this value.  float  [0, +Inf)  0.0 
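The two `criterion` options and the idea behind `min_impurity_decrease` can be sketched as follows. This is a simplified illustration in plain Python: the function names and toy labels are hypothetical, and the decrease is computed for a single node without the node-size weighting a full implementation may apply.

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy of the class proportions, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def impurity_decrease(parent, left, right, measure=gini):
    """Parent impurity minus the size-weighted impurity of the children."""
    n = len(parent)
    children = (len(left) / n) * measure(left) + (len(right) / n) * measure(right)
    return measure(parent) - children

parent = ["yes"] * 4 + ["no"] * 4
left = ["yes", "yes", "yes", "no"]
right = ["yes", "no", "no", "no"]

print(gini(parent), entropy(parent))
print(impurity_decrease(parent, left, right))
```

A split is only worth making when this decrease is at least the configured threshold; a value of 0.0 accepts any split that does not make the children more impure than the parent.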
GB Classification
Gradient-boosting classification calculates the difference between the current prediction and the known correct target value. This difference is called the residual. The gradient-boosting classifier then trains a weak model (a decision tree) that maps features to that residual. The residual predicted by the weak model is added to the existing model's prediction, which nudges the model towards the correct target. Repeating this step multiple times improves the overall model prediction.
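The residual-fitting loop described above can be sketched in plain Python. To keep the example short, this sketch uses squared error on 0/1 targets and one-split trees, which is a simplification: a real gradient-boosting classifier fits the gradient of a classification loss such as deviance. All names and the toy data are hypothetical.

```python
def fit_stump(xs, residuals):
    """Return (threshold, left_value, right_value) with the lowest squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lv, rv)
    return best[1:]

def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)

def boost(xs, ys, n_estimators=50, learning_rate=0.1):
    """Return (base, stumps, history); history[i] is the training MSE after i rounds."""
    base = sum(ys) / len(ys)          # initial prediction: the mean target
    preds = [base] * len(xs)
    stumps, history = [], [mse(ys, preds)]
    for _ in range(n_estimators):
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lv, rv = fit_stump(xs, residuals)       # weak model fitted to residuals
        stumps.append((t, lv, rv))
        preds = [p + learning_rate * (lv if x <= t else rv)
                 for x, p in zip(xs, preds)]       # nudge towards the target
        history.append(mse(ys, preds))
    return base, stumps, history

def predict(base, stumps, x, learning_rate=0.1):
    return base + learning_rate * sum(lv if x <= t else rv for t, lv, rv in stumps)

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
base, stumps, history = boost(xs, ys)
print(history[0], history[5], history[-1])
```

The training error shrinks as rounds are added, which illustrates the trade-off between `learning_rate` and `n_estimators`: a smaller learning rate needs more rounds to reach the same error.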
Hyper Parameters:
Parameter  Description  Data Type  Possible Values  Default Values 

loss  The loss function to be optimized. ‘deviance’ refers to deviance (= logistic regression) for classification with probabilistic outputs.  string  {'deviance', 'exponential'}  'deviance' 
learning_rate  Learning rate shrinks the contribution of each tree by learning_rate. There is a tradeoff between learning_rate and n_estimators.  float  (0.0, +Inf)  0.1 
n_estimators (number of estimators)  The number of boosting stages to perform.  int  [1, 500]  100 
criterion  The function to measure the quality of a split.  string  {'friedman_mse', 'mse', 'mae'}  ’friedman_mse’ 
subsample  The fraction of samples to be used for fitting the individual base learners.  float  (0.0, 1.0]  1.0 
max_depth  The maximum depth of the individual regression estimators.  int  (0, +Inf)  None 
min_samples_split  The minimum number of samples required to split an internal node  int or float  [2, +Inf) or (0, 1.0]  2 
min_samples_leaf  The minimum number of samples required to be at a leaf node.  int or float  [1, +Inf) or (0, 0.5]  1 
min_weight_fraction_leaf  The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node.  float  [0, 0.5]  0 
max_features  The number of features to consider when looking for the best split  int, float or string  (0, n_features] or { “sqrt”, “log2”}  None 
max_leaf_nodes  Grow trees with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity.  int  (1, +Inf)  None 
min_impurity_decrease  A node will be split if this split induces a decrease of the impurity greater than or equal to this value.  float  [0, +Inf)  0.0 
init  An estimator object that is used to compute the initial predictions.  object or string  estimator (Any classification model except SVM classification and catboost) or ‘zero’  None 
warm_start  When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just erase the previous solution.  bool  True or False  False 
tol (tolerance)  Tolerance for the early stopping. When the loss is not improving by at least tol for n_iter_no_change iterations (if set to a number), the training stops.  float  [0.0, +Inf)  1e-4 
KNN Classification
KNN works by finding the distances between a query (data instance) and all the examples in the data, selecting the specified number of examples (K) closest to the query, then voting for the most frequent label in the neighbourhood.
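The distance, selection, and voting steps above can be sketched in a few lines of plain Python. The function names and toy data are hypothetical; the `p` argument mirrors the power parameter of the Minkowski metric, where `p = 2` gives Euclidean distance and `p = 1` gives Manhattan distance.

```python
from collections import Counter

def minkowski(a, b, p=2):
    """Minkowski distance between two points of equal dimension."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def knn_predict(train_points, train_labels, query, n_neighbors=3, p=2):
    """Vote among the labels of the n_neighbors closest training points."""
    distances = sorted(
        (minkowski(point, query, p), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest = [label for _, label in distances[:n_neighbors]]
    return Counter(nearest).most_common(1)[0][0]

points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (2, 2)))
print(knn_predict(points, labels, (6, 5)))
```

This brute-force version compares the query with every training record; tree-based options such as ‘ball_tree’ and ‘kd_tree’ exist to speed up that search on larger datasets.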
Hyper Parameters:
Parameter  Description  Data Type  Possible Values  Default Values 

n_neighbors (number of neighbours)  Number of neighbors to use by default for kneighbors queries.  int  [1, n], where n = total number of records in the dataset  5 
weights  Weight function used in prediction.  string  {‘uniform’, ‘distance’}  ’uniform’ 
algorithm  Algorithm used to compute the nearest neighbors.  string  {‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’}  ’auto’ 
leaf_size  Leaf size passed to BallTree or KDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree.  int  (1, +Inf)  30 
p  Power parameter for the Minkowski metric.  int  [1,3]  2 
metric  Metric to use for distance computation. Default is “minkowski”, which results in the standard Euclidean distance when p = 2.  string  {‘cityblock’, ‘cosine’, 'euclidean', 'l1', 'l2', 'manhattan', 'nan_euclidean', ’minkowski’}  ’minkowski’ 
LGBM Classification
LGBM starts with an initial estimate and updates it using the output of each tree. The learning_rate parameter controls the magnitude of each update. It can be used on any data and provides a high degree of accuracy, as it contains many built-in preprocessing steps.
The LightGBM algorithm grows trees vertically (leaf-wise), while other algorithms grow trees level-wise. LightGBM chooses the leaf with the largest delta loss to grow. When growing the same number of leaves, a leaf-wise algorithm can reduce more loss than a level-wise algorithm.
Hyper Parameters:
Parameter  Description  Data Type  Possible Values  Default Values 

boosting_type  Method of boosting.  string  {‘gbdt’, ‘dart’, ‘goss’ }  'gbdt' 
num_leaves  Maximum tree leaves for base learners.  int  (1, +Inf)  31 
max_depth  Maximum tree depth for base learners; a value <= 0 means no limit.  int  (-Inf, +Inf)  -1 
learning_rate  Boosting learning rate. You can use callbacks parameter of fit method to shrink/adapt learning rate in training using reset_parameter callback.  float  (0.0, +Inf)  0.1 
n_estimators (number of estimators)  Number of boosted trees to fit.  int  [1, 500]  100 
subsample_for_bin  Number of samples for constructing bins.  int  (0, +Inf)  200000 
min_split_gain  Minimum loss reduction required to make a further partition on a leaf node of the tree.  float  [0.0, +Inf)  0.0 
min_child_weight  Minimum sum of instance weight (Hessian) needed in a child (leaf).  float  [0.0, +Inf)  1e-3 
min_child_samples  Minimum number of data needed in a child (leaf).  int  [0, +Inf)  20 
subsample  Subsample ratio of the training instance.  float  (0.0, 1.0]  1.0 
subsample_freq (subsample_frequency)  Frequency of subsampling; a value <= 0 disables it.  int  (-Inf, +Inf)  0 
colsample_bytree (column sample by tree)  Subsample ratio of columns when constructing each tree.  float  (0.0, 1.0]  1.0 
reg_alpha (alpha)  L1 regularization term on weights.  float  [0.0, +Inf)  0.0 
reg_lambda (lambda)  L2 regularization term on weights.  float  [0.0, +Inf)  0.0 
importance_type  The type of feature importance to be filled into feature_importances_. If ‘split’, the result contains the number of times the feature is used in a model. If ‘gain’, the result contains the total gains of splits which use the feature.  string  { ‘gain’, 'split'}  'split' 
Logistic Regression
Logistic regression can be used when the target is a binary value. It maps the model output to a value between 0 and 1, which can be read as the probability of the positive class.
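The mapping to a value between 0 and 1 is done by the logistic (sigmoid) function applied to a weighted sum of the features. The sketch below shows that step in plain Python; the weights, intercept, and 0.5 threshold are made-up illustration values, not learned parameters.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large negative z."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(features, weights, intercept=0.0):
    """Probability of the positive class for one record."""
    z = intercept + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

def predict(features, weights, intercept=0.0, threshold=0.5):
    """Binary label obtained by thresholding the probability."""
    return 1 if predict_proba(features, weights, intercept) >= threshold else 0

weights, intercept = [1.5], -1.0   # hypothetical fitted parameters
print(predict_proba([2.0], weights, intercept), predict([2.0], weights, intercept))
print(predict_proba([0.0], weights, intercept), predict([0.0], weights, intercept))
```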
Hyper Parameters:
Parameter  Description  Data Type  Possible Values  Default Values 

penalty  Specify the norm of the penalty. The supported penalties depend on the solver.  string  { ‘l1’, ‘l2’, ‘elasticnet’, ‘none’}  ’l2’ 
dual  Dual or primal formulation. Dual formulation is only implemented for l2 penalty with liblinear solver.  bool  True or False  False 
tol (tolerance)  Tolerance for stopping criteria.  float  [0.0, +Inf)  1e-4 
C  Inverse of regularization strength; must be a positive float.  float  (0.0, +Inf)  1.0 
solver  Algorithm to use in the optimization problem.  string  { ‘newton-cg’, ‘lbfgs’, ‘liblinear’, ‘sag’, ‘saga’ }  ’lbfgs’ 
fit_intercept  Specifies if a constant (a.k.a. bias or intercept) should be added to the decision function.  bool  True or False  True 
l1_ratio  The ElasticNet mixing parameter, with 0 <= l1_ratio <= 1. Only used if penalty='elasticnet'.  float  [0, 1]  None 
multi_class  If the option chosen is ‘ovr’, then a binary problem is fit for each label. For ‘multinomial’ the loss minimised is the multinomial loss fit across the entire probability distribution, even when the data is binary.  string  {'auto', 'ovr', 'multinomial'}  'auto' 
intercept_scaling  Useful only when the solver ‘liblinear’ is used and fit_intercept is set to True. The intercept becomes intercept_scaling * synthetic_feature_weight.  float  (0, +Inf)  1.0 
Penalties supported by each solver:
Solver  Supported Penalties 
‘newton-cg’  [‘l2’, ‘none’]
‘lbfgs’  [‘l2’, ‘none’]
‘liblinear’  [‘l1’, ‘l2’]
‘sag’  [‘l2’, ‘none’]
‘saga’  [‘elasticnet’, ‘l1’, ‘l2’, ‘none’]
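The solver and penalty combinations listed above can be checked before a stage is configured. The helper below is a hypothetical sketch, not a QuickML function: it encodes the combinations from the list, the documented range of l1_ratio, and one extra assumption, namely that ‘elasticnet’ requires an l1_ratio to be set.

```python
# Solver -> penalties it supports, copied from the list above.
SUPPORTED_PENALTIES = {
    "newton-cg": {"l2", "none"},
    "lbfgs": {"l2", "none"},
    "liblinear": {"l1", "l2"},
    "sag": {"l2", "none"},
    "saga": {"elasticnet", "l1", "l2", "none"},
}

def validate_logistic_config(solver, penalty, l1_ratio=None):
    """Return a list of problems; an empty list means the combination is valid."""
    problems = []
    if solver not in SUPPORTED_PENALTIES:
        problems.append(f"unknown solver: {solver!r}")
    elif penalty not in SUPPORTED_PENALTIES[solver]:
        problems.append(f"solver {solver!r} does not support penalty {penalty!r}")
    # Assumption: an elasticnet penalty is meaningless without a mixing ratio.
    if penalty == "elasticnet" and l1_ratio is None:
        problems.append("penalty 'elasticnet' needs an l1_ratio in [0, 1]")
    if l1_ratio is not None and not 0 <= l1_ratio <= 1:
        problems.append("l1_ratio must satisfy 0 <= l1_ratio <= 1")
    return problems

print(validate_logistic_config("lbfgs", "l2"))
print(validate_logistic_config("liblinear", "none"))
print(validate_logistic_config("saga", "elasticnet", l1_ratio=0.5))
```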
Naive Bayes Classification
Naive Bayes is a classifier based on Bayes' theorem. It predicts membership probabilities for each class, such as the probability that a given record or data point belongs to a particular class. The class with the highest probability is considered the most likely class.
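The membership probabilities described above can be illustrated with a small categorical Naive Bayes in plain Python. This is a sketch under stated assumptions: the features are categorical, add-one (Laplace) smoothing is used for unseen values, and the function names and toy records are hypothetical.

```python
from collections import Counter, defaultdict

def train(records, labels):
    """Count classes, per-class feature values, and distinct values per feature."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)   # (label, feature index) -> value counts
    value_sets = defaultdict(set)           # feature index -> distinct values
    for record, label in zip(records, labels):
        for i, value in enumerate(record):
            feature_counts[(label, i)][value] += 1
            value_sets[i].add(value)
    return class_counts, feature_counts, value_sets

def predict_proba(model, record, alpha=1.0):
    """Posterior probability of each class via Bayes' theorem."""
    class_counts, feature_counts, value_sets = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total                       # prior P(class)
        for i, value in enumerate(record):          # times each P(value | class)
            seen = feature_counts[(label, i)][value]
            score *= (seen + alpha) / (count + alpha * len(value_sets[i]))
        scores[label] = score
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}

records = [("sunny", "weak"), ("sunny", "weak"), ("sunny", "strong"),
           ("rainy", "strong"), ("rainy", "strong"), ("rainy", "weak")]
labels = ["yes", "yes", "yes", "no", "no", "no"]

model = train(records, labels)
probs = predict_proba(model, ("sunny", "weak"))
print(probs, max(probs, key=probs.get))
```

The predicted class is the one with the highest posterior probability, matching the rule stated above.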
RandomForest Classification
The random forest is a classification algorithm consisting of many decision trees. It uses bagging and feature randomness when building individual trees to try to create an uncorrelated forest of trees whose prediction by committee is more accurate than that of any individual tree.
Bagging is an ensemble meta-estimator that fits base classifiers or regressors on random subsets of the original dataset, then aggregates their individual predictions (either by voting or by averaging) to form a final prediction.
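The sampling and aggregation steps of bagging can be sketched as follows. The sketch leaves out the trees themselves and only shows how bootstrap samples are drawn, which rows end up out of bag, and how predictions are combined; all names are hypothetical and the fixed seed is only there to make the run repeatable.

```python
import random
from collections import Counter

def bootstrap_indices(n_rows, rng):
    """Draw n_rows row indices with replacement (one bootstrap sample)."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]

def out_of_bag(n_rows, indices):
    """Rows never drawn into the sample; usable as a held-out check."""
    drawn = set(indices)
    return [i for i in range(n_rows) if i not in drawn]

def aggregate_by_voting(predictions):
    """Classification: the most frequent label wins."""
    return Counter(predictions).most_common(1)[0][0]

def aggregate_by_averaging(predictions):
    """Regression: the mean of the individual predictions."""
    return sum(predictions) / len(predictions)

rng = random.Random(0)
sample = bootstrap_indices(10, rng)
print(sample, out_of_bag(10, sample))
print(aggregate_by_voting(["spam", "ham", "spam"]))
print(aggregate_by_averaging([2.0, 4.0, 6.0]))
```

Each tree in a forest would be trained on its own bootstrap sample, and the rows left out of that sample are the ones an out-of-bag score is computed on.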
Hyper Parameters:
Parameter  Description  Data Type  Possible Values  Default Values 

n_estimators (number of estimators)  The number of trees in the forest.  int  [1, 500]  100 
criterion  The function to measure the quality of a split.  string  {“gini”, “entropy”}  ”gini” 
max_depth  The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.  int  (0, +Inf)  None 
min_samples_split  The minimum number of samples required to split an internal node  int or float  [2, +Inf) or (0, 1.0]  2 
min_samples_leaf  The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches.  int or float  [1, +Inf) or (0, 0.5]  1 
min_weight_fraction_leaf  The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node.  float  [0, 0.5]  0.0 
max_features  The number of features to consider when looking for the best split  int, float or string  (0, n_features] or { “sqrt”}  None 
max_leaf_nodes  Grow trees with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity.  int  (1, +Inf)  None 
min_impurity_decrease  A node will be split if this split induces a decrease of the impurity greater than or equal to this value.  float  [0, +Inf)  0.0 
bootstrap  Whether bootstrap samples are used when building trees. If False, the whole dataset is used to build each tree.  bool  True or False  True 
oob_score (out-of-bag score)  Whether to use out-of-bag samples to estimate the generalization score. Only available if bootstrap=True.  bool  True or False  False 
warm_start  When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new forest.  bool  True or False  False 
SVM Classification
SVM, or Support Vector Machine, is a linear model for classification and regression problems. It can solve linear and nonlinear problems and works well for many practical problems. The idea of SVM is simple: the algorithm creates a line or a hyperplane that separates the data into classes.
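The separating hyperplane can be pictured as a weighted sum of the features plus a bias, where the sign of the result selects the class. The sketch below shows that decision rule together with the RBF kernel that lets an SVM handle nonlinear problems. The weights, bias, and gamma value are made-up illustration values rather than a trained model.

```python
import math

def linear_decision(weights, bias, point):
    """Signed score of a point relative to the hyperplane w.x + b = 0."""
    return sum(w * x for w, x in zip(weights, point)) + bias

def classify(weights, bias, point):
    """Class +1 on one side of the hyperplane, -1 on the other."""
    return 1 if linear_decision(weights, bias, point) >= 0 else -1

def rbf_kernel(a, b, gamma=0.5):
    """RBF similarity exp(-gamma * squared distance); 1.0 for identical points."""
    squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * squared_distance)

weights, bias = [1.0, -1.0], 0.0   # hypothetical hyperplane: x1 - x2 = 0
print(classify(weights, bias, (3.0, 1.0)), classify(weights, bias, (1.0, 3.0)))
print(rbf_kernel((1.0, 2.0), (1.0, 2.0)), rbf_kernel((0.0, 0.0), (3.0, 4.0)))
```

A larger gamma makes the RBF similarity fall off faster with distance, so each training point influences a smaller region.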
Hyper Parameters:
Parameter  Description  Data Type  Possible Values  Default Values 

C  Regularization parameter. The strength of the regularization is inversely proportional to C.  float  [0.0, +Inf)  1.0 
kernel  Specifies the kernel type to be used in the algorithm. If none is given, ‘rbf’ will be used.  string  {‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’}  ’rbf’ 
degree  Degree of the polynomial kernel function (‘poly’). Ignored by all other kernels.  int  [0, +Inf)  3 
gamma  Kernel coefficient for ‘rbf’, ‘poly’ and ‘sigmoid’.  string or float  {‘scale’, ‘auto’} or (0.0, +Inf)  ’scale’ 
coef0  Independent term in kernel function. It is only significant in ‘poly’ and ‘sigmoid’.  float  (-Inf, +Inf)  0.0 
shrinking  Whether to use the shrinking heuristic.  bool  True or False  True 
probability  Whether to enable probability estimates.  bool  True or False  False 
tol (tolerance)  Tolerance for stopping criterion.  float  [0.0, +Inf)  1e-3 
decision_function_shape  Whether to return a one-vs-rest (‘ovr’) decision function of shape (n_samples, n_classes) as all other classifiers, or the original one-vs-one (‘ovo’) decision function of libsvm which has shape (n_samples, n_classes * (n_classes - 1) / 2).  string  {‘ovo’, ‘ovr’}  ’ovr’ 
break_ties  If true, decision_function_shape='ovr', and number of classes > 2, predict will break ties according to the confidence values of decision_function; otherwise the first class among the tied classes is returned.  bool  True or False  False 
XGB Classification
XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible, and portable. It implements machine-learning algorithms under the gradient boosting framework and provides parallel tree boosting to solve many data science problems quickly and accurately. It uses L1 and L2 regularization to reduce overfitting, and it is fast to train.
Each model provides fit (to train the model), predict (to predict on new data), get metrics (to get the model’s accuracy and other metrics), and feature_importances (the importance of each input feature for the prediction).
The basic working principles of AdaBoost, CatBoost, Decision Tree, Gradient Boosting (GB), LGBM, RandomForest, SVM, and XGB are almost identical for both regression and classification.
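The effect of the L1 and L2 terms can be illustrated with the closed-form leaf weight used in second-order gradient boosting, where a leaf's output is computed from the summed gradients and Hessians of the samples it contains. This is a sketch under that assumption, with hypothetical function names and input values; it is not taken from QuickML.

```python
def soft_threshold(grad_sum, reg_alpha):
    """L1 shrinkage: pull the gradient sum towards zero by reg_alpha."""
    if grad_sum > reg_alpha:
        return grad_sum - reg_alpha
    if grad_sum < -reg_alpha:
        return grad_sum + reg_alpha
    return 0.0

def leaf_weight(grad_sum, hess_sum, reg_alpha=0.0, reg_lambda=0.0):
    """Leaf output: -T(G) / (H + lambda), where T is the L1 soft threshold."""
    return -soft_threshold(grad_sum, reg_alpha) / (hess_sum + reg_lambda)

G, H = -4.0, 2.0   # hypothetical sums of gradients and Hessians in one leaf
print(leaf_weight(G, H))                    # no regularization
print(leaf_weight(G, H, reg_lambda=2.0))    # L2 shrinks the weight
print(leaf_weight(G, H, reg_alpha=1.0))     # L1 shrinks the gradient sum
print(leaf_weight(G, H, reg_alpha=5.0))     # strong L1 zeroes the leaf
```

Both terms make each leaf's contribution smaller, which is why larger reg_alpha and reg_lambda values give a more conservative model.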
Hyper Parameters:
Parameter  Description  Data Type  Possible Values  Default Values 

booster  Decides which booster to use.  string  {‘gbtree', 'gblinear', 'dart' }  ’gbtree’ 
learning_rate  Step size shrinkage used in updates to prevent overfitting. After each boosting step, we can directly get the weights of new features, and the learning rate (eta) shrinks the feature weights to make the boosting process more conservative.  float  [0,1]  0.1 
n_estimators (number of estimators)  Number of trees to fit.  int  [1, 500]  100 
objective  The learning objective. The default is logistic regression for binary classification.  string  Listed below the table.  "binary:logistic" 
subsample  Fraction of the training samples used to fit each tree.  float  (0,1]  1 
max_depth  Maximum depth of a tree.  int  (0, +Inf)  3 
max_delta_step  If the value is set to 0, it means there is no constraint. If it is set to a positive value, it can help making the update step more conservative. Usually this parameter is not needed, but it might help in logistic regression when class is extremely imbalanced.  int or float  [0, +Inf)  0 
colsample_bytree (column sample by tree)  Fraction of columns randomly sampled when constructing each tree.  float  (0.0, 1.0]  1.0 
colsample_bylevel (column sample by level)  It is the subsample ratio of columns for each level. Subsampling occurs once for every new depth level reached in a tree. Columns are subsampled from the set of columns chosen for the current tree.  float  (0.0, 1.0]  1.0 
min_child_weight  Minimum sum of instance weight (Hessian) needed in a child.  int  [0, +Inf)  1 
reg_alpha (alpha)  L1 regularization term on weights.  float  [0.0, +Inf)  0.0 
reg_lambda (lambda)  L2 regularization term on weights.  float  [0.0, +Inf)  0.0 
scale_pos_weight (scale positive weight)  Control the balance of positive and negative weights, useful for unbalanced classes.  int  [0, +Inf)  1 
Possible values for the “objective” parameter:
{binary:logistic, binary:logitraw, binary:hinge, multi:softmax, multi:softprob}
Last Updated 2023-06-15 17:14:14 +0530