Regression algorithms

Regression is the task of predicting a continuous quantity. QuickML features multiple regression algorithms, including:

AdaBoost Regression

This regression begins by fitting a regressor on the original dataset, followed by fitting additional copies of the regressor on the same dataset. The weights of these instances are adjusted according to the error of the current prediction. As such, subsequent regressors focus more on difficult cases.

AdaBoost is a machine-learning algorithm that builds a series of small, shallow decision trees, adapting each tree to predict the difficult cases missed by the previous trees and combining all trees into a single model.

Boosting in machine learning is a way of combining multiple simple models into a single composite model. This is also why boosting is known as an additive model, since simple models (also known as weak learners) are added one at a time, while keeping existing trees in the model unchanged. As we combine more and more simple models, the complete final model becomes a stronger predictor.

Hyper Parameters:

Parameter Description Data Type Possible Values Default Values
base_estimator The base estimator from which the boosted ensemble is built. If None, then the base estimator is DecisionTreeRegressor initialized with max_depth=3. object Any regression model None
n_estimators (number of estimators) The maximum number of estimators at which boosting is terminated. In case of perfect fit, the learning procedure is stopped early. int [1, 500] 50
learning_rate Weight applied to each regressor at each boosting iteration. A higher learning rate increases the contribution of each regressor. float (0.0, +Inf) 1.0
loss The loss function to use when updating the weights after each boosting iteration. string {‘linear’, ‘square’, ‘exponential’} "linear"
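
For illustration only (this is not QuickML's own API), the sketch below uses scikit-learn's AdaBoostRegressor, which exposes the same hyperparameter names as the table above; the synthetic dataset is assumed purely for demonstration.

```python
# Minimal sketch, assuming scikit-learn is available; QuickML's interface may differ.
from sklearn.ensemble import AdaBoostRegressor
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)

# n_estimators, learning_rate and loss match the table above; the default base
# estimator is a DecisionTreeRegressor with max_depth=3.
model = AdaBoostRegressor(n_estimators=50, learning_rate=1.0, loss="linear")
model.fit(X, y)
print(model.predict(X[:3]))
```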

CatBoost Regression

CatBoost is based on gradient boosted decision trees. During training, a set of decision trees is built consecutively. Each successive tree is built with reduced loss compared to the previous trees. The number of trees is controlled by the starting parameters.

Its prediction time is much lower than that of comparable boosting algorithms.

Hyper Parameters:

Parameter Description Data Type Possible Values Default Values
learning_rate The learning rate used for training. float (0,1] 0.03
l2_leaf_reg (l2_leaf_regularization) Coefficient at the L2 regularization term of the cost function. float [0,+Inf) 3.0
rsm (random subspace method) The percentage of features to use at each split selection, when features are selected over again at random. float (0,1] None
loss_function The metric to use in training. The specified value also determines the machine learning problem to solve. Some metrics support optional parameters. string {'RMSE', 'MAE', 'Quantile:alpha=value', 'LogLinQuantile:alpha=value', 'Poisson', 'MAPE', 'Lq:q=value', 'SurvivalAft:dist=value;scale=value'} Note: range of value = [0, 1] 'RMSE'
nan_mode The method for processing missing values in the input dataset. string {'Forbidden', 'Min', 'Max'} Min
leaf_estimation_method The method used to calculate the values in leaves. string {"Newton", "Gradient"} None
score_function The score type used to select the next split during the tree construction. string {L2, Cosine} Cosine
max_depth Maximum depth of the tree. int [1,+Inf) None
n_estimators (number of estimators) The maximum number of trees that can be built when solving machine learning problems. When using other parameters that limit the number of iterations, the final number of trees may be less than the number specified in this parameter. int [1, 500] None
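
As a hedged illustration, the following sketch uses the open-source catboost package, whose CatBoostRegressor exposes the same hyperparameter names; QuickML may wrap this differently, and the dataset is assumed for demonstration.

```python
# Minimal sketch, assuming the catboost package is installed.
from catboost import CatBoostRegressor
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)

model = CatBoostRegressor(
    learning_rate=0.03,
    l2_leaf_reg=3.0,
    loss_function="RMSE",
    n_estimators=200,
    max_depth=6,      # alias for CatBoost's depth parameter
    verbose=False,    # suppress per-iteration training output
)
model.fit(X, y)
print(model.predict(X[:3]))
```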

Decision-Tree Regression

A decision tree builds classification or regression models in the form of a tree structure. It breaks a dataset down into smaller and smaller subsets while an associated decision tree is incrementally developed. Decision trees can handle both categorical and numerical data. To predict the output for a set of feature values, the tree returns the value associated with the subset (leaf) into which those feature values fall.

Hyper Parameters:

Parameter Description Data Type Possible Values Default Values
criterion The function to measure the quality of a split. string {"mse", "friedman_mse", "mae"} "mse"
splitter The strategy used to choose the split at each node. string {"best", "random"} "best"
max_depth The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples. int (0, +Inf) None
min_samples_split The minimum number of samples required to split an internal node int or float [2, +Inf) or (0, 1.0] 2
min_samples_leaf The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches. int or float [1, +Inf) or (0, 0.5] 1
min_weight_fraction_leaf The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. float [0, 0.5] 0
max_features The number of features to consider when looking for the best split. int, float or string (0, n_features] or {"sqrt", "log2"} None
max_leaf_nodes Grow a tree with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity. int (1, +Inf) None
min_impurity_decrease A node will be split if this split induces a decrease of the impurity greater than or equal to this value. float [0, +Inf) 0.0
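
For illustration, the sketch below uses scikit-learn's DecisionTreeRegressor, which exposes the same hyperparameters; note that newer scikit-learn releases spell the "mse"/"mae" criteria "squared_error"/"absolute_error".

```python
# Minimal sketch, assuming scikit-learn; the dataset is synthetic.
from sklearn.tree import DecisionTreeRegressor
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)

model = DecisionTreeRegressor(max_depth=4, min_samples_split=2, min_samples_leaf=1)
model.fit(X, y)
print(model.predict(X[:3]))
```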

ElasticNet Regression

Elastic net is a popular type of regularized linear regression that combines two popular penalties, specifically the L1 (Lasso Regression) and L2 (Ridge Regression) penalty functions. Elastic Net is an extension of linear regression that adds regularization penalties to the loss function during training.

Regularization is a technique to prevent the model from over-fitting by adding extra information to it. With regularization, we reduce the magnitude of the feature coefficients while keeping the same number of features.

Sometimes, lasso regression can cause a small bias (a difference between the predicted and actual value) in the model, where the prediction is too dependent upon a particular variable. In these cases, elastic net performs better by combining the regularization of both lasso and ridge regression.

Hyper Parameters:

Parameter Description Data Type Possible Values Default Values
alpha Constant that multiplies the penalty terms. float (0, +Inf) 1.0
l1_ratio The ElasticNet mixing parameter, with 0 <= l1_ratio <= 1. For l1_ratio = 0 the penalty is an L2 penalty; for l1_ratio = 1 it is an L1 penalty; for 0 < l1_ratio < 1, the penalty is a combination of L1 and L2. float [0, 1] 0.5
fit_intercept Whether the intercept should be estimated or not. bool True or False True
normalize This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. bool True or False False
tol (tolerance) The tolerance for the optimization: if the updates are smaller than tol, the optimization code checks the dual gap for optimality and continues until it is smaller than tol. float [0.0, +Inf) 1e-4
warm_start When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution. bool True or False False
positive When set to True, it forces the coefficients to be positive. bool True or False False
selection If set to ‘random’, a random coefficient is updated every iteration rather than looping over features sequentially by default. string {"cyclic", "random"} "cyclic"
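
As a hedged illustration, here is a minimal scikit-learn sketch (not QuickML's own API) showing how alpha and l1_ratio combine the L1 and L2 penalties; the tiny dataset is assumed for demonstration.

```python
# Minimal sketch, assuming scikit-learn; l1_ratio=0 -> pure L2, 1 -> pure L1.
from sklearn.linear_model import ElasticNet

X = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]
y = [0.5, 1.4, 2.6, 3.5]

model = ElasticNet(alpha=1.0, l1_ratio=0.5)
model.fit(X, y)
print(model.coef_, model.intercept_)
```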

GB Regression

Gradient-boosting regression calculates the difference between the current prediction and the known correct target value.

This difference is called the residual. After obtaining this value, gradient-boosting regression trains a weak model (a decision tree) that maps the features to that residual. The residual predicted by the weak model is added to the existing model's prediction, nudging the model towards the correct target. Repeating this step multiple times improves the overall model prediction.

Hyper Parameters:

Parameter Description Data Type Possible Values Default Values
loss Loss function to be optimized. ‘ls’ refers to least squares regression. ‘lad’ (least absolute deviation) is a highly robust loss function solely based on order information of the input variables. ‘huber’ is a combination of the two. ‘quantile’ allows quantile regression (use alpha to specify the quantile). string {'ls', 'lad', 'huber', 'quantile'} ’ls’
learning_rate Learning rate shrinks the contribution of each tree by learning_rate. float (0.0, +inf) 0.1
n_estimators (number of estimators) The number of boosting stages to perform. Gradient boosting is fairly robust to over-fitting so a large number usually results in better performance. int [1, 500) 100
criterion The function to measure the quality of a split. string {'friedman_mse', 'mse', 'mae'} ’friedman_mse’
subsample The fraction of samples to be used for fitting the individual base learners. float (0.0, 1.0] 1.0
max_depth Maximum depth of the individual regression estimators. The maximum depth limits the number of nodes in the tree. int (0, +Inf) None
min_samples_split The minimum number of samples required to split an internal node int or float [2, +Inf) or (0, 1.0] 2
min_samples_leaf The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches. int or float [1, +Inf) or (0, 0.5] 1
min_weight_fraction_leaf The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. float [0, 0.5] 0
max_features The number of features to consider when looking for the best split int, float or string (0, n_features] or { “sqrt”, “log2”} None
max_leaf_nodes Grow trees with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity. int (1, +Inf) None
min_impurity_decrease A node will be split if this split induces a decrease of the impurity greater than or equal to this value. float [0, +Inf) 0.0
init An estimator object that is used to compute the initial predictions. init has to provide fit and predict. If 'zero', the initial raw predictions are set to zero. object estimator (any regression model except CatBoost) or 'zero' None
warm_start When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just erase the previous solution. bool True or False False
tol (tolerance) Tolerance for the early stopping. When the loss is not improving by at least tol for n_iter_no_change iterations (if set to a number), the training stops. float [0.0, +Inf) 1e-4
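
For illustration, the following sketch uses scikit-learn's GradientBoostingRegressor, where each boosting stage fits a small tree to the residuals of the current ensemble; the dataset is assumed for demonstration.

```python
# Minimal sketch, assuming scikit-learn; not QuickML's own API.
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)

model = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1,
                                  max_depth=3, subsample=1.0)
model.fit(X, y)
print(model.predict(X[:3]))
```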

KNN Regression

KNN Regression works by finding the distances between a query (data instance) and all the examples in the data, selecting the specified number of examples (K) closest to the query, and then predicting the average of the target values of the observations in that neighbourhood.

In other words, it approximates the association between independent variables (input variables) and the continuous outcome (target) by averaging the observations in the same neighbourhood.

Hyper Parameters:

Parameter Description Data Type Possible Values Default Values
n_neighbors (number of neighbours) Number of neighbors to use by default for kneighbors queries. int [1, n], where n = total number of records in the dataset 5
weights Weight function used in prediction.
  • ‘uniform’: uniform weights. All points in each neighborhood are weighted equally.
  • ‘distance’: weight points by the inverse of their distance. In this case, closer neighbors of a query point will have a greater influence than neighbors which are further away.
string {‘uniform’, ‘distance’} ’uniform’
algorithm Algorithm used to compute the nearest neighbors string {‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’} ’auto’
leaf_size Leaf size passed to BallTree or KDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem. int (1, +Inf) 30
p Power parameter for the Minkowski metric. When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used. int [1,3] 2
metric Metric to use for distance computation. Default is “minkowski”, which results in the standard Euclidean distance when p = 2. str {‘cityblock’, ‘cosine’, 'euclidean', 'l1', 'l2', 'manhattan', 'nan_euclidean', ’minkowski’} ’minkowski’
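
As an illustration (not QuickML's own API), the sketch below uses scikit-learn's KNeighborsRegressor: the prediction is the (optionally distance-weighted) average of the K nearest neighbours' targets. The tiny dataset is assumed for demonstration.

```python
# Minimal sketch, assuming scikit-learn.
from sklearn.neighbors import KNeighborsRegressor

X = [[0.0], [1.0], [2.0], [3.0], [4.0]]
y = [0.1, 0.9, 2.1, 2.9, 4.2]

model = KNeighborsRegressor(n_neighbors=3, weights="distance", p=2)
model.fit(X, y)
print(model.predict([[1.5]]))
```
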
Kernel Regression

Kernel regression is a non-parametric technique for estimating the conditional expectation of a random variable: kernel values are used to derive weights that predict outputs from given inputs. The objective is to find a non-linear relation between a pair of random variables X and Y.

Hyper Parameters:

Parameter Description Data Type Possible Values Default Values
alpha Regularization strength; must be a positive float. Regularization improves the conditioning of the problem and reduces the variance of the estimates. Larger values specify stronger regularization. float [0, +Inf) 1.0
kernel Kernel mapping used internally. This parameter is directly passed to pairwise_kernel. If kernel is a string, it must be one of the metrics in pairwise. PAIRWISE_KERNEL_FUNCTIONS or “precomputed”. If kernel is “precomputed”, X is assumed to be a kernel matrix. string {‘additive_chi2’,'chi2' ‘linear’, ‘poly’, ‘polynomial’, ‘rbf’, ‘laplacian’, ‘sigmoid’, 'cosine’} ”linear”
gamma Gamma parameter for the RBF, laplacian, polynomial, exponential chi2 and sigmoid kernels. Interpretation of the default value is left to the kernel; see the documentation for sklearn.metrics.pairwise. float [0, +Inf) None
degree Degree of the polynomial kernel. float [0, +Inf) 3
coef0 Zero coefficient for polynomial and sigmoid kernels. float (-Inf, +Inf) 1
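
For illustration, the sketch below uses scikit-learn's KernelRidge as a stand-in for kernel regression, with an RBF kernel capturing a non-linear relation; the sine-wave data is assumed for demonstration.

```python
# Minimal sketch, assuming scikit-learn; not QuickML's own API.
import numpy as np
from sklearn.kernel_ridge import KernelRidge

X = np.linspace(0, 6, 50).reshape(-1, 1)
y = np.sin(X).ravel()

model = KernelRidge(alpha=1.0, kernel="rbf", gamma=0.5)
model.fit(X, y)
print(model.predict(X[:3]))
```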

LGBM Regression

LGBM works by starting with an initial estimate that is updated using the output of each tree. The learning_rate parameter controls the magnitude of this change in the estimates. It can be used on any data and provides a high degree of accuracy, as it contains many built-in preprocessing steps.

The LightGBM algorithm grows vertically, meaning it grows leaf-wise, while other algorithms grow level-wise. LightGBM chooses the leaf with the largest loss to grow. It can lower more loss than a level-wise algorithm when growing the same leaf.

Hyper Parameters:

Parameter Description Data Type Possible Values Default Values
boosting_type Method of Boosting. string {‘gbdt’, ‘dart’, ‘goss’} 'gbdt'
num_leaves Maximum tree leaves for base learners. int (1, +Inf) 31
max_depth Maximum tree depth for base learners, <= 0 means no limit. int (-Inf, +Inf) -1
learning_rate Boosting learning rate. float (0.0, +Inf) 0.1
n_estimators
(number of estimators)
Number of boosted trees to fit. int [1, 500] 100
subsample_for_bin Number of samples for constructing bins. int (0, +Inf) 200000
min_split_gain Minimum loss reduction required to make a further partition on a leaf node of the tree. float [0.0, +Inf) 0.0
min_child_weight Minimum sum of instance weight (Hessian) needed in a child (leaf). float [0.0, +Inf) 1e-3
min_child_samples Minimum number of data needed in a child (leaf). int [0, +Inf) 20
subsample Subsample ratio of the training instance. float (0.0, 1.0] 1.0
subsample_freq (subsample_frequency) Frequency of subsample; <= 0 means subsampling is disabled. int (-Inf, +Inf) 0
colsample_bytree (column sample by tree) Subsample ratio of columns when constructing each tree. float (0.0, 1.0] 1.0
reg_alpha (alpha) L1 regularization term on weights. float [0.0, +Inf) 0.0
reg_lambda (lambda) L2 regularization term on weights. float [0.0, +Inf) 0.0
importance_type The type of feature importance to be filled into feature_importances_. If ‘split’, result contains numbers of times the feature is used in a model. If ‘gain’, result contains total gains of splits which use the feature. string {‘gain’, 'split'} 'split'
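
As a hedged illustration, the sketch below uses the open-source lightgbm package, whose LGBMRegressor exposes the same hyperparameter names; QuickML may wrap this differently, and the dataset is assumed for demonstration.

```python
# Minimal sketch, assuming the lightgbm package is installed.
from lightgbm import LGBMRegressor
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=500, n_features=10, noise=0.1, random_state=0)

model = LGBMRegressor(boosting_type="gbdt", num_leaves=31,
                      learning_rate=0.1, n_estimators=100)
model.fit(X, y)
print(model.predict(X[:3]))
```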

Lasso Regression

Lasso regression is a regularization technique used with regression methods for more accurate prediction. It is a type of linear regression that uses shrinkage: data values are shrunk towards a central point, such as the mean. The lasso procedure encourages simple, sparse models (i.e. models with fewer parameters).

Hyper Parameters:

Parameter Description Data Type Possible Values Default Values
alpha Constant that multiplies the L1 term, controlling regularization strength. alpha must be a non-negative float. float [0, +Inf) 1.0
fit_intercept Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations bool True or False True
normalize This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. bool True or False False
tol (tolerance) The tolerance for the optimization: if the updates are smaller than tol, the optimization code checks the dual gap for optimality and continues until it is smaller than tol. float [0.0, +Inf) 1e-4
warm_start When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution. bool True or False False
positive When set to True, forces the coefficients to be positive. bool True or False False
selection If set to ‘random’, a random coefficient is updated every iteration rather than looping over features sequentially by default. string {"cyclic", "random"} "cyclic"
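
For illustration, the sketch below uses scikit-learn's Lasso (not QuickML's own API): the L1 penalty shrinks some coefficients exactly to zero, yielding a sparse model. The tiny dataset is assumed for demonstration.

```python
# Minimal sketch, assuming scikit-learn.
from sklearn.linear_model import Lasso

X = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]
y = [0.5, 1.4, 2.6, 3.5]

model = Lasso(alpha=0.1, fit_intercept=True)
model.fit(X, y)
print(model.coef_)  # some coefficients may be exactly 0
```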

Linear Regression

Linear regression is a regression model that estimates the linear relationship between an independent variable (input) and a dependent variable (target) using a straight line. It is the baseline algorithm for regression problems.

Hyper Parameters:

Parameter Description Data Type Possible Values Default Values
fit_intercept Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations. bool True or False True
normalize This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. bool True or False False
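
As an illustration (not QuickML's own API), the sketch below fits scikit-learn's LinearRegression, recovering a slope and intercept for a straight line; the tiny dataset is assumed for demonstration.

```python
# Minimal sketch, assuming scikit-learn; fits a straight line y ≈ a*x + b.
from sklearn.linear_model import LinearRegression

X = [[1.0], [2.0], [3.0], [4.0]]
y = [2.1, 4.0, 6.2, 7.9]

model = LinearRegression(fit_intercept=True)
model.fit(X, y)
print(model.coef_, model.intercept_)  # slope and intercept
```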

Random-Forest Regression

The random forest is a classification and regression algorithm consisting of many decision trees. It uses bagging and feature randomness when building individual trees to try to create an uncorrelated forest of trees whose prediction by committee is more accurate than that of any individual tree.

Bagging is an ensemble meta-estimator that fits base classifiers/regressors on random subsets of the original dataset and then aggregates their individual predictions (either by voting or by averaging) to form a final prediction.

Hyper Parameters:

Parameter Description Data Type Possible Values Default Values
n_estimators The number of trees in the forest. int [1, 500] 100
criterion The function to measure the quality of a split. Supported criteria are "mse" for the mean squared error, which is equal to variance reduction as a feature-selection criterion, and "mae" for the mean absolute error. string {"mse", "mae"} "mse"
max_depth The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples. int (0, +Inf) None
min_samples_split The minimum number of samples required to split an internal node int or float [2, +Inf) or (0, 1.0] 2
min_samples_leaf The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches. int or float [1, +Inf) or (0, 0.5] 1
min_weight_fraction_leaf The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided. float [0, 0.5] 0.0
max_features The number of features to consider when looking for the best split. int, float or string (0, n_features] or {"sqrt", "log2"} None
max_leaf_nodes Grow trees with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity. int (1, +Inf) None
min_impurity_decrease A node will be split if this split induces a decrease of the impurity greater than or equal to this value. float [0, +Inf) 0.0
bootstrap Whether bootstrap samples are used when building trees. If False, the whole dataset is used to build each tree. bool True or False True
oob_score (out of bag score) Whether to use out-of-bag samples to estimate the generalization score. Only available if bootstrap=True. bool True or False False
warm_start When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new forest. bool True or False False
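
For illustration, the sketch below uses scikit-learn's RandomForestRegressor (not QuickML's own API): each tree is trained on a bootstrap sample and predictions are averaged across the forest. The dataset is assumed for demonstration.

```python
# Minimal sketch, assuming scikit-learn.
from sklearn.ensemble import RandomForestRegressor
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=300, n_features=8, noise=0.1, random_state=0)

model = RandomForestRegressor(n_estimators=100, bootstrap=True, oob_score=True)
model.fit(X, y)
print(model.oob_score_)  # out-of-bag estimate of generalization performance
```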

Ridge Regression

Ridge regression is a method of estimating the coefficients of multiple-regression models in scenarios where independent variables are highly correlated. It can be used when the input variables are highly correlated with one another.

Hyper Parameters:

Parameter Description Data Type Possible Values Default Values
alpha Constant that multiplies the L2 term, controlling regularization strength. float (0, +Inf) 1.0
fit_intercept Whether to fit the intercept for this model. If set to false, no intercept will be used in calculations. bool True or False True
normalize This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. bool True or False False
tol (tolerance) Precision of the solution. float [0.0, +Inf) 1e-4
solver Solver to use in the computational routines: string {‘auto’, ‘svd’, ‘cholesky’, ‘lsqr’, ‘sparse_cg’, ‘sag’, ‘saga’} ’auto’
Note: Values of the solver are:
  • ‘auto’ chooses the solver automatically based on the type of data.
  • ‘svd’ uses a Singular Value Decomposition of X to compute the Ridge coefficients. It is the most stable solver, in particular more stable for singular matrices than ‘cholesky’, at the cost of being slower.
  • ‘cholesky’ uses the standard scipy.linalg.solve function to obtain a closed-form solution.
  • ‘sparse_cg’ uses the conjugate gradient solver as found in scipy.sparse.linalg.cg. As an iterative algorithm, this solver is more appropriate than ‘cholesky’ for large-scale data (possibility to set tol and max_iter).
  • ‘lsqr’ uses the dedicated regularized least-squares routine scipy.sparse.linalg.lsqr. It is the fastest and uses an iterative procedure.
  • ‘sag’ uses a Stochastic Average Gradient descent, and ‘saga’ uses its improved, unbiased version named SAGA. Both methods also use an iterative procedure, and are often faster than other solvers when both n_samples and n_features are large. Note that ‘sag’ and ‘saga’ fast convergence is only guaranteed on features with approximately the same scale. You can preprocess the data with a scaler from sklearn.preprocessing.
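
As an illustration (not QuickML's own API), the sketch below fits scikit-learn's Ridge: the L2 penalty stabilizes the coefficients when the input variables are strongly correlated. The tiny dataset is assumed for demonstration.

```python
# Minimal sketch, assuming scikit-learn.
from sklearn.linear_model import Ridge

X = [[1.0, 2.0], [2.0, 4.1], [3.0, 6.0], [4.0, 8.2]]  # strongly correlated columns
y = [1.0, 2.0, 3.1, 4.0]

model = Ridge(alpha=1.0, solver="auto")
model.fit(X, y)
print(model.coef_)
```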

SVM Regression

Support vector regression is used to predict continuous values. It uses the same principle as SVMs: finding the best-fit line. In SVR, the best-fit line is the hyperplane that contains the maximum number of points within the epsilon margin.

Hyper Parameters:

Parameter Description Data Type Possible Values Default Values
C Regularization parameter. The strength of the regularization is inversely proportional to C. Must be strictly positive. float (0.0, +Inf) 1.0
kernel Specifies the kernel type to be used in the algorithm. If none is given, rbf will be used. If a callable is given it is used to precompute the kernel matrix. string {‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’} ’rbf’
degree Degree of the polynomial kernel function (‘poly’). int [0, +Inf) 3
gamma Kernel coefficient for ‘rbf’, ‘poly’ and ‘sigmoid’. string or float {‘scale’, ‘auto’} or (0.0, +Inf) ’scale’
coef0 Independent term in kernel function. It is only significant in ‘poly’ and ‘sigmoid’. float (-Inf, +Inf) 0.0
shrinking Whether to use the shrinking heuristic. bool True or False True
tol (tolerance) Tolerance for stopping criterion. float [0.0, +Inf) 1e-3
epsilon Epsilon in the epsilon-SVM model. It specifies the epsilon-tube within which no penalty is associated in the training loss function with points predicted within a distance epsilon from the actual value. float [0, +Inf) 0.1
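
For illustration, the sketch below uses scikit-learn's SVR (not QuickML's own API): points within the epsilon-tube around the fitted function incur no loss. The dataset is assumed for demonstration.

```python
# Minimal sketch, assuming scikit-learn.
from sklearn.svm import SVR
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)

model = SVR(C=1.0, kernel="rbf", gamma="scale", epsilon=0.1)
model.fit(X, y)
print(model.predict(X[:3]))
```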

XGB Regression

XGBoost is an optimized distributed gradient-boosting library designed to be highly efficient, flexible, and portable. It implements machine-learning algorithms under the gradient-boosting framework. It provides parallel tree boosting to solve many data science problems quickly and accurately. It uses L1 and L2 regularization and trains quickly.

Hyper Parameters:

Parameter Description Data Type Possible Values Default Values
booster Decides which booster to use. string {'gbtree', 'gblinear', 'dart'} 'gbtree'
learning_rate Step size shrinkage used in updates to prevent overfitting. After each boosting step, we can directly get the weights of new features, and eta shrinks the feature weights to make the boosting process more conservative. float [0,1] 0.1
n_estimators (number of estimators) Number of trees to fit. int [1, 500] 100
objective Specifies the learning task and the corresponding learning objective. string See the list below the table. "reg:linear"
subsample Subsample ratio of the training instances. float (0, 1] 1
max_depth Maximum depth of a tree. int (0, +Inf) 3
max_delta_step If the value is set to 0, it means there is no constraint. If it is set to a positive value, it can help make the update step more conservative. Usually this parameter is not needed, but it might help in logistic regression when the classes are extremely imbalanced. int or float [0, +Inf) 0
colsample_bytree (column sample by tree) Subsample ratio of columns when constructing each tree. float (0, 1] 1.0
colsample_bylevel (column sample by level) It is the subsample ratio of columns for each level. Subsampling occurs once for every new depth level reached in a tree. Columns are subsampled from the set of columns chosen for the current tree. float (0, 1] 1.0
min_child_weight Minimum sum of instance weight (Hessian) needed in a child. int [0, +Inf) 1
reg_alpha (alpha) L1 regularization term on weights. float [0.0, +Inf) 0.0
reg_lambda (lambda) L2 regularization term on weights. float [0.0, +Inf) 0.0
scale_pos_weight (scale positive weight) Control the balance of positive and negative weights, useful for unbalanced classes. int [0, +Inf) 1

POSSIBLE VALUES FOR "OBJECTIVE" PARAM:

{"rank:pairwise", "reg:tweedie", "reg:gamma", "reg:linear", "count:poisson"}
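
As a hedged illustration, the sketch below uses the open-source xgboost package, whose XGBRegressor exposes the same hyperparameter names; note that the table's default objective "reg:linear" is the legacy name, spelled "reg:squarederror" in current xgboost releases. The dataset is assumed for demonstration.

```python
# Minimal sketch, assuming the xgboost package is installed.
from xgboost import XGBRegressor
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=300, n_features=8, noise=0.1, random_state=0)

model = XGBRegressor(booster="gbtree", n_estimators=100, learning_rate=0.1,
                     max_depth=3, subsample=1.0, objective="reg:squarederror")
model.fit(X, y)
print(model.predict(X[:3]))
```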

Last Updated 2023-06-15 17:14:14 +0530