# Regression algorithms
Regression is the task of predicting a continuous quantity. QuickML provides the following regression algorithms:

1. ### AdaBoost Regression

AdaBoost is a machine-learning algorithm that builds a series of small, one-level decision trees, adapting each tree to predict the difficult cases missed by the previous trees and combining all trees into a single model.

AdaBoost regression begins by fitting a regressor on the original dataset and then fits additional copies of the regressor on the same dataset, with the weights of the instances adjusted according to the error of the current prediction, so that subsequent regressors focus more on difficult cases.

Boosting in machine learning is a way of combining multiple simple models into a single composite model. This is also why boosting is known as an additive model, since simple models (also known as weak learners) are added one at a time, while keeping existing trees in the model unchanged. As we combine more and more simple models, the complete final model becomes a stronger predictor.
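
The reweighting idea described above can be sketched in a few lines of plain Python. This is a minimal illustration that assumes one common variant of the update (AdaBoost.R2 with a linear loss); it is not necessarily how QuickML implements it, and the function name `reweight` is hypothetical.

```python
def reweight(weights, abs_errors):
    """One AdaBoost.R2-style reweighting step with a linear loss (a sketch)."""
    total = sum(weights)
    probs = [w / total for w in weights]          # normalise weights to sum to 1
    max_err = max(abs_errors)
    if max_err == 0:
        return probs                              # perfect fit: nothing to emphasise
    losses = [e / max_err for e in abs_errors]    # per-instance loss scaled to [0, 1]
    avg_loss = sum(p * l for p, l in zip(probs, losses))
    if avg_loss >= 0.5:
        return probs                              # learner too weak: boosting would stop here
    beta = avg_loss / (1.0 - avg_loss)            # beta < 1; smaller when the fit is better
    # Well-predicted instances (low loss) are multiplied by a smaller factor,
    # so badly predicted instances gain relative weight.
    updated = [p * beta ** (1.0 - l) for p, l in zip(probs, losses)]
    norm = sum(updated)
    return [u / norm for u in updated]


weights = [0.25, 0.25, 0.25, 0.25]
abs_errors = [0.1, 0.2, 0.1, 2.0]   # the last instance is the "difficult case"
new_weights = reweight(weights, abs_errors)
print(new_weights)
```

After the update, the badly predicted fourth instance carries the largest weight, so the next regressor concentrates on it.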

#### Hyper Parameters:

2. ### CatBoost Regression

CatBoost is based on gradient boosted decision trees. During training, a set of decision trees is built consecutively. Each successive tree is built with reduced loss compared to the previous trees. The number of trees is controlled by the starting parameters.

Prediction with CatBoost is typically much faster than with comparable algorithms.

#### Hyper Parameters:

3. ### Decision-Tree Regression

A decision tree builds classification or regression models in the form of a tree structure. It breaks a dataset down into smaller and smaller subsets while incrementally developing the associated decision tree. Decision trees can handle both categorical and numerical data. To predict the output value for a set of features, the tree returns a prediction based on the subset that the features fall into.
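
As an illustration of "predicting based on the subset the features fall into", here is a minimal sketch of a one-level regression tree on a single numeric feature. It assumes a squared-error split criterion and mean-valued leaves; the helper names `best_split` and `predict` are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def best_split(xs, ys):
    """Return (threshold, left_mean, right_mean) minimising the squared error.

    Requires at least two distinct values in xs.
    """
    best = None
    # Every distinct value except the largest is a candidate threshold,
    # which guarantees that both subsets are non-empty.
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1:]


def predict(split, x):
    threshold, left_mean, right_mean = split
    # The prediction is the mean target of the subset the input falls into.
    return left_mean if x <= threshold else right_mean


xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = [5.0, 6.0, 7.0, 20.0, 21.0, 22.0]
split = best_split(xs, ys)
print(split, predict(split, 2.5), predict(split, 50.0))
```

A full decision tree applies the same split search recursively to each subset until a stopping rule such as `max_depth` or `min_samples_leaf` is reached.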

#### Hyper Parameters:

4. ### ElasticNet Regression

Elastic net is a popular type of regularized linear regression that combines two popular penalties, specifically the L1 (Lasso Regression) and L2 (Ridge Regression) penalty functions. Elastic net is an extension of linear regression that adds regularization penalties to the loss function during training.

Regularization is a technique that prevents the model from over-fitting by adding extra information to it. Regularization reduces the magnitude of the feature coefficients while keeping the same number of features.

Sometimes, lasso regression can cause a small bias (a difference between the predicted and actual values) in the model, where the prediction is too dependent upon a particular variable. In these cases, elastic net performs better by combining the regularization of both lasso and ridge regression.
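
To make the combination of the two penalties concrete, the following sketch computes the penalty term for a given coefficient vector. It assumes the formulation used by scikit-learn's ElasticNet, `alpha * (l1_ratio * L1 + 0.5 * (1 - l1_ratio) * L2)`; whether QuickML scales the terms in exactly this way is an assumption.

```python
def elastic_net_penalty(coefs, alpha=1.0, l1_ratio=0.5):
    """Penalty added to the loss: a weighted mix of the L1 and L2 penalties."""
    l1 = sum(abs(w) for w in coefs)      # lasso part: sum of absolute values
    l2 = sum(w * w for w in coefs)       # ridge part: sum of squares
    return alpha * (l1_ratio * l1 + 0.5 * (1.0 - l1_ratio) * l2)


coefs = [2.0, -1.0, 0.0]
pure_l1 = elastic_net_penalty(coefs, alpha=1.0, l1_ratio=1.0)   # lasso only
pure_l2 = elastic_net_penalty(coefs, alpha=1.0, l1_ratio=0.0)   # ridge only
mixed = elastic_net_penalty(coefs, alpha=1.0, l1_ratio=0.5)     # equal mix
print(pure_l1, pure_l2, mixed)
```

Setting `l1_ratio` to 1 or 0 recovers the pure lasso or pure ridge penalty, and values in between blend the two.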

#### Hyper Parameters:

5. ### GB Regression
    
    Gradient-boosting regression calculates the difference between the current prediction and the known correct target value.

This difference is called the residual. After obtaining this value, gradient-boosting regression trains a weak model (a decision tree) that maps features to that residual. The residual predicted by the weak model is added to the existing model's prediction, nudging the model towards the correct target. Repeating this step multiple times improves the overall model prediction.
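
The residual-fitting loop described above can be sketched as follows. This is a simplified illustration with a squared-error loss, a single numeric feature and one-level trees as weak models; the helper names are hypothetical, and `n_estimators` and `learning_rate` are used in the same sense as in the table below.

```python
def fit_stump(xs, targets):
    """Weak model: one threshold split that predicts the mean on each side."""
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def fit_gradient_boosting(xs, ys, n_estimators=50, learning_rate=0.1):
    base = sum(ys) / len(ys)                 # initial prediction: the target mean
    stumps = []
    preds = [base] * len(xs)
    for _ in range(n_estimators):
        residuals = [y - p for y, p in zip(ys, preds)]   # what is still unexplained
        stump = fit_stump(xs, residuals)                 # weak model maps x to residual
        stumps.append(stump)
        # Add a shrunken copy of the predicted residual to the current prediction.
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.0, 4.0, 9.0, 16.0, 25.0, 36.0]
baseline = fit_gradient_boosting(xs, ys, n_estimators=0)    # mean only
weak = fit_gradient_boosting(xs, ys, n_estimators=1)
strong = fit_gradient_boosting(xs, ys, n_estimators=100)
print(mse(baseline, xs, ys), mse(weak, xs, ys), mse(strong, xs, ys))
```

On this training data, the error falls as more boosting stages are added.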

#### Hyper Parameters:

<table class="content-table quickml-content-table">
        <thead>
        <tr>
            <th class="w25p">Parameter</th>
            <th class="w30p">Description</th>
            <th class="w10p">Data Type</th>
            <th class="w25p">Possible Values</th>
            <th class="w20p">Default Values</th>
        </tr>
        </thead>
        <tbody>
        <tr>
            <td>loss</td>
            <td>Loss function to be optimized. ‘ls’ refers to least squares regression. ‘lad’ (least absolute deviation) is a highly robust loss function solely based on order information of the input variables. ‘huber’ is a combination of the two. ‘quantile’ allows quantile regression (use alpha to specify the quantile).</td>
            <td>string</td>
            <td>{&#39;ls&#39;, &#39;lad&#39;, &#39;huber&#39;, &#39;quantile&#39;}</td>
            <td>’ls’</td>
        </tr>
        <tr>
            <td>learning_rate</td>
            <td>Learning rate shrinks the contribution of each tree by learning_rate.</td>
            <td>float</td>
            <td>(0.0, +inf)</td>
            <td>0.1</td>
        </tr>
        <tr>
            <td>n_estimators<br> (number of estimators)</td>
            <td>The number of boosting stages to perform. Gradient boosting is fairly robust to over-fitting so a large number usually results in better performance.</td>
            <td>int</td>
            <td>[1, 500)</td>
            <td>100</td>
        </tr>
        <tr>
            <td>criterion</td>
            <td>The function to measure the quality of a split.</td>
            <td>string</td>
            <td>{&#39;friedman_mse&#39;, &#39;mse&#39;, &#39;mae&#39;}</td>
            <td>’friedman_mse’</td>
        </tr>
        <tr>
            <td>subsample</td>
            <td>The fraction of samples to be used for fitting the individual base learners.</td>
            <td>float</td>
            <td>(0.0, 1.0]</td>
            <td>1.0</td>
        </tr>
        <tr>
            <td>max_depth</td>
            <td>Maximum depth of the individual regression estimators. The maximum depth limits the number of nodes in the tree.</td>
            <td>int</td>
            <td>(0, +Inf)</td>
            <td>None</td>
        </tr>
        <tr>
            <td>min_samples_split</td>
            <td>The minimum number of samples required to split an internal node</td>
            <td>int or float</td>
            <td>[2, +Inf) or (0, 1.0]</td>
            <td>2</td>
        </tr>
        <tr>
            <td>min_samples_leaf</td>
            <td>The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches.</td>
            <td>int or float</td>
            <td> [1, +Inf) or (0, 0.5]</td>
            <td>1</td>
        </tr>
        <tr>
            <td>min_weight_fraction_leaf</td>
            <td>The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node.</td>
            <td>float</td>
            <td>[0, 0.5]</td>
            <td>0</td>
        </tr>
        <tr>
            <td>max_features</td>
            <td>The number of features to consider when looking for the best split</td>
            <td>int, float or string</td>
            <td>(0, n_features] or { “sqrt”, “log2”}</td>
            <td>None</td>
        </tr>
        <tr>
            <td>max_leaf_nodes</td>
            <td>Grow trees with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity.</td>
            <td>int</td>
            <td>(1, +Inf)</td>
            <td>None</td>
        </tr>
        <tr>
            <td>min_impurity_decrease</td>
            <td>A node will be split if this split induces a decrease of the impurity greater than or equal to this value.</td>
            <td>float</td>
            <td>[0, +Inf)</td>
            <td>0.0</td>
        </tr>
        <tr>
            <td>init</td>
            <td>An estimator object that is used to compute the initial predictions. init has to provide fit and predict. If ‘zero’, the initial raw predictions are set to zero.</td>
            <td>object</td>
            <td>Estimator (any regression model except CatBoost) or ‘zero’</td>
            <td>None</td>
        </tr>
        <tr>
            <td>warm_start</td>
            <td>When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just erase the previous solution.</td>
            <td>bool</td>
            <td>True or  False</td>
            <td>False</td>
        </tr>
        <tr>
            <td>tol (tolerance)</td>
            <td>Tolerance for the early stopping. When the loss is not improving by at least tol for n_iter_no_change iterations (if set to a number), the training stops.</td>
            <td>float</td>
            <td>[0.0, +Inf)</td>
            <td>1e-4</td>
        </tr>
    </tbody>
    </table>

6. ### KNN Regression
    
    KNN regression works by finding the distances between a query (data instance) and all the examples in the data, selecting the specified number of examples (K) closest to the query, and then predicting the average of the target values of those neighbours.

In other words, it approximates the association between independent variables (input variables) and the continuous outcome (target) by averaging the observations in the same neighbourhood.
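
A minimal sketch of this procedure in plain Python, assuming Euclidean distance and an unweighted average (the function name `knn_predict` and the parameter `k` are chosen for illustration):

```python
import math


def knn_predict(train_x, train_y, query, k=3):
    """Average the targets of the k training points closest to the query."""
    # Pair each distance with its target, then sort by distance.
    by_distance = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )
    nearest = [y for _, y in by_distance[:k]]
    return sum(nearest) / len(nearest)


train_x = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (10.0, 10.0)]
train_y = [10.0, 20.0, 30.0, 100.0]
prediction = knn_predict(train_x, train_y, query=(2.0, 2.0), k=3)
print(prediction)
```

The distant point at (10, 10) is not among the three nearest neighbours, so it has no influence on the prediction.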

#### Hyper Parameters:

7. ### Kernel Regression

Kernel regression fits a smooth curve, rather than a straight line, to a scatter plot. Kernel values are used to derive weights that predict outputs from given inputs. Kernel regression is a non-parametric technique for estimating the conditional expectation of a random variable. The objective is to find a non-linear relation between a pair of random variables X and Y.
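
One well-known way to derive weights from kernel values is the Nadaraya-Watson estimator, sketched below with a Gaussian kernel. The choice of kernel and the `bandwidth` parameter are assumptions made for illustration and may differ from what QuickML uses.

```python
import math


def gaussian_kernel(u):
    return math.exp(-0.5 * u * u)


def kernel_regress(train_x, train_y, query, bandwidth=1.0):
    """Estimate E[Y | X = query] as a kernel-weighted average of the targets."""
    # Points close to the query get weights near 1, distant points near 0.
    weights = [gaussian_kernel((query - x) / bandwidth) for x in train_x]
    return sum(w * y for w, y in zip(weights, train_y)) / sum(weights)


train_x = [0.0, 1.0, 2.0, 3.0, 4.0]
train_y = [0.0, 1.0, 4.0, 9.0, 16.0]    # a non-linear relation (y = x squared)
estimate = kernel_regress(train_x, train_y, query=2.0, bandwidth=0.5)
print(estimate)
```

A smaller bandwidth follows the data more closely, while a larger one produces a smoother curve.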

#### Hyper Parameters:

8. ### LGBM Regression

LGBM works by starting with an initial estimate that is updated using the output of each tree. The learning rate controls the magnitude of each update. It can be used on many kinds of data and often achieves high accuracy, in part because it includes several built-in preprocessing steps.

The LightGBM algorithm grows trees vertically, meaning it grows leaf-wise, while other algorithms grow level-wise. LightGBM chooses the leaf with the largest loss reduction to grow. For the same number of leaves, leaf-wise growth tends to reduce the loss more than level-wise growth.

#### Hyper Parameters:

<table class="content-table quickml-content-table">
        <thead>
        <tr>
            <th class="w25p">Parameter</th>
            <th class="w30p">Description</th>
            <th class="w10p">Data Type</th>
            <th class="w25p">Possible Values</th>
            <th class="w20p">Default Values</th>
        </tr>
        </thead>
        <tbody>
        <tr>
            <td>boosting_type</td>
            <td>Method of Boosting.</td>
            <td>string</td>
            <td>{‘gbdt’, ‘dart’, ‘goss’}</td>
            <td>&#39;gbdt&#39;</td>
        </tr>
        <tr>
            <td>num_leaves</td>
            <td>Maximum tree leaves for base learners.</td>
            <td>int</td>
            <td>(1, +Inf)</td>
            <td>31</td>
        </tr>
        <tr>
            <td>max_depth</td>
            <td>Maximum tree depth for base learners, &lt;= 0 means no limit.</td>
            <td>int</td>
            <td>(-Inf, +Inf)</td>
            <td>-1</td>
        </tr>
        <tr>
            <td>learning_rate</td>
            <td>Boosting learning rate.</td>
            <td>float</td>
            <td>(0.0, +Inf)</td>
            <td>0.1</td>
        </tr>
        <tr>
            <td>n_estimators<br> (number of estimators)</td>
            <td>Number of boosted trees to fit.</td>
            <td>int</td>
            <td>[1, 500]</td>
            <td>100</td>
        </tr>
        <tr>
            <td>subsample_for_bin</td>
            <td>Number of samples for constructing bins.</td>
            <td>int</td>
            <td>(0, +Inf)</td>
            <td>200000</td>
        </tr>
        <tr>
            <td>min_split_gain</td>
            <td>Minimum loss reduction required to make a further partition on a leaf node of the tree.</td>
            <td>float</td>
            <td>[0.0, +Inf)</td>
            <td>0.0</td>
        </tr>
        <tr>
            <td>min_child_weight</td>
            <td>Minimum sum of instance weight (Hessian) needed in a child (leaf).</td>
            <td>float</td>
            <td>[0.0, +Inf)</td>
            <td>1e-3</td>
        </tr>
        <tr>
            <td>min_child_samples</td>
            <td>Minimum number of data needed in a child (leaf).</td>
            <td>int</td>
            <td>[0, +Inf)</td>
            <td>20</td>
        </tr>
        <tr>
            <td>subsample</td>
            <td>Subsample ratio of the training instance.</td>
            <td>float</td>
            <td>(0.0, 1.0]</td>
            <td>1.0</td>
        </tr>
        <tr>
            <td>subsample_freq (subsample_frequency)</td>
            <td>Frequency of subsampling; a value &lt;= 0 disables it.</td>
            <td>int</td>
            <td>(-Inf, +Inf)</td>
            <td>0</td>
        </tr>
        <tr>
            <td>colsample_bytree (column sample by tree)</td>
            <td>Subsample ratio of columns when constructing each tree.</td>
            <td>float</td>
            <td>(0.0, 1.0]</td>
            <td>1.0</td>
        </tr>
        <tr>
            <td>reg_alpha (alpha)</td>
            <td>L1 regularization term on weights.</td>
            <td>float</td>
            <td>[0.0, +Inf)</td>
            <td>0.0</td>
        </tr>
        <tr>
            <td>reg_lambda (lambda)</td>
            <td>L2 regularization term on weights.</td>
            <td>float</td>
            <td>[0.0, +Inf)</td>
            <td>0.0</td>
        </tr>
        <tr>
            <td>importance_type</td>
            <td>The type of feature importance to be filled into feature_importances_. If ‘split’, the result contains the number of times the feature is used in a model. If ‘gain’, the result contains the total gains of splits that use the feature.</td>
            <td>string</td>
            <td>{ ‘gain’, &#39;split&#39;}</td>
            <td>&#39;split&#39;</td>
        </tr>
        </tbody>
        </table>

9. ### Lasso Regression
    
    Lasso regression is a regularization technique used with regression methods to obtain more accurate predictions. It is a type of linear regression that uses shrinkage, in which coefficient estimates are shrunk towards a central value (zero, in the case of lasso). The lasso procedure encourages simple, sparse models (i.e., models with fewer parameters).
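
The shrinkage and sparsity effects can be illustrated with the soft-thresholding operator. The sketch below assumes the special case of orthonormal features, where the lasso solution is the least-squares coefficient vector passed through this operator; the threshold name `alpha` is chosen for illustration, and its exact scaling depends on how the objective is written.

```python
def soft_threshold(value, alpha):
    """Shrink a coefficient towards zero by alpha; small values become exactly 0."""
    if value > alpha:
        return value - alpha
    if value < -alpha:
        return value + alpha
    return 0.0


# Hypothetical least-squares coefficients for four features.
ols_coefs = [3.0, -0.4, 0.2, -2.5]
lasso_coefs = [soft_threshold(c, 0.5) for c in ols_coefs]
print(lasso_coefs)
```

Large coefficients are shrunk but kept, while the two small ones are set to zero, which is what makes the resulting model sparse.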

#### Hyper Parameters:

10. ### Linear Regression
    
    Linear regression is a regression model that estimates the linear relationship between the independent variables (inputs) and the dependent variable (target) using a straight line. It is the most basic algorithm for regression problems.
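
For a single input variable, the straight line can be computed in closed form with ordinary least squares. The following is a minimal sketch; the function name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]        # generated from y = 2x + 1
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
print(slope * 5.0 + intercept)   # prediction for a new input x = 5
```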

#### Hyper Parameters:

11. ### Random-Forest Regression
    
    The random forest is a classification and regression algorithm consisting of many decision trees. It uses bagging and feature randomness when building individual trees to try to create an uncorrelated forest of trees whose prediction by committee is more accurate than that of any individual tree.

Bagging is an ensemble meta-estimator that fits base classifiers or regressors on random subsets of the original dataset and then aggregates their individual predictions (either by voting or by averaging) to form a final prediction.
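
The bagging step can be sketched as follows: each base regressor is fit on a bootstrap sample, and the predictions are averaged. This simplified illustration uses one-level trees on a single feature, so the feature randomness of a real random forest is omitted; all helper names are hypothetical.

```python
import random


def fit_stump(xs, ys):
    """Base regressor: one threshold split that predicts the mean on each side."""
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:                      # sample had a single distinct x value
        constant = sum(ys) / len(ys)
        return lambda x: constant
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def fit_bagged(xs, ys, n_estimators=25, seed=0):
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_estimators):
        # Bootstrap: draw n indices with replacement from the original data.
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    # Aggregate by averaging the individual predictions.
    return lambda x: sum(m(x) for m in models) / len(models)


xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = [5.0, 6.0, 7.0, 20.0, 21.0, 22.0]
forest = fit_bagged(xs, ys)
print(forest(2.0), forest(11.0))
```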

#### Hyper Parameters:

<table class="content-table quickml-content-table">
        <thead>
        <tr>
            <th class="w25p">Parameter</th>
            <th class="w30p">Description</th>
            <th class="w10p">Data Type</th>
            <th class="w25p">Possible Values</th>
            <th class="w20p">Default Values</th>
        </tr>
        </thead>
        <tbody>
        <tr>
            <td>n_estimators</td>
            <td>The number of trees in the forest.</td>
            <td>int</td>
            <td>[1, 500]</td>
            <td>100</td>
        </tr>
        <tr>
            <td>criterion</td>
            <td>The function to measure the quality of a split. Supported criteria are “mse” for the mean squared error, which is equal to variance reduction as a feature selection criterion, and “mae” for the mean absolute error.</td>
            <td>string</td>
            <td>{&quot;mse&quot;, &quot;mae&quot;}</td>
            <td>”mse”</td>
        </tr>
        <tr>
            <td>max_depth</td>
            <td>The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.</td>
            <td>int</td>
            <td>(0, +Inf)</td>
            <td>None</td>
        </tr>
        <tr>
            <td>min_samples_split</td>
            <td>The minimum number of samples required to split an internal node</td>
            <td>int  or float</td>
            <td>[2, +Inf) or (0, 1.0]</td>
            <td>2</td>
        </tr>
        <tr>
            <td>min_samples_leaf</td>
            <td>The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches.</td>
            <td>int or float</td>
            <td>[1, +Inf) or (0, 0.5]</td>
            <td>1</td>
        </tr>
        <tr>
            <td>min_weight_fraction_leaf</td>
            <td>The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.</td>
            <td>float</td>
            <td>[0, 0.5]</td>
            <td>0.0</td>
        </tr>
        <tr>
            <td>max_features</td>
            <td>The number of features to consider when looking for the best split</td>
            <td>int, float or string</td>
            <td>(0, n_features] or { “sqrt”, “log2”}, None</td>
            <td>None</td>
        </tr>
        <tr>
            <td>max_leaf_nodes</td>
            <td>Grow trees with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity.</td>
            <td>int</td>
            <td>(1, +Inf)</td>
            <td>None</td>
        </tr>
        <tr>
            <td>min_impurity_decrease</td>
            <td>A node will be split if this split induces a decrease of the impurity greater than or equal to this value.</td>
            <td>float</td>
            <td>[0, +Inf)</td>
            <td>0.0</td>
        </tr>
        <tr>
            <td>bootstrap</td>
            <td>Whether bootstrap samples are used when building trees. If False, the whole dataset is used to build each tree.</td>
            <td>bool</td>
            <td>True or False</td>
            <td>True</td>
        </tr>
        <tr>
            <td>oob_score (out of bag score)</td>
            <td>Whether to use out-of-bag samples to estimate the generalization score. Only available if bootstrap=True.</td>
            <td>bool</td>
            <td>True or False</td>
            <td>False</td>
        </tr>
        <tr>
            <td>warm_start</td>
            <td>When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new forest.</td>
            <td>bool</td>
            <td>True or  False</td>
            <td>False</td>
        </tr>
    </tbody>
    </table>

12. ### Ridge Regression

Ridge regression is a method of estimating the coefficients of multiple-regression models in scenarios where the independent variables are highly correlated. It is useful when the input variables are highly correlated with one another (multicollinearity), which makes ordinary least-squares estimates unstable.
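
The effect of the ridge penalty on correlated inputs can be illustrated with a two-feature closed-form solution of `(XᵀX + alpha·I) w = Xᵀy`, without an intercept. This is a sketch under those assumptions; the function name and the penalty name `alpha` are chosen for illustration.

```python
def ridge_two_features(rows, ys, alpha):
    """Solve the 2x2 ridge normal equations with Cramer's rule (no intercept)."""
    a = sum(r[0] * r[0] for r in rows) + alpha     # XtX[0][0] + alpha
    b = sum(r[0] * r[1] for r in rows)             # XtX[0][1]
    d = sum(r[1] * r[1] for r in rows) + alpha     # XtX[1][1] + alpha
    p = sum(r[0] * y for r, y in zip(rows, ys))    # Xty[0]
    q = sum(r[1] * y for r, y in zip(rows, ys))    # Xty[1]
    det = a * d - b * b
    if det == 0:
        raise ValueError("normal equations are singular; no unique solution")
    return (d * p - b * q) / det, (a * q - b * p) / det


# The two features are identical, i.e. perfectly correlated.
rows = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
ys = [2.0, 4.0, 6.0]

w1, w2 = ridge_two_features(rows, ys, alpha=1.0)
print(w1, w2)   # the penalty splits the effect evenly between the two features

try:
    ridge_two_features(rows, ys, alpha=0.0)   # plain least squares
    unpenalized_failed = False
except ValueError:
    unpenalized_failed = True
print(unpenalized_failed)
```

Without the penalty the system has no unique solution for these inputs, whereas any positive `alpha` makes it solvable.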

#### Hyper Parameters:

Note: Values of the solver are:

* 'auto' chooses the solver automatically based on the type of data.
* 'svd' uses a Singular Value Decomposition of X to compute the Ridge coefficients. It is the most stable solver, in particular more stable for singular matrices than 'cholesky', at the cost of being slower.
* 'cholesky' uses the standard scipy.linalg.solve function to obtain a closed-form solution.
* 'sparse_cg' uses the conjugate gradient solver as found in scipy.sparse.linalg.cg. As an iterative algorithm, this solver is more appropriate than 'cholesky' for large-scale data (possibility to set tol and max_iter).
* 'lsqr' uses the dedicated regularized least-squares routine scipy.sparse.linalg.lsqr. It is the fastest and uses an iterative procedure.
* 'sag' uses a Stochastic Average Gradient descent, and 'saga' uses its improved, unbiased version named SAGA. Both methods also use an iterative procedure, and are often faster than other solvers when both n_samples and n_features are large. Note that the fast convergence of 'sag' and 'saga' is only guaranteed on features with approximately the same scale. You can preprocess the data with a scaler from sklearn.preprocessing.

13. ### SVM Regression

Support vector regression (SVR) is used to predict continuous values. It uses the same principle as support vector machines (SVMs). The basic idea is to find the best-fit line: in SVR, this is the hyperplane that contains the maximum number of points within its margin.
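
One way to make the idea of a line that "has the maximum number of points" concrete is the epsilon-insensitive loss commonly used in support vector regression: points that lie within a tube of half-width `epsilon` around the prediction incur no loss. The sketch below assumes this standard formulation; the parameter name `epsilon` is not taken from this page.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Zero inside the epsilon tube, growing linearly with the error outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


inside = epsilon_insensitive_loss(3.0, 3.05, epsilon=0.1)   # within the tube
outside = epsilon_insensitive_loss(3.0, 4.0, epsilon=0.1)   # outside the tube
print(inside, outside)
```

Training then looks for the flattest function that keeps as many points as possible inside the tube.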

#### Hyper Parameters:

14. ### XGB Regression

XGBoost is an optimized distributed gradient-boosting library designed to be highly efficient, flexible, and portable. It implements machine-learning algorithms under the gradient-boosting framework. It provides parallel tree boosting to solve many data science problems quickly and accurately. It supports L1 and L2 regularization to reduce over-fitting, and it trains quickly.

#### Hyper Parameters:

<table class="content-table quickml-content-table">
        <thead>
        <tr>
            <th class="w25p">Parameter</th>
            <th class="w30p">Description</th>
            <th class="w10p">Data Type</th>
            <th class="w25p">Possible Values</th>
            <th class="w20p">Default Values</th>
        </tr>
        </thead>
        <tbody>
        <tr>
            <td>booster</td>
            <td>Decides which booster to use.</td>
            <td>string</td> 
            <td>{&#39;gbtree&#39;, &#39;gblinear&#39;, &#39;dart&#39;}</td>
            <td>’gbtree’</td>
        </tr>
        <tr>
            <td>learning_rate</td>
            <td>Step size shrinkage used in updates to prevent overfitting. After each boosting step, we can directly get the weights of new features, and the learning rate (eta) shrinks the feature weights to make the boosting process more conservative.</td>
            <td>float</td>
            <td>[0,1]</td>
            <td>0.1</td>
        </tr>
        <tr>
            <td>n_estimators<br>(number of estimators)</td>
            <td>Number of trees to fit.</td>
            <td>int</td>
            <td>[1, 500]</td>
            <td>100</td>
        </tr>
        <tr>
            <td>objective</td>
            <td>Specifies the learning task and the corresponding learning objective.</td>
            <td>string</td>
            <td>Listed below the table.</td>
            <td>&quot;reg:linear&quot;</td>
        </tr>
        <tr>
            <td>subsample</td>
            <td>Subsample ratio of the training instances used to build each tree.</td>
            <td>float</td>
            <td>(0,1]</td>
            <td>1</td>
        </tr>
        <tr>
            <td>max_depth</td>
            <td>Maximum depth of a tree.</td>
            <td>int</td>
            <td>(0, +Inf)</td>
            <td>3</td>
        </tr>
        <tr>
            <td>max_delta_step</td>
            <td>If the value is set to 0, it means there is no constraint. If it is set to a positive value, it can help making the update step more conservative. Usually this parameter is not needed, but it might help in logistic regression when class is extremely imbalanced.</td>
            <td>int  or float</td>
            <td>[0, +Inf)</td>
            <td>0</td>
        </tr>
        <tr>
            <td>colsample_bytree (column sample by tree)</td>
            <td>Subsample ratio of columns when constructing each tree.</td>
            <td>float</td>
            <td>(0, 1]</td>
            <td>1.0</td>
        </tr>
        <tr>
            <td>colsample_bylevel (column sample by level)</td>
            <td>It is the subsample ratio of columns for each level. Subsampling occurs once for every new depth level reached in a tree. Columns are subsampled from the set of columns chosen for the current tree.</td>
            <td>float</td>
            <td>(0, 1]</td>
            <td>1.0</td>
        </tr>
        <tr>
            <td>min_child_weight</td>
            <td>Minimum sum of instance weight (Hessian) needed in a child.</td>
            <td>int</td>
            <td>[0, +Inf)</td>
            <td>1</td>
        </tr>
        <tr>
            <td>reg_alpha (alpha)</td>
            <td>L1 regularization term on weights.</td>
            <td>float</td>
            <td>[0.0, +Inf)</td>
            <td>0.0</td>
        </tr>
        <tr>
            <td>reg_lambda (lambda)</td>
            <td>L2 regularization term on weights.</td>
            <td>float</td>
            <td>[0.0, +Inf)</td>
            <td>0.0</td>
        </tr>
        <tr>
            <td>scale_pos_weight (scale positive weight)</td>
            <td>Control the balance of positive and negative weights, useful for unbalanced classes.</td>
            <td>int</td>
            <td>[0, +Inf)</td>
            <td>1</td>
        </tr>
    </tbody>
    </table>

**Possible values for the "objective" parameter:**

{"rank:pairwise", "reg:tweedie", "reg:gamma", "reg:linear", "count:poisson"}

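### AdaBoost Regression

#### Hyper Parameters:

<table class="content-table quickml-content-table">
        <thead>
        <tr>
            <th class="w25p">Parameter</th>
            <th class="w30p">Description</th>
            <th class="w10p">Data Type</th>
            <th class="w25p">Possible Values</th>
            <th class="w20p">Default Values</th>
        </tr>
        </thead>
        <tbody>
        <tr>
            <td>base_estimator</td>
            <td>The base estimator from which the boosted ensemble is built. If None, then the base estimator is DecisionTreeRegressor initialized with max_depth=3.</td>
            <td>object</td>
            <td>Any regression model</td>
            <td>None</td>
        </tr>
        <tr>
            <td>n_estimators<br> (number of estimators)</td>
            <td>The maximum number of estimators at which boosting is terminated. In case of perfect fit, the learning procedure is stopped early.</td>
            <td>int</td>
            <td>[1, 500]</td>
            <td>50</td>
        </tr>
        <tr>
            <td>learning_rate</td>
            <td>Weight applied to each regressor at each boosting iteration. A higher learning rate increases the contribution of each regressor.</td>
            <td>float</td>
            <td>(0.0, +Inf)</td>
            <td>1.0</td>
        </tr>
        <tr>
            <td>loss</td>
            <td>The loss function to use when updating the weights after each boosting iteration.</td>
            <td>string</td>
            <td>{'linear', 'square', 'exponential'}</td>
            <td>'linear'</td>
        </tr>
        </tbody>
        </table>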

### CatBoost Regression

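#### Hyper Parameters:

<table class="content-table quickml-content-table">
        <thead>
        <tr>
            <th class="w25p">Parameter</th>
            <th class="w30p">Description</th>
            <th class="w10p">Data Type</th>
            <th class="w25p">Possible Values</th>
            <th class="w20p">Default Values</th>
        </tr>
        </thead>
        <tbody>
        <tr>
            <td>learning_rate</td>
            <td>The learning rate used for training.</td>
            <td>float</td>
            <td>(0, 1]</td>
            <td>0.03</td>
        </tr>
        <tr>
            <td>l2_leaf_reg (l2_leaf_regularization)</td>
            <td>Coefficient at the L2 regularization term of the cost function.</td>
            <td>float</td>
            <td>[0, +Inf)</td>
            <td>3.0</td>
        </tr>
        <tr>
            <td>rsm (random subspace method)</td>
            <td>The percentage of features to use at each split selection, when features are selected over again at random.</td>
            <td>float</td>
            <td>(0, 1]</td>
            <td>None</td>
        </tr>
        <tr>
            <td>loss_function</td>
            <td>The metric to use in training. The specified value also determines the machine learning problem to solve. Some metrics support optional parameters.</td>
            <td>string</td>
            <td>{'RMSE', 'MAE', 'Quantile:alpha=value', 'LogLinQuantile:alpha=value', 'Poisson', 'MAPE', 'Lq:q=value', 'SurvivalAft:dist=value;scale=value'}<br> Note: range of value = [0, 1]</td>
            <td>'RMSE'</td>
        </tr>
        <tr>
            <td>nan_mode</td>
            <td>The method for processing missing values in the input dataset.</td>
            <td>string</td>
            <td>{'Forbidden', 'Min', 'Max'}</td>
            <td>'Min'</td>
        </tr>
        <tr>
            <td>leaf_estimation_method</td>
            <td>The method used to calculate the values in leaves.</td>
            <td>string</td>
            <td>{'Newton', 'Gradient'}</td>
            <td>None</td>
        </tr>
        <tr>
            <td>score_function</td>
            <td>The score type used to select the next split during the tree construction.</td>
            <td>string</td>
            <td>{'L2', 'Cosine'}</td>
            <td>'Cosine'</td>
        </tr>
        <tr>
            <td>max_depth</td>
            <td>Maximum depth of the tree.</td>
            <td>int</td>
            <td>[1, +Inf)</td>
            <td>None</td>
        </tr>
        <tr>
            <td>n_estimators<br> (number of estimators)</td>
            <td>The maximum number of trees that can be built when solving machine learning problems. When using other parameters that limit the number of iterations, the final number of trees may be less than the number specified in this parameter.</td>
            <td>int</td>
            <td>[1, 500]</td>
            <td>None</td>
        </tr>
        </tbody>
        </table>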

### Decision-Tree Regression

#### Hyper Parameters:

<table class="content-table quickml-content-table">
        <thead>
        <tr>
            <th class="w25p">Parameter</th>
            <th class="w30p">Description</th>
            <th class="w10p">Data Type</th>
            <th class="w25p">Possible Values</th>
            <th class="w20p">Default Values</th>
        </tr>
        </thead>
        <tbody>
        <tr>
            <td>criterion</td>
            <td>The function to measure the quality of a split.</td>
            <td>string</td>
            <td>{'mse', 'friedman_mse', 'mae'}</td>
            <td>'mse'</td>
        </tr>
        <tr>
            <td>splitter</td>
            <td>The strategy used to choose the split at each node.</td>
            <td>string</td>
            <td>{'best', 'random'}</td>
            <td>'best'</td>
        </tr>
        <tr>
            <td>max_depth</td>
            <td>The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.</td>
            <td>int</td>
            <td>(0, +Inf)</td>
            <td>None</td>
        </tr>
        <tr>
            <td>min_samples_split</td>
            <td>The minimum number of samples required to split an internal node.</td>
            <td>int or float</td>
            <td>[2, +Inf) or (0, 1.0]</td>
            <td>2</td>
        </tr>
        <tr>
            <td>min_samples_leaf</td>
            <td>The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches.</td>
            <td>int or float</td>
            <td>[1, +Inf) or (0, 0.5]</td>
            <td>1</td>
        </tr>
        <tr>
            <td>min_weight_fraction_leaf</td>
            <td>The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node.</td>
            <td>float</td>
            <td>[0, 0.5]</td>
            <td>0</td>
        </tr>
        <tr>
            <td>max_features</td>
            <td>The number of features to consider when looking for the best split.</td>
            <td>int, float or string</td>
            <td>(0, n_features] or {'sqrt', 'log2'}</td>
            <td>None</td>
        </tr>
        <tr>
            <td>max_leaf_nodes</td>
            <td>Grow a tree with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity.</td>
            <td>int</td>
            <td>(1, +Inf)</td>
            <td>None</td>
        </tr>
        <tr>
            <td>min_impurity_decrease</td>
            <td>A node will be split if this split induces a decrease of the impurity greater than or equal to this value.</td>
            <td>float</td>
            <td>[0, +Inf)</td>
            <td>0.0</td>
        </tr>
        </tbody>
        </table>

### ElasticNet Regression

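#### Hyper Parameters:

<table class="content-table quickml-content-table">
        <thead>
        <tr>
            <th class="w25p">Parameter</th>
            <th class="w30p">Description</th>
            <th class="w10p">Data Type</th>
            <th class="w25p">Possible Values</th>
            <th class="w20p">Default Values</th>
        </tr>
        </thead>
        <tbody>
        <tr>
            <td>alpha</td>
            <td>Constant that multiplies the penalty terms.</td>
            <td>float</td>
            <td>(0, +Inf)</td>
            <td>1.0</td>
        </tr>
        <tr>
            <td>l1_ratio</td>
            <td>The ElasticNet mixing parameter, with 0 &lt;= l1_ratio &lt;= 1. For l1_ratio = 0 the penalty is an L2 penalty. For l1_ratio = 1 it is an L1 penalty. For 0 &lt; l1_ratio &lt; 1, the penalty is a combination of L1 and L2.</td>
            <td>float</td>
            <td>[0, 1]</td>
            <td>0.5</td>
        </tr>
        <tr>
            <td>fit_intercept</td>
            <td>Whether the intercept should be estimated or not.</td>
            <td>bool</td>
            <td>True or False</td>
            <td>True</td>
        </tr>
        <tr>
            <td>normalize</td>
            <td>This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm.</td>
            <td>bool</td>
            <td>True or False</td>
            <td>False</td>
        </tr>
        <tr>
            <td>tol (tolerance)</td>
            <td>The tolerance for the optimization: if the updates are smaller than tol, the optimization code checks the dual gap for optimality and continues until it is smaller than tol.</td>
            <td>float</td>
            <td>[0.0, +Inf)</td>
            <td>1e-4</td>
        </tr>
        <tr>
            <td>warm_start</td>
            <td>When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution.</td>
            <td>bool</td>
            <td>True or False</td>
            <td>False</td>
        </tr>
        <tr>
            <td>positive</td>
            <td>When set to True, it forces the coefficients to be positive.</td>
            <td>bool</td>
            <td>True or False</td>
            <td>False</td>
        </tr>
        <tr>
            <td>selection</td>
            <td>If set to 'random', a random coefficient is updated every iteration rather than looping over features sequentially by default.</td>
            <td>string</td>
            <td>{'cyclic', 'random'}</td>
            <td>'cyclic'</td>
        </tr>
        </tbody>
        </table>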

GB Regression

Gradient-boosting regression calculates the difference between the current prediction and the known correct target value.

Hyper Parameters:

Parameter	Description	Data Type	Possible Values	Default Values
loss	Loss function to be optimized. ‘ls’ refers to least squares regression. ‘lad’ (least absolute deviation) is a highly robust loss function solely based on order information of the input variables. ‘huber’ is a combination of the two. ‘quantile’ allows quantile regression (use alpha to specify the quantile).	string	{'ls', 'lad', 'huber', 'quantile'}	’ls’
learning_rate	Learning rate shrinks the contribution of each tree by learning_rate.	float	(0.0, +inf)	0.1
n_estimators (number of estimators)	The number of boosting stages to perform. Gradient boosting is fairly robust to over-fitting so a large number usually results in better performance.	int	[1, 500)	100
criterion	The function to measure the quality of a split.	string	{'friedman_mse', 'mse', 'mae'}	’friedman_mse’
subsample	The fraction of samples to be used for fitting the individual base learners.	float	(0.0, 1.0]	1.0
max_depth	Maximum depth of the individual regression estimators. The maximum depth limits the number of nodes in the tree.	int	(0, +Inf)	None
min_samples_split	The minimum number of samples required to split an internal node	int or float	[2, +Inf) or (0, 1.0]	2
min_samples_leaf	The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches.	int or float	[1, +Inf) or (0, 0.5]	1
min_weight_fraction_leaf	The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node.	float	[0, 0.5]	0
max_features	The number of features to consider when looking for the best split	int, float or string	(0, n_features] or { “sqrt”, “log2”}	None
max_leaf_nodes	Grow trees with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity.	int	(1, +Inf)	None
min_impurity_decrease	A node will be split if this split induces a decrease of the impurity greater than or equal to this value.	float	[0, +Inf)	0.0
init	An estimator object that is used to compute the initial predictions. init has to provide fit and predict. If ‘zero’, the initial raw predictions are set to zero.	object	estimator (any regression model except CatBoost) or ‘zero’	None
warm_start	When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just erase the previous solution.	bool	True or False	False
tol (tolerance)	Tolerance for the early stopping. When the loss is not improving by at least tol for n_iter_no_change iterations (if set to a number), the training stops.	float	[0.0, +Inf)	1e-4
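
To make the loss options more concrete, the following sketch writes out the per-sample loss behind each value as a plain Python function of the residual (target minus prediction). The function names and the `delta` threshold of the Huber loss are assumptions for illustration; scaling constants may differ from the underlying library.

```python
def squared_loss(residual):
    """'ls': penalises large errors quadratically."""
    return 0.5 * residual ** 2


def absolute_loss(residual):
    """'lad': penalises errors linearly, so outliers matter less."""
    return abs(residual)


def huber_loss(residual, delta=1.0):
    """'huber': quadratic near zero, linear beyond the (assumed) delta threshold."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * a ** 2
    return delta * (a - 0.5 * delta)


def quantile_loss(residual, alpha=0.9):
    """'quantile': asymmetric penalty; alpha is the quantile being estimated."""
    return alpha * residual if residual >= 0 else (alpha - 1.0) * residual


for r in (0.5, 3.0, -3.0):
    print(r, squared_loss(r), absolute_loss(r), huber_loss(r), quantile_loss(r))
```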

KNN Regression

KNN Regression works by finding the distances between a query (data instance) and all the examples in the data, selecting the specified number of examples (K) closest to the query, and then predicting the average of the target values of those K neighbours.

In other words, it approximates the association between independent variables (input variables) and the continuous outcome (target) by averaging the observations in the same neighbourhood.
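
A minimal sketch of this procedure for a single input feature is shown below. The function name `knn_predict` and the sample data are hypothetical, and the absolute difference stands in for the distance metric.

```python
def knn_predict(train_x, train_y, query, k=5, weights="uniform"):
    """Predict the target for `query` from its k nearest training examples."""
    # Distance from the query to every training example, nearest first.
    nearest = sorted((abs(x - query), y) for x, y in zip(train_x, train_y))[:k]
    if weights == "uniform":
        return sum(y for _, y in nearest) / len(nearest)
    # "distance": weight each neighbour by the inverse of its distance.
    exact = [y for d, y in nearest if d == 0]
    if exact:  # the query coincides with a training point
        return sum(exact) / len(exact)
    inv = [1.0 / d for d, _ in nearest]
    return sum(w * y for w, (_, y) in zip(inv, nearest)) / sum(inv)


train_x = [1.0, 2.0, 3.0, 10.0]
train_y = [1.0, 2.0, 3.0, 10.0]
print(knn_predict(train_x, train_y, 2.4, k=2))                      # 2.5
print(knn_predict(train_x, train_y, 2.4, k=2, weights="distance"))  # close to 2.4
```

With uniform weights both neighbours count equally; with distance weights the closer neighbour pulls the prediction towards its own target.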

Hyper Parameters:

Parameter	Description	Data Type	Possible Values	Default Values
n_neighbors (number of neighbours)	Number of neighbors to use by default for kneighbors queries.	int	[1, n] n = Total number of records in dataset	5
weights	Weight function used in prediction. ‘uniform’: uniform weights. All points in each neighborhood are weighted equally. ‘distance’: weight points by the inverse of their distance. In this case, closer neighbors of a query point will have a greater influence than neighbors which are further away.	string	{‘uniform’, ‘distance’}	’uniform’
algorithm	Algorithm used to compute the nearest neighbors	string	{‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’}	’auto’
leaf_size	Leaf size passed to BallTree or KDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem.	int	(1, +Inf)	30
p	Power parameter for the Minkowski metric. When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used.	int	[1,3]	2
metric	Metric to use for distance computation. Default is “minkowski”, which results in the standard Euclidean distance when p = 2.	str	{‘cityblock’, ‘cosine’, 'euclidean', 'l1', 'l2', 'manhattan', 'nan_euclidean', ’minkowski’}	’minkowski’
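
The effect of p on the Minkowski metric can be checked with a few lines of Python. The function name `minkowski` is a hypothetical helper written for this illustration.

```python
def minkowski(a, b, p=2):
    """Minkowski distance between two points given as equal-length sequences."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


a, b = (0, 0), (3, 4)
print(minkowski(a, b, p=1))  # Manhattan distance -> 7.0
print(minkowski(a, b, p=2))  # Euclidean distance -> 5.0
print(minkowski(a, b, p=3))  # general Minkowski distance, between 4 and 5
```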

Kernel Regression

Hyper Parameters:

Parameter	Description	Data Type	Possible Values	Default Values
alpha	Regularization strength; must be a positive float. Regularization improves the conditioning of the problem and reduces the variance of the estimates. Larger values specify stronger regularization.	float	[0, +Inf)	1.0
kernel	Kernel mapping used internally. This parameter is passed directly to pairwise_kernels. If kernel is a string, it must be one of the metrics in pairwise.PAIRWISE_KERNEL_FUNCTIONS or “precomputed”. If kernel is “precomputed”, X is assumed to be a kernel matrix.	string	{‘additive_chi2’, ‘chi2’, ‘linear’, ‘poly’, ‘polynomial’, ‘rbf’, ‘laplacian’, ‘sigmoid’, ‘cosine’}	“linear”
gamma	Gamma parameter for the RBF, laplacian, polynomial, exponential chi2 and sigmoid kernels. Interpretation of the default value is left to the kernel; see the documentation for sklearn.metrics.pairwise.	float	[0, +Inf)	None
degree	Degree of the polynomial kernel.	float	[0, +Inf)	3
coef0	Zero coefficient for polynomial and sigmoid kernels.	float	(-Inf, +Inf)	1
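
To show where gamma, degree and coef0 enter, the sketch below writes out three of the kernels as plain Python functions. The formulas follow the usual definitions of these kernels; the function names and the explicit gamma values are assumptions for illustration, since the table leaves the default gamma to the kernel.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def linear_kernel(a, b):
    return dot(a, b)


def polynomial_kernel(a, b, degree=3, gamma=1.0, coef0=1.0):
    # (gamma * <a, b> + coef0) ** degree
    return (gamma * dot(a, b) + coef0) ** degree


def rbf_kernel(a, b, gamma=1.0):
    # exp(-gamma * ||a - b||^2): equals 1 for identical points, decays with distance
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)


a, b = (1.0, 2.0), (3.0, 4.0)
print(linear_kernel(a, b))                 # 11.0
print(polynomial_kernel(a, b, degree=2))   # (11 + 1) ** 2 = 144.0
print(rbf_kernel(a, a), rbf_kernel(a, b, gamma=0.5))
```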

LGBM Regression

Hyper Parameters:

Parameter	Description	Data Type	Possible Values	Default Values
boosting_type	Method of Boosting.	string	{‘gbdt’, ‘dart’, ‘goss’}	'gbdt'
num_leaves	Maximum tree leaves for base learners.	int	(1, +Inf)	31
max_depth	Maximum tree depth for base learners, <= 0 means no limit.	int	(-Inf, +Inf)	-1
learning_rate	Boosting learning rate.	float	(0.0, +Inf)	0.1
n_estimators (number of estimators)	Number of boosted trees to fit.	int	[1, 500]	100
subsample_for_bin	Number of samples for constructing bins.	int	(0, +Inf)	200000
min_split_gain	Minimum loss reduction required to make a further partition on a leaf node of the tree.	float	[0.0, +Inf)	0.0
min_child_weight	Minimum sum of instance weight (Hessian) needed in a child (leaf).	float	[0.0, +Inf)	1e-3
min_child_samples	Minimum number of data needed in a child (leaf).	int	[0, +Inf)	20
subsample	Subsample ratio of the training instance.	float	(0.0, 1.0]	1.0
subsample_freq (subsample_frequency)	Frequency of subsampling; a value <= 0 disables it.	int	(-Inf, +Inf)	0
colsample_bytree (column sample by tree)	Subsample ratio of columns when constructing each tree.	float	(0.0, 1.0]	1.0
reg_alpha (alpha)	L1 regularization term on weights.	float	[0.0, +Inf)	0.0
reg_lambda (lambda)	L2 regularization term on weights.	float	[0.0, +Inf)	0.0
importance_type	The type of feature importance to be filled into feature_importances_. If ‘split’, the result contains the number of times the feature is used in a model. If ‘gain’, the result contains the total gains of splits which use the feature.	string	{‘gain’, ‘split’}	'split'

Lasso Regression

Lasso regression is a type of linear regression that applies L1 regularization, which can give more accurate predictions than unregularized regression. It works through shrinkage: estimates are pulled towards a central point, such as the mean, and in lasso the coefficients are shrunk towards zero. The lasso procedure encourages simple, sparse models (i.e. models with fewer parameters).
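
The shrinkage effect can be illustrated with the soft-thresholding operator, which is the update lasso applies to each coefficient in the simplified case of orthonormal features. This is a sketch under that assumption; the function name and the sample coefficients are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink `value` towards zero by `threshold`; small values become exactly zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


unpenalised = [3.0, -0.4, 0.2, -2.5]  # hypothetical least-squares coefficients
shrunk = [soft_threshold(w, 0.5) for w in unpenalised]
print(shrunk)  # [2.5, 0.0, 0.0, -2.0]
```

The two small coefficients are set to exactly zero, which is how lasso produces sparse models; a larger threshold (driven by alpha) removes more features.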

Hyper Parameters:

Parameter	Description	Data Type	Possible Values	Default Values
alpha	Constant that multiplies the L1 term, controlling regularization strength. alpha must be a non-negative float	float	(0, +Inf)	1.0
fit_intercept	Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations	bool	True or False	True
normalize	This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm.	bool	True or False	False
tol (tolerance)	The tolerance for the optimization: if the updates are smaller than tol, the optimization code checks the dual gap for optimality and continues until it is smaller than tol.	float	[0.0, +Inf)	1e-4
warm_start	When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution.	bool	True or False	False
positive	When set to True, forces the coefficients to be positive.	bool	True or False	False
selection	If set to ‘random’, a random coefficient is updated every iteration rather than looping over features sequentially by default.	string	{"cyclic", "random"}	"cyclic"

Linear Regression

Linear regression is a regression model that estimates the relationship between the independent variables (inputs) and the dependent variable (target) as a straight line. It is the most basic algorithm for regression problems.
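
For a single input variable, the least-squares line has a closed-form solution, sketched below. The function name `fit_line` and the sample data are hypothetical; the `fit_intercept` argument mirrors the hyperparameter of the same name.

```python
def fit_line(xs, ys, fit_intercept=True):
    """Return (slope, intercept) of the least-squares line through the points."""
    if not fit_intercept:
        # Line forced through the origin.
        return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs), 0.0
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)  # 2.0 1.0
```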

Hyper Parameters:

Parameter	Description	Data Type	Possible Values	Default Values
fit_intercept	Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations.	bool	True or False	True
normalize	This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm.	bool	True or False	False

Random-Forest Regression

The random forest is a classification and regression algorithm consisting of many decision trees. It uses bagging and feature randomness when building individual trees to try to create an uncorrelated forest of trees whose prediction by committee is more accurate than that of any individual tree.
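
The bagging part of this idea can be sketched in plain Python: each tree is trained on a bootstrap sample of the data and the forest averages their predictions. This is a simplified illustration with hypothetical helper names; each tree is a one-split stump on a single feature, so the feature randomness controlled by max_features is not shown.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-split tree on a 1-D feature by minimising squared error."""
    thresholds = sorted(set(xs))[:-1]
    if not thresholds:  # sample has a single distinct x value
        mean = sum(ys) / len(ys)
        return lambda x: mean
    best = None
    for t in thresholds:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def fit_forest(xs, ys, n_estimators=100, seed=0):
    """Train each stump on a bootstrap sample and average the predictions."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(tree(x) for tree in trees) / len(trees)


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]
forest = fit_forest(xs, ys, n_estimators=50)
print(forest(1), forest(8))
```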

Hyper Parameters:

Parameter	Description	Data Type	Possible Values	Default Values
n_estimators	The number of trees in the forest.	int	[1, 500]	100
criterion	The function to measure the quality of a split. Supported criteria are “mse” for the mean squared error, which is equal to variance reduction as a feature selection criterion, and “mae” for the mean absolute error.	string	{"mse", "mae"}	“mse”
max_depth	The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.	int	(0, +Inf)	None
min_samples_split	The minimum number of samples required to split an internal node	int or float	[2, +Inf) or (0, 1.0]	2
min_samples_leaf	The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches.	int or float	[1, +Inf) or (0, 0.5]	1
min_weight_fraction_leaf	The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.	float	[0, 0.5]	0.0
max_features	The number of features to consider when looking for the best split	int, float or string	(0, n_features] or { “sqrt”, “log2”}, None	None
max_leaf_nodes	Grow trees with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity.	int	(1, +Inf)	None
min_impurity_decrease	A node will be split if this split induces a decrease of the impurity greater than or equal to this value.	float	[0, +Inf)	0.0
bootstrap	Whether bootstrap samples are used when building trees. If False, the whole dataset is used to build each tree.	bool	True or False	True
oob_score (out of bag score)	Whether to use out-of-bag samples to estimate the generalization score. Only available if bootstrap=True.	bool	True or False	False
warm_start	When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new forest.	bool	True or False	False

Ridge Regression

Hyper Parameters:

Parameter	Description	Data Type	Possible Values	Default Values
alpha	Constant that multiplies the L2 term, controlling regularization strength.	float	(0, +Inf)	1.0
fit_intercept	Whether to fit the intercept for this model. If set to false, no intercept will be used in calculations.	bool	True or False	True
normalize	This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm.	bool	True or False	False
tol (tolerance)	Precision of the solution.	float	[0.0, +Inf)	1e-4
solver	Solver to use in the computational routines. The values are described in the note below the table.	string	{‘auto’, ‘svd’, ‘cholesky’, ‘lsqr’, ‘sparse_cg’, ‘sag’, ‘saga’}	’auto’

Note: Values of the solver are:

‘auto’ chooses the solver automatically based on the type of data.
‘svd’ uses a Singular Value Decomposition of X to compute the Ridge coefficients. It is the most stable solver, in particular more stable for singular matrices than ‘cholesky’ at the cost of being slower.
‘cholesky’ uses the standard scipy.linalg.solve function to obtain a closed-form solution.
‘sparse_cg’ uses the conjugate gradient solver as found in scipy.sparse.linalg.cg. As an iterative algorithm, this solver is more appropriate than ‘cholesky’ for large-scale data (possibility to set tol and max_iter).
‘lsqr’ uses the dedicated regularized least-squares routine scipy.sparse.linalg.lsqr. It is the fastest and uses an iterative procedure.
‘sag’ uses a Stochastic Average Gradient descent, and ‘saga’ uses its improved, unbiased version named SAGA. Both methods also use an iterative procedure, and are often faster than other solvers when both n_samples and n_features are large. Note that ‘sag’ and ‘saga’ fast convergence is only guaranteed on features with approximately the same scale. You can preprocess the data with a scaler from sklearn.preprocessing.
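
For intuition about what these solvers compute, the closed-form ridge solution can be written out for a single feature without an intercept. This is a sketch under those simplifying assumptions, with a hypothetical function name; it shows how a larger alpha shrinks the coefficient.

```python
def ridge_coefficient(xs, ys, alpha=1.0):
    """Minimise sum((y - w*x)^2) + alpha * w^2 for one feature, no intercept."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)


xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
print(ridge_coefficient(xs, ys, alpha=0.0))   # no regularization -> 2.0
print(ridge_coefficient(xs, ys, alpha=14.0))  # stronger regularization -> 1.0
```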

SVM Regression

Hyper Parameters:

Parameter	Description	Data Type	Possible Values	Default Values
C	Regularization parameter. The strength of the regularization is inversely proportional to C. Must be strictly positive.	float	(0.0, +Inf)	1.0
kernel	Specifies the kernel type to be used in the algorithm. If none is given, rbf will be used. If a callable is given it is used to precompute the kernel matrix.	string	{‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’}	’rbf’
degree	Degree of the polynomial kernel function (‘poly’).	int	[0, +Inf)	3
gamma	Kernel coefficient for ‘rbf’, ‘poly’ and ‘sigmoid’.	string or float	{‘scale’, ‘auto’} or (0.0, +Inf)	’scale’
coef0	Independent term in kernel function. It is only significant in ‘poly’ and ‘sigmoid’.	float	(-Inf, +Inf)	0.0
shrinking	Whether to use the shrinking heuristic.	bool	True or False	True
tol (tolerance)	Tolerance for stopping criterion.	float	[0.0, +Inf)	1e-3
epsilon	Epsilon in the epsilon-SVM model. It specifies the epsilon-tube within which no penalty is associated in the training loss function with points predicted within a distance epsilon from the actual value.	float	[0, +Inf)	0.1
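
The role of epsilon can be illustrated with the epsilon-insensitive loss, which ignores errors that fall inside the tube. The function name below is a hypothetical helper written for this illustration.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Zero inside the epsilon-tube, linear in the error outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


print(epsilon_insensitive_loss(1.0, 1.05))  # inside the tube -> 0.0
print(epsilon_insensitive_loss(1.0, 1.5))   # outside the tube, roughly 0.4
```

A larger epsilon widens the tube, so fewer training points contribute to the loss.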

XGB Regression

Hyper Parameters:

Parameter	Description	Data Type	Possible Values	Default Values
booster	Decides which booster to use.	string	{‘gbtree’, ‘gblinear’, ‘dart’}	’gbtree’
learning_rate	Step size shrinkage used in updates to prevent overfitting. After each boosting step, the weights of new features are obtained directly, and learning_rate (eta) shrinks these weights to make the boosting process more conservative.	float	[0,1]	0.1
n_estimators (number of estimators)	Number of trees to fit.	int	[1, 500]	100
objective	Specifies the learning task and the corresponding learning objective.	string	Listed below the table.	"reg:linear"
subsample	Fraction of the training samples used to grow each tree.	float	(0,1]	1
max_depth	Maximum depth of a tree.	int	(0, +Inf)	3
max_delta_step	If the value is set to 0, there is no constraint. If it is set to a positive value, it can help make the update step more conservative. Usually this parameter is not needed, but it might help in logistic regression when the classes are extremely imbalanced.	int or float	[0, +Inf)	0
colsample_bytree (column sample by tree)	Subsample ratio of columns when constructing each tree.	float	(0, 1]	1.0
colsample_bylevel (column sample by level)	It is the subsample ratio of columns for each level. Subsampling occurs once for every new depth level reached in a tree. Columns are subsampled from the set of columns chosen for the current tree.	float	(0, 1]	1.0
min_child_weight	Minimum sum of instance weight (Hessian) needed in a child.	int	[0, +Inf)	1
reg_alpha (alpha)	L1 regularization term on weights.	float	[0.0, +Inf)	0.0
reg_lambda (lambda)	L2 regularization term on weights.	float	[0.0, +Inf)	0.0
scale_pos_weight (scale positive weight)	Control the balance of positive and negative weights, useful for unbalanced classes.	int	[0, +Inf)	1

Possible values for the “objective” parameter:

{“rank:pairwise”, “reg:tweedie”, “reg:gamma”, “reg:linear”, “count:poisson”}

Last Updated 2023-10-09 18:18:15 +0530 IST
