Once an ML pipeline executes successfully, a corresponding ML model is created. The model view can be used to understand the model's internal metrics.

The Models page lists the models created, along with the status of each model.

Model Metrics

QuickML users can view the model metrics for each version, which provide valuable insights into the performance of the machine learning models. These metrics serve as essential indicators to assess the accuracy and effectiveness of the model in making predictions.
QuickML users have access to the following metrics.

Confusion matrix

In machine learning, a confusion matrix is used to measure the performance of a classification model. In simple terms, a confusion matrix is a summary of the number of correct and incorrect predictions made by the machine learning model. The matrix displays the number of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) as shown below.

TP: True Positive is the count of instances where both predicted and actual values are positive.

TN: True Negative is the count of instances where both predicted and actual values are negative.

FP: False Positive is the count of instances that the model predicted as positive but whose actual values are negative.

FN: False Negative is the count of instances that the model predicted as negative but whose actual values are positive.
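
The four counts defined above can be tallied directly from paired actual and predicted labels. The following is a minimal Python sketch, not QuickML's implementation; the function name `confusion_counts` and the use of 1 for the positive class are assumptions made for illustration.

```python
from collections import Counter


def confusion_counts(actual, predicted, positive=1):
    """Count TP, TN, FP and FN for a binary classifier.

    `positive` is the label treated as the positive class (assumed to be 1).
    """
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            # Predicted positive: correct (TP) or a false alarm (FP).
            counts["TP" if a == positive else "FP"] += 1
        else:
            # Predicted negative: a miss (FN) or correct (TN).
            counts["FN" if a == positive else "TN"] += 1
    return {key: counts[key] for key in ("TP", "TN", "FP", "FN")}


# Hypothetical toy labels, only to exercise the function.
actual = [1, 1, 0, 0, 1]
predicted = [1, 0, 0, 1, 1]
result = confusion_counts(actual, predicted)
print(result)
```

Each prediction falls into exactly one of the four cells, so the four counts always add up to the number of instances.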


Let’s explain the confusion matrix with a use case: predicting an airline’s passenger satisfaction with its flight service.

Brief explanation: An airline gathers information about its passengers, including their demographics, flight information, and survey responses regarding their satisfaction with the airline’s service. The airline then uses this information to train a machine learning classification model that predicts whether a passenger is satisfied or neutral/dissatisfied.

Let’s evaluate the performance of this classification model using its confusion matrix in QuickML, where “satisfied” is the positive class.

Values read from the confusion matrix:

Total response count: 5,196
True Positive (TP): 2,930
False Negative (FN): 182
False Positive (FP): 35
True Negative (TN): 2,049

Commonly used performance metrics to evaluate any classification model are as follows.

Accuracy score: The proportion of correctly predicted instances among the total instances.

Accuracy = (TP+TN)/Total Responses = (2,930+2,049)/5,196 = 0.958

Precision score: Precision is a measure of how accurate the model’s positive predictions are. It is calculated as the ratio of true positive predictions to the sum of true positive and false positive predictions.

Precision = TP/(TP+FP) = 2,930/(2,930+35) = 0.988

Recall score: Recall score, also known as sensitivity, is the percentage of actual positive cases that a model correctly predicts. It is calculated by dividing the number of true positive predictions by the sum of the true positive and false negative predictions.

In simpler terms, a recall score measures how well a model can identify all of the positive cases in a dataset. A high recall score means that the model is good at finding all of the positive cases, while a low recall score means that the model is missing a lot of positive cases.

Recall score = TP/(TP+FN) = 2,930/(2,930+182) = 2,930/3,112 = 0.941

F1 score: The harmonic mean of precision and recall, providing a balanced assessment of the model’s performance.

F1 Score = 2*Recall*Precision/(Recall+Precision) = 2*0.941*0.988/(0.941+0.988) = 0.9639
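
The four calculations above can be reproduced from the confusion matrix counts. This is a minimal Python sketch of the arithmetic, not QuickML code; the variable names are chosen for illustration only.

```python
# Counts taken from the airline use case above.
tp, fn, fp, tn = 2930, 182, 35, 2049
total = tp + fn + fp + tn  # total response count

accuracy = (tp + tn) / total        # correct predictions over all predictions
precision = tp / (tp + fp)          # how many predicted positives are correct
recall = tp / (tp + fn)             # how many actual positives are found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

for name, value in [
    ("Accuracy", accuracy),
    ("Precision", precision),
    ("Recall", recall),
    ("F1 score", f1),
]:
    print(f"{name}: {value:.4f}")
```

Because the sketch works with unrounded intermediate values, the last digit can differ slightly from the hand calculations above: recall is about 0.9415, and F1 comes out near 0.9643 rather than 0.9639.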

From the above metrics, we can infer a few conclusions about the model.

  1. The airline model correctly predicted the satisfaction level of 4,979 passengers, whereas 182 satisfied passengers were incorrectly predicted as neutral/dissatisfied (false negatives) and 35 neutral/dissatisfied passengers were incorrectly predicted as satisfied (false positives).
  2. It performs well, with an accuracy of 95.8%, a precision of 98.8%, and a recall of 94.1%. However, it missed 182 satisfied passengers. Hence, the model could be fine-tuned to increase the recall score so that it identifies more of the satisfied passengers.

By examining these model metrics in QuickML, we can gain deeper insights into the performance of any machine learning model and make informed decisions on model selection and optimization. This will empower users to fine-tune their models and improve predictive accuracy.

Evaluation metrics

QuickML shows the following evaluation metrics for the classification and regression models created.

  1. Classification


  2. Regression


Cross validation metrics

Cross validation is a method for evaluating the performance of a machine learning model by splitting the training data into k folds, training the model on k-1 folds, and evaluating the model on the remaining fold. This process is repeated k times, and the average performance of the model on the k folds is used to evaluate its overall performance.

In simpler terms, cross validation works by training the model on a subset of the training data and then evaluating its performance on the remaining subset of the training data. This is repeated multiple times, and the average performance of the model on all of the subsets is used to evaluate its overall performance. It helps to ensure that the model is not overfitting the training data and that it will generalize well to new data.
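
The k-fold procedure described above can be sketched in plain Python. This is an illustrative outline under simplifying assumptions, not how QuickML implements it: the folds are contiguous and unshuffled, and the "model" is a hypothetical majority-class predictor used only to keep the example self-contained.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(features, labels, k, train_fn, score_fn):
    """Train on k-1 folds, score on the held-out fold, repeat k times."""
    scores = []
    for held_out in k_fold_indices(len(labels), k):
        held = set(held_out)
        train_idx = [i for i in range(len(labels)) if i not in held]
        model = train_fn([features[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        scores.append(score_fn(model,
                               [features[i] for i in held_out],
                               [labels[i] for i in held_out]))
    return sum(scores) / len(scores), scores


def train_majority(_features, train_labels):
    """Hypothetical toy model: always predict the most common training label."""
    return max(set(train_labels), key=train_labels.count)


def accuracy(model, _features, test_labels):
    """Fraction of held-out labels that match the constant prediction."""
    return sum(1 for y in test_labels if y == model) / len(test_labels)


features = list(range(10))                 # placeholder feature values
labels = [1, 1, 0, 1, 1, 1, 0, 1, 1, 0]    # made-up binary labels
mean_score, fold_scores = cross_validate(features, labels, 5,
                                         train_majority, accuracy)
print(fold_scores, mean_score)
```

Every sample is held out exactly once, so the averaged score reflects performance on data the model did not see during that round of training. In practice the data is usually shuffled or stratified before it is split into folds.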

QuickML provides a wide range of cross validation metrics to track performance for both classification and regression models.

The list of metric types provided in cross validation is below:

  1. Classification Model

    Metrics Types:

    • ROC AUC OVR weighted
    • ROC AUC OVO weighted
    • Balanced accuracy
    • Average precision
    • F1 score
    • F1 macro
    • F1 micro
    • F1 samples
    • F1 weighted
  2. Regression Model

    Metrics Types:

    • Negative mean-squared error
    • Negative mean-squared log error
    • Negative root mean-squared error
    • Negative mean absolute error
    • Negative median absolute error
    • Negative mean poisson deviance
    • Negative mean gamma deviance
    • Negative log loss
    • Negative brier score
    • R2 score
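
The page does not explain the "Negative" prefix on most of these metrics. One plausible reading, assuming QuickML follows the convention used by libraries such as scikit-learn, is that error values are negated so that a higher score always means a better model. The Python sketch below illustrates that convention with made-up numbers; the function and model names are hypothetical.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences (lower is better)."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def negative_mean_squared_error(actual, predicted):
    """Negated MSE, so that a higher value means a better fit."""
    return -mean_squared_error(actual, predicted)


# Made-up targets and predictions from two hypothetical models.
actual = [3.0, 5.0, 2.0]
predictions = {
    "model_a": [2.5, 5.0, 2.0],   # close to the targets
    "model_b": [1.0, 7.0, 4.0],   # off by 2 on every sample
}

scores = {name: negative_mean_squared_error(actual, preds)
          for name, preds in predictions.items()}

# With negated errors, the best model is simply the one with the highest score.
best = max(scores, key=scores.get)
print(scores, best)
```

Under this convention, error-based metrics and metrics such as the R2 score can be compared with the same "higher is better" rule.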

Model Versions

Model versioning is the process of tracking and managing different versions of a machine learning model.

This is important because it allows you to compare different versions of the model, track its performance, and select the best version for deployment. Model versioning can also help you roll back to a previous version of the model if necessary.

Last Updated 2024-04-25 15:56:34 +0530