# Cancer Detection


--------------------------------------------------------------------------------
title: "Introduction"
description: "Create and configure a powerful ML pipeline that can be used to predict the chances of being affected by breast cancer using the Catalyst QuickML service."
last_updated: "2026-03-18T07:41:08.672Z"
source: "https://docs.catalyst.zoho.com/en/tutorials/cancer-detection/introduction/"
service: "All Services"
--------------------------------------------------------------------------------



## Introduction

This tutorial will help you build a machine learning model using {{%link href=&#34;/en/quickml/getting-started/introduction/&#34; %}}Catalyst QuickML{{%/link%}} that analyzes medical data and predicts breast cancer. We will provide you with sample medical datasets that can be used as data sources for the model.

{{%note%}}{{%bold class=&#34;bold-primary&#34;%}}Note:{{%/bold%}} QuickML is currently not available to users accessing from the CA (Canada) data center. If your account was created in the CA DC (accounts.zohocloud.ca/), you will not be able to use this service.{{%/note%}}

In this tutorial, we will first {{%link href=&#34;/en/quickml/help/data-preprocessing/data-cleaning/&#34; %}}preprocess the datasets{{%/link%}} to ensure that the data is clean and ready for training. Next, we will be constructing a {{%link href=&#34;/en/quickml/help/create-data-pipeline/&#34; %}}data pipeline{{%/link%}} to handle data transformation and a {{%link href=&#34;/en/quickml/help/create-ml-pipeline/&#34; %}}ML pipeline{{%/link%}} to train and evaluate the model. Finally, we will create an {{%link href=&#34;/en/quickml/help/pipeline-endpoints/&#34; %}}endpoint{{%/link%}} for the trained model, which allows external applications to interact with the model and receive real-time predictions for breast cancer.

The Cancer Detection ML model is built using the following Catalyst service:

**{{%link href=&#34;/en/quickml/getting-started/introduction/&#34; %}}Catalyst QuickML{{%/link%}}**: Using this service, we will first preprocess the sample datasets by applying {{%link href=&#34;/en/quickml/help/operations-in-quickml/encoding/#ml-operators-for-data-preprocessing&#34; %}}node operations{{%/link%}} to them and constructing the {{%link href=&#34;/en/quickml/help/pipeline-builder-interface/walkthrough/#pipeline&#34; %}}data pipeline{{%/link%}}. This preprocessed data will be used to create an ML model by executing {{%link href=&#34;/en/quickml/help/ml-algorithms/classification-algorithms/&#34; %}}ML algorithms{{%/link%}}. Finally, the Cancer Detection ML model can be accessed by external applications using the {{%link href=&#34;/en/quickml/help/pipeline-endpoints/&#34; %}}endpoint URL{{%/link%}} generated in QuickML.

The final output after creating all the required data and ML pipelines in the {{%link href=&#34;https://console.catalyst.zoho.com/baas/index&#34; %}}Catalyst console{{%/link%}} will look like this:


--------------------------------------------------------------------------------
title: "Prerequisites"
description: "Create and configure a powerful ML pipeline that can be used to predict the chances of being affected by breast cancer using the Catalyst QuickML service."
last_updated: "2026-03-18T07:41:08.672Z"
source: "https://docs.catalyst.zoho.com/en/tutorials/cancer-detection/prerequisites/"
service: "All Services"
related:
- Catalyst QuickML (/en/quickml/getting-started/introduction/)

--------------------------------------------------------------------------------


# Prerequisites

Since this tutorial only involves {{%link href=&#34;/en/quickml/getting-started/introduction/&#34; %}}Catalyst QuickML{{%/link%}}, we will be working entirely in the {{%link href=&#34;https://console.catalyst.zoho.com/baas/index&#34; %}}Catalyst console{{%/link%}} to build data and {{%link href=&#34;/en/quickml/help/create-ml-pipeline/&#34; %}}ML pipelines{{%/link%}}, create ML models, and train the models to predict outcomes. Before you begin working on this tutorial, download the following datasets:

* **{{%link href=&#34;https://workdrive.zohoexternal.com/external/4973ea52565b9669680b95565145022d3b49c673976027b3120b6f7a3443a9d9&#34; %}}Cancer_detection_A{{%/link%}}**&lt;br&gt;
* **{{%link href=&#34;https://workdrive.zohoexternal.com/external/d68a54ed9e663eb654b0d23d09e32c3352309f5b23b2624fdbe6b1ec87ebe8dd&#34; %}}Cancer_detection_B{{%/link%}}**


This tutorial aims to implement cleaning, refining and pre-processing operations on the datasets and then use them to train ML models. We will be uploading these datasets to Catalyst QuickML in the later sections of this tutorial. 


--------------------------------------------------------------------------------
title: "Create a project"
description: "Create and configure a powerful ML pipeline that can be used to predict the chances of being affected by breast cancer using the Catalyst QuickML service."
last_updated: "2026-03-18T07:41:08.672Z"
source: "https://docs.catalyst.zoho.com/en/tutorials/cancer-detection/create-a-project/"
service: "All Services"
related:
- Catalyst Projects (/en/getting-started/catalyst-projects)

--------------------------------------------------------------------------------


# Create a Project

Let&#39;s {{%link href=&#34;/en/getting-started/catalyst-projects&#34; %}}create a Catalyst project{{%/link%}} from the Catalyst console.

1. Log in to the {{%link href=&#34;https://console.catalyst.zoho.com/baas/index&#34; %}}Catalyst console{{%/link%}}, and click {{%badge%}}Create new Project{{%/badge%}}.


2. Enter the project’s name as “**CancerDetection**” in the pop-up window that appears.


3. Click {{%bold%}}Create{{%/bold%}}. Your project will be created and opened.


--------------------------------------------------------------------------------
title: "Upload the dataset"
description: "Create and configure a powerful ML pipeline that can be used to predict the chances of being affected by breast cancer using the Catalyst QuickML service."
last_updated: "2026-03-18T07:41:08.672Z"
source: "https://docs.catalyst.zoho.com/en/tutorials/cancer-detection/upload-dataset/"
service: "All Services"
related:
- Data Connectors (/en/quickml/help/data-connectors/zoho-apps)

--------------------------------------------------------------------------------


# Upload Dataset

Let&#39;s begin by uploading the datasets in Catalyst QuickML using the available dataset {{%link href=&#34;/en/quickml/help/data-connectors/zoho-apps&#34; %}}connectors{{%/link%}}:

1. Navigate to the QuickML service in the Catalyst console and click {{%bold%}}Start Exploring{{%/bold%}}.
 &lt;br /&gt;

2. Navigate to the {{%bold%}}Datasets{{%/bold%}} component and click {{%bold%}}Import Dataset{{%/bold%}}.
 &lt;br /&gt;

3. An Import Dataset pop-up will be displayed. In the {{%bold%}}Data Sources{{%/bold%}} step, navigate to {{%bold%}}File Upload{{%/bold%}} and click {{%bold%}}Upload File{{%/bold%}}.
 &lt;br /&gt;

4. Upload the {{%bold%}}Cancer_detection_A{{%/bold%}} dataset that you have downloaded already and click {{%bold%}}Next{{%/bold%}}.
 &lt;br /&gt;

5. The name of the dataset will be auto-populated based on the uploaded file. You can edit it, if required, then click {{%bold%}}Upload{{%/bold%}}.
 &lt;br /&gt;

The dataset will be uploaded successfully.


Now, you can proceed to upload the other dataset, {{%bold%}}Cancer_detection_B{{%/bold%}}, by repeating the steps mentioned above.

Since the datasets used in this tutorial are specific to the health and medical domain, we have included explanations of some of the terms used:

* **Patient_id**: ID of the Patient
* **Patient_name**: Name of the Patient
* **Diagnosis**: (M = Malignant / B = Benign)

The dataset also includes detailed information regarding the features of the breast mass, such as:

* **radius (mean distance from center to points on the perimeter)**
* **texture (standard deviation of gray-scale values)**
* **perimeter**
* **area**
* **smoothness (local variation in radius lengths)**
* **compactness (perimeter^2 / area - 1.0)**
* **concavity (severity of concave portions of the contour)**
* **concave points (number of concave portions of the contour)**
* **symmetry**
* **fractal dimension (coastline_approximation - 1)**

The mean, standard error and worst or largest (mean of the three largest values) of these features were computed for each image, resulting in 30 features in total including the mean radius, radius SE and worst radius.
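To make the count of 30 concrete, here is a minimal sketch that combines the ten measurements with the three statistics. The underscore-joined names (for example `radius_mean`) follow the column names mentioned later in this tutorial, but the exact spelling of every column in the sample files is an assumption.

```python
# Ten base measurements listed above. The underscore spelling is an assumption.
BASE_FEATURES = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave_points", "symmetry",
    "fractal_dimension",
]

# Three statistics computed for each measurement: mean, standard error, worst.
STATISTICS = ["mean", "se", "worst"]


def feature_columns():
    """Return every <measurement>_<statistic> combination."""
    return [f"{feature}_{stat}" for stat in STATISTICS for feature in BASE_FEATURES]


columns = feature_columns()
print(len(columns))   # 30
print(columns[:3])    # ['radius_mean', 'texture_mean', 'perimeter_mean']
```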


--------------------------------------------------------------------------------
title: "Create a data pipeline"
description: "Create and configure a powerful ML pipeline that can be used to predict the chances of being affected by breast cancer using the Catalyst QuickML service."
last_updated: "2026-03-18T07:41:08.673Z"
source: "https://docs.catalyst.zoho.com/en/tutorials/cancer-detection/create-data-pipeline/"
service: "All Services"
related:
- Data Cleaning (/en/quickml/help/data-preprocessing/data-cleaning)
- Data Transformation (/en/quickml/help/data-preprocessing/data-transformation)
- Data Profiler and Viewer (/en/quickml/help/data-profiler-and-viewer/)

--------------------------------------------------------------------------------


# Create Data Pipeline

Now that we have uploaded the required datasets, we will proceed with creating a {{%link href=&#34;/en/quickml/help/pipeline-builder-interface/walkthrough/#pipeline&#34; %}}data pipeline{{%/link%}} for them.

1. Navigate to the {{%bold%}}Datasets{{%/bold%}} component in the left menu and click the {{%bold%}}Cancer_detection_A{{%/bold%}} dataset.
   &lt;br /&gt;

2. The data pipeline details page will be displayed. Click {{%bold%}}Create Pipeline{{%/bold%}}.
   &lt;br /&gt;

3. Provide the name of the pipeline as &#34;**Pipeline_A**&#34; and click {{%bold%}}Create Pipeline{{%/bold%}}.
   &lt;br /&gt;

  The {{%link href=&#34;/en/quickml/help/pipeline-builder-interface/walkthrough/#pipeline-builder-interface-1&#34; %}}Pipeline Builder interface{{%/link%}} will be opened, as shown in the screenshot below.

  
  We will be performing the following set of data preprocessing operations in order to clean, refine, and transform the datasets, then execute the data pipeline. Each of these operations involves individual {{%link href=&#34;/en/quickml/help/data-preprocessing/data-cleaning/&#34; %}}data nodes{{%/link%}} that are used to construct the pipeline.

### Combine Datasets

Since we have two datasets, we first need to merge them before the training process. Follow the steps below to merge the two datasets:

1. In the **Operations** menu, expand the **Data Extraction** component. Drag and drop the **Add Dataset** node into the Pipeline Builder, as shown in the screenshot below. You can give the node a custom name in the {{%badge%}}Custom Name{{%/badge%}} section. Here, we have named it **Cancer dataset 2**.
     &lt;br /&gt;

2. We will now configure the details of the node in the Add Dataset section on the right panel. In our case, we need to merge the **Cancer_detection_B** dataset with the **Cancer_detection_A** dataset. Choose **Cancer_detection_B** from the **Select Dataset** dropdown and click **Save**.

3. Expand the **Data Transformation** component, then drag and drop the {{%link href=&#34;/en/quickml/help/data-preprocessing/data-transformation/#union&#34; %}}Union{{%/link%}} node into the Pipeline Builder. Connect the two nodes to the **Union** node by joining their links, as shown in the screenshot below.
   &lt;br /&gt;

4. In the **Union** section on the right panel, choose to **Drop Duplicate Records** and click **Save**.
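The effect of the Union node with **Drop Duplicate Records** can be sketched outside QuickML in plain Python. This is only an illustration of the idea, not how QuickML implements it; the function name and the sample records are made up for the example.

```python
def union_drop_duplicates(rows_a, rows_b):
    """Stack two lists of records and keep the first copy of each identical record."""
    seen = set()
    merged = []
    for row in rows_a + rows_b:
        # An order-independent fingerprint of the record's column/value pairs.
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            merged.append(row)
    return merged


# Made-up sample records standing in for the two uploaded datasets.
dataset_a = [
    {"patient_id": 1, "diagnosis": "M"},
    {"patient_id": 2, "diagnosis": "B"},
]
dataset_b = [
    {"patient_id": 2, "diagnosis": "B"},  # same record as in dataset_a
    {"patient_id": 3, "diagnosis": "M"},
]

merged = union_drop_duplicates(dataset_a, dataset_b)
print(len(merged))  # 3
```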

### Select Fields for Model Training

After combining the datasets, we need to select the fields in the merged dataset that will be used for training.

1. Expand the **Data Cleaning** component in the Operations menu. Drag and drop the **{{%link href=&#34;/en/quickml/help/data-preprocessing/data-cleaning#select-or-drop&#34; %}}Select/Drop{{%/link%}}** node in the Pipeline Builder and make a connection with the **Union** node.
   &lt;br /&gt;

2. In the **Select/Drop** section on the right panel, select the “**patient_id**”, “**patient_name**” and “**_c33**” columns, choose the operation &#34;**Drop**&#34; to drop the columns from the merged dataset, then click **Save**. In our case, these columns are generic and contribute nothing to training, so we are removing them.
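As a rough illustration of what the **Drop** operation does to each record, the sketch below removes the three columns from a list of dictionaries. The helper name and the sample values are hypothetical.

```python
# Columns removed in this tutorial because they carry no predictive signal.
COLUMNS_TO_DROP = {"patient_id", "patient_name", "_c33"}


def drop_columns(rows, columns):
    """Return copies of the records without the given columns."""
    return [
        {name: value for name, value in row.items() if name not in columns}
        for row in rows
    ]


# A made-up record for illustration only.
rows = [
    {
        "patient_id": 101,
        "patient_name": "Sample Patient",
        "_c33": None,
        "diagnosis": "M",
        "radius_mean": "17.99",
    }
]

trimmed = drop_columns(rows, COLUMNS_TO_DROP)
print(trimmed)  # [{'diagnosis': 'M', 'radius_mean': '17.99'}]
```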

### Data Type Conversion

For the mismatched datatypes in the columns of the datasets, we will be using the {{%link href=&#34;/en/quickml/help/data-preprocessing/data-transformation/#type-conversion&#34; %}}Type conversion node{{%/link%}} to convert the data into the appropriate types. You can view the columns and their datatypes in the **Preview** tab of the Pipeline details page.

In our dataset, the “**texture_mean**”, &#34;**radius_mean**&#34; and &#34;**perimeter_mean**&#34; columns contain decimal values, but they are stored as type **String**. Follow the steps below to carry out the conversion:

1. Expand the **Data Transformation** component in the **Operations** menu. Drag and drop the **Type Conversion** node into the Pipeline Builder and make a connection with the **Select/Drop** node, as shown in the screenshot below.
   &lt;br /&gt;

2. In the [Type Conversion](/en/quickml/help/data-preprocessing/data-transformation/#type-conversion) section on the right panel, choose **texture_mean** as the column and select **Decimal(16)** as the **Convert To Type** input from the drop-down menu. Choose between **Throw** and **Nullify** to decide how conversion errors are handled. Then click the {{%badge%}}&#34;&#43; Add&#34;{{%/badge%}} button and convert &#34;**radius_mean**&#34; and &#34;**perimeter_mean**&#34; from **Text** to **Decimal(16)** in the same way.
   &lt;br /&gt;

3. Click {{%badge%}}Save{{%/badge%}}.
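The conversion and the two error-handling choices can be pictured with the sketch below. It assumes that **Throw** stops on the first value that cannot be converted and that **Nullify** replaces such a value with an empty one; that reading, the function name, and the use of Python's `float` as a stand-in for **Decimal(16)** are all assumptions made for illustration.

```python
def convert_to_decimal(rows, column, on_error="throw"):
    """Convert a text column to numbers, raising or nullifying on bad values."""
    if on_error not in ("throw", "nullify"):
        raise ValueError("on_error must be 'throw' or 'nullify'")
    converted = []
    for row in rows:
        new_row = dict(row)
        try:
            new_row[column] = float(row[column])
        except (TypeError, ValueError):
            if on_error == "throw":
                raise ValueError(
                    f"cannot convert {row[column]!r} in column {column!r}"
                )
            new_row[column] = None  # nullify: keep the record, blank the value
        converted.append(new_row)
    return converted


# Made-up values: one valid decimal stored as text and one unparsable entry.
rows = [{"texture_mean": "10.38"}, {"texture_mean": "n/a"}]
print(convert_to_decimal(rows, "texture_mean", on_error="nullify"))
# [{'texture_mean': 10.38}, {'texture_mean': None}]
```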

### Handle Missing Values

As a part of data preprocessing, we will need to check if there are missing values in any of the columns in the datasets and fill them. We will be using the Fill Columns node for executing this operation.

1. Expand the **Data Cleaning** component in the Operations menu. Drag and drop the **Fill Columns** node into the Pipeline Builder and make a connection with the **Type Conversion** node, as shown in the screenshot below.
   &lt;br /&gt;

2. Enable {{%badge%}}Show only the columns with missing values{{%/badge%}} to list only the columns that have empty records. Select the columns to fill, in this case “**concavity_se**” and “**area_worst**”, choose &#34;**Mean**&#34; as the **Fill with** input, and click {{%badge%}}Save{{%/badge%}}. This fills the empty values in each selected column with the mean value of that column.
   &lt;br /&gt;
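Filling with the mean can be sketched as follows: compute the average of the values that are present, then substitute it wherever a value is missing. The function name and sample numbers are made up, and missing values are represented as `None` for the purpose of the example.

```python
from statistics import mean


def fill_with_mean(rows, column):
    """Replace missing (None) values in a column with the mean of the present ones."""
    present = [row[column] for row in rows if row[column] is not None]
    column_mean = mean(present)
    return [
        {**row, column: column_mean if row[column] is None else row[column]}
        for row in rows
    ]


# Made-up values with one missing entry.
rows = [{"area_worst": 2000.0}, {"area_worst": None}, {"area_worst": 1000.0}]
filled = fill_with_mean(rows, "area_worst")
print(filled[1]["area_worst"])  # 1500.0
```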

  Now, we have configured the required nodes for this tutorial. Finally, make a connection between the last configured node (i.e. **Fill Columns**) and the **Destination** node.

Click {{%badge%}}Execute{{%/badge%}}. 


The data pipeline will start execution and the status of the execution will be displayed on the pipeline details page as shown in the screenshot below. Once the pipeline has completed execution, the execution status will indicate &#34;**Success**&#34;.


Click {{%badge%}}Execution Stats{{%/badge%}} to view more details about each stage of the execution.


We have now prepared our dataset to develop the ML model. We will be discussing more about the ML pipeline creation in the next section.

{{%note%}}{{%bold%}}Note :{{%/bold%}} The data pipeline can be reused to create multiple ML experiments for varied use cases within your Catalyst project.{{%/note%}}


--------------------------------------------------------------------------------
title: "Create an ML Pipeline"
description: "Create and configure a powerful ML pipeline that can be used to predict the chances of being affected by breast cancer using the Catalyst QuickML service."
last_updated: "2026-03-18T07:41:08.673Z"
source: "https://docs.catalyst.zoho.com/en/tutorials/cancer-detection/create-ml-pipeline/"
service: "All Services"
related:
- ML Algorithms in QuickML (/en/quickml/help/ml-algorithms/classification-algorithms)
- Operations in QuickML (/en/quickml/help/operations-in-quickml/encoding)

--------------------------------------------------------------------------------


# Create ML Pipeline

In this section, we will build a prediction ML model using the datasets preprocessed in the previous section. The datasets will be the input to the {{%link href=&#34;/en/quickml/help/pipeline-builder-interface/walkthrough/&#34; %}}ML Pipeline Builder{{%/link%}}, which enables you to define the model&#39;s architecture and select a target column for prediction.

To create an ML pipeline:

1. Navigate to the **Pipelines** component in the left menu and click **Create Pipeline**.
   &lt;br /&gt;

2. In the pop-up that appears, select **Prediction** as the pipeline type, provide the name of the pipeline as &#34;**Pipeline_B**&#34;, and choose **Cancer_detection_A** as the input dataset. In our case, the target column should be &#34;**diagnosis**&#34;. The model name will be auto-populated based on the pipeline name. Click **Create Pipeline**.
  
  The {{%badge%}}Retrain model when the dataset is updated{{%/badge%}} option retrains the pipeline every time the dataset is updated; see the {{%link href=&#34;/en/quickml/help/periodic-sync/&#34; %}}periodic-sync{{%/link%}} document for details. The {{%badge%}}Create an Auto-generated pipeline using AutoML{{%/badge%}} option creates an ML pipeline automatically; see the {{%link href=&#34;/en/quickml/help/periodic-sync/&#34; %}}AutoML pipeline{{%/link%}} document for reference.

3. The pipeline details page will be displayed, as shown in the screenshot below.
  

  Now that we have created our ML pipeline, we will proceed to configure the pipeline by defining the nodes in the ML Pipeline Builder interface.

### Data Type Conversion

Since our target column &#34;**diagnosis**&#34; contains categorical data of type **String**, we will encode it as numeric values so that it can be used for ML training.

1. In the **Operations** menu, navigate to **ML operations-&gt;Encoding-&gt;Label Encoder**. Drag and drop the {{%link href=&#34;/en/quickml/help/operations-in-quickml/encoding/#label-encoding&#34; %}}Label Encoder node{{%/link%}} to the ML Pipeline Builder Interface. Label encoding can only be applied to the target column. Hence, it is executed automatically.


This operation will convert the column values of type **String** to **Integer**, while maintaining the order and preserving data accuracy.
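A minimal sketch of label encoding is shown below. It assigns integers to the distinct labels in sorted order, which is how common label encoders such as scikit-learn's `LabelEncoder` behave; whether QuickML assigns exactly these integers is an assumption.

```python
def fit_label_mapping(labels):
    """Map each distinct label to an integer, in sorted label order."""
    return {label: index for index, label in enumerate(sorted(set(labels)))}


# Made-up target values: M = Malignant, B = Benign.
diagnosis = ["M", "B", "B", "M"]

mapping = fit_label_mapping(diagnosis)
encoded = [mapping[label] for label in diagnosis]

print(mapping)  # {'B': 0, 'M': 1}
print(encoded)  # [1, 0, 0, 1]
```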

### Hyperparameter Tuning

Every ML model must be trained using an ML algorithm. In this tutorial, we will use the {{%link href=&#34;/en/quickml/help/zia-features-and-ml-algorithm/ml-algorithms/classification-algorithms/#logistic-regression&#34; %}}logistic regression classification algorithm{{%/link%}} and configure its tuning parameters so that the model is optimized for our preprocessed dataset.

1. In the Operations menu, expand **ML operations-&gt;Algorithm-&gt;Classification-&gt;Logistic Regression**. Drag and drop the **Logistic Regression** node into the Pipeline Builder. The node will be connected to the **Destination** node automatically. Make a connection between the **Label Encoder** node and the **Logistic Regression** node.


2. For the **Logistic Regression** node, we will go with the default configuration and click **Save**.
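For readers who want to see what a logistic regression model does conceptually, here is a small from-scratch sketch trained with plain gradient descent on toy data. It is not QuickML's implementation, and the learning rate, epoch count, and data are arbitrary choices made for the illustration.

```python
import math


def sigmoid(z):
    """Squash a real number into the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(weights, bias, x):
    """Estimated probability that record x belongs to class 1."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_logistic_regression(features, labels, learning_rate=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_samples = len(features)
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            error = predict_probability(weights, bias, x) - y
            for j in range(n_features):
                grad_w[j] += error * x[j]
            grad_b += error
        weights = [w - learning_rate * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias


# Toy data: one scaled feature, where larger values are labeled 1 (malignant).
features = [[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]]
labels = [0, 0, 0, 1, 1, 1]

weights, bias = train_logistic_regression(features, labels)
print(predict_probability(weights, bias, [1.5]) > 0.5)   # True
print(predict_probability(weights, bias, [-1.5]) > 0.5)  # False
```

The probability returned by `predict_probability` plays the same role as a confidence score: values near 1 or 0 indicate a confident prediction, and values near 0.5 indicate an uncertain one.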


  We have now completed making the required node connections and configurations. We can proceed to execute the pipeline by clicking on **Execute** for further evaluation and deployment.

  
  Click **Execution Stats** to view more details about each stage of the execution.

  
  Upon successful execution of the ML Pipeline, the prediction model is created and will be displayed under the **Models** section. 

  You can view the details of the model in the Models details page by clicking on the model name.

  
  Additionally, the accuracy of the generated model can be evaluated and viewed in the **Metrics** section of the Models details page. This provides valuable insights on the performance and effectiveness of the model in making predictions on the data.
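As a reminder of how common classification metrics are derived, the sketch below computes accuracy, precision, and recall from actual and predicted labels. Which metrics QuickML displays is not covered here; the function name and the sample labels are made up for illustration.

```python
def classification_metrics(actual, predicted, positive=1):
    """Compute accuracy, precision, and recall for a binary classifier."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up labels: 1 = malignant, 0 = benign.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

metrics = classification_metrics(actual, predicted)
print(metrics)  # {'accuracy': 0.75, 'precision': 0.75, 'recall': 0.75}
```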

  
--------------------------------------------------------------------------------
title: "Create an Endpoint"
description: "Create and configure a powerful ML pipeline that can be used to predict the chances of being affected by breast cancer using the Catalyst QuickML service."
last_updated: "2026-03-18T07:41:08.686Z"
source: "https://docs.catalyst.zoho.com/en/tutorials/cancer-detection/create-endpoint/"
service: "All Services"
related:
- Pipeline Endpoints (/en/quickml/help/pipeline-endpoints)

--------------------------------------------------------------------------------


# Create Endpoint

We will now {{%link href=&#34;/en/quickml/help/pipeline-endpoints/&#34; %}}create an endpoint{{%/link%}} for the Cancer Detection ML model so that external applications can interact with the model seamlessly and get predictions.

1. Navigate to the **Endpoints** component in the left menu and click **Create Endpoint**.
 &lt;br/&gt;

2. Provide the name of the endpoint as &#34;**Cancer_Detection**&#34;, select **Pipeline_B model** (the ML model that we created in the previous step) as the model, and click {{%badge%}}Create Endpoint{{%/badge%}}.
 &lt;br/&gt;

3. In the Endpoints details page, you can first test the model by providing a sample request. For the request shown below, the model predicts that the record is malignant (M). In classification models, the **likelihood score** is the estimated probability of the predicted class and indicates how confident the model is in its prediction.

4. Click {{%badge%}}Publish{{%/badge%}} and use the endpoint URL to integrate the ML model with any other applications. 
 &lt;br/&gt;
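Once the endpoint is published, an external application would typically send it an HTTP request containing the feature values of one record. The sketch below only builds such a request with Python's standard library. The URL, the `Authorization` scheme, and the payload field names are all placeholders and assumptions; copy the real URL, headers, and request format from the Endpoints details page in QuickML.

```python
import json
import urllib.request


def build_prediction_request(endpoint_url, access_token, record):
    """Build (but do not send) a JSON POST request for one record."""
    body = json.dumps(record).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Placeholder scheme: use the headers shown for your endpoint.
            "Authorization": f"Bearer {access_token}",
        },
    )


request = build_prediction_request(
    "https://example.com/your-published-endpoint",  # placeholder URL
    "YOUR_ACCESS_TOKEN",                             # placeholder credential
    {"radius_mean": 17.99, "texture_mean": 10.38},   # hypothetical, partial record
)
print(request.get_method())  # POST

# To actually send it once the placeholders are replaced:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```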

{{%note%}}{{%bold%}}Note :{{%/bold%}} You can also check out {{%link href=&#34;/en/quickml/help/pipeline-endpoints/#external-oauth2-authentication&#34; %}}this document{{%/link%}} to implement pipeline authentication to ensure secured access to endpoints, the ML models and datasets.{{%/note%}}