Create a data pipeline

Now that we have uploaded the dataset, we can create a data pipeline for it.

  1. Navigate to the Datasets component in the left menu. There are two ways to create a data pipeline:
  • You can click on the dataset and then click Create Pipeline in the top-right corner of the page. create-pipeline
  • You can click on the pen icon located to the left of the dataset name, as shown in the image below. pen-icon

Here, we are using the Bank_Customers_Sample_Data dataset for preprocessing.

  2. Name the pipeline “Churn_Prediction_Data_Pipeline” and click Create Pipeline. Pipeline Name

The pipeline builder interface will open as shown in the screenshot below. Initial Pipeline

We will perform the following data preprocessing operations to clean, refine, and transform the dataset, and then execute the data pipeline. Each of these operations involves an individual data node that is used to construct the pipeline.

Data preprocessing with QuickML

  1. Select/drop columns

    Selecting or dropping columns is a common data preprocessing step in data analysis and machine learning; which columns to keep depends on the objectives and requirements of your analysis or modelling task. For this dataset, the columns we don’t need for model training are “RowNumber”, “CustomerId”, and “Surname”. Using the Select/Drop node from the Data Cleaning component in QuickML, you can quickly select the fields required for model training. required-field-selection

  2. Filling columns in dataset with values

    Using the Fill Columns node in QuickML, we can easily fill column values based on a given condition, targeting either null or non-null values as required. For the columns “EstimatedSalary” and “Balance”, we replace the empty values with a custom value of “0”. Fill Column

  3. Filter Data

    Filtering a dataset typically involves selecting the subset of rows that meet certain criteria or conditions. Here we use the Filter node from the Data Cleaning section to keep only the rows in which the columns “CreditScore”, “Geography”, “Gender”, “Age”, “Tenure”, and “Exited” have non-empty values. Data Filter

  4. Save and Execute

    Once all the nodes are connected, click the Save button to save the pipeline, then click the Execute button to run it. Completed data pipeline
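For reference, the three preprocessing nodes above correspond roughly to the following pandas sketch. This is only an illustration of the transformations (with a small made-up sample of rows; the column names come from the tutorial's dataset), not the QuickML API itself:

```python
import pandas as pd

# Tiny stand-in for the Bank_Customers_Sample_Data dataset
# (hypothetical rows; column names match the tutorial).
df = pd.DataFrame({
    "RowNumber": [1, 2, 3],
    "CustomerId": [101, 102, 103],
    "Surname": ["Hargrave", "Hill", "Onio"],
    "CreditScore": [619, 608, None],
    "Geography": ["France", "Spain", "France"],
    "Gender": ["Female", "Female", "Female"],
    "Age": [42, 41, 42],
    "Tenure": [2, 1, 8],
    "Balance": [None, 83807.86, 159660.80],
    "EstimatedSalary": [101348.88, None, 113931.57],
    "Exited": [1, 0, 1],
})

# 1. Select/Drop: remove columns not needed for model training.
df = df.drop(columns=["RowNumber", "CustomerId", "Surname"])

# 2. Fill Columns: replace empty values in these columns with 0.
df[["EstimatedSalary", "Balance"]] = df[["EstimatedSalary", "Balance"]].fillna(0)

# 3. Filter: keep rows where all of these columns are non-empty.
required = ["CreditScore", "Geography", "Gender", "Age", "Tenure", "Exited"]
df = df.dropna(subset=required)

print(df.shape)  # the row with a missing CreditScore is filtered out
```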

You’ll be redirected to the page below, which shows the executed pipeline with the execution status. We can see here that the pipeline execution was successful.

Executed data pipeline

Click on Execution Stats to access more details regarding the compute usage, as shown below.

Execution stats for data pipeline

In this section, we’ve seen how to preprocess data using QuickML, which offers a variety of effective ways to get your data ready for building machine learning models. This data pipeline can be reused to create multiple ML experiments for varied use cases within your Catalyst project.

Last Updated 2024-10-10 12:38:19 +0530