# Car Price Prediction

--------------------------------------------------------------------------------
title: "Introduction"
description: "Car price prediction: Create a powerful ML pipeline that can be used to predict the car price using the Catalyst QuickML service."
last_updated: "2026-03-18T07:41:08.674Z"
source: "https://docs.catalyst.zoho.com/en/tutorials/car-price-prediction/introduction/"
service: "All Services"
--------------------------------------------------------------------------------

## Introduction

In this tutorial, we will guide you through building a machine learning model with {{%link href="/en/quickml/getting-started/introduction/" %}}Catalyst QuickML{{%/link%}} to predict car prices.

{{%note%}}{{%bold class="bold-primary"%}}Note:{{%/bold%}} QuickML is currently not available to users accessing from the CA (Canada) data center. If your account was created in the CA DC (accounts.zohocloud.ca/), you will not be able to use this service.{{%/note%}}

We'll first {{%link href="/en/quickml/help/data-preprocessing/data-cleaning/" %}}preprocess the datasets{{%/link%}} to make sure they're clean and ready for training. Next, we'll build a {{%link href="/en/quickml/help/create-data-pipeline/" %}}data pipeline{{%/link%}} to handle data transformation, and an {{%link href="/en/quickml/help/create-ml-pipeline/" %}}ML pipeline{{%/link%}} to train and test the model. Finally, we will create an {{%link href="/en/quickml/help/pipeline-endpoints/" %}}endpoint{{%/link%}} for the trained model so that external apps can interact with it and get car price predictions. We build two pipelines because the data pipeline can be reused to build any number of ML pipelines in the future.
The Car Price Prediction ML model is built using the following Catalyst service:

**{{%link href="/en/quickml/getting-started/introduction/" %}}Catalyst QuickML{{%/link%}}**: Using this service, we will first preprocess the sample datasets by implementing {{%link href="/en/quickml/help/data-preprocessing/data-cleaning/" %}}node operations{{%/link%}} on them and constructing the {{%link href="/en/quickml/help/create-data-pipeline/" %}}data pipeline{{%/link%}}. The preprocessed data will then be used to create an ML model by executing {{%link href="/en/quickml/help/ml-algorithms/classification-algorithms/" %}}ML algorithms{{%/link%}}. Finally, external applications can access the Car Price Prediction ML model through the {{%link href="/en/quickml/help/pipeline-endpoints/" %}}endpoint URL{{%/link%}} generated in QuickML.

The final output, after creating all the required data and ML pipelines in the {{%link href="https://console.catalyst.zoho.com/baas/index" %}}Catalyst console{{%/link%}}, will look like this:

--------------------------------------------------------------------------------
title: "Prerequisites"
description: "Car price prediction: Create a powerful ML pipeline that can be used to predict the car price using the Catalyst QuickML service."
last_updated: "2026-03-18T07:41:08.674Z"
source: "https://docs.catalyst.zoho.com/en/tutorials/car-price-prediction/prerequisites/"
service: "All Services"
related:
  - Machine Learning Algorithms (/en/quickml/help/ml-algorithms/classification-algorithms/)
--------------------------------------------------------------------------------

# Prerequisites

Since this tutorial involves only {{%link href="/en/quickml/getting-started/introduction/" %}}Catalyst QuickML{{%/link%}}, we will work entirely in the {{%link href="https://console.catalyst.zoho.com/baas/index" %}}Catalyst console{{%/link%}} to build data and {{%link href="/en/quickml/help/create-ml-pipeline/" %}}ML pipelines{{%/link%}}, create ML models, and train the models to predict outcomes.

Before you begin working on this tutorial, download the following datasets:

- {{%link href="https://workdrive.zohoexternal.com/external/588c8d9ac39d9ecd355ad14737d93462224c4da61c3ed51ae37d3bd889b3ae6c" %}}Car_Price_1{{%/link%}}
- {{%link href="https://workdrive.zohoexternal.com/external/e2d9526d1b1a0ba701ddadab7d654d8dbfe0696eeb806348987db0ac44caf400" %}}Car_Price_2{{%/link%}}

This tutorial aims to implement cleaning, refining, and preprocessing operations on the datasets, and then use them to train ML models. We will upload the datasets to Catalyst QuickML in the later sections of this tutorial.

--------------------------------------------------------------------------------
title: "Create a project"
description: "Car price prediction: Create a powerful ML pipeline that can be used to predict the car price using the Catalyst QuickML service."
last_updated: "2026-03-18T07:41:08.675Z"
source: "https://docs.catalyst.zoho.com/en/tutorials/car-price-prediction/create-a-project/"
service: "All Services"
related:
  - Catalyst Projects (/en/getting-started/catalyst-projects)
--------------------------------------------------------------------------------

# Create a Project

Let's {{%link href="/en/getting-started/catalyst-projects" %}}create a Catalyst project{{%/link%}} from the Catalyst console.

1. Log in to the {{%link href="https://console.catalyst.zoho.com/baas/index" %}}Catalyst console{{%/link%}}, then click {{%badge%}}Create a new Project{{%/badge%}}. <br />
2. Enter the project’s name as "**CarPricePrediction**" (or any name you prefer) in the pop-up window that appears. <br />
3. Click {{%badge%}}Create{{%/badge%}}. Your project will be created and automatically opened. To access your project later, click the {{%badge%}}Access Project{{%/badge%}} button. <br />

--------------------------------------------------------------------------------
title: "Upload the dataset"
description: "Car price prediction: Create a powerful ML pipeline that can be used to predict the car price using the Catalyst QuickML service."
last_updated: "2026-03-18T07:41:08.683Z"
source: "https://docs.catalyst.zoho.com/en/tutorials/car-price-prediction/upload-dataset/"
service: "All Services"
related:
  - Create Your First pipeline (/en/quickml/help/create-ml-pipeline)
--------------------------------------------------------------------------------

# Upload the Dataset

Let's begin by uploading the datasets to Catalyst QuickML using the available {{%link href="/en/quickml/help/data-connectors/zoho-apps/" %}}dataset connectors{{%/link%}}.

1. Navigate to the QuickML service in the Catalyst console and click {{%badge%}}Start Exploring{{%/badge%}}. <br />
2. Navigate to the {{%badge%}}Datasets{{%/badge%}} component and click {{%badge%}}Import Dataset{{%/badge%}}. <br />
3.
   An Import Dataset pop-up will be displayed. In the **Data Sources** step, navigate to File Upload and click {{%badge%}}Upload File{{%/badge%}}. <br />

   Upload the **Car_Price_1** dataset followed by the **Car_Price_2** dataset that you downloaded earlier. Set the Quotes Type to "**Double Quotes(")**" and the Escape Character to "**Backslash(\)**", then click {{%badge%}}Next{{%/badge%}}. <br />

   The name of the dataset will be auto-populated based on the uploaded file. You can edit it, if required, and click {{%badge%}}Upload{{%/badge%}}. <br />

   The dataset is now uploaded successfully. <br />

The dataset will be displayed in the **All Datasets** section. You can click the dataset name to view the dataset's details. On the dataset **Details** page, you can view the {{%link href="/en/quickml/help/data-profiler-and-viewer/#what-is-data-profiling" %}}profiling, data preview{{%/link%}} and {{%link href="/en/quickml/help/data-visualization/overview/" %}}visualization chart{{%/link%}} of the dataset.

The dataset's details and its profile can be seen in the below screenshot. <br />

--------------------------------------------------------------------------------
title: "Create a data pipeline"
description: "Car price prediction: Create a powerful ML pipeline that can be used to predict the car price using the Catalyst QuickML service."
last_updated: "2026-03-18T07:41:08.684Z"
source: "https://docs.catalyst.zoho.com/en/tutorials/car-price-prediction/create-data-pipeline/"
service: "All Services"
related:
  - Data Cleaning (/en/quickml/help/data-preprocessing/data-cleaning)
  - Data Transformation (/en/quickml/help/data-preprocessing/data-transformation)
  - Data Profiler and Viewer (/en/quickml/help/data-profiler-and-viewer/)
--------------------------------------------------------------------------------

# Create a data pipeline

Now that we have uploaded the datasets, we will create a {{%link href="/en/quickml/help/create-data-pipeline/"%}}data pipeline{{%/link%}} from them.

1. Navigate to the **Datasets** component in the left menu. There are two ways to create a data pipeline:

   - Click the dataset, then click {{%badge%}}Create Pipeline{{%/badge%}} in the top-right corner of the page. <br />
   - Click the pen icon located to the left of the dataset name, as shown in the image below. <br />

   Here, we are selecting the **Car_Price_1** dataset for preprocessing. **Car_Price_2** will be added to this dataset in the upcoming preprocessing steps.

2. Name the pipeline "**Car Price Prediction Data Pipeline**" and click {{%badge%}}Create Pipeline{{%/badge%}}. <br />

   The {{%link href="/en/quickml/help/pipeline-builder-interface/walkthrough/#pipeline-builder-interface-1" %}}pipeline builder interface{{%/link%}} will open as shown in the screenshot below. <br />

We will perform the following data preprocessing operations to clean, refine, and transform the datasets, and then execute the data pipeline. Each of these operations involves individual {{%link href="/en/quickml/help/data-preprocessing/data-cleaning/" %}}data nodes{{%/link%}} that are used to construct the pipeline.

### Data preprocessing with QuickML

1.
   #### Combining two datasets

   With the {{%badge%}}Add Dataset{{%/badge%}} [node](/en/quickml/help/data-preprocessing/data-extraction/#add-dataset) in QuickML, we can add a new dataset to the pipeline (note that you must first upload the dataset you wish to add). Here, we are adding the **Car_Price_2** dataset to merge with the existing dataset. <br />

   Use the {{%badge%}}Union{{%/badge%}} [node](/en/quickml/help/data-preprocessing/data-transformation/#union), found under **Data Transformation > Union** in the drag-and-drop QuickML interface, to combine the two datasets, Car_Price_1 and Car_Price_2, into a single dataset. Tick the box labeled "**Drop Duplicate Records**" while performing the operation so that any duplicate records are removed from the combined dataset. <br />

2. #### Select/drop columns

   Selecting or dropping columns from a dataset is a common data preprocessing step in data analysis and machine learning. Whether to select or drop columns depends on the specific objectives and requirements of your analysis or modelling task.

   The columns we don't need for our model training are "**MPG**", "**Convenience**", "**Exterior**", "**Clean title**", "**Currency**", and "**Name**". In QuickML, you can choose the fields needed for model training using the Select/Drop node from the Data Cleaning section. <br />

3. #### Filter dataset

   Filtering a dataset typically involves selecting the subset of rows that meet certain criteria or conditions. Here, we keep only the rows in which the "**Drivetrain**", "**Fuel Type**", "**Engine**", "**Transmission**", and "**Safety**" columns have **non-empty** values, using the {{%badge%}}Filter{{%/badge%}} [node](/en/quickml/help/data-preprocessing/data-cleaning/#filter) from the Data Cleaning section. <br />

4.
   #### Fill columns in dataset with values

   Using the {{%badge%}}Fill Columns{{%/badge%}} [node](/en/quickml/help/data-preprocessing/data-cleaning/#fill-columns) in QuickML, we can fill column values based on a condition. We can fill either the null values or the non-null values, depending on our requirements. Here, we are filling the "**new&used**" column with the custom value "**Used**" for any entries that are not labeled "**New**". For the columns "**Accidents or damage**", "**1-owner vehicle**", and "**Personal use only**", we are replacing the empty values with the custom value "**Not mentioned**". <br />

5. #### Save and execute

   Now connect the Fill Columns node to the **Destination** node. Once all the nodes are connected, click {{%badge%}}Save{{%/badge%}} to save the pipeline, and then click {{%badge%}}Execute{{%/badge%}} to execute it.

   You will be redirected to a page that shows the executed pipeline with its execution status. <br />

   Click {{%badge%}}Execution Stats{{%/badge%}} to view more details about each stage of the execution. <br />

In this part, we've looked at how to process data using QuickML, which offers a variety of effective ways to get your data ready for building machine learning models. This data pipeline can be reused to create multiple ML experiments for varied use cases within your Catalyst project.

--------------------------------------------------------------------------------
title: "Create an ML Pipeline"
description: "Car price prediction: Create a powerful ML pipeline that can be used to predict the car price using the Catalyst QuickML service."
last_updated: "2026-03-18T07:41:08.685Z"
source: "https://docs.catalyst.zoho.com/en/tutorials/car-price-prediction/create-ml-pipeline/"
service: "All Services"
related:
  - ML Algorithms in QuickML (/en/quickml/help/ml-algorithms/classification-algorithms)
  - Operations in QuickML (/en/quickml/help/operations-in-quickml/encoding)
--------------------------------------------------------------------------------

# Create an ML pipeline

To build the prediction model, we will use the preprocessed dataset in the {{%link href="/en/quickml/help/create-ml-pipeline/"%}}ML Pipeline Builder{{%/link%}}. The initial step in the ML Pipeline Builder is selecting the **target column**, which is the column we are trying to predict.

To create an ML pipeline, first navigate to the **Pipelines** section and click {{%badge%}}Create Pipeline{{%/badge%}}. <br />

In the pop-up that appears, select **Prediction** as the pipeline type, provide a name for the pipeline (here, we used **Car Price Model**), and specify the model name. Then, select the appropriate dataset and the name of the target column. In this case, our target is the column named "**money**". <br />

While selecting the dataset, we need to select the source dataset that we chose for building the data pipeline, because the preprocessed data is reflected in the source dataset. In our case, we will select the Car_Price_1 dataset, as this is the dataset we chose for preprocessing and cleaning.

### Encoding categorical columns

Encoders are used in data preprocessing and machine learning tasks to convert categorical or non-numeric data into a numerical format that machine learning algorithms can work with effectively. Here, we are using {{%link href="/en/quickml/help/operations-in-quickml/encoding/#ordinal-encoder" %}}Ordinal Encoding{{%/link%}} for encoding all the categorical features.
It assigns integers to the categories based on their order, making it possible for machine learning algorithms to capture the ordinal nature of the data. We can use the Ordinal Encoder node from **ML Operations > Encoding > Ordinal Encoder** in QuickML to turn the categorical columns into numerical columns. Here, we are converting **all categorical columns** to numerical format while retaining the columns' original order and data for model training. <br />

### Imputers

Imputers are used in fields such as data analysis, statistics, and machine learning to handle missing or incomplete data. Here, we are using the {{%link href="/en/quickml/help/operations-in-quickml/imputers/#group-by-imputation" %}}Group By{{%/link%}} Imputer from **ML Operations > Imputers > Group-By Imputer** to impute the missing values in the dataset. Group-by imputation is a technique in which missing values are filled based on a grouping or categorization of selected columns. If a particular column can be imputed by considering the values of another set of columns, we can use this technique. Here, the columns with missing values are "**Mileage**", "**Exterior Color**", "**Interior Color**", and "**Seating**", and we group by the columns "**brand**", "**Year**", and "**Model**" to fill in the missing values. <br />

### Feature Engineering

Creating new features (variables) from existing data is referred to as {{%link href="/en/quickml/help/operations-in-quickml/feature-engineering/#feature-generation" %}}Feature Generation{{%/link%}}. These additional features can be used to improve the performance of a machine learning model or to gain more insight into the underlying data. Feature generation is a crucial part of the data preprocessing pipeline: it helps transform raw data into a form more suitable for modelling and extract useful information from it.
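
As a rough illustration of what feature generation means, the sketch below derives two new columns from existing ones. It is a generic, hand-written example and not the algorithm QuickML runs internally. The `reference_year` parameter, the derived names `vehicle_age` and `mileage_per_year`, and the sample values are assumptions made for illustration; only the `brand`, `Year`, and `Mileage` column names come from the tutorial datasets.

```python
def add_generated_features(rows, reference_year):
    """Return copies of the rows with two derived columns added.

    reference_year is a hypothetical parameter (for example, the year the
    data was collected); it is not something QuickML asks for.
    """
    enriched = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows stay untouched
        age = reference_year - row["Year"]
        new_row["vehicle_age"] = age
        # Skip the ratio when mileage is missing or the age is not positive,
        # which also avoids dividing by zero.
        if row.get("Mileage") is not None and age > 0:
            new_row["mileage_per_year"] = row["Mileage"] / age
        else:
            new_row["mileage_per_year"] = None
        enriched.append(new_row)
    return enriched


# Made-up sample rows; only the column names come from the tutorial.
sample_rows = [
    {"brand": "A", "Year": 2019, "Mileage": 50000},
    {"brand": "B", "Year": 2024, "Mileage": 1200},
    {"brand": "C", "Year": 2015, "Mileage": None},
]

for row in add_generated_features(sample_rows, reference_year=2024):
    print(row["brand"], row["vehicle_age"], row["mileage_per_year"])
```

Derived columns like these give the model a signal that is only implicit in the raw columns, which is the general motivation behind feature generation.
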
Here, we have used the "**Autolearn**" feature generation technique, which generates new features from the existing columns. We can select this node from **ML Operations > Feature Engineering > Feature Generation > Autolearn**. <br />

### Selecting algorithm and model fitting

The next step in building the ML pipeline is selecting an appropriate algorithm for training on the preprocessed data. Here, we'll use the {{%link href="/en/quickml/help/ml-algorithms/regression-algorithms/#lgbm-regression" %}}LightGBM Regressor Algorithm{{%/link%}}. LightGBM (Light Gradient Boosting Machine) is a popular gradient boosting framework used for various machine learning tasks, including regression problems. It is known for its efficiency and training speed, which makes it well suited to large datasets.

We can add LightGBM regression in QuickML's ML Pipeline Builder by dragging and dropping the **LightGBM Regression** node from **ML Operations > Algorithm > Regression > LGBM Regression**. To optimize the model for our particular dataset, we can also adjust the tuning parameters. In our case, we can stick with the default settings. When everything is configured, we can save the pipeline for further testing and deployment. <br />

Once we drag and drop the algorithm node, its end node will be automatically connected to the destination node. Once the pipeline is saved, you can execute it by clicking {{%badge%}}Execute{{%/badge%}} at the top-right corner of the pipeline page. This will redirect you to the **execution** page, where you can follow the execution of the pipeline. <br />

Click {{%badge%}}Execution Stats{{%/badge%}} to view more details about each stage of the ML pipeline execution.
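
To make the term "gradient boosting" from this section more concrete, here is a minimal sketch of the idea for regression with squared error: start from the mean price, then repeatedly fit a one-split tree (a stump) to the current residuals and add a scaled-down copy of it to the model. This is a teaching sketch in plain Python, not LightGBM's implementation and not what QuickML runs internally. The function names, the single `mileage` feature, and the sample numbers are all assumptions made for illustration.

```python
from statistics import mean


def fit_stump(xs, residuals):
    """Fit a one-split regression tree to the residuals of a single feature."""
    best = None
    # Try each distinct value (except the largest) as a split threshold,
    # so both sides of the split always contain at least one point.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value, right_value = mean(left), mean(right)
        error = (sum((r - left_value) ** 2 for r in left)
                 + sum((r - right_value) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    if best is None:  # every x is identical, so no split is possible
        constant = mean(residuals)
        return (xs[0], constant, constant)
    return best[1:]


def stump_predict(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_boosted_stumps(xs, ys, rounds=50, learning_rate=0.1):
    """Boosting loop: each stump is fitted to what the model still gets wrong."""
    base = mean(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump_predict(stump, x)
                       for p, x in zip(predictions, xs)]
    return base, learning_rate, stumps


def predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)


def mean_squared_error(ys, predictions):
    return mean((y - p) ** 2 for y, p in zip(ys, predictions))


# Made-up toy data: mileage (thousands) and price (thousands).
mileage = [10, 25, 40, 60, 85, 120]
price = [30, 27, 24, 19, 15, 10]

model = fit_boosted_stumps(mileage, price)
print("baseline MSE:", mean_squared_error(price, [mean(price)] * len(price)))
print("boosted MSE:", mean_squared_error(price, [predict(model, x) for x in mileage]))
```

The `rounds` and `learning_rate` arguments play the same role as the tuning parameters mentioned above: more rounds and a larger learning rate fit the training data more closely, at the risk of overfitting.
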
<br />

The prediction model is created and can be examined under the Model section (click on **Car Price Model model**) following the successful execution of the ML pipeline. <br />

The metrics below offer useful insights into the efficiency and performance of the model when making predictions based on the data. <br />

--------------------------------------------------------------------------------
title: "Create an Endpoint"
description: "Car price prediction: Create a powerful ML pipeline that can be used to predict the car price using the Catalyst QuickML service."
last_updated: "2026-03-18T07:41:08.685Z"
source: "https://docs.catalyst.zoho.com/en/tutorials/car-price-prediction/create-endpoint/"
service: "All Services"
related:
  - Pipeline Endpoints (/en/quickml/help/pipeline-endpoints)
--------------------------------------------------------------------------------

# Create an endpoint

We will now create an endpoint for the above Car Price Prediction model to allow external applications to interact with the model seamlessly and get predictions.

1. Navigate to the **Endpoints** component in the left menu and click {{%badge%}}Create Endpoint{{%/badge%}}. <br />
2. Name the endpoint "**Car Price Prediction**", choose **Car Price Model model** (the ML model that we created in the previous step) as the model, and click {{%badge%}}Create Endpoint{{%/badge%}}. <br />
3. On the endpoint's details page, you can test the model by providing a sample request in the Request column and clicking {{%badge%}}Predict{{%/badge%}}. This will generate the predicted value in the Response column. <br />
4. Click {{%badge%}}Publish{{%/badge%}} and use the endpoint URL to integrate the ML model with any other applications.
<br /> {{%note%}}{{%bold%}}Note:{{%/bold%}} You can also check out {{%link href="/en/quickml/help/pipeline-endpoints/#external-oauth2-authentication" %}}this document{{%/link%}} to implement pipeline authentication and ensure secure access to endpoints, ML models, and datasets.{{%/note%}}
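
Once the endpoint is published, an external application calls it over HTTPS. The sketch below shows one way to assemble such a request with Python's standard library. The endpoint URL, the JSON payload shape, the sample values, and the `Zoho-oauthtoken` authorization header are assumptions for illustration; copy the real URL, request format, and authentication details from the endpoint's details page and the authentication document linked above.

```python
import json
import urllib.request

# Hypothetical placeholders; replace with the values shown for your endpoint.
ENDPOINT_URL = "https://example.com/your-quickml-endpoint"
ACCESS_TOKEN = "your-oauth-access-token"


def build_prediction_request(endpoint_url, access_token, features):
    """Build (but do not send) a JSON POST request for one car record."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumed header scheme; confirm it against your endpoint's settings.
            "Authorization": f"Zoho-oauthtoken {access_token}",
        },
    )


def fetch_prediction(request, timeout=30):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Made-up sample input; the keys reuse column names from the tutorial datasets.
sample_car = {
    "brand": "ExampleBrand",
    "Year": 2019,
    "Model": "ExampleModel",
    "Mileage": 50000,
}

request = build_prediction_request(ENDPOINT_URL, ACCESS_TOKEN, sample_car)
print(request.get_method(), request.full_url)
# Call fetch_prediction(request) only after filling in real endpoint details.
```

Keeping request construction separate from sending makes it easy to inspect the payload before pointing the code at a live endpoint.
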