Create an ML pipeline

To build the prediction model, we will use the preprocessed dataset in the ML Pipeline Builder. The initial step in building the ML Pipeline involves selecting the target column, which is the column that we are trying to predict.

To create an ML pipeline, first navigate to the Pipelines component and click the Create Pipeline option.

In the Create Pipeline pop-up that appears, provide the pipeline name and the model name. We’ll name the pipeline Fraud Detection ML pipeline and the model Fraud Detection ML pipeline Model. Then, select the appropriate dataset and the target column.

Select the same source dataset that was used to build the data pipeline, because the preprocessed data is reflected in that dataset. In our case, we import the Fraud Detection Dataset, which we already preprocessed and cleaned, and our target is the column named is_fraud.

  1. Normalize the columns

    Since the values of the features “amt”, “city_pop”, and “age” fall in very different ranges, we will use the Mean-Std Normalization component to bring them to a common scale. Mean-std normalization subtracts the column mean from each value and divides by the column standard deviation, so each normalized column has a mean of 0 and a standard deviation of 1 (the values are not confined to the range 0 to 1). Navigate to ML operations -> Normalization. Drag and drop the Mean-Std Normalization node onto the ML pipeline builder interface. In the configuration box on the right panel, choose all the columns except “is_fraud” and click Save.
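To make the scaling concrete, here is a minimal sketch of mean-std normalization in plain Python. It is not QuickML's implementation: the function name, the sample values, and the use of the population standard deviation are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def mean_std_normalize(values):
    """Return (x - mean) / std for each value in a numeric column."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        # A constant column has no spread, so map every value to 0.
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]


# Hypothetical sample values for the "amt" column (not taken from the dataset).
amt = [4.97, 107.23, 220.11, 45.00, 41.96]
amt_scaled = mean_std_normalize(amt)

for raw, scaled in zip(amt, amt_scaled):
    print(f"{raw:>8.2f} -> {scaled:+.3f}")
```

Values below the column mean come out negative and values above it come out positive, which is why the result is centered on 0 rather than bounded between 0 and 1.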

  2. Encoding categorical columns

    Encoders are used in various data preprocessing and machine learning tasks to convert categorical or non-numeric data into a numerical format that machine learning algorithms can work with effectively.

    • One-hot encoder

      One-hot encoding is typically applied to categorical columns in a dataset, where each category represents a distinct class or group. It creates a new binary column for each unique category, so it typically increases the dimensionality of the dataset. Some implementations drop one of these columns and keep only the number of unique categories minus one, because the presence of the last category can be inferred from the absence of all the others.

      Here, we use the One-Hot Encoder node to encode the following columns: “category”, “gender”, and “state”. Navigate to ML operations -> Encoding -> One-Hot Encoder in QuickML, and use the node to turn the selected categorical columns into numerical columns.
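The following is a small sketch of what one-hot encoding does to a single column, written in plain Python. The helper name `one_hot_encode`, the `drop_first` flag, and the sample rows are hypothetical; how QuickML names the generated columns, and whether it drops one of them, is not stated here.

```python
def one_hot_encode(rows, column, drop_first=False):
    """Replace a categorical column with one 0/1 column per category."""
    categories = sorted({row[column] for row in rows})
    if drop_first:
        # Keep k - 1 columns; the dropped category is implied by all zeros.
        categories = categories[1:]
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for cat in categories:
            new_row[f"{column}_{cat}"] = 1 if row[column] == cat else 0
        encoded.append(new_row)
    return encoded


# Hypothetical rows (not taken from the dataset).
rows = [
    {"gender": "F", "amt": 4.97},
    {"gender": "M", "amt": 107.23},
    {"gender": "F", "amt": 220.11},
]

full = one_hot_encode(rows, "gender")                      # gender_F, gender_M
reduced = one_hot_encode(rows, "gender", drop_first=True)  # gender_M only
print(full[0])
print(reduced[0])
```

With `drop_first=True`, a row whose remaining indicator columns are all 0 belongs to the dropped category, which is the "unique categories minus one" variant.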


  3. Select the ML algorithm

    The next step in ML pipeline building is selecting the appropriate algorithm for training the preprocessed data. Here we’ll use the XGBoost classification algorithm to train the data.

    XGBoost (Extreme Gradient Boosting) is a popular and powerful machine learning algorithm commonly used for classification tasks. It’s an ensemble learning method that combines the predictions of multiple decision trees to create a strong predictive model. XGBoost is known for its speed, scalability, and ability to handle complex datasets.
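To illustrate the idea of combining many small trees, here is a simplified sketch of gradient boosting with one-split trees (stumps) for binary classification. It is not XGBoost itself and not what QuickML runs: it omits XGBoost's regularization, second-order updates, and deeper trees, and every function name and data value below is hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Find the single split (feature, threshold) that best fits the residuals."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, j, t, lv, rv)
    if best is None:
        raise ValueError("no valid split: every feature is constant")
    return best[1:]


def fit_boosted_stumps(X, y, rounds=20, learning_rate=0.3):
    """Each round fits a stump to the current errors and adds it to the ensemble."""
    p = sum(y) / len(y)
    base = math.log(p / (1 - p))  # start from the log-odds of the positive class
    scores = [base] * len(y)
    stumps = []
    for _ in range(rounds):
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        j, t, lv, rv = fit_stump(X, residuals)
        stumps.append((j, t, lv, rv))
        scores = [s + learning_rate * (lv if row[j] <= t else rv)
                  for row, s in zip(X, scores)]
    return base, learning_rate, stumps


def predict_proba(model, row):
    """Sum the contributions of all stumps and convert to a probability."""
    base, learning_rate, stumps = model
    score = base + sum(learning_rate * (lv if row[j] <= t else rv)
                       for j, t, lv, rv in stumps)
    return sigmoid(score)


# Hypothetical [amt, age] rows and is_fraud labels (not taken from the dataset).
X = [[20, 34], [35, 51], [900, 29], [15, 62], [1200, 45], [60, 23], [750, 38], [40, 47]]
y = [0, 0, 1, 0, 1, 0, 1, 0]

model = fit_boosted_stumps(X, y)
predictions = [1 if predict_proba(model, row) > 0.5 else 0 for row in X]
print(predictions)
```

Each stump on its own is a weak predictor; the ensemble becomes stronger because every new stump is fitted to what the previous ones still get wrong.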
    To add the algorithm in QuickML’s ML Pipeline Builder, navigate to ML operations -> Algorithm -> Classification, then drag and drop the XGBoost Classification node onto the builder.
    To optimize the model for our particular dataset, we can also adjust the tuning parameters; in our case, we will keep the default settings. Once everything is configured, we can save the pipeline for further testing and deployment.
    Once we drag and drop the algorithm node, its end is automatically connected to the destination node. Click Save to save the pipeline, then execute it by clicking the Execute button at the top-right corner of the pipeline builder page. This redirects you to a page that shows the executed pipeline along with its execution status; in our case, the pipeline execution is successful.
    Click Execution Stats to view compute details about each stage of the model execution.
    After the ML workflow completes successfully, the prediction model is created and can be examined under the Model section (click Fraud Detection ML pipeline Model).
    The model page offers useful insights into the efficiency and performance of the model when making predictions based on the data.
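Which metrics QuickML reports is not listed here. As a rough guide to reading such a page, the sketch below computes a few metrics commonly used for fraud classifiers (accuracy, precision, recall, and F1) from true and predicted labels. The function name and the sample labels are hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Compute common binary-classification metrics from 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fraud caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # fraud missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # legitimate, passed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical labels: 1 = fraud, 0 = legitimate (not real model output).
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]

metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because fraud is usually rare, accuracy alone can look high even when the model misses most fraudulent transactions, so precision and recall on the fraud class are worth checking alongside it.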

Last Updated 2024-10-10 12:38:19 +0530