Upload Dataset

Let’s begin by uploading the datasets to Catalyst QuickML using the available dataset connectors:

  1. Navigate to the QuickML service in the Catalyst console and click Start Exploring.

  2. Navigate to the Datasets component and click Import Dataset.

  3. An Import Dataset pop-up will be displayed. In the Data Sources step, navigate to File Upload and click Upload File.

  4. Upload the Cancer_detection_A dataset that you downloaded earlier and click Next.

  5. The name of the dataset will be auto-populated based on the uploaded file. You can edit it if required, then click Upload.

The dataset is now uploaded.

Now, upload the second dataset, Cancer_detection_B, by repeating the steps above.

Since the datasets used in this tutorial are specific to the health and medical domain, we have included explanations of some of the terms used:

  • Patient_id: ID of the Patient
  • Patient_name: Name of the Patient
  • Diagnosis: (M = Malignant / B = Benign)
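Before uploading, it can help to confirm locally that a file carries the columns and labels described above. The following is a minimal sketch, not part of QuickML: the `check_dataset` helper and the inline sample rows are hypothetical, and it assumes the file is a comma-separated text file with a header row. Which of the two datasets holds which columns is not stated here, so the required columns are passed in as a parameter.

```python
import csv
import io

VALID_DIAGNOSES = {"M", "B"}  # M = Malignant, B = Benign


def check_dataset(csv_text, required_columns):
    """Return a list of problems found in the CSV text (empty if none)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    problems = []

    # Report missing header columns first; row checks need the header.
    missing = set(required_columns) - set(reader.fieldnames or [])
    if missing:
        problems.append(f"missing columns: {sorted(missing)}")
        return problems

    # Only validate Diagnosis values if that column was requested.
    if "Diagnosis" in required_columns:
        for line_number, row in enumerate(reader, start=2):  # header is line 1
            if row["Diagnosis"] not in VALID_DIAGNOSES:
                problems.append(
                    f"line {line_number}: unexpected Diagnosis {row['Diagnosis']!r}"
                )
    return problems


# Hypothetical sample data, for illustration only.
sample = "Patient_id,Patient_name,Diagnosis\n1,Alice,M\n2,Bob,X\n"
print(check_dataset(sample, ["Patient_id", "Patient_name", "Diagnosis"]))
```

To check a real file, read its contents with `open(path, newline="").read()` and pass the text to the helper along with the columns you expect in that file.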

The dataset also includes detailed information regarding the features of the breast mass, such as:

  • radius (mean distance from center to points on the perimeter)
  • texture (standard deviation of gray-scale values)
  • perimeter
  • area
  • smoothness (local variation in radius lengths)
  • compactness (perimeter^2 / area - 1.0)
  • concavity (severity of concave portions of the contour)
  • concave points (number of concave portions of the contour)
  • symmetry
  • fractal dimension ("coastline approximation" - 1)
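As a worked example of one of the formulas in this list, the sketch below evaluates compactness (perimeter^2 / area - 1.0) for two simple shapes. The `compactness` function name and the example shapes are illustrative assumptions; the dataset already contains precomputed values, so this is only meant to show what the number measures.

```python
import math


def compactness(perimeter, area):
    """Compactness as defined in the feature list: perimeter^2 / area - 1.0."""
    return perimeter ** 2 / area - 1.0


# A circle of radius 2: perimeter = 2*pi*r, area = pi*r^2.
r = 2.0
circle = compactness(2 * math.pi * r, math.pi * r ** 2)

# A square of side 3: perimeter = 12, area = 9.
square = compactness(12.0, 9.0)

print(round(circle, 4))  # 4*pi - 1, independent of the radius
print(square)            # 16 - 1
```

A circle gives the smallest possible value (4π - 1), so larger values indicate a less compact, more irregular outline.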

For each image, three statistics were computed for each of these ten features: the mean, the standard error (SE), and the "worst" or largest value (the mean of the three largest values). This results in 30 features in total, for example mean radius, radius SE, and worst radius.
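The arithmetic behind the 30 features can be sketched as follows. This is an illustration under stated assumptions, not the procedure used to build the dataset: the `summarize` helper and the sample measurements are hypothetical, the standard error is assumed to be the sample standard deviation divided by the square root of the number of measurements, and the generated names follow the pattern of the three examples above rather than the actual column names in the files.

```python
import math
import statistics

FEATURES = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave points", "symmetry",
    "fractal dimension",
]


def summarize(values):
    """Return the three statistics described above for one feature."""
    mean = statistics.mean(values)
    # Assumption: SE = sample standard deviation / sqrt(n).
    se = statistics.stdev(values) / math.sqrt(len(values))
    # "Worst" = mean of the three largest values.
    worst = statistics.mean(sorted(values)[-3:])
    return {"mean": mean, "se": se, "worst": worst}


# Hypothetical radius measurements taken from a single image.
radii = [10.0, 12.0, 14.0, 16.0, 18.0]
stats = summarize(radii)
print(stats)

# Ten features times three statistics gives 30 feature names.
names = []
for feature in FEATURES:
    names += [f"mean {feature}", f"{feature} SE", f"worst {feature}"]
print(len(names))
```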

Last Updated 2024-04-05 23:41:34 +0530

RELATED LINKS

Data Connectors