The datasets used are:

- Low Imbalance: Hepatocellular Carcinoma Dataset https://www.kaggle.com/mrsantos/hcc-dataset#hcc-data-complete-balanced.csv
- Medium Imbalance: Breast Cancer Dataset https://archive.ics.uci.edu/ml/datasets/Breast+Cancer
- High Imbalance: Porto Seguro’s Safe Driver Prediction https://www.kaggle.com/c/porto-seguro-safe-driver-prediction (Due to the large size, the pre-processed dataset was not uploaded into this repository)

The code used to pre-process these datasets is in the 'Preprocessing' folder.

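
The low, medium, and high labels describe how unevenly the classes are distributed in each dataset. As a rough illustration only (this is not code from the notebooks, and the helper name `imbalance_ratio` and the toy labels are made up for this sketch), the ratio of majority-class to minority-class samples could be computed like this:

```python
from collections import Counter


def imbalance_ratio(labels):
    """Return majority count / minority count for a sequence of class labels."""
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("need at least two classes to measure imbalance")
    return max(counts.values()) / min(counts.values())


# Hypothetical toy labels, not taken from any of the datasets listed above.
toy_labels = [0] * 90 + [1] * 10
print(imbalance_ratio(toy_labels))  # 9.0
```

A ratio close to 1 means the classes are roughly balanced; larger values mean stronger imbalance.
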
The Python notebooks used in this work are:
- Dataset_1_LowImbalance.ipynb: The implementation using the Hepatocellular Carcinoma Dataset
- Dataset_2_MediumImbalance.ipynb: The implementation using the Breast Cancer Dataset
- Dataset_3_HighImbalance.ipynb: The implementation using the Porto Seguro’s Safe Driver Prediction dataset
- Plots.ipynb: The code for generating the plots used in the report