The datasets used are:

- Low Imbalance: Hepatocellular Carcinoma Dataset
- Medium Imbalance: Breast Cancer Dataset
- High Imbalance: Porto Seguro’s Safe Driver Prediction (the pre-processed dataset was not uploaded to this repository due to its large size)

For the code used to pre-process these datasets, see the 'Preprocessing' folder.

The Python notebooks used in this work are:

- Dataset_1_LowImbalance.ipynb: The implementation using the Hepatocellular Carcinoma Dataset
- Dataset_2_MediumImbalance.ipynb: The implementation using the Breast Cancer Dataset
- Dataset_3_HighImbalance.ipynb: The implementation using the Porto Seguro’s Safe Driver Prediction dataset
- Plots.ipynb: The code for generating the plots used in the report