ishwarvenugopal / UoE_CE888_ImbalancedDatasets Public

The final project for the CE888: Data Science and Decision Making module (Spring Term) at the University of Essex

Datasets
Preprocessing
Dataset_1_LowImbalance.ipynb
Dataset_2_MediumImbalance.ipynb
Dataset_3_HighImbalance.ipynb
Plots.ipynb
README.md
Report.pdf

Imbalanced_Datasets

The datasets used are:

Low Imbalance: Hepatocellular Carcinoma Dataset https://www.kaggle.com/mrsantos/hcc-dataset#hcc-data-complete-balanced.csv
Medium Imbalance: Breast Cancer Dataset https://archive.ics.uci.edu/ml/datasets/Breast+Cancer
High Imbalance: Porto Seguro’s Safe Driver Prediction https://www.kaggle.com/c/porto-seguro-safe-driver-prediction (because of its large size, the pre-processed dataset was not uploaded to this repository)
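
The low, medium and high labels above describe how unevenly the classes are distributed. One common way to put a number on this is the ratio of majority-class to minority-class counts. The sketch below illustrates that measure only; the function name `imbalance_ratio` and the toy labels are assumptions made for illustration, and this README does not say which measure the project itself uses.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Return the majority-class count divided by the minority-class count.

    Hypothetical helper for illustration; not taken from the notebooks.
    """
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("need at least two classes to measure imbalance")
    return max(counts.values()) / min(counts.values())


# Toy labels invented for this example, not drawn from any of the three datasets.
toy_labels = [0] * 90 + [1] * 10
print(imbalance_ratio(toy_labels))  # 9.0
```

A ratio of 1.0 means the classes are perfectly balanced; larger values mean stronger imbalance.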

The code used to pre-process these datasets is in the 'Preprocessing' folder.

The Python notebooks used in this work are:

Dataset_1_LowImbalance.ipynb: The implementation using the Hepatocellular Carcinoma Dataset
Dataset_2_MediumImbalance.ipynb: The implementation using the Breast Cancer Dataset
Dataset_3_HighImbalance.ipynb: The implementation using the Porto Seguro’s Safe Driver Prediction dataset
Plots.ipynb: The code for generating the plots used in the report

See Report.pdf for a complete description and analysis of the project.
