
This project demonstrates three prospective approaches for pre-processing large data sets in practical time frames. Each attempts to address class imbalance by reducing the running time of the relevant SMOTE+ENN oversampling technique, with the aim of improving or enabling classifier performance. The focus of our study was to im…


SMJajoo/Big-data-Credit-card-fraud-detection


Big-data-Credit-card-fraud-detection

Dataset:

Link for the dataset: https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud

The data set used here consists of 284,807 European credit card transactions, of which 492 (0.172%) are fraudulent (denoted by the class label '1'). 28 of the 30 predictors are the output of a PCA transformation, applied in part to anonymize the data set. The remaining 2 are the 'Time' and 'Amount' features.
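To make the degree of imbalance concrete, the short sketch below derives a few statistics from the two counts quoted above. The helper name `imbalance_summary` is hypothetical and introduced only for illustration; nothing here reads the Kaggle file itself.

```python
# Counts quoted in the dataset description above.
TOTAL_TRANSACTIONS = 284_807
FRAUD_TRANSACTIONS = 492


def imbalance_summary(total: int, positives: int) -> dict:
    """Return basic class-balance statistics for a binary data set."""
    negatives = total - positives
    return {
        "negatives": negatives,
        "positive_rate_pct": 100.0 * positives / total,
        "negatives_per_positive": negatives / positives,
        # Accuracy of a trivial model that always predicts the majority class.
        "majority_baseline_accuracy_pct": 100.0 * negatives / total,
    }


summary = imbalance_summary(TOTAL_TRANSACTIONS, FRAUD_TRANSACTIONS)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

A model that labels every transaction as legitimate would already score above 99.8% accuracy on this data, which is why accuracy alone is a poor yardstick here and why resampling methods such as SMOTE+ENN are of interest.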

Three approaches are used to solve this problem.

  1. Local - Local Map Reduce

     Local SMOTE+ENN

  2. Global - Smote_Enn_Global

     Global SMOTE+ENN

  3. Hybrid - Smote_Enn_Cluster_Global

     Hybrid SMOTE+ENN
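All three approaches build on SMOTE+ENN: SMOTE synthesizes minority samples by interpolating between a minority point and one of its nearest minority neighbours, and ENN (edited nearest neighbours) then drops samples whose label disagrees with the majority of their neighbours. The pure-Python sketch below illustrates that idea on toy 2-D data; it is not the repository's code. The names `nearest`, `smote`, `enn`, `smote_enn` and `local_smote_enn` are hypothetical, and `local_smote_enn` encodes an assumed reading of the "Local" approach (each partition is resampled independently, then the results are concatenated).

```python
import math
import random
from collections import Counter


def nearest(target, pool, k):
    """Indices of the k points in pool closest to target (Euclidean distance)."""
    return sorted(range(len(pool)), key=lambda i: math.dist(target, pool[i]))[:k]


def smote(minority, n_new, k, rng):
    """Create n_new synthetic points by interpolating towards minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base, others = minority[i], minority[:i] + minority[i + 1:]
        neighbour = others[rng.choice(nearest(base, others, k))]
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def enn(points, labels, k):
    """Keep only samples whose label matches the majority vote of their k neighbours."""
    kept_points, kept_labels = [], []
    for i, (p, y) in enumerate(zip(points, labels)):
        rest, rest_labels = points[:i] + points[i + 1:], labels[:i] + labels[i + 1:]
        votes = Counter(rest_labels[j] for j in nearest(p, rest, k))
        if votes.most_common(1)[0][0] == y:
            kept_points.append(p)
            kept_labels.append(y)
    return kept_points, kept_labels


def smote_enn(points, labels, k=3, seed=0):
    """Oversample class 1 up to the size of class 0, then clean with ENN."""
    rng = random.Random(seed)
    minority = [p for p, y in zip(points, labels) if y == 1]
    n_new = max(len(points) - 2 * len(minority), 0)
    new_points = smote(minority, n_new, k, rng)
    return enn(points + new_points, labels + [1] * len(new_points), k)


def local_smote_enn(partitions, k=3, seed=0):
    """Assumed 'local' strategy: resample each partition on its own, then merge."""
    merged_points, merged_labels = [], []
    for points, labels in partitions:  # "map": one independent call per partition
        p, y = smote_enn(points, labels, k, seed)
        merged_points += p             # "reduce": plain concatenation
        merged_labels += y
    return merged_points, merged_labels


# Toy data: a 6x6 grid of class 0, four class-1 points far away,
# and one noisy class-0 point sitting inside the class-1 cluster.
grid = [(float(x), float(y)) for x in range(6) for y in range(6)]
noisy = [(20.5, 20.5)]
fraud = [(20.0, 20.0), (21.0, 20.0), (20.0, 21.0), (21.0, 21.0)]
points = grid + noisy + fraud
labels = [0] * 37 + [1] * 4

res_points, res_labels = smote_enn(points, labels)
print(Counter(labels), "->", Counter(res_labels))

# Second partition: the same layout shifted by 100 in both coordinates.
shifted = [(x + 100.0, y + 100.0) for x, y in points]
local_points, local_labels = local_smote_enn([(points, labels), (shifted, labels)])
print(Counter(local_labels))
```

On this toy data SMOTE adds 33 synthetic class-1 points, and ENN removes only the noisy class-0 point, leaving 36 class-0 and 37 class-1 samples. The neighbour search here is quadratic in the number of samples, which hints at why running time becomes the bottleneck on large data sets and why partitioned variants are worth exploring.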
