Using a credit card fraud detection dataset, we'll build a binary classification model that labels transactions as either fraudulent or valid, based on the provided historical data. A 2016 study estimated that credit card fraud was responsible for over 20 billion dollars in losses worldwide. Accurately detecting cases of fraud is an ongoing area of research.
Since we have true labels to aim for, we'll take a supervised learning approach and train a binary classifier to sort data into one of our two transaction classes: fraudulent or valid. We'll fit the model on training data and then see how well it generalizes to held-out test data.
The notebook will be broken down into a few steps:
- Loading and exploring the data
- Splitting the data into train/test sets
- Defining and training a LinearLearner binary classifier
- Making improvements on the model
- Evaluating and comparing model test performance
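
To make the split and evaluation steps concrete, here is a minimal sketch using only the Python standard library. It is not the notebook's implementation: the helper names `train_test_split` and `binary_metrics`, the 30% test fraction, and the toy `(amount, label)` rows are all assumptions made for illustration, and a placeholder predictor stands in for the LinearLearner model.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle rows and split them into (train, test) lists.

    Hypothetical helper; the 0.3 test fraction is an illustrative assumption.
    """
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def binary_metrics(y_true, y_pred):
    """Return precision, recall, and accuracy for 0/1 labels (1 = fraudulent)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall, "accuracy": accuracy}


# Made-up toy data, not the real dataset: (transaction amount, label).
data_rng = random.Random(42)
rows = [(data_rng.uniform(1, 500), 1 if i % 50 == 0 else 0) for i in range(1000)]

train, test = train_test_split(rows)

# Placeholder predictor that labels every transaction as valid (0);
# a trained classifier would supply these predictions instead.
y_true = [label for _, label in test]
y_pred = [0 for _ in test]
metrics = binary_metrics(y_true, y_pred)
print(len(train), len(test), metrics)
```

Reporting precision and recall alongside accuracy is a design choice here: they describe separately how many flagged transactions are truly fraudulent and how many fraudulent transactions are caught, which is useful when comparing models later.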