This project explores and analyzes the Consumer Loans dataset from Kaggle.
The final data product is described here.
Data Source: Consumer loans dataset | Kaggle
Data Description
The dataset contains information about loan applicants, including their income, marital status, education, and loan outcome (whether the loan was finalized).
Project Goals
- Understand consumer loan characteristics and trends.
- Identify factors influencing loan finalization.
- Develop models to predict loan approval.
Data Analysis and Exploration
- Conduct exploratory data analysis (EDA) to understand the distributions of loan amounts and borrower characteristics.
- Visualize the data using histograms, boxplots, scatter plots, and correlation plots.
- Clean and pre-process the data (handle missing values, outliers, etc.).
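The cleaning step above could look roughly like the following. This is a minimal sketch, not the project's actual pipeline: the column names (`income`, `loan_amount`, `education`), the toy values, and the `clean` helper are all hypothetical, and median imputation plus the 1.5 × IQR clipping rule are just one common choice for handling missing values and outliers.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the Kaggle data; column names and values are hypothetical.
df = pd.DataFrame({
    "income": [3200.0, 4100.0, np.nan, 2800.0, 3900.0, 50000.0],
    "loan_amount": [1000.0, 1500.0, 1200.0, np.nan, 1100.0, 1300.0],
    "education": ["secondary", "higher", None, "secondary", "higher", "secondary"],
})


def clean(frame: pd.DataFrame) -> pd.DataFrame:
    """Impute missing values, then clip numeric outliers with the 1.5 * IQR rule."""
    out = frame.copy()
    num_cols = out.select_dtypes(include="number").columns
    cat_cols = out.columns.difference(num_cols)

    # Missing values: median for numeric columns, an explicit label for the rest.
    out[num_cols] = out[num_cols].fillna(out[num_cols].median())
    out[cat_cols] = out[cat_cols].fillna("unknown")

    # Outliers: clip each numeric column to [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
    for col in num_cols:
        q1, q3 = out[col].quantile(0.25), out[col].quantile(0.75)
        iqr = q3 - q1
        out[col] = out[col].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)
    return out


cleaned = clean(df)
print(cleaned.describe())                           # per-column summary statistics
print(cleaned[["income", "loan_amount"]].corr())    # input for a correlation plot
```

Clipping keeps every row while limiting the influence of extreme values; dropping outlier rows instead would be an equally reasonable choice, depending on how many records are affected.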
Feature Engineering
- Create new features.
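As an illustration of what new features might look like, here is a small sketch. The specific features (a loan-to-income ratio, a log-transformed income, and one-hot encoded marital status), the column names, and the `add_features` helper are assumptions made for the example, not features the project is known to use.

```python
import numpy as np
import pandas as pd

# Toy data; column names and values are hypothetical.
df = pd.DataFrame({
    "income": [3200.0, 4100.0, 2800.0, 3900.0],
    "loan_amount": [1000.0, 1500.0, 1400.0, 1100.0],
    "marital_status": ["married", "single", "single", "divorced"],
})


def add_features(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of the frame with a few illustrative derived features."""
    out = frame.copy()
    # Ratio feature: how large the loan is relative to the applicant's income.
    out["loan_to_income"] = out["loan_amount"] / out["income"]
    # log1p compresses the long right tail that income data often has.
    out["log_income"] = np.log1p(out["income"])
    # One-hot encode the categorical column so models can consume it.
    dummies = pd.get_dummies(out["marital_status"], prefix="marital", dtype=int)
    return pd.concat([out.drop(columns="marital_status"), dummies], axis=1)


features = add_features(df)
print(features.head())
```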
Model Development (still in progress)
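Since modeling is still in progress, the following is only a sketch of what a baseline for predicting loan finalization could look like with Scikit-learn. The data is synthetic, and the column names, the `finalized` target, and the choice of logistic regression are assumptions for the example rather than the project's chosen approach.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the real dataset; all columns are hypothetical.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "income": rng.normal(4000, 1000, n),
    "loan_amount": rng.normal(1500, 400, n),
    "education": rng.choice(["secondary", "higher"], n),
})
# Synthetic target: a higher income relative to the loan makes finalization more likely.
score = (df["income"] - 2 * df["loan_amount"]) / 1000 + rng.normal(0, 1, n)
df["finalized"] = (score > score.median()).astype(int)

X = df.drop(columns="finalized")
y = df["finalized"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scale numeric columns and one-hot encode categorical ones inside the pipeline,
# so the same preprocessing is applied at training and prediction time.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["income", "loan_amount"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["education"]),
])
model = Pipeline([
    ("preprocess", preprocess),
    ("classifier", LogisticRegression(max_iter=1000)),
])

model.fit(X_train, y_train)
preds = model.predict(X_test)
accuracy = model.score(X_test, y_test)
print(f"Hold-out accuracy: {accuracy:.2f}")
```

Keeping preprocessing inside the `Pipeline` avoids leaking statistics from the test split into training, which matters once the real data replaces the synthetic frame.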
Project Structure
Consumer-Loans-Analysis/
├── data/ # Raw and processed data
├── notebooks/ # Jupyter Notebooks
├── models/ # Trained machine learning models
├── pipelines/ # Data processing pipelines
├── requirements.txt # Required package versions
└── README.md # This file
Dependencies
- Pandas
- NumPy
- Matplotlib
- Scikit-learn (for machine learning)
- Seaborn
Usage
- Clone the project repository.
- Install dependencies:
pip install -r requirements.txt
- Run analysis and modeling scripts (e.g., jupyter notebook eda_processing.py).