This project explores machine learning approaches to predict outcomes related to road traffic accidents in Rawalpindi, Punjab, Pakistan. Using the "Road Traffic Accident Dataset," we build models for two targets:
- Injury Type: Classification of the injury severity level
- Patient Status: Status of the patient post-accident
The assignment is divided into multiple phases to systematically approach data preprocessing, model selection, training, evaluation, and final analysis.
- Task: Apply data preprocessing
- Apply data preprocessing
- Train Logistic Regression model on the train set
- Test the trained model on the test set
- Evaluate the performance on the test set using the following metrics:
- Accuracy
- Confusion Matrix
- Precision
- Recall
- F1 Score
- Plot the following learning curves:
- Accuracy (y-axis) vs Solver (x-axis)
- Accuracy (y-axis) vs Max_iter (x-axis)
Models are built for two targets: Injury Type and Patient Status
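The preprocessing, training, and evaluation steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the dataset's real columns are not listed here, so the column names (`age`, `gender`, `cause`, `patient_status`) and the toy data are hypothetical stand-ins, and the imputation and encoding choices are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy stand-in data; every column name below is hypothetical.
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "age": rng.integers(5, 80, size=n).astype(float),
    "gender": rng.choice(["Male", "Female"], size=n),
    "cause": rng.choice(["Bike", "Car", "Pedestrian"], size=n),
})
# Simulate a few missing values so the imputer has something to do.
df.loc[rng.choice(n, size=20, replace=False), "age"] = np.nan
# Hypothetical binary target, loosely tied to the features.
df["patient_status"] = np.where(
    (df["cause"] == "Bike") | (df["age"].fillna(40) > 60), "Serious", "Stable")

X = df.drop(columns="patient_status")
y = df["patient_status"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Preprocessing: impute and scale numeric columns, one-hot encode categoricals.
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), ["age"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["gender", "cause"]),
])
model = Pipeline([
    ("prep", preprocess),
    ("clf", LogisticRegression(solver="lbfgs", max_iter=200)),
])
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Weighted averaging is an assumption; it also works for multi-class targets.
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred, average="weighted", zero_division=0),
    "recall": recall_score(y_test, y_pred, average="weighted", zero_division=0),
    "f1": f1_score(y_test, y_pred, average="weighted", zero_division=0),
}
cm = confusion_matrix(y_test, y_pred)
print(metrics)
print(cm)
```

Wrapping the preprocessing and the classifier in one `Pipeline` keeps the imputer, scaler, and encoder fitted on the train set only, which avoids leaking test-set statistics into training.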
Note:
- Solver options:
{lbfgs, liblinear, newton-cg, newton-cholesky, sag, saga}
- Max_iter options:
{50, 100, 150, 200, 250, 300}
- Apply necessary data preprocessing
- Train the following models on the train set:
- Decision Tree
- Support Vector Machine (SVM)
- Evaluate the performance of the trained models on the test set using the following metrics:
- Accuracy
- Precision
- F1 Score
- Recall
- Confusion Matrix
- Prepare a comparison table for the performance of all models, including Logistic Regression from Phase 2.
- Plot the following curves:
- Decision Tree: Accuracy (y-axis) vs Max_depth (x-axis)
- SVM: Accuracy (y-axis) vs Kernel (x-axis)
Models are built for two targets: Injury Type and Patient Status
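A comparison table like the one described above could be assembled along these lines. The sketch uses synthetic data from `make_classification` rather than the real dataset, and the hyperparameters (`max_depth=6`, `kernel="rbf"`, `max_iter=200`) and weighted averaging are arbitrary choices for illustration; `evaluate` is a hypothetical helper.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed accident data (assumption).
X, y = make_classification(n_samples=500, n_features=10, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

models = {
    "Logistic Regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=200)),
    "Decision Tree": DecisionTreeClassifier(max_depth=6, random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}
COLS = ["accuracy", "precision", "recall", "f1"]


def evaluate(model):
    """Fit on the train set and return test-set metrics as a dict."""
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, average="weighted",
                                     zero_division=0),
        "recall": recall_score(y_test, y_pred, average="weighted",
                               zero_division=0),
        "f1": f1_score(y_test, y_pred, average="weighted", zero_division=0),
    }


rows = {name: evaluate(model) for name, model in models.items()}

# Print a plain-text comparison table, one row per model.
print(f"{'Model':<22}" + "".join(f"{c:>11}" for c in COLS))
for name, r in rows.items():
    print(f"{name:<22}" + "".join(f"{r[c]:>11.3f}" for c in COLS))
```

The SVM and Logistic Regression are wrapped with `StandardScaler` because both are sensitive to feature scale, while the Decision Tree is not.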
Note:
- Decision Tree Max_depth options:
{4, 5, 6, 7, 8, 9}
- SVM Kernel options:
{linear, poly, rbf, sigmoid}
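The two sweeps over the options above might look like the following sketch. It again relies on synthetic data as a stand-in for the accident dataset, and leaves every hyperparameter other than `max_depth` and `kernel` at its scikit-learn default, which is an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed accident data (assumption).
X, y = make_classification(n_samples=500, n_features=10, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

DEPTHS = [4, 5, 6, 7, 8, 9]
KERNELS = ["linear", "poly", "rbf", "sigmoid"]

# Decision Tree: accuracy vs max_depth.
acc_by_depth = {}
for depth in DEPTHS:
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    acc_by_depth[depth] = accuracy_score(y_test, tree.predict(X_test))

# SVM: accuracy vs kernel (features are scaled before fitting).
acc_by_kernel = {}
for kernel in KERNELS:
    svm = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    svm.fit(X_train, y_train)
    acc_by_kernel[kernel] = accuracy_score(y_test, svm.predict(X_test))

print(acc_by_depth)
print(acc_by_kernel)
```

As `max_depth` is numeric, a line plot suits the Decision Tree curve; the kernel names are categorical, so a bar chart is likely the clearer choice for the SVM.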
- Apply necessary data preprocessing techniques.
- Train the following ensemble models on the training set:
- Random Forest
- XGBoost
- AdaBoost
- Evaluate the models on the test set using the following metrics: Accuracy, Precision, Recall, F1 Score, and Confusion Matrix.
- Prepare a performance comparison table for all the models (including the previous ones).
- Apply additional techniques to improve the models’ performance.
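The ensemble phase could be prototyped as in the sketch below. It uses synthetic data as a stand-in for the accident dataset, treats XGBoost as optional (the `xgboost` package is separate from scikit-learn), and reports only accuracy and F1 for brevity; the remaining metrics follow the same pattern. The brief does not say which improvement techniques to apply, so the grid search over a small, arbitrary Random Forest grid is just one possible option, shown as an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the preprocessed accident data (assumption).
X, y = make_classification(n_samples=400, n_features=10, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "AdaBoost": AdaBoostClassifier(n_estimators=100, random_state=0),
}
try:
    # XGBoost is a third-party package; skip it if it is not installed.
    from xgboost import XGBClassifier
    models["XGBoost"] = XGBClassifier(n_estimators=100, random_state=0)
except ImportError:
    pass

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred, average="weighted"),
    }

# One possible improvement technique: cross-validated hyperparameter search.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [4, 8]},
    scoring="f1_weighted",
    cv=3,
)
search.fit(X_train, y_train)
tuned_accuracy = accuracy_score(y_test, search.predict(X_test))

for name, r in results.items():
    print(f"{name:<15} accuracy={r['accuracy']:.3f} f1={r['f1']:.3f}")
print("Tuned Random Forest:", search.best_params_, f"accuracy={tuned_accuracy:.3f}")
```

The search is cross-validated on the train set only, so the test set stays untouched until the final accuracy check.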