Skip to content

This project aims to predict flight arrival delays using various machine learning algorithms. It involves EDA, feature engineering, and model tuning with XGBoost, LightGBM, CatBoost, SVM, Lasso, Ridge, Decision Tree, and Random Forest Regressors. The goal is to identify the best model for accurate predictions.

Notifications You must be signed in to change notification settings

samidulloabdullaev/Flights_Arrival_Delay_regression-

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Flight Arrival Delay Prediction Project

Overview

This project aims to predict flight arrival delays using various regression algorithms. The goal is to apply multiple machine learning models and identify the best-performing model for this task. The models used in this project include:

  • CatBoost Regressor
  • XGBoost Regressor
  • LightGBM Regressor
  • Support Vector Machine (SVM) Regressor
  • Lasso Regressor
  • Ridge Regressor
  • Decision Tree Regressor
  • Random Forest Regressor

Table of Contents

Introduction

Predicting flight arrival delays is a critical task for airlines and passengers. This project explores the application of several advanced machine learning algorithms to create an accurate prediction model for flight arrival delays. By comparing different models, I aim to find the best approach for this problem.

Project Structure

flight_arrival_delay_prediction/
├── data/
│   ├── train.csv
│   ├── test.csv
├── notebooks/
│   ├── EDA_and_feature_engineering.ipynb
│   ├── Delays_Modeling_popular_models.ipynb
│   ├── delays_modeling.ipynb
│   ├── lightgbm_parameter_tuning.ipynb
│   ├── cateboost_parameter_tuning.ipynb
│   ├── xgboost_parameter_tuning_and_voting_regressor.ipynb
│   ├── best_models_voting_regressor.ipynb
│   ├── best_single_model_lightgbm.ipynb
├── README.md
├── requirements.txt

Dependencies

To install the required dependencies, use the following command:

pip install -r requirements.txt

The main libraries used in this project include:

pandas
numpy
catboost
xgboost
lightgbm
scikit-learn
matplotlib
seaborn

Data

The data folder contains the training and test datasets (train.csv and test.csv). The data preprocessing steps are outlined in the EDA_and_feature_engineering.ipynb notebook. Preprocessing

Data preprocessing involves cleaning the dataset, handling missing values, feature engineering, and scaling the features. These steps are crucial for preparing the data for model training. The detailed steps are in the EDA_and_feature_engineering.ipynb notebook too. Model Training

The model training process, including hyperparameter tuning, is documented in the Delays_Modeling_popular_models.ipynb notebook. I use cross-validation to ensure robust performance estimation. The training includes:

Splitting the data into training and validation sets.
Training models using CatBoost, XGBoost, LightGBM, SVM, Lasso, Ridge, Decision Tree, and Random Forest regressors.
Hyperparameter tuning using grid search and random search techniques.

valuation

The model evaluation, including cross-validation and Mean Absolute Percentage Error (MAPE) calculation, is detailed in the xgboost_parameter_tuning_and_voting_regressor.ipynb notebook. I compare the performance of different models to identify the best one.

Results

After fine tuning every model and train them with train data, I found the best single model is LightGBM regressor and the best Voting Regressor model is combination of LightGBM, Catboost, XGBoost regressor with reaching 6.46 % MAPE

Usage

To preprocess the data, open and run the EDA_and_feature_engineering.ipynb notebook.

To train the models, open and run the Delays_Modeling_popular_models.ipynb notebook.

To evaluate the models and compare their performance, open and run the notebook named parameter_tuning.ipynb.

Contributing

Contributions are welcome! Please fork the repository and submit a pull request with your changes.

About

This project aims to predict flight arrival delays using various machine learning algorithms. It involves EDA, feature engineering, and model tuning with XGBoost, LightGBM, CatBoost, SVM, Lasso, Ridge, Decision Tree, and Random Forest Regressors. The goal is to identify the best model for accurate predictions.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published