This project develops a Random Forest Regression model that predicts flight prices from historical data. The model can help consumers make informed decisions when booking flights, and airlines and travel agencies can use it to inform their pricing strategies.
- Exploratory Data Analysis (EDA): Understanding the data distribution, visualizing trends, and identifying important features.
- Data Preprocessing: Handling missing values, encoding categorical variables, and normalizing numerical data.
- Feature Engineering: Identifying key features that affect flight pricing.
- Model Implementation: Using Random Forest to predict flight prices with an R² score of 0.812.
- Model Evaluation: Evaluating the model using metrics like MAE, MSE, and RMSE.
The dataset used for this project includes the following key features:
- Airline: Airline carrier name
- Source & Destination Airports: Departure and arrival airports
- Departure Date: Date and time of departure
- Flight Duration: Total duration of the flight
- Layovers: Number of layovers
- Price: Flight ticket price (target variable)
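The features listed above mix categorical, datetime, and numeric columns, so they need some preparation before a Random Forest can use them. The following is a minimal sketch of one way to do that with pandas. The column names, the sample rows, and the `build_features` helper are all hypothetical and only illustrate the idea; the real dataset's schema may differ.

```python
import pandas as pd

# Made-up rows for illustration only; column names are assumptions,
# not the actual schema of the project's dataset.
df = pd.DataFrame({
    "airline": ["AirA", "AirB", "AirA"],
    "source": ["AAA", "BBB", "AAA"],
    "destination": ["BBB", "AAA", "CCC"],
    "departure": ["2019-03-24 22:20", "2019-05-01 05:50", "2019-06-09 09:25"],
    "duration_minutes": [170, 445, 1140],
    "layovers": [0, 2, 2],
    "price": [100.0, 200.0, 300.0],
})


def build_features(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: expand the departure timestamp and one-hot encode categoricals."""
    out = frame.copy()
    # Split the departure timestamp into numeric parts a tree model can split on.
    dep = pd.to_datetime(out.pop("departure"), format="%Y-%m-%d %H:%M")
    out["dep_month"] = dep.dt.month
    out["dep_day_of_week"] = dep.dt.dayofweek  # Monday=0 ... Sunday=6
    out["dep_hour"] = dep.dt.hour
    # One-hot encode the categorical columns.
    return pd.get_dummies(out, columns=["airline", "source", "destination"])


features = build_features(df)
X = features.drop(columns="price")  # model inputs
y = features["price"]               # target variable
```

One-hot encoding is only one option here; label or target encoding are common alternatives when a categorical column has many distinct values.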
- Data Cleaning: Addressed missing values and outliers, and prepared the dataset for modeling.
- Exploratory Data Analysis (EDA): Used visualizations (histograms, box plots, scatter plots) to analyze data distribution and feature relationships.
- Feature Engineering: Identified relevant features using correlation heatmaps and feature importance analysis.
- Modeling: Implemented a Random Forest Regression model.
- Model Tuning: Used hyperparameter tuning (RandomizedSearchCV) to optimize the model.
- Model Evaluation: Tested the model using metrics like MAE, MSE, RMSE, and R² Score.
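The modeling, tuning, and evaluation steps above can be sketched with scikit-learn as follows. This is an illustrative outline, not the project's actual code: the data is synthetic, and the search space, `n_iter`, `cv`, and scoring choice are assumptions picked to keep the example small.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic stand-in for the prepared feature matrix and price target.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 4))
y = 5000 * X[:, 0] + 2000 * X[:, 1] ** 2 + rng.normal(0, 100, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Assumed search space; the project's real grid is not documented here.
param_distributions = {
    "n_estimators": [50, 100, 200],
    "max_depth": [None, 5, 10],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
}

search = RandomizedSearchCV(
    RandomForestRegressor(random_state=42),
    param_distributions=param_distributions,
    n_iter=5,                            # number of sampled configurations
    cv=3,                                # 3-fold cross-validation
    scoring="neg_mean_absolute_error",
    random_state=42,
)
search.fit(X_train, y_train)

# Evaluate the best configuration on the held-out split.
pred = search.best_estimator_.predict(X_test)
mae = mean_absolute_error(y_test, pred)
mse = mean_squared_error(y_test, pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, pred)
print(f"best params: {search.best_params_}")
print(f"MAE={mae:.2f}  MSE={mse:.2f}  RMSE={rmse:.2f}  R2={r2:.3f}")
```

Randomized search samples a fixed number of configurations rather than trying every combination, which keeps tuning time bounded as the search space grows.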
The Random Forest model achieved the following results:
- Mean Absolute Error (MAE): 1165.61
- Mean Squared Error (MSE): 4,062,650.69
- Root Mean Squared Error (RMSE): 2015.60
- R² Score: 0.812
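For readers unfamiliar with these metrics, the sketch below shows how each one is defined, using plain Python and a few made-up numbers. The `regression_metrics` helper is hypothetical and written only for illustration. The last lines check that the reported RMSE is the square root of the reported MSE.

```python
import math


def regression_metrics(y_true, y_pred):
    """Hypothetical helper: compute MAE, MSE, RMSE, and R² from two equal-length lists."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}


# Made-up prices and predictions, for illustration only.
metrics = regression_metrics([100, 200, 300], [110, 190, 330])

# Consistency check on the figures reported above: RMSE = sqrt(MSE).
reported_mse = 4_062_650.69
reported_rmse = 2015.60
rmse_matches = abs(math.sqrt(reported_mse) - reported_rmse) < 0.01
```

Because MAE and RMSE are expressed in the same units as the ticket price, they are easier to interpret than MSE. RMSE penalizes large errors more heavily than MAE, so an RMSE well above the MAE, as in the results above, suggests that a subset of predictions has comparatively large errors.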
To run this project locally:
- Clone the repository:
    git clone https://github.com/sneha-rangole/flight-price-prediction.git
- Navigate to the project directory:
    cd flight-price-prediction
- Install the necessary packages:
    pip install -r requirements.txt
- Prepare the dataset and ensure it is placed in the correct directory.
The project implements a Random Forest regression model that predicts flight prices with an R² score of 0.812 and a mean absolute error of 1165.61. Because random forests can capture non-linear relationships and interactions between features, the approach is well suited to airline ticket price prediction.
This project is licensed under the MIT License - see the LICENSE file for details.