Welcome to the Car Price Prediction project! This repository contains the code and resources used to predict car prices based on various features using machine learning techniques.
This project aims to predict the selling price of cars using a dataset from Kaggle. The dataset contains various features of cars such as age, mileage, fuel type, seller type, transmission type, and more. Multiple machine learning algorithms were used to build and evaluate models to achieve accurate predictions.
The dataset used in this project was obtained from Kaggle. You can find it here.
The repository contains the following files and directories:
- CarPricePrediction.ipynb: Jupyter notebook containing the code for data preprocessing, model building, and evaluation.
- car_price_prediction_model.pkl: Trained Random Forest model saved using joblib.
- README.md: Project overview and documentation.
- requirements.txt: List of Python packages required to run the project.
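As a rough illustration of how a model saved with joblib can be written and read back, here is a minimal sketch. The feature columns the real model expects are not listed in this README, so the example trains a small stand-in Random Forest on synthetic data; the file name `demo_model.pkl` and the data are hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data (assumption): 50 rows, 3 numeric features.
rng = np.random.default_rng(0)
X = rng.random((50, 3))
y = X @ np.array([2.0, -1.0, 0.5])

model = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)

# Save the fitted model to disk, then load it back.
path = os.path.join(tempfile.mkdtemp(), "demo_model.pkl")  # hypothetical file name
joblib.dump(model, path)
loaded = joblib.load(path)

# The reloaded model should reproduce the original predictions.
same = np.allclose(model.predict(X), loaded.predict(X))
print("Predictions match after reload:", same)
```

Loading `car_price_prediction_model.pkl` would presumably follow the same `joblib.load` pattern, with input columns matching those used during training.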
Make sure you have the following installed:
- Python 3.6 or higher
- Jupyter Notebook
- Required Python packages (listed in requirements.txt)
- Clone the repository:
    git clone https://github.com/harishsemwal/CarPricePrediction.git
    cd CarPricePrediction
- Install the required packages:
    pip install -r requirements.txt
- Run the Jupyter notebook:
    jupyter notebook CarPricePrediction.ipynb
The data preprocessing steps include:
- Importing necessary libraries and the dataset.
- Exploring the dataset for understanding and identifying missing values.
- Dropping irrelevant columns.
- Creating new features such as the car's age.
- Encoding categorical variables using one-hot encoding.
- Visualizing correlations between features and the target variable.
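The steps above can be sketched with pandas roughly as follows. This is an illustration under assumptions, not the notebook's actual code: the column names, the sample rows, and the reference year used to compute age are all hypothetical, and the correlations are printed rather than plotted.

```python
import pandas as pd

# Hypothetical sample rows; column names are assumptions, not the real schema.
df = pd.DataFrame({
    "Car_Name": ["ritz", "sx4", "ciaz", "swift"],
    "Year": [2014, 2013, 2017, 2011],
    "Selling_Price": [3.35, 4.75, 7.25, 2.85],
    "Kms_Driven": [27000, 43000, 6900, 5200],
    "Fuel_Type": ["Petrol", "Diesel", "Petrol", "Petrol"],
    "Transmission": ["Manual", "Manual", "Manual", "Automatic"],
})

REFERENCE_YEAR = 2020  # assumed reference year for computing the car's age

# Count missing values per column.
missing = df.isnull().sum()

# Drop a column treated as irrelevant and derive the car's age from its year.
df = df.drop(columns=["Car_Name"])
df["Age"] = REFERENCE_YEAR - df["Year"]
df = df.drop(columns=["Year"])

# One-hot encode categorical columns; drop_first removes one redundant dummy per feature.
encoded = pd.get_dummies(df, drop_first=True, dtype=int)

# Correlation of each feature with the target variable.
corr_with_price = encoded.corr()["Selling_Price"].drop("Selling_Price")
print(corr_with_price)
```

Passing `encoded.corr()` to a heatmap function is one common way to visualize these correlations.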
Several machine learning algorithms were used to build the prediction models:
- Linear Regression
- Multiple Linear Regression
- Random Forest Regressor
- Decision Tree Regressor
Each model was evaluated using the R-squared score to determine its performance. The results were as follows:
- Random Forest Regressor: R-squared of about 0.95 (95%)
- Decision Tree Regressor: R-squared of about 0.94 (94%)
- Multiple Linear Regression: R-squared of about 0.91 (91%)
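A comparison of this kind can be sketched with scikit-learn as shown here. The data is synthetic (an assumption standing in for the car dataset), so the printed scores are not comparable to the results listed above; the split ratio and model settings are also illustrative choices.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the car dataset (assumption): 4 numeric features.
rng = np.random.default_rng(42)
X = rng.random((300, 4))
y = 5 * X[:, 0] - 3 * X[:, 1] + 2 * X[:, 2] * X[:, 3] + rng.normal(0, 0.1, 300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree Regressor": DecisionTreeRegressor(random_state=42),
    "Random Forest Regressor": RandomForestRegressor(n_estimators=100, random_state=42),
}

# Fit each model on the training split and score it on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))
    print(f"{name}: R-squared = {scores[name]:.3f}")
```

R-squared measures the share of variance in the target explained by the model, so a value of 0.95 means 95% of the variance is explained rather than 95% of predictions being correct.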
RandomizedSearchCV was used to find the optimal parameters for the Random Forest Regressor to improve its performance.
The final model, Random Forest Regressor, was trained with the optimal parameters and achieved a high R-squared score on the test data.
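A hyperparameter search with RandomizedSearchCV might look roughly like the sketch below. The search space, number of iterations, and fold count are hypothetical, since the values used in the notebook are not stated here, and the data is synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in data (assumption).
rng = np.random.default_rng(0)
X = rng.random((200, 4))
y = 5 * X[:, 0] - 3 * X[:, 1] + rng.normal(0, 0.1, 200)

# Hypothetical search space; the notebook's actual ranges are not given in this README.
param_distributions = {
    "n_estimators": [50, 100, 200],
    "max_depth": [None, 5, 10],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
}

# Sample 5 random parameter combinations and score each with 3-fold cross-validated R-squared.
search = RandomizedSearchCV(
    estimator=RandomForestRegressor(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,
    scoring="r2",
    cv=3,
    random_state=0,
)
search.fit(X, y)

# With the default refit=True, best_estimator_ is refit on all of X, y using the best parameters.
best_model = search.best_estimator_
print("Best parameters:", search.best_params_)
print("Best cross-validated R-squared:", round(search.best_score_, 3))
```

Random search samples a fixed number of combinations instead of trying every one, which keeps the cost manageable when the grid is large.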
Possible future improvements include:
- Incorporating additional features like car brand and model.
- Exploring advanced machine learning algorithms like Gradient Boosting and XGBoost.
- Enhancing data quality by collecting more recent car listings.
- Deploying the model in a web application for real-time predictions.
- Applying advanced feature engineering techniques.
This project successfully demonstrated the use of machine learning algorithms to predict car prices. The Random Forest Regressor provided the best performance among the models tested. Future enhancements can further improve the accuracy and usability of the model.
Feel free to fork this repository and contribute by submitting a pull request. For major changes, please open an issue to discuss what you would like to change.
This project is licensed under the MIT License.
- Thanks to Kaggle for providing the dataset.
- Special thanks to all the contributors of the libraries used in this project.
Developed by Harish Prasad Semwal
Email: harishsemwal581@gmail.com