This project classifies pulsar star candidates using a Support Vector Machine (SVM), a supervised learning algorithm. The goal is to automate the identification of pulsar stars among the candidates collected during surveys.
- Datasets: Holds the processed and raw datasets.
  - Processed_data: Contains processed data ready for analysis.
  - Raw_data: Contains raw data files.
- v_pred_test: Stores predicted outcomes on test data.
- notebooks: Jupyter notebooks for Exploratory Data Analysis (EDA) and model training.
- venv: A virtual environment directory for project dependencies.
- .gitignore: Specifies untracked files to ignore.
- README.md: Provides an overview of the project.
- requirements.txt: Lists all the necessary Python packages.
To run this project, follow these steps:
- Make sure Python 3.8 or later is installed on your machine.
- Clone the repository to your local environment.
- Navigate to the project's root directory and set up a Python virtual environment:
    python -m venv venv
- Activate the virtual environment:
  On Windows:
    .\venv\Scripts\activate
  On macOS and Linux:
    source venv/bin/activate
- Install the required dependencies:
    pip install -r requirements.txt
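After activating the environment, a quick sanity check can confirm that the setup steps took effect. The sketch below is not part of this repository, and the function name `check_environment` is hypothetical. It uses only the standard library: `sys.version_info` for the interpreter version, and the `sys.prefix` versus `sys.base_prefix` comparison, which differs when Python runs inside an environment created with `python -m venv`.

```python
import sys


def check_environment(min_version=(3, 8)):
    """Return (version_ok, in_venv) for the running interpreter.

    Hypothetical helper for illustration; not part of this repository.
    """
    # Compare only major and minor against the required minimum.
    version_ok = sys.version_info[:2] >= tuple(min_version)
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base installation.
    in_venv = sys.prefix != sys.base_prefix
    return version_ok, in_venv


if __name__ == "__main__":
    version_ok, in_venv = check_environment()
    print(f"Python >= 3.8: {version_ok}")
    print(f"Virtual environment active: {in_venv}")
```

If the second value is False, the activation step was probably skipped in the current shell.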
To perform EDA or train the SVM model, open the Jupyter notebooks located in the notebooks directory:
- EDA_Test_Data.ipynb: For exploratory data analysis on test data.
- EDA_Train_Data.ipynb: For exploratory data analysis on training data.
- MODEL_TRAINING.ipynb: For training the SVM model.
Run the notebooks sequentially to explore the data and train the model.
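For readers who want a feel for the training step before opening the notebooks, here is a minimal sketch of fitting an SVM classifier with scikit-learn. It is an illustration under stated assumptions, not the code in MODEL_TRAINING.ipynb: the data is synthetic (two Gaussian blobs standing in for non-pulsar and pulsar candidates), the feature count of 8 is arbitrary, and the RBF kernel with `C=1.0` is simply scikit-learn's default rather than a setting taken from this project. It assumes `numpy` and `scikit-learn` are installed.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def make_synthetic_candidates(n_per_class=200, n_features=8, seed=0):
    """Generate stand-in data: two Gaussian blobs, label 1 = 'pulsar'.

    Synthetic placeholder only; the real features come from the project's CSV files.
    """
    rng = np.random.default_rng(seed)
    negatives = rng.normal(loc=-2.0, scale=1.0, size=(n_per_class, n_features))
    positives = rng.normal(loc=2.0, scale=1.0, size=(n_per_class, n_features))
    X = np.vstack([negatives, positives])
    y = np.concatenate(
        [np.zeros(n_per_class, dtype=int), np.ones(n_per_class, dtype=int)]
    )
    return X, y


def train_svm(X, y, seed=0):
    """Fit a scaled RBF-kernel SVM and return (model, held-out accuracy)."""
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    # SVMs are sensitive to feature scale, so standardize inside the pipeline.
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
    model.fit(X_train, y_train)
    accuracy = accuracy_score(y_val, model.predict(X_val))
    return model, accuracy


X, y = make_synthetic_candidates()
model, accuracy = train_svm(X, y)
print(f"Validation accuracy: {accuracy:.3f}")
```

Keeping the scaler and the classifier in one pipeline means the scaling statistics are learned from the training split only and are reused automatically at prediction time.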
The Datasets directory is organized as follows:
- Processed_data: Processed files like pulsar_data_test_processed.csv for use in modeling.
- Raw_data: The original, unprocessed data files.
Predictions from the test data are saved in v_pred_test with filenames indicating they are predictions, such as Pulsar_data_test_Predicted.csv.
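As a rough illustration of how predictions could be written to a file like the one named above, the sketch below saves a list of predicted labels to a CSV using only the standard library. The helper name `save_predictions` and the column names `index` and `predicted_class` are hypothetical; the actual layout of the project's prediction files is not described here. The demonstration writes into a temporary directory so that nothing in the repository is modified.

```python
import csv
import tempfile
from pathlib import Path


def save_predictions(predictions, output_path):
    """Write one predicted label per row, with a row index, to a CSV file.

    The column names "index" and "predicted_class" are hypothetical.
    """
    output_path = Path(output_path)
    # Create the target directory (e.g. v_pred_test) if it does not exist yet.
    output_path.parent.mkdir(parents=True, exist_ok=True)
    with output_path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["index", "predicted_class"])
        for i, label in enumerate(predictions):
            writer.writerow([i, int(label)])
    return output_path


# Demonstrate in a temporary directory so the repository is left untouched.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "v_pred_test" / "Pulsar_data_test_Predicted.csv"
    written = save_predictions([0, 1, 0, 0], target)
    with written.open(newline="") as handle:
        rows = list(csv.reader(handle))
```

Casting each label with `int` keeps the file free of NumPy-specific formatting when the predictions come from a scikit-learn model.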
If you'd like to contribute, please fork the repository and create a pull request with your features or changes.
This project is open-source software licensed under the MIT license.