This project aims to predict the self-noise sound pressure level of an airplane airfoil using machine learning models. The dataset, hosted in the UCI Machine Learning Repository and sourced from aerodynamic studies of the NACA 0012 airfoil, consists of key features related to the airfoil and its test conditions, such as frequency, angle of attack, chord length, free-stream velocity, and suction-side displacement thickness.
The main goal is to apply feature engineering techniques such as feature importance ranking and to fit the following machine learning models for regression analysis:
- Linear Regression
- Random Forest Regressor
- Data Preprocessing: The data is read from a `.dat` file and prepared for analysis, including defining the features and the target variable. The dataset is loaded using `pandas` and split into training and testing sets using `train_test_split`.
- Feature Engineering:
  - Feature Importance Scores (FIS): A `RandomForestRegressor` is used to determine the most influential features. Features are ranked by their importance scores, and the top-ranked ones are selected to improve model performance.
- Model Training:
  - Linear Regression: A `LinearRegression` model was fitted first to observe performance.
  - RandomForestRegressor: A `RandomForestRegressor` model is then trained using the top 3 selected features. It is a baseline model that leverages ensemble learning to predict the sound pressure level.
- Performance Evaluation: Model performance is evaluated using mean squared error (MSE) as the primary metric; a lower MSE means predictions are closer to the measured sound pressure levels.
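The feature ranking, top-3 selection, and MSE evaluation steps above can be sketched as follows. This is a minimal illustration, not the project's actual script: the helper names `rank_features` and `train_on_top_features`, the `n_estimators` and `random_state` values, and the synthetic stand-in data (columns `x0` to `x4`) are all assumptions made so the example runs without the dataset file.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def rank_features(X_train, y_train, random_state=0):
    """Fit a forest on all features and return importances, highest first."""
    forest = RandomForestRegressor(n_estimators=100, random_state=random_state)
    forest.fit(X_train, y_train)
    scores = pd.Series(forest.feature_importances_, index=X_train.columns)
    return scores.sort_values(ascending=False)


def train_on_top_features(X_train, y_train, X_test, y_test, top_k=3, random_state=0):
    """Train a second forest on the top_k ranked features and report test MSE."""
    ranking = rank_features(X_train, y_train, random_state)
    selected = list(ranking.index[:top_k])
    model = RandomForestRegressor(n_estimators=100, random_state=random_state)
    model.fit(X_train[selected], y_train)
    mse = mean_squared_error(y_test, model.predict(X_test[selected]))
    return model, selected, mse


# Synthetic stand-in data (hypothetical columns), NOT the airfoil dataset:
# only x0 and x1 carry signal, so they should rank highest.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 5)), columns=[f"x{i}" for i in range(5)])
y = 3.0 * X["x0"] - 2.0 * X["x1"] + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
model, selected, mse = train_on_top_features(X_train, y_train, X_test, y_test)
print("Selected features:", selected)
print("Test MSE:", mse)
```

Ranking on the training split only, as done here, keeps the test set out of the feature selection step, so the reported MSE is not flattered by information leaking from the test data.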
- `data/`: Contains the dataset `airfoil_self_noise.dat`.
- `notebooks/`: Jupyter notebook for exploratory data analysis and visualizations (`main_analysis.ipynb`).
- `models/`: Directory for storing the trained models (`trained_model.pkl`).
- `scripts/`: Python scripts for training and prediction (`train_model.py` and `predict.py`).
- `README.md`: Project overview and instructions.
- `requirements.txt`: Lists the required Python libraries for the project.
The dataset (`./data/airfoil_self_noise.dat`) used for this project covers the NACA 0012 airfoil tested at various wind tunnel speeds and angles of attack. Notably, the airfoil span and observer position remained the same across all experiments. It contains the following columns:
- Frequency: Frequency of the sound.
- Angle of Attack: Angle of attack of the airfoil.
- Chord Length: Chord length of the airfoil.
- Free-stream Velocity: Velocity of the air stream.
- Suction Side Displacement Thickness: Displacement thickness of the boundary layer on the suction side of the airfoil.
- Sound Pressure Level (SPL): The target variable (dependent variable) representing the sound pressure level.
| Variable Name | Role | Type | Description | Units | Missing Values |
|---|---|---|---|---|---|
| frequency | Feature | Integer | | Hz | no |
| attack-angle | Feature | Continuous | | deg | no |
| chord-length | Feature | Continuous | | m | no |
| free-stream-velocity | Feature | Continuous | | m/s | no |
| suction-side-displacement-thickness | Feature | Continuous | | m | no |
| scaled-sound-pressure | Target | Continuous | | dB | no |
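A minimal loading sketch for the columns listed above, assuming the `.dat` file is whitespace-delimited and has no header row. The helper name `load_airfoil` and the `COLUMNS` and `TARGET` constants are illustrative and may not match what the project's scripts use.

```python
import pandas as pd

# Column names taken from the variable table; the order is assumed to match the file.
COLUMNS = [
    "frequency",
    "attack-angle",
    "chord-length",
    "free-stream-velocity",
    "suction-side-displacement-thickness",
    "scaled-sound-pressure",
]
TARGET = "scaled-sound-pressure"


def load_airfoil(path):
    """Read the dataset and split it into a feature frame X and a target series y."""
    # Assumption: whitespace-separated values (tabs or spaces), no header line.
    df = pd.read_csv(path, sep=r"\s+", header=None, names=COLUMNS)
    X = df.drop(columns=TARGET)
    y = df[TARGET]
    return X, y
```

Calling `X, y = load_airfoil("data/airfoil_self_noise.dat")` would then give inputs ready to pass to `train_test_split`.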
1. Dependencies (`requirements.txt`)
Ensure all dependencies are installed in your Python virtual environment by running:
pip install -r requirements.txt
2. Exploratory Data Analysis (`main_analysis.ipynb`)
The Jupyter notebook provides visualizations and basic statistics of the dataset. Run it in a Jupyter environment to explore the distribution of features and relationships between them.
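As a rough idea of the kind of basic statistics such a notebook might compute, the sketch below summarizes each column and ranks features by their correlation with the target. The `summarize` helper and the synthetic frame with placeholder columns `a`, `b`, and `target` are assumptions for illustration, not the notebook's contents.

```python
import numpy as np
import pandas as pd


def summarize(df, target):
    """Return per-column summary statistics and each feature's correlation with the target."""
    stats = df.describe().T[["mean", "std", "min", "max"]]
    corr = (
        df.corr()[target]
        .drop(target)
        .sort_values(key=lambda s: s.abs(), ascending=False)
    )
    return stats, corr


# Synthetic stand-in frame; these column names are placeholders, not the dataset's.
rng = np.random.default_rng(0)
demo = pd.DataFrame({"a": rng.normal(size=100), "b": rng.normal(size=100)})
demo["target"] = 2.0 * demo["a"] + rng.normal(scale=0.1, size=100)

stats, corr = summarize(demo, "target")
print(stats)
print(corr)
```

The correlation used here is Pearson's, which only captures linear relationships, so a weakly correlated feature can still matter to a tree-based model.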
3. Training the Model (`train_model.py`)
To train the model, run the following command:
python scripts/train_model.py
This script loads the data, performs feature selection, trains the `RandomForestRegressor`, saves the trained model, and outputs the feature importance ranking and MSE.
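The model-saving step could look roughly like the sketch below. It assumes the `.pkl` file is written with Python's standard `pickle` module; the actual script may use a different serializer such as `joblib`, and the `save_model` helper name is hypothetical.

```python
import pickle
from pathlib import Path


def save_model(model, path="models/trained_model.pkl"):
    """Serialize a fitted model to disk, creating the target directory if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as fh:
        pickle.dump(model, fh)
    return path
```

A pickled scikit-learn model is generally only safe to reload with the same library version that produced it, so pinning versions in `requirements.txt` helps keep saved models usable.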
4. Making Predictions (`predict.py`)
To use the trained model for making predictions, run:
python scripts/predict.py
This script loads the trained model from `./models/trained_model.pkl`, accepts new input data, and prints the predicted SPL.
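A minimal sketch of the load-and-predict flow, assuming the model was saved with `pickle` and exposes a scikit-learn style `predict` method. The helper names `load_model` and `predict_spl` are illustrative, and how the real script accepts input is not specified here.

```python
import pickle


def load_model(path="models/trained_model.pkl"):
    """Load a pickled model. Only unpickle files from a trusted source."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def predict_spl(model, rows):
    """Predict SPL in dB for each input row.

    Assumption: every row lists the feature values in the same order,
    and for the same selected features, that the model was trained on.
    """
    return [float(value) for value in model.predict(rows)]
```

Usage would then be along the lines of `predict_spl(load_model(), rows)`, where `rows` is a list of feature rows matching the features selected at training time.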