Soiligator is a machine learning project that predicts whether irrigation is needed from soil and environmental data (soil moisture, temperature, and humidity). It combines feature engineering with three classifiers to support more efficient and sustainable water use in agriculture.
- Predictive Models: Utilizes Logistic Regression, Random Forest, and Support Vector Machine (SVM) algorithms for accurate irrigation predictions.
- Feature Engineering: Incorporates non-linear interaction terms and outlier handling for enhanced model performance.
- Scalable Design: Easily extendable to include additional features like soil type and crop variety.
- Data Resilience: Evaluated on data with label noise and synthetic outliers (each applied to 5% of the dataset) to check robustness under realistic conditions.
- Overview
- Key Features
- Installation
- Usage
- Data Description
- Model Training and Evaluation
- Results
- Future Work
To use this project, install the required Python packages with the following command:
pip install -r requirements.txt
- pandas: Data manipulation and analysis
- numpy: Numerical operations
- matplotlib & seaborn: Data visualization
- scikit-learn: Machine learning model training and evaluation
Alternatively, install the libraries manually:
pip install pandas numpy matplotlib seaborn scikit-learn
Start by loading the dataset modified_irrigation_dataset.csv, which includes:
- Moisture: Soil moisture content.
- Temperature: Ambient temperature.
- Humidity: Air humidity level.
- Irrigation_Needed: Target label indicating whether irrigation is required.
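A minimal sketch of the loading step is shown below. The helper `load_irrigation_data` and the column check are hypothetical and not taken from the notebook; the inline sample values are made up and only stand in for modified_irrigation_dataset.csv, and the 0/1 encoding of Irrigation_Needed is an assumption.

```python
import io

import pandas as pd

# Column names as listed above.
EXPECTED_COLUMNS = ["Moisture", "Temperature", "Humidity", "Irrigation_Needed"]


def load_irrigation_data(source):
    """Read the CSV and check that the documented columns are present.

    Hypothetical helper: `source` can be a file path or a file-like object.
    """
    df = pd.read_csv(source)
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"Missing expected columns: {missing}")
    return df


# Tiny inline sample with made-up values, used here instead of the real file.
sample_csv = io.StringIO(
    "Moisture,Temperature,Humidity,Irrigation_Needed\n"
    "12.5,31.0,40.0,1\n"
    "55.0,22.5,70.0,0\n"
)
df = load_irrigation_data(sample_csv)
print(df.head())
```

With the real file, the call would presumably be `load_irrigation_data("modified_irrigation_dataset.csv")`.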
The implementation is available in a Jupyter Notebook: soil_analysis.ipynb. Execute the cells sequentially to:
- Load and preprocess the dataset.
- Engineer additional features.
- Train machine learning models.
- Evaluate and compare model performance.
The dataset comprises features representing soil and environmental conditions:
- Moisture: Measures the water content in the soil (0–100%).
- Temperature: Ambient temperature in degrees Celsius.
- Humidity: Air humidity as a percentage (0–100%).
- Moisture_Temp_Interaction: Interaction term between soil moisture and temperature to capture non-linear effects.
- Humidity_Squared: Non-linear transformation of humidity to account for atmospheric retention properties.
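The two engineered features can be sketched as follows. This assumes the interaction term is the plain product of moisture and temperature and that Humidity_Squared is humidity raised to the second power; the notebook may compute them differently, and `add_engineered_features` is a hypothetical helper name.

```python
import pandas as pd


def add_engineered_features(df):
    """Return a copy of df with the two engineered columns added.

    Assumption: interaction = Moisture * Temperature, squared term = Humidity ** 2.
    """
    out = df.copy()
    out["Moisture_Temp_Interaction"] = out["Moisture"] * out["Temperature"]
    out["Humidity_Squared"] = out["Humidity"] ** 2
    return out


# Made-up example rows for illustration only.
demo = pd.DataFrame(
    {
        "Moisture": [10.0, 50.0],
        "Temperature": [30.0, 20.0],
        "Humidity": [40.0, 70.0],
    }
)
features = add_engineered_features(demo)
print(features)
```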
- Outliers: Synthetic outliers introduced in 5% of the data to test model resilience.
- Label Noise: Added noise to 5% of target labels to simulate real-world conditions.
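One way the 5% outliers and 5% label noise could be introduced is sketched below. The source does not say how the corruption was generated, so the scaling factor, the random seeds, the helper names, and the 0/1 label encoding are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def inject_outliers(df, column, fraction=0.05, scale=5.0, seed=0):
    """Multiply a random `fraction` of rows in `column` by `scale` (assumed scheme)."""
    rng = np.random.default_rng(seed)
    out = df.copy()
    n = int(round(fraction * len(out)))
    idx = rng.choice(out.index.to_numpy(), size=n, replace=False)
    out.loc[idx, column] = out.loc[idx, column] * scale
    return out, idx


def flip_labels(df, target="Irrigation_Needed", fraction=0.05, seed=1):
    """Flip a random `fraction` of binary 0/1 labels (assumed encoding)."""
    rng = np.random.default_rng(seed)
    out = df.copy()
    n = int(round(fraction * len(out)))
    idx = rng.choice(out.index.to_numpy(), size=n, replace=False)
    out.loc[idx, target] = 1 - out.loc[idx, target]
    return out, idx


# Synthetic stand-in data; not the project's dataset.
rng = np.random.default_rng(123)
clean = pd.DataFrame(
    {
        "Moisture": rng.uniform(5, 100, 200),
        "Irrigation_Needed": rng.integers(0, 2, 200),
    }
)
with_outliers, outlier_idx = inject_outliers(clean, "Moisture")
noisy, flipped_idx = flip_labels(with_outliers)
print(len(outlier_idx), "outliers,", len(flipped_idx), "flipped labels")
```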
- Outlier Handling: Removes or neutralizes extreme values.
- Feature Scaling: Standardizes features using StandardScaler for optimal model performance.
- Train-Test Split: Splits the data into 80% training and 20% testing subsets.
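The preprocessing steps above can be sketched as follows. The source does not specify how extreme values are "removed or neutralized", so IQR-based clipping is used here purely as an assumption; the data is synthetic, and the split is done before fitting the clipping bounds and the scaler so that no test-set statistics leak into training.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data with the documented raw features.
rng = np.random.default_rng(42)
X = pd.DataFrame(
    {
        "Moisture": rng.uniform(0, 100, 200),
        "Temperature": rng.normal(25, 5, 200),
        "Humidity": rng.uniform(0, 100, 200),
    }
)
y = (X["Moisture"] < 40).astype(int)  # made-up labelling rule for the demo

# 80% training / 20% testing split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Outlier handling (assumed approach): clip to 1.5 * IQR fences from the training set.
q1 = X_train.quantile(0.25)
q3 = X_train.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
X_train_clipped = X_train.clip(lower=lower, upper=upper, axis=1)
X_test_clipped = X_test.clip(lower=lower, upper=upper, axis=1)

# Feature scaling: fit on the training data only, then apply to both subsets.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train_clipped)
X_test_scaled = scaler.transform(X_test_clipped)
print(X_train_scaled.shape, X_test_scaled.shape)
```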
- Logistic Regression: A baseline model for binary classification.
- Random Forest Classifier: An ensemble learning model for handling complex patterns.
- Support Vector Machine (SVM): A robust classifier for high-dimensional data.
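Training the three classifiers might look like the sketch below. The hyperparameters shown (`max_iter`, `n_estimators`, `probability=True`) are illustrative assumptions rather than the notebook's settings, and `make_classification` generates synthetic data in place of the irrigation dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic binary classification data with five features.
X, y = make_classification(
    n_samples=300, n_features=5, n_informative=3, n_redundant=0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Standardize using statistics from the training subset only.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Assumed hyperparameters; tune these for real data.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": SVC(probability=True, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # test-set accuracy
    print(f"{name}: accuracy = {scores[name]:.3f}")
```

Setting `probability=True` on the SVM makes `predict_proba` available for ROC and precision-recall curves, at the cost of slower training.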
- Accuracy: Overall correctness of predictions.
- Confusion Matrix: Breakdown of true positives, false positives, true negatives, and false negatives.
- ROC Curve and AUC Score: Measures the model's ability to distinguish between classes.
- Precision-Recall Curve: Shows the trade-off between precision and recall, which is especially informative when the classes are imbalanced.
- Classification Report: Includes precision, recall, F1-score, and support.
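The metrics listed above are all available in `sklearn.metrics`. The following sketch computes them for a single logistic regression model on synthetic data; it is an illustration of the calls involved, not a reproduction of the notebook's evaluation code.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    precision_recall_curve,
    roc_auc_score,
    roc_curve,
)
from sklearn.model_selection import train_test_split

# Synthetic data standing in for the irrigation dataset.
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_test)
y_score = model.predict_proba(X_test)[:, 1]  # probability of the positive class

accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)  # rows: true class, columns: predicted class
auc = roc_auc_score(y_test, y_score)
fpr, tpr, _ = roc_curve(y_test, y_score)
precision, recall, _ = precision_recall_curve(y_test, y_score)

print(f"Accuracy: {accuracy:.3f}, AUC: {auc:.3f}")
print(cm)
print(classification_report(y_test, y_pred))
```

The `fpr`/`tpr` and `precision`/`recall` arrays can be passed directly to matplotlib to draw the ROC and precision-recall curves.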
- Logistic Regression: Achieved baseline performance with moderate accuracy.
- Random Forest: Outperformed other models, achieving high accuracy and robustness to noise and outliers.
- SVM: Demonstrated strong performance on standardized features but required longer training times.
- Confusion Matrix: Provided for each model to analyze prediction errors.
- ROC Curves: Highlighted the trade-offs between sensitivity and specificity.
- Precision-Recall Curves: Demonstrated model effectiveness on imbalanced datasets.
- Hyperparameter Tuning: Optimize models using Grid Search or Random Search to improve accuracy.
- Feature Expansion: Include additional predictors such as:
  - Soil type
  - Crop type
  - Real-time weather forecasts
- Time-Series Analysis: Incorporate temporal data to predict irrigation needs over time.
- Deployment: Package the model into a web or mobile application for practical use by farmers and agricultural experts.
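As a starting point for the hyperparameter tuning item above, a grid search over a Random Forest could be sketched like this. The parameter grid, the 3-fold cross-validation, and the synthetic data are assumptions chosen to keep the example small; a real search would use the irrigation dataset and a wider grid.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic data standing in for the irrigation dataset.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Small illustrative grid; values are assumptions, not tuned recommendations.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 5],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid=param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print(f"Best cross-validated accuracy: {search.best_score_:.3f}")
```

`RandomizedSearchCV` from the same module offers the Random Search alternative and samples from the grid instead of trying every combination.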
Contributions are welcome! Please fork the repository, make your changes, and submit a pull request. For any queries, feel free to contact the project owner.