Skip to content

Executed complex SQL queries to assess patient health metrics, focusing on BMI and glucose levels. Improved data integrity and query performance through schema enhancements and indexing. Created patient views and ranked data by glucose levels for insightful analysis.

Notifications You must be signed in to change notification settings

meabhaykr/Diabetes_Prediction_SQL_Analysis

Repository files navigation

Diabetes Prediction SQL Analysis


Poster.png

🚀 Overview

The Diabetes Prediction Project is an initiative aimed at analyzing patient health data to predict diabetes prevalence and identify key health indicators for diabetic patients. This project leverages SQL for database management and data retrieval, allowing efficient exploration and manipulation of patient health records for various analytical purposes.

📋 Table of Contents

✨ Features

  • Efficient retrieval and analysis of patient information
  • Complex SQL queries for health data insights
  • Data schema improvements to ensure data integrity
  • SQL query optimization techniques for enhanced performance

🛠️ Database Setup

  1. Clone the repository:
    https://github.com/meabhaykr/Diabetes_Prediction_SQL_Analysis
  2. Connect to your SQL database and set up the required tables using the provided schema.
  3. Import the patient data into the database to enable the analysis.

📊 Data Analysis Tasks

The following SQL-based tasks were executed for comprehensive analysis:

  1. Retrieve patient IDs and ages of all patients.
  2. Select all female patients older than 40.
  3. Calculate the average BMI of patients.
  4. List patients by descending blood glucose levels.
  5. Identify patients with both hypertension and diabetes.
  6. Determine the count of patients with heart disease.
  7. Group patients by smoking history and count smokers vs. non-smokers.
  8. Retrieve Patient IDs with BMI above the average.
  9. Find patients with the highest and lowest HbA1c levels.
  10. Calculate patient age in years from their birthdate.
  11. Rank patients by blood glucose levels within each gender.
  12. Update the smoking history of patients older than 50 to "Ex-smoker."
  13. Insert new patient data with sample values.
  14. Delete all patients with heart disease.
  15. Find patients with hypertension but not diabetes using the EXCEPT operator.
  16. Define a unique constraint on the "patient_id" column to enforce uniqueness.
  17. Create a view displaying Patient IDs, ages, and BMI values.

⚙️ Performance Optimization

To improve SQL query performance, the following methods were implemented:

  • Indexing: Indexes were added to columns frequently used in WHERE clauses and JOIN conditions.
  • Selective Columns: Reduced data transfer by selecting only necessary columns in queries.
  • Efficient WHERE Clauses: Ensured WHERE clauses were optimized without functions on indexed columns.
  • JOIN Optimization: Utilized the most suitable JOIN types and optimized join conditions.
  • Statistics Update: Regular updates to table statistics for more effective query planning.

🛠️ Future Improvements

  • Normalization: Apply normalization principles to reduce data redundancy.
  • Foreign Keys: Establish foreign key relationships for data integrity.
  • Audit Trails: Implement audit trails to track data changes over time.
  • Constraints and Defaults: Use default values and constraints for data integrity.
  • Index Optimization: Maintain a balance with indexing for frequently queried fields.

About

Executed complex SQL queries to assess patient health metrics, focusing on BMI and glucose levels. Improved data integrity and query performance through schema enhancements and indexing. Created patient views and ranked data by glucose levels for insightful analysis.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published