## Project Overview

The Diabetes Prediction Project analyzes patient health data to predict diabetes prevalence and identify the key health indicators associated with diabetic patients. It leverages SQL for database management and data retrieval, enabling efficient exploration and manipulation of patient health records for a range of analytical tasks.
## Table of Contents

- Project Overview
- Features
- Database Setup
- Data Analysis Tasks
- Performance Optimization
- Future Improvements
- Contributing
- License
## Features

- Efficient retrieval and analysis of patient information
- Complex SQL queries for health data insights
- Data schema improvements to ensure data integrity
- SQL query optimization techniques for enhanced performance
## Database Setup

1. Clone the repository: `https://github.com/meabhaykr/Diabetes_Prediction_SQL_Analysis`
2. Connect to your SQL database and create the required tables using the provided schema.
3. Import the patient data into the database to enable the analysis.
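The repository's schema file defines the actual table layout; as a rough illustration, the kind of `patients` table the steps above assume might look like the following. The column names here are guesses based on the analysis tasks, and SQLite (via Python's built-in `sqlite3`) is used only because it requires no server setup:

```python
import sqlite3

# Hypothetical schema -- the repository's actual column names may differ.
SCHEMA = """
CREATE TABLE patients (
    patient_id          INTEGER PRIMARY KEY,
    gender              TEXT,
    birthdate           TEXT,      -- ISO-8601 date string
    age                 INTEGER,
    hypertension        INTEGER,   -- 0/1 flag
    heart_disease       INTEGER,   -- 0/1 flag
    smoking_history     TEXT,
    bmi                 REAL,
    hba1c_level         REAL,
    blood_glucose_level REAL,
    diabetes            INTEGER    -- 0/1 flag
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# One sample row to confirm the import path works end to end.
conn.execute(
    "INSERT INTO patients VALUES (1, 'Female', '1980-05-14', 44, 1, 0, "
    "'never', 27.3, 6.1, 140, 0)"
)
print(conn.execute("SELECT COUNT(*) FROM patients").fetchone()[0])
```

In practice the data would be bulk-imported (e.g. from CSV) rather than inserted row by row.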
## Data Analysis Tasks

The following SQL-based tasks were executed for comprehensive analysis:
- Retrieve patient IDs and ages of all patients.
- Select all female patients older than 40.
- Calculate the average BMI of patients.
- List patients by descending blood glucose levels.
- Identify patients with both hypertension and diabetes.
- Determine the count of patients with heart disease.
- Group patients by smoking history and count smokers vs. non-smokers.
- Retrieve patient IDs of patients whose BMI is above the average.
- Find patients with the highest and lowest HbA1c levels.
- Calculate patient age in years from their birthdate.
- Rank patients by blood glucose levels within each gender.
- Update the smoking history of patients older than 50 to "Ex-smoker."
- Insert new patient data with sample values.
- Delete all patients with heart disease.
- Find patients with hypertension but not diabetes using the EXCEPT operator.
- Define a unique constraint on the "patient_id" column to enforce uniqueness.
- Create a view displaying patient IDs, ages, and BMI values.
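A few of the tasks above can be sketched as queries against a toy dataset. This uses SQLite through Python's built-in `sqlite3` purely for illustration; the table and column names are assumptions, not the project's actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE patients (
    patient_id INTEGER PRIMARY KEY,
    gender TEXT, age INTEGER,
    hypertension INTEGER, diabetes INTEGER,
    bmi REAL, blood_glucose_level REAL
)""")
conn.executemany(
    "INSERT INTO patients VALUES (?,?,?,?,?,?,?)",
    [
        (1, "Female", 52, 1, 1, 31.4, 190),
        (2, "Female", 35, 0, 0, 22.0, 110),
        (3, "Male",   61, 1, 0, 28.9, 150),
        (4, "Male",   47, 0, 1, 35.2, 210),
    ],
)

# Female patients older than 40
rows = conn.execute(
    "SELECT patient_id FROM patients WHERE gender = 'Female' AND age > 40"
).fetchall()
print(rows)  # [(1,)]

# Average BMI across all patients
avg_bmi = conn.execute("SELECT AVG(bmi) FROM patients").fetchone()[0]

# Patients with BMI above the average, via a scalar subquery
above = conn.execute(
    "SELECT patient_id FROM patients "
    "WHERE bmi > (SELECT AVG(bmi) FROM patients)"
).fetchall()

# Rank patients by blood glucose within each gender (window function)
ranked = conn.execute("""
    SELECT patient_id, gender,
           RANK() OVER (PARTITION BY gender
                        ORDER BY blood_glucose_level DESC) AS rnk
    FROM patients
""").fetchall()

# Hypertension but not diabetes, via EXCEPT
hyp_not_dia = conn.execute("""
    SELECT patient_id FROM patients WHERE hypertension = 1
    EXCEPT
    SELECT patient_id FROM patients WHERE diabetes = 1
""").fetchall()
print(hyp_not_dia)  # [(3,)]
```

The `RANK() OVER (PARTITION BY ...)` form requires a reasonably recent SQLite (3.25+); on older database engines the same ranking would need a correlated subquery.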
## Performance Optimization

To improve SQL query performance, the following methods were implemented:
- Indexing: Indexes were added to columns frequently used in WHERE clauses and JOIN conditions.
- Selective Columns: Reduced data transfer by selecting only necessary columns in queries.
- Efficient WHERE Clauses: Ensured WHERE clauses were optimized without functions on indexed columns.
- JOIN Optimization: Utilized the most suitable JOIN types and optimized join conditions.
- Statistics Update: Regular updates to table statistics for more effective query planning.
## Future Improvements

- Normalization: Apply normalization principles to reduce data redundancy.
- Foreign Keys: Establish foreign key relationships for data integrity.
- Audit Trails: Implement audit trails to track data changes over time.
- Constraints and Defaults: Use default values and constraints for data integrity.
- Index Optimization: Maintain a balance with indexing for frequently queried fields.
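Several of these improvements can be sketched together: a lookup table for smoking history (normalization), a foreign key, a `CHECK` constraint, and a column default. The schema below is hypothetical, and note that SQLite requires foreign-key enforcement to be switched on explicitly:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite

conn.executescript("""
-- Normalized: smoking history lives in its own lookup table
CREATE TABLE smoking_history (
    smoking_id INTEGER PRIMARY KEY,
    label      TEXT UNIQUE NOT NULL
);
CREATE TABLE patients (
    patient_id INTEGER PRIMARY KEY,
    age        INTEGER NOT NULL DEFAULT 0 CHECK (age >= 0),
    smoking_id INTEGER REFERENCES smoking_history(smoking_id)
);
""")
conn.execute(
    "INSERT INTO smoking_history (label) VALUES ('never'), ('Ex-smoker')"
)
conn.execute("INSERT INTO patients (age, smoking_id) VALUES (55, 2)")

# The column default fills in age when it is omitted:
conn.execute("INSERT INTO patients (smoking_id) VALUES (1)")

# The foreign key rejects dangling references:
rejected = False
try:
    conn.execute("INSERT INTO patients (age, smoking_id) VALUES (30, 99)")
except sqlite3.IntegrityError:
    rejected = True
print("dangling reference rejected:", rejected)
```

An audit trail would typically be a separate table populated by `AFTER UPDATE`/`AFTER DELETE` triggers on `patients`, recording the old values and a timestamp.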