
This repository contains an exploratory data analysis (EDA) of the Titanic dataset. Key analyses include survival rates by gender, passenger class, age distribution, family size, and correlation heatmaps.


AshishSingh789/Titanic_Dataset_EDA_and_Visualization


Titanic Dataset EDA and Visualization

Project Overview

This project involves performing data cleaning and exploratory data analysis (EDA) on the Titanic dataset from Kaggle. The objective is to explore relationships between variables and identify patterns and trends in the data.
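
The README does not list the cleaning steps used in analysis.py, so the following is only a minimal sketch of what a typical cleaning pass on this dataset might look like. It assumes the standard Kaggle column names (`Age`, `Embarked`, `Cabin`); the `clean` helper, the choice of median/mode imputation, and the tiny in-memory frame with made-up values are all illustrative assumptions, not the author's method.

```python
import pandas as pd

# Tiny stand-in for titanic.csv so the sketch runs on its own.
# The values are made up; only the column names follow the Kaggle dataset.
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1],
    "Pclass": [3, 1, 3, 2, 1],
    "Sex": ["male", "female", "female", "male", "female"],
    "Age": [22.0, 38.0, None, 35.0, None],
    "Fare": [7.25, 71.28, 7.92, 13.0, 53.1],
    "Cabin": [None, "C85", None, None, "C123"],
    "Embarked": ["S", "C", None, "S", "S"],
})


def clean(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning step: impute Age and Embarked, drop Cabin."""
    out = frame.copy()
    # Fill missing ages with the median age of the known values.
    out["Age"] = out["Age"].fillna(out["Age"].median())
    # Fill missing embarkation ports with the most frequent port.
    out["Embarked"] = out["Embarked"].fillna(out["Embarked"].mode()[0])
    # Cabin is mostly empty in the Kaggle data, so this sketch drops it.
    out = out.drop(columns=["Cabin"])
    return out


cleaned = clean(df)
print(cleaned.isna().sum())
```

With the real file, the stand-in frame would be replaced by `pd.read_csv("titanic.csv")`, assuming the dataset sits next to the script.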

Problem Statement

The goal is to analyze the Titanic dataset to:

- Explore the distribution of variables like age, gender, and class.
- Understand survival rates based on various factors such as gender, passenger class, and family size.
- Visualize relationships between variables to identify significant patterns.

Key Analyses and Visualizations

- Age Distribution: Visualized the distribution of ages among passengers.
- Survivors vs Non-Survivors: Compared survival outcomes based on multiple factors.
- Survival Rate by Gender: Analyzed how survival rates differed between male and female passengers.
- Survival Rate by Passenger Class: Explored the survival rates across different passenger classes (1st, 2nd, 3rd).
- Survival by Family Size: Investigated the relationship between family size and survival chances.
- Correlation Heatmap: Created a heatmap to examine the correlation between numerical variables in the dataset.
- Fare vs Survival: Analyzed whether higher ticket fares were associated with higher survival rates.
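
Most of the analyses above reduce to grouped means and a correlation matrix. The sketch below shows one way they could be computed with pandas, assuming the standard Kaggle column names (`Survived`, `Pclass`, `Sex`, `SibSp`, `Parch`, `Fare`). The `FamilySize` column and its definition (siblings/spouses plus parents/children plus the passenger) are an assumption based on a common convention, and the rows are made up, so the resulting numbers say nothing about the real dataset.

```python
import pandas as pd

# Made-up rows using the Kaggle column names; not real passenger data.
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 0, 0, 1],
    "Pclass":   [3, 1, 3, 2, 1, 3, 2, 1],
    "Sex":      ["male", "female", "female", "male",
                 "female", "male", "male", "male"],
    "SibSp":    [1, 1, 0, 0, 1, 0, 0, 0],
    "Parch":    [0, 0, 0, 0, 0, 2, 0, 1],
    "Fare":     [7.25, 71.28, 7.92, 13.0, 53.1, 21.07, 10.5, 26.55],
})

# Survived is 0/1, so its mean within a group is that group's survival rate.
rate_by_sex = df.groupby("Sex")["Survived"].mean()
rate_by_class = df.groupby("Pclass")["Survived"].mean()

# Assumed definition: family size counts the passenger plus relatives aboard.
df["FamilySize"] = df["SibSp"] + df["Parch"] + 1
rate_by_family = df.groupby("FamilySize")["Survived"].mean()

# Pairwise Pearson correlation between the numeric columns only;
# this matrix is what a heatmap would display.
corr = df.select_dtypes(include="number").corr()

print(rate_by_sex, rate_by_class, rate_by_family, corr, sep="\n\n")
```

A matrix like `corr` can be passed to `seaborn.heatmap(corr, annot=True)` to draw the heatmap, and the `Fare` row of the matrix gives a first look at the fare versus survival relationship.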

Libraries Used

- pandas: For data manipulation and cleaning.
- matplotlib: For creating static visualizations.
- seaborn: For advanced visualizations and plots.
- numpy: For numerical operations.

Install required libraries:

```bash
pip install pandas matplotlib seaborn numpy
```

Run the analysis and visualizations:

```bash
python analysis.py
```

Files Included

- analysis.py: The script containing the EDA and visualizations.
- titanic.csv: The dataset used for analysis (optional if dataset not included).
- output/: Directory with images of generated visualizations.
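
As an illustration of how a figure could end up in output/, here is a minimal sketch that draws a survival-rate bar chart with matplotlib and saves it as a PNG. The file name `survival_by_gender.png`, the plot styling, and the made-up rows are assumptions for the example; the actual analysis.py may organize its plotting differently.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # render to files without needing a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows; with the real data this would come from titanic.csv.
df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "female", "male"],
    "Survived": [0, 1, 1, 0, 0, 1],
})
rates = df.groupby("Sex")["Survived"].mean()

# Create the output directory if it does not exist yet.
out_dir = Path("output")
out_dir.mkdir(exist_ok=True)

fig, ax = plt.subplots(figsize=(5, 4))
rates.plot(kind="bar", ax=ax)
ax.set_xlabel("Sex")
ax.set_ylabel("Survival rate")
ax.set_ylim(0, 1)
ax.set_title("Survival Rate by Gender")
fig.tight_layout()

# Hypothetical file name, chosen only for this example.
out_path = out_dir / "survival_by_gender.png"
fig.savefig(out_path, dpi=150)
plt.close(fig)  # free the figure so repeated plots do not accumulate
```

Saving through the figure object and closing it afterwards keeps a script that produces many plots from holding every figure in memory.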

Conclusion

Through EDA, we uncovered interesting trends such as the higher survival rates of women and first-class passengers, and we visualized important relationships between key variables in the Titanic dataset.

Glimpse of Data Analysis and Visualization

Survival Rate by Gender

Survival Rate by Passenger Class (Pclass)

Survival by Port of Embarkation (Embarked)

Survival by Family Size

Age Distribution: Survivors vs Non-Survivors

Correlation Heatmap
