The COVID-19 pandemic, which swept across the globe from early 2020 onwards, had profound and far-reaching effects on societies, economies, and healthcare systems worldwide. It led to widespread disruptions in daily life, severe strains on healthcare infrastructure, and significant socio-economic impacts.
This analysis report examines the trajectory of confirmed cases, fatalities, and recoveries from COVID-19 across different countries during the critical period of 2020 to 2021, shedding light on the varying impacts and responses observed globally.
- Organization: Mentorness
- Type: Remote
- Duration: 26/05/2024 to 26/06/2024
- Number of tasks: 3
This project is done as a part of Mentorness Internship from 01/06/2024 to 12/06/2024. This is the task 2 where I am analyzing the data about the number of confirmed, dead and recovered cases of Corona Virus pandemic in various countries recorded between January 2020 to June 2021.
"The CORONA VIRUS pandemic has had a significant impact on public health and has created an urgent need for data-driven insights to understand the spread of the virus. As a data analyst, you have been tasked with analyzing a CORONA VIRUS dataset to derive meaningful insights and present your findings."
- Project Source Code: SQL file with queries
- Dashboard
- Presentation Report
- Video Presentation explaining the approach towards dataset, problem statement, inferences drawn, final outcome and future scope.
- Microsoft Excel: For Data Cleaning and Transforming
- Microsoft SQL Server: For Data Analysis
- Power BI: For Data Visualization and Dashboard
- Canva: For creating Project Report
- GitHub: Documenting the result
- Data Cleaning
- Data Analysis
- Data Visualization
- Presentation
- Documentation
The Dataset contains 78387 rows and 8 columns that provides information about the number of confirmed, dead and recovered cases of Corona Virus pandemic in various countries recorded between January 2020 to June 2021.
Column Name | Datatype | Description |
---|---|---|
Province | Str | Geographic subdivision within a country/region |
Country/Region | Str | Geographic entity where data is recorded |
Latitude | Str | North-south position on Earth's surface |
Longitude | Str | East-west position on Earth's surface |
Date | Str | Recorded date of CORONA VIRUS data |
Confirmed | Str | Number of diagnosed CORONA VIRUS cases |
Deaths | Str | Number of CORONA VIRUS related deaths |
Recovered | Str | Number of recovered CORONA VIRUS cases |
- I created a copy of the original CSV file as an Excel worksheet.
- First I checked for null values using a Conditional Formatting. It returned zero null values.
- Then I checked for duplicate entries using Remove Duplicate tool. There were no duplicate entries.
- I changed the format of the Date column from General to Date.
- Similarly I changed the format of Confirmed, Deaths and Recovered columns from General to Number, with no decimal places.
- Finally I saved and imported this Excel workbook into Microsoft SQL Server.
I am tasked to find answers to these 16 questions using SQL queries.
There were no Null values present in the dataset.
The data spans from January 2020 to June 2021, which gives us total 18 months of data, instead of general 12 months which aggregates the data based on 12 months a year which does not take into account the months from different years (2020 and 2021).
The mode is the value that appears most frequently in a set of data. In the context of the COVID-19, a mode of 0 means that on more days than any other number, there were no new confirmed cases reported.
This could indicate that there were several days with no spread of the infection or that no new cases were detected on those days. It’s a positive sign when the mode is 0, as it suggests there were periods with potentially very low transmission rates.
Q11. Check how Corona Virus spread out with respect to confirmed case per month (e.g., total confirmed cases, their average, variance & Standard Deviation).
The monthly summary statistics give us a detailed look at the spread of COVID-19 over time. They include several different numbers that help us understand the situation better:
-
Total count: This is the total number of confirmed cases for a given month. For example, in April, there were 24,047,819 confirmed cases, which was the highest for any month.
-
Mean: This is the average number of cases per day in a month. December had the highest daily average, meaning that, on average, more people were confirmed to have COVID-19 each day in December than in other months.
-
Range: This tells us the difference between the smallest and largest number of cases in a month. The minimum number of cases in any month was 0, and the maximum number varied, with the highest being 823,225 in December.
-
Median: This is the middle value of confirmed cases when they are arranged in order. For all months, the median was 23, indicating that half the days had fewer than 23 cases, and the other half had more.
-
Variance and Standard Deviation: These are statistical measures of how much the number of cases varies from day to day within a month. A higher variance and standard deviation mean that the number of cases was very inconsistent, with a lot of fluctuation from day to day. Most months had high variance and standard deviation, indicating that the number of cases each day was quite unpredictable.
In simple terms, these statistics tell us that April saw the most cases overall, December was the worst on average each day, and there was a lot of variation in the number of cases from day to day in most months. The situation was quite volatile, with some days having very few cases and others having a very high number of cases.
Q12. Check how Corona Virus spread out with respect to death case per month (e.g., total confirmed cases, their average, variance & Standard Deviation).
The monthly summary statistics give us a detailed look at the mortality rate of COVID-19 over time.
-
Total count: In case of death cases, April had the highest number of cases 554,220 death cases among all the other months.
-
Mean: December had the highest daily average (71), meaning that, on average, more people were dying due to COVID-19 each day in December than in other months.
-
Range: The minimum number of cases in any month was 0, and the maximum number varied, with the highest being 7374 in June.
-
Median: For all months, the median was 0, indicating that half the days had 0 cases, and the other half had more.
-
Variance and Standard Deviation: These are statistical measures of how much the number of cases varies from day to day within a month. A higher variance and standard deviation mean that the number of death cases was very inconsistent, with a lot of fluctuation from day to day. However, the monthly variance and standard deviations for death cases were lower compared to confirmed cases.
Q13. Check how Corona Virus spread out with respect to recovered case per month (e.g., total confirmed cases, their average, variance & Standard Deviation).
The monthly summary statistics give us a detailed look at the recovery rate of COVID-19 over time.
-
Total count: In May, 20,651,389 people cured, which was the highest for any month.
-
Mean: December had the highest daily average (2498), meaning that, on average, more people were recovered from COVID-19 each day in December than in other months.
-
Range: The minimum number of cases in any month was 0, and the maximum number varied, with the highest being 1,123,456 in December.
-
Median: For all months, the median was 2, indicating that half the days had fewer than 2 recovered cases, and the other half had more.
-
Variance and Standard Deviation: Most months had high variance and standard deviation, indicating that the number of recovered cases each day was quite unpredictable.
I have analyzed 78,386 records of Covid 19 cases (Confirmed, Death, Recovered) of 121 countries from January 2020 to June 2021 and found the following results:
-
From January 2020 to June 2021, total 169,065,144 people were diagnosed with Novel Corona Virus (confirmed cases). The data shows that 3,647,894 people died during this period and 113,089,548 people recovered.
-
Confirmed Cases Trend: From January to July 2020, there was a steady rise in confirmed COVID-19 cases, followed by a slight decline and then a sharp increase from October to December. In 2021, the peak occurred in January, with a subsequent gradual decline until March, before rising again through April.
- Death Cases Trend: A significant spike in deaths was observed in April 2020. Following this, death cases decreased, only to rise steadily from October 2020 to January 2021, reaching a peak. The numbers then fell until March, rose in May, and declined again in June.
- Recovered Cases Trend: The trend in recovered cases mirrored that of confirmed cases, indicating that as more individuals were diagnosed, a corresponding number of patients recovered.
- Impact of Recovery Measures: An analysis of the charts suggests that effective recovery measures led to a decrease in death rates, as evidenced from January to March, May to September in 2020, and February to March in 2021. Conversely, death cases surged when recovery efforts lagged behind the rapid increase in confirmed cases, notably in April 2020 and from October 2020 to January 2021.
- Monthly Average Trends: The monthly averages for confirmed, death, and recovered cases showed significant increases in January, April, and from August to December. Conversely, there were decreases in February and June.
-
Death and Recovery Percentages: The death-to-confirmed case ratio was approximately 2.16%, indicating a relatively small impact on the global population. In contrast, the recovery-to-confirmed case ratio was about 67%, suggesting that a majority of patients recovered.
-
Countries Most Affected: The United States, India, Brazil, France, and Turkey were among the countries with the highest number of confirmed cases.
The highest death tolls were recorded in the United States, Brazil, India, Mexico, and Peru.
-
Countries Least Affected: Kiribati, Samoa, Marshall Islands, Dominica, and Tanzania reported the lowest confirmed cases. The fewest deaths occurred in the Marshall Islands, Samoa, Kiribati, Dominica, and Bhutan.
-
High Recovery Cases: India demonstrated a remarkable recovery rate, with 28.1 million citizens recovering over 18 months. Brazil, the United States, Turkey, and Russia also reported high numbers of recoveries.
-
Incomplete Timeline: The dataset covers Covid-19 cases only from January 2020 to June 2021, missing nearly two years of data up to May 2023 when the WHO’s emergency declaration ended.
-
Limited Data for Analysis: With just three columns for confirmed, deaths, and recovered cases, the dataset lacks depth, omitting key statistics like hospitalization and testing rates needed for a thorough analysis.
-
No Demographic Details: The absence of patient demographics such as age, gender, and health conditions prevents detailed group analysis and affects strategy development for public health responses.
-
Missing Mitigation Measures: Information on country-specific actions like lockdowns and vaccinations is not included, limiting understanding of their effectiveness in managing the pandemic’s spread and impact.
To address these limitations in future research, the following steps could be considered:
-
Update the Dataset: Include data up to the latest date to capture the entire scope of the pandemic.
-
Expand Data Points: Add more data types like hospitalization and testing rates for richer analysis.
-
Add Patient Demographics: Collect age, gender, and health condition data to identify vulnerable populations and improve health strategies.
-
Leverage Advanced Analytics: Use machine learning and predictive models to reveal deeper insights and patterns in the data.