- Description
- Used Data and answered questions
- Data Analysis and insights extraction notebook
- Running the notebook
- Author
- License
This project was completed as part of the Data Science Nanodegree Program by Udacity.
The key steps of the project are:
- Picking a dataset.
- Posing at least three questions related to business or real-world applications of how the data could be used.
- Performing necessary cleaning, analysis, and modeling.
- Data preparation:
  - Gather necessary data to answer the questions
  - Handle categorical and missing data
  - Provide insight into the chosen methods and why they were chosen
- Data analysis, modeling, and visualization to provide a clear connection between the business questions and how the data answers them.
- Sharing the business insights with stakeholders.
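The data preparation step above (handling missing and categorical data) can be sketched as follows. This is only an illustration, not the notebook's actual code: the sample rows are made up, the helper names `drop_missing` and `one_hot` are hypothetical, and the column names `Hobbyist`, `OpenSourcer` and `ConvertedComp` are assumed to match the survey's columns. It uses only the standard library so it runs without the project's dependencies.

```python
# Hypothetical sample rows; the real survey file has many more columns and rows.
rows = [
    {"Hobbyist": "Yes", "OpenSourcer": "Never", "ConvertedComp": "61000"},
    {"Hobbyist": "No", "OpenSourcer": "Never", "ConvertedComp": ""},
    {"Hobbyist": "No", "OpenSourcer": "Less than once per year", "ConvertedComp": "95000"},
    {"Hobbyist": "Yes", "OpenSourcer": "Once a month or more often", "ConvertedComp": "NA"},
]

def drop_missing(rows, column):
    """Keep only rows where `column` holds a usable (non-missing) value."""
    return [r for r in rows if r.get(column) not in (None, "", "NA")]

def one_hot(rows, column):
    """Replace a categorical column with one 0/1 indicator column per category."""
    categories = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        new_row = {k: v for k, v in r.items() if k != column}
        for cat in categories:
            new_row[f"{column}_{cat}"] = int(r[column] == cat)
        encoded.append(new_row)
    return encoded

# Drop respondents without a salary, then encode the Hobbyist flag.
clean = drop_missing(rows, "ConvertedComp")
prepared = one_hot(clean, "Hobbyist")
```

Dropping rows is the simplest way to handle missing values; whether dropping or imputing is appropriate depends on the question being answered.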
The project follows the CRISP-DM (Cross Industry Standard Process for Data Mining) methodology, which consists of the following steps:
- Business Understanding
- Data Understanding
- Data Preparation
- Data Modeling
- Results Evaluation
The Stack Overflow Developer Survey data from 2019 is used to answer the following questions regarding Open Source Software (OSS) contributions:
- How often do developers contribute to OSS?
- Do hobbyist developers contribute more often to OSS?
- Does a developer's perception of OSS quality bias their likelihood of contributing to OSS?
- Do experienced developers contribute more frequently to OSS?
- Do developers who contribute to OSS have a higher income?
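As a rough illustration of how the first two questions could be approached, the sketch below computes the share of each contribution frequency overall and per hobbyist group. The rows are invented sample data, the column names `Hobbyist` and `OpenSourcer` are assumed to match the survey's columns, and the helper functions are hypothetical rather than taken from the notebook.

```python
from collections import Counter, defaultdict

# Hypothetical sample of survey rows; not real survey results.
rows = [
    {"Hobbyist": "Yes", "OpenSourcer": "Never"},
    {"Hobbyist": "Yes", "OpenSourcer": "Once a month or more often"},
    {"Hobbyist": "Yes", "OpenSourcer": "Less than once per year"},
    {"Hobbyist": "No", "OpenSourcer": "Never"},
    {"Hobbyist": "No", "OpenSourcer": "Never"},
    {"Hobbyist": "No", "OpenSourcer": ""},  # missing answer, ignored below
]

def category_shares(values):
    """Return the fraction of non-empty answers that fall in each category."""
    counts = Counter(v for v in values if v)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.items()}

def shares_by_group(rows, group_col, answer_col):
    """Compute category shares of `answer_col` separately for each value of `group_col`."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[group_col]].append(row[answer_col])
    return {group: category_shares(answers) for group, answers in grouped.items()}

overall = category_shares(r["OpenSourcer"] for r in rows)
by_hobbyist = shares_by_group(rows, "Hobbyist", "OpenSourcer")
```

Comparing `by_hobbyist["Yes"]` with `by_hobbyist["No"]` shows whether one group reports contributing more often; the same grouping could be applied to experience or income columns for the remaining questions.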
The analysis notebook is available here
- Create a Python 3.6 conda virtual environment: `conda create --name py36 python=3.6`
- Activate the new environment: `conda activate py36`
- Install the required packages by running the following command in the project's directory: `pip install -r requirements.txt`
- Extract the data folder: `unzip data/so_survey_2019/so_developer_survey_2019.zip -d data/so_survey_2019/`
- Run `jupyter lab`
To view the notebook content and its outputs without running it, use nbviewer. An HTML version of the notebook can also be viewed here.
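Once the archive has been extracted, the survey data can be read outside the notebook as well. The sketch below is a minimal example under stated assumptions: the file name `survey_results_public.csv` is assumed (check the extracted folder for the actual name), and `load_survey` is a hypothetical helper, not part of this project.

```python
import csv
from pathlib import Path

def load_survey(csv_path, columns):
    """Read the survey CSV and keep only the requested columns for each row."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        # Columns absent from the file are filled with an empty string.
        return [{c: row.get(c, "") for c in columns} for row in reader]

# Assumed location of the extracted file; adjust if the archive uses another name.
path = Path("data/so_survey_2019/survey_results_public.csv")
if path.exists():
    rows = load_survey(path, ["Hobbyist", "OpenSourcer"])
    print(len(rows), "responses loaded")
```

Loading only the columns needed for a question keeps memory use low, since the full survey file is wide.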