
slitayem/stackoverflow_survey_analysis


Stack Overflow 2019 survey data analysis

Table of Contents

  1. Description
  2. Used Data and answered questions
  3. Data Analysis and insights extraction notebook
  4. Running the notebook
  5. Author
  6. License

Description

This project was completed as part of the Udacity Data Science Nanodegree Program.

The key steps of the project are:

  1. Picking a dataset.

  2. Posing at least three questions about how the data could be used in business or other real-world applications.

  3. Performing necessary cleaning, analysis, and modeling.

    • Data preparation:

      • Gather necessary data to answer the questions
      • Handle categorical and missing data
      • Provide insight into the chosen methods and why they were chosen
    • Data analysis, modeling, and visualization that provide a clear connection between the business questions and how the data answers them.

  4. Sharing the business insights with stakeholders.
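The data-preparation step above (handling missing and categorical answers) can be sketched without third-party libraries as below. This is a hypothetical illustration, not the notebook's code: the column names `OpenSourcer` and `Hobbyist`, the answer strings in `CONTRIB_ORDER`, and the convention that a missing answer appears as an empty string or the literal `NA` are all assumptions about the 2019 survey CSV. Ordered answers are mapped to integer ranks, while unordered answers are one-hot encoded.

```python
# Hypothetical sketch of two data-preparation steps: dropping rows with a
# missing answer and encoding categorical answers. Column names and answer
# strings are ASSUMPTIONS based on the 2019 survey schema; check them
# against the real CSV header before use.

# Assumed answers to the OSS-contribution question, from least to most
# frequent, so that a larger rank means "contributes more often".
CONTRIB_ORDER = [
    "Never",
    "Less than once per year",
    "Less than once a month but more than once per year",
    "Once a month or more often",
]
CONTRIB_RANK = {answer: rank for rank, answer in enumerate(CONTRIB_ORDER)}


def drop_missing(rows, column):
    """Keep rows whose `column` is present; "" and "NA" count as missing."""
    return [r for r in rows if r.get(column) not in (None, "", "NA")]


def encode_ordinal(rows, column, ranks):
    """Return copies of the rows with an extra '<column>_rank' integer field.

    Raises KeyError for an answer not listed in `ranks`, which surfaces
    unexpected survey wording instead of silently mis-encoding it.
    """
    encoded = []
    for r in rows:
        new_row = dict(r)
        new_row[column + "_rank"] = ranks[r[column]]
        encoded.append(new_row)
    return encoded


def one_hot(rows, column):
    """One-hot encode a nominal column: one 0/1 field per distinct value."""
    values = sorted({r[column] for r in rows})
    out = []
    for r in rows:
        new_row = dict(r)
        for v in values:
            new_row[column + "_" + v] = int(r[column] == v)
        out.append(new_row)
    return out


if __name__ == "__main__":
    sample = [
        {"OpenSourcer": "Never", "Hobbyist": "Yes"},
        {"OpenSourcer": "NA", "Hobbyist": "No"},  # missing answer, dropped
        {"OpenSourcer": "Once a month or more often", "Hobbyist": "No"},
    ]
    clean = drop_missing(sample, "OpenSourcer")
    prepared = one_hot(encode_ordinal(clean, "OpenSourcer", CONTRIB_RANK), "Hobbyist")
    for row in prepared:
        print(row)
```

Ranking the contribution answers keeps their natural order, which one-hot encoding would discard; one-hot encoding is used only for answers with no inherent order.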

The project follows the CRISP-DM (Cross-Industry Standard Process for Data Mining) methodology, which consists of the following steps:

  • Business Understanding
  • Data Understanding
  • Data Preparation
  • Data Modeling
  • Results Evaluation
  • Deployment

Used Data and answered questions

Data from the 2019 Stack Overflow Developer Survey is used to answer the following questions about Open Source Software (OSS) contributions:

  • How often do developers contribute to OSS?
  • Do hobbyist developers contribute to OSS more often?
  • Does a developer's perception of OSS quality influence how often they contribute to OSS?
  • Are experienced developers contributing more frequently to OSS?
  • Do developers who contribute to OSS earn a higher income?
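Several of the questions above reduce to simple descriptive statistics: the share of each contribution-frequency answer, the share of frequent contributors within a group (for example hobbyists, or respondents with a given view of OSS quality), and the median compensation per contribution frequency. The functions below are a hypothetical, dependency-free sketch of those calculations, not the notebook's implementation. The column names `OpenSourcer`, `Hobbyist`, `OpenSource` and `ConvertedComp` are assumptions based on the 2019 survey schema, and rows are assumed to be dictionaries such as those produced by `csv.DictReader`.

```python
# Hypothetical sketch of descriptive statistics for the survey questions.
# Column names ("OpenSourcer", "Hobbyist", "OpenSource", "ConvertedComp")
# are ASSUMPTIONS based on the 2019 survey schema, not taken from the notebook.
from collections import Counter, defaultdict
from statistics import median


def value_shares(rows, column):
    """Fraction of respondents giving each answer to `column`.

    Example use: how often do developers contribute to OSS?
    """
    counts = Counter(r[column] for r in rows)
    total = sum(counts.values())
    return {answer: n / total for answer, n in counts.items()}


def share_by_group(rows, group_col, target_col, target_values):
    """For each value of `group_col`, the fraction of rows whose
    `target_col` answer is in `target_values`.

    Example use: share of monthly contributors among hobbyists vs. others,
    or grouped by the (assumed) OSS-quality column "OpenSource".
    """
    hits = Counter()
    totals = Counter()
    for r in rows:
        group = r[group_col]
        totals[group] += 1
        if r[target_col] in target_values:
            hits[group] += 1
    return {g: hits[g] / totals[g] for g in totals}


def median_by_group(rows, group_col, value_col):
    """Median of a numeric column per group, skipping missing values
    ("" or "NA"). Example use: compensation by contribution frequency.
    """
    values = defaultdict(list)
    for r in rows:
        raw = r.get(value_col)
        if raw in (None, "", "NA"):
            continue
        values[r[group_col]].append(float(raw))
    return {g: median(v) for g, v in values.items()}
```

For example, `share_by_group(rows, "Hobbyist", "OpenSourcer", {"Once a month or more often"})` would return the share of monthly contributors among hobbyists and non-hobbyists. These comparisons are descriptive only; they do not control for confounders such as experience or country.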

Blog post

Analysis

The analysis notebook is available here

Running the notebook

  • Create a Python 3.6 conda virtual environment

    conda create --name py36 python=3.6

  • Activate the new environment

    conda activate py36

  • Install the required packages by running the following command in the project directory

    pip install -r requirements.txt

  • Extract the data archive

    unzip data/so_survey_2019/so_developer_survey_2019.zip -d data/so_survey_2019/

  • Run JupyterLab

    jupyter lab
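Before opening the notebook, it can help to confirm that the extracted survey data is readable. The snippet below is a hypothetical check, not part of the repository: the file name `survey_results_public.csv` is an assumption about what the archive contains, and the file is assumed to be UTF-8 encoded.

```python
# Hypothetical sanity check for the extracted survey data.
# CSV_PATH is an ASSUMED file name; adjust it to what the unzip step produced.
import csv

CSV_PATH = "data/so_survey_2019/survey_results_public.csv"  # assumed


def summarize_rows(lines):
    """Return (number_of_data_records, header) for an iterable of CSV lines."""
    reader = csv.reader(lines)
    header = next(reader)
    n_records = sum(1 for _ in reader)
    return n_records, header


def summarize_csv(path):
    """Open a CSV file and summarize it.

    newline="" lets the csv module treat quoted fields that contain line
    breaks as part of a single record, as the csv docs recommend.
    """
    with open(path, newline="", encoding="utf-8") as f:
        return summarize_rows(f)
```

For example, `summarize_csv(CSV_PATH)` should report a record count and the survey's column names; a `FileNotFoundError` most likely means the extracted file has a different name.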

If you only want to view the notebook content and its outputs, use nbviewer. An HTML version of the notebook is also available here.

Author

License

License: MIT