- About the Project
- Data Modeling
- Data Engineering
- Data Analysis
- Salary Analysis
- Conclusion
- Resources
- Contact
As a recent employee at Pewlett Hackard, I have been tasked to organize and analyze data for employees at the corporation in the 1980s and 1990s. I have used a database mapping tool to visualize all of the data and then PostgreSQL to analyze this information. Below you will see a step by step guide to how the information was modeled and reviewed.
To visualize my data, I used an Entity Relational Database called Quick Database Diagrams . First, I reviewed my 6 different csv
files
After reviewing the data, I was able to determine what columns would be Primary Keys for when comparing my data files. I then created tables for each csv
file and included the details for the data going into each column. For example, numbers were assigned as INTEGERS
and names and personal information were assigned as VARCHAR
.
Once my tables were created in the mapping tool, I then linked the tables on Primary keys and other data relationships. I linked employee_id
and other id
information.
Below is what the code looks like in the data mapping tool. Once I completed organizing the data, I exported this into a file to use in PostgreSQL
.
After exporting the schema from the data mapping tool, I used the code to create my tables in pgAdmin4
. The code exported with " "
around the variables, so I deleted the quotes
when transferring the code to easier manipulate the data later.
The code includes the CREATE TABLE
command and then includes one line per column of data. Each column has syntax
for the type of data included in the cells and the command NOT NULL
to only include cells with information included and not blank cells. The end syntax
refers to the Primary Key
which I set to be the Employee_ID
.
Employee Table:
Department Table: includes Department_ID
and Department_Name
Department Manager Table: includes Department_ID
and manager Employee_ID
Employee Department Table: includes Employee_ID
and Department_ID
Job Title Table: includes Title_ID
and Title
Salary Table: includes Employee_ID
and Salary
amount
By mapping my data tables in the mapping tool, I created CONSTRAINTS
, PRIMARY KEYS
, and FOREIGN KEYS
. These criteria help my data tables interact when combining information into one queries
.
- List the following details of each employee: employee number, last name, first name, sex, and salary.
Query : To show this view, I used an INNER JOIN
on the Employee
and Salaries
tables. I joined
the data on the Employee_ID
.
Data Output:
- List first name, last name, and hire date for employees who were hired in 1986.
Query: To show this view, I used a WHERE
function to only show Employees who were hired in the year 1986 - WHERE hire_data LIKE ‘%1986%’
. If you put the search value in ’% %’
it searches similarly to ‘contains’.
Data Output:
- List the manager of each department with the following information: department number, department name, the manager's employee number, last name, first name.
Query: To display this view, I had to INNER JOIN
3 different data tables- Employee
, Managers
, and Departments
. I joined the Manager
and Employee
tables on Employee_ID
. I joined the Managers
and Departments
on Department_ID
.
Data Output:
- List the department of each employee with the following information: employee number, last name, first name, and department name.
Query: This view is similar to the view in question 3 accept the INNER JOIN
is on the Employee
, Employee_Departmentand
Departmenttables instead of
Managerdepartments. I also joined the tables on
Employee_IDand
Department_ID` .
Data Output:
- List first name, last name, and sex for employees whose first name is "Hercules" and last names begin with "B."
Query: To show this view, I used a WHERE
statement with 2 criteria- WHERE
the First_Name
is equal to ’Hercules’
and WHERE
the Last_Name
starts with the letter ’%B%”
.
Data Output:
- List all employees in the Sales department, including their employee number, last name, first name, and department name.
Query: To show this view, I had to use INNER JOIN
3 tables- Employee
, Employee_Department
, and Department
. This is how the view is able to display the Employee information as well as the Department_Name
. I also used a WHERE
clause to only show Employee’s who are in the Sales
department.
Data Output:
- List all employees in the Sales and Development departments, including their employee number, last name, first name, and department name.
Query: This view is similar to the view in the above question accept in the WHERE
statement, there are 2 criteria- the Department
is equal to Sales
OR
Development
.
Data Output:
- In descending order, list the frequency count of employee last names, i.e., how many employees share each last name.
Query: To show this view, I used a COUNT
function to count the number of times a Last_Name
was shared by employees. I labeled this new count as ‘Frequency’ to appear as a new Column in the Data Output. To count the ‘like’ Last_Names
, I used a GROUP BY
function to group the same Last_Name
. Then I sorted the data in descending order by using an ORDER BY
command.
Data Output:
To further analyze the employee data, I imported the database into Jupyter Notebook
. I had to use the create_engine
and engine.connect( )
commands to import the data.
Once it was linked to notebook, I was able to use code I created in PostgreSQL
to view the data in a table. When importing the code, I use 3 sets of “ “
to organize my code on separate lines (the way it is displayed in pgAdmin
.
I then viewed the Salary data in a histogram
. I used the code data.hist(‘salary)
to display the chart.
To further display the data in a bar chart, I use a groupby
function to group the data by Job Title and then used .mean( )
to calculate the average salary by Job Title. I used .plot(kind = bar)
to create the bar chart.
After reviewing the data in the bar chart format, it was clear there was something strange about the information. The Bar chart shows that Sr level employees were making the same salaries as manager level and very close salaries to assistant level.
When I brought this up to my new manager, he told me to do a search in the database for my Employee ID – 499942.
I did this by using a WHERE
clause to find the employee_id = 499942
. The data showed below :
Very Funny….
Link to Jupyter Notebook
Link to SQL Employee Database
Sara Simoes