Optimised (read "better") solutions for a subset of the questions from Top 30 Python Interview Questions and Answers.
Difficulty: Medium
- Find the probability of each weather-hour combination, i.e. the fraction of all rides that the combination accounts for.
- Output the weather and hour along with the corresponding probability.
- Sort records by the weather and the hour in ascending order.
import pandas as pd
lyft_rides['probability'] = lyft_rides.groupby(['weather', 'hour'])['index'].transform('count')/lyft_rides.shape[0]
lyft_rides[['weather', 'hour', 'probability']].sort_values(['weather', 'hour'], ascending=True).drop_duplicates()
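As a sanity check, the same calculation can be sketched on a tiny table. The sample rows below are hypothetical, and this variant counts group sizes directly instead of relying on an 'index' column, so treat it as one possible formulation rather than the reference answer.

```python
import pandas as pd

# Hypothetical sample data; the real lyft_rides table has more rows and columns.
lyft_rides = pd.DataFrame({
    "weather": ["cloudy", "cloudy", "rainy", "sunny"],
    "hour": [8, 8, 9, 8],
})

# Count rides per (weather, hour) pair and divide by the total number of rides.
probability = (
    lyft_rides.groupby(["weather", "hour"])
    .size()
    .div(len(lyft_rides))
    .reset_index(name="probability")
    .sort_values(["weather", "hour"])
    .reset_index(drop=True)
)
print(probability)
```

Because every ride belongs to exactly one weather-hour pair, the probabilities should sum to 1, which is a quick way to verify the result.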
Difficulty: Medium
- Write a query that returns the rate_type, loan_id and balance of each loan.
- Add a column showing what percentage of the total balance for its rate type each loan's balance represents.
import pandas as pd
submissions['loan_share'] = submissions['balance']/submissions.groupby('rate_type')['balance'].transform('sum') * 100
submissions[['rate_type', 'loan_id', 'balance', 'loan_share']].sort_values('loan_share', ascending=False)
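To see what transform('sum') contributes here, a small worked example may help. The loan values are made up for illustration; only the column names come from the solution above.

```python
import pandas as pd

# Hypothetical sample data with two rate types.
submissions = pd.DataFrame({
    "loan_id": [1, 2, 3, 4],
    "rate_type": ["fixed", "fixed", "variable", "variable"],
    "balance": [100.0, 300.0, 50.0, 150.0],
})

# transform('sum') broadcasts each group's total back onto every row of that group,
# so the division below is a plain row-wise operation.
totals = submissions.groupby("rate_type")["balance"].transform("sum")
submissions["loan_share"] = submissions["balance"] / totals * 100

# Within each rate type the shares should add up to 100.
shares_per_type = submissions.groupby("rate_type")["loan_share"].sum()
print(submissions)
print(shares_per_type)
```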
Difficulty: Hard (?)
- Output the ids of students whose writing SAT score equals the median writing score.
import pandas as pd
sat_writing_median = sat_scores['sat_writing'].median()
list(sat_scores.query('sat_writing == @sat_writing_median')['student_id'])
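One caveat worth illustrating: with an even number of scores, pandas averages the two middle values, so the median may not match any student. The sketch below uses made-up scores to show both cases.

```python
import pandas as pd

# Hypothetical sample data with an odd number of students.
sat_scores = pd.DataFrame({
    "student_id": [1, 2, 3, 4, 5],
    "sat_writing": [400, 500, 500, 600, 700],
})

median_odd = sat_scores["sat_writing"].median()
ids_odd = sat_scores.loc[sat_scores["sat_writing"] == median_odd, "student_id"].tolist()

# Drop one student so the count is even: the median becomes the mean of 500 and 600.
even_scores = sat_scores[sat_scores["student_id"] != 3]
median_even = even_scores["sat_writing"].median()
ids_even = even_scores.loc[even_scores["sat_writing"] == median_even, "student_id"].tolist()

print(median_odd, ids_odd)
print(median_even, ids_even)
```

An empty result in the second case is expected behaviour, not a bug in the filter.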
Difficulty: Easy
- Find all emails with duplicates.
import pandas as pd
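The solution body for this question is missing from the source. A minimal sketch of one way to approach it follows; the table name `employee`, the column name `email` and the sample rows are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical table and column names; the source does not show them.
employee = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "email": ["a@x.com", "b@x.com", "a@x.com", "c@x.com"],
})

# duplicated(keep=False) flags every occurrence of a repeated value,
# then drop_duplicates() reduces the result to one row per email.
duplicate_emails = (
    employee.loc[employee["email"].duplicated(keep=False), "email"]
    .drop_duplicates()
    .tolist()
)
print(duplicate_emails)
```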
Difficulty: Medium
- Find the average number of days between the booking and check-in dates for AirBnB hosts.
- Order the results based on the average number of days (avg_days_between_booking_and_checkin) in descending order.
import pandas as pd
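# Whole days from booking to check-in (assumes both columns are datetime64; .dt.days is a reconstruction).
airbnb_contacts['days_between_booking_and_checkin'] = (
    (airbnb_contacts['ds_checkin'] - airbnb_contacts['ts_booking_at']).dt.days
)
# The grouping step is missing from the source; grouping by 'id_host' is an assumption.
(airbnb_contacts.groupby('id_host', as_index=False)
    .agg(avg_days_between_booking_and_checkin=('days_between_booking_and_checkin', 'mean'))
    .sort_values('avg_days_between_booking_and_checkin', ascending=False))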
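For reference, here is a self-contained run of this approach on made-up data. The host column name `id_host` and the sample dates are assumptions, and the day difference is taken with `.dt.days`, which truncates any partial day.

```python
import pandas as pd

# Hypothetical sample data; 'id_host' is an assumed column name.
airbnb_contacts = pd.DataFrame({
    "id_host": ["h1", "h1", "h2"],
    "ts_booking_at": pd.to_datetime(["2024-01-01", "2024-01-05", "2024-02-01"]),
    "ds_checkin": pd.to_datetime(["2024-01-04", "2024-01-10", "2024-02-11"]),
})

# Subtracting two datetime columns yields timedeltas; .dt.days extracts whole days.
airbnb_contacts["days_between_booking_and_checkin"] = (
    airbnb_contacts["ds_checkin"] - airbnb_contacts["ts_booking_at"]
).dt.days

# Average per host, largest average first.
result = (
    airbnb_contacts.groupby("id_host", as_index=False)
    .agg(avg_days_between_booking_and_checkin=("days_between_booking_and_checkin", "mean"))
    .sort_values("avg_days_between_booking_and_checkin", ascending=False)
    .reset_index(drop=True)
)
print(result)
```

Rows with a missing booking timestamp produce NaN differences, which the mean skips by default.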