This is the group final project for the Data Visualization course. The task was to apply the knowledge introduced during the course to analyze a dataset of our own choosing.
The dataset is published on Kaggle under the name "Starbucks Customer Survey". We used the unencoded version of the data because we found the encoded version to be incorrect. The dataset consists of survey responses from over 100 respondents about their buying behavior at Starbucks. The survey covers customers' demographic information, their consumption behavior at Starbucks, and their ratings of various criteria related to Starbucks facilities and features. With this dataset, we aim to predict the loyalty of each customer.
The dataset is available on Kaggle: https://www.kaggle.com/datasets/mahirahmzh/starbucks-customer-retention-malaysia-survey?select=Starbucks+satisfactory+survey+encode+cleaned.csv
- Import data
- Preprocess data
- Encode categorical variables
- Visualize the data with different types of diagrams, tables, and figures, and draw key conclusions from what the visuals show
- Dimensionality reduction
- Classification: target variable: loyal (20th question: Will you continue buying at Starbucks?)
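
The encoding, dimensionality reduction, and classification steps listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the column names and rows are hypothetical stand-ins for the survey data, and PCA and logistic regression are assumed choices, since the specific techniques used in the project are not named here.

```python
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in rows; the real survey has different columns and values.
df = pd.DataFrame({
    "gender": ["Female", "Male", "Female", "Male", "Female", "Male", "Female", "Male"],
    "visit_frequency": ["Weekly", "Rarely", "Monthly", "Rarely",
                        "Weekly", "Monthly", "Weekly", "Rarely"],
    "price_rating": [4, 2, 3, 1, 5, 3, 4, 2],
    "loyal": ["Yes", "No", "Yes", "No", "Yes", "Yes", "Yes", "No"],
})

# Encode categorical variables: one-hot for the features, 0/1 for the target.
X = pd.get_dummies(df.drop(columns="loyal"), columns=["gender", "visit_frequency"])
y = (df["loyal"] == "Yes").astype(int)

# Standardize the features, then project them onto two components (PCA is an assumption).
X_scaled = StandardScaler().fit_transform(X.astype(float))
X_2d = PCA(n_components=2).fit_transform(X_scaled)

# Fit a simple classifier on the reduced features (logistic regression is an assumption).
clf = LogisticRegression().fit(X_2d, y)
train_accuracy = clf.score(X_2d, y)
print(f"Training accuracy on the toy data: {train_accuracy:.2f}")
```

With only around 100 respondents in the real dataset, a held-out split or cross-validation would give a more honest estimate than the training accuracy printed here.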
The report was written in Vietnamese. In summary, we used appropriate types of diagrams to visualize the data and uncover hidden patterns in it. After the dimensionality reduction, we used a scatter plot to see how separable the classes are.
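
A scatter plot of the reduced data, colored by class, could look roughly like the sketch below. The points are synthetic placeholders generated with NumPy rather than the project's results, and the class names, axis labels, and output filename are assumptions chosen for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic placeholder for the 2-D output of a dimensionality reduction step.
rng = np.random.default_rng(0)
points = np.vstack([
    rng.normal(-1.0, 0.7, size=(30, 2)),  # placeholder "not loyal" respondents
    rng.normal(1.0, 0.7, size=(30, 2)),   # placeholder "loyal" respondents
])
labels = np.array([0] * 30 + [1] * 30)

fig, ax = plt.subplots(figsize=(6, 4))
# Draw one scatter series per class so each class gets its own color and legend entry.
for value, name in [(0, "not loyal"), (1, "loyal")]:
    mask = labels == value
    ax.scatter(points[mask, 0], points[mask, 1], label=name, alpha=0.7)
ax.set_xlabel("Component 1")
ax.set_ylabel("Component 2")
ax.legend(title="Class")
fig.savefig("reduced_scatter.png", dpi=150)
```

If the two colors overlap heavily in such a plot, the classes are hard to separate in the reduced space, and a classifier trained on those components is unlikely to do well.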