Student: He Sun
Comments for Task 1:
You were successful in importing and pre-processing the data for Task 1.
You were successful in applying the missingno function to identify missing data. You didn't delete all of the null data in the data-file.
You correctly imputed missing data. You successfully converted all of your data to a numeric data type. You didn't standardize your data. Although it was not strictly necessary in this case, standardizing often helps optimisation algorithms converge faster. There were some significant outliers in the dataset. You might have considered removing these. You might have found it useful to apply the '.describe()' function to your data frame to obtain summary statistics for each feature. You successfully applied the '.info()' function to your dataset. You could have used the '.dtypes' attribute to find out what type of value each feature holds. You could additionally have used the 'shape' attribute to determine the number of rows (observations) and columns (features) in your dataset. You might have considered using a scatter chart to help gain insights into your data: its distribution and any obvious correlations. You might have considered using a correlation matrix to help find any correlations in the dataset. There were other charts presented during the course that might have helped with your data exploration.
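As a rough illustration of the inspection and standardization steps mentioned above, a minimal sketch follows. The data frame and its column names ('iron', 'nickel', 'silicate') are invented stand-ins, not the assignment data, and StandardScaler from scikit-learn is only one of several reasonable ways to standardize.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the asteroid data; column names are illustrative only.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "iron": rng.normal(50.0, 10.0, 200),
    "nickel": rng.normal(5.0, 1.0, 200),
    "silicate": rng.normal(30.0, 8.0, 200),
})

print(df.shape)       # (rows, columns), i.e. (observations, features)
print(df.dtypes)      # data type of each feature
print(df.describe())  # count, mean, std, min, quartiles and max per feature

# Pairwise Pearson correlations between the features.
corr = df.corr()
print(corr)

# Standardize each feature to zero mean and unit variance.
scaled = pd.DataFrame(StandardScaler().fit_transform(df), columns=df.columns)
```

The correlation matrix can also be drawn as a heat map, and pairs of columns plotted as scatter charts, using whichever plotting library was used on the course.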
You might have considered selecting subsets of the data that you were given. You applied Principal Component Analysis (PCA), which is good. You correctly applied the 'k-means' algorithm to organise your data into similar types of asteroid.
You might have considered applying the 'Gaussian Mixture Model' (GMM) to your data. GMM is a 'soft' clustering algorithm similar to k-means, but it gives the probability of each item belonging to each group. The assignment question asked you to provide a 'spread' (or 'distribution') for the data. A GMM would have allowed you to obtain the variance of the data in each cluster. I see that you also successfully applied clustering algorithms to your data that were not presented during the course. You correctly applied the 'elbow method' to estimate the number of types of asteroid. You correctly applied the 'Silhouette Score' to estimate the number of types of asteroid. You correctly obtained the typical asteroid composition using the cluster centroids. You have provided some useful description and rationale for your analysis.
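To show what the GMM suggestion might look like in practice, here is a minimal sketch using scikit-learn's GaussianMixture. The two synthetic groups are invented for illustration and do not represent the asteroid data; the number of components is assumed to be known here, whereas in your analysis it would come from the elbow method or Silhouette Score.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Two synthetic, well-separated groups (illustrative data only).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(100, 2)),
    rng.normal(loc=[5.0, 5.0], scale=1.0, size=(100, 2)),
])

# Fit a mixture with one full covariance matrix per component.
gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0).fit(X)

probs = gmm.predict_proba(X)  # soft assignment: one probability per item per cluster
means = gmm.means_            # typical composition of each cluster
covs = gmm.covariances_       # spread of each cluster (its diagonal holds the variances)

print(means)
print(covs)
```

The per-cluster covariance matrices are what would answer the 'spread' part of the question: the diagonal entries are the variances of each feature within that cluster.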
Comments for Task 2:
You were successful in importing and pre-processing the data for Task 2. You were successful in applying the missingno function to identify missing data. You didn't delete all of the null data in the data-file. You correctly imputed missing data. You successfully converted all of your data to a numeric data type. You didn't standardize your data. Although it was not strictly necessary in this case, standardizing often helps optimisation algorithms converge faster. There were some significant outliers in the dataset. You might have considered removing these. You might have found it useful to apply the '.describe()' function to your data frame to obtain summary statistics for each feature.
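One common way to remove outliers such as those mentioned above is the interquartile range (IQR) rule. The sketch below is only an example under assumed data: the columns 'a' and 'b' and the injected extreme value are invented, and the 1.5 x IQR threshold is a convention rather than a requirement of the assignment.

```python
import numpy as np
import pandas as pd

# Illustrative data with one extreme value injected into column "a".
rng = np.random.default_rng(0)
df = pd.DataFrame({"a": rng.normal(0.0, 1.0, 100), "b": rng.normal(10.0, 2.0, 100)})
df.loc[0, "a"] = 50.0

# Per-column quartiles and interquartile range.
q1 = df.quantile(0.25)
q3 = df.quantile(0.75)
iqr = q3 - q1

# Keep only the rows where every feature lies within 1.5 * IQR of the quartiles.
within = ((df >= q1 - 1.5 * iqr) & (df <= q3 + 1.5 * iqr)).all(axis=1)
cleaned = df[within]

print(len(df), len(cleaned))
```

Whether to drop, cap or keep such rows depends on whether they are measurement errors or genuine unusual observations, so it is worth stating the choice in your rationale.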
You successfully applied the '.info()' function to your dataset. You could have used the '.dtypes' attribute to find out what type of value each feature holds. You could additionally have used the 'shape' attribute to determine the number of rows (observations) and columns (features) in your dataset.
You might have considered using a scatter chart to help gain insights into your data: its distribution and any obvious correlations. You might have considered using a correlation matrix to help find any correlations in the dataset. There were other charts presented during the course that might have helped with your data exploration. You might have considered selecting subsets of the data that you were given. Additionally, you might have considered applying Principal Component Analysis (PCA), although this was not strongly emphasised in the course material. You successfully applied Logistic Regression to classify the asteroids. Your decision tree was correctly applied and produced a useful result. Your Random Forest Classifier seems to have been successful. The Support Vector Machine (SVM) you created seems to have worked well in this case. You built a deep learning model to classify the dataset. This is good, as this type of model is more advanced than the others taught on the course.
I see that you also applied classification models that were not presented on the course.
You correctly created a train/test split for your data. You generated an effective confusion matrix. This is a particularly useful tool for understanding the performance of classification models. Your use of an accuracy score was correct and appropriate. Your comparison of model performance is useful. You might have included a discussion of over-fitting. You might have additionally attempted to tune the hyper-parameters of your model. You correctly used 'grid search' to tune the hyper-parameters of your model.
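For reference, the evaluation steps above, together with a simple over-fitting check, can be sketched as follows. The data comes from scikit-learn's make_classification and is purely synthetic; the Random Forest and the small parameter grid are illustrative choices, not a reconstruction of your notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic three-class data standing in for the labelled asteroid set.
X, y = make_classification(n_samples=300, n_features=8, n_informative=4,
                           n_classes=3, random_state=0)

# Hold out 25% of the data for testing, keeping class proportions the same.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Grid search with 3-fold cross-validation over a small, illustrative grid.
param_grid = {"n_estimators": [25, 50], "max_depth": [3, None]}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid,
                      cv=3, scoring="accuracy")
search.fit(X_train, y_train)
best = search.best_estimator_

# A large gap between training and test accuracy is a sign of over-fitting.
train_acc = accuracy_score(y_train, best.predict(X_train))
test_acc = accuracy_score(y_test, best.predict(X_test))
cm = confusion_matrix(y_test, best.predict(X_test))

print(search.best_params_)
print(train_acc, test_acc)
print(cm)
```

Reporting the training and test accuracy side by side for each model would be one straightforward way to support a discussion of over-fitting.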
You provided an excellent level of discussion and rationale for Task 2.
Programme Administration Oxford Study Abroad Programme Belsyre Court, First Floor | 57 Woodstock Road | Oxford, OX2 6HJ, United Kingdom T: +44 (0) 1865521959 | W: www.oxfordstudyabroad.org.uk