# Lower Capacity Strategy
- Review the fields together in a meeting and agree which are likely to be easiest to clean.
- Set the harder fields aside and focus only on the easier ones. Build the simplest possible classifier (usually logistic regression or a decision tree) on these fields.
- Next, produce summary statistics and some basic charts/graphs for this subset of fields.
- Collate the model and these outputs into a notebook.
- Add commentary and interpretation, and acknowledge the limitations of not considering all the fields.
- Turn the notebook into a simple slide deck and practice timed delivery.
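The modelling and summary steps above can be sketched roughly as follows. This is only an illustration in Python with pandas and scikit-learn (an assumption; the same idea works in R with caret). The data is synthetic, and the names `age`, `income`, `notes`, `target`, `EASY_FIELDS` and `TARGET` are hypothetical placeholders for whatever the real dataset contains.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (hypothetical); replace with the real dataset.
rng = np.random.default_rng(42)
n = 200
df = pd.DataFrame({
    "age": rng.integers(18, 80, size=n),
    "income": rng.normal(30_000, 8_000, size=n),
    "notes": ["free text, hard to clean"] * n,  # a "harder" field, set aside
})
df["target"] = (df["income"] + rng.normal(0, 4_000, size=n) > 30_000).astype(int)

# The fields the team agreed are easiest to clean (assumed names).
EASY_FIELDS = ["age", "income"]
TARGET = "target"

# Hold out a quarter of the rows so the baseline is scored on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    df[EASY_FIELDS], df[TARGET],
    test_size=0.25, random_state=0, stratify=df[TARGET],
)

# Simplest reasonable baseline: scaled inputs into logistic regression.
baseline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
baseline.fit(X_train, y_train)
accuracy = baseline.score(X_test, y_test)

# Summary statistics for the same subset, overall and per class.
summary = df[EASY_FIELDS].describe()
by_class = df.groupby(TARGET)[EASY_FIELDS].mean()

print(f"Baseline hold-out accuracy: {accuracy:.2f}")
print(summary)
print(by_class)

# For basic charts, df[EASY_FIELDS].hist() draws one histogram per field
# (requires matplotlib to be installed).
```

The pipeline keeps scaling and the classifier together, so the same object can be refitted later if more fields are added.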

If you finish the above sooner than expected, go back and:
- clean and incorporate some of the omitted fields
- rerun the model with these, or consider alternative models
- expand your presentation, perhaps with more impressive visuals
- do further background reading to improve your understanding of the data
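If time allows, the first two optional steps might look something like the sketch below: clean one previously omitted field, add it to the feature set, and compare logistic regression with a decision tree using cross-validation. It again assumes Python with pandas and scikit-learn and uses synthetic data. The `region` field, its messy values, and the `compare` helper are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (hypothetical); "region" plays the omitted field.
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "age": rng.integers(18, 80, size=n),
    "income": rng.normal(30_000, 8_000, size=n),
    "region": rng.choice(["north", " South", "NORTH", "south", ""], size=n),
})

# Light cleaning: trim whitespace, lower-case, label blanks as "unknown".
region_clean = df["region"].str.strip().str.lower().replace("", "unknown")

# Synthetic target that depends on income and region, plus noise.
signal = (
    (df["income"] - 30_000) / 8_000
    + (region_clean == "north").astype(float)
    + rng.normal(0, 0.5, size=n)
)
y = (signal > 0.5).astype(int)

EASY_FIELDS = ["age", "income"]
X_easy = df[EASY_FIELDS]
# One-hot encode the cleaned field and append it to the easy fields.
X_more = pd.concat(
    [X_easy, pd.get_dummies(region_clean, prefix="region", dtype=float)],
    axis=1,
)

def compare(X, y, cv=5):
    """Return mean cross-validated accuracy for two simple models."""
    models = {
        "logistic_regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "decision_tree": DecisionTreeClassifier(max_depth=3, random_state=0),
    }
    return {
        name: cross_val_score(model, X, y, cv=cv).mean()
        for name, model in models.items()
    }

scores_easy = compare(X_easy, y)
scores_more = compare(X_more, y)
print("Easy fields only:", scores_easy)
print("With cleaned region:", scores_more)
```

Comparing the two dictionaries side by side gives a quick, presentable answer to whether the extra cleaning effort changed the result.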

Trello board: https://trello.com/invite/b/t28LcStz/ATTIa9253884ed88829cf8c54d13d7bb910c893D98C1/group10-technical-challenge
Machine Learning Algorithms Cheatsheet [Python/R]: https://www.kaggle.com/discussions/getting-started/156497
Extensive overview of ML with caret: https://www.machinelearningplus.com/machine-learning/caret-package/
Feature selection with caret: https://machinelearningmastery.com/feature-selection-with-the-caret-r-package/
ML Ensembles: https://machinelearningmastery.com/machine-learning-ensembles-with-r/