GSoC_2019_project_modelselection
Following up on one of our very first GSoC (2011) projects, this project intends to clean up, unify, extend, and scale up Shogun's model selection and hyper-parameter tuning framework. It is a cool mixture of modernizing existing code, using multi-threaded (and potentially distributed) concepts, and playing with black-box optimization frameworks.
Difficulty: medium to advanced, depending on ambitions; we are flexible regarding the student's abilities.
You need to know about
- Modelselection basics (x-validation, search-algorithms, implementation)
- Shogun's old modelselection framework (check the master branch, as this has been removed from develop)
- Shogun's parameter framework
- C++ (we want to use modern C++, so read up on mix-ins, concepts, and friends)
- Optimisation frameworks like MOE or cma-es
- Knowledge of other libraries' approaches (sklearn, MLPack)
Every learning algorithm (CMachine subclass) should work with x-validation ... fast!
This is completely independent of any hyper-parameter tuning.
- All model classes should be systematically tested with x-validation, see issue. This is similar to the trained model tests.
- Identify models that only perform read-only operations on the features (this will be all models later, depending on the progress of the features-detox project).
- Enable multi-core x-validation using openmp or std::thread, via cloning of the underlying learning machine, but with shared features (memory efficiency!).
- Carefully test the chosen models for race-conditions, memory errors, etc.
- Add algorithms on a one-by-one basis.
- Generalise the code of the "trained model serialization" tests into "trained model" tests, where multiple things can be checked for the trained models (serialization and x-validation for now).
- Make sure model selection has a progress bar, is stoppable, continue-able, etc. See also the black-box project.
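The cloning idea from the list above can be illustrated with a small sketch. This is not Shogun code: `MeanRegressor`, `kfold_indices`, and `cross_validate` are hypothetical names, the toy model stands in for a real learning machine, and Python threads are used only to show the structure (one private clone per fold, one shared read-only feature container), not to claim any speed-up.

```python
import copy
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


class MeanRegressor:
    """Toy stand-in for a learning machine: predicts the training-label mean."""

    def __init__(self):
        self.mean_ = None

    def train(self, features, labels, indices):
        # Only reads from the shared features/labels; state lives in the clone.
        self.mean_ = mean(labels[i] for i in indices)
        return self

    def apply(self, features, indices):
        return [self.mean_ for _ in indices]


def kfold_indices(n, k):
    """Split range(n) into k contiguous (train, test) index pairs."""
    folds = [list(range(i * n // k, (i + 1) * n // k)) for i in range(k)]
    return [
        ([i for j, f in enumerate(folds) if j != t for i in f], folds[t])
        for t in range(k)
    ]


def cross_validate(machine, features, labels, k=5, workers=4):
    """Mean squared error per fold; each fold trains its own clone."""

    def run_fold(split):
        train_idx, test_idx = split
        clone = copy.deepcopy(machine)  # private model state per fold
        clone.train(features, labels, train_idx)
        preds = clone.apply(features, test_idx)
        return mean((p - labels[i]) ** 2 for p, i in zip(preds, test_idx))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_fold, kfold_indices(len(labels), k)))
```

Passing the features as an immutable tuple makes the read-only assumption explicit; a model that mutates its features would need them copied per fold, which is exactly the memory cost the shared-features design tries to avoid.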
We recently started pushing the use of modern C++ idioms such as overload dispatching, mixins, and concepts. Elegant solutions to the cross-validation problem that use these are of course highly welcome :)
We want to build a better way to specify free parameters to learn, which overlaps with the user experience project. The current way is to build parameter trees whose structure matches the learning machine, see e.g. here. We would like to shop around other libraries for ideas on how to specify this.
Potential API:
params = start().add("C", [1,2,3,4]).add("kernel::log_width", [1,2,3,4]).build()
And then of course from a string:
params = parse('{ "C": [1,2,3,4], "epsilon": [0.01], "kernel": ... }')
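A minimal sketch of how such a builder and string parser could fit together is shown below. Everything here is an assumption made for illustration: `ParameterGridBuilder` is a hypothetical class, the result is a plain dictionary rather than a Shogun object, and the nested JSON layout for `kernel` is a guess, since the page does not spell it out.

```python
import json


class ParameterGridBuilder:
    """Hypothetical fluent builder mirroring the start().add(...).build() idea."""

    def __init__(self):
        self._values = {}

    def add(self, name, values):
        # A name such as "kernel::log_width" addresses a nested parameter.
        self._values[name] = list(values)
        return self

    def build(self):
        return dict(self._values)


def start():
    return ParameterGridBuilder()


def parse(text):
    """Build the same structure from a JSON string.

    Nested objects are flattened to "outer::inner" names (an assumed layout).
    """
    builder = start()

    def walk(prefix, node):
        for key, value in node.items():
            name = f"{prefix}::{key}" if prefix else key
            if isinstance(value, dict):
                walk(name, value)
            else:
                builder.add(name, value)

    walk("", json.loads(text))
    return builder.build()


params = start().add("C", [1, 2, 3, 4]).add("kernel::log_width", [1, 2, 3, 4]).build()
same = parse('{"C": [1, 2, 3, 4], "kernel": {"log_width": [1, 2, 3, 4]}}')
```

Both routes end in the same flat mapping, so search algorithms only need to understand one representation regardless of how the user wrote the specification.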
Some steps:
- Review and compare other libraries' approaches
- Collect the most common use cases (random search, grid-search, gradient search (e.g. in our Gaussian Process framework))
- Come up with a set of clean API examples / user stories for those cases
- Draft code showing how to implement this API. This will include ways to annotate the spaces that parameters live in, as well as whether gradients are available.
- Implement and test systematically
- Make sure it works nicely in all target languages.
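One possible way to annotate parameter spaces, and to derive both grid and random search from the same annotation, is sketched here. The class names `Discrete` and `LogUniform`, the `has_gradient` flag, and the helper functions are all hypothetical and not part of Shogun; the sketch only shows how a single description could serve several search strategies.

```python
import itertools
import math
import random
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Discrete:
    values: Sequence  # explicit candidates, e.g. for grid search
    has_gradient: bool = False

    def grid(self, n):
        return list(self.values)

    def sample(self, rng):
        return rng.choice(list(self.values))


@dataclass(frozen=True)
class LogUniform:
    low: float
    high: float
    has_gradient: bool = True  # annotation a gradient-based search could check

    def grid(self, n):
        # n >= 2 points spaced evenly in log-space, endpoints included.
        lo, hi = math.log(self.low), math.log(self.high)
        return [math.exp(lo + (hi - lo) * i / (n - 1)) for i in range(n)]

    def sample(self, rng):
        return math.exp(rng.uniform(math.log(self.low), math.log(self.high)))


def grid_search_points(space, n=3):
    """Cartesian product of every parameter's grid."""
    names = list(space)
    axes = [space[name].grid(n) for name in names]
    return [dict(zip(names, combo)) for combo in itertools.product(*axes)]


def random_search_points(space, count, seed=0):
    """Independent draws from every parameter's space."""
    rng = random.Random(seed)
    return [{k: s.sample(rng) for k, s in space.items()} for _ in range(count)]


space = {"C": Discrete([1, 2, 3, 4]), "kernel::log_width": LogUniform(1e-2, 1e2)}
```

With this space, `grid_search_points(space, n=3)` enumerates 4 x 3 = 12 combinations, while `random_search_points(space, 5)` draws five points without the grid growing with the number of parameters.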
Bayesian optimisation and stochastic optimisation are powerful frameworks for black-box optimisation. We aim to integrate bindings for both during the project. There are plenty of external libraries that implement the algorithms for us, so this task is mostly about designing interfaces that tell Shogun to cross-validate the algorithm on the next set of parameters and report its performance. We aim for both MOE and CMA-ES.
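The interface described above boils down to a propose/report loop. The sketch below shows that loop under stated assumptions: `RandomSearchOptimizer` is a plain random search standing in for an external optimiser (it is neither MOE nor CMA-ES and does not use their APIs), and `fake_cross_validation_error` is a placeholder for a real cross-validation run.

```python
import random


class RandomSearchOptimizer:
    """Stand-in for an external black-box optimiser (not MOE or CMA-ES).

    It only has to propose parameters (ask) and receive scores (tell).
    """

    def __init__(self, bounds, seed=0):
        self.bounds = bounds  # {name: (low, high)}
        self.rng = random.Random(seed)
        self.best_params, self.best_score = None, float("inf")

    def ask(self):
        return {k: self.rng.uniform(lo, hi) for k, (lo, hi) in self.bounds.items()}

    def tell(self, params, score):
        # Lower is better; keep the best configuration seen so far.
        if score < self.best_score:
            self.best_params, self.best_score = params, score


def tune(optimizer, evaluate, budget):
    """Drive the loop: the optimiser proposes, `evaluate` cross-validates."""
    for _ in range(budget):
        params = optimizer.ask()
        optimizer.tell(params, evaluate(params))
    return optimizer.best_params, optimizer.best_score


def fake_cross_validation_error(params):
    # Placeholder for "cross-validate the machine with these parameters";
    # a quadratic bowl with its minimum at C=1, log_width=-2.
    return (params["C"] - 1.0) ** 2 + (params["log_width"] + 2.0) ** 2


best, score = tune(
    RandomSearchOptimizer({"C": (0.0, 4.0), "log_width": (-5.0, 5.0)}),
    fake_cross_validation_error,
    budget=200,
)
```

Keeping the optimiser behind `ask` and `tell` means the cross-validation side never needs to know which library produced the proposal, which is the property that would let different back ends be swapped in.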
There is hardly any algorithm without free parameters. Currently Shogun only has brute-force search to tune them automatically. While this works for SVMs, it is hopeless for anything with more than 2 parameters. Certainly, a clean and easy way to quickly tune parameters would massively boost Shogun's usability. The project spans a huge range of topics within and outside of Shogun, including framework internals as well as cutting-edge algorithms for optimisation. Super interesting even for ourselves. Be ready to learn a lot.
- Shogun's modelselection classes
- Parameter trees
- MOE are a plus
- CMA-ES
- entrance task on testing xvalidation