This is our implementation of PILCO (Probabilistic Inference for Learning Control) by Deisenroth et al.
The implementation is largely based on Deisenroth's original MATLAB code and his PhD thesis.
Other cool implementations can be found here and here.
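Conceptually, PILCO alternates between fitting a probabilistic (GP) dynamics model to the collected data and improving the policy against that model. The sketch below illustrates this loop structure only; the function names, the toy dynamics, and the surrogate cost are illustrative stand-ins, not this repository's API.

```python
import numpy as np

rng = np.random.default_rng(0)

def rollout(policy, horizon=50, state_dim=2):
    """Collect (state, action, next_state) transitions from a toy linear system."""
    s, data = np.zeros(state_dim), []
    for _ in range(horizon):
        a = policy(s)
        s_next = 0.9 * s + 0.1 * a + 0.01 * rng.standard_normal(state_dim)
        data.append((s, a, s_next))
        s = s_next
    return data

def fit_dynamics_model(data):
    """Stand-in for fitting a (sparse) GP posterior to the observed transitions."""
    return data

def expected_cost(params, model):
    """Stand-in for propagating the state distribution through the GP dynamics
    (moment matching) and summing the expected immediate costs."""
    return float(np.sum(params ** 2))  # crude surrogate so the sketch runs

def optimize_policy(params, model, lr=0.1, eps=1e-4):
    """Stand-in for policy search: a finite-difference gradient step on the
    surrogate; PILCO itself uses analytic moment-matching gradients."""
    grad = np.array([(expected_cost(params + eps * e, model)
                      - expected_cost(params - eps * e, model)) / (2 * eps)
                     for e in np.eye(len(params))])
    return params - lr * grad

params = rng.standard_normal(2)                   # e.g. RBF policy weights
data = rollout(lambda s: rng.standard_normal(2))  # 1. random exploration
for _ in range(5):                                # 2.-4. PILCO iterations
    model = fit_dynamics_model(data)              # learn dynamics model
    params = optimize_policy(params, model)       # improve policy on the model
    data += rollout(lambda s: params * s)         # apply policy, collect data
```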
- controller: Controller/policy models.
- cost_functions: Cost functions for evaluating a trajectory's performance (see the sketch after this list).
- gaussian_process: (Sparse) Gaussian process models for learning the dynamics model and the RBF policy.
- kernel: Kernel functions for the Gaussian process models.
- test: Test cases to ensure the implementation works as intended.
- util: Helper methods that keep the main code readable.
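The cost functions follow Deisenroth's thesis, where the typical immediate cost is a saturating function of the distance to a target state. A minimal sketch (the name saturating_cost and the width parameter are assumptions, not this repository's API):

```python
import numpy as np

def saturating_cost(state, target, width=1.0):
    """PILCO's saturating immediate cost: 1 - exp(-d^2 / (2 * width^2)),
    where d is the Euclidean distance between state and target."""
    d2 = float(np.sum((np.asarray(state) - np.asarray(target)) ** 2))
    return 1.0 - np.exp(-d2 / (2.0 * width ** 2))
```

Unlike a quadratic cost, this cost saturates at 1 for states far from the target, which encourages exploration early in learning.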
- Activate the Anaconda environment
source activate my_env
- Execute the pilco_runner script (the default environment is CartpoleStabShort-v0)
Training run from scratch:
python3 my/path/to/pilco_runner.py
Training run from an existing policy:
python3 my/path/to/pilco_runner.py --weight-dir my_model_directory
Additional console arguments (e.g. for hyperparameter changes) can be passed to a run; for details see
python3 my/path/to/pilco_runner.py --help
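For instance, a swing-up training run with a larger action limit could combine the flags that appear in the test examples below (assuming --env-name and --max-action also apply to training runs):
python3 my/path/to/pilco_runner.py --env-name CartpoleSwingShort-v0 --max-action 10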
- Activate the Anaconda environment
source activate my_env
- Execute the pilco_runner script in test mode
python3 my/path/to/pilco_runner.py --weight-dir my_model_directory --test
For example, to load the pretrained models in test mode:
python3 pilco_runner.py --env-name CartpoleStabShort-v0 --test --max-action 5 --weight-dir experiments/best_models/pilco/stabilization/sparse_gp_50hz/
python3 pilco_runner.py --env-name CartpoleSwingShort-v0 --test --max-action 10 --weight-dir experiments/best_models/pilco/swing_up/sparse_gp_100hz/
python3 pilco_runner.py --env-name Qube-v0 --test --weight-dir experiments/best_models/pilco/qube/sparse_gp_100hz/
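Conceptually, a test run just rolls the loaded policy out in the chosen environment without further training. A rough sketch of such a rollout (assuming the environments are registered by the quanser_robots package; the random policy here is only a stand-in for the learned RBF policy loaded from --weight-dir):

```python
import gym
import quanser_robots  # noqa: F401 -- assumed to register CartpoleStabShort-v0 etc.

env = gym.make("CartpoleStabShort-v0")
state = env.reset()
done, total_reward = False, 0.0
while not done:
    action = env.action_space.sample()  # stand-in for the learned policy
    state, reward, done, _ = env.step(action)
    total_reward += reward
env.close()
print("episode return:", total_reward)
```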