This is our implementation of PILCO (Probabilistic Inference for Learning Control) by Deisenroth et al.
The implementation is largely based on Deisenroth's original MATLAB code and his PhD thesis.
Other cool implementations can be found here and here.
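Conceptually, PILCO alternates between fitting a probabilistic (GP) dynamics model to the collected data and improving the policy against that model. The sketch below illustrates this loop structure only; the function names, the toy dynamics, and the surrogate cost are illustrative stand-ins, not this repository's API.

```python
import numpy as np

rng = np.random.default_rng(0)

def rollout(policy, horizon=50, state_dim=2):
    """Collect (state, action, next_state) transitions from a toy linear system."""
    s, data = np.zeros(state_dim), []
    for _ in range(horizon):
        a = policy(s)
        s_next = 0.9 * s + 0.1 * a + 0.01 * rng.standard_normal(state_dim)
        data.append((s, a, s_next))
        s = s_next
    return data

def fit_dynamics_model(data):
    """Stand-in for fitting a (sparse) GP posterior to the observed transitions."""
    return data

def expected_cost(params, model):
    """Stand-in for propagating the state distribution through the GP dynamics
    (moment matching) and summing the expected immediate costs."""
    return float(np.sum(params ** 2))  # crude surrogate so the sketch runs

def optimize_policy(params, model, lr=0.1, eps=1e-4):
    """Stand-in for policy search: a finite-difference gradient step on the
    surrogate; PILCO itself uses analytic moment-matching gradients."""
    grad = np.array([(expected_cost(params + eps * e, model)
                      - expected_cost(params - eps * e, model)) / (2 * eps)
                     for e in np.eye(len(params))])
    return params - lr * grad

params = rng.standard_normal(2)                   # e.g. RBF policy weights
data = rollout(lambda s: rng.standard_normal(2))  # 1. random exploration
for _ in range(5):                                # 2.-4. PILCO iterations
    model = fit_dynamics_model(data)              # learn dynamics model
    params = optimize_policy(params, model)       # improve policy on the model
    data += rollout(lambda s: params * s)         # apply policy, collect data
```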
- controller: Controller/policy models.
- cost_functions: Cost functions for evaluating a trajectory's performance (see the sketch after this list).
- gaussian_process: (Sparse) Gaussian process models for learning the dynamics model and the RBF policy.
- kernel: Kernel functions for the Gaussian process models.
- test: Test cases to ensure the implementation works as intended.
- util: Helper methods that keep the main code readable.
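The cost functions follow Deisenroth's thesis, where the typical immediate cost is a saturating function of the distance to a target state. A minimal sketch (the name saturating_cost and the width parameter are assumptions, not this repository's API):

```python
import numpy as np

def saturating_cost(state, target, width=1.0):
    """PILCO's saturating immediate cost: 1 - exp(-d^2 / (2 * width^2)),
    where d is the Euclidean distance between state and target."""
    d2 = float(np.sum((np.asarray(state) - np.asarray(target)) ** 2))
    return 1.0 - np.exp(-d2 / (2.0 * width ** 2))
```

Unlike a quadratic cost, this cost saturates at 1 for states far from the target, which encourages exploration early in learning.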
- Activate the Anaconda environment
source activate my_env
- Execute the pilco_runner script (the default environment is CartpoleStabShort-v0)
Training run from scratch:
python3 my/path/to/pilco_runner.py
Training run from an existing policy:
python3 my/path/to/pilco_runner.py --weight-dir my_model_directory
Additional console arguments (e.g. for hyperparameter changes) can be passed to a run; for details see
python3 my/path/to/pilco_runner.py --help
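For instance, a swing-up training run with a larger action limit could combine the flags that appear in the test examples below (assuming --env-name and --max-action also apply to training runs):
python3 my/path/to/pilco_runner.py --env-name CartpoleSwingShort-v0 --max-action 10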
- Activate the Anaconda environment
source activate my_env
- Execute the pilco_runner script in test mode
python3 my/path/to/pilco_runner.py --weight-dir my_model_directory --test
For example, to load the pretrained models in test mode:
python3 pilco_runner.py --env-name CartpoleStabShort-v0 --test --max-action 5 --weight-dir experiments/best_models/pilco/stabilization/sparse_gp_50hz/
python3 pilco_runner.py --env-name CartpoleSwingShort-v0 --test --max-action 10 --weight-dir experiments/best_models/pilco/swing_up/sparse_gp_100hz/
python3 pilco_runner.py --env-name Qube-v0 --test --weight-dir experiments/best_models/pilco/qube/sparse_gp_100hz/
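Conceptually, a test run just rolls the loaded policy out in the chosen environment without further training. A rough sketch of such a rollout (assuming the environments are registered by the quanser_robots package; the random policy here is only a stand-in for the learned RBF policy loaded from --weight-dir):

```python
import gym
import quanser_robots  # noqa: F401 -- assumed to register CartpoleStabShort-v0 etc.

env = gym.make("CartpoleStabShort-v0")
state = env.reset()
done, total_reward = False, 0.0
while not done:
    action = env.action_space.sample()  # stand-in for the learned policy
    state, reward, done, _ = env.step(action)
    total_reward += reward
env.close()
print("episode return:", total_reward)
```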