Add --gym-packages parameter to push_to_hub #315

Open
wants to merge 19 commits into base: master
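The flag being added matters because environments from third-party packages such as seals are only registered with gym once their package is imported; without that import, push_to_hub cannot re-create the environment it needs for evaluation and upload. A hedged sketch of the intended invocation (the entry point and the -f/-orga flags follow the zoo's README of the time and are illustrative, with my-org a placeholder organization):

python -m utils.push_to_hub --algo ppo --env seals/CartPole-v0 -f logs/ -orga my-org --gym-packages seals

Here --gym-packages seals tells the script to import the seals package before looking up seals/CartPole-v0.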
Commits (19):
bb00557  Use the MaxTrialsCallback to set the number of trials independent fro… (ernestum, Apr 11, 2022)
edbd3ac  Add seals environments and corresponding (tentative) hyperparameters. (ernestum, Mar 29, 2022)
fc82444  Use `NopPruner` when pruner is set to `"none"` (#234) (qgallouedec, Apr 17, 2022)
217cf1f  Doc fixes and move to python 3.7+ style (#237) (araffin, Apr 21, 2022)
bcb4299  Fix division by zero with n-evaluations (closes #238) (araffin, Apr 21, 2022)
763aceb  Add command line option for total number of trials. (ernestum, Apr 26, 2022)
975c520  Fix formatting. (ernestum, Apr 26, 2022)
0cafc71  Add test for training with multiple workers and the new --total-n-tri… (ernestum, Apr 28, 2022)
6110f3c  Ensure pruned trials are counted and that no optimization is started … (ernestum, Apr 28, 2022)
069556e  Update CHANGELOG.md (ernestum, Apr 28, 2022)
7a195bc  Ensure that pruned trials are understood as completed trials. (ernestum, Apr 28, 2022)
645ea48  Fix formatting. (ernestum, Apr 28, 2022)
374b3f9  Add tuned hyperparameters for PPO and seals environments. (ernestum, Apr 29, 2022)
ea16e61  Add tuned hyperparameters for SAC and seals environments. (ernestum, Apr 29, 2022)
d6dc4d9  Merge branch 'master' into add_seals_environments (AdamGleave, May 2, 2022)
4388c18  Merge branch 'master' into add_seals_environments (ernestum, May 27, 2022)
9ef5fba  Merge pull request #1 from HumanCompatibleAI/add_seals_environments (ernestum, May 27, 2022)
9ae9ea4  Merge branch 'master' of github.com:DLR-RM/rl-baselines3-zoo (ernestum, Oct 26, 2022)
1c4b851  Merge branch 'master' of github.com:DLR-RM/rl-baselines3-zoo (ernestum, Nov 16, 2022)
hyperparams/a2c.yml (33 additions, 0 deletions)
@@ -19,6 +19,12 @@ CartPole-v1:
  policy: 'MlpPolicy'
  ent_coef: 0.0

seals/CartPole-v0:
  n_envs: 8
  n_timesteps: !!float 5e5
  policy: 'MlpPolicy'
  ent_coef: 0.0

LunarLander-v2:
  n_envs: 8
  n_timesteps: !!float 2e5
@@ -35,6 +41,13 @@ MountainCar-v0:
  policy: 'MlpPolicy'
  ent_coef: .0

seals/MountainCar-v0:
  normalize: true
  n_envs: 16
  n_timesteps: !!float 1e6
  policy: 'MlpPolicy'
  ent_coef: .0

Acrobot-v1:
  normalize: true
  n_envs: 16
@@ -170,19 +183,39 @@ HalfCheetah-v3: &mujoco-defaults
  n_timesteps: !!float 1e6
  policy: 'MlpPolicy'

seals/HalfCheetah-v0:
  <<: *mujoco-defaults

Ant-v3:
  <<: *mujoco-defaults

seals/Ant-v0:
  <<: *mujoco-defaults

Hopper-v3:
  <<: *mujoco-defaults

seals/Hopper-v0:
  <<: *mujoco-defaults

Walker2d-v3:
  <<: *mujoco-defaults

seals/Walker2d-v0:
  <<: *mujoco-defaults

Humanoid-v3:
  <<: *mujoco-defaults
  n_timesteps: !!float 2e6

seals/Humanoid-v0:
  <<: *mujoco-defaults
  n_timesteps: !!float 2e6

Swimmer-v3:
  <<: *mujoco-defaults
  gamma: 0.9999

seals/Swimmer-v0:
  <<: *mujoco-defaults
  gamma: 0.9999
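The blocks above lean on YAML anchors and merge keys: `&mujoco-defaults` names the `HalfCheetah-v3` mapping, `<<: *mujoco-defaults` splices those keys into each new entry, and any key written alongside the merge overrides the inherited value. A minimal sketch of the mechanism, with hypothetical entry names:

base-env: &defaults
  n_timesteps: !!float 1e6  # !!float makes YAML parse 1e6 as a float rather than a string
  policy: 'MlpPolicy'

derived-env:
  <<: *defaults             # inherits n_timesteps and policy
  n_timesteps: !!float 2e6  # explicit keys win over merged ones

So `seals/Humanoid-v0` above resolves to the MuJoCo defaults with `n_timesteps` doubled to 2e6.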
hyperparams/ars.yml (77 additions, 0 deletions)
@@ -5,6 +5,12 @@ CartPole-v1:
  policy: 'LinearPolicy'
  n_delta: 2

seals/CartPole-v0:
  n_envs: 1
  n_timesteps: !!float 5e4
  policy: 'LinearPolicy'
  n_delta: 2

# Tuned
Pendulum-v1: &pendulum-params
  n_envs: 1
@@ -41,6 +47,11 @@ MountainCar-v0:
  n_delta: 8
  n_timesteps: !!float 5e5

seals/MountainCar-v0:
  <<: *pendulum-params
  n_delta: 8
  n_timesteps: !!float 5e5

# Tuned
MountainCarContinuous-v0:
  <<: *pendulum-params
@@ -119,6 +130,17 @@ Swimmer-v3:
  alive_bonus_offset: 0
  # normalize: "dict(norm_obs=True, norm_reward=False)"

seals/Swimmer-v0:
  n_envs: 1
  policy: 'LinearPolicy'
  n_timesteps: !!float 2e6
  learning_rate: !!float 0.02
  delta_std: !!float 0.01
  n_delta: 1
  n_top: 1
  alive_bonus_offset: 0
  # normalize: "dict(norm_obs=True, norm_reward=False)"

Hopper-v3:
  n_envs: 1
  policy: 'LinearPolicy'
@@ -130,6 +152,17 @@ Hopper-v3:
  alive_bonus_offset: -1
  normalize: "dict(norm_obs=True, norm_reward=False)"

seals/Hopper-v0:
  n_envs: 1
  policy: 'LinearPolicy'
  n_timesteps: !!float 7e6
  learning_rate: !!float 0.01
  delta_std: !!float 0.025
  n_delta: 8
  n_top: 4
  alive_bonus_offset: -1
  normalize: "dict(norm_obs=True, norm_reward=False)"

HalfCheetah-v3:
  n_envs: 1
  policy: 'LinearPolicy'
@@ -141,6 +174,17 @@ HalfCheetah-v3:
  alive_bonus_offset: 0
  normalize: "dict(norm_obs=True, norm_reward=False)"

seals/HalfCheetah-v0:
  n_envs: 1
  policy: 'LinearPolicy'
  n_timesteps: !!float 1.25e7
  learning_rate: !!float 0.02
  delta_std: !!float 0.03
  n_delta: 32
  n_top: 4
  alive_bonus_offset: 0
  normalize: "dict(norm_obs=True, norm_reward=False)"

Walker2d-v3:
  n_envs: 1
  policy: 'LinearPolicy'
@@ -152,6 +196,17 @@ Walker2d-v3:
  alive_bonus_offset: -1
  normalize: "dict(norm_obs=True, norm_reward=False)"

seals/Walker2d-v0:
  n_envs: 1
  policy: 'LinearPolicy'
  n_timesteps: !!float 7.5e7
  learning_rate: !!float 0.03
  delta_std: !!float 0.025
  n_delta: 40
  n_top: 30
  alive_bonus_offset: -1
  normalize: "dict(norm_obs=True, norm_reward=False)"

Ant-v3:
  n_envs: 1
  policy: 'LinearPolicy'
@@ -163,6 +218,17 @@ Ant-v3:
  alive_bonus_offset: -1
  normalize: "dict(norm_obs=True, norm_reward=False)"

seals/Ant-v0:
  n_envs: 1
  policy: 'LinearPolicy'
  n_timesteps: !!float 7.5e7
  learning_rate: !!float 0.015
  delta_std: !!float 0.025
  n_delta: 60
  n_top: 20
  alive_bonus_offset: -1
  normalize: "dict(norm_obs=True, norm_reward=False)"

Humanoid-v3:
  n_envs: 1
@@ -175,6 +241,17 @@ Humanoid-v3:
  alive_bonus_offset: -5
  normalize: "dict(norm_obs=True, norm_reward=False)"

seals/Humanoid-v0:
  n_envs: 1
  policy: 'LinearPolicy'
  n_timesteps: !!float 2.5e8
  learning_rate: 0.02
  delta_std: 0.0075
  n_delta: 256
  n_top: 256
  alive_bonus_offset: -5
  normalize: "dict(norm_obs=True, norm_reward=False)"

# Almost tuned
BipedalWalker-v3:
  n_envs: 1
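One detail of the ARS entries above: in the zoo, `normalize` accepts either a boolean or a string, and the string form is evaluated into keyword arguments for the VecNormalize wrapper (this reading is an assumption based on how the zoo handles string-valued hyperparameters). A sketch of the two forms, with hypothetical entry names:

env-a:
  normalize: true  # normalize observations and rewards with default settings
env-b:
  normalize: "dict(norm_obs=True, norm_reward=False)"  # normalize observations only

The ARS entries use the string form so observations are normalized while rewards are left untouched.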
hyperparams/ddpg.yml (22 additions, 0 deletions)
@@ -131,21 +131,43 @@ HalfCheetah-v3: &mujoco-defaults
  noise_type: 'normal'
  noise_std: 0.1

seals/HalfCheetah-v0:
  <<: *mujoco-defaults

Ant-v3:
  <<: *mujoco-defaults

seals/Ant-v0:
  <<: *mujoco-defaults

Hopper-v3:
  <<: *mujoco-defaults

seals/Hopper-v0:
  <<: *mujoco-defaults

Walker2d-v3:
  <<: *mujoco-defaults

seals/Walker2d-v0:
  <<: *mujoco-defaults

Humanoid-v3:
  <<: *mujoco-defaults
  n_timesteps: !!float 2e6

seals/Humanoid-v0:
  <<: *mujoco-defaults
  n_timesteps: !!float 2e6

Swimmer-v3:
  <<: *mujoco-defaults
  gamma: 0.9999
  train_freq: 1
  gradient_steps: 1

seals/Swimmer-v0:
  <<: *mujoco-defaults
  gamma: 0.9999
  train_freq: 1
  gradient_steps: 1
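At training time the zoo looks up the entry whose key matches the requested environment ID in hyperparams/<algo>.yml, which is why each seals ID needs its own entry even when it only inherits the defaults. A hedged example run against one of the new entries (train.py and its --gym-packages flag exist in the zoo; the exact invocation is illustrative):

python train.py --algo ddpg --env seals/Hopper-v0 --gym-packages seals

Passing --gym-packages seals imports the seals package first, so gym can resolve the seals/Hopper-v0 ID before the hyperparameters above are applied.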
hyperparams/dqn.yml (30 additions, 0 deletions)
@@ -33,6 +33,21 @@ CartPole-v1:
  exploration_final_eps: 0.04
  policy_kwargs: "dict(net_arch=[256, 256])"

seals/CartPole-v0:
  n_timesteps: !!float 5e4
  policy: 'MlpPolicy'
  learning_rate: !!float 2.3e-3
  batch_size: 64
  buffer_size: 100000
  learning_starts: 1000
  gamma: 0.99
  target_update_interval: 10
  train_freq: 256
  gradient_steps: 128
  exploration_fraction: 0.16
  exploration_final_eps: 0.04
  policy_kwargs: "dict(net_arch=[256, 256])"

# Tuned
MountainCar-v0:
  n_timesteps: !!float 1.2e5
@@ -49,6 +64,21 @@ MountainCar-v0:
  exploration_final_eps: 0.07
  policy_kwargs: "dict(net_arch=[256, 256])"

seals/MountainCar-v0:
  n_timesteps: !!float 1.2e5
  policy: 'MlpPolicy'
  learning_rate: !!float 4e-3
  batch_size: 128
  buffer_size: 10000
  learning_starts: 1000
  gamma: 0.98
  target_update_interval: 600
  train_freq: 16
  gradient_steps: 8
  exploration_fraction: 0.2
  exploration_final_eps: 0.07
  policy_kwargs: "dict(net_arch=[256, 256])"

# Tuned
LunarLander-v2:
  n_timesteps: !!float 1e5
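As with `normalize`, the `policy_kwargs` strings in the DQN entries above are evaluated into a Python dict and forwarded to the policy constructor, so `"dict(net_arch=[256, 256])"` yields a Q-network with two hidden layers of 256 units (the eval-based parsing is an assumption about the zoo's loader; the net_arch meaning follows stable-baselines3). A sketch with a hypothetical entry name:

env-c:
  policy: 'MlpPolicy'
  policy_kwargs: "dict(net_arch=[256, 256])"  # two 256-unit hidden layers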