Enable training for fraction of total steps; enable early stopping from trial 0

Summary:
Pull Request resolved: #627

Enable training for a fraction of the total steps: when doing HPO (hyperparameter optimization), users may want to train for only a fraction of the steps used in a regular (baseline) training run. Changing SOLVER.MAX_ITER alone is not enough, because that also changes the learning rate schedule. We introduce a multiplier, SOLVER.EARLY_STOPPING_FRACTION, that is applied on top of SOLVER.MAX_ITER when deciding how many steps to train for. The multiplier does not scale the number of steps over which the learning rate schedule is defined.
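To illustrate why shrinking SOLVER.MAX_ITER is not equivalent to stopping early, here is a minimal sketch. It assumes a cosine learning rate schedule purely for illustration; the function `cosine_lr` and the constants `MAX_ITER`, `FRACTION`, and `BASE_LR` are hypothetical and not taken from d2go.

```python
import math


def cosine_lr(step: int, base_lr: float, schedule_steps: int) -> float:
    """Cosine decay from base_lr towards 0 over schedule_steps (illustrative only)."""
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * step / schedule_steps))


MAX_ITER = 1000   # hypothetical baseline training length
FRACTION = 0.25   # hypothetical early-stopping fraction
BASE_LR = 0.1     # hypothetical base learning rate

stop_at = int(MAX_ITER * FRACTION)

# Baseline run: schedule and training length are both MAX_ITER.
baseline = [cosine_lr(s, BASE_LR, MAX_ITER) for s in range(MAX_ITER)]

# Option A: shrink MAX_ITER itself. The schedule is compressed, so the
# learning rate decays almost to zero within the shortened run.
compressed = [cosine_lr(s, BASE_LR, stop_at) for s in range(stop_at)]

# Option B: keep the schedule defined over MAX_ITER and only stop early.
# The run follows the baseline schedule exactly, just for fewer steps.
truncated = [cosine_lr(s, BASE_LR, MAX_ITER) for s in range(stop_at)]

print(f"last LR, compressed schedule: {compressed[-1]:.6f}")
print(f"last LR, truncated schedule:  {truncated[-1]:.6f}")
```

With Option B, a short HPO trial sees the same learning rates as the first steps of the baseline run, which is the behavior the multiplier is meant to preserve.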

Reviewed By: raghuramank100

Differential Revision: D48699087

fbshipit-source-id: 903f7c957ee471f36365c1449e9cd6a919fd260a
ifed-ucsd authored and facebook-github-bot committed Oct 12, 2023
1 parent 54d9d91 commit 3c72441
Showing 1 changed file with 12 additions and 1 deletion.
13 changes: 12 additions & 1 deletion d2go/runner/default_runner.py
@@ -572,7 +572,18 @@ def do_train(self, cfg, model, resume):
         # The checkpoint stores the training iteration that just finished, thus we start
         # at the next iteration (or iter zero if there's no checkpoint).
         start_iter += 1
-        max_iter = cfg.SOLVER.MAX_ITER
+
+        if "EARLY_STOPPING_FRACTION" in cfg.SOLVER:
+            assert (
+                cfg.SOLVER.EARLY_STOPPING_FRACTION >= 0
+            ), f"Early stopping fraction must be non-negative, but is {cfg.SOLVER.EARLY_STOPPING_FRACTION}"
+            assert (
+                cfg.SOLVER.EARLY_STOPPING_FRACTION <= 1
+            ), f"Early stopping fraction must not be larger than 1, but is {cfg.SOLVER.EARLY_STOPPING_FRACTION}"
+            max_iter = int(cfg.SOLVER.MAX_ITER * cfg.SOLVER.EARLY_STOPPING_FRACTION)
+        else:
+            max_iter = cfg.SOLVER.MAX_ITER
+
         periodic_checkpointer = PeriodicCheckpointer(
             checkpointer, cfg.SOLVER.CHECKPOINT_PERIOD, max_iter=max_iter
         )
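The logic added in this hunk can be tried outside the runner with a small standalone sketch. It makes two assumptions: a plain dict stands in for `cfg.SOLVER`, and the helper name `resolve_max_iter` is hypothetical. It also raises `ValueError` where the commit uses `assert`, so the checks still run under `python -O`; that is a substitution, not what the commit does.

```python
def resolve_max_iter(solver_cfg: dict) -> int:
    """Return the number of steps to train for.

    solver_cfg is a plain dict standing in for cfg.SOLVER (an assumption
    made to keep the sketch self-contained).
    """
    max_iter = solver_cfg["MAX_ITER"]
    if "EARLY_STOPPING_FRACTION" not in solver_cfg:
        # No multiplier configured: train for the full schedule length.
        return max_iter

    fraction = solver_cfg["EARLY_STOPPING_FRACTION"]
    if not 0 <= fraction <= 1:
        raise ValueError(
            f"Early stopping fraction must be in [0, 1], but is {fraction}"
        )
    # int() truncates towards zero, matching the expression in the diff.
    return int(max_iter * fraction)


print(resolve_max_iter({"MAX_ITER": 1000}))
print(resolve_max_iter({"MAX_ITER": 1000, "EARLY_STOPPING_FRACTION": 0.25}))
```

Because the result is truncated with `int()`, a fraction of 0 yields zero training steps, and the learning rate schedule itself is still defined over the unscaled `MAX_ITER`.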