Enable training for fraction of total steps; enable early stopping from trial 0

Summary:
Pull Request resolved: #627

Enable training for fraction of total steps: when doing HPO, users may want to train for a fraction of the number of training steps of a regular (baseline) training run. In this case, it is not enough to just change SOLVER.MAX_ITER because that also changes the learning rate schedule. We introduce a multiplier to be used on top of SOLVER.MAX_ITER when deciding how many steps to train for. This multiplier does not scale the number of steps over which the learning rate schedule is defined.

Reviewed By: raghuramank100

Differential Revision: D48699087

fbshipit-source-id: 903f7c957ee471f36365c1449e9cd6a919fd260a
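
The difference between shrinking SOLVER.MAX_ITER and stopping early can be illustrated with a small sketch. It assumes a plain cosine decay as the learning rate schedule, which is only one of the schedules a config may select; `cosine_lr`, `MAX_ITER`, `FRACTION` and the list names are hypothetical and not part of d2go.

```python
import math


def cosine_lr(step: int, schedule_steps: int, base_lr: float = 0.1) -> float:
    """Cosine decay from base_lr towards 0 over schedule_steps (illustrative only)."""
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * step / schedule_steps))


MAX_ITER = 1000   # stands in for SOLVER.MAX_ITER of the baseline run
FRACTION = 0.25   # stands in for the multiplier described above
train_steps = int(MAX_ITER * FRACTION)

# Baseline: full schedule, full training run.
lr_baseline = [cosine_lr(s, MAX_ITER) for s in range(MAX_ITER)]

# Option A: lower MAX_ITER itself. The schedule is compressed into fewer steps,
# so the learning rate decays much faster than in the baseline.
lr_shrunk = [cosine_lr(s, train_steps) for s in range(train_steps)]

# Option B: keep the schedule defined over MAX_ITER and only stop early.
# Every step sees the same learning rate as the baseline run.
lr_truncated = [cosine_lr(s, MAX_ITER) for s in range(train_steps)]
```

Under these assumptions, option B reproduces the first `train_steps` learning rates of the baseline exactly, which is what makes a short HPO trial comparable to the start of a full run.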
facebookresearch · Oct 12, 2023 · 3c72441
1 parent 54d9d91
Showing 1 changed file with 12 additions and 1 deletion.
diff --git a/d2go/runner/default_runner.py b/d2go/runner/default_runner.py
@@ -572,7 +572,18 @@ def do_train(self, cfg, model, resume):
             # The checkpoint stores the training iteration that just finished, thus we start
             # at the next iteration (or iter zero if there's no checkpoint).
             start_iter += 1
-            max_iter = cfg.SOLVER.MAX_ITER
+
+            if "EARLY_STOPPING_FRACTION" in cfg.SOLVER:
+                assert (
+                    cfg.SOLVER.EARLY_STOPPING_FRACTION >= 0
+                ), f"Early stopping fraction must be non-negative, but is {cfg.SOLVER.EARLY_STOPPING_FRACTION}"
+                assert (
+                    cfg.SOLVER.EARLY_STOPPING_FRACTION <= 1
+                ), f"Early stopping fraction must not be larger than 1, but is {cfg.SOLVER.EARLY_STOPPING_FRACTION}"
+                max_iter = int(cfg.SOLVER.MAX_ITER * cfg.SOLVER.EARLY_STOPPING_FRACTION)
+            else:
+                max_iter = cfg.SOLVER.MAX_ITER
+
             periodic_checkpointer = PeriodicCheckpointer(
                 checkpointer, cfg.SOLVER.CHECKPOINT_PERIOD, max_iter=max_iter
             )
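
The step-count logic added in this hunk can be tried in isolation with the sketch below. It is not the d2go code: a plain dict stands in for `cfg.SOLVER`, the function name `resolve_max_iter` is hypothetical, and it raises `ValueError` where the diff uses `assert`, so the check still runs under `python -O`.

```python
def resolve_max_iter(solver: dict) -> int:
    """Return the number of steps to train for.

    `solver` is a plain dict standing in for cfg.SOLVER (an assumption made
    for this sketch; the real object is a config node).
    """
    max_iter = solver["MAX_ITER"]
    if "EARLY_STOPPING_FRACTION" not in solver:
        # No fraction configured: train for the full schedule.
        return max_iter

    fraction = solver["EARLY_STOPPING_FRACTION"]
    if not 0 <= fraction <= 1:
        raise ValueError(
            f"Early stopping fraction must be in [0, 1], but is {fraction}"
        )
    # int() truncates towards zero, as in the diff above.
    return int(max_iter * fraction)


full_run = resolve_max_iter({"MAX_ITER": 1000})
half_run = resolve_max_iter({"MAX_ITER": 1000, "EARLY_STOPPING_FRACTION": 0.5})
truncated = resolve_max_iter({"MAX_ITER": 10, "EARLY_STOPPING_FRACTION": 0.25})
```

Because of the truncation, a fraction that does not divide `MAX_ITER` evenly rounds the step count down, and a fraction of 0 yields zero training steps.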