Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support constant lr with cooldown #35453

Open
wants to merge 16 commits into
base: main
Choose a base branch
from

Conversation

LoserCheems
Copy link

@LoserCheems LoserCheems commented Dec 29, 2024

What does this PR do?

Fixes #35449
Added the 'warmup_stable_cooldown' learning rate scheduler method.
This method allows three phases, linear warmup, stable, and cooldown, where cooldown can be done using the linear, cosine and 1-sqrt methods.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@muellerzr and @SunMarc

@SunMarc
Copy link
Member

SunMarc commented Jan 2, 2025

Thanks for the PR @LoserCheems, we already have warmup_stable_decay scheduler with get_wsd_schedule that does practically the same thing as the new scheduler you proposed. I think what you be good is to maybe add more options to the decay stage of the wsd_scheduler as you did in this PR (e.g linear/ 1-sqrt) WDYT ?

@LoserCheems
Copy link
Author

LoserCheems commented Jan 3, 2025

Thanks for the PR @LoserCheems, we already have warmup_stable_decay scheduler with get_wsd_schedule that does practically the same thing as the new scheduler you proposed. I think what you be good is to maybe add more options to the decay stage of the wsd_scheduler as you did in this PR (e.g linear/ 1-sqrt) WDYT ?

oh, thank for your suggestion, I intend to integrate different cooldown methods and minimum learning rate into get_wsd_schedule.
In addition, is it possible to rename get_wsd_schedule to get_wsc_schedule, which is more in line with the three phases of warmup, stable, cooldown.

@SunMarc
Copy link
Member

SunMarc commented Jan 3, 2025

Thanks ! We can't really rename it wsc, wsd is a real term that appears in the MiniCPM paper.

@LoserCheems
Copy link
Author

Thanks ! We can't really rename it wsc, wsd is a real term that appears in the MiniCPM paper.

Thank you, the renaming of the function is done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support Constant Learning Rate with Cooldown
2 participants