
Adding the new feature of FPDT (#441) #70

Open

wants to merge 3 commits into base: main
Conversation

@saforem2 (Member) commented Dec 6, 2024

  • pass batch_dim_idx to the DeepSpeed sequence-parallel distributed attention to support batch sizes larger than 1 (see the sketch after this list)

  • add FPDT support; add Ulysses rotary position embedding support

  • remove unnecessary files

  • set the warmup length to the FPDT chunk size when enabled


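As context for the first item above, here is a minimal sketch (not taken from this PR's diff) of how a Megatron-style attention layer could hand `batch_dim_idx` to DeepSpeed's Ulysses `DistributedAttention`. The tensor layout and the exact `forward` signature are assumptions based on the commit summary, not the PR's actual code.

```python
# Minimal sketch, not the PR's implementation: wiring batch_dim_idx into
# DeepSpeed's Ulysses sequence-parallel attention.  Assumes that
# DistributedAttention.forward() accepts a batch_dim_idx argument (which this
# PR relies on) and that a sequence-parallel process group has already been
# created by the training launcher.
from deepspeed.sequence.layer import DistributedAttention


def build_dist_attention(local_attention, seq_parallel_group):
    # local_attention is the usual per-rank core attention module;
    # DistributedAttention all-to-alls q/k/v across the sequence-parallel
    # group, runs local_attention, then reverses the layout on the output.
    return DistributedAttention(local_attention, seq_parallel_group)


def attention_forward(dist_attn, query, key, value):
    # With a [sequence, batch, heads, head_dim] layout the batch dimension is
    # index 1; passing it explicitly lets the all-to-all reshapes handle
    # batch sizes larger than 1.
    batch_dim_idx = 1
    return dist_attn(query, key, value, batch_dim_idx)
```

If the layout were instead [batch, sequence, ...], `batch_dim_idx` would be 0; the point is only that the index has to be threaded through so the Ulysses all-to-all can reshape correctly.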
YJHMITWEB and others added 3 commits December 4, 2024 17:34
* pass batch_dim_idx to deepspeed sequence parallel distributed attention for supporting batch size larger than 1

* add FPDT support; add Ulysses rotary position embedding support

* remove unnecessary files

* set the warmup length to be FPDT chunk size if enabled

---------

Co-authored-by: Jinghan Yao <yjhmitweb@ascend-rw02.ten.osc.edu>
Co-authored-by: Jinghan Yao <yjhmitweb@ascend-rw01.ten.osc.edu>
* [tools]GQA convert support

* fix readme

Previously, `deepspeed_to_megatron.py` would raise an import error
due to the relative import.

This commit fixes the issue by switching from the relative import to an absolute import, as in `deepspeed_to_transformers.py`.
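For reference, a minimal before/after sketch of the kind of change described above; the module and class names (`deepspeed_checkpoint`, `DeepSpeedCheckpoint`) are illustrative assumptions, not copied from the diff.

```python
# Before (hypothetical): a relative import fails with
# "attempted relative import with no known parent package" when the file is
# executed directly as a script, e.g. `python deepspeed_to_megatron.py ...`.
# from .deepspeed_checkpoint import DeepSpeedCheckpoint

# After (hypothetical): an absolute import, matching the style used in
# deepspeed_to_transformers.py, so the script also works when run directly.
from deepspeed_checkpoint import DeepSpeedCheckpoint
```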