
Added Aurora DeepSpeed content #581

Open
wants to merge 1 commit into
base: main

Conversation


@hatanp hatanp commented Dec 16, 2024

First draft for DeepSpeed on Aurora. Modified from Polaris instructions and tested to work.

@hatanp hatanp requested a review from saforem2 December 16, 2024 18:20

The base `frameworks` environment on Aurora does not come with Microsoft's
[DeepSpeed](https://github.com/microsoft/DeepSpeed) pre-installed and it needs to be installed by the user. Instructions
for using / cloning the base environment can be found [here](../python.md).
Member

Suggested change
for using / cloning the base environment can be found [here](../python.md).
for using and cloning the base environment can be found [here](../python.md).


!!! example "Launching DeepSpeed"
In both examples the 'train_batch_size' variable needs to be modified from 16 to 12 in the deepspeed
config embedded in the python file cifar10_deepspeed.py. This is because the default of 16 is not
Member

Suggested change
config embedded in the python file cifar10_deepspeed.py. This is because the default of 16 is not
config embedded in the Python file `cifar10_deepspeed.py`. This is because the default of 16 is not

```

!!! example "Launching DeepSpeed"
In both examples the 'train_batch_size' variable needs to be modified from 16 to 12 in the deepspeed
Member

Suggested change
In both examples the 'train_batch_size' variable needs to be modified from 16 to 12 in the deepspeed
In both examples the 'train_batch_size' variable needs to be modified from 16 to 12 in the DeepSpeed

sed -e 's/$/ slots=12/' -i hostfile
```
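As a rough illustration of what the `sed` line in this hunk does, here is a minimal sketch run against a made-up hostfile. The node names `node-a` and `node-b` and the scratch directory are assumptions for demonstration only and are not part of the PR. It also assumes GNU `sed`, which accepts `-i` without a backup suffix.

```shell
# Work in a scratch directory so a real hostfile is not overwritten.
workdir="$(mktemp -d)"
cd "$workdir"

# Hypothetical hostfile with one hostname per line (names are placeholders).
printf 'node-a\nnode-b\n' > hostfile

# Same command as in the hunk: append " slots=12" to the end of every line, in place.
sed -e 's/$/ slots=12/' -i hostfile

cat hostfile
```

With these placeholder names, each line of the file ends up in the form `node-a slots=12`.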

2. Create a `#!bash .deepspeed_env` containing the environment
Member

I had a conversation with @saforem2 about the use of syntax highlighting for inline code markup like you use here with #!bash ...

We concluded that it isn't really worth the marginal gains for such short pieces of text, since it doesn't render in GitHub Flavored Markdown and has different escaping conventions.

Kinda degrades the readability when you are editing someone else's source text. Have you found it to be helpful?
