Added Aurora DeepSpeed content #581
base: main
The base `frameworks` environment on Aurora does not come with Microsoft's
[DeepSpeed](https://github.com/microsoft/DeepSpeed) pre-installed and it needs to be installed by the user. Instructions
for using / cloning the base environment can be found [here](../python.md).
Suggested change:
- for using / cloning the base environment can be found [here](../python.md).
+ for using and cloning the base environment can be found [here](../python.md).

!!! example "Launching DeepSpeed"
In both examples the 'train_batch_size' variable needs to be modified from 16 to 12 in the deepspeed
config embedded in the python file cifar10_deepspeed.py. This is because the default of 16 is not
Suggested change:
- config embedded in the python file cifar10_deepspeed.py. This is because the default of 16 is not
+ config embedded in the Python file `cifar10_deepspeed.py`. This is because the default of 16 is not
```

!!! example "Launching DeepSpeed"
In both examples the 'train_batch_size' variable needs to be modified from 16 to 12 in the deepspeed
Suggested change:
- In both examples the 'train_batch_size' variable needs to be modified from 16 to 12 in the deepspeed
+ In both examples the 'train_batch_size' variable needs to be modified from 16 to 12 in the DeepSpeed
sed -e 's/$/ slots=12/' -i hostfile
```

2. Create a `#!bash .deepspeed_env` containing the environment
I had a conversation with @saforem2 about the use of syntax highlighting for inline code markup like you use here with `#!bash ...`
We concluded that it isn't really worth the marginal gains for such short pieces of text, since it doesn't render in GitHub Flavored Markdown and has different escaping conventions.
Kinda degrades the readability when you are editing someone else's source text. Have you found it to be helpful?
First draft for DeepSpeed on Aurora. Modified from Polaris instructions and tested to work.
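
For anyone reviewing the hostfile step in the diff above, here is a minimal sketch of what the `sed` line does. The hostnames `node001` and `node002` and the temporary file are made up for illustration; on Aurora the real hostfile would presumably be derived from the scheduler's node list, which this PR excerpt does not show. The in-place `-i` form used here assumes GNU sed.

```shell
# Hypothetical two-node hostfile; these hostnames are placeholders, not Aurora nodes.
hostfile="$(mktemp)"
printf 'node001\nnode002\n' > "$hostfile"

# Same substitution as in the diff: '$' matches the end of each line,
# so " slots=12" is appended to every hostname. -i edits the file in place.
sed -e 's/$/ slots=12/' -i "$hostfile"

# Show the rewritten hostfile.
cat "$hostfile"
```

After this runs, each line has the `hostname slots=12` form that a DeepSpeed hostfile uses to state how many processes to place on a node.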