I would like to request information about the pretraining configuration of LLaMA on A100 80GB GPUs for my project. Since I plan to use this setup for my research, having the specific pretraining configuration details would greatly help me replicate and benchmark the results and reach the best speed.
As described in this issue, the configuration would be similar to pretraining on TPU pods, with the addition of the JAX distributed initialization. However, you'll have to tune the mesh shape and batch size yourself according to your own cluster's configuration to obtain the best throughput. Unfortunately I don't have access to a few hundred A100s, so I cannot provide a good example of that.
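For reference, here is a minimal sketch of the GPU-specific setup implied above, using only standard JAX APIs rather than any particular training script. The coordinator address, the 4-node x 8-GPU cluster size, and the (4, 8, 1) mesh shape are placeholder assumptions you would replace and tune for your own hardware.

```python
# Sketch of multi-node GPU initialization plus a device mesh in JAX.
# All concrete numbers below (address, process counts, mesh shape) are
# hypothetical starting points, not a recommended configuration.
import jax
from jax.sharding import Mesh
from jax.experimental import mesh_utils

# One process per node; coordinator address and process indices are
# placeholders to fill in per node (process_id ranges 0..num_processes-1).
jax.distributed.initialize(
    coordinator_address="node0:1234",  # assumed hostname:port of node 0
    num_processes=4,                   # e.g. 4 nodes x 8 A100s = 32 GPUs
    process_id=0,                      # set differently on each node
)

# Build a mesh over all global GPUs. Axis names follow the common
# data-parallel / fully-sharded / model-parallel split; the (4, 8, 1)
# shape is only an example for 32 GPUs and should be tuned for throughput.
devices = mesh_utils.create_device_mesh((4, 8, 1))
mesh = Mesh(devices, axis_names=("dp", "fsdp", "mp"))

with mesh:
    # Run your jit/pjit-compiled training step here so parameter and
    # batch shardings resolve against this mesh.
    pass
```

In practice you would sweep a few mesh shapes and per-device batch sizes on your cluster and keep whichever gives the highest tokens-per-second, since the best split depends on interconnect bandwidth and model size.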