I would like to request information about the pretraining configuration of LLaMA on A100 80GB GPUs for my project. Since I plan to use this setup for my research, having the specific pretraining configuration details would greatly help me replicate and benchmark the results and reach the best speed.
As described in this issue, the configuration would be similar to pretraining on TPU pods, with the addition of the JAX distributed initialization. However, you'll have to tune the mesh shape and batch size yourself according to your own cluster's configuration to obtain the best throughput. Unfortunately I don't have access to a few hundred A100s, so I cannot provide a good example of that.
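For reference, here is a minimal sketch of the GPU-specific setup implied above, using only standard JAX APIs rather than any particular training script. The coordinator address, the 4-node x 8-GPU cluster size, and the (4, 8, 1) mesh shape are placeholder assumptions you would replace and tune for your own hardware.

```python
# Sketch of multi-node GPU initialization plus a device mesh in JAX.
# All concrete numbers below (address, process counts, mesh shape) are
# hypothetical starting points, not a recommended configuration.
import jax
from jax.sharding import Mesh
from jax.experimental import mesh_utils

# One process per node; coordinator address and process indices are
# placeholders to fill in per node (process_id ranges 0..num_processes-1).
jax.distributed.initialize(
    coordinator_address="node0:1234",  # assumed hostname:port of node 0
    num_processes=4,                   # e.g. 4 nodes x 8 A100s = 32 GPUs
    process_id=0,                      # set differently on each node
)

# Build a mesh over all global GPUs. Axis names follow the common
# data-parallel / fully-sharded / model-parallel split; the (4, 8, 1)
# shape is only an example for 32 GPUs and should be tuned for throughput.
devices = mesh_utils.create_device_mesh((4, 8, 1))
mesh = Mesh(devices, axis_names=("dp", "fsdp", "mp"))

with mesh:
    # Run your jit/pjit-compiled training step here so parameter and
    # batch shardings resolve against this mesh.
    pass
```

In practice you would sweep a few mesh shapes and per-device batch sizes on your cluster and keep whichever gives the highest tokens-per-second, since the best split depends on interconnect bandwidth and model size.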