Making large AI models cheaper, faster and more accessible
Updated Dec 25, 2024 - Python
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
A GPipe implementation in PyTorch
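PaddlePaddle large model development suite, providing end-to-end development toolchains for large language models, cross-modal large models, biocomputing large models, and other domains.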
LiBai(李白): A Toolbox for Large-Scale Distributed Parallel Training
Slicing a PyTorch Tensor Into Parallel Shards
Easy Parallel Library (EPL) is a general and efficient deep learning framework for distributed model training.
A curated list of awesome projects and papers for distributed training or inference
Large scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)*
NAACL '24 (Best Demo Paper Runner-Up) / MLSys @ NeurIPS '23 - RedCoast: A Lightweight Tool to Automate Distributed Training and Inference
Distributed training (multi-node) of a Transformer model
Distributed training of DNNs • C++/MPI Proxies (GPT-2, GPT-3, CosmoFlow, DLRM)
SC23 Deep Learning at Scale Tutorial Material
WIP. Veloce is a low-code Ray-based parallelization library that makes machine learning computation novel, efficient, and heterogeneous.
Fast and easy distributed model training examples.
Adaptive Tensor Parallelism for Foundation Models
PyTorch implementation of 3D U-Net with model parallelism across 2 GPUs for large models
Official implementation of DynPartition: Automatic Optimal Pipeline Parallelism of Dynamic Neural Networks over Heterogeneous GPU Systems for Inference Tasks
Performance Estimates for Transformer AI Models in Science
Serving distributed deep learning models with model parallel swapping.