v0.15.2 Patch Release
What's Changed
- Update version.txt after 0.15.1 release by @loadams in #6493
- HPU: add required ENV vars to accelerator init by @nelyahu in #6495
- Quiet op_builder->is_compatible warning by @terry-for-github in #6093
- fix pipeline eval_batch micro_batches argument for schedule by @nelyahu in #6484
- Fix the broken url link by @rogerxfeng8 in #6500
- fix environment variable export bug for MultiNodeRunner by @TideDra in #5878
- Revert "BF16 optimizer: Clear lp grads after updating hp grads in hook" by @nelyahu in #6508
- wrap include cuda_bf16.h with ifdef BF16_AVAILABLE by @oelayan7 in #6520
- Avoid security issues of subprocess shell by @tjruwase in #6498
- Add conditional on torch version for scaled_dot_product_attention by @loadams in #6517
- Added Intel Gaudi to Accelerator Setup Guide by @ShifaAbu in #6543
- Skip failing newly added tests in accelerate by @loadams in #6574
- Use msgpack for p2p comm by @tohtana in #6547
- DeepNVMe perf tuning by @tjruwase in #6560
- [Accelerator] Cambricon MLU support by @Andy666G in #6472
- Fix gradient accumulation for Z2+offload by @tohtana in #6550
- fix errors when setting zero3 leaf modules with torch.compile by @NirSonnenschein in #6564
- [XPU] Support DeepNVMe new code structure by @Liangliang-Ma in #6532
- Add APIs to offload states of model, optimizer, and engine by @tohtana in #6011
- add bfloat16 to inference support dtypes by @nelyahu in #6528
- [COMPILE] workflow for deepspeed + torch.compile by @YizhouZ in #6570
- Fixes on the accelerate side mean we do not need to skip this test by @loadams in #6583
- Fix torch include in `op_builder/mlu/fused_adam.py` and update no-torch workflow triggers by @loadams in #6584
- [ROCm] Fix subprocess error by @jagadish-amd in #6587
- Cleanup CODEOWNERS file to be valid by @loadams in #6603
- Add SSF Best practices badge by @loadams in #6604
- Move V100 workflows from cuda 11.1/11.7 to 12.1 by @loadams in #6607
- Fix SD workflow by @loadams in #6609
- Pin accelerate to fix CI failures/issues by @loadams in #6610
- Add llama3.2 vision autotp by @Yejing-Lai in #6577
- Improve DS logging control by @tjruwase in #6602
- Fix device selection using CUDA_VISIBLE_DEVICES by @tohtana in #6530
- Handle when `backend` is also in compile_kwargs by @oraluben in #6502
- Rearrange inference OPS and stop using builder.load by @oelayan7 in #5490
- Unpin accelerate tests, update lightning with node16 removal by @loadams in #6611
- Enabled Qwen2-MoE Tensor Parallelism (TP) inference by @gyou2021 in #6551
New Contributors
- @TideDra made their first contribution in #5878
- @ShifaAbu made their first contribution in #6543
- @jagadish-amd made their first contribution in #6587
- @gyou2021 made their first contribution in #6551
Full Changelog: v0.15.1...v0.15.2