Releases · microsoft/DeepSpeed
v0.8.3: Patch release
What's Changed
- [deepspeed/autotuner] Bug fix for skipping mbs on gas by @rahilbathwal5 in #2171
- Fix issue between our abstract accelerator and colossalai's version of op_builder by @jeffra in #2963
- [zero] prevent poor configs from running w. zero-offload by @jeffra in #2971
- Fix Meta Tensor checkpoint load for OPT models by @lekurile in #2990
- ckpt: create directories in checkpoint_engine by @adammoody in #2988
- Fix buffer size for pipeline parallel and communication schedule by @tohtana in #2862
- [docs] add new paper to readme/docs by @jeffra in #3018
- fix language by @stas00 in #3019
- BF Optimizer Attribute Checks by @jomayeri in #3022
- [logger] implement `logger.warning_once` by @stas00 in #3021
- Convert model parameters from generator to list. by @jomayeri in #3017
- Improve loss overflow logs by @Quentin-Anthony in #3008
- Fix Broken Links by @satpalsr in #3048
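The `logger.warning_once` entry (#3021) adds a deduplicated warning helper. A minimal sketch of the idea — not DeepSpeed's actual implementation, and the logger name is hypothetical — using `functools.lru_cache` so each distinct message is logged only once:

```python
import logging
from functools import lru_cache

logger = logging.getLogger("deepspeed_example")  # hypothetical logger name

@lru_cache(maxsize=None)
def warning_once(message: str) -> None:
    # lru_cache memoizes by message, so the logger fires only on the
    # first occurrence of each distinct message; repeats are no-ops.
    logger.warning(message)
```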
Full Changelog: v0.8.2...v0.8.3
v0.8.2: Patch release
What's Changed
- add auto-generated PR workflow by @mrwyattii in #2822
- Fix typo in auto-sync workflow by @mrwyattii in #2850
- Fix example command for building wheel with dev version specified. by @loadams in #2815
- Create tensor parallelism blog/tutorial by @molly-smith in #2766
- Data efficiency library update by @conglongli in #2866
- Make z3 respect comm dtype by @tjruwase in #2807
- Automatic Tensor Parallelism Blog Links by @molly-smith in #2877
- Check device count before running dist tests by @HeyangQin in #2799
- AutoTP tutorial web formatting and news by @molly-smith in #2883
- Remove deprecated `torch._six` imports by @yasyf in #2863
- Reduce I/O size by @tjruwase in #2814
- add missing license info to top of all source code by @jeffra in #2889
- Enable tensor fragments for zero 2 & 3 by @tjruwase in #2727
- better eval sampler for val or test dataset by @mayank31398 in #2907
- using container when loading inference checkpoints by @HeyangQin in #2875
- Fix CPUAdam for when `vendor_id_raw` is not provided by @FarzanT in #2836
- Fix Bloom logits mismatch by @molly-smith in #2851
- Fixes `AttributeError` in #2853 by @saforem2 in #2854
- Add MPICH Multinode Runner by @inkcherry in #2839
- TP unsupported models and assertions by @molly-smith in #2810
- AutoTP Assert Kernel Injection Support by @molly-smith in #2939
- Check for local CUDA graphs when enable_cuda_graph=True by @lekurile in #2941
- Improve overflow handling by @tjruwase in #2944
- [RFC] add device abstraction to allow other device than CUDA be used by @delock in #2221
- deepspeed.init_distributed() support for TCP protocols by @noabauma in #2905
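The `torch._six` removal (#2863) matters for anyone tracking newer PyTorch, where that private compatibility module was deleted upstream. A migration sketch using stdlib equivalents for the commonly imported symbols; the `is_namedtuple` helper is hypothetical, added for illustration:

```python
# torch._six was a private compatibility shim removed in newer PyTorch
# releases; its commonly used symbols have stdlib equivalents.
from math import inf                      # replaces torch._six.inf
import collections.abc as container_abcs  # replaces torch._six.container_abcs

string_classes = (str,)                   # replaces torch._six.string_classes

def is_namedtuple(obj):
    # Hypothetical helper: namedtuple detection without torch._six.
    return isinstance(obj, tuple) and hasattr(obj, "_fields")
```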
New Contributors
- @HeyangQin made their first contribution in #2799
- @yasyf made their first contribution in #2863
- @mayank31398 made their first contribution in #2907
- @FarzanT made their first contribution in #2836
- @saforem2 made their first contribution in #2854
- @noabauma made their first contribution in #2905
Full Changelog: v0.8.1...v0.8.2
v0.8.1: Patch release
What's Changed
- CUDA optional deepspeed ops by @tjruwase in #2507
- Remove CI trigger for push to master by @mrwyattii in #2712
- [install] only add deepspeed pkg at install by @jeffra in #2714
- Fix nightly tests for new lm-eval release by @mrwyattii in #2713
- BF16 optimizer for BF16+ZeRO Stage 1 by @jomayeri in #2706
- Fix typo in diffusers transformer block by @mrwyattii in #2718
- Inference Refactor (replace_with_policy, model_implementations) by @awan-10 in #2554
- Change zero_grad() argument to match pytorch by @loadams in #2741
- Automatic tensor parallelism v2 by @molly-smith in #2670
- Fixing Optimizer Sanity Check by @jomayeri in #2742
- [GatheredParameters] fix memory leak by @stas00 in #2665
- Abstract accelerator (step 3) by @delock in #2677
- Fix autotuning so that it records Floating Point Operations per second, not microsecond by @dashstander in #2711
- fix a misspelled attribute by @stas00 in #2750
- [zero] remove misleading dtype log by @jeffra in #2732
- Fix softmax backward by @RezaYazdaniAminabadi in #2709
- Skip test_bias_gelu unit test if torch < 1.12 by @lekurile in #2754
- Conditionally Make Op Building More Verbose by @cmikeh2 in #2759
- Bing/formatting correction by @xiexbing in #2764
- Add links to new azureML examples by @cassieesvelt in #2756
- Fix hardcoded instances to fp16 in optimizer creation log messages to the correct dtype. by @loadams in #2743
- Refactor/Pydantify monitoring config by @mrwyattii in #2640
- Pin minimum `packaging` requirement by @carmocca in #2771
- Fix for diffusers v0.12.0 by @mrwyattii in #2753
- some fix in flops_profiler by @lucasleesw in #2068
- fix upsample flops compute by skipping unused kargs by @cli99 in #2773
- Fix broken kernel inject bug by @molly-smith in #2776
- Fix Checkpoint-loading with Meta-tensor by @RezaYazdaniAminabadi in #2781
- Add hjson support for user configs by @mrwyattii in #2783
- Reset KV-cache at the beginning of text-generation by @RezaYazdaniAminabadi in #2669
- Container param cleanup + remove qkv_merging by @lekurile in #2780
- Common location to install libaio-dev by @tjruwase in #2779
- Fixing broken link to azureml-examples recipes by @rtanase in #2795
- remove outdated comment by @stas00 in #2786
- Enable page-locked tensors without CUDA by @tjruwase in #2775
- Add container load checkpoint error reporting + refactor by @lekurile in #2792
- Add user defined launcher args for PDSH launcher by @loadams in #2804
- Fix Slurm launcher user args by @loadams in #2806
- Handle hanged tests in CI by @mrwyattii in #2808
- Fix inference CI device error by @mrwyattii in #2824
- Fix permissions issue with pip upgrade by @mrwyattii in #2823
- Fix cpu-only CI hangs by @mrwyattii in #2825
- Fix Pipeline Parallel resize unit test by @mrwyattii in #2833
- Fix auto TP for duplicate modules with different gems by @molly-smith in #2784
- Refactor DS inference API. No longer need replace_method. by @awan-10 in #2831
- Port Reza's INT8-quantization fix to container architecture by @lekurile in #2725
- Fix gpt-Neox rotary embedding implementation by @RezaYazdaniAminabadi in #2782
- Fix for CI failure on system upgrade by @mrwyattii in #2849
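The autotuning fix (#2711) is a unit-conversion bug: the timer reports microseconds, so dividing the op count by the raw value records FLOPs per microsecond rather than per second. An illustrative conversion helper, not the autotuner's actual code:

```python
def flops_per_second(num_ops: int, elapsed_us: float) -> float:
    # The profiler's timer reports microseconds; dividing the op count
    # by that raw value yields FLOPs per *microsecond*. Converting to
    # seconds first gives the conventional FLOPS figure.
    elapsed_s = elapsed_us * 1e-6
    return num_ops / elapsed_s
```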
New Contributors
- @loadams made their first contribution in #2741
- @xiexbing made their first contribution in #2764
- @carmocca made their first contribution in #2771
- @lucasleesw made their first contribution in #2068
- @rtanase made their first contribution in #2795
Full Changelog: v0.8.0...v0.8.1
DeepSpeed v0.8.0
New features
- DeepSpeed Data Efficiency: A composable library that makes better use of data, increases training efficiency, and improves model quality
- DeepSpeed Data Efficiency Library by @conglongli in #2585
What's Changed
- fix blog link by @conglongli in #2600
- Migrate ops tests to new inference_ops marker by @cmikeh2 in #2599
- Move layer norm to new schedule by @lokoppakmsft in #2590
- [deepspeed/autotuner] Bug fix for binary search for batch size by @rahilbathwal5 in #2162
- Fix for older versions of pydantic by @mrwyattii in #2611
- Use rocm/pytorch:latest for ROCm Dockerfile by @jithunnair-amd in #2613
- skip torch.zeros and tensor.copy_ when model parallel is not used by @guoyejun in #2479
- call empty_cache to really free up GPU memory as described in comment by @guoyejun in #2620
- Remove GatheredParameters context from replace_with_policy by @lekurile in #2591
- fixes #2498 by @clumsy in #2603
- Update AVX512 Detection by @cmikeh2 in #2621
- Add Megatron CI workflow by @mrwyattii in #2614
- [inference] check for unsupported model generate args by @jeffra in #2627
- [launcher] parse hostfile via regex and added error checks by @jeffra in #2626
- Unit tests setup own venv by @mrwyattii in #2628
- Fix #2409: add enable_each_rank_log to deepspeed/launcher/runner.py by @inkcherry in #2571
- Fix typo in autotuner.py by @eltociear in #2639
- [zero-3] Handle forward parameter return correctly in nested cases by @samyam in #2642
- [inference] ds-attention refactor w.r.t. ops by @jeffra in #2623
- Fix issue w. bloom int8 when changing tp size by @jeffra in #2645
- fix assertion error in zero stage 3 by @GuanhuaWang in #2647
- tweaks to ds-attn, distilbert policy, and mup by @jeffra in #2649
- [doc] fix `min_loss_scale` default by @stas00 in #2660
- [launcher] fail gracefully if hostname -i doesn't work as expected by @jeffra in #2631
- Fix Opt injection by @RezaYazdaniAminabadi in #2541
- Abstract accelerator (step 2) by @delock in #2560
- Remove unnecessary device synchronization for stage 2 by @li-yi-dong in #2500
- [Bug Fixed] torch.cuda.is_available -> torch.cuda.is_available() by @wkcn in #2661
- [fp16] lower `initial_scale_power` to 16 by @stas00 in #2663
- fix Tensor contiguous bug in model_compression by @xiaoxiawu-microsoft in #2671
- [inference] ds-mlp refactor w.r.t. ops by @jeffra in #2668
- real_accelerator validation check for both accelerator and deepspeed accelerator path by @delock in #2685
- fix typo and remove duplicated code in ZeRO stage 1 and 2 by @wkcn in #2655
- Add mlflow logging for aml by @cassieesvelt in #2495
- Fix import error of op_builder by @tohtana in #2687
- Pass training flag to forward call from module config by @lokoppakmsft in #2604
- Extend quantization utils features by @lokoppakmsft in #2683
- [GatheredParameters] add support for any iterable by @stas00 in #2664
- Fix for latest diffusers by @mrwyattii in #2699
- exclude benchmarks during install by @jeffra in #2698
- Correct loss scale in ZeRO step by @jomayeri in #2695
- [ZeRO] non-MoE stage 1 requires CG disabled by @jeffra in #2703
- remove print side effect from importing deepspeed by @jeffra in #2704
- ZeRO3 handling frozen weights by @tjruwase in #2653
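The `torch.cuda.is_available -> torch.cuda.is_available()` fix (#2661) corrects a classic Python pitfall: a bare function reference is always truthy, so the check passes even with no GPU. A stand-alone illustration — the stand-in function below is hypothetical, not DeepSpeed's code:

```python
def is_available():
    # Stand-in for torch.cuda.is_available on a machine without CUDA.
    return False

def has_gpu_buggy():
    # Bug pattern: missing parentheses, so this tests the function
    # object itself, which is always truthy.
    return bool(is_available)

def has_gpu_fixed():
    # The fix: actually call the function.
    return bool(is_available())
```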
New Contributors
- @eltociear made their first contribution in #2639
- @li-yi-dong made their first contribution in #2500
- @wkcn made their first contribution in #2661
- @xiaoxiawu-microsoft made their first contribution in #2671
- @cassieesvelt made their first contribution in #2495
- @tohtana made their first contribution in #2687
Full Changelog: v0.7.7...v0.8.0
v0.7.7: Patch release
What's Changed
- Update the locator for Megatron-LM by @rapsealk in #2564
- use get_global_rank if available by @jeffra in #2567
- Add Determined to open-source DL frameworks by @sirredbeard in #2573
- Support fp32 gradaccum for bf16 model by @delock in #2566
- Drop Maxwell Support by @cmikeh2 in #2574
- Fix quantized-inference & Add generic support of checkpoint loading by @RezaYazdaniAminabadi in #2547
- Fix MegatronLayerPolicy to have megatron_v2=True by @lekurile in #2579
- Update barrier and reduce_scatter_base to conform to PyTorch signatures by @Quentin-Anthony in #2570
- Support N-dimension input in quantization kernel by @lokoppakmsft in #2575
- Add checkpoint sharding unit tests by @mrwyattii in #2561
- Updating docs README by @jomayeri in #2587
- Updating API docs by @jomayeri in #2586
- Fix issues w. python 3.6 + add py-version checks to CI by @jeffra in #2589
- [benchmarks] get mask token from tokenizer by @jeffra in #2592
New Contributors
- @rapsealk made their first contribution in #2564
- @sirredbeard made their first contribution in #2573
Full Changelog: v0.7.6...v0.7.7
v0.7.6: Patch release
What's Changed
- DeepSpeed inference config. (#2459) by @awan-10 in #2472
- Update docs to autogenerate pydantic config model docs by @mrwyattii in #2509
- Add max_tokens alias to max_out_tokens arg to maintain backwards compatibility by @lekurile in #2508
- Deepspeed quantization library v0.1 by @lokoppakmsft in #2450
- Fix backward compatibility for InferenceConfig by @mrwyattii in #2516
- Add missing Inference sub-configs by @mrwyattii in #2518
- Add note about nvcc/hipcc requirement by @jeffra in #2519
- Update codeowners by @jeffra in #2525
- Dequantization Utils Library by @cmikeh2 in #2521
- Fixes for torch 1.14 due to new torch.numel return type by @jeffra in #2522
- Ensure MOE is initialized for SD by @cmikeh2 in #2534
- Make DS-Inference config readable from JSON by @mrwyattii in #2537
- Add MII tests by @mrwyattii in #2533
- Remove mutable default parameter in `init_inference()` by @aphedges in #2540
- Change Where DS/Triton is Used in Stable Diffusion by @cmikeh2 in #2536
- Pass down the new DS inference config to replace_transformer_layer. by @awan-10 in #2539
- Adding Gradient Accumulation Data Type Config by @jomayeri in #2512
- Report progress at gradient accumulation boundary by @ShijieZZZZ in #2553
- encoded ds config into command line argument when launching child processes in autotuning by @cli99 in #2524
- Add missing MoE fields to inference config for backward compatibility by @mrwyattii in #2556
- Abstract accelerator (step 1) by @delock in #2504
- Fix invalid check of recorded parameter orders in zero stage3. by @inkcherry in #2550
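The `init_inference()` fix (#2540) removes a mutable default argument, another well-known Python pitfall: a default `{}` or `[]` is created once at function definition and shared across all calls. A sketch of the safe pattern; the `configure` helper is hypothetical, not DeepSpeed's API:

```python
def configure(overrides=None):
    # Safe pattern: default to None and build the dict per call.
    # A literal `overrides={}` default is created once at function
    # definition time, so mutations would leak between unrelated calls.
    if overrides is None:
        overrides = {}
    overrides.setdefault("dtype", "fp16")
    return overrides
```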
New Contributors
- @ShijieZZZZ made their first contribution in #2553
- @delock made their first contribution in #2504
- @inkcherry made their first contribution in #2550
Full Changelog: v0.7.5...v0.7.6
v0.7.5: Patch release
What's Changed
- Fix Bug #2319 by @jomayeri in #2438
- update pytorch pool operator function signature by @cli99 in #2443
- Fix build issues on Windows by @eltonzheng in #2428
- rollback ds config changes by @cli99 in #2395
- Use CUDA events for inference model profiling by @mrwyattii in #2371
- Fixing a config mismatch in unit test. by @jomayeri in #2447
- Reduction Kernel Utility by @cmikeh2 in #2436
- deepspeed/launcher/launch.py: add option enable_each_rank_log by @guoyejun in #2409
- Fixes for various CI problems by @mrwyattii in #2457
- Cache Allocation and Softmax Fixes by @cmikeh2 in #2433
- Fix checkpoint loading at inference-engine by @RezaYazdaniAminabadi in #2429
- Create a new folder structure to isolate model-specific code in DS by @awan-10 in #2464
- don't gather partitioned activations for mp size 1 by @guoyejun in #2454
- Updating autotune json default in docs. by @jomayeri in #2476
- Added MLFLOW environment variables for logging metrics within trainig… by @savitamittal1 in #2477
- fix accelerate link in README by @kyoto7250 in #2481
- Fix Stable-Diffusion: Add correct memory-allocation at DeepSpeed-Attention by @RezaYazdaniAminabadi in #2474
- Fix CI issues related to cupy install by @mrwyattii in #2483
- Add `scale_attn_by_inverse_layer_idx` feature by @hyunwoongko in #2486
- Stable Diffusion Enhancements by @cmikeh2 in #2491
- stage_1_and_2.py: no allreduce needed when mp size is 1 by @guoyejun in #2494
- Make bf16_optimizer work for non pipeline parallelism by @tjruwase in #2470
- Fix nightly CI tests by @mrwyattii in #2493
- Make data contiguous before the inplace reshape-copy_ function. by @lokoppakmsft in #2489
- Fix typos: deepseed -> deepspeed by @jinyouzhi in #2499
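The `scale_attn_by_inverse_layer_idx` entry (#2486) mirrors the Hugging Face GPT-2 config option of the same name, which divides attention logits by `layer_idx + 1` on top of the usual `1/sqrt(head_dim)` scaling. A scalar sketch of that scaling, for illustration only:

```python
import math

def scaled_attn_logit(q_dot_k: float, head_dim: int, layer_idx: int,
                      scale_attn_by_inverse_layer_idx: bool = True) -> float:
    # Standard scaled dot-product attention divides by sqrt(head_dim);
    # the GPT-2-style flag additionally divides by (layer_idx + 1),
    # damping logits more strongly in deeper layers.
    logit = q_dot_k / math.sqrt(head_dim)
    if scale_attn_by_inverse_layer_idx:
        logit = logit / float(layer_idx + 1)
    return logit
```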
New Contributors
- @guoyejun made their first contribution in #2409
- @savitamittal1 made their first contribution in #2477
- @kyoto7250 made their first contribution in #2481
- @lokoppakmsft made their first contribution in #2489
- @jinyouzhi made their first contribution in #2499
Full Changelog: v0.7.4...v0.7.5
v0.7.4: Patch release
What's Changed
- MOE residual matmult unit test by @samadejacobs in #2323
- MOE matmult with memaccess by @samadejacobs in #2336
- Refactor residual add kernels by @arashb in #2333
- mem access for quantize kernel by @GuanhuaWang in #2331
- increase min pre-commit versions by @jeffra in #2346
- Extend scratch buffer for long prompts by @cmikeh2 in #2212
- [docs] fix zero docs by @jeffra in #2350
- Staging profile inference v1 (#2348) by @awan-10 in #2349
- Kernel Data Conversion Utility by @cmikeh2 in #2327
- Add Onebit Optimizers in init by @l4d2boomer in #2340
- docs(mixture-of-experts-inference): fix typo in tuto by @jqueguiner in #2345
- Use blob storage for datasets in unit tests by @mrwyattii in #2342
- Refactor `gptj_residual_add` kernels for better readability by @arashb in #2358
- Updated issue templates by @jeffra in #2363
- fix cuda invalid config error in dequant kernel by @GuanhuaWang in #2362
- Add missing pytest fixture scope by @arashb in #2353
- Extend residual_add kernel tests to cover pre_attn_norm by @arashb in #2354
- Refactor `fused_bias_residual` kernels for better readability by @arashb in #2356
- Capture error message during sweep tests by @molly-smith in #2351
- Fix an exception when auto-casting dicts to fp16 by @mjksmith in #2370
- Refactor remaining distributed tests by @mrwyattii in #2216
- Fix the MLP output tensor's shape by @arashb in #2380
- add 11.8 to cuda_minor_mismatch_ok to allow building with current CUDA by @Thomas-MMJ in #2390
- Pin Transformers test version by @mrwyattii in #2402
- Change type to tuple in replace_wo_policy isinstance check by @lekurile in #2387
- Checkpoint backwards-compatibility workaround by @tjruwase in #2384
- Add Predicated Global Load to Memory Access Utils by @cmikeh2 in #2373
- MII blog post by @jeffra in #2418
- Fix figure reference by @awan-10 in #2419
- Add SLURM Multinode Runner by @dashstander in #2404
- Fix issue with corrupted output on long generation for GPT by @andrewchernyh in #2359
- Fix GPT Neo-X multi-gpu inference by @andrewchernyh in #2401
- CI fixes related to triton by @jeffra in #2422
- [docs] update mii blog title by @jeffra in #2423
- add SD injection policy by @jeffra in #2381
- Fix checkpoint loading when it is a dictionary by @RezaYazdaniAminabadi in #2425
- Make error regex more generic in collect_results.py by @molly-smith in #2415
- fixes #2389 by @clumsy in #2411
- Fix for inference gpt-j test by @mrwyattii in #2430
- Fixing bug 2361 by @jomayeri in #2410
- Universal checkpoint for zero stage 1 by @tjruwase in #2284
- only add deps if extra is explicitly called by @jeffra in #2432
- Add TestInjectionPolicy inference unittest class for testing custom injection policies by @lekurile in #2426
- [memory estimators] new config args sync by @stas00 in #2431
- parallelize writing of layer checkpoint files across data parallel instances by @adammoody in #1419
- Fix broken link to DeepSpeed Megatron fork by @lekurile in #2440
New Contributors
- @l4d2boomer made their first contribution in #2340
- @jqueguiner made their first contribution in #2345
- @mjksmith made their first contribution in #2370
- @Thomas-MMJ made their first contribution in #2390
- @lekurile made their first contribution in #2387
- @dashstander made their first contribution in #2404
- @andrewchernyh made their first contribution in #2359
- @clumsy made their first contribution in #2411
- @jomayeri made their first contribution in #2410
Full Changelog: v0.7.3...v0.7.4
v0.7.3: Patch release
What's Changed
- Add blob storage to CI runners by @mrwyattii in #2260
- Update replace_module.py, test-gptj.py related fix by @molly-smith in #2269
- Fix OrderedDict import for python3.6 by @Dipet in #2267
- Ds inference/fix mp2 by @RezaYazdaniAminabadi in #2270
- Trajepl: nebula load fix by @trajepl in #2182
- Prevent torch ext folder mkdir at tmp by @jeffra in #2274
- Ds-inference Int8 support through ZeroQuant technology by @RezaYazdaniAminabadi in #2217
- add a new unit test for cuda ops by @awan-10 in #2278
- Addition to code owners file by @cmikeh2 in #2279
- Memory Access Utility by @cmikeh2 in #2276
- Fp32 accuracy bug fix by @RezaYazdaniAminabadi in #2285
- Refactor universal checkpointing and tensor fragments by @tjruwase in #2253
- [ds-inference] fix progress bar by @stas00 in #2286
- Offload all gradients to nvme by @tjruwase in #2282
- fused bias relu unittest by @molly-smith in #2297
- Fix for pytest picking up wrong deepspeed by @mrwyattii in #2299
- Fix for Zero3 when MP>1 by @Quentin-Anthony in #2289
- Unit test for bias add kernel by @mrwyattii in #2298
- Update relu.cu with mem_access_utils by @molly-smith in #2306
- Add tensor parallel inference unit tests by @mrwyattii in #2232
- Fix the residual add mp scaling for GPTNeoX by @arashb in #2310
- Add unit tests for residual_add kernel by @arashb in #2307
- add inference eval scripts by @jeffra in #2303
- Upgrade P40 tests to torch 1.8 by @mrwyattii in #2316
- ZeRO-Inference blog by @tjruwase in #2271
- ZeRO-Inference blog - wrap up by @tjruwase in #2321
- ZeRO-Inference blog - Update README by @tjruwase in #2322
- Refactor relu bias add with mem_access utils by @mrwyattii in #2317
- add quant unit test by @GuanhuaWang in #2315
- only override forward if using cuda-graph by @jeffra in #2291
- Add more options to inference benchmark by @mrwyattii in #2325
New Contributors
- @molly-smith made their first contribution in #2269
Full Changelog: v0.7.2...v0.7.3
v0.7.2: Patch release
What's Changed
- Enable contiguous gradients with Z1+MoE by @siddharth9820 in #2250
- Correctly detect CPU optimizer usage by @tjruwase in #2257
- Update Half Precision Kernel Compatibility by @cmikeh2 in #2261
- fix #2240: wrong time unit in flops_profiler by @yzs981130 in #2241
New Contributors
- @cmikeh2 made their first contribution in #2261
- @yzs981130 made their first contribution in #2241
Full Changelog: v0.7.1...v0.7.2