
Actions: microsoft/DeepSpeed

nv-torch-latest-v100

5,022 workflow runs

[BUG FIX]:fix get torch.version.cuda error when cuda is None in rocm
nv-torch-latest-v100 #12768: Pull request #6909 synchronize by loadams
December 30, 2024 21:02 4h 25m 44s hj-wei:dev_hjwei
Stage3: Use new torch grad accumulation hooks API
nv-torch-latest-v100 #12767: Pull request #6773 synchronize by loadams
December 30, 2024 18:54 2h 24m 36s deepcharm:stage3-use-new-grad-acc-api
Fix checkpointable_layers Logic
nv-torch-latest-v100 #12766: Pull request #6881 synchronize by loadams
December 30, 2024 18:53 1h 43m 38s Quentin-Anthony:qanthony/fix-act-recomp
Add fp8_gemm fallback for non-triton systems
nv-torch-latest-v100 #12765: Pull request #6916 synchronize by loadams
December 30, 2024 17:57 1h 29m 33s oelayan7:fp8_gemm_no_triton
fix: RuntimeError for UCP large DP
nv-torch-latest-v100 #12764: Pull request #6918 synchronize by loadams
December 30, 2024 17:17 1h 41m 36s saforem2/ucp-bug
Fix: forbid repeated deepspeed.initialize on training objects
nv-torch-latest-v100 #12763: Pull request #6874 synchronize by traincheck-team
December 30, 2024 02:05 Action required traincheck-team:fix-6848-forbid-repeated-init
Fix: forbid repeated deepspeed.initialize on training objects
nv-torch-latest-v100 #12762: Pull request #6874 synchronize by traincheck-team
December 30, 2024 02:02 Action required traincheck-team:fix-6848-forbid-repeated-init
nv-torch-latest-v100
nv-torch-latest-v100 #12761: Scheduled
December 30, 2024 00:21 1h 28m 42s master
fix: RuntimeError for UCP large DP
nv-torch-latest-v100 #12760: Pull request #6918 opened by saforem2
December 29, 2024 18:23 6h 0m 23s saforem2/ucp-bug
nv-torch-latest-v100
nv-torch-latest-v100 #12759: Scheduled
December 29, 2024 00:23 1h 32m 49s master
Use ds-specific module id to avoid conflicts
nv-torch-latest-v100 #12758: Pull request #6847 synchronize by tjruwase
December 28, 2024 19:44 1h 21m 5s olruwase/pr_6772
nv-torch-latest-v100
nv-torch-latest-v100 #12757: Scheduled
December 28, 2024 00:20 1h 34m 9s master
[BUG FIX]:fix get torch.version.cuda error when cuda is None in rocm
nv-torch-latest-v100 #12756: Pull request #6909 synchronize by hj-wei
December 27, 2024 03:06 3h 52m 2s hj-wei:dev_hjwei
nv-torch-latest-v100
nv-torch-latest-v100 #12753: Scheduled
December 27, 2024 00:20 1h 32m 55s master
Stage3: Use new torch grad accumulation hooks API
nv-torch-latest-v100 #12752: Pull request #6773 synchronize by loadams
December 26, 2024 20:09 1h 40m 35s deepcharm:stage3-use-new-grad-acc-api
Change compile for pipeline module torch.compile
nv-torch-latest-v100 #12751: Pull request #6478 synchronize by loadams
December 26, 2024 20:08 1h 31m 7s NirSonnenschein:torch_compile_micro_offset_fix
Stage3: Use new torch grad accumulation hooks API
nv-torch-latest-v100 #12750: Pull request #6773 synchronize by loadams
December 26, 2024 17:40 1h 38m 33s deepcharm:stage3-use-new-grad-acc-api
[BUG FIX]:fix get torch.version.cuda error when cuda is None in rocm
nv-torch-latest-v100 #12749: Pull request #6909 synchronize by loadams
December 26, 2024 17:15 Action required hj-wei:dev_hjwei
Use ds-specific module id to avoid conflicts
nv-torch-latest-v100 #12748: Pull request #6847 synchronize by loadams
December 26, 2024 17:13 1h 31m 7s olruwase/pr_6772
Fix checkpointable_layers Logic
nv-torch-latest-v100 #12747: Pull request #6881 synchronize by loadams
December 26, 2024 17:12 1h 32m 5s Quentin-Anthony:qanthony/fix-act-recomp
Update Gaudi2 jobs to latest 1.19 build
nv-torch-latest-v100 #12746: Pull request #6905 synchronize by loadams
December 26, 2024 17:12 6h 0m 24s raza-sikander:master
Add fp8_gemm fallback for non-triton systems
nv-torch-latest-v100 #12744: Pull request #6916 opened by oelayan7
December 26, 2024 08:52 Action required oelayan7:fp8_gemm_no_triton