Merge OpenAI Triton commit 0702320 #3149

Merged · whitneywhtsang merged 17 commits into main from whitneywhtsang/merge on Jan 13, 2025

Conversation

@whitneywhtsang (Contributor) commented on Jan 13, 2025

This PR changes the Triton base from 3bac3be to 0702320 (Jan 13).
Pass rate: 97.63%

Please do not squash and merge this PR.

Jokeren and others added 13 commits January 11, 2025 16:28
…end to LLVM codegen. Ignore NaN when set. (#5582)
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
# Overview

Atomics in triton have two optional attributes (see the sketch after this list):
1) `sem` -- describing the memory semantics of the operation
2) `scope` -- describing which threads will see the effect of a memory
operation (e.g., GPU, CTA)
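As a point of reference, here is a minimal sketch of how these attributes surface in the Python API. The kernel and argument names are illustrative, and the accepted strings (`"acq_rel"`, `"gpu"`, `"cta"`, `"sys"`) are assumptions based on recent Triton releases:

```python
import triton
import triton.language as tl

@triton.jit
def bump_counter(counter_ptr):
    # Illustrative only: one global counter, incremented by every program.
    # `sem` picks the memory ordering of the operation, `scope` picks which
    # threads must observe the update (device-wide here).
    tl.atomic_add(counter_ptr, 1, sem="acq_rel", scope="gpu")
```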

Presently, the `scope` is ignored by the AMD backend and defaults to
`agent`-scope in the emitted LLVM (which roughly corresponds to `gpu`
memscope in triton). This is correct (in most cases? maybe not all?), as
this is a "stricter" scope than CTA (and I'm guessing it is rare that
system scope is needed for AMD kernels, so no bugs have shown up). That
being said, emitting atomics at CTA scope can be more efficient since
there can be fewer cache invalidations/barriers.

I think that this is fixable by just passing through the attribute to
the generated `llvm.atomicrmw` op. There are some additional
optimizations potentially possible (e.g., !amdgpu.no.remote.memory,
since Triton doesn't support this today), but it isn't clear to me if
those would have any real impact on end-to-end performance and those
optimizations would be specific to the `sys`-scope that doesn't appear
to be frequently used.

# Testing

I added a lit test to ensure that the generated LLVM instructions have
the correct sem/scope attributes for atomicrmw, but I also ran the
following 386 unit tests locally on an MI300x:

```bash
pytest test/unit/language/test_core.py -k test_atomic_
```

I then locally ran some kernels with the scope set to CTA/SYSTEM to make
sure that they worked.
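A smoke test of that kind might look like the following sketch (names, sizes, and the histogram workload are illustrative, not the actual kernels that were run). It keeps all updates inside a single program so that the narrower `"cta"` scope is valid; swapping in `"sys"` only widens visibility:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def histogram_kernel(x_ptr, hist_ptr, N, BLOCK: tl.constexpr):
    # Single-program launch: every update stays inside one CTA, so the
    # narrower scope="cta" is sufficient for correctness here.
    offs = tl.arange(0, BLOCK)
    for start in range(0, N, BLOCK):
        mask = start + offs < N
        x = tl.load(x_ptr + start + offs, mask=mask, other=0)
        tl.atomic_add(hist_ptr + x, 1, mask=mask, scope="cta")

x = torch.randint(0, 16, (4096,), device="cuda", dtype=torch.int32)
hist = torch.zeros(16, device="cuda", dtype=torch.int32)
histogram_kernel[(1,)](x, hist, x.numel(), BLOCK=1024)
assert torch.equal(hist, torch.bincount(x, minlength=16).to(torch.int32))
```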
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
There is currently a weird bug causing capability overrides to persist
when users pass `arch=None`. Rather than making
`CUDABackend.sw_capability` stateful, we now retrieve the capability lazily
from the compilation options.

Also fixes an AMD bug encountered in the wild.
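The lazy pattern described here can be illustrated with a small, purely hypothetical sketch (none of these names are the real Triton classes): the capability is derived from each compilation's options at the point of use instead of being cached on the backend object, so an override cannot outlive the call that supplied it.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Options:
    arch: Optional[str] = None        # e.g. "sm_90", or None for "use the device"

class Backend:
    def __init__(self, device_capability: int):
        self.device_capability = device_capability   # queried once from the driver

    def capability(self, options: Options) -> int:
        # Lazy: derived from this compile's options every time, never stored,
        # so an override from one call cannot leak into the next.
        if options.arch is not None:
            return int(options.arch.removeprefix("sm_"))
        return self.device_capability

backend = Backend(device_capability=80)
assert backend.capability(Options(arch="sm_90")) == 90   # explicit override
assert backend.capability(Options(arch=None)) == 80      # no stale override
```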
…t_cmd.py` (#5588)

Relates to triton-lang/triton#5537

---------

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
This reverts commit 70359fa which was
causing some of our internal tests to fail.

Co-authored-by: Adam P. Goucher <goucher@statslab.cam.ac.uk>
…elates to c++20 (#5585)

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
We found regressions for the MoE kernel with fp8 inputs. This PR effectively
reverts part of #4767 and disables the swap-operand feature for fp8-input
matmul kernels for now while we investigate the regression.
@whitneywhtsang self-assigned this on Jan 13, 2025
@whitneywhtsang changed the title from "Merge OpenAI Triton commit 3bac3be" to "Merge OpenAI Triton commit 7cc6799" on Jan 13, 2025
@whitneywhtsang marked this pull request as ready for review on January 13, 2025 19:30
@whitneywhtsang merged commit 865cfae into main on Jan 13, 2025
5 checks passed
@whitneywhtsang deleted the whitneywhtsang/merge branch on January 13, 2025 21:59
@whitneywhtsang changed the title from "Merge OpenAI Triton commit 7cc6799" to "Merge OpenAI Triton commit 0702320" on Jan 13, 2025