Merge OpenAI Triton commit 0702320 #3149
Merged
Conversation
…end to LLVM codegen. Ignore NaN when set. (#5582)
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
# Overview

Atomics in Triton have two optional attributes:

1. `sem` -- describing the memory semantics of the operation
2. `scope` -- describing which threads will see the effect of a memory operation (e.g., GPU, CTA)

Presently, the `scope` is ignored by the AMD backend and defaults to `agent` scope in the emitted LLVM (which roughly corresponds to the `gpu` memscope in Triton). This is correct (in most cases? maybe not all?), as this is a "stricter" scope than CTA (and I'm guessing it is rare that system scope is needed for AMD kernels, so no bugs have shown up). That being said, emitting atomics at CTA scope can be more efficient since there can be fewer cache invalidations/barriers. I think this is fixable by just passing the attribute through to the generated `llvm.atomicrmw` op. There are some additional optimizations potentially possible (e.g., `!amdgpu.no.remote.memory`, since Triton doesn't support this today), but it isn't clear to me whether those would have any real impact on end-to-end performance, and those optimizations would be specific to the `sys` scope, which doesn't appear to be frequently used.

# Testing

I added a lit test to ensure that the generated LLVM instructions have the correct sem/scope attributes for `atomicrmw`, but I also ran the following 386 unit tests locally on an MI300x:

```bash
pytest test/unit/language/test_core.py -k test_atomic_
```

I then locally ran some kernels with the scope set to CTA/SYSTEM to make sure that they worked.
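As an illustrative sketch (not taken from this PR's diff), the idea is that the Triton `scope` attribute should map onto the `syncscope` of the emitted LLVM `atomicrmw`. The AMDGPU backend recognizes syncscope strings such as `"agent"` and `"workgroup"`; the mapping below is an assumption about the intended lowering, not a dump of the actual generated code:

```llvm
; Hypothetical lowering of a Triton atomic add with acq_rel semantics
; at different scopes on the AMDGPU target.

; scope = gpu  -> agent syncscope (today's default on the AMD backend)
%old0 = atomicrmw fadd ptr addrspace(1) %p, float %v syncscope("agent") acq_rel

; scope = cta  -> workgroup syncscope (potentially cheaper: fewer cache
;                 invalidations/barriers, as noted above)
%old1 = atomicrmw fadd ptr addrspace(1) %p, float %v syncscope("workgroup") acq_rel

; scope = sys  -> system scope (the default when no syncscope is given)
%old2 = atomicrmw fadd ptr addrspace(1) %p, float %v acq_rel
```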
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
There is currently a weird bug causing capability overrides to persist when users pass `arch=None`. Rather than making `CUDABackend.sw_capability` stateful, we now retrieve the capability lazily from the compilation options. Also fixes an AMD bug encountered in the wild.
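A minimal sketch of the stateful-vs-lazy distinction described above. The class and option names here (`Backend`, `Options`, `capability`) are hypothetical stand-ins, not the actual `CUDABackend` API:

```python
from dataclasses import dataclass


@dataclass
class Options:
    """Hypothetical per-compilation options carrying the target capability."""
    capability: int


class Backend:
    # Stateful variant (the bug pattern): an override captured at construction
    # persists on the instance, even when a later caller passes arch=None.
    def __init__(self, arch=None):
        self.sw_capability = arch  # sticks around for the object's lifetime

    # Lazy variant (the fix pattern): derive the capability from the current
    # compilation options each time, so no stale override can leak across
    # compilations.
    def get_capability(self, options: Options) -> int:
        return options.capability
```

With the lazy variant, two compilations through the same backend instance each see their own options rather than whatever was set first.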
…t_cmd.py` (#5588) Relates to triton-lang/triton#5537 --------- Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
This reverts commit 70359fa, which was causing some of our internal tests to fail. Co-authored-by: Adam P. Goucher <goucher@statslab.cam.ac.uk>
…elates to c++20 (#5585) Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
We found regressions for the MoE kernel with fp8 inputs. This PR effectively reverts part of #4767 and disables the swap-operand feature for fp8-input matmul kernels for now while we investigate the regression.
whitneywhtsang changed the title from "Merge OpenAI Triton commit 3bac3be" to "Merge OpenAI Triton commit 7cc6799" on Jan 13, 2025
pbchekin approved these changes on Jan 13, 2025
…degen Signed-off-by: Whitney Tsang <whitney.tsang@intel.com>
…ional argument but 2 were given` Signed-off-by: Whitney Tsang <whitney.tsang@intel.com>
whitneywhtsang force-pushed the whitneywhtsang/merge branch from e7e7ed5 to 865cfae on January 13, 2025 at 20:54
whitneywhtsang changed the title from "Merge OpenAI Triton commit 7cc6799" to "Merge OpenAI Triton commit 0702320" on Jan 13, 2025
This PR changes the Triton base from 3bac3be to 0702320 (Jan 13).
Pass rate: 97.63%
Please do not squash and merge this PR.