-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[JIT] [APX] Enable additional General Purpose Registers. #108799
base: main
Are you sure you want to change the base?
[JIT] [APX] Enable additional General Purpose Registers. #108799
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
CC @jakobbotsch and @tannergooding for code review. |
@DeepakRajendrakumaran What is the status of this PR? It's marked as ready but the description says it's built on top of #108796 that is not marked as ready. |
Thanks for pointing that out. It has some dependencies on other PRs - specifically the Rex2 encoding PR. Considering that, do you have a suggestion on how to mark this for now? |
eb47ede
to
ec3388f
Compare
a3e8331
to
6bbccb4
Compare
Now that CPUID changes have merged, ran superpmi TP and I have a problem Ran the scripts shared by Kunal a while back to debug why this is happening The following is for libraries
|
Trying to further make sure the Rex2 changes are not causing TP regression. We can safely conclude the TP regression is from eGPR enablement The following is with/without Rex2 changes(without reg alloc changes)Overall (+0.08% to +0.23%)
MinOpts (+0.28% to +0.48%)
FullOpts (+0.08% to +0.14%)
With Rex2 as base and eGPR changes as diffOverall (+3.60% to +4.65%)
MinOpts (+6.09% to +8.79%)
FullOpts (+3.47% to +4.29%)
|
# Conflicts: # src/coreclr/jit/lsraxarch.cpp
6bbccb4
to
a2575db
Compare
5466739
to
a9d71a2
Compare
b93e387
to
9905646
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Went through the first pass, need to evaluate where TP regression is coming from. However, I still see some asmdiffs...can you please fix it?
@@ -12534,6 +12555,9 @@ void LinearScan::verifyResolutionMove(GenTree* resolutionMove, LsraLocation curr | |||
LinearScan::RegisterSelection::RegisterSelection(LinearScan* linearScan) | |||
{ | |||
this->linearScan = linearScan; | |||
#if defined(TARGET_AMD64) | |||
rbmAllInt = linearScan->compiler->get_RBM_ALLINT(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any reason why we need it here instead of LinearScan
ctor (which you are already doing)?
@@ -742,6 +743,7 @@ class emitter | |||
// The instrDescCGCA struct's member keeping the GC-ness of the first return register is _idcSecondRetRegGCType. | |||
GCtype _idGCref : 2; // GCref operand? (value is a "GCtype") | |||
|
|||
#if !defined(TARGET_AMD64) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any reason for this change?
@@ -62,7 +62,12 @@ bool regMaskTP::IsRegNumInMask(regNumber reg, var_types type) const | |||
// | |||
void regMaskTP::AddGprRegs(SingleTypeRegSet gprRegs) | |||
{ | |||
// RBM_ALLINT is not known at compile time on TARGET_AMD64 since it's dependent on APX support. | |||
#if defined(TARGET_AMD64) | |||
assert((gprRegs == RBM_NONE) || ((gprRegs & RBM_ALLINT_STATIC_ALL) != RBM_NONE)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for non-APX machines, gpr will still be 0-15
and with this assert, we will allow float register to get set, right?
|
||
// RBM_ALLINT is not known at compile time on TARGET_AMD64 since it's dependent on APX support. Deprecated???? | ||
#if defined(TARGET_AMD64) | ||
sprintf_s(regmask, cchRegMask, REG_MASK_INT_FMT, (mask & RBM_ALLINT_STATIC_ALL).getLow()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why we need RBM_ALLINT_STATIC_ALL
here? it should just use RBM_ALLINT
and it should return the right mask depending on if high int registers are available or not.
// RBM_ALLINT is not known at compile time on TARGET_AMD64 since it's dependent on APX support. These are used by GC | ||
// exclusively | ||
#if defined(TARGET_AMD64) | ||
printf(REG_MASK_INT_FMT, (mask & RBM_ALLINT_STATIC_ALL).getLow()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
likewise here...can just use RBM_ALLINT
?
@@ -3136,4 +3347,51 @@ inline SingleTypeRegSet LinearScan::BuildEvexIncompatibleMask(GenTree* tree) | |||
#endif | |||
} | |||
|
|||
inline bool LinearScan::DoesThisUseGPR(GenTree* op) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add method docs for this and below method?
return false; | ||
} | ||
|
||
inline SingleTypeRegSet LinearScan::BuildApxIncompatibleGPRMask(GenTree* tree, bool forceLowGpr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the goal of this method?
SingleTypeRegSet op1Candidates = candidates; | ||
SingleTypeRegSet op2Candidates = candidates; | ||
int srcCount = 0; | ||
// SingleTypeRegSet op1Candidates = candidates; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there are lot of such comments in this file. can you please delete them?
|
||
// We are dealing exclusively with HWIntrinsics here | ||
return (op->AsHWIntrinsic()->OperIsBroadcastScalar() || | ||
(op->AsHWIntrinsic()->OperIsMemoryLoad() && DoesThisUseGPR(op->AsHWIntrinsic()->Op(1)))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we only care if Op(1)
uses GPR, not any other operand?
else | ||
{ | ||
// ToDo-APX : imul currently doesn't have rex2 support. So, cannot use R16-R31. | ||
dstCandidates = BuildApxIncompatibleGPRMask(tree, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calls to BuildApxIncompatibleGPRMask
for many nodes seems expensive. Wondering if we can do something like:
- at the top just set
SingleTypeRegSet incompatibleGprMask = compiler->canUseApxEncoding() ? lowGPRRegs() : RBM_NONE;
- Places where you are passing
forceLowGpr= true
can instead just useincompatibleGprMask
. - Places where you are not forcing lowGPr, can just use
DoesThisUseGPR(tree) ? incompatibleGprMask : RBM_NONE
Also, might worth caching the value of lowGPRRegs()
because currently it is evaluated every time to be (availableIntRegs & RBM_LOWINT.GetIntRegSet())
and I see lowGprRegs()
is used at lot of places.
This reverts commit 0ecbbd3.
It seems from your latest change, there are still asmdiffs coming up. I think there are places in vs. what CI is showing what does it show for you locally? |
This PR is built on top of #108796
What this PR does
Link to related commit
Link to related commit
Link to related commit
Link to related commit
Link to related commit
Testing
src/tests/JIT
) using Ruihan's scriptsrc/tests/JIT
using altjit featureAnalysis of superpmi results
Summary from JitAnalyze
Detail diffs
Why arm tests from
\HardwareIntrinsics\Arm\AdvSimd\AdvSimd_ro\AdvSimd_ro.dasm
show up here and generates x86 code. My theory is that since we are compiling for x86 using aljit, it takes the software fallback path for arm instrinsics and generates x86 code. See how IsSupported() is generating false belowI'm ignoring these for now
Some interesting code samples highlighting changes introduced due to enabling additional GPRs
Case 1
A very simple case with r16 being used
see
V03 loc0
In this case, a spill is reduced and we see instruction reduction. The cost of using this eGPR is slightly higher encoding size with Rex2. We do not add this to the calculus while doing reg allocation
Case 2
An example of lack of eEVEX/instructions not having eGPR support causing regression
In this example, we use
imul
. We currently have not enabled eGPR usage forimul
. This means if the input toimul
is in an eGPR, we insert amov
to move it to a lower GPR. This further adds to register usage