The only concrete numbers I have about the performance differences are some Sightglass runs from my laptop, but since the ambient temperature is currently ~15F warmer than the last time, the numbers don't line up perfectly. So I'm mostly drawing conclusions from the difference in ratio between the two configurations.

Non-pinned heap ("before"):

Pinned heap register ("after"):

Edit: updated "before" with a more appropriate "before".
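As a tiny worked illustration of the ratio argument (the numbers below are entirely made up, not the actual Sightglass results, which are omitted above): a machine-wide slowdown such as thermal throttling shifts the raw times of both configurations, but the after/before ratio stays comparable across runs.

#include <stdio.h>

int main(void) {
    /* hypothetical timings in ms; not real benchmark data */
    double before_cool = 100.0, after_cool = 90.0;   /* earlier, cooler run */
    double before_warm = 110.0, after_warm = 99.0;   /* later, ~15F warmer  */

    printf("raw improvement, cool run: %.1f ms\n", before_cool - after_cool);
    printf("raw improvement, warm run: %.1f ms\n", before_warm - after_warm);
    printf("ratio, cool run:           %.3f\n", after_cool / before_cool);
    printf("ratio, warm run:           %.3f\n", after_warm / before_warm);
    return 0;
}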
Now that …
Tests are failing to compile because of the wrong number of args to Compiler::new; this probably needs to be rebased or merged with master.
turns out I'd only run …
@iximeow - just saw your comment on the cranelift thread. This PR looks great! I think this may allow me to prototype my changes for now. So I'm eagerly waiting for this to land 👍
Awesome. This will create a merge conflict with #418; can you follow up with a patch to that PR that resolves it?
In collecting performance numbers now, rather than relying on the ones from September, I found that I had neglected to pass the state of the flag through to the compiler. Oops. With that fixed, I have new numbers, and a surprising finding: on my desktop, an i7-3960x, using a pinned heap register or not seems to make no difference. From this machine:

No pinned register (default):
I don't totally trust these numbers and was hesitant to accept them since they're so consistent, but I checked the generated assembly and it all seems correct.

before:
after:
The giveaway that this change is actually making it through to generated code is the missing loads of the heap base in the pinned version.

My suspicion is that the changes from a few months ago to how reloads of spilled registers are handled happen to achieve a similar effect as keeping the heap base in a pinned register.

Either way! Given that this is still useful for @shravanrn's work, I'll fix the imminent merge conflict with #418 and not sweat the performance difference too much.
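As a rough C analogy of what the flag changes (this uses GCC's global register variable extension and a made-up context layout; it is not Cranelift or lucetc output): without the pin, every heap access first loads the base out of the instance context, while with the pin the base is assumed to live in r15 for the whole guest.

#include <stdint.h>

/* "after" case: the heap base lives permanently in r15 (GCC global register
   variable extension; in real use this would be paired with -ffixed-r15 so
   nothing else allocates the register) */
register uint8_t *pinned_heap_base __asm__("r15");

struct vmctx { uint8_t *heap_base; };   /* hypothetical instance layout */

/* "before": the base is re-loaded from the instance context on each access */
uint32_t load_unpinned(struct vmctx *ctx, uint32_t off) {
    return *(uint32_t *)(ctx->heap_base + off);
}

/* "after": address computation uses r15 directly, no load of the base needed */
uint32_t load_pinned(uint32_t off) {
    return *(uint32_t *)(pinned_heap_base + off);
}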
Thanks a bunch. Looking forward to this landing 👍 I have plans to use this right away :)
@pchickey @iximeow Hmm... I just tried this out and noticed that the heap base (r15) is still spilt and unspilt. Is there an easy way to remove this as well? While this probably has only a minor perf impact, it does play havoc with the Spectre mitigations I'm working on. Please let me know!

#include <stdio.h>

__attribute__((noinline))
void spec_printBranch2(int val)
{
for(int i = 0; i < val; i++){
printf("Val: %d\n", val);
}
}

000000000000f600 <guest_func_spec_printBranch2>:
f600: 55 push %rbp
f601: 48 89 e5 mov %rsp,%rbp
f604: 41 54 push %r12
f606: 41 55 push %r13
f608: 41 56 push %r14
--------------------------------heap base being spilt
f60a: 41 57 push %r15
f60c: 48 83 ec 20 sub $0x20,%rsp
f610: 48 89 bc 24 08 00 00 mov %rdi,0x8(%rsp)
f617: 00
f618: 89 b4 24 1c 00 00 00 mov %esi,0x1c(%rsp)
f61f: 49 89 fe mov %rdi,%r14
f622: 49 8b 46 f0 mov -0x10(%r14),%rax
f626: 8b 00 mov (%rax),%eax
f628: 83 c0 f0 add $0xfffffff0,%eax
f62b: 89 84 24 18 00 00 00 mov %eax,0x18(%rsp)
f632: 49 8b 46 f0 mov -0x10(%r14),%rax
f636: 44 8b b4 24 18 00 00 mov 0x18(%rsp),%r14d
f63d: 00
f63e: 44 89 30 mov %r14d,(%rax)
f641: 41 89 f6 mov %esi,%r14d
f644: 41 83 fe 01 cmp $0x1,%r14d
f648: 7c 7d jl f6c7 <guest_func_spec_printBranch2+0xc7>
f64a: 44 8b b4 24 18 00 00 mov 0x18(%rsp),%r14d
f651: 00
f652: 44 89 f0 mov %r14d,%eax
f655: 48 89 84 24 00 00 00 mov %rax,0x0(%rsp)
f65c: 00
f65d: b8 00 04 00 00 mov $0x400,%eax
f662: 89 84 24 14 00 00 00 mov %eax,0x14(%rsp)
f669: 41 89 f6 mov %esi,%r14d
f66c: 44 89 b4 24 10 00 00 mov %r14d,0x10(%rsp)
f673: 00
f674: 44 8b b4 24 1c 00 00 mov 0x1c(%rsp),%r14d
f67b: 00
f67c: 4c 8b ac 24 00 00 00 mov 0x0(%rsp),%r13
f683: 00
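--------------------------------guest heap store through the pinned base: r15 + offset in r13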
f684: 47 89 34 2f mov %r14d,(%r15,%r13,1)
f688: 4c 8b b4 24 08 00 00 mov 0x8(%rsp),%r14
f68f: 00
f690: 44 8b ac 24 14 00 00 mov 0x14(%rsp),%r13d
f697: 00
f698: 44 8b a4 24 18 00 00 mov 0x18(%rsp),%r12d
f69f: 00
f6a0: 4c 89 f7 mov %r14,%rdi
f6a3: 44 89 ee mov %r13d,%esi
f6a6: 44 89 e2 mov %r12d,%edx
f6a9: e8 22 42 00 00 callq 138d0 <guest_func_printf>
f6ae: 44 8b b4 24 10 00 00 mov 0x10(%rsp),%r14d
f6b5: 00
f6b6: 41 83 c6 ff add $0xffffffff,%r14d
f6ba: 44 89 b4 24 10 00 00 mov %r14d,0x10(%rsp)
f6c1: 00
f6c2: 45 85 f6 test %r14d,%r14d
f6c5: 75 ad jne f674 <guest_func_spec_printBranch2+0x74>
f6c7: 44 8b b4 24 18 00 00 mov 0x18(%rsp),%r14d
f6ce: 00
f6cf: 41 83 c6 10 add $0x10,%r14d
f6d3: 4c 8b ac 24 08 00 00 mov 0x8(%rsp),%r13
f6da: 00
f6db: 49 8b 45 f0 mov -0x10(%r13),%rax
f6df: 44 89 30 mov %r14d,(%rax)
f6e2: 48 83 c4 20 add $0x20,%rsp
--------------------------------heap base being unspilt
f6e6: 41 5f pop %r15
f6e8: 41 5e pop %r14
f6ea: 41 5d pop %r13
f6ec: 41 5c pop %r12
f6ee: 5d pop %rbp
f6ef: c3 retq
I think in cases of cranelift-calling-cranelift we can avoid spilling, but at calls to non-cranelift code it would depend on the calling convention we're conforming to. The SysV (Linux/macOS) and Microsoft calling conventions both look to require r15 to be preserved, so maybe it'd end up straightforward anyway.

Can you expand on how the spill/reload impacts your mitigation work? It'd be good to note as a motivator for changes in Cranelift.
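To make the calling-convention point concrete, here is an illustrative C helper (not project code): because r15 is callee-saved under both the SysV and Microsoft x64 ABIs, a conforming native function that wants to scratch it has to save and restore it itself, which is why a heap base pinned in r15 could survive such a call even if the caller never spills it.

/* Hypothetical native helper that scratches r15. Because r15 is listed as
   clobbered, the compiler must honor the ABI and emit a save/restore of r15
   in this function's prologue and epilogue, so a caller's pinned heap base
   is intact again by the time this returns. */
void native_helper_that_scratches_r15(void) {
    __asm__ volatile("xorq %%r15, %%r15" ::: "r15", "cc");
}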
It's a little difficult to summarize, but we will be submitting a paper shortly with the details, and I can share that writeup with you. The short version is that this is necessary to prevent abuse of store-to-load forwarding to read outside the sandbox memory: if we ever unspill into r15 speculatively, it is possible to trick the processor into believing the heap_base register holds some other value, allowing a read from outside the heap bounds.
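A minimal sketch of why the reload is the sensitive spot (the helper and its names are hypothetical, and this is the generic Wasm base-plus-offset pattern rather than lucet's exact code): every guest access is computed against whatever value the CPU currently believes the heap base to be, including values it only holds speculatively.

#include <stdint.h>

/* Each guest access is "heap_base + 32-bit offset", with heap_base normally
   pinned in r15. The bounded offset alone can't reach outside the heap and
   its guard region -- but only if the base really is the heap base. If the
   reload of r15 after a call is satisfied speculatively by store-to-load
   forwarding from the wrong store, the address below is briefly computed
   against that wrong base, and the load can touch memory outside the sandbox
   before the misspeculation is squashed. */
static inline uint32_t guest_load_u32(const uint8_t *heap_base /* r15 */,
                                      uint32_t offset) {
    return *(const uint32_t *)(heap_base + offset);
}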
I'm not deeply familiar with load forwarding, but I assume you mean that at the tail end of a call, after reloading r15, …

A similar instance of this would crop up when making hostcalls, though, since we'd be calling into native code with no register-allocation constraints, and it could be possible to construct code with a heap (…) …

Either way, I'll happily read the paper once submitted! And the push/pop are definitely unnecessary in this case.
Hi @pchickey @awortman-fastly @iximeow …
This PR was closed as a byproduct of deleting the branch named …
closing in favor of #512
This gets everything in place to use a pinned register for heap accesses via bytecodealliance/cranelift#960
There's a subset of this PR that's just updating to match changed Cranelift APIs; I'll pull that out into a subsequent PR. The master I'd branched from was out of date, oops.