Skip to content
This repository has been archived by the owner on Mar 24, 2022. It is now read-only.

Stack unwinding improvements #254

Open
wants to merge 37 commits into
base: main
Choose a base branch
from
Open

Conversation

acfoltzer
Copy link
Contributor

@acfoltzer acfoltzer commented Jul 18, 2019

The main problem: resource leaks

If a guest terminates due to a signal or an explicit termination call rather than returning normally via lucet_context_backstop, the stack still needs to be unwound, otherwise resources such as RAII memory or locks can leak. Specifically, the "exception handling personality" functions associated with each stack frame must be called, which can only be done (I think) by the system unwinding runtime, such as libgcc (see section 6.2 of the AMD64 ABI).

This is currently only an issue for hostcall code in RAII style languages like Rust or C++, but will also be an issue for guest code once we support wasm exceptions. Handling resource leaks in hostcalls implemented in C is beyond the scope of this PR, as that will likely require in-band error signaling rather than relying on out-of-band exceptions.

A subproblem: debugging info

Runtime support for stack unwinding overlaps quite a lot with support for backtraces and debuggers, so a solution to the main unwinding problem should entail better backtraces and debugger navigation.

Undefined behavior

The approaches described below involve unwinding between guest code emitted by Cranelift, and hostcall code which is primarily (for now) written in Rust. Sayeth the Nomicon:

Rust's unwinding strategy is not specified to be fundamentally compatible with any other language's unwinding. As such, unwinding into Rust from another language, or unwinding into another language from Rust is Undefined Behavior. You must absolutely catch any panics at the FFI boundary! What you do at that point is up to you, but something must be done. If you fail to do this, at best, your application will crash and burn. At worst, your application won't crash and burn, and will proceed with completely clobbered state.

We need to study why this is undefined behavior, because I haven't yet found a fundamental reason for that to be the case, at least on x86-64 Linux.

Leading approach: unwinding smoothly from guest to host context

(As of Sept 6, 2019, at least)

Between the current contents of this branch and bytecodealliance/cranelift#902, we now have the capability to unwind from a guest stack all the way back to the host stack that called the initial entrypoint.

The remaining bit of implementation is to induce unwinding when resetting or dropping an instance, so that we can clean up the stack left behind by a signal handler or a suspended instance that never resumes. The main complication for this is if the guest stack has overflowed (triggering a signal) or has almost overflowed; we'll need to be able to unprotect the guard page(s) in this case so that the unwind invocation doesn't fault.

Other approaches

We're still investigating which approach is actually the right one to use, but it's probably not the current approach.

Pervasive catching in hostcalls (current approach, possibly doomed)

Implemented in #157, this largely punts on the issue of unwinding and makes each hostcall responsible for catching exceptions and switching contexts to the host before the exception bubbles up through guest segments of the stack. The native unwinding mechanism of the hostcall implementation language is responsible for calling the personality functions.

The nested hostcall problem

This works reasonably well for the case where there's at most one hostcall in the stack, but as that PR notes:

...this only works for the case where we are in at most one hostcall context. If we have a stack with guest -> host -> guest -> host, the first host segment will not be properly unwound.

Unfortunately, this pattern is more common than it might seem at first, due to how we typically interact with guest memory managers in order to reserve space for host data to be copied into linear memory.

To use Terrarium as an example, when a guest asks for the headers of a request, the req_get_headers hostcall invokes the guest's malloc to reserve space in linear memory for copying in the header strings. The guest malloc in turn could invoke the host's grow_memory implementation, leaving us with a stack that has two guest segments and two hostcall segments: run (guest) -> req_get_headers (host) -> malloc (guest) -> grow_memory (host). If grow_memory were to try terminating the instance, any resources acquired by req_get_headers would leak.

We could probably get around this by keeping track of how many hostcalls we're in, and reraising the exception until it unwinds all of the hostcall segments, however this does not help at all with...

The signal handling problem

This approach only addresses panics that arise from hostcall code (specifically the panics we use to implement guest termination). It does not address at all what happens to the stack after running a signal handler, which can happen completely in guest code.

Suppose we have a guest function div0 that divides an integer by zero, and a stack like run (guest) -> hostcall_foo (host) -> div0 (guest). The SIGFPE raised by div0 will be handled by the Lucet signal handler, which will then use Context::set_from_signal() to clear the signal mask and swap back to the host context. Since no panic/exception is ever raised, there's not even an opportunity for the hostcall_foo segment of the stack to be unwound.

We definitely don't want to panic directly from a signal handler. Setting up the return from the signal handler to a landing pad that promptly panics might work, but seems fragile.

Unwinding smoothly from guest to host context (nice but not sufficient)

Edit: this has been promoted to the leading approach.

Why don't we put the catch_unwind in Instance::run(), rather than at the hostcall entrypoints? Because right now, the unwinding runtime bottoms out at lucet_context_backstop rather than following the call chain back to the host context. It looks like it should be possible using CFI directives to convince the runtime to follow the call chain back to the host context's saved stack pointer, but I haven't had any luck so far.

Getting this working would make it easier to deal with the panics we use to implement instance termination, and would improve debugger backtraces. It wouldn't address the signal handler case.

Manual unwinding from the host context (probably doomed)

libunwind lets us capture a context, and later step through its stack frames. We can even get the address of the personality function from unw_get_proc_info()! The hope was that with this information, we could manually unwind guest stacks after switching back to host code by stepping through the frames and invoking each personality function directly.

However, personality functions for x86-64 Linux are implemented in terms of functions provided by the unwinding runtime and described in section 6.2.5 of the AMD64 ABI. These functions all take an opaque pointer to an _Unwind_Context struct that's provided to the personality function by the unwinding runtime when it is unwinding in response to an exception. This appears to be the only way to get a valid _Unwind_Context pointer, and so we cannot get our hands on valid arguments for the personality function; they can only be provided by the system unwinding runtime.

Resuming a terminated guest at a function that panics

So, if we can't invoke the personality functions ourselves, we need to figure out a way to invoke the unwinding runtime while in the context of the guest we wish to unwind. We might be able to retain the guest Context after termination, and then swap into it with bootstrapping code that would land us in a function that panics or calls _Unwind_RaiseException() directly.

This would probably require adding some new interfaces to the Context API. Specifically, we do not currently have an interface for swapping to an existing guest context, but at a new function entrypoint. Instead, the entrypoint is set once during Context::init(), and the empty stack is carefully arranged to return into the entrypoint from lucet_context_bootstrap.

We'll need to do something like pushing an additional frame that would return to the panicking function, or pushing a synthetic frame and setting the instruction pointer to the panicking function directly. Since the panicking function won't need to take any arguments, we can probably avoid a lot of the complications that the init() interface has. This interface could also be useful for implementing a yield operator in the future.

We also would need to figure out how the unwinding would bottom out. Preferably, we could figure out "Unwinding smoothly from guest to host context", but an alternative would be to add a custom exception handler personality function to lucet_context_backstop. This function would use the _Unwind_* APIs to specify a landing pad back in the host.

Use panic/catch_unwind in hostcalls, and error propagation in guest code

Given the specter of undefined behavior mentioned above, we should also be considering solutions that do not rely on exceptions ever crossing the boundary between host and guest code. The current solution does exactly this, albeit with the limitations of a single host segment, and not addressing signal handlers.

Strawman: make 👏 guest 👏 funcs 👏 optional

Perhaps we could add the moral equivalent of Option<T> to the return type of each guest function. Or, for the Haskellers in the crowd, we could lift all guest code into the Maybe monad.

If a panic arises in a hostcall, we would have the existing catch_unwind, but rather than swapping back to the host context upon termination, it would return the moral equivalent of None to the caller.

Then in guest code, any time a call results in a None, we promptly return another None. The None would propagate until we reach host context, which would be doing the moral equivalent of .unwrap() on all calls to guest funcs.

This would propagate the panic between segments of host stack without actually relying on the unwinding runtime to cross language boundaries.

I'm not sure what the signal handling case would look like here, but perhaps we could arrange to return from the handler into the previous stack frame with a None?

@acfoltzer acfoltzer force-pushed the acf/unwinding-improvements branch from bce2247 to 9389a4c Compare July 22, 2019 22:18
@philipc
Copy link

philipc commented Aug 7, 2019

I noticed the commit adding eh_frame support. FYI, gimli supports writing eh_frame. Might be a bit of a larger dependency if you only need something minimal, just something to consider, but I think cranelift will eventually add this as a dependency for debuginfo too.

@awortman-fastly
Copy link
Contributor

oh that's super cool! last I'd looked gimli only knew how to read dwarf, not write :D the bulk of that commit is bumping cranelift to bytecodealliance/cranelift@3f74458 which actually does nontrivial eh_frame stuff. It looks like gimli does all the things there, but better.

I only saw https://github.com/CraneStation/cranelift/issues/426 when looking for eh_frame-related conversation in cranelift - is there any reason I shouldn't rewrite my changes in terms of gimli and put up a cranelift PR soonish?

@philipc
Copy link

philipc commented Aug 7, 2019

There's some frame layout work in bytecodealliance/cranelift#679, and bytecodealliance/wasmtime#53. I'm not sure how eh_frame support in cranelift-faerie will tie in with that.

awortman-fastly and others added 21 commits June 19, 2020 14:34
This is for debugging purposes so that we have a test case that unwinds to `lucet_context_backstop`.

This causes several of the existing test cases to fail, but if you want one that is solely focused
on this behavior, run:

```
cargo test -p lucet-runtime --test host unwind
```
... when guest code has correct .eh_frame information, anyway.

this currently results in the personality function being called if you
run the lucet-runtime hostcall tests under gdb with the following
commands
```
set args unwind --test-threads=1 --nocapture
b rust_panic
r
```
then, when the breakpoint (`rust_panic`) is hit, replace the first
return into guest code with a return to `lucet_context_backstop`:
```
`#      v-- this is the address of lucet_context_backstop`
printf '\x61\x7f\xa0\x56\x55\x55\x00\x00' \
  | dd `# because gdb doesnt like "set *(long long*)0xaddr = value ` \
    `#         v--- just finding the pid of the test debugee ` \
    of=/proc/$(ps -ef | grep lucet | grep unwind | cut -d' ' -f2)/mem \
    `#           v-- this is where the first guest return address is `\
    `#           v   ..for me anyway. replaces 0x00007ffff6878685` \
    `#           v   for "guest_func___original_main". `\
    bs=1 seek=$((0x7ffff6872fa8)) \
    `# dd would try to truncate guest memory by default. do not do this. `\
    conv=notrunc
```
(if you can figure out how to do this standalone in gdb, i'm all ears)

at this point, continuing in gdb to allow the panic mechanism to run
should ... call into the provided personality function!
Also includes a not-quite-working personality function for the backstop that fails because the
system unwinder doesn't want us to be able to set rdi, even though that's explicitly one of the
registers listed as being supported for landing pad passing purposes.
This means backtraces and panics now work across the host/guest stack boundary
This test works with the nightly-only `#[unwind(allowed)]` attribute, which we'll hopefully be able
to help move along.
The current state of the repo is such that only the `fault_unwind` test is currently relevant:

```
cargo test -p lucet-runtime --test host fault_unwind -- --nocapture
```

Currently stuck figuring out how to set up the stack properly in order to return into the function
that panics. If I pad the stack with a zero word in order to keep it 16-byte aligned, the unwinding
runtime interprets that zero as a return address and fails. If I don't add the padding, later
instructions fault because of unaligned arguments.

We probably need to add a shim that uses `.cfi` directives in order to make the unwinding runtime
skip over the padding.
acfoltzer and others added 9 commits June 19, 2020 14:34
the struct at rbp changed between first authorship and today, to fix a bug where lucet instances were accidentally tied to their starting thread
when forced unwinding was first envisioned, guests did not run at all
from the point they faulted. this mean that the fault address would be a
simple `guest_ctx.get_ip()` away. in the mean time, the Lucet signal
handler learning how to be crossplatform broke this assumption: it now
works by *overwriting* the guest's instruction pointer, swapping to the
guest, and letting a function run. consequently, the guest instruction
pointer is replaced and when a guest unwind is instigated after a guest
faults, the return address before `initiate_unwind` (or `unwind_stub`,
if present) will no longer be correct. libgcc_s will then fail to locate
an FDE to describe the call frame above runtime-added unwind machinery,
fail to unwind, and SIGABRT.

the solution is quite simple: since the rip-accessing code is already
handling a guest fault, we know the original faulting guest `rip` is
preserved in the fault's `details`. insted of `guest_ctx.get_ip()`, get
the address from `details.rip_addr`.
@iximeow iximeow force-pushed the acf/unwinding-improvements branch from 2a79626 to bb27ee7 Compare June 19, 2020 21:34
@iximeow
Copy link
Contributor

iximeow commented Jun 19, 2020

🧹 ✨ I've rebased this branch and fixed a few instances of bitrot between then and now - the new unwinding tests are now run under MmapRegion and UffdRegion like any other tests.

The tests here will not pass as things stand: the only diff for this branch between "broken' and "working" is adding a features = ["unwind"] declaration to our cranelift-object dependency. That feature doesn't exist yet, but 🤞 the PR I'm putting together to add such a feature will go smoothly and it'll exist soonly. With the prototyping I've done, everything seems to be working, except some UffdRegion-based tests, which likely fail due to the very naively implemented *_redzone_stack functions there.

A few more details and updates from the OP:

Leading approach: unwinding smoothly from guest to host context

(As of Sept 6, 2019, at least)

This is almost the approach taken in this branch, though not as directly. We actually do...

Resuming a terminated guest at a function that panics

to initiate an unwind, and allow that to go up the guest stack to the host.

On the topic of mixed guest/host code:

Unfortunately, this pattern is more common than it might seem at first, due to how we typically interact with guest memory managers

We have, so far, successfully designed around this problem. As noted above,

We could probably get around this by keeping track of how many hostcalls we're in, and reraising the exception until it unwinds all of the hostcall segments,

which would be necessary in some form for timeouts to work correctly given guest callbacks, so this continues to be a plausible approach.

The signal handling problem

I don't think this is a problem anymore!

Lucet signal handler, which will then use Context::set_from_signal() to clear the signal mask and swap back to the host context.

We don't quite do this anymore, but the behavior now is equivalent.

Since no panic/exception is ever raised, there's not even an opportunity for the hostcall_foo segment of the stack to be unwound.

This circumstance should result in a State::Faulted of the guest, where the runtime can then figure out it should unwind from the inner guest, through hostcall frames, and up the rest of the stack.

Because right now, the unwinding runtime bottoms out at lucet_context_backstop rather than following the call chain back to the host context. It looks like it should be possible using CFI directives to convince the runtime to follow the call chain back to the host context's saved stack pointer, but I haven't had any luck so far.

@acfoltzer I think you have in fact had success doing this? moving the catch_unwind all the way out to Instance::run doesn't seem unreasonable now.

We'll need to do something like pushing an additional frame that would return to the panicking function, or pushing a synthetic frame and setting the instruction pointer to the panicking function directly.

And this is exactly how force_unwind works. :)

@@ -1 +1 @@
1.43.1
nightly-2020-06-06
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Admission: this choice of nightly is totally arbitrary. Nightly is still needed for unwind_attributes, and I wanted to pick a newer one.

@iximeow iximeow marked this pull request as ready for review June 25, 2020 22:42
@acfoltzer acfoltzer changed the title [WIP] Stack unwinding improvements Stack unwinding improvements Jun 25, 2020
@acfoltzer acfoltzer self-assigned this Jun 25, 2020
@acfoltzer
Copy link
Contributor Author

Assigned myself just so it's somewhat harder for me to lose track of this. As the originator of the PR I can't mark myself as a reviewer 😞

@pchickey pchickey closed this Jun 26, 2020
@pchickey
Copy link
Contributor

This PR was closed as a byproduct of deleting the branch named master. If this is still an active PR, re-open as a new PR against main.

@acfoltzer acfoltzer reopened this Jun 26, 2020
@acfoltzer acfoltzer changed the base branch from master to main June 26, 2020 00:48
@iximeow iximeow mentioned this pull request Jul 13, 2020
acfoltzer added a commit that referenced this pull request Jul 29, 2020
This prevents panics that aren't intentionally initiated by the Lucet runtime from attempting to
unwind across Wasm stack frames. When #254 lands, this will no longer be necessary, but until then
this allows us to safely transport the unknown panic payload across the FFI boundary so that
panicking can resume on the other side.
@iximeow
Copy link
Contributor

iximeow commented Aug 25, 2020

🎂! totally forgot to celebrate this one 💔

@iximeow
Copy link
Contributor

iximeow commented Jul 19, 2021

🎂 🎂 🎉

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants