Data corruption in callback to SML from C with c and llvm codegen #600
Comments
Interesting. My initial question would be whether or not the data corruption appears on amd64 platforms using the C codegen ( One place that you could get data corruption with callbacks is if there is an SML object pointer retained by some C code when an SML GC occurs (which would invalidate that object pointer). Currently, the runtime system does not have a mechanism for registering roots. The |
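The hazard described above (a raw object pointer kept by C code while a moving collector relocates the object) can be shown with a toy model. This is only a sketch: the two arrays, `toy_collect`, and `stale_pointer_demo` are hypothetical names invented for the illustration and have nothing to do with MLton's actual runtime or API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SPACE_WORDS 8

/* Two toy semispaces standing in for a moving collector's heap
 * (hypothetical; not MLton's heap layout). */
static uint64_t from_space[SPACE_WORDS];
static uint64_t to_space[SPACE_WORDS];

/* Toy "collection": copy one live object to to-space, then poison
 * from-space so that any stale pointer into it reads garbage. */
static uint64_t *toy_collect(const uint64_t *obj, size_t words) {
    assert(words <= SPACE_WORDS);
    memcpy(to_space, obj, words * sizeof(uint64_t));
    memset(from_space, 0xAB, sizeof from_space);
    return to_space;
}

/* Returns 1 if a raw pointer kept across the collection went stale
 * while the pointer returned by the collector still sees the value. */
static int stale_pointer_demo(void) {
    from_space[0] = 42;
    uint64_t *retained = from_space;   /* what C code might hold on to */
    uint64_t *moved = toy_collect(retained, 1);
    return (*moved == 42) && (*retained != 42);
}
```

In this model the retained pointer still points at a valid address, so nothing crashes immediately; the damage only shows up later when the stale contents are used, which is the kind of delayed failure being discussed.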
Ah, yes - it does! I hadn't considered the different backends until now. On macOS arm64, the issue occurs with
Certainly - I no longer need to use my wife's Mac!
I considered this and will keep looking out for this but I cannot see where this could be happening at the moment. C functions should be passed objects on the C heap only, created by C code. Also, if this was occurring, would it not affect native codegen also? Furthermore, the address that appears to be corrupted is Below is the equivalent debugging transcript to the LLDB one above but in GDB on Linux x86_64. In deriving the pointer argument of
I have attached the generated C files in example-4-mlton.tar.gz. I have also attached the minimal Giraffe library dependencies (Linux x86_64) in giraffe-lib-mlton.tar.gz. When these files are extracted in the same place, I can build the application with the command:
replacing
To produce the same error in The original SML program is attached in example-4.sml.txt. I will try to create a smaller example.
Ok.
I was able to reproduce the segmentation fault on amd64-linux with example-4-mlton.tar.gz and giraffe-lib-mlton.tar.gz, although I could not build from source using giraffe-1.0.0-alpha.12.tar.gz. I changed the compilation command from
That corresponds to https://github.com/MLton/mlton/blob/on-20241230-release/runtime/gc/stack.c#L49-L56, which suggests that the |
Sorry - the previously attached Alternatively, the previously attached
I also see the assertion failure:
I presume these assertions are enabled by For the 1.0.0-alpha.12 version of this example, I sometimes see various other assertion failures with MLton 20241230 but these could be due to missing 'reentrant' keywords now added in Git. Sometimes there is no assertion failure, just a seg. fault. With 20180207, I see the following when the application starts (no window appears):
This occurs with 1.0.0-alpha.12 and the latest Git version. With MLton 20130715, I have found that there is no issue using the C codegen. In 20130715, all imported functions are reentrant so this makes me wonder whether I have missed 'reentrant' off an _import somewhere. On the other hand, the native codegen works fine, so perhaps 'reentrant' is not the issue.
This is suggesting some kind of heap corruption, where a supposed object pointer isn't actually pointing at a valid object (because there is no valid object header in the expected place). This is consistent with the assertion error that I was seeing. Adding in some prints of the stack just before making the assertion, I see:
That's clearly a "bogus" stack: there is no way that we have a 140TB stack with only 107 bytes used! So, we're actually trying to read an object, but getting a somewhat invalid header, one that has a 1 in the low bit (so passes the gc/object.c:59: splitHeader) and has 0 for the type index bits (so gets interpreted as the object header for a STACK), but clearly this is not a stack. Turning on more debugging output (requires recompiling the runtime system with
What's interesting here is that checking the old generation succeeded without issue, but this is the first object in the nursery that is tripping things.
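To make the header reasoning above concrete, here is a minimal sketch of how a word with a 1 in the low bit and 0 in the type index bits ends up being treated as a stack. The masks, the 19-bit type index width, and `STACK_TYPE_INDEX` being 0 are assumptions chosen for illustration (the authoritative layout is in the runtime's gc/object.h), and `stack_is_plausible` is a hypothetical sanity check, not a function in the runtime.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout for illustration only; see runtime/gc/object.h
 * for the real definitions. */
#define VALID_HEADER_MASK ((uint64_t)0x1)
#define TYPE_INDEX_SHIFT  1
#define TYPE_INDEX_BITS   19
#define TYPE_INDEX_MASK   ((((uint64_t)1 << TYPE_INDEX_BITS) - 1) << TYPE_INDEX_SHIFT)
#define STACK_TYPE_INDEX  0u

/* A header is only accepted if its low bit is set. */
static bool header_looks_valid(uint64_t header) {
    return (header & VALID_HEADER_MASK) == 1;
}

/* Extract the type index; mirrors the idea of asserting validity
 * before splitting the header. */
static unsigned header_type_index(uint64_t header) {
    assert(header_looks_valid(header));
    return (unsigned)((header & TYPE_INDEX_MASK) >> TYPE_INDEX_SHIFT);
}

/* Hypothetical plausibility check: a real stack cannot reserve more
 * than the heap holds, nor use more than it reserves. */
static bool stack_is_plausible(uint64_t reserved, uint64_t used, uint64_t heap_size) {
    return used <= reserved && reserved <= heap_size;
}
```

Under these assumptions, a stray word such as 0x1 passes the validity check and decodes to type index 0, so the bytes that follow are read as stack fields. A reserved size of 140TB with 107 bytes used, as in the printed output above, fails `stack_is_plausible` for any realistic heap size.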
Yes, there are
That could still arise from the same corruption, a bogus object pointer that gets interpreted as a STACK, where the bogus data has an aligned
Any As to why the native codegen works, one possibility is the fact that the native codegen "knows" that the StackTop and Frontier locals "belong" to the GC state. So, when it needs to free up a register, it will write those back to the GC state. But the C and LLVM codegens don't "know" that the GC state is a "safe" place to write those locals; instead, they will be spilled to the C stack. So, with the native codegen, those values might happen to be in the right place during a GC.
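The hypothesis above (cached Frontier/StackTop values that are not written back to the GC state before control leaves generated code) can be sketched with a toy bump allocator. Everything here is hypothetical: `toy_gc_state`, `observe_frontier`, `alloc_then_call`, and `flush_demo` are illustrative names, and the code does not reflect what MLton's codegens actually emit.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for the shared GC state (hypothetical). */
struct toy_gc_state { char *frontier; };

/* Stand-in for a GC or callback that only consults the shared state. */
static char *observe_frontier(const struct toy_gc_state *s) {
    return s->frontier;
}

/* Bump-allocate n bytes through a cached local, then call out.
 * Returns the frontier value that the callee observed. */
static char *alloc_then_call(struct toy_gc_state *s, size_t n, int flush) {
    assert(s != NULL);
    char *frontier = s->frontier;   /* cached in a C local */
    frontier += n;                  /* allocation updates only the local */
    if (flush)
        s->frontier = frontier;     /* write back before calling out */
    char *seen = observe_frontier(s);
    s->frontier = frontier;         /* eventual write-back after the call */
    return seen;
}

/* Returns 1 if the callee saw a stale frontier without the flush
 * and the up-to-date frontier with it. */
static int flush_demo(void) {
    static char heap[64];
    struct toy_gc_state s = { heap };
    char *stale = alloc_then_call(&s, 8, 0);  /* callee sees heap */
    char *fresh = alloc_then_call(&s, 8, 1);  /* callee sees heap + 16 */
    return (stale == heap) && (fresh == heap + 16);
}
```

In the unflushed case the callee believes the 8 freshly allocated bytes are still free, so anything it allocates or scans there would overlap live data. That is one way a nursery object could end up with a bogus header, if the hypothesis holds.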
I am testing on macOS arm64 for the first time. I have an example that works fine on Linux x86_64 platforms and on (the very old) OS X 10.10 (Yosemite) x86_64 but crashes on macOS 12.5 (Monterey) arm64.
I am using MLton provided by MacPorts. The version is called '20240519' which corresponds to commit 475cf2b and I still see the issue with a local build of the latest version, 20241230.
The example is a GTK application (using Giraffe Library) so the SML application's main function calls g_application_run (which is imported as a reentrant function) and most SML code is run in callbacks from C. Using lldb, I see crashes in various places but I will describe a scenario that is repeatable.
The SML application has a top-level declaration
where Cairo.Surface.t is an abstract type that is internally a finalizable pointer, MLton.Pointer.t Finalizable.t. surface is set to some value SOME ... in an SML callback and read in subsequent SML callbacks. In some subsequent SML callbacks, I can see that surface has the value that was set but at some point an SML callback occurs and the value of surface is corrupted, causing the program to crash. The SML code of this callback contains
When running under lldb, the crash appears to occur when looking up the value for the argument of cairo_create (imported as Cairo.Context.create) due to an invalid address:
The C function cairo_create is declared in cairo.h as
The pointer argument of cairo_create goes in x0 on arm64 and should be 0x101F4A1B0 but the pointer to where this value is stored has become corrupted and is 0xB. Here are the previous few instructions, in case that is useful:
The corresponding C code generated by MLton is:
(This must be it, as it is the only reference to cairo_create in the generated code.)
I am fairly sure that no finalization or garbage collection is occurring in the above scenario because the finalizers log to stdout and there is no such logging output.
Are there any MLton flags that I could use to get further information about what could be going wrong?
Also, in one version of this application, I had the error:
Is this something that an SML application could cause?