Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Calling certain native functions from custom hosted dotnet causes error #110975

Open
Linx145 opened this issue Dec 28, 2024 · 4 comments
Open

Calling certain native functions from custom hosted dotnet causes error #110975

Linx145 opened this issue Dec 28, 2024 · 4 comments
Labels
area-Interop-coreclr untriaged New issue has not been triaged by the area owner

Comments

@Linx145
Copy link

Linx145 commented Dec 28, 2024

Description

I'm writing a game engine that relies on running the hosted dotnet via hostfxr and the like. In lieu of pinvoke, I enable the scripting runtime to call native engine functions by passing function pointers of the native function to the C# runtime, via regular reverse-pinvoke as stated in the custom dotnet runtime host guide.

In doing so, I've stumbled upon a very strange possible bug where certain native functions get random values as arguments, as compared to what the C# assembly had called the function pointer with. All function pointers on C# side are marked as unmanaged [cdecl], and on the C++ side they are cdecl too. (Though this doesn't seem to matter at all.)

Upon further investigation, it appears that when calling certain function pointers, C# pushes input arguments to the wrong register (I believe it's RCX) while C++ expects it to be in a different register. This may seem to be a calling convention issue, but it only ever happens with a certain, apparently random set of functions (around 10 functions cause this issue amongst the 200+ passed as function pointers to native), which is probably going to be an issue when reproducing this bug. All other 190+ functions call using this same method just fine. Perhaps most baffling to me is the current workaround for the issue, which is just changing the name of the return type.

This issue happens on both native executables compiled with MSVC and clang, so I assume it's an issue with dotnet itself and not the C++ compiler.

Reproduction Steps

C++ end:

//.hpp
struct Actor
{
    u32 ID;
    u32 generation;
}

Actor Actor_NewActor(u32 world)
{
    printf("%u\n", world);
    return Actor();
}

C# end:

//autogenerated native function pointer imports + calls
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static unsafe EternityWorks.Actor Actor_NewActor(uint world) {
    var _result = ((delegate* unmanaged[Cdecl]<uint, EternityWorks.Actor>)functionPointers[8])(world);
    return _result;
}
...
public static void Initialize()
{
    Actor_NewActor(1);
}

Expected behavior

Native code Actor_NewActor prints 1

Actual behavior

Native code Actor_NewActor prints a random integer

Regression?

The bug has been here since .net 8.0, persists in .net 9.0

Known Workarounds

Changing the name of the output type of the function seems to fix it, somehow. For example, Actor_New(uint32_t worldID) returns an Actor, which is two uint32s. Changing that to a return a NativeActor, which is the exact same thing (2 uint32s), fixes the bug.

Configuration

Windows 10, .net 8.0/9.0, x64

Other information

I'll try to get a minimal reproducible build in a separate repo, but my project is currently quite massive and from the fickle nature of the fix, there's no guarantee that the exact same configuration of functions, operations etc will cause the same issue if it is in another codebase.

@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Dec 28, 2024
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Dec 28, 2024
@huoyaoyuan huoyaoyuan added area-Interop-coreclr and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Dec 28, 2024
Copy link
Contributor

Tagging subscribers to this area: @dotnet/interop-contrib
See info in area-owners.md if you want to be subscribed.

@huoyaoyuan
Copy link
Member

P/Invoke are sensitive to the exact definition of structs. In this case, the exact definition of Actor or NativeActor matters, including attributes and field layout.

Additionally, you can use decompilation tools like https://github.com/EgorBo/Disasmo to find the difference of assembly code for the P/Invoke.

@Linx145
Copy link
Author

Linx145 commented Dec 29, 2024

It is not an actual P/Invoke - as what my report says, it's calling a function pointer passed to it by the custom dotnet runtime. The fix is renaming the native C struct, not changing its variable layout, and not changing the C# struct.

@huoyaoyuan
Copy link
Member

Function pointers are performing a P/Invoke.

If renaming at native side fixes the issue, you should check decompilation of native code to find any difference. There is a chance that name confliction causes binding to other type accidentally. Function pointer itself shouldn't be affected by type name.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-Interop-coreclr untriaged New issue has not been triaged by the area owner
Projects
Status: No status
Development

No branches or pull requests

2 participants