Segmentation fault in Objective-C thread on macOS #361

Open
teor2345 opened this issue Dec 18, 2024 · 12 comments
Labels: bug (Something isn't working), frontend (Frontend/user interface), macOS (macOS-specific)

Comments

@teor2345
Member

This is likely a bug in the graphics framework or OS, but I thought it was worth reporting anyway in case others are seeing it:

```
Process:               space-acres [90486]
Path:                  /Applications/SpaceAcres.app/Contents/MacOS/space-acres
Identifier:            space-acres
Version:               ???
Code Type:             ARM-64 (Native)
Parent Process:        space-acres [90482]
Responsible:           space-acres [90482]
User ID:               502

Date/Time:             2024-12-18 14:00:34.4650 +1000
OS Version:            macOS 13.7.1 (22H221)
...
Crashed Thread:        11  pool-space-acre

Exception Type:        EXC_BAD_ACCESS (SIGSEGV)
Exception Codes:       KERN_INVALID_ADDRESS at 0x00000f61d4583740
Exception Codes:       0x0000000000000001, 0x00000f61d4583740

Termination Reason:    Namespace SIGNAL, Code 11 Segmentation fault: 11
Terminating Process:   exc handler [90486]

VM Region Info: 0xf61d4583740 is not in any region.  Bytes after previous region: 11498874746689  Bytes before following region: 88640267471040
      REGION TYPE                    START - END         [ VSIZE] PRT/MAX SHRMOD  REGION DETAIL
      IOAccelerator             4ec82000000-4ec8a000000  [128.0M] rw-/rwx SM=PRV
--->  GAP OF 0x5b1376000000 BYTES
      MALLOC_NANO              600000000000-600008000000 [128.0M] rw-/rwx SM=PRV

Thread 11 Crashed:: pool-space-acre
0   libobjc.A.dylib               	       0x18154814c objc_release + 16
1   libobjc.A.dylib               	       0x18154fbd4 AutoreleasePoolPage::releaseUntil(objc_object**) + 196
2   libobjc.A.dylib               	       0x18154c79c objc_autoreleasePoolPop + 256
3   libobjc.A.dylib               	       0x181574480 objc_tls_direct_base<AutoreleasePoolPage*, (tls_key)3, AutoreleasePoolPage::HotPageDealloc>::dtor_(void*) + 168
4   libsystem_pthread.dylib       	       0x1818e1970 _pthread_tsd_cleanup + 620
5   libsystem_pthread.dylib       	       0x1818e469c _pthread_exit + 84
6   libsystem_pthread.dylib       	       0x1818e3fb4 _pthread_start + 160
7   libsystem_pthread.dylib       	       0x1818deda0 thread_start + 8
```

Previous versions have run for weeks without crashing.

Full report: space-acres-crash-report.txt

teor2345 added the bug, macOS, and frontend labels on Dec 18, 2024
@teor2345
Member Author

There doesn't seem to be anything significant in the logs, just some block import errors: space-acres 2.log.zip

@nazar-pc
Member

Thanks a lot for backtraces and logs, very helpful!

There was a tray-icon update in the latest release, but I don't think it caused this; the change is pretty small: tauri-apps/tray-icon@tray-icon-v0.19.1...tray-icon-v0.19.2

A bunch of GTK-related libraries were also updated, but it's hard to know how to debug that, especially without a reliable reproduction.

Can you check what the thread called pool-space-acre might be related to?

I'm not immediately sure what it might be. It could also be something flaky: with just 10-15 users, it is entirely possible that this was always happening (including in the CLI), and no one had hit or reported it before. It is also possible that there was hardware instability under load (I see it was verifying PoT at the time), though I don't know how likely that is on Macs.

This was supposed to be handled as an unexpected exit, which restarts the app:

space-acres/src/main.rs, lines 460 to 463 at 983b7d3:

```rust
if last_start.elapsed() >= MIN_RUNTIME_DURATION_FOR_AUTORESTART {
    self.after_crash = true;
    continue;
}
```

However, on macOS the process exits with an exit code instead of a signal, so I opened #362 to handle that case as well. Now Space Acres will restart after one-off errors like this, with a warning that the application crashed and was restarted.
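On Unix, a child process either exits with a code or is terminated by a signal, and Rust's `std::process::ExitStatus` exposes these as `code()` and (through `std::os::unix::process::ExitStatusExt`) `signal()`. The sketch below shows one way a supervisor could treat both as crashes and combine that with the minimum-runtime check from the `main.rs` snippet. It is a minimal illustration, not the actual change in #362: `ExitKind`, `classify`, and `should_autorestart` are hypothetical names, and treating every non-zero exit code as a crash is an assumption.

```rust
use std::os::unix::process::ExitStatusExt;
use std::process::ExitStatus;
use std::time::Duration;

/// Hypothetical classification of how the child process ended.
#[derive(Debug, PartialEq, Eq)]
pub enum ExitKind {
    Success,
    /// Exited normally but with a non-zero code (how the macOS crash surfaced here).
    Code(i32),
    /// Killed by a signal, e.g. 11 for SIGSEGV.
    Signal(i32),
    Unknown,
}

pub fn classify(status: ExitStatus) -> ExitKind {
    if status.success() {
        ExitKind::Success
    } else if let Some(code) = status.code() {
        ExitKind::Code(code)
    } else if let Some(signal) = status.signal() {
        ExitKind::Signal(signal)
    } else {
        ExitKind::Unknown
    }
}

/// Assumption: any unsuccessful exit counts as a crash, but we only
/// auto-restart if the process ran long enough (mirrors the main.rs check).
pub fn should_autorestart(status: ExitStatus, ran_for: Duration, min_runtime: Duration) -> bool {
    let crashed = classify(status) != ExitKind::Success;
    crashed && ran_for >= min_runtime
}
```

Handling both branches means the restart logic no longer depends on which of the two ways the OS reports the crash.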

@teor2345
Member Author

It restarted with that message for me. (Although I guess it’s possible there were multiple restarts.)

@nazar-pc
Member

Hm, I'm a little confused then, because it seemed like the log ended there. I guess you cleaned it up a little?

I think handling of unexpected exit codes as restart reason wouldn't hurt either way, so I'll keep it.

@teor2345
Member Author

I just compressed the latest logs and sent them to you. I think it might have crashed and restarted earlier in the logs.

@nazar-pc
Member

I see, that makes sense now. I didn't see it because the beginning of the log was truncated.

I'm still curious about that pool-space-acre thread.

@teor2345
Member Author

The stack trace shows a double free of an Objective-C object; see
https://developer.apple.com/documentation/xcode/investigating-crashes-for-zombie-objects

It is surprisingly difficult to find information online about macOS threads named pool-<binary-name>. It’s possible the thread is created by one of the GUI libraries we’re using, or by the system GUI framework.
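In the backtrace, the crash happens while the thread is exiting: `_pthread_exit` runs `_pthread_tsd_cleanup`, which calls the thread-local destructor that pops the thread's autorelease pool, and `objc_release` then touches an object that had most likely already been freed. The bad release was therefore queued earlier by whatever code ran on that thread, and only executes at thread exit. The Rust sketch below is an analogy only, with no Objective-C involved: a hypothetical `Pool` stored in `thread_local!` collects deferred work, and its `Drop` runs when the worker thread exits rather than where the work was queued. Rust's thread-local destructors are not necessarily implemented through the same pthread TSD path on macOS.

```rust
use std::cell::RefCell;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Counts how many deferred objects were "released" by pool drains.
static DRAINED: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for a per-thread autorelease pool: objects are
/// queued here and only released when the pool itself is dropped.
struct Pool {
    pending: Vec<String>,
}

impl Drop for Pool {
    fn drop(&mut self) {
        // Runs during thread teardown, analogous to the
        // AutoreleasePoolPage::HotPageDealloc frame in the crash.
        DRAINED.fetch_add(self.pending.len(), Ordering::SeqCst);
        self.pending.clear();
    }
}

thread_local! {
    static POOL: RefCell<Pool> = RefCell::new(Pool { pending: Vec::new() });
}

/// Queue an object for deferred release on the current thread.
fn autorelease(name: &str) {
    POOL.with(|p| p.borrow_mut().pending.push(name.to_string()));
}

/// Runs a worker that queues `objects` items and never drains explicitly;
/// returns how many were released when the thread exited.
pub fn run_worker(objects: usize) -> usize {
    let before = DRAINED.load(Ordering::SeqCst);
    thread::spawn(move || {
        for i in 0..objects {
            autorelease(&format!("obj-{i}"));
        }
        // No explicit drain: the pool is released at thread exit.
    })
    .join()
    .unwrap();
    DRAINED.load(Ordering::SeqCst) - before
}
```

If one of those queued objects had also been released somewhere else, the failure would only show up at this late, seemingly unrelated point, which is why the crashing frames say little about the code that caused it.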

@nazar-pc
Member

I meant you could look at the backtraces during normal operation and try to guess what it might be from the stack frames. If we know where it comes from, we'll be able to report it upstream.

@teor2345
Member Author

I can't get a backtrace with a pool-space-acre thread in it, which means it is likely short-lived, or only spawned when something unusual happens.

But this thread might be related, which would mean the bug is in glib or in code called by glib:

```
Thread 1:: pool-spawner
0   libsystem_kernel.dylib        	       0x1818a76f0 __psynch_cvwait + 8
1   libsystem_pthread.dylib       	       0x1818e4574 _pthread_cond_wait + 1232
2   libglib-2.0.0.dylib           	       0x108817d84 g_cond_wait + 44
3   libglib-2.0.0.dylib           	       0x1087c1214 g_async_queue_pop_intern_unlocked + 116
4   libglib-2.0.0.dylib           	       0x108819004 g_thread_pool_spawn_thread + 124
5   libglib-2.0.0.dylib           	       0x1088181ac g_thread_proxy + 68
6   libsystem_pthread.dylib       	       0x1818e3fa8 _pthread_start + 148
7   libsystem_pthread.dylib       	       0x1818deda0 thread_start + 8
```

https://github.com/GNOME/glib/blob/bfeca8c13aab28f77a62e5502459375e2c443c07/glib/gthreadpool.c#L632

(I also can't get a trace with pool-spawner in it.)
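The `pool-spawner` frames suggest how such worker threads come into being: a dedicated spawner thread blocks in `g_async_queue_pop_intern_unlocked` waiting for requests and creates a new pool worker for each one. If the workers are created on demand and exit once idle, that would also fit with a `pool-space-acre` thread being hard to catch in a backtrace. The Rust sketch below mirrors that structure with `std::sync::mpsc` and named threads; it is a simplified model, not glib's implementation, and `SpawnRequest` and `start_spawner` are hypothetical names.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical spawn request: the worker's thread name and the job it runs.
struct SpawnRequest {
    name: String,
    job: Box<dyn FnOnce() + Send>,
}

/// Starts a dedicated "pool-spawner" thread that blocks on a queue (like
/// g_async_queue_pop in the backtrace) and creates one named worker per request.
/// The spawner returns the worker handles once every sender has been dropped.
fn start_spawner() -> (mpsc::Sender<SpawnRequest>, thread::JoinHandle<Vec<thread::JoinHandle<()>>>) {
    let (tx, rx) = mpsc::channel::<SpawnRequest>();
    let spawner = thread::Builder::new()
        .name("pool-spawner".into())
        .spawn(move || {
            let mut workers = Vec::new();
            // recv() blocks until a request arrives, or errors once all senders are gone.
            while let Ok(req) = rx.recv() {
                let handle = thread::Builder::new()
                    .name(req.name)
                    .spawn(req.job)
                    .expect("failed to spawn worker thread");
                workers.push(handle);
            }
            workers
        })
        .expect("failed to spawn spawner thread");
    (tx, spawner)
}
```

A caller sends a `SpawnRequest` named, say, `pool-space-acre`, drops the sender when done, and joins the spawner to collect the worker handles. Each worker lives only as long as its job, so a snapshot taken at an arbitrary moment can easily miss it.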

I'm not sure it is related, because in the latest version of glib, thread pools spawn numbered threads whose names don't include the binary name:
https://github.com/GNOME/glib/blob/bfeca8c13aab28f77a62e5502459375e2c443c07/glib/gthreadpool.c#L290

But the name is exactly 16 bytes as a C string (15 characters plus the terminating NUL), just like in this code.
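The arithmetic behind the 16-byte observation: `pool-` is 5 bytes and `space-acres` is 11, so `pool-space-acres` needs 16 bytes plus a terminating NUL. A 16-byte C buffer holds only 15 characters, which cuts off the final `s` and yields exactly `pool-space-acre`. Since macOS accepts longer thread names (its limit is 64 bytes including the NUL), the truncation most likely comes from the code that built the name rather than from the OS. The sketch below reproduces it in Rust, assuming the name was built as `pool-` plus the program name and formatted into a 16-byte buffer the way `snprintf` would; `pool_thread_name` is a hypothetical helper, not a glib API.

```rust
/// Size of the C buffer the name is assumed to be formatted into,
/// including the terminating NUL (like `char name[16]`).
const NAME_BUF_LEN: usize = 16;

/// Hypothetical helper mimicking `snprintf(buf, 16, "pool-%s", prgname)`:
/// at most 15 bytes of text survive, and the C string is NUL-terminated.
pub fn pool_thread_name(prgname: &str) -> String {
    let full = format!("pool-{prgname}");
    let mut end = full.len().min(NAME_BUF_LEN - 1);
    // snprintf truncates raw bytes; back off to a char boundary so the
    // Rust String stays valid UTF-8 (a no-op for ASCII names).
    while !full.is_char_boundary(end) {
        end -= 1;
    }
    full[..end].to_string()
}
```

For example, `pool_thread_name("space-acres")` returns `"pool-space-acre"`, matching the crashed thread's name, while a short program name such as `"app"` fits untruncated as `"pool-app"`.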

@nazar-pc
Member

That does look exactly like what we have. Since it is in C, the root cause could be almost anywhere.

I found https://gitlab.gnome.org/GNOME/gtk/-/issues/5717 and https://gitlab.gnome.org/GNOME/gtk/-/issues/7132, which mention the spawner thread with a similar backtrace. Did the application crash while you were in front of the computer? I think it is worth reporting this upstream, but ideally we'd have more details to share.

@teor2345
Member Author

I don’t think I was in front of the computer; it was quite early that morning. So the app was in the background, and the user account was logged in but not on screen (I had used fast user switching to go to another account).

@teor2345
Member Author

This kind of fast user switching situation is unusual, and it isn’t often covered in testing. Most users just have the one account.

It has caused other apps to hang before on my machine. Those apps use SDL though, so it’s unlikely to be the same bug.
