RADICAL executor MPI test leaves threads running at shutdown #3708
Comments
Hi @benclifford - that's really good timing you are showing here, as we are working on those code paths right now anyway :-D We now track the work related to this ticket in radical-cybertools/radical.pilot#3272. One thing: we are a bit optimistic about thread termination, in the sense that we set termination flags but don't join the threads. So there is usually a delay of up to a second until the threads are actually gone (the slowest one needs 0.5 seconds max to pick up that signal). Is that something you would worry about, or is it ok for you to add a delay to the test to capture that behavior?
@andre-merzky it's preferable for my work that everything is gone by the time the executor says it is shut down, rather than adding delays to the test.
Yeah, I thought so. All right, this might need a bit more time then, but we'll get it done.
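For readers following along, the difference between "set a termination flag" and "set the flag and join" can be sketched as below. This is a minimal illustration only, not RADICAL-Pilot code: the `Worker` class, its `poll_interval` parameter, and the `shutdown` signature are all hypothetical names chosen for the example.

```python
import threading
import time


class Worker:
    """Hypothetical background worker (illustrative, not RADICAL-Pilot code)."""

    def __init__(self, poll_interval=0.5):
        self._stop = threading.Event()
        self._poll_interval = poll_interval
        self._thread = threading.Thread(target=self._run, name="worker")

    def start(self):
        self._thread.start()

    def _run(self):
        # The thread only notices the flag once per poll interval, so it can
        # outlive the call that set the flag by up to poll_interval seconds.
        while not self._stop.is_set():
            time.sleep(self._poll_interval)

    def shutdown(self, join=True, timeout=5.0):
        # Setting the flag alone is the "optimistic" variant: shutdown()
        # returns immediately while the thread may still be running.
        self._stop.set()
        if join:
            # Joining blocks until the thread has actually exited (or the
            # timeout expires), so nothing is left behind on return.
            self._thread.join(timeout)

    def is_alive(self):
        return self._thread.is_alive()
```

With `join=False`, `is_alive()` may still return `True` right after `shutdown()` returns; with `join=True`, the thread is gone by the time the call comes back, which is the behaviour requested above.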
In addition to the work happening [here](radical-cybertools/radical.pilot#3269) to address issue #3708, this PR ensures the cleanup of threads generated by RPEX.

- Bug fix (partially #3708)

Co-authored-by: Ben Clifford <benc@hawaga.org.uk>
#3718 removes one of the 20 leftover threads - the bulk collector one - leaving 18 leftovers (+ 1 expected main thread)
@benclifford: the remaining threads are now cleanly terminated by radical-cybertools/radical.pilot#3269 - that goes into the next RP release, before the break.
This brings in fixes for a few issues that are fixed on the radical side of things:

- #3722 - a race condition on task completion
- #3708 - cleaner shutdown handling as part of #3397
- #3646 - Python 3.13 support

# Changed Behaviour

whatever has changed in radical-pilot

# Fixes #3722

## Type of change

- Bug fix
Describe the bug
In PR #3397 I've been pushing slowly but aggressively on making sure all parts of Parsl close down OS-ish resources they have allocated, rather than relying on some later OS cleanup. Right now that means primarily threads and file descriptors (but ideally also processes). Work from that has slowly trickled into master over time.
Right now I'm getting these threads left behind at the end of the RADICAL-Pilot MPI test. The assert failure comes from the aggressive thread checking I have added; the threads are, I think, from inside the radical.pilot codebase.
Is there a way to shut these down at executor shutdown?
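As a rough idea of what a leftover-thread check can look like, here is a minimal sketch. It is not the actual check from PR #3397; the helper names `snapshot_threads`, `leaked_threads`, and `assert_no_leaked_threads` are hypothetical, and comparing against a snapshot taken before the test is an assumption made for this example.

```python
import threading


def snapshot_threads():
    """Record the threads that are alive before the code under test runs."""
    return set(threading.enumerate())


def leaked_threads(before):
    """Return threads that are alive now but were not in the snapshot."""
    return [t for t in threading.enumerate() if t not in before]


def assert_no_leaked_threads(before):
    """Fail with the names of any threads left running since the snapshot."""
    leftovers = leaked_threads(before)
    names = [t.name for t in leftovers]
    assert not leftovers, f"threads still running at shutdown: {names}"
```

A test would call `snapshot_threads()` first, run and shut down the executor, and then call `assert_no_leaked_threads(before)`; any thread that was flagged for termination but not yet finished would trip the assert.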
cc @AymenFJA @andre-merzky
To Reproduce
Check out PR #3397 and run the above command.
Expected behavior
Assert passes, because threads are cleaned up.
Environment
commit 254d8ba of PR #3397