PyPy is segfaulting in CI - how can I help? #632

Open
mattip opened this issue Jul 4, 2022 · 10 comments

@mattip

mattip commented Jul 4, 2022

Hi. PyPy dev here, new to the project but curious about the segfault in CI. What would be the best way to get to the root cause? Pair programming? Reading some documentation and setting up a dev environment? What would be the easiest way to get a minimal Cython reproducer without Java?

@misl6
Member

misl6 commented Jul 4, 2022

Hi @mattip, nice to see you here!

I guess you're referring to conda-forge/pyjnius-feedstock#35

As you noticed, some fixes were introduced in #627, and everything seemed fine: all the tests passed on the PR, and the same happened in the following CI runs.

Unfortunately, as shown in https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=530828&view=logs&j=bb1c2637-64c6-57bd-9ea6-93823b2df951&t=350df31b-3291-5209-0bb7-031395f0baa1 , it seems we now have a new segfault, which appears to happen in a different test phase than the previous one.

To speed up the process, I'm available to chat on the #dev channel of our Discord chat, and I'm sure other core devs (and contributors) are also happy to help.

If I'm right, there's a chance I was able to reproduce a segfault in the same test with a specific setup: macOS 12 on Apple Silicon, in a Rosetta terminal (it then disappeared) 🧐.

@mattip
Author

mattip commented Jul 4, 2022

I was referring to the segfault in this repo's CI, where both PyPy CI runs segfault. The message does not help me understand what is going on:

home/runner/work/_temp/20709d7f-80bf-4abe-8495-2fd0235fb524.sh: line 2:  \
    1840 Aborted          \
    (core dumped) CLASSPATH=../build/test-classes:../build/classes python -m pytest -v

@cmacdonald
Contributor

NB: As jnius loads the JVM, it's usually the case that Java's signal/fault handler takes priority and generates an hs_err_pid log file, which can contain a more meaningful stack trace, even for native/Cython code.
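Following up on the hs_err_pid hint, here is a small sketch for pulling the crashing native frame out of any such logs after a run. It assumes HotSpot's default behaviour of writing hs_err_pid<pid>.log into the process's working directory (the location can be changed with -XX:ErrorFile) and the usual log layout, where the frame is printed on the line after "# Problematic frame:". find_problematic_frames is a hypothetical helper, not part of pyjnius.

```python
import glob
import os


def find_problematic_frames(directory="."):
    """Return {log_path: frame} for every hs_err_pid*.log found in directory.

    Assumes the usual HotSpot layout, where the crashing native frame is
    printed on the line right after '# Problematic frame:'.
    """
    frames = {}
    pattern = os.path.join(directory, "hs_err_pid*.log")
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8", errors="replace") as fh:
            lines = fh.read().splitlines()
        for i, line in enumerate(lines):
            if line.strip() == "# Problematic frame:" and i + 1 < len(lines):
                # Drop the leading '#' comment marker and padding.
                frames[path] = lines[i + 1].lstrip("# ").strip()
                break
    return frames


if __name__ == "__main__":
    for path, frame in find_problematic_frames().items():
        print(f"{path}: {frame}")
```

In CI, uploading any hs_err_pid*.log files as build artifacts would make the same information available after the job finishes.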

@misl6
Member

misl6 commented Jul 4, 2022

Meanwhile, I tried to reproduce the segfault with the configuration mentioned above.

  • pytest-rerunfailures seems to be hiding the issue (at least partially)

Local test configuration:

  • macOS 12 (on Apple Silicon PyPy runs under Rosetta, but I guess it's the same on an Intel Mac)
  • PyPy v7.3.9-osx64
  • JDK 17.0.3 (x86_64) "Eclipse Adoptium"

Manually running the failing test (tests/test_lambdas.py) reports:

Fatal RPython error: a thread is trying to wait for the GIL, but the GIL was not initialized
(For PyPy, see https://foss.heptapod.net/pypy/pypy/-/issues/2274)
zsh: abort      ..../pypy3.9-v7.3.9-osx64/bin/pypy tests/test_lambdas.py
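To check whether pytest-rerunfailures is masking the crash, one option is to run the failing file in a child process with the plugin disabled and decode how the child ended. A minimal sketch, assuming POSIX semantics (a negative returncode from subprocess means the child was killed by that signal), that CLASSPATH is already exported as in the CI command, and that the plugin is registered with pytest under the name rerunfailures. run_and_report and PYTEST_CMD are hypothetical names.

```python
import signal
import subprocess
import sys


def run_and_report(cmd):
    """Run cmd and describe how it ended, naming the signal if it crashed."""
    proc = subprocess.run(cmd)
    if proc.returncode < 0:
        # POSIX: a negative return code means the child died from a signal.
        return f"killed by {signal.Signals(-proc.returncode).name}"
    return f"exited with status {proc.returncode}"


# Hypothetical invocation, run from the repository root with CLASSPATH set.
# '-p no:rerunfailures' disables the plugin so a crash is not retried away.
PYTEST_CMD = [sys.executable, "-m", "pytest", "-v",
              "-p", "no:rerunfailures", "tests/test_lambdas.py"]
```

Both the CI log ("Aborted (core dumped)") and the local run ("zsh: abort") suggest SIGABRT rather than SIGSEGV, so run_and_report(PYTEST_CMD) would be expected to report an explicit abort after the fatal RPython error rather than a raw memory fault.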

@cmacdonald
Contributor

tests/test_lambdas.py came from me. Can you narrow down to a particular test method?

@misl6
Member

misl6 commented Jul 4, 2022

> tests/test_lambdas.py came from me. Can you narrow down to a particular test method?

It looks like it's failing here:

future = executor.submit(callFn)

@cmacdonald
Contributor

cmacdonald commented Jul 4, 2022

So the Java thread pool is calling back into a Python class that implements Callable, which in turn calls the Python lambda.
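The call chain can be mimicked in pure Python, which may be a starting point for a reproducer without Java. In this sketch CallableProxy is a hypothetical stand-in for the Callable that pyjnius builds around the lambda, and ThreadPoolExecutor stands in for newFixedThreadPool(1). One important difference: here the worker thread is created by Python, whereas in the real test it is a JVM-owned thread that Python has never seen, which is presumably what the "GIL was not initialized" error is about. So this only shows the shape of the calls; it will not reproduce the crash.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class CallableProxy:
    """Hypothetical stand-in for the Java Callable wrapping a Python lambda."""

    def __init__(self, fn):
        self.fn = fn

    def call(self):
        # In the real test this runs on a thread owned by the JVM's pool.
        return self.fn(), threading.current_thread().name


def submit_lambda(fn):
    """Run pool thread -> proxy.call() -> fn(), like executor.submit(callFn)."""
    with ThreadPoolExecutor(max_workers=1, thread_name_prefix="pool") as executor:
        proxy = CallableProxy(fn)
        future = executor.submit(proxy.call)
        return future.result(timeout=5)
```

For example, submit_lambda(lambda: "done") returns "done" together with the name of the pool thread that ran it.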

One big(!) hack in there ensures that the Callable object, rather than the lambda itself (IIRC), is not GC'd:

activeLambdaJavaProxies.add(py_arg)

If it has been GC'd by Python, segfaults can occur. Could it have been GC'd by PyPy?
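The keep-alive hack amounts to a pinning registry: as long as a strong reference sits in a module-level container, neither CPython's reference counting nor PyPy's GC can reclaim the proxy, even if the only other user is a raw pointer held on the Java side. A sketch with hypothetical names (_active_proxies, LambdaProxy, register, unregister) modelled on activeLambdaJavaProxies:

```python
# Hypothetical registry modelled on activeLambdaJavaProxies: a strong
# reference here keeps a proxy alive while foreign code still points at it.
_active_proxies = set()


class LambdaProxy:
    """Hypothetical proxy object that the foreign side would call into."""

    def __init__(self, fn):
        self.fn = fn

    def call(self):
        return self.fn()


def register(fn):
    """Wrap fn and pin the wrapper so the GC cannot reclaim it."""
    proxy = LambdaProxy(fn)
    _active_proxies.add(proxy)
    return proxy


def unregister(proxy):
    """Unpin the proxy once the foreign side no longer needs it."""
    _active_proxies.discard(proxy)
```

If nothing ever calls unregister, every proxy lives for the rest of the process, which is the price of the hack. Pinning the proxy rather than only the lambda matters because the proxy is what the Java side points at.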

@mattip
Author

mattip commented Jul 4, 2022

Is there any use of forking combined with threads? We have seen some hairy bugs with that; the state is shared in strange and wondrous ways.
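As an illustration of the kind of state sharing meant here (POSIX only, pure Python, not specific to pyjnius): fork() copies only the calling thread, so a lock held by any other thread at fork time stays locked forever in the child. lock_state_after_fork is a hypothetical demo function; newer CPython versions may also emit a DeprecationWarning about forking a multi-threaded process, which is itself a hint at the hazard.

```python
import os
import threading


def lock_state_after_fork():
    """Fork while another thread holds a lock and report what the child sees."""
    lock = threading.Lock()
    held = threading.Event()
    release = threading.Event()

    def holder():
        with lock:
            held.set()
            release.wait()  # keep holding the lock until the parent says so

    t = threading.Thread(target=holder)
    t.start()
    held.wait()  # the lock is now held by the other thread
    pid = os.fork()
    if pid == 0:
        # Child: only the forking thread was copied, so the holder thread
        # does not exist here and nobody will ever release the copied lock.
        acquired = lock.acquire(timeout=0.5)
        os._exit(0 if acquired else 1)
    release.set()
    t.join()
    _, status = os.waitpid(pid, 0)
    if os.waitstatus_to_exitcode(status) == 0:
        return "child acquired lock"
    return "child stuck on copied lock"
```

Releasing the lock in the parent after the fork does not help the child, since each process now has its own copy of the lock's state.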

@mattip
Author

mattip commented Jul 4, 2022

> If it has been GC'd by Python, segfaults can occur. Could it have been GC'd by PyPy?

Typically, the PyPy GC is less aggressive than CPython's: objects tend to stay around a little longer. I wonder whether setting up the thread pool before creating the function changes anything:

-     callFn = lambda: "done"
     executor = autoclass("java.util.concurrent.Executors").newFixedThreadPool(1)
+     callFn = lambda: "done"
     future = executor.submit(callFn)

@kuzeyron

Is this still a problem?

4 participants