
Increase idle GC interval to work around CPU leak issue #2581

Merged: 1 commit into hasura:master, Jul 24, 2019

Conversation

@jberryman (Collaborator)

This seems to resolve the issue locally (and has worked in the past),
but it's not clear what exactly is going on here (in particular, why
this should resolve what looks like a memory leak). It certainly seems
like a GHC issue of some sort.

Closes #2565

@jberryman requested a review from @0x777 as a code owner, July 22, 2019 09:39
@netlify (bot) commented Jul 22, 2019

Deploy preview for hasura-docs ready!

Built with commit 7c9b300

https://deploy-preview-2581--hasura-docs.netlify.com

@hasura-bot (Contributor)

Review app for commit 3d0c16f deployed to Heroku: https://hge-ci-pull-2581.herokuapp.com
Docker image for server: hasura/graphql-engine:pull2581-3d0c16fe

@0x777 (Member) commented Jul 22, 2019

Is there any reason why we can't disable idle GC completely?

@jberryman (Collaborator, Author) commented Jul 22, 2019

@0x777 The reason not to (which I forgot when I first responded to the ticket) is that idle GC is necessary to ensure finalizers run promptly and deadlocked threads are cleaned up promptly. With idle GC disabled we could see some weird behavior or resource leakage, and it's hard to anticipate exactly what might go wrong, e.g. which libraries might rely on finalizers running somewhat promptly in all cases. It could be fine, but this is more cautious and seems to fix things.
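
For a concrete sense of what the idle GC buys us, here is a minimal sketch (not code from this PR; the buffer and MVar are purely illustrative): both the finalizer and GHC's deadlock detection below are driven by the garbage collector, so with -I0 and an otherwise quiet process neither is guaranteed to fire promptly.

```haskell
-- Minimal sketch, not code from this PR: cleanup that only happens when a GC runs.
import Control.Concurrent (forkIO, newEmptyMVar, takeMVar, threadDelay)
import Control.Concurrent.MVar (MVar)
import Foreign.Concurrent (newForeignPtr)
import Foreign.Marshal.Alloc (free, mallocBytes)
import Foreign.Ptr (Ptr)

main :: IO ()
main = do
  -- A resource whose release is driven entirely by GC: the finalizer runs
  -- only after a collection notices the ForeignPtr is unreachable.
  p <- mallocBytes 4096 :: IO (Ptr ())
  _ <- newForeignPtr p (putStrLn "finalizer: releasing buffer" >> free p)

  -- A thread deadlocked on an MVar nobody else will fill: it is only woken
  -- with BlockedIndefinitelyOnMVar once a GC proves the MVar unreachable.
  mv <- newEmptyMVar :: IO (MVar ())
  _ <- forkIO (takeMVar mv)

  -- Go idle. With idle GC disabled (-I0) and no allocation going on, nothing
  -- forces a collection, so both cleanups can be delayed for a long time.
  threadDelay (30 * 1000000)
```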

@hasura-bot (Contributor)

Review app for commit 7f8fd0c deployed to Heroku: https://hge-ci-pull-2581.herokuapp.com
Docker image for server: hasura/graphql-engine:pull2581-7f8fd0c4

@jberryman (Collaborator, Author)

I scaled up the timing intervals for the polling threads by the same degree as the proposed fix here (0.3 sec -> 2 sec), and the fix still seems to work. So hopefully the -I parameter isn't dependent on polling frequencies (it was never really clear how, or whether, the two are related).
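
For anyone who wants to watch this kind of interaction themselves, here is a rough sketch (illustrative only, not the graphql-engine poller) that samples the RTS's major-GC counter while a stand-in polling thread wakes on a fixed interval; it needs the binary to be built with -rtsopts and run with +RTS -T so GHC.Stats has data to report.

```haskell
-- Rough sketch, not hasura code: sample the RTS's major-GC counter while a
-- stand-in polling thread wakes on a fixed interval. Compile with -rtsopts
-- and run with +RTS -T -RTS so GHC.Stats is populated.
import Control.Concurrent (forkIO, threadDelay)
import Control.Monad (forever)
import GHC.Stats (RTSStats (major_gcs), getRTSStats, getRTSStatsEnabled)

main :: IO ()
main = do
  enabled <- getRTSStatsEnabled
  if not enabled
    then putStrLn "re-run with +RTS -T -RTS to enable RTS statistics"
    else do
      -- Stand-in for a polling worker; 2000000 us = the 2 sec interval tried above.
      _ <- forkIO . forever $ threadDelay 2000000 >> putStrLn "poll"
      -- Print the cumulative number of major GCs every 10 sec; if an idle GC
      -- fires after every poll, this counter climbs roughly once per wake-up.
      forever $ do
        threadDelay 10000000
        stats <- getRTSStats
        putStrLn ("major GCs so far: " ++ show (major_gcs stats))
```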

@0x777 previously approved these changes on Jul 23, 2019
@jberryman (Collaborator, Author)

@lexi-lambda if you could review when you have the chance ^ :)

@lexi-lambda previously approved these changes on Jul 23, 2019
@lexi-lambda (Contributor)

Just asking: is there an open GHC issue about this?

@jberryman (Collaborator, Author)

@lexi-lambda Nothing I've found where anyone suggests GHC is probably doing something broken. Thanks for asking; here's a little survey:

The two old tickets I was familiar with and linked elsewhere:
https://gitlab.haskell.org/ghc/ghc/issues/3408
https://gitlab.haskell.org/ghc/ghc/issues/4322

A confusing writeup that might contain good information, but which I didn't try to parse:
https://gitlab.haskell.org/ghc/ghc/issues/15402

Two newish tickets that might be exhibiting the issue:
https://gitlab.haskell.org/ghc/ghc/issues/11134
https://gitlab.haskell.org/ghc/ghc/issues/13164 (I will see whether the simple test case in this one exhibits the issue, i.e. growing CPU/memory usage, and may comment there)

I wasn't inclined to open a new issue without a small test case, and wasn't really confident that creating one would be easy.

@lexi-lambda (Contributor)

Makes sense. This comment from Simon Marlow is interesting:

if your application has a timer signal that executes some Haskell code every 1s, say, then you'll get an idle GC after each one. Setting the idle GC to a longer period than your timer signal (e.g. 2s) will prevent this happening.

That seems like it could apply to us, but on its own it doesn't get us much closer to understanding why this would trigger anything pathological. In any case, the workaround seems harmless.
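
As a sanity check of that explanation, a tiny sketch along these lines (again illustrative, not our code) reproduces the pattern Marlow describes: a thread that runs a little Haskell every second and is otherwise idle. Per the quote, an idle interval longer than the wake-up period (e.g. +RTS -I2 -RTS) should stop the per-wake idle collections.

```haskell
-- Illustrative sketch of Marlow's scenario: wake every second, do a tiny bit
-- of work, then go idle again. Compile with -rtsopts; run with +RTS -S -RTS
-- to watch collections, and compare the default idle GC interval (0.3 sec)
-- against +RTS -I2 -RTS, where the 2 sec idle period never elapses between wakes.
import Control.Concurrent (threadDelay)
import Control.Monad (forever)

main :: IO ()
main = forever $ do
  threadDelay 1000000      -- sleep 1 sec; the RTS is idle while we are blocked
  putStrLn "tick"          -- "executes some Haskell code", then back to sleep
```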

@jberryman (Collaborator, Author)

@lexi-lambda I think you may have to press the button yourself (I'm not authorized to merge; not sure if that's intentional) :-)

@lexi-lambda (Contributor)

I think @0x777 is actually still the only person who can push the button, even though I’ve been marked as the code owner. I think he can also fix that, though.

@lexi-lambda dismissed stale reviews from @0x777 and themself via 7c9b300, July 24, 2019 05:09
@lexi-lambda merged commit 02ae91c into hasura:master, Jul 24, 2019
@hasura-bot (Contributor)

Review app https://hge-ci-pull-2581.herokuapp.com is deleted

@hasura-bot (Contributor)

Review app for commit 7c9b300 deployed to Heroku: https://hge-ci-pull-2581.herokuapp.com
Docker image for server: hasura/graphql-engine:pull2581-7c9b3009

@jberryman deleted the 2565 branch, July 24, 2019 05:36
@elitan (Contributor) commented Aug 1, 2019

Thanks for the PR. Does this mean we won't have to use - GHCRTS=-I0 anymore?

@jberryman (Collaborator, Author)

@elitan that's the idea, but if you can give it a try and confirm that it solves the issue for your deployment, that would be really helpful!
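
If it helps when testing, here is a small sketch (not part of this PR) that prints the idle GC settings the running binary actually ended up with, via GHC.RTS.Flags; that makes it easy to confirm whether a leftover GHCRTS=-I0 override is still taking effect.

```haskell
-- Small sketch, not shipped code: report the idle GC configuration the RTS
-- is actually running with, so a leftover GHCRTS override is easy to spot.
import GHC.RTS.Flags (GCFlags (doIdleGC, idleGCDelayTime), getGCFlags)

main :: IO ()
main = do
  gc <- getGCFlags
  -- doIdleGC is False when the idle GC is disabled (e.g. via -I0);
  -- idleGCDelayTime is the -I interval, reported in RTS time units (ns).
  putStrLn ("idle GC enabled: " ++ show (doIdleGC gc))
  putStrLn ("idle GC delay (ns): " ++ show (idleGCDelayTime gc))
```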

@elitan (Contributor) commented Aug 1, 2019

@jberryman Will do!

polRk pushed a commit to polRk/graphql-engine that referenced this pull request Feb 12, 2020
@davidpanic mentioned this pull request, Jun 7, 2023
Successfully merging this pull request may close these issues: Hasura using more CPU over time (#2565)