Memory usage abnormally growing over time #2794

Closed
TAnas0 opened this issue Aug 28, 2019 · 11 comments

Labels
c/server Related to server

Comments

TAnas0 commented Aug 28, 2019

Hello,

I am using Hasura as a backend, deployed on a droplet in DigitalOcean, initially using the DO marketplace. The database is managed and everything is working fine.

The problem I am facing is that memory usage behaves oddly: it slowly accumulates and then seems to get "flushed". I was able to verify this by watching the memory consumption of the graphql-engine process. Near the peaks, the API also becomes unresponsive or slow for quite some time.

[Screenshot: DigitalOcean memory usage graph for the frontscan-api droplet, 2019-08-23]
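
For anyone wanting to capture the same data without screenshots, here is a minimal shell sketch that samples the resident memory of the graphql-engine process once a minute; the log path and the one-minute interval are arbitrary choices for illustration, not something used in this thread:

```sh
#!/bin/sh
# Append a timestamp plus the RSS (resident memory, KiB) of graphql-engine
# to a log file every 60 seconds. The log path is a hypothetical choice.
LOG=/var/log/graphql-engine-rss.log
while true; do
  rss=$(ps -C graphql-engine -o rss= | head -n 1)
  echo "$(date -u +%FT%TZ) rss_kib=${rss:-absent}" >> "$LOG"
  sleep 60
done
```

Plotting that log next to DigitalOcean's graph would show whether the sawtooth pattern tracks the graphql-engine process specifically.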

I looked into this similar issue #2565 (dealing with CPU consumption), and tried to disable garbage collection in my docker-compose file, but the behavior still reappeared.
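
Purely as a hedged sketch of what that attempt might look like outside docker-compose: GHC's idle-time garbage collector can be turned off with the RTS flag -I0 (the kind of tweak discussed around #2565), assuming the graphql-engine binary accepts +RTS ... -RTS options and the image allows overriding its command like this; none of that is confirmed in this thread.

```sh
# Hedged sketch: start graphql-engine with GHC's idle-time GC disabled.
# -I0 is a standard GHC RTS flag; the command override, image tag and
# env var wiring here are illustrative assumptions, not a confirmed setup.
docker run -d -p 8080:8080 \
  -e HASURA_GRAPHQL_DATABASE_URL="$HASURA_GRAPHQL_DATABASE_URL" \
  hasura/graphql-engine:v1.0.0 \
  graphql-engine serve +RTS -I0 -RTS
```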

I must also note that this happens "randomly", or rather I have no way of reliably reproducing it, since I don't know what is causing it. It may be related to Hasura, Docker or DigitalOcean.

I am looking for help to pinpoint its cause and maybe solve it. Any help would be appreciated.

Also, could this be related to the console being open? I have heard that most memory leak problems come from memory-hungry frontends running in browsers.

ecthiender (Member) commented Aug 29, 2019

@TAnas0 are you able to reproduce this by doing the exact same actions (like having the console opened etc.) but on your local machine?

TAnas0 (Author) commented Aug 29, 2019

@ecthiender Thanks for your interest.

I am working on a local setup of the architecture, and it has a lot of suspect components that might be triggering this.

But as I said, I have no reliable way of reproducing the behavior, so I will have to wait for it to (un?)hopefully reappear.

I'll get back to you after some testing.

TAnas0 (Author) commented Aug 30, 2019

I replicated a Hasura instance using Docker Compose and configured it to use the same managed PostgreSQL instance that is used in production. I left it on for several hours, made sure to have several consoles open, and used the API for reads/writes originating from the same components I am using in production. But I couldn't reproduce the behavior, and graphql-engine's memory consumption stayed reasonable.

It is also worth noting that the reads/writes were not as heavy as they are in production, since we are using scalable microservices for inserting into the database (Serverless and Google Cloud Run). So I believe my local setup is considerably different from production, and hence the results aren't really conclusive.

The only interaction with the Hasura API that I couldn't try on my local setup is a set of frontend GraphQL subscriptions.

  1. Can these subscriptions be the source of the problem? Is there a possibility that subscriptions are left open and keep consuming resources after the page is exited? Or does Hasura handle this? (I admit this is more related to my frontend code, but it involves a browser too.) One quick check is sketched right after this list.

  2. Also, I now remember the source of my suspicion of the console: Server runs out of memory when accessing console #1942. I wonder how relevant that issue is to my situation.

  3. Finally, what kind of data/logs would you be interested in if the behavior occurs again? Beyond listing processes and their RAM/CPU consumption, I am not that sysadmin savvy 😅
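
On point 1, a quick check (assuming graphql-engine listens on its default port 8080 and that this runs on the droplet itself) is to count established connections to that port; if the number keeps growing after clients navigate away, subscriptions are probably being left open:

```sh
# Count established TCP connections to graphql-engine's listen port.
# 8080 is the default port; adjust if your setup uses a different one.
ss -tn state established '( sport = :8080 )' | tail -n +2 | wc -l
```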

Cheers

ecthiender (Member) commented:

@TAnas0 I think the biggest thing would be steps to reproduce. Apart from that, RAM/CPU consumption over time is useful.

@0x777 any other ideas?

TAnas0 (Author) commented Sep 14, 2019

Update on the situation:

My droplet's RAM spiked again after dropping. This is a graph from the last 7 days:
[Screenshot: droplet memory usage over the last 7 days]

I am also exposing a Netdata dashboard from the droplet to get further insight: droplet's dashboard.
You will notice that CPU usage is low while RAM usage is above 95%. The most interesting part of the dashboard is the RAM section.

Executing top from inside the droplet clearly indicates that this issue is related to the graphql-engine process:

[Screenshot: top output showing the graphql-engine process's memory usage]

I can see from the latest release notes that graceful shutdown of websockets is not yet supported, but it is on the roadmap and is being discussed in the following issue and pull request: #2698 & #2717.
If someone could take a look at that and confirm whether or not it is related, that would be a tremendous help.

Cheers

fmilkovic37 commented:

Hi, maybe this can help. I figured the query plan cache was to blame, but limiting it didn't help.

Some info:
- version: v1.0.0
- HASURA_GRAPHQL_QUERY_PLAN_CACHE_SIZE=1000
- HASURA_GRAPHQL_PG_STRIPES=2

[Screenshot: node exporter memory graph]

jberryman (Collaborator) commented:

(Linking to #3388 which I've been using as the canonical memory leak issue, for referencing in PRs, etc.)

jberryman (Collaborator) commented:

Maybe related to #3879

0x777 (Member) commented Mar 30, 2020

@TAnas0 @fmilkovic We've released v1.1.1, which should fix this. Can you please try it out and let us know?
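
For anyone else following along, the upgrade amounts to switching the image tag and recreating the container; a minimal sketch, assuming a docker-compose deployment with a service named graphql-engine (the service name is an assumption):

```sh
# Pull the fixed release...
docker pull hasura/graphql-engine:v1.1.1
# ...then point the image tag in docker-compose.yml at v1.1.1 and recreate:
docker-compose up -d graphql-engine   # the service name is an assumption
```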

fmilkovic37 commented:

Running the new version for about an hour; it looks like it's fixed. Thank you.

TAnas0 (Author) commented Mar 31, 2020

@0x777 We moved on from this setup quite some time ago, but I'll take @fmilkovic's word for it.

I must note that, for me, it took more than a few hours before the droplet's CPU started acting up.

Also, issue #3879 linked by @jberryman looks like the perfect suspect, because I remember using subscriptions heavily.

Will close this for now. Thanks all, and @fmilkovic feel free to reopen if you run into it again.

TAnas0 closed this as completed Mar 31, 2020