Hasura using more CPU over time #2565

Closed
JordanCheney opened this issue Jul 18, 2019 · 12 comments · Fixed by #2581

JordanCheney commented Jul 18, 2019

Hello,

I recently upgraded to Hasura 1.0.0-beta.3 and I've noticed that CPU usage in the Docker container generally increases over time, even when no requests are being made to the Hasura server.

Here are the logs produced by Hasura (mostly showing nothing is happening):

{"timestamp":"2019-07-17T21:51:27.827+0000","level":"info","type":"startup","detail":{"kind":"server_configuration","info":{"live_query_options":{"fallback_options":{"refetch_delay":1000},"multiplexed_options":{"batch_size":100,"refetch_delay":1000}},"transaction_isolation":"ISOLATION LEVEL READ COMMITTED","enabled_log_types":["http-log","websocket-log","startup","webhook-log","query-log"],"server_host":"HostAny","enable_allowlist":false,"log_level":"debug","auth_hook_mode":null,"use_prepared_statements":true,"unauth_role":null,"stringify_numeric_types":false,"enabled_apis":["metadata","graphql"],"enable_telemetry":false,"enable_console":true,"auth_hook":null,"jwt_secret":null,"cors_config":{"allowed_origins":"*","disabled":false,"ws_read_cookie":null},"console_assets_dir":null,"admin_secret_set":true,"port":8080}}}
{"timestamp":"2019-07-17T21:51:27.827+0000","level":"info","type":"startup","detail":{"kind":"postgres_connection","info":{"database":"postgres","retries":1,"user":"postgres","host":"postgres","port":5432}}}
{"internal":"could not connect to server: Connection refused\n\tIs the server running on host \"postgres\" (172.129.0.3) and accepting\n\tTCP/IP connections on port 5432?\n","path":"$","error":"connection error","code":"postgres-error"}
{"timestamp":"2019-07-17T21:51:29.951+0000","level":"info","type":"startup","detail":{"kind":"server_configuration","info":{"live_query_options":{"fallback_options":{"refetch_delay":1000},"multiplexed_options":{"batch_size":100,"refetch_delay":1000}},"transaction_isolation":"ISOLATION LEVEL READ COMMITTED","enabled_log_types":["http-log","websocket-log","startup","webhook-log","query-log"],"server_host":"HostAny","enable_allowlist":false,"log_level":"debug","auth_hook_mode":null,"use_prepared_statements":true,"unauth_role":null,"stringify_numeric_types":false,"enabled_apis":["metadata","graphql"],"enable_telemetry":false,"enable_console":true,"auth_hook":null,"jwt_secret":null,"cors_config":{"allowed_origins":"*","disabled":false,"ws_read_cookie":null},"console_assets_dir":null,"admin_secret_set":true,"port":8080}}}
{"timestamp":"2019-07-17T21:51:29.951+0000","level":"info","type":"startup","detail":{"kind":"postgres_connection","info":{"database":"postgres","retries":1,"user":"postgres","host":"postgres","port":5432}}}
{"internal":"could not connect to server: Connection refused\n\tIs the server running on host \"postgres\" (172.129.0.3) and accepting\n\tTCP/IP connections on port 5432?\n","path":"$","error":"connection error","code":"postgres-error"}
{"timestamp":"2019-07-17T21:51:31.891+0000","level":"info","type":"startup","detail":{"kind":"server_configuration","info":{"live_query_options":{"fallback_options":{"refetch_delay":1000},"multiplexed_options":{"batch_size":100,"refetch_delay":1000}},"transaction_isolation":"ISOLATION LEVEL READ COMMITTED","enabled_log_types":["http-log","websocket-log","startup","webhook-log","query-log"],"server_host":"HostAny","enable_allowlist":false,"log_level":"debug","auth_hook_mode":null,"use_prepared_statements":true,"unauth_role":null,"stringify_numeric_types":false,"enabled_apis":["metadata","graphql"],"enable_telemetry":false,"enable_console":true,"auth_hook":null,"jwt_secret":null,"cors_config":{"allowed_origins":"*","disabled":false,"ws_read_cookie":null},"console_assets_dir":null,"admin_secret_set":true,"port":8080}}}
{"timestamp":"2019-07-17T21:51:31.891+0000","level":"info","type":"startup","detail":{"kind":"postgres_connection","info":{"database":"postgres","retries":1,"user":"postgres","host":"postgres","port":5432}}}
{"timestamp":"2019-07-17T21:51:33.992+0000","level":"info","type":"startup","detail":{"kind":"db_init","info":"successfully initialised"}}
{"timestamp":"2019-07-17T21:51:33.992+0000","level":"info","type":"startup","detail":{"kind":"db_migrate","info":"already at the latest version. current version: \"17\""}}
{"timestamp":"2019-07-17T21:51:33.992+0000","level":"info","type":"startup","detail":{"kind":"schema-sync","info":{"thread_id":"ThreadId 86","instance_id":"5e7dd705-9331-474a-bda4-986d78801ffa","message":"listener thread started"}}}
{"timestamp":"2019-07-17T21:51:33.992+0000","level":"info","type":"startup","detail":{"kind":"schema-sync","info":{"thread_id":"ThreadId 87","instance_id":"5e7dd705-9331-474a-bda4-986d78801ffa","message":"processor thread started"}}}
{"timestamp":"2019-07-17T21:51:33.992+0000","level":"info","type":"startup","detail":{"kind":"event_triggers","info":"preparing data"}}
{"timestamp":"2019-07-17T21:51:33.992+0000","level":"info","type":"startup","detail":{"kind":"event_triggers","info":"starting workers"}}
{"timestamp":"2019-07-17T21:51:33.992+0000","level":"info","type":"startup","detail":{"kind":"server","info":{"time_taken":2.216395209,"message":"starting API server"}}}

Here is a screenshot of a Grafana dashboard showing the CPU use over 6 hours:

[screenshot: hasura_cpu_usage]

Here is a minimal docker-compose script that can recreate the test environment and the monitoring tools.

hasura_test.zip
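
For orientation, here is a minimal sketch of what such a compose file might look like; the image tags, credentials, and omitted monitoring services are assumptions, and the attached archive is the authoritative setup:

version: "3.6"
services:
  postgres:
    image: postgres:11                         # assumed tag
    environment:
      - POSTGRES_PASSWORD=postgrespassword     # hypothetical credential
  graphql-engine:
    image: hasura/graphql-engine:v1.0.0-beta.3
    ports:
      - "8080:8080"
    depends_on:
      - postgres
    environment:
      - HASURA_GRAPHQL_DATABASE_URL=postgres://postgres:postgrespassword@postgres:5432/postgres
      - HASURA_GRAPHQL_ENABLE_CONSOLE=true
      - HASURA_GRAPHQL_ADMIN_SECRET=myadminsecret   # hypothetical; the startup log above shows an admin secret is set
  # The monitoring stack (cAdvisor/Prometheus/Grafana or similar) behind the
  # Grafana graph above is omitted here for brevity.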


lorenzo commented Jul 19, 2019

Out of curiosity, is memory increasing as well?

@JordanCheney (Author)

I had to recreate the experiment, so the times are different, but on run 2 it appears memory is also increasing.

[screenshot: hasura_memory_use]


lorenzo commented Jul 20, 2019

I think the CPU usage increase is due to the garbage collector doing more work as time passes. This may be hinting at a possible memory leak or just at the need to move some live data out of the garbage collector.

What's the experiment about? Does this happen in other "normal" circumstances?


lorenzo commented Jul 20, 2019

@JordanCheney also, out of curiosity, would you mind adding this env var and seeing if it makes any difference?

GHCRTS=-I0

This flag disables idle garbage collection, which may reduce the CPU load when the server is not doing other work.

@JordanCheney (Author)

@lorenzo I added the new env var; I'll report back with results in a few hours when the run is finished. The experiment intentionally does nothing besides running the containers (no requests are made to Hasura). I don't have any normal runs of beta.3 to compare against, since this behavior was flagged before the new version made it to real testing.

@JordanCheney (Author)

@lorenzo It looks like disabling the idle garbage collector fixed the behavior:
[screenshot: hasura_ghcrts_io]

What impact will this change have on system performance?


jberryman commented Jul 22, 2019

@JordanCheney thanks for reporting. I've seen this crop up before, see e.g. PostgREST/postgrest#565

It should be perfectly safe to disable idle GC as @lorenzo helpfully suggested (thanks!). The idea behind idle GC is that if the program is doing nothing, we might as well perform a GC. With idle GC disabled, you might find that e.g. the next request to the server after an idle period has higher average latency (because a GC might happen to be triggered then). It really shouldn't matter much for a production service. I've also found, in another service I worked on, that -I2 made a similar memory increase go away.

We're going to look a bit closer at this and will update the ticket when the fix is merged.
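
For context, GHC's RTS option -I<seconds> sets the idle-GC interval (the default is 0.3 seconds), and -I0 disables idle GC entirely. Below is a sketch of how the two settings mentioned in this thread could be passed through the GHCRTS environment variable in a compose file; the service name is illustrative:

services:
  graphql-engine:
    environment:
      # Disable idle GC entirely (the workaround suggested above):
      - GHCRTS=-I0
      # Alternative: keep idle GC but only run it after 2 seconds of
      # idle time, as tried on another service:
      # - GHCRTS=-I2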

@bartjuh4

We are experiencing the same performance issue. Is there a workaround at the moment to disable the idle GC?
[screenshot: 2019-07-22 08:39:59]

jberryman added a commit to jberryman/graphql-engine that referenced this issue Jul 22, 2019
This seems to resolve the issue locally (and has worked in the past),
but it's not clear what exactly is going on here (in particular, why
this should resolve what looks like a memory leak). It certainly seems
like a GHC issue of some sort.

Closes hasura#2565
@JordanCheney (Author)

@bartjuh4 @lorenzo suggested adding GHCRTS=-I0 as an environment variable, which solved the problem for me.

@bartjuh4

> @bartjuh4 @lorenzo suggested adding GHCRTS=-I0 as an environment variable, which solved the problem for me.

Thanks. One (silly) question: how did you add that flag to a Docker instance?

@lexi-lambda (Contributor)

@bartjuh4 If you’re running the container from the command line via docker run, you can add the -e GHCRTS=-I0 flag. If you’re using docker-compose, you can add

environment:
  - GHCRTS=-I0

to your container configuration. Other mechanisms of running containers have other ways to control the environment.
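
Combined with the compose sketch earlier in the thread, the relevant service entry would look roughly like this (the service name and database URL are illustrative):

services:
  graphql-engine:
    image: hasura/graphql-engine:v1.0.0-beta.3
    environment:
      - HASURA_GRAPHQL_DATABASE_URL=postgres://postgres:postgrespassword@postgres:5432/postgres
      - GHCRTS=-I0   # disable GHC's idle garbage collection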

lexi-lambda pushed a commit to jberryman/graphql-engine that referenced this issue Jul 24, 2019
lexi-lambda pushed a commit that referenced this issue Jul 24, 2019
@jberryman (Collaborator)

Survey of possibly relevant GHC issues: #2581 (comment)

polRk pushed a commit to polRk/graphql-engine that referenced this issue Feb 12, 2020