Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

N+1 queries with cache_fragment and GraphQL Ruby Dataloader #126

Closed
tsugitta opened this issue Jan 2, 2025 · 8 comments · Fixed by #129 or #130
Closed

N+1 queries with cache_fragment and GraphQL Ruby Dataloader #126

tsugitta opened this issue Jan 2, 2025 · 8 comments · Fixed by #129 or #130
Labels
bug Something isn't working enhancement New feature or request help wanted Extra attention is needed

Comments

@tsugitta
Copy link

tsugitta commented Jan 2, 2025

Summary

When using graphql-ruby (v2.4.8) together with graphql-fragment_cache (v1.20.5) on Rails 8.0.1, I encounter an unexpected N+1 query problem if I wrap a Dataloader-based field resolver with cache_fragment. Without cache_fragment, Dataloader batches the queries correctly. With cache_fragment, each user (in this example) triggers its own query, leading to N+1 behavior.

cache_fragment(expires_in: 1.minute) do
  dataloader.with(Sources::PostsByUser).load(object.id)
end

I have written an implementation and a test to illustrate the issue. Specifically:

  1. cache_fragment disabled: only 2 SQL SELECT statements (one for users, one for posts).
  2. cache_fragment enabled: 3 SQL SELECT statements (or more if more users are present), i.e., an N+1 pattern emerges.
=== Queries without cache ===
1. SELECT "users".* FROM "users" /*action='execute',application='GraphqlRubyIssueReproduction',controller='graphql',current_graphql_field='Query.users'*/
2. SELECT "posts".* FROM "posts" WHERE "posts"."user_id" IN (1, 2) /*action='execute',application='GraphqlRubyIssueReproduction',controller='graphql',current_dataloader_source='Sources%3A%3APostsByUser'*/
=== Queries with cache ===
1. SELECT "users".* FROM "users" /*action='execute',application='GraphqlRubyIssueReproduction',controller='graphql',current_graphql_field='Query.users'*/
2. SELECT "posts".* FROM "posts" WHERE "posts"."user_id" = 1 /*action='execute',application='GraphqlRubyIssueReproduction',controller='graphql',current_dataloader_source='Sources%3A%3APostsByUser'*/
3. SELECT "posts".* FROM "posts" WHERE "posts"."user_id" = 2 /*action='execute',application='GraphqlRubyIssueReproduction',controller='graphql',current_dataloader_source='Sources%3A%3APostsByUser'*/

Environment

Rails: 8.0.1
Ruby: 3.3.2
graphql-ruby: 2.4.8
graphql-fragment_cache: 1.20.5

@DmitryTsepelev DmitryTsepelev reopened this Jan 6, 2025
@DmitryTsepelev DmitryTsepelev added bug Something isn't working enhancement New feature or request help wanted Extra attention is needed labels Jan 6, 2025
@DmitryTsepelev
Copy link
Owner

Hey! Thanks for the reproduction steps, and you're right, it turned out that the gem does not play well with Dataloader.

The problem is that Fiber eventually returns the value, but when you call cache_fragment — it enforces the resolution of the block (so Fiber cannot finish). You can move cache_fragment inside the dataloader.with(Sources::PostsByUser).load(object.id).then { ... } but that obviously not what we want, cause it will call the database even when we have the data in the cache.

I guess we need to find a way to fix it (somewhere around this line) but I'm not sure I will be able to work on that in the near future.

@DmitryTsepelev
Copy link
Owner

DmitryTsepelev commented Jan 6, 2025

I think I found a workaround — please see #130. In order to make it work you'll just need to pass dataloader: true to the cache_fragment (see here).

I'll let it sit in the branch for a while and then make a release 🙂

@tsugitta
Copy link
Author

tsugitta commented Jan 7, 2025

@DmitryTsepelev
Thank you for addressing this!
I tried it, and I can confirm that the N+1 queries are resolved, and the cache is being utilized correctly. However, it seems that the batch query is executed regardless of whether the cache exists or not, due to calling block.call.
Is my understanding correct?

@DmitryTsepelev
Copy link
Owner

Good point! I tried to move the block call later, when we can check if we already have the value in the cache. Need to polish specs a bit before it's ready for another round

@DmitryTsepelev
Copy link
Owner

Looks like I fixed the issue (not for free—we'll have to perform an additional exist? query to the store :( ). Could you please try again from the branch?

@tsugitta
Copy link
Author

tsugitta commented Jan 7, 2025

Thanks again!
I confirmed that no queries are executed to database if the cache exists.
As you mentioned as "not for free", I thought that perhaps this time we’d be encountering an N+1 issue with the cache store (meaning this additional query may be not once but N times), but I also think it might still be better compared to accessing the database.

@DmitryTsepelev
Copy link
Owner

It will happen once, but might be still an overhead. In this sense batch approach was easier, because we could easily tell if it was already resolved or not, but in case of fibers we have to check that we don't have it in cache and, if not — make sure we let it resolve by itself (block.call forced it to resolve, that's why we got 2 requests instead of a single one)

@tsugitta
Copy link
Author

tsugitta commented Jan 7, 2025

It will happen once

When dataloader is set to false, there seems 1 call to read_multi. On the other hand, when dataloader is set to true, there seems 1 call to read_multi plus N calls to exist?.
I tested with below.
https://github.com/tsugitta/graphql-ruby-issue-reproduction/blob/7d65f9c57248dbc24de0c8e9f4215aec05a03d06/spec/requests/graphql_spec.rb#L97-L177

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working enhancement New feature or request help wanted Extra attention is needed
Projects
None yet
2 participants