Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent behavior of correlated subqueries involving aggregations #18979

Closed
zhaner08 opened this issue Sep 8, 2023 · 2 comments
Closed

Comments

@zhaner08
Copy link
Contributor

zhaner08 commented Sep 8, 2023

Sample query:

with
  tmp as (select * from (VALUES (1,3),(2,5)) t(k,v))
select
  (select 3 from tmp t1 where t1.k=t0.k and t1.k>1 group by t1.v)
from (select k, v from tmp) t0;

This query works fine and returning 2 rows, 1 with value and 1 with null (due to t1.k>1)

When changing the aggregation column to t1.k, which is the correlated column, only single row will be returned

with
  tmp as (select * from (VALUES (1,3),(2,5)) t(k,v))
select
  (select 3 from tmp t1 where t1.k=t0.k and t1.k>1 group by **t1.k**)
from (select k, v from tmp) t0;

By looking through the optimizer, seems like the first one triggers TransformCorrelatedScalarSubquery before TransformCorrelatedJoinToJoin that creating am inner join instead of left join, with filter predicates t1.k>1 pushed down to the outer query so the row was eliminated.

If the aggregation is removed, both queries returns the same result.

@martint
Copy link
Member

martint commented Sep 13, 2023

I believe this is related to, or even the same as the issue reported and fixed here: #19002

@zhaner08
Copy link
Contributor Author

Looks like it is the same issue, closing this one

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

2 participants