Use the effective predicate when doing partition matching #20256

@findinpath findinpath commented Jan 2, 2024

Description

Use the effective predicate instead of the dynamic filter predicate to check for partition matching.
This change short-circuits the page source, so the data file footer no longer has to be read in the uncommon case where a partition filter acts as an unenforced predicate because the table's partition spec has evolved.

This PR is a follow-up to the hint in #20212 (comment)

Additional context and related issues

Imagine the following scenario:

trino> create table iceberg.default.t1 (year integer, month integer, data integer) with (partitioning=array['year']);
CREATE TABLE
trino> insert into iceberg.default.t1 values (2023, 1, 10), (2023, 1, 11), (2023, 2, 20);
INSERT: 3 rows
trino> alter table iceberg.default.t1 set properties partitioning = array['year', 'month'];
SET PROPERTIES
trino> insert into iceberg.default.t1 values (2023, 1, 12), (2023, 1, 13), (2023, 2, 21);
INSERT: 3 rows
trino> select * from iceberg.default.t1 where year = 2023 and month = 1;
 year | month | data 
------+-------+------
 2023 |     1 |   12 
 2023 |     1 |   13 
 2023 |     1 |   10 
 2023 |     1 |   11 
(4 rows)

If the table iceberg.default.t1, queried with the filter month = 1 (which is unenforced due to partition spec evolution), takes part in a JOIN and the dynamic filter from the build side turns out to be very selective (e.g. year = 2023), it makes sense to also apply the unenforced filter when deciding whether the data file needs to be read at all.
Before this change, the page source provider used only the dynamic filter for partition matching, which could result in processing unnecessary data files.
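
To make the difference concrete, here is a minimal toy model of the check, not the actual connector code. It assumes predicates are plain equality constraints (column to required value) and that each data file carries the partition values it was written with. The class `PartitionMatchSketch` and the methods `intersect` and `partitionMatches` are hypothetical names for illustration only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical, simplified model; none of these names come from the Trino code base.
final class PartitionMatchSketch
{
    private PartitionMatchSketch() {}

    // Intersects two equality-only predicates (column -> required value).
    // Returns Optional.empty() when they contradict each other, i.e. nothing can match.
    static Optional<Map<String, Integer>> intersect(Map<String, Integer> left, Map<String, Integer> right)
    {
        Map<String, Integer> result = new HashMap<>(left);
        for (Map.Entry<String, Integer> entry : right.entrySet()) {
            Integer existing = result.putIfAbsent(entry.getKey(), entry.getValue());
            if (existing != null && !existing.equals(entry.getValue())) {
                return Optional.empty();
            }
        }
        return Optional.of(result);
    }

    // A file can be skipped only if one of its partition values contradicts the predicate.
    // Columns the file is not partitioned on cannot be used for pruning in this model.
    static boolean partitionMatches(Map<String, Integer> filePartition, Map<String, Integer> predicate)
    {
        for (Map.Entry<String, Integer> entry : predicate.entrySet()) {
            Integer partitionValue = filePartition.get(entry.getKey());
            if (partitionValue != null && !partitionValue.equals(entry.getValue())) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args)
    {
        Map<String, Integer> unenforced = Map.of("month", 1);
        Map<String, Integer> dynamicFilter = Map.of("year", 2023);
        // Assumption: the effective predicate combines the unenforced filter with the dynamic filter.
        Map<String, Integer> effective = intersect(unenforced, dynamicFilter).orElseThrow();

        Map<String, Integer> oldSpecFile = Map.of("year", 2023);             // written with partitioning = ['year']
        Map<String, Integer> newSpecFile = Map.of("year", 2023, "month", 2); // written with partitioning = ['year', 'month']

        System.out.println(partitionMatches(newSpecFile, dynamicFilter)); // true: dynamic filter alone cannot skip the file
        System.out.println(partitionMatches(newSpecFile, effective));     // false: month = 2 contradicts month = 1
        System.out.println(partitionMatches(oldSpecFile, effective));     // true: no month partition value to check
    }
}
```

In this model the file written under the new spec with month = 2 is only skipped once the unenforced filter is part of the predicate being matched, while the file written under the old spec still has to be read because it has no month partition value.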

Release notes

(x) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Jan 2, 2024
@github-actions github-actions bot added the iceberg Iceberg connector label Jan 2, 2024
@findinpath findinpath force-pushed the findinpath/short-circuit-page-source-provider-iceberg branch from 7bfacbe to 007eda2 Compare January 2, 2024 13:58
@raunaqmorarka raunaqmorarka merged commit e0c0c01 into trinodb:master Jan 2, 2024
43 checks passed
@github-actions github-actions bot added this to the 436 milestone Jan 2, 2024
@findinpath findinpath self-assigned this Jan 2, 2024