Cache result of evaluating constraint per partition in iceberg #20304

Merged: 1 commit, Jan 10, 2024

Conversation

raunaqmorarka
Member

Description

Currently we evaluate the constraint on partitioning columns for every file in the Iceberg
split source. This change caches the result of constraint evaluation to avoid repeating
this computation for every file in a partition.

Additional context and related issues

Release notes

(x) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Jan 9, 2024
@github-actions github-actions bot added the iceberg Iceberg connector label Jan 9, 2024
}
Map<ColumnHandle, NullableValue> partitionValues = partitionValuesSupplier.get();
try {
return partitionConstraintResults.get(ImmutableMap.copyOf(Maps.filterKeys(partitionValues, predicatePartitionColumns::contains)));
Contributor

Should the filtering of the partitionValues map keys be a preparatory commit?

Member Author

I'm only doing that to improve the cache hit rate when there are multiple partitioning columns and the predicate is only on a subset of them.
I don't think we would get a benefit from doing this in the previous code.
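
To illustrate the hit-rate point, here is a minimal JDK-only sketch, not the code from this PR. The class name `PartitionConstraintCache`, the use of `String` column names in place of `ColumnHandle`, plain `Object` values in place of `NullableValue`, and a `HashMap` in place of the connector's cache are all assumptions made for illustration. The cache key keeps only the partition columns the predicate refers to, so files from partitions that differ only in other columns share one cached result.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical sketch: names and types are illustrative, not Trino's actual classes.
final class PartitionConstraintCache
{
    private final Set<String> predicateColumns;
    private final Predicate<Map<String, Object>> constraint;
    private final Map<Map<String, Object>, Boolean> results = new HashMap<>();
    private int evaluations;

    PartitionConstraintCache(Set<String> predicateColumns, Predicate<Map<String, Object>> constraint)
    {
        this.predicateColumns = Set.copyOf(predicateColumns);
        this.constraint = constraint;
    }

    // Returns whether a file with the given partition values passes the constraint.
    boolean matches(Map<String, Object> partitionValues)
    {
        // Keep only the columns the predicate uses, so partitions that differ
        // in other columns map to the same cache key.
        Map<String, Object> key = new HashMap<>();
        for (Map.Entry<String, Object> entry : partitionValues.entrySet()) {
            if (predicateColumns.contains(entry.getKey())) {
                key.put(entry.getKey(), entry.getValue());
            }
        }
        // Evaluate the constraint only on the first occurrence of a key.
        return results.computeIfAbsent(key, k -> {
            evaluations++;
            return constraint.test(k);
        });
    }

    // Number of times the underlying constraint was actually evaluated.
    int evaluations()
    {
        return evaluations;
    }
}
```

With a table partitioned on two hypothetical columns, `region` and `day`, and a predicate on `region` only, every file in every `day` partition of the same region reuses a single evaluation. Keyed on the full partition values, each distinct `(region, day)` pair would be evaluated once instead.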

@raunaqmorarka
Member Author

Ran TPC sf1k Iceberg partitioned Parquet benchmarks.

[Screenshot 2024-01-10 at 10 38 44 AM]

Not much change, which is expected since there aren't many files per partition or complex partition constraints to be evaluated in TPC sf1k.

@raunaqmorarka raunaqmorarka merged commit 2fd2fee into trinodb:master Jan 10, 2024
42 checks passed
@raunaqmorarka raunaqmorarka deleted the ice-ss branch January 10, 2024 05:17
@github-actions github-actions bot added this to the 436 milestone Jan 10, 2024
Labels
cla-signed iceberg Iceberg connector
4 participants