-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Exclude files that do not satisfy the metadata predicate #17693
Conversation
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergUtil.java
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/TableStatisticsReader.java
Show resolved
Hide resolved
Retrieving stats for a single file works. Iceberg connector provides the stats as expected through the changes of @chenjian2664 . @raunaqmorarka do you happen to know why this is happening? |
Iceberg doesn't produce stats for |
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/TableStatisticsReader.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergSplitSource.java
Show resolved
Hide resolved
plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/BaseIcebergConnectorTest.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/BaseIcebergConnectorTest.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/TableStatisticsReader.java
Outdated
Show resolved
Hide resolved
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua |
@findinpath and @chenjian2664 could you connect and figure out next steps here? |
@chenjian2664 pls rebase and I'll make another round of review. Appologies for not following up with the review on the last request. |
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/TableStatisticsReader.java
Outdated
Show resolved
Hide resolved
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua |
@ebyhr @findinpath Need your help to review on this pr |
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua |
Closing this pull request, as it has been stale for six weeks. Feel free to re-open at any time. |
I think this is still important to pursue. Hopefully @chenjian2664 can rebase and then @findinpath and others can follow up. |
Thanks @mosabua |
Move getPathDomain, getFileModifiedTimePathDomain into IcebergUtil. Refactor the getModificationTime and move it into IcebergUtil.
Yes, just notice the issue already has been fixed. 👍🏻 |
.row(null, null, null, null, 4.0, null, null) | ||
.build(); | ||
if (format == AVRO) { | ||
expectedTableStatistics = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls add a comment why we have different stats when working with avro
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don’t know all the reasons, but what I understand is that Avro is a row-based storage format and does not inherently support storing column-level statistics. Therefore, it's somewhat different from columnar storage formats like Parquet and ORC
Could you help to provide a concise comment
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
Outdated
Show resolved
Hide resolved
Co-Authored-By: Marius Grama <findinpath@gmail.com>
@findinpath @mosabua @ebyhr @findepi Thank you all for the help |
Description
Exclude files that do not satisfy the metadata predicate while computing the table statistics in Iceberg.
While building the tableScan metadata predicates are ignored, it results scan files more than expectation.
Additional context and related issues
Release notes
(x) This is not user-visible or docs only and no release notes are required.