-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support Hudi MOR snapshot #14786
Support Hudi MOR snapshot #14786
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@willzgw Thanks for adding the MOR snapshot query support. This would be very helpful for the community. Please edit the PR description accordingly and point to issue#14432.
Also, if you could add integ tests for MOR snapshot query that would be great. You can extend the existing setup. Note that you will have to add an upsert operation (folowwing after insert) in TpchHudiTablesInitializer#load
.
plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/HudiSplit.java
Outdated
Show resolved
Hide resolved
plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/HudiPageSourceProvider.java
Outdated
Show resolved
Hide resolved
I am trying to add some unit test cases for this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@willzgw This is looking more and more in shape. It would be great if you could add a few tests as well.
plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/HudiTableType.java
Outdated
Show resolved
Hide resolved
plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/partition/HiveHudiPartitionInfo.java
Outdated
Show resolved
Hide resolved
plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/HudiFile.java
Outdated
Show resolved
Hide resolved
plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/HudiModule.java
Outdated
Show resolved
Hide resolved
Path path = new Path(baseFile.getPath()); | ||
|
||
HdfsContext context = new HdfsContext(session); | ||
Configuration configuration = hdfsEnvironment.getConfiguration(context, path); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HudiSplit
already carry configuration. See if we can pass from the split in page source provider.
jobConf.set(READ_COLUMN_NAMES_CONF_STR, join(dataColumns, HiveColumnHandle::getBaseColumnName)); | ||
schema.stringPropertyNames() | ||
.forEach(name -> jobConf.set(name, schema.getProperty(name))); | ||
refineCompressionCodecs(jobConf); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for my understanding, what's the intention of setting compression codecs here?
plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/HudiSplitSource.java
Outdated
Show resolved
Hide resolved
plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/query/HudiDirectoryLister.java
Outdated
Show resolved
Hide resolved
Could you rebase on upstream to resolve conflicts and add product tests? |
@willzgw I have a PR for async split processing #16034 that improves the Hudi connector performance. On top of it, I have MoR support in my fork - https://github.com/codope/trino/tree/mor-snapshot |
Hi @codope Let's continue with this PR, I will rebase this branch with #16034. |
28b7b67
to
92ec871
Compare
@shan-shanmugam This PR is currently held back due to an ongoing effort to decouple Trino from Hadoop APIs - #15921 |
@codope, Thanks for the update. Any expected timeline for this? |
@shan-shanmugam Hudi 0.14.0 RC vote is ongoing and the community is working on RC2. It should be out in a couple of weeks and then the remaining work on this PR can be picked up. |
@codope It looks like even the latest version of |
@electrum The current version of |
This pr is very helpful to me, but when we use 433 version and I tried to cherry-pick this pr found that there are many conflicts after this pr 【#18241 】,especially the RecordCursor is not used after this pr. |
👋 @willzgw - this PR has become inactive. We hope you are still interested in working on it. Please let us know, and we can try to get reviewers to help with that. We're working on closing out old and inactive PRs, so if you're too busy or this has too many merge conflicts to be worth picking back up, we'll be making another pass to close it out in a few weeks. |
Hi @mosabua |
Great .. feel free to reach out to us if you need help. @codope can help on the Hudi side.. |
Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Guowei Zhang.
|
Looks like github still thinks you have a lot of conflicts to resolve @willzgw |
Yeah, still in progress. |
I am not sure if it is worth refactoring this PR. Currently @yihua and @codope are working towards a new Hudi release that removes the Hadoop dependencies. Once that is out they will work on Hudi connector updates that will probably change the codebase quite a bit. So maybe its worth waiting to see if MOR snapshot support will come out of these PRs or just be easier to implement from scratch then. |
@mosabua |
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua |
Closing this pull request, as it has been stale for six weeks. Feel free to re-open at any time. |
@yihua @codope Did Hudi 0.15.0 remove all Hadoop dependencies? I see https://issues.apache.org/jira/browse/HUDI-6497 contains 0.15.0 as a fix version. |
Description
Non-technical explanation
Release notes
( ) Support Hudi MOR Tables
( ) Support MOR snapshot querying for Hudi tables