[WIP][POC][DeltaLake] Prototype PR to add support reading tables using Delta Kernel library #23119
base: master
I skimmed the io.trino.plugin.deltalake.kernel.clients code, and the usage of hadoop's Configuration class is a no-go for trinodb/trino. Please see #15921 for details.
Removing unrelated property files is likely unintended.
    private final TypeManager typeManager;

    public KernelTableClient(
            Configuration configuration,
trinodb/trino has migrated away from hadoop-related APIs in favor of using its native file system clients directly. We would likely not accept reintroducing this compile dependency in trino-delta-lake.
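To illustrate the direction this comment points in, here is a minimal sketch of a client constructor that takes a narrow file-opening abstraction instead of hadoop's Configuration. `FileOpener`, `InMemoryFileOpener`, and `KernelTableClientSketch` are hypothetical names made up for this example; they are not Trino or Delta Kernel types, and the real change would presumably wire in Trino's own file system abstraction instead.

```java
import java.io.ByteArrayInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for a native file system client (not a Trino type).
interface FileOpener {
    InputStream open(String location) throws IOException;
}

// In-memory implementation, only so that the sketch runs without any storage.
final class InMemoryFileOpener implements FileOpener {
    private final Map<String, String> files;

    InMemoryFileOpener(Map<String, String> files) {
        this.files = Map.copyOf(files);
    }

    @Override
    public InputStream open(String location) throws IOException {
        String content = files.get(location);
        if (content == null) {
            throw new FileNotFoundException(location);
        }
        return new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8));
    }
}

// Hypothetical client: depends on the narrow interface, not on hadoop's Configuration.
final class KernelTableClientSketch {
    private final FileOpener fileOpener;

    KernelTableClientSketch(FileOpener fileOpener) {
        this.fileOpener = Objects.requireNonNull(fileOpener, "fileOpener is null");
    }

    // Reads a whole file as UTF-8 text, wrapping I/O failures in an unchecked exception.
    String readText(String location) {
        try (InputStream in = fileOpener.open(location)) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The point of the sketch is only the dependency direction: the client sees an injected interface, so no hadoop class appears on its compile path.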
        }

        Utils.closeCloseables(currentFileReader);
        if (!fileIter.hasNext()) {
nit: fileIter -> fileIterator
Trino code conventions discourage abbreviated variable names.
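For reference, a self-contained sketch of the same close-then-advance pattern with the name spelled out. `RowReader` and `MultiFileRowIterator` are hypothetical classes written for this example only; the PR's actual reader types and its `Utils.closeCloseables` helper are not reproduced here.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical per-file reader; records its file name in a log when closed.
final class RowReader implements AutoCloseable {
    private final String fileName;
    private final Iterator<String> rows;
    private final List<String> closedLog;

    RowReader(String fileName, List<String> rows, List<String> closedLog) {
        this.fileName = fileName;
        this.rows = rows.iterator();
        this.closedLog = closedLog;
    }

    boolean hasNext() {
        return rows.hasNext();
    }

    String next() {
        return rows.next();
    }

    @Override
    public void close() {
        closedLog.add(fileName);
    }
}

// Iterates rows across files, closing each exhausted reader before opening the next.
final class MultiFileRowIterator implements Iterator<String> {
    private final Iterator<RowReader> fileIterator; // full name instead of fileIter
    private RowReader currentFileReader;

    MultiFileRowIterator(Iterator<RowReader> fileIterator) {
        this.fileIterator = fileIterator;
    }

    @Override
    public boolean hasNext() {
        while (currentFileReader == null || !currentFileReader.hasNext()) {
            if (currentFileReader != null) {
                currentFileReader.close();
                currentFileReader = null;
            }
            if (!fileIterator.hasNext()) {
                return false;
            }
            currentFileReader = fileIterator.next();
        }
        return true;
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return currentFileReader.next();
    }
}
```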
8afcabb to 53e5df1
53e5df1 to d388182
add config to enable kernel, create KernelDeltaLakeMetadata override get table handle using Kernel APIs, stubs for TableClient custom impls, build changes delegators, split manager, page source provider (not yet tested) end-2-end working! support for partition column
d388182 to e267a4d
Switched to stale-ignore label since this is an ongoing effort.
Description
The Delta Kernel project is a set of Java libraries for building Delta connectors that can read from and write into Delta tables using a narrow set of APIs without understanding the Delta protocol details.
There are two sets of public APIs to build connectors.
More information about Delta Kernel can be found:
Example Java programs that illustrate how to read and write Delta tables using the Kernel APIs.
Migration guide
Currently, the Trino Delta connector has its own implementation of the Delta Log. We want to see whether Delta Kernel can be used in the Trino Delta connector so that the connector doesn't need to reimplement the same protocol updates.
This PR is just an attempt to use the Kernel for the read path, enabled through a session option. The prototype is at a very early stage, and a lot of details still need to be implemented. Currently, it allows reading Delta tables, including ones with deletion vectors. It uses Trino's own Parquet reader. We are looking for early feedback on how Kernel can reduce the development burden of keeping the Trino Delta connector up to date with Delta protocol updates.
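As a rough illustration of what "read path with a session option" means, the sketch below picks between two metadata readers based on a boolean session property. Everything in it is an assumption made for the example: the property name `kernel_reader_enabled`, the `TableMetadataReader` interface, and both implementations are hypothetical and do not reflect the classes in this PR or any Trino or Delta Kernel API.

```java
import java.util.Map;

// Hypothetical abstraction over "how table metadata is read".
interface TableMetadataReader {
    String describe(String tableLocation);
}

// Stand-in for the connector's existing Delta Log implementation.
final class ConnectorLogReader implements TableMetadataReader {
    @Override
    public String describe(String tableLocation) {
        return "connector-log:" + tableLocation;
    }
}

// Stand-in for a Kernel-backed implementation.
final class KernelLogReader implements TableMetadataReader {
    @Override
    public String describe(String tableLocation) {
        return "kernel:" + tableLocation;
    }
}

final class ReadPathSelector {
    // Hypothetical session property name, chosen only for this sketch.
    static final String KERNEL_READER_ENABLED = "kernel_reader_enabled";

    private ReadPathSelector() {}

    // Defaults to the existing implementation unless the property is explicitly true.
    static TableMetadataReader select(Map<String, Object> sessionProperties) {
        Object value = sessionProperties.getOrDefault(KERNEL_READER_ENABLED, Boolean.FALSE);
        return Boolean.TRUE.equals(value) ? new KernelLogReader() : new ConnectorLogReader();
    }
}
```

Defaulting to the existing reader keeps the prototype opt-in, which matches the description of the Kernel path as an early-stage experiment.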