Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enabling rubix for trino-iceberg #12812

Closed
wants to merge 1 commit into from

Conversation

rupal-s
Copy link
Member

@rupal-s rupal-s commented Jun 13, 2022

Description

Enabling rubix caching for trino-iceberg connector. With this change, we can use properties like hive.cache.enabled,
hive.cache.location etc to cinfigure the iceberg cache

Is this change a fix, improvement, new feature, refactoring, or other?
Improvements
Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)
trino-iceberg
How would you describe this change to a non-technical end user or system administrator?
Enabling rubix caching for trino-iceberg connector.

Related issues, pull requests, and links

Documentation

( ) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

( ) No release notes entries required.
( ) Release notes entries required with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

@cla-bot
Copy link

cla-bot bot commented Jun 13, 2022

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please submit the signed CLA to cla@trino.io. For more information, see https://github.com/trinodb/cla.

@rupal-s rupal-s changed the title Enabling rubix for trino -ceberg Enabling rubix for trino-iceberg Jun 13, 2022
@findepi
Copy link
Member

findepi commented Jun 13, 2022

I remember @raunaqmorarka expressing some strong concerns about this idea https://trinodb.slack.com/archives/CJ6UC075E/p1650894628024159?thread_ts=1650888689.756519&cid=CJ6UC075E

The main problem I see is that upstream Rubix is an unmaintained project at this point. It works fine in tests, however there have been reports of bugs once in a while at scale from actual users.
So if you run into any issues with it which can't be worked around using configs or from Trino side, it's unlikely to receive a code fix.

@rupal-s
Copy link
Member Author

rupal-s commented Jun 13, 2022

Hi @findepi / @raunaqmorarka ,
Thank you for the quick feedback.
I have gone through the thread referenced, and saw the following:

There are thoughts about building caching into Trino natively, however there is no concrete plan at the moment.
You could look into Varada's caching solution and see if that solves your needs

The above comment was in April, so I wanted to check if there are now any concrete plans of building a native cache, or maybe even moving to alluxio etc.
We need a data cache for the iceberg connector as object stores tends to degrade and even throttle on very high loads. Currently we are facing poor performance with large data sizes due to degradation in s3 latencies (sometimes calls even throttle)

@rukartok
Copy link

rukartok commented Jan 3, 2023

@rupal-s @findepi
this seems to be a working solution (I did the same change in my private branch and tested it) - what are any other concerns preventing this from check-in?

@raunaqmorarka
Copy link
Member

@rupal-s @findepi
this seems to be a working solution (I did the same change in my private branch and tested it) - what are any other concerns preventing this from check-in?

The main problem IMO continues to be :
Upstream Rubix is an unmaintained project at this point. It works fine in tests, however there have been reports of bugs once in a while at scale from actual users.
So if you run into any issues with it which can't be worked around using configs or from Trino side, it's unlikely to receive a code fix.

I would prefer to not increase dependence on an unmaintained external project.

@rukartok
Copy link

rukartok commented Jan 3, 2023

@rupal-s @findepi
this seems to be a working solution (I did the same change in my private branch and tested it) - what are any other concerns preventing this from check-in?

The main problem IMO continues to be : Upstream Rubix is an unmaintained project at this point. It works fine in tests, however there have been reports of bugs once in a while at scale from actual users. So if you run into any issues with it which can't be worked around using configs or from Trino side, it's unlikely to receive a code fix.

I would prefer to not increase dependence on an unmaintained external project.

The dependency is already there (Rubix is already used in Hive connector).
Assuming that Rubix is buggy also should not be a problem for Iceberg as this is controlled by configuration parameter (so it can be turned off if an issue occurs).

@hackeryang
Copy link
Member

The replacement of Rubix: #18719

@wendigo
Copy link
Contributor

wendigo commented Nov 20, 2023

Rubix is going to be deprecated and removed once new Alluxio lands

@mosabua mosabua closed this Jan 8, 2024
@mosabua
Copy link
Member

mosabua commented Jan 8, 2024

#20102

@rupal-s
Copy link
Member Author

rupal-s commented Jan 9, 2024

@mosabua will Alluxio be present on Trino iceberg connector as well?

@mosabua
Copy link
Member

mosabua commented Jan 9, 2024

Yes - see the code and docs PRs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

8 participants