[SUPPORT] Getting older data from the Trino Hudi connector for a real-time MOR table, whereas Hive returns correct data for the same mor_rt table #8185

Closed
pravin1406 opened this issue Mar 14, 2023 · 8 comments
Labels
trino The trino query engine

Comments

@pravin1406

pravin1406 commented Mar 14, 2023

Tips before filing an issue

  • Have you gone through our FAQs?

  • yes

  • Join the mailing list to engage in conversations and get faster support at dev-subscribe@hudi.apache.org.

  • If you have triaged this as a bug, then file an issue directly.

Describe the problem you faced

Hi @nsivabalan

I upserted some data into a MOR table using Spark (Java code) with Hive sync enabled via HMS. When I read this data in spark-shell, either from the output path or through db.employee_test_mor_rt, I get the correct output: records are read from the .log files as expected. Hive returns the same correct result.
But when I read the employee_test_mor_rt table from Trino, I only get the data from the last commits, not the records in the .log files of the MOR table.

To Reproduce

Steps to reproduce the behavior:

  1. Upsert data using Spark multiple times, with Hive sync enabled (HMS/JDBC), so that there are multiple .log files.
  2. Read the _rt table through Trino 410 (the latest release at the time) using the Hudi connector.
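The author's Java Spark job is not included in the issue. As a rough sketch only, the write options behind step 1 might look like the Python dict below (usable from PySpark). The table name employee_test_mor is inferred from the _rt table name; the record key `id`, precombine field `ts`, database `db`, the metastore URI and the helper name `hudi_mor_upsert_options` are hypothetical placeholders. Partitioning options are omitted because they depend on the table layout.

```python
def hudi_mor_upsert_options(table: str, record_key: str, precombine: str,
                            hive_db: str, metastore_uris: str) -> dict:
    """Build Hudi write options for an MOR upsert with Hive sync via HMS.

    Sketch only (hypothetical helper): partition-related options are
    omitted, and the argument values used below are placeholders.
    """
    return {
        "hoodie.table.name": table,
        # MERGE_ON_READ appends updates to .log files; they are folded
        # into the columnar base files only when compaction runs.
        "hoodie.datasource.write.table.type": "MERGE_ON_READ",
        "hoodie.datasource.write.operation": "upsert",
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine,
        # For MOR tables, Hive sync registers <table>_ro and <table>_rt.
        "hoodie.datasource.hive_sync.enable": "true",
        "hoodie.datasource.hive_sync.mode": "hms",
        "hoodie.datasource.hive_sync.metastore.uris": metastore_uris,
        "hoodie.datasource.hive_sync.database": hive_db,
        "hoodie.datasource.hive_sync.table": table,
    }


if __name__ == "__main__":
    opts = hudi_mor_upsert_options(
        "employee_test_mor", "id", "ts", "db", "thrift://metastore-host:9083"
    )
    # With PySpark and the Hudi bundle on the classpath, roughly:
    # df.write.format("hudi").options(**opts).mode("append").save(base_path)
    for k, v in sorted(opts.items()):
        print(f"{k}={v}")
```

Repeated upserts of existing keys with options like these should leave .log files next to the base files until compaction runs, which is the state the _rt table has to merge at read time.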

Environment Description

  • Hudi version : 0.12.2

  • Spark version : 3.2.0

  • Hive version : 3.1.2_1

  • Hadoop version : Hadoop 3.2.1

  • Storage (HDFS/S3/GCS..) : HDFS

  • Running on Docker? (yes/no) : no

Stacktrace

Trino server log for the query on Trino:
[Three screenshots of the Trino server log, taken 2023-03-14 at 11:35 PM and 11:36 PM; images not preserved]

@danny0405 danny0405 added the trino The trino query engine label Mar 15, 2023
@github-project-automation github-project-automation bot moved this to ⏳ Awaiting Triage in Hudi Issue Support Mar 15, 2023
@danny0405
Contributor

This is a known issue. @codope, what is the approximate fix time?

@pravin1406
Author

Adding the Trino output as proof:

[Screenshot of the Trino query output, taken 2023-03-14 at 11:35 PM; image not preserved]

@codope
Member

codope commented Mar 15, 2023

@pravin1406 The MOR snapshot query (_rt table) is not yet supported in the trino-hudi connector. You're getting incorrect results because it still returns results from the base files only. We have an active PR that is expected to merge soon: trinodb/trino#14786
Stay tuned!
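The difference described above, base files only versus base files merged with .log files, can be illustrated with a toy model. This is a simplified sketch, not Hudi's actual reader: records are dicts keyed by a record key, log files are replayed in commit order, and a later record simply overwrites an earlier one with the same key (ordering fields, deletes and custom payloads are ignored). The function names `read_optimized` and `snapshot` are hypothetical.

```python
from typing import Dict, List

Record = Dict[str, object]


def read_optimized(base_file: List[Record], key: str) -> Dict[object, Record]:
    # Base-file-only read: upserts still sitting in .log files are invisible.
    # In this toy model, this is the behaviour Trino showed for the _rt table.
    return {r[key]: r for r in base_file}


def snapshot(base_file: List[Record], log_files: List[List[Record]],
             key: str) -> Dict[object, Record]:
    # Snapshot (_rt) read: start from the base file, then replay each log
    # file in commit order; later records overwrite earlier ones by key.
    merged = read_optimized(base_file, key)
    for log in log_files:
        for r in log:
            merged[r[key]] = r
    return merged


if __name__ == "__main__":
    base = [{"id": 1, "salary": 100}, {"id": 2, "salary": 200}]
    log1 = [{"id": 1, "salary": 150}]  # update to an existing key
    log2 = [{"id": 3, "salary": 300}]  # a new key (placed in a log in this toy model)
    print(read_optimized(base, "id"))
    print(snapshot(base, [log1, log2], "id"))
```

In this model, an engine that scans only base files behaves like `read_optimized`, so the updated salary for id 1 and the row for id 3 never show up, which matches the symptom reported for Trino.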

@pravin1406
Author

@codope Thanks for that. But the Hive connector works, right? I tried it as well, but the result is still the same.

Please see the output from the Hive connector below:

[Screenshot of the Hive connector query output, taken 2023-03-15 at 10:27 PM; image not preserved]

@codope
Member

codope commented Mar 15, 2023

Sorry, the Hive connector in Trino does not support it yet either. The support matrix for the different query engines is documented at https://hudi.apache.org/docs/querying_data#support-matrix

@pravin1406
Author

Okay, I will wait for the support. Thanks for the quick response!

@github-project-automation github-project-automation bot moved this from ⏳ Awaiting Triage to ✅ Done in Hudi Issue Support Mar 15, 2023
@pravin1406
Author

Hi @codope, when is this support expected?

@codope
Member

codope commented Sep 4, 2023

Hi @pravin1406, please check my comment here: trinodb/trino#14786 (comment)
I think we will have to wait a couple more months to get this out. Hudi 0.14.0 is still in the RC phase.

Projects
Archived in project
3 participants