-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for reading TIMESTAMP WITH LOCAL TIME ZONE in Hive connector #13595
Support for reading TIMESTAMP WITH LOCAL TIME ZONE in Hive connector #13595
Conversation
fdb9b83
to
81f5866
Compare
Could you resolve the conflicts on this PR @zielmicha ? |
Looks like rebasing is non-trivial due to changes related to #15921 - I'll try to get it done next week. |
Thank you @zielmicha .. cc @jacobbaskin @martint @arhimondr |
81f5866
to
31ecc19
Compare
da26d56
to
3cde8a9
Compare
@mosabua Ok, rebase turned out to fairly simple in the end, code is ready to review. Right now we don't even have to insert fake UTC timezone into the type, as it's not used in the Trino-native Hive types. |
Awesome... what do you think @martint .. ready to merge? |
if (type instanceof TimestampWithTimeZoneType) { | ||
return HIVE_TIMESTAMPLOCALTZ.getTypeInfo(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this is the right choice. If we want to add support for Hive's TIMESTAMP WITH TIME ZONE type later, this will be unexpected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I chose to go with TIMESTAMP WITH LOCAL TIME ZONE, because it's roughly Hive equivalent of Instant. And in practice, the formats used with Hive (ORC, Parquet, Avro) store the timestamps as instants (they don't keep timezone information with each value).
I'm not sure if there is any way to to support both TIMESTAMP WITH TIME ZONE and TIMESTAMP WITH LOCAL TIME ZONE (at least without some custom annotations), considering Trino doesn't have a separate Instant-like type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a few things to consider:
- How does a Hive
TIMESTAMP WITH LOCAL TIME ZONE
present within Trino? We concluded that it'd be reasonable to expose that as a TrinoTIMESTAMP WITH TIME ZONE
with an arbitrary time zone. - How does a Hive
TIMESTAMP WITH TIME ZONE
present within Trino? We don't have an answer yet, and it's out of the scope of this change. The likely reasonable choice is to expose it as a TrinoTIMESTAMP WITH TIME ZONE
. - How does a Trino
TIMESTAMP WITH TIME ZONE
present within Hive? Specifically, when creating a table. This is also outside the scope of this PR, but the likely choice is also a HiveTIMESTAMP WITH TIME ZONE
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see - that makes sense. I'll remove this line and adjust the tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done - I removed mapping from Trino to Hive types (and had to change test to create the table directly in the metastore)
51d3ce8
to
f0ea5aa
Compare
It is possible to read from a table with a column of this type, but the column | ||
data is not accessible. Writing to such a table is not supported. | ||
* Hive's ``timestamp with local zone`` data type is mapped to | ||
``timestamp with time zone`` with UTC timezone. It only supports reading |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would writing not work? If the physical representation is the same, then writing should just work, I think. Have you tried it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It requires some additional changes, like adding cases to isWritablePrimitiveType
andParquetSchemaConverter.getPrimitiveType
. We'll need support for writing eventually too - I can add it in this PR or in a following one.
plugin/trino-hive/src/test/java/io/trino/plugin/hive/BaseHiveConnectorTest.java
Outdated
Show resolved
Hide resolved
MaterializedResult expected = resultBuilder(getSession(), TimestampWithTimeZoneType.TIMESTAMP_TZ_MILLIS) | ||
.row(ZonedDateTime.parse("2022-07-26T17:13Z[UTC]")) | ||
.build(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should be able to do something like:
assertThat(query("SELECT * FROM test_timestamptz"))
.matches("VALUES TIMESTAMP '....'");
plugin/trino-hive/src/test/java/io/trino/plugin/hive/BaseHiveConnectorTest.java
Outdated
Show resolved
Hide resolved
db0e0a4
to
7dc77fc
Compare
plugin/trino-hive/src/test/java/io/trino/plugin/hive/BaseHiveConnectorTest.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/test/java/io/trino/plugin/hive/BaseHiveConnectorTest.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one minor last comment. I'll go ahead and merge it once you address that and CI passes. Thanks!
7dc77fc
to
8128e49
Compare
Reminder to merge @martint ... |
Description
Resolves #1240 by mapping Hive TIMESTAMP WITH LOCAL TIME ZONE to TIMESTAMP WITH TIME ZONE with UTC zone.
Mapping to UTC is a bit of a judgement call (see #1240), but I believe is better than the alternatives (like using session zones) and the direction Trino maintainers want to take (see e.g. #8680).
In addition to the new automated test, I have tested reading the timestamps from Parquet files. Notably, Hive Avro serde doesn't support TIMESTAMP WITH TIME ZONE.
A new feature.
Hive connector
Adds support for TIMESTAMP WITH TIME ZONE Trino type to Hive connector. This type is translated to TIMESTAMP WITH LOCAL TIME ZONE - a Hive type that represents an instant of time. There is no explicit time zone associated with Hive TIMESTAMP WITH LOCAL TIME ZONE, so Trino always chooses to represent them in UTC timezone.
Related issues, pull requests, and links
Documentation
(updated the existing mention of this type in docs, not sure if there are other places I should update)
( ) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.
Release notes
( ) No release notes entries required.
( ) Release notes entries required with the following suggested text: