Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for reading TIMESTAMP WITH LOCAL TIME ZONE in Hive connector #13595

Merged
merged 2 commits into from
Jul 10, 2023

Conversation

zielmicha
Copy link
Member

@zielmicha zielmicha commented Aug 10, 2022

Description

Resolves #1240 by mapping Hive TIMESTAMP WITH LOCAL TIME ZONE to TIMESTAMP WITH TIME ZONE with UTC zone.

Mapping to UTC is a bit of a judgement call (see #1240), but I believe is better than the alternatives (like using session zones) and the direction Trino maintainers want to take (see e.g. #8680).

In addition to the new automated test, I have tested reading the timestamps from Parquet files. Notably, Hive Avro serde doesn't support TIMESTAMP WITH TIME ZONE.

Is this change a fix, improvement, new feature, refactoring, or other?

A new feature.

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

Hive connector

How would you describe this change to a non-technical end user or system administrator?

Adds support for TIMESTAMP WITH TIME ZONE Trino type to Hive connector. This type is translated to TIMESTAMP WITH LOCAL TIME ZONE - a Hive type that represents an instant of time. There is no explicit time zone associated with Hive TIMESTAMP WITH LOCAL TIME ZONE, so Trino always chooses to represent them in UTC timezone.

Related issues, pull requests, and links

Documentation

(updated the existing mention of this type in docs, not sure if there are other places I should update)

( ) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

( ) No release notes entries required.
( ) Release notes entries required with the following suggested text:

# Hive
* Add support for reading TIMESTAMP WITH LOCAL TIME ZONE values ({issue}`1240`)

@cla-bot cla-bot bot added the cla-signed label Aug 10, 2022
@zielmicha zielmicha changed the title Support for reading TIMESTAMP WITH LOCAL TIME ZONE in Hive connector [WIP] Support for reading TIMESTAMP WITH LOCAL TIME ZONE in Hive connector Aug 10, 2022
@zielmicha zielmicha force-pushed the hive-timestamp-with-timezone branch 2 times, most recently from fdb9b83 to 81f5866 Compare August 18, 2022 12:52
@zielmicha zielmicha changed the title [WIP] Support for reading TIMESTAMP WITH LOCAL TIME ZONE in Hive connector Support for reading TIMESTAMP WITH LOCAL TIME ZONE in Hive connector Aug 18, 2022
@zielmicha zielmicha requested a review from skrzypo987 August 26, 2022 15:37
@mosabua mosabua requested review from martint and arhimondr April 26, 2023 18:43
@mosabua
Copy link
Member

mosabua commented May 3, 2023

Could you resolve the conflicts on this PR @zielmicha ?

@zielmicha
Copy link
Member Author

Looks like rebasing is non-trivial due to changes related to #15921 - I'll try to get it done next week.

@mosabua
Copy link
Member

mosabua commented May 5, 2023

Thank you @zielmicha .. cc @jacobbaskin @martint @arhimondr

@zielmicha zielmicha force-pushed the hive-timestamp-with-timezone branch from 81f5866 to 31ecc19 Compare May 10, 2023 21:58
@github-actions github-actions bot added the hive Hive connector label May 11, 2023
@zielmicha zielmicha force-pushed the hive-timestamp-with-timezone branch 3 times, most recently from da26d56 to 3cde8a9 Compare May 12, 2023 11:03
@zielmicha
Copy link
Member Author

@mosabua Ok, rebase turned out to fairly simple in the end, code is ready to review. Right now we don't even have to insert fake UTC timezone into the type, as it's not used in the Trino-native Hive types.

@mosabua
Copy link
Member

mosabua commented May 12, 2023

Awesome... what do you think @martint .. ready to merge?

Comment on lines 150 to 152
if (type instanceof TimestampWithTimeZoneType) {
return HIVE_TIMESTAMPLOCALTZ.getTypeInfo();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is the right choice. If we want to add support for Hive's TIMESTAMP WITH TIME ZONE type later, this will be unexpected.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I chose to go with TIMESTAMP WITH LOCAL TIME ZONE, because it's roughly Hive equivalent of Instant. And in practice, the formats used with Hive (ORC, Parquet, Avro) store the timestamps as instants (they don't keep timezone information with each value).

I'm not sure if there is any way to to support both TIMESTAMP WITH TIME ZONE and TIMESTAMP WITH LOCAL TIME ZONE (at least without some custom annotations), considering Trino doesn't have a separate Instant-like type.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few things to consider:

  • How does a Hive TIMESTAMP WITH LOCAL TIME ZONE present within Trino? We concluded that it'd be reasonable to expose that as a Trino TIMESTAMP WITH TIME ZONE with an arbitrary time zone.
  • How does a Hive TIMESTAMP WITH TIME ZONE present within Trino? We don't have an answer yet, and it's out of the scope of this change. The likely reasonable choice is to expose it as a Trino TIMESTAMP WITH TIME ZONE.
  • How does a Trino TIMESTAMP WITH TIME ZONE present within Hive? Specifically, when creating a table. This is also outside the scope of this PR, but the likely choice is also a Hive TIMESTAMP WITH TIME ZONE.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see - that makes sense. I'll remove this line and adjust the tests.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done - I removed mapping from Trino to Hive types (and had to change test to create the table directly in the metastore)

@zielmicha zielmicha force-pushed the hive-timestamp-with-timezone branch 4 times, most recently from 51d3ce8 to f0ea5aa Compare May 31, 2023 23:48
@electrum electrum requested a review from martint June 2, 2023 01:15
It is possible to read from a table with a column of this type, but the column
data is not accessible. Writing to such a table is not supported.
* Hive's ``timestamp with local zone`` data type is mapped to
``timestamp with time zone`` with UTC timezone. It only supports reading
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would writing not work? If the physical representation is the same, then writing should just work, I think. Have you tried it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It requires some additional changes, like adding cases to isWritablePrimitiveType andParquetSchemaConverter.getPrimitiveType. We'll need support for writing eventually too - I can add it in this PR or in a following one.

Comment on lines 8312 to 8314
MaterializedResult expected = resultBuilder(getSession(), TimestampWithTimeZoneType.TIMESTAMP_TZ_MILLIS)
.row(ZonedDateTime.parse("2022-07-26T17:13Z[UTC]"))
.build();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be able to do something like:

assertThat(query("SELECT * FROM test_timestamptz"))
    .matches("VALUES TIMESTAMP '....'");

@zielmicha zielmicha force-pushed the hive-timestamp-with-timezone branch 2 times, most recently from db0e0a4 to 7dc77fc Compare July 7, 2023 16:28
Copy link
Member

@martint martint left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one minor last comment. I'll go ahead and merge it once you address that and CI passes. Thanks!

@zielmicha zielmicha force-pushed the hive-timestamp-with-timezone branch from 7dc77fc to 8128e49 Compare July 7, 2023 17:09
@mosabua
Copy link
Member

mosabua commented Jul 10, 2023

Reminder to merge @martint ...

@martint martint merged commit 1ba0143 into trinodb:master Jul 10, 2023
@github-actions github-actions bot added this to the 422 milestone Jul 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

Successfully merging this pull request may close these issues.

Support Hive's TIMESTAMP WITH LOCAL TIME ZONE
3 participants