[Bug] Display incorrect datatype in partitioned parquet file #286

Open
JonasTan2015 opened this issue Oct 17, 2024 · 0 comments

Background

I have a Parquet file with the following schema:

id: varchar,
date: varchar

When I put the file in a partitioned directory, tst-data/date=2024-01-01/file.parquet, and opened it in Tad, the UI showed the data type of the date column as Date.

But when I moved the same file to a non-partitioned directory, tst-data/file.parquet, Tad showed the date column data type was varchar.
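
The difference between the two paths suggests hive-style partition inference, where a reader takes key=value directory names as columns and guesses their types from the values. The Python sketch below only illustrates that mechanism. It is not Tad's code, and the helper names hive_partition_values and guess_type are hypothetical.

```python
from datetime import date
from pathlib import PurePosixPath


def hive_partition_values(path: str) -> dict[str, str]:
    """Return the key=value pairs encoded in a path's directory names."""
    pairs = {}
    for segment in PurePosixPath(path).parent.parts:
        key, sep, value = segment.partition("=")
        if sep and key:
            pairs[key] = value
    return pairs


def guess_type(value: str) -> str:
    """Hypothetical autocast: try BIGINT, then DATE, else fall back to VARCHAR."""
    try:
        int(value)
        return "BIGINT"
    except ValueError:
        pass
    try:
        date.fromisoformat(value)
        return "DATE"
    except ValueError:
        return "VARCHAR"


# Compare the two locations from this report.
for path in ("tst-data/date=2024-01-01/file.parquet", "tst-data/file.parquet"):
    inferred = {k: guess_type(v) for k, v in hive_partition_values(path).items()}
    print(path, inferred)
```

Under this assumption, only the partitioned path yields a date column typed from the directory name, which would match what the UI showed even though the file itself stores date as varchar.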

This was misleading. When I viewed a partitioned Parquet file generated by my Apache Spark job, I thought the job had written the wrong data type, and it took me some time to figure out that the Date type came from Tad.

Expected Behavior

As a file viewer, Tad should display the data types exactly as they are stored in the file, and should not infer data types from partitioned directory names.
