
SELECT * FROM TABLE; (parquet) causes java.lang.UnsupportedOperationException: io.trino.spi.type.VarcharType #20222

Closed
spens opened this issue Dec 26, 2023 · 3 comments

@spens

spens commented Dec 26, 2023

After creating a simple table in Parquet format with an integer column, running SELECT * against the table fails with java.lang.UnsupportedOperationException: io.trino.spi.type.VarcharType.

Steps to reproduce:

1. Create sample data
Save the following content into a file called myTable.csv:

column1,operation_date,name,offer_type,time_dy,rate
1001,1/31/2023,ABC.LOC_01,offerType001,3/24/2023,0.8
1002,2/1/2023,ABC.LOC_01,offerType002,3/25/2023,0.8
1004,2/2/2023,ABC.LOC_01,offerType003,3/26/2023,0.8

2. Convert the sample data to parquet format.
Use this URL: https://observablehq.com/@observablehq/csv-to-parquet
Click the "Choose File" button and select the "myTable.csv" file created above.
After the file uploads, click the "Download myTable.parquet" button.
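
Alternatively, if you prefer to do the conversion locally, a rough equivalent with DuckDB (a different tool than the web page above, so the inferred column types may differ) is:

-- DuckDB shell, run from the directory containing myTable.csv
DESCRIBE SELECT * FROM read_csv_auto('myTable.csv');   -- check what types get inferred
COPY (SELECT * FROM read_csv_auto('myTable.csv'))
  TO 'myTable.parquet' (FORMAT PARQUET);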

3. Run Trino
You can use this project: https://github.com/njanakiev/trino-minio-docker
git clone https://github.com/njanakiev/trino-minio-docker.git
Launch the containers:

cd trino-minio-docker
docker-compose up -d

4. Load the parquet file into S3
Open http://localhost:9000 in your browser (default login credentials: access key = "minio_access_key", secret key = "minio_secret_key").
Click on the + button in the lower right corner of the screen
Click on the icon for "Create Bucket" and use the name "mybucket"
Click on the + button again and choose "Upload file"
Select the myTable.parquet file created earlier

5. Start the Trino command line client
Download the JAR file from here: https://trino.io/download.html
Find the button in the section called "Command Line client" and download the trino-cli-xxx-executable.jar file.
Run the JAR file (assumes you have Java installed and on your PATH), replacing 435 with your version:

java -jar trino-cli-435-executable.jar

6. Create SCHEMA
At the Trino command prompt, run the CREATE SCHEMA statement:

CREATE SCHEMA IF NOT EXISTS minio.mybucket
WITH (location = 's3a://mybucket/');
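
As an optional sanity check, you can confirm the schema exists before moving on:

SHOW SCHEMAS FROM minio;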

7. Create TABLE
Run the following to create the table:

CREATE TABLE IF NOT EXISTS minio.mybucket.mytable_parquet (
column1 INTEGER,
operation_date DATE,
name VARCHAR,
offer_type VARCHAR,
time_dy DATE,
rate DOUBLE
)
WITH (
  external_location = 's3a://mybucket/',
  format = 'PARQUET'
);
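
As an optional check, you can verify the column types Trino has registered for the table before querying it:

SHOW COLUMNS FROM minio.mybucket.mytable_parquet;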

8. Query the table
SELECT * FROM minio.mybucket.mytable_parquet;

9. See error:

trino> SELECT * FROM minio.mybucket.mytable_parquet;

Query 20231226_142547_00002_bmngc, FAILED, 1 node
Splits: 17 total, 0 done (0.00%)
1.96 [0 rows, 0B] [0 rows/s, 0B/s]

Query 20231226_142547_00002_bmngc failed: io.trino.spi.type.VarcharType

trino>

10. View log

2023-12-26 08:25:49 java.lang.UnsupportedOperationException: io.trino.spi.type.VarcharType
2023-12-26 08:25:49 at io.trino.spi.type.AbstractType.writeLong(AbstractType.java:91)
2023-12-26 08:25:49 at io.trino.parquet.reader.IntColumnReader.readValue(IntColumnReader.java:32)
2023-12-26 08:25:49 at io.trino.parquet.reader.PrimitiveColumnReader.lambda$readValues$2(PrimitiveColumnReader.java:183)
2023-12-26 08:25:49 at io.trino.parquet.reader.PrimitiveColumnReader.processValues(PrimitiveColumnReader.java:203)
2023-12-26 08:25:49 at io.trino.parquet.reader.PrimitiveColumnReader.readValues(PrimitiveColumnReader.java:182)
2023-12-26 08:25:49 at io.trino.parquet.reader.PrimitiveColumnReader.readPrimitive(PrimitiveColumnReader.java:170)
2023-12-26 08:25:49 at io.trino.parquet.reader.ParquetReader.readPrimitive(ParquetReader.java:262)
2023-12-26 08:25:49 at io.trino.parquet.reader.ParquetReader.readColumnChunk(ParquetReader.java:314)
2023-12-26 08:25:49 at io.trino.parquet.reader.ParquetReader.readBlock(ParquetReader.java:297)
2023-12-26 08:25:49 at io.trino.plugin.hive.parquet.ParquetPageSource$ParquetBlockLoader.load(ParquetPageSource.java:164)
2023-12-26 08:25:49 at io.trino.spi.block.LazyBlock$LazyData.load(LazyBlock.java:381)
2023-12-26 08:25:49 at io.trino.spi.block.LazyBlock$LazyData.getFullyLoadedBlock(LazyBlock.java:360)
2023-12-26 08:25:49 at io.trino.spi.block.LazyBlock.getLoadedBlock(LazyBlock.java:276)
2023-12-26 08:25:49 at io.trino.spi.Page.getLoadedPage(Page.java:279)
2023-12-26 08:25:49 at io.trino.operator.TableScanOperator.getOutput(TableScanOperator.java:304)
2023-12-26 08:25:49 at io.trino.operator.Driver.processInternal(Driver.java:379)
2023-12-26 08:25:49 at io.trino.operator.Driver.lambda$processFor$8(Driver.java:283)
2023-12-26 08:25:49 at io.trino.operator.Driver.tryWithLock(Driver.java:675)
2023-12-26 08:25:49 at io.trino.operator.Driver.processFor(Driver.java:276)
2023-12-26 08:25:49 at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1076)
2023-12-26 08:25:49 at io.trino.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:163)
2023-12-26 08:25:49 at io.trino.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:484)
2023-12-26 08:25:49 at io.trino.$gen.Trino_351____20231226_141624_2.run(Unknown Source)
2023-12-26 08:25:49 at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
2023-12-26 08:25:49 at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
2023-12-26 08:25:49 at java.base/java.lang.Thread.run(Thread.java:834)


SUMMARY
It looks like the IntColumnReader.readValue method is failing with an error:
java.lang.UnsupportedOperationException: io.trino.spi.type.VarcharType

Obviously, from the sample data above, the integer column (column1) only contains integers. I've opened the parquet file in Power BI, so the file itself appears to be valid. I've also checked the COLUMNS_V2 table in the metastore_db and it looks correct (column types are: int, date, string, string, date, double).
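
The stack trace shows IntColumnReader calling writeLong on a VarcharType, i.e. an integer parquet column being decoded into a VARCHAR table column, which looks like a schema mismatch between the file and the table definition. One way to inspect what the file actually contains (using DuckDB here, just as an example tool) is:

-- physical/converted types stored in the parquet file
SELECT name, type, converted_type FROM parquet_schema('myTable.parquet');

-- or let DuckDB infer a relational schema from the file
DESCRIBE SELECT * FROM 'myTable.parquet';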

@raunaqmorarka
Member

Can you try this with a recent version of Trino? The 351 version being used here is very old, and the code in this area has changed a lot since then.
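
If you want to confirm which version the cluster is actually running, from the Trino prompt:

SELECT version();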

@spens
Author

spens commented Dec 28, 2023

Yes, I will try it with the latest version and report back.

@spens
Author

spens commented Jan 6, 2024

I'm going to close this. After further troubleshooting, this does not appear to be related to the Trino version; my issue seems to be with the parquet file itself.

If I create the file by attaching DuckDB to a Postgres instance and dumping the tables as parquet files, I get many errors in Trino with those parquet files. However, if I export the table from Postgres to a CSV (using DBeaver) then use DuckDB to create the parquet file from the CSV, Trino is happy.
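
For reference, the two workflows look roughly like this in DuckDB (the connection string and table names below are placeholders, not my real ones):

-- workflow that produced problematic files: attach Postgres and dump directly
INSTALL postgres;
LOAD postgres;
ATTACH 'dbname=mydb host=localhost user=postgres' AS pg (TYPE postgres);
COPY (SELECT * FROM pg.public.mytable) TO 'mytable.parquet' (FORMAT PARQUET);

-- workflow that works: export the table to CSV first (e.g. from DBeaver), then convert
COPY (SELECT * FROM read_csv_auto('mytable.csv')) TO 'mytable.parquet' (FORMAT PARQUET);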

Somehow my environment gets into a bad state, and then even files that would normally work have problems (that's why the CSV in my issue description reproduced the error for me, but not after I cleared out my Docker volumes and started over).

Both parquet files work in some other tools (like Power BI), but I don't mind going through a CSV step for what I'm doing.

@spens spens closed this as completed Jan 6, 2024