
SELECT * FROM TABLE; (parquet) causes java.lang.UnsupportedOperationException: io.trino.spi.type.VarcharType #20222

Closed
spens opened this issue Dec 26, 2023 · 3 comments

@spens

spens commented Dec 26, 2023

After creating a simple table in Parquet format with an integer column, running SELECT * against the table fails with java.lang.UnsupportedOperationException: io.trino.spi.type.VarcharType.

Steps to reproduce:

1. Create sample data
Save the following content into a file called myTable.csv:

column1,operation_date,name,offer_type,time_dy,rate
1001,1/31/2023,ABC.LOC_01,offerType001,3/24/2023,0.8
1002,2/1/2023,ABC.LOC_01,offerType002,3/25/2023,0.8
1004,2/2/2023,ABC.LOC_01,offerType003,3/26/2023,0.8

2. Convert the sample data to parquet format.
Use this URL: https://observablehq.com/@observablehq/csv-to-parquet
Click the "Choose File" button and select the "myTable.csv" file created above.
After the file uploads, click the "Download myTable.parquet" button.
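
Alternatively, if you prefer to do the conversion locally, a rough equivalent with DuckDB (a different tool than the web page above, so the inferred column types may differ) is:

-- DuckDB shell, run from the directory containing myTable.csv
DESCRIBE SELECT * FROM read_csv_auto('myTable.csv');   -- check what types get inferred
COPY (SELECT * FROM read_csv_auto('myTable.csv'))
  TO 'myTable.parquet' (FORMAT PARQUET);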

3. Run Trino
You can use this project: https://github.com/njanakiev/trino-minio-docker
git clone https://github.com/njanakiev/trino-minio-docker.git
Launch the containers:

cd trino-minio-docker
docker-compose up -d

4. Load the parquet file into S3
Open http://localhost:9000 in your browser (default login credentials: access key = "minio_access_key", secret key = "minio_secret_key").
Click on the + button in the lower right corner of the screen
Click on the icon for "Create Bucket" and use the name "mybucket"
Click on the + button again and choose "Upload file"
Select the myTable.parquet file created earlier

5. Start the Trino command line client
Download the JAR file from here: https://trino.io/download.html
Find the button in the section called "Command Line client" and download the trino-cli-xxx-executable.jar file.
Run the JAR file (assumes you have Java installed and on your PATH), replacing 435 with your version:

java -jar trino-cli-435-executable.jar

6. Create SCHEMA
At the Trino command prompt, run the CREATE SCHEMA statement:

CREATE SCHEMA IF NOT EXISTS minio.mybucket
WITH (location = 's3a://mybucket/');
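
As an optional sanity check, you can confirm the schema exists before moving on:

SHOW SCHEMAS FROM minio;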

7. Create TABLE
Run the following to create the table:

CREATE TABLE IF NOT EXISTS minio.mybucket.mytable_parquet (
column1 INTEGER,
operation_date DATE,
name VARCHAR,
offer_type VARCHAR,
time_dy DATE,
rate DOUBLE
)
WITH (
  external_location = 's3a://mybucket/',
  format = 'PARQUET'
);
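
As an optional check, you can verify the column types Trino has registered for the table before querying it:

SHOW COLUMNS FROM minio.mybucket.mytable_parquet;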

8. Query the table
SELECT * FROM minio.mybucket.mytable_parquet;

9. See error:

trino> SELECT * FROM minio.mybucket.mytable_parquet;

Query 20231226_142547_00002_bmngc, FAILED, 1 node
Splits: 17 total, 0 done (0.00%)
1.96 [0 rows, 0B] [0 rows/s, 0B/s]

Query 20231226_142547_00002_bmngc failed: io.trino.spi.type.VarcharType

trino>

10. View log

2023-12-26 08:25:49 java.lang.UnsupportedOperationException: io.trino.spi.type.VarcharType
2023-12-26 08:25:49 at io.trino.spi.type.AbstractType.writeLong(AbstractType.java:91)
2023-12-26 08:25:49 at io.trino.parquet.reader.IntColumnReader.readValue(IntColumnReader.java:32)
2023-12-26 08:25:49 at io.trino.parquet.reader.PrimitiveColumnReader.lambda$readValues$2(PrimitiveColumnReader.java:183)
2023-12-26 08:25:49 at io.trino.parquet.reader.PrimitiveColumnReader.processValues(PrimitiveColumnReader.java:203)
2023-12-26 08:25:49 at io.trino.parquet.reader.PrimitiveColumnReader.readValues(PrimitiveColumnReader.java:182)
2023-12-26 08:25:49 at io.trino.parquet.reader.PrimitiveColumnReader.readPrimitive(PrimitiveColumnReader.java:170)
2023-12-26 08:25:49 at io.trino.parquet.reader.ParquetReader.readPrimitive(ParquetReader.java:262)
2023-12-26 08:25:49 at io.trino.parquet.reader.ParquetReader.readColumnChunk(ParquetReader.java:314)
2023-12-26 08:25:49 at io.trino.parquet.reader.ParquetReader.readBlock(ParquetReader.java:297)
2023-12-26 08:25:49 at io.trino.plugin.hive.parquet.ParquetPageSource$ParquetBlockLoader.load(ParquetPageSource.java:164)
2023-12-26 08:25:49 at io.trino.spi.block.LazyBlock$LazyData.load(LazyBlock.java:381)
2023-12-26 08:25:49 at io.trino.spi.block.LazyBlock$LazyData.getFullyLoadedBlock(LazyBlock.java:360)
2023-12-26 08:25:49 at io.trino.spi.block.LazyBlock.getLoadedBlock(LazyBlock.java:276)
2023-12-26 08:25:49 at io.trino.spi.Page.getLoadedPage(Page.java:279)
2023-12-26 08:25:49 at io.trino.operator.TableScanOperator.getOutput(TableScanOperator.java:304)
2023-12-26 08:25:49 at io.trino.operator.Driver.processInternal(Driver.java:379)
2023-12-26 08:25:49 at io.trino.operator.Driver.lambda$processFor$8(Driver.java:283)
2023-12-26 08:25:49 at io.trino.operator.Driver.tryWithLock(Driver.java:675)
2023-12-26 08:25:49 at io.trino.operator.Driver.processFor(Driver.java:276)
2023-12-26 08:25:49 at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1076)
2023-12-26 08:25:49 at io.trino.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:163)
2023-12-26 08:25:49 at io.trino.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:484)
2023-12-26 08:25:49 at io.trino.$gen.Trino_351____20231226_141624_2.run(Unknown Source)
2023-12-26 08:25:49 at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
2023-12-26 08:25:49 at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
2023-12-26 08:25:49 at java.base/java.lang.Thread.run(Thread.java:834)


SUMMARY
It looks like the IntColumnReader.readValue method is failing with an error:
java.lang.UnsupportedOperationException: io.trino.spi.type.VarcharType

Obviously, from the sample data above, the integer column (column1) only contains integers. I've opened the parquet file in Power BI, so the file itself appears to be valid. I've also checked the COLUMNS_V2 table in the metastore_db and it looks correct (column types are: int, date, string, string, date, double).
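
The stack trace shows IntColumnReader calling writeLong on a VarcharType, i.e. an integer parquet column being decoded into a VARCHAR table column, which looks like a schema mismatch between the file and the table definition. One way to inspect what the file actually contains (using DuckDB here, just as an example tool) is:

-- physical/converted types stored in the parquet file
SELECT name, type, converted_type FROM parquet_schema('myTable.parquet');

-- or let DuckDB infer a relational schema from the file
DESCRIBE SELECT * FROM 'myTable.parquet';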

@raunaqmorarka
Member

Can you try this with a recent version of Trino? The 351 version being used here is very old, and the code in this area has changed a lot since then.
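
If you want to confirm which version the cluster is actually running, from the Trino prompt:

SELECT version();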

@spens
Author

spens commented Dec 28, 2023

Yes, I will try it with the latest version and report back.

@spens
Author

spens commented Jan 6, 2024

I'm going to close this. After further troubleshooting, this does not appear to be related to the Trino version; my issue seems to be with the parquet file itself.

If I create the file by attaching DuckDB to a Postgres instance and dumping the tables as parquet files, I get many errors in Trino with those parquet files. However, if I export the table from Postgres to a CSV (using DBeaver) then use DuckDB to create the parquet file from the CSV, Trino is happy.
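
For reference, the two workflows look roughly like this in DuckDB (the connection string and table names below are placeholders, not my real ones):

-- workflow that produced problematic files: attach Postgres and dump directly
INSTALL postgres;
LOAD postgres;
ATTACH 'dbname=mydb host=localhost user=postgres' AS pg (TYPE postgres);
COPY (SELECT * FROM pg.public.mytable) TO 'mytable.parquet' (FORMAT PARQUET);

-- workflow that works: export the table to CSV first (e.g. from DBeaver), then convert
COPY (SELECT * FROM read_csv_auto('mytable.csv')) TO 'mytable.parquet' (FORMAT PARQUET);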

Somehow my environment gets into a bad state, and then even files that would normally work have problems (that's why the CSV in my issue description reproduced the error for me, but not after I cleared out my Docker volumes and started over).

Both parquet files work in some other tools (like Power BI), but I don't mind going through a CSV step for what I'm doing.

@spens spens closed this as completed Jan 6, 2024