After creating a simple table in parquet format with an integer column, running SELECT * on the table causes an error.
Steps to reproduce:
1. Create sample data
Save the following content into a file called myTable.csv:
2. Convert the sample data to parquet format
Use this URL: https://observablehq.com/@observablehq/csv-to-parquet
Click the "Choose File" button
Select the myTable.csv file created above
After the file uploads, click the "Download myTable.parquet" button
3. Run Trino
You can use this project: https://github.com/njanakiev/trino-minio-docker
git clone https://github.com/njanakiev/trino-minio-docker.git
Launch the containers:
4. Load the parquet file into S3
Open http://localhost:9000 in your browser (default login credentials: access key = "minio_access_key", secret key = "minio_secret_key")
Click the + button in the lower-right corner of the screen
Click the "Create Bucket" icon and use the name "mybucket"
Click the + button again and choose "Upload file"
Select the myTable.parquet file created earlier
5. Start the Trino command-line client
Download the jar file from https://trino.io/download.html
Find the "Command line client" section and download the trino-cli-xxx-executable.jar file
Run the jar file (assumes Java is installed and on your path), replacing 435 with your version:
java -jar trino-cli-435-executable.jar
6. Create the schema
At the Trino command prompt, run the create schema command:
CREATE SCHEMA IF NOT EXISTS minio.mybucket
WITH (location = 's3a://mybucket/');
7. Create the table
Run the following to create the table:
CREATE TABLE IF NOT EXISTS minio.mybucket.mytable_parquet (
    column1 INTEGER,
    operation_date DATE,
    name VARCHAR,
    offer_type VARCHAR,
    time_dy DATE,
    rate DOUBLE
)
WITH (
    external_location = 's3a://mybucket/',
    format = 'PARQUET'
);
8. Query the table
SELECT * FROM minio.mybucket.mytable_parquet;
9. See error:
java.lang.UnsupportedOperationException: io.trino.spi.type.VarcharType
    at io.trino.spi.type.AbstractType.writeLong(AbstractType.java:91)
    at io.trino.parquet.reader.IntColumnReader.readValue(IntColumnReader.java:32)
    at io.trino.parquet.reader.PrimitiveColumnReader.lambda$readValues$2(PrimitiveColumnReader.java:183)
    at io.trino.parquet.reader.PrimitiveColumnReader.processValues(PrimitiveColumnReader.java:203)
    at io.trino.parquet.reader.PrimitiveColumnReader.readValues(PrimitiveColumnReader.java:182)
    at io.trino.parquet.reader.PrimitiveColumnReader.readPrimitive(PrimitiveColumnReader.java:170)
    at io.trino.parquet.reader.ParquetReader.readPrimitive(ParquetReader.java:262)
    at io.trino.parquet.reader.ParquetReader.readColumnChunk(ParquetReader.java:314)
    at io.trino.parquet.reader.ParquetReader.readBlock(ParquetReader.java:297)
    at io.trino.plugin.hive.parquet.ParquetPageSource$ParquetBlockLoader.load(ParquetPageSource.java:164)
    at io.trino.spi.block.LazyBlock$LazyData.load(LazyBlock.java:381)
    at io.trino.spi.block.LazyBlock$LazyData.getFullyLoadedBlock(LazyBlock.java:360)
    at io.trino.spi.block.LazyBlock.getLoadedBlock(LazyBlock.java:276)
    at io.trino.spi.Page.getLoadedPage(Page.java:279)
    at io.trino.operator.TableScanOperator.getOutput(TableScanOperator.java:304)
    at io.trino.operator.Driver.processInternal(Driver.java:379)
    at io.trino.operator.Driver.lambda$processFor$8(Driver.java:283)
    at io.trino.operator.Driver.tryWithLock(Driver.java:675)
    at io.trino.operator.Driver.processFor(Driver.java:276)
    at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1076)
    at io.trino.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:163)
    at io.trino.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:484)
    at io.trino.$gen.Trino_351____20231226_141624_2.run(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)
SUMMARY
The IntColumnReader.readValue method is failing with:
java.lang.UnsupportedOperationException: io.trino.spi.type.VarcharType
As the sample data above shows, column1 contains only integers. I've opened the parquet file in Power BI, so the file itself is valid, and the COLUMNS_V2 table in the metastore_db looks correct (column types are: int, date, string, string, date, double).
RESOLUTION
I'm going to close this. After further troubleshooting, the problem is not tied to a Trino version; it seems to be related to the parquet file itself.
If I create the file by attaching DuckDB to a Postgres instance and dumping the tables as parquet files, Trino raises many errors on those files. However, if I export the table from Postgres to a CSV (using DBeaver) and then use DuckDB to create the parquet file from the CSV, Trino is happy.
Somehow my environment gets into a bad state, and then even files that would normally work have problems (that's why the CSV in my issue description reproduced the error for me, but not after I cleared out my Docker volumes and started over).
Both parquet files work in some other tools (like Power BI), but I don't mind going through a CSV step for what I'm doing.