Cannot read compressed(4mc) TEXT file after upgrading from 410 to 411 #19084

Closed
pangyifish opened this issue Sep 19, 2023 · 3 comments

Comments

@pangyifish

pangyifish commented Sep 19, 2023

We added a 4mc compression jar. 4mc-compressed text files were readable before 411 (we upgraded from 410), but reads now fail with:

Error opening Hive split s3a://xxx/xxxxxxxx.4mz (offset=23042995, length=23042995) using org.apache.hadoop.mapred.TextInputFormat: Cannot seek in FourMzUltraCodec compressed stream', 'errorCode': 16777219, 'errorName': 'HIVE_CANNOT_OPEN_SPLIT', 'errorType': 'EXTERNAL', 'failureInfo': {'type': 'io.trino.spi.TrinoException'
@pangyifish
Author

io.trino.spi.TrinoException: Error opening Hive split s3a://xxxx.4mz (offset=23042995, length=23042995) using org.apache.hadoop.mapred.TextInputFormat: Cannot seek in FourMzUltraCodec compressed stream
        at io.trino.plugin.hive.util.HiveUtil.createRecordReader(HiveUtil.java:289)
        at io.trino.plugin.hive.GenericHiveRecordCursorProvider.lambda$createRecordCursor$1(GenericHiveRecordCursorProvider.java:96)
        at io.trino.hdfs.authentication.NoHdfsAuthentication.doAs(NoHdfsAuthentication.java:25)
        at io.trino.hdfs.HdfsEnvironment.doAs(HdfsEnvironment.java:93)
        at io.trino.plugin.hive.GenericHiveRecordCursorProvider.createRecordCursor(GenericHiveRecordCursorProvider.java:95)
        at io.trino.plugin.hive.HivePageSourceProvider.createHivePageSource(HivePageSourceProvider.java:255)
        at io.trino.plugin.hive.HivePageSourceProvider.createPageSource(HivePageSourceProvider.java:154)
        at io.trino.plugin.base.classloader.ClassLoaderSafeConnectorPageSourceProvider.createPageSource(ClassLoaderSafeConnectorPageSourceProvider.java:49)
        at io.trino.split.PageSourceManager.createPageSource(PageSourceManager.java:62)
        at io.trino.operator.TableScanOperator.getOutput(TableScanOperator.java:298)
        at io.trino.operator.Driver.processInternal(Driver.java:402)
        at io.trino.operator.Driver.lambda$process$8(Driver.java:305)
        at io.trino.operator.Driver.tryWithLock(Driver.java:701)
        at io.trino.operator.Driver.process(Driver.java:297)
        at io.trino.operator.Driver.processForDuration(Driver.java:268)
        at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:845)
        at io.trino.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:165)
        at io.trino.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:537)
        at io.trino.$gen.Trino_411____20230919_030613_2.run(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.io.IOException: Cannot seek in FourMzUltraCodec compressed stream
        at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:126)
        at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
        at io.trino.plugin.hive.util.HiveUtil.createRecordReader(HiveUtil.java:269)
        ... 21 more
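
For context on where this message likely originates: as far as I can tell, Hadoop's `LineRecordReader` raises it when a split starts at a non-zero offset and the file's codec is not recognised as splittable. The sketch below models that guard in plain Java. `SplitCheckSketch`, `Codec`, and `openSplit` are hypothetical names for illustration only, not Hadoop or Trino APIs; the codec name and offsets are copied from the error above.

```java
import java.io.IOException;

public class SplitCheckSketch {
    /** Hypothetical stand-in for a compression codec; not a real Hadoop type. */
    static final class Codec {
        final String simpleName;
        final boolean splittable;

        Codec(String simpleName, boolean splittable) {
            this.simpleName = simpleName;
            this.splittable = splittable;
        }
    }

    /**
     * Simplified model of the guard a line-oriented reader applies before
     * reading one split of a compressed file (assumed behaviour).
     */
    static String openSplit(Codec codec, long start, long length) throws IOException {
        if (codec.splittable) {
            // A splittable codec can resynchronise at a block boundary near 'start'.
            return "read compressed range starting near offset " + start;
        }
        if (start != 0) {
            // A non-splittable stream can only be decoded from its first byte.
            throw new IOException("Cannot seek in " + codec.simpleName + " compressed stream");
        }
        return "read whole stream from offset 0";
    }

    public static void main(String[] args) {
        Codec codec = new Codec("FourMzUltraCodec", false);
        try {
            openSplit(codec, 23042995L, 23042995L);
        } catch (IOException e) {
            // prints: Cannot seek in FourMzUltraCodec compressed stream
            System.out.println(e.getMessage());
        }
    }
}
```

The non-zero `offset=23042995` in the error is consistent with this guard: the reader was handed a split that begins mid-file for a codec it did not treat as splittable.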

@raunaqmorarka
Member

cc: @dain @electrum

@electrum
Member

electrum commented Oct 9, 2023

Apologies for breaking your use case. As part of the project to decouple Trino from Hadoop and Hive codebases, custom compression codecs no longer work (they were never officially supported or tested).

This could likely be supported by adding a native Trino implementation. I recommend starting at io.trino.hive.formats.compression.CompressionKind and seeing how the existing implementations work.

@dain closed this as completed Oct 30, 2023