
Hive partition parsing error #19420

Open
wangmiao1002 opened this issue Oct 17, 2023 · 3 comments

@wangmiao1002

Hi community,

I have a Hive external table with four levels of partitioning.
logtype, mod, and date are all partition columns.
When I run the SQL below, the partition columns are not recognized, so a full partition scan is performed and the query is delayed. Because the table has a large number of partitions, the query is slow. I also captured the Hive metastore slow log, which contains all the SQL statements issued for this query. How can I avoid this issue?
Environment:
Trino: 426
Hive: 2.1.1-cdh6.3.2

[Three screenshots were attached here; their contents are not available as text.]

@wangmiao1002
Author

We previously ran these queries with Presto. Because the cluster uses EC (erasure coding) to store data more compactly, we switched to Trino. Presto was able to parse the partition information during queries.

@electrum
Member

Can you share the output of SHOW CREATE TABLE applog_online, at least for the partition columns?

@wangmiao1002
Author

wangmiao1002 commented Oct 19, 2023

Can you share the output of SHOW CREATE TABLE applog_online, at least for the partition columns?

Sure, here is the table definition. The partition column date uses the DATE type. I have also added the cache parameters in Trino, but queries are still slow. Could you please advise how to avoid this? Thank you.

CREATE EXTERNAL TABLE `applog_online`(
	  `message` string)
	PARTITIONED BY ( 
	  `date` date, 
	  `mod` string, 
	  `logtype` string, 
	  `host` string)
	ROW FORMAT SERDE 
	  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 
	WITH SERDEPROPERTIES ( 
	  'field.delim'='\t', 
	  'serialization.format'='\t') 
	STORED AS INPUTFORMAT 
	  'org.apache.hadoop.mapred.TextInputFormat' 
	OUTPUTFORMAT 
	  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
	LOCATION
	  'hdfs://nameservice1/user/hive/warehouse/applog_online'
	TBLPROPERTIES (
	  'transient_lastDdlTime'='1678851016')
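To illustrate what partition pruning is expected to do for this table, here is a minimal conceptual sketch in Python. It is not Trino's implementation. It only assumes Hive's key=value/... partition naming with the four partition columns in the PARTITIONED BY order above. The partition values (web, access, h1, ...) and the helper names parse_partition_name and prune are hypothetical. With predicates on date, mod, and logtype, only the matching partitions are kept. With no usable predicate, every partition is returned, which corresponds to the full partition scan reported in this issue.

```python
# Partition columns of applog_online, in PARTITIONED BY order.
PARTITION_COLUMNS = ["date", "mod", "logtype", "host"]


def parse_partition_name(name: str) -> dict[str, str]:
    """Split a Hive-style partition name such as
    'date=2023-10-17/mod=web/logtype=access/host=h1' into a dict.
    Hive's escaping of special characters in values is ignored here."""
    values = {}
    for segment in name.split("/"):
        key, _, value = segment.partition("=")
        values[key] = value
    return values


def prune(names: list[str], **predicates: str) -> list[str]:
    """Keep only partitions whose values equal every given predicate.

    Columns without a predicate (here, host) match any value; with no
    predicates at all, every partition is kept (a full partition scan)."""
    unknown = set(predicates) - set(PARTITION_COLUMNS)
    if unknown:
        raise ValueError(f"not a partition column: {sorted(unknown)}")
    return [
        name
        for name in names
        if all(parse_partition_name(name).get(col) == val
               for col, val in predicates.items())
    ]


# Hypothetical partition names, for illustration only.
names = [
    "date=2023-10-16/mod=web/logtype=access/host=h1",
    "date=2023-10-17/mod=web/logtype=access/host=h1",
    "date=2023-10-17/mod=web/logtype=access/host=h2",
    "date=2023-10-17/mod=app/logtype=error/host=h1",
]

print(len(prune(names)))  # → 4 (no predicates: every partition)
# Keeps only the h1 and h2 partitions for 2023-10-17/web/access.
print(prune(names, date="2023-10-17", mod="web", logtype="access"))
```

In the table, date is a DATE column. This sketch compares the raw strings taken from the partition names, which is a simplification.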
