In `t1 left join t2 on t1.id = t2.id`, column `id` has one hot key (for example `0000-00-00`) with 100,000 records in t1, and t2 has another 100,000 records with the same key in column `id`. The join therefore generates 100,000 * 100,000 = 10 billion records, all handled by a single reducer.
The Carbon solution does not help in this case, please check it. -- call hw.
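A common workaround for a single hot join key like this is "salting": append a random suffix to the hot key on the left side and replicate the right side once per suffix, so the cross product is spread over many reducers instead of one. The sketch below is a plain-Python illustration of the idea (all names and the bucket count are hypothetical, not from this issue):

```python
import random
from collections import defaultdict

SALTS = 10  # hypothetical number of buckets to spread the hot key across

# Both sides are dominated by the same hot key, as in the report.
t1 = [("0000-00-00", i) for i in range(1000)]  # skewed left side
t2 = [("0000-00-00", j) for j in range(1000)]  # skewed right side

# Left side: attach a random salt so its rows spread over SALTS buckets.
t1_salted = [((k, random.randrange(SALTS)), v) for k, v in t1]
# Right side: replicate each row once per salt value so every salted
# left key still finds all of its matches.
t2_salted = [((k, s), v) for k, v in t2 for s in range(SALTS)]

# Group by the salted key to simulate the shuffle into reducers.
buckets = defaultdict(lambda: ([], []))
for key, v in t1_salted:
    buckets[key][0].append(v)
for key, v in t2_salted:
    buckets[key][1].append(v)

# Each "reducer" now joins roughly (1000 / SALTS) x 1000 pairs
# instead of the full 1000 x 1000 cross product.
joined = [(k[0], a, b) for k, (lhs, rhs) in buckets.items()
          for a in lhs for b in rhs]
assert len(joined) == 1000 * 1000  # same join result, spread over buckets
```

The join output is unchanged; only the per-reducer work shrinks, at the cost of replicating the right side SALTS times for the hot key.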
hi kongxianghe, we have also found a similar problem: if two tables are joined without de-duplication, the join is very time-consuming and Spark only uses a few executors. We tried these settings:
spark.sql.adaptive.enabled=true
spark.sql.adaptive.skewedJoin.enabled=true
spark.sql.adaptive.skewedPartitionMaxSplits=5
spark.sql.adaptive.skewedPartitionRowCountThreshold=10000000
spark.sql.adaptive.skewedPartitionSizeThreshold=67108864
spark.sql.adaptive.skewedPartitionFactor=5
--- In Spark2x over JDBC these settings have no effect.
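To make the thresholds concrete, here is a simplified model of the skew-detection rule those settings configure. This is an assumed, illustrative reading of the semantics (a partition counts as skewed when it exceeds both an absolute threshold and `skewedPartitionFactor` times the median), not the exact Spark implementation:

```python
# Assumed semantics of the skew settings above, for illustration only.
ROW_COUNT_THRESHOLD = 10_000_000   # skewedPartitionRowCountThreshold
SIZE_THRESHOLD = 67_108_864        # skewedPartitionSizeThreshold (64 MB)
FACTOR = 5                         # skewedPartitionFactor

def is_skewed(rows, size_bytes, median_rows, median_size):
    """A partition is flagged if it is both large in absolute terms
    and FACTOR times larger than the median partition."""
    return (rows > ROW_COUNT_THRESHOLD and rows > FACTOR * median_rows) or \
           (size_bytes > SIZE_THRESHOLD and size_bytes > FACTOR * median_size)

# One hot partition (the 0000-00-00 key: 100,000 x 100,000 rows)
# next to nine ordinary partitions of (rows, bytes):
partitions = [(100_000 * 100_000, 10**11)] + [(50_000, 5_000_000)] * 9
rows_med = sorted(p[0] for p in partitions)[len(partitions) // 2]
size_med = sorted(p[1] for p in partitions)[len(partitions) // 2]

flags = [is_skewed(r, s, rows_med, size_med) for r, s in partitions]
# Only the hot partition is flagged; adaptive execution would then split
# it into at most skewedPartitionMaxSplits (5) pieces.
```

Under this model the reported case is clearly over both thresholds, so if the settings take effect at all, the 0000-00-00 partition should be split.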