
Dynamic-partition INSERT OVERWRITE fails with org.apache.hadoop.mapred.YarnChild GC overhead limit exceeded

Big Data Tech | 九万里大数据 | 07-27

When running a dynamic-partition INSERT OVERWRITE, if the source table is a large table with many partitions, the task may fail with org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded. Each dynamic partition keeps its own open Parquet writer in the map task, so a large number of target partitions can exhaust the task heap.

YARN error log

2021-07-07 11:38:29,493 INFO [main] org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator: 3 finished. closing... 
2021-07-07 11:38:29,493 INFO [main] org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator: DESERIALIZE_ERRORS:0
2021-07-07 11:38:29,493 INFO [main] org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator: RECORDS_IN:1447355
2021-07-07 11:38:29,493 INFO [main] org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 finished. closing... 
2021-07-07 11:38:29,493 INFO [main] org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator: 1 finished. closing... 
2021-07-07 11:38:29,493 INFO [main] org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator: 2 finished. closing... 
2021-07-07 11:38:29,493 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: FS[2]: records written - 1447835
2021-07-07 11:38:44,703 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded
	at parquet.hadoop.metadata.ParquetMetadata.<clinit>(ParquetMetadata.java:41)
	at parquet.hadoop.ParquetFileWriter.end(ParquetFileWriter.java:468)
	at parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:112)
	at parquet.hadoop.ParquetRecordWriter.close(ParquetRecordWriter.java:112)
	at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.close(ParquetRecordWriterWrapper.java:127)
	at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.close(ParquetRecordWriterWrapper.java:144)
	at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.abortWriters(FileSinkOperator.java:253)
	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:1027)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:199)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:459)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1924)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

2021-07-07 11:38:46,482 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system shutdown complete.
Jul 7, 2021 11:38:16 AM INFO: parquet.hadoop.codec.CodecConfig: Compression set to false
Jul 7, 2021 11:38:16 AM INFO: parquet.hadoop.codec.CodecConfig: Compression: UNCOMPRESSED
Jul 7, 2021 11:38:16 AM INFO: parquet.hadoop.ParquetOutputFormat: Parquet block size to 134217728
Jul 7, 2021 11:38:16 AM INFO: parquet.hadoop.ParquetOutputFormat: Parquet page size to 1048576
Jul 7, 2021 11:38:16 AM INFO: parquet.hadoop.ParquetOutputFormat: Parquet dictionary page size to 1048576
Jul 7, 2021 11:38:16 AM INFO: parquet.hadoop.ParquetOutputFormat: Dictionary is on
Jul 7, 2021 11:38:16 AM INFO: parquet.hadoop.ParquetOutputFormat: Validation is off
Jul 7, 2021 11:38:16 AM INFO: parquet.hadoop.ParquetOutputFormat: Writer version is: PARQUET_1_0
Jul 7, 2021 11:38:16 AM INFO: parquet.hadoop.ParquetOutputFormat: Maximum row group padding size is 8388608 bytes
Jul 7, 2021 11:38:17 AM WARNING: parquet.hadoop.MemoryManager: Total allocation exceeds 50.00% (3,817,865,216 bytes) of heap memory
Scaling row group sizes to 6.01% for 473 writers
Jul 7, 2021 11:38:17 AM INFO: parquet.hadoop.codec.CodecConfig: Compression set to false
Jul 7, 2021 11:38:17 AM INFO: parquet.hadoop.codec.CodecConfig: Compression: UNCOMPRESSED
Jul 7, 2021 11:38:17 AM INFO: parquet.hadoop.ParquetOutputFormat: Parquet block size to 134217728
Jul 7, 2021 11:38:17 AM INFO: parquet.hadoop.ParquetOutputFormat: Parquet page size to 1048576
Jul 7, 2021 11:38:17 AM INFO: parquet.hadoop.ParquetOutputFormat: Parquet dictionary page size to 1048576
Jul 7, 2021 11:38:17 AM INFO: parquet.hadoop.ParquetOutputFormat: Dictionary is on
Jul 7, 2021 11:38:17 AM INFO: parquet.hadoop.ParquetOutputFormat: Validation is off
Jul 7, 2021 11:38:17 AM INFO: parquet.hadoop.ParquetOutputFormat: Writer version is: PARQUET_1_0
Jul 7, 2021 11:38:17 AM INFO: parquet.hadoop.ParquetOutputFormat: Maximum row group padding size is 8388608 bytes
Jul 7, 2021 11:38:17 AM WARNING: parquet.hadoop.MemoryManager: Total allocation exceeds 50.00% (3,817,865,216 bytes) of heap memory
Scaling row group sizes to 6.00% for 474 writers
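The MemoryManager warning above points at the failure mode: with 473 concurrent Parquet writers, the 128 MB row groups are squeezed to about 6% of their configured size before the heap finally runs out. The scaling factor in the log can be reproduced with a quick back-of-the-envelope check; the constants below are taken directly from the log, and the formula (cap divided by writers times block size) is an assumption inferred from the warning text, not quoted from Parquet source:

```python
# Sanity-check the Parquet MemoryManager warning from the log above.
# Assumption: the manager caps total row-group buffer memory at 50% of the
# heap and scales each writer's row group size by cap / (writers * block_size).

HEAP_CAP_BYTES = 3_817_865_216  # "50.00% (3,817,865,216 bytes) of heap memory"
BLOCK_SIZE = 134_217_728        # "Parquet block size to 134217728" (128 MB)

def scaling_percent(num_writers: int) -> float:
    """Fraction of the configured row group size each writer gets, in %."""
    return round(HEAP_CAP_BYTES / (num_writers * BLOCK_SIZE) * 100, 2)

print(scaling_percent(473))  # 6.01 -- matches "Scaling row group sizes to 6.01% for 473 writers"
print(scaling_percent(474))  # 6.00 -- matches "Scaling row group sizes to 6.00% for 474 writers"
```

The implied heap here is roughly 7.6 GB, which is why the fix below raises the map-task memory well beyond that.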

Consider increasing the following four parameters:

mapreduce.map.memory.mb
mapreduce.reduce.memory.mb
 
mapreduce.map.java.opts
mapreduce.reduce.java.opts

Table DDL

Suppose bank.account is the source table and bank.account7 is a target table with the same schema.

CREATE TABLE IF NOT EXISTS bank.account (
  `id_card` int,
  `tran_time` string,
  `name` string,
  `cash` int
  )
partitioned by(ds string)
stored as parquet
TBLPROPERTIES ("parquet.compression"="SNAPPY");
 
INSERT INTO bank.account partition(ds='2020-09-21') values (1000, '2020-09-21 14:30:00', 'Tom', 100);
INSERT INTO bank.account partition(ds='2020-09-20') values (1000, '2020-09-20 14:30:05', 'Tom', 50);
INSERT INTO bank.account partition(ds='2020-09-20') values (1000, '2020-09-20 14:30:10', 'Tom', -25);
INSERT INTO bank.account partition(ds='2020-09-21') values (1001, '2020-09-21 15:30:00', 'Jelly', 200);
INSERT INTO bank.account partition(ds='2020-09-21') values (1001, '2020-09-21 15:30:05', 'Jelly', -50);

Increase the parameters

set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions=60000;
set hive.exec.max.dynamic.partitions.pernode=60000;
set mapreduce.map.memory.mb=65536;
set mapreduce.reduce.memory.mb=65536;
set mapreduce.map.java.opts=-Xmx32768m;
set mapreduce.reduce.java.opts=-Xmx32768m;
 
INSERT OVERWRITE TABLE bank.account7 partition (ds) SELECT id_card, tran_time, name, cash, ds FROM bank.account;
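Note that the -Xmx in *.java.opts must fit inside the YARN container size set by *.memory.mb, with headroom left for off-heap and native memory. The settings above use a conservative 50% ratio (32768m heap in a 65536 MB container); Cloudera's tuning guides commonly suggest something closer to 75-80%. A hypothetical helper sketching that relationship (the 0.8 default is a rule of thumb, not a value from this article):

```python
# Hypothetical helper for deriving -Xmx from the YARN container size.
# Assumption: heap ratio of 0.8 (a common rule of thumb); the article
# itself uses a more conservative 0.5.

def java_opts_for(container_mb: int, heap_ratio: float = 0.8) -> str:
    """Return a -Xmx flag leaving headroom for off-heap/native memory."""
    heap_mb = int(container_mb * heap_ratio)
    return f"-Xmx{heap_mb}m"

print(java_opts_for(65536))        # -Xmx52428m
print(java_opts_for(65536, 0.5))   # the article's -Xmx32768m
```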

References

https://zhuanlan.zhihu.com/p/90953401
https://docs.cloudera.com/documentation/enterprise/5-16-x/topics/cdh_ig_yarn_tuning.html
https://docs.cloudera.com/documentation/enterprise/5-16-x/topics/cm_mc_yarn_service.html
https://docs.cloudera.com/documentation/enterprise/5-16-x/topics/cdh_ig_mapreduce_to_yarn_migrate1.html
