org.apache.flink.client.deployment.ClusterDeploymentException: Couldn't deploy Yarn session cluster

吕华彩
2023-12-01

The Flink cluster fails to start with the following exception:

org.apache.flink.client.deployment.ClusterDeploymentException: Couldn't deploy Yarn session cluster
        at org.apache.flink.yarn.YarnClusterDescriptor.deploySessionCluster(YarnClusterDescriptor.java:381)
        at org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:549)
        at org.apache.flink.yarn.cli.FlinkYarnSessionCli.lambda$main$5(FlinkYarnSessionCli.java:786)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754)
        at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
        at org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:786)
Caused by: org.apache.flink.yarn.YarnClusterDescriptor$YarnDeploymentException: The YARN application unexpectedly switched to state FAILED during deployment. 
Diagnostics from YARN: Application application_1616074139205_0001 failed 1 times due to AM Container for appattempt_1616074139205_0001_000001 exited with  exitCode: -103
For more detailed output, check application tracking page:http://hdfs-04:8088/cluster/app/application_1616074139205_0001Then, click on links to logs of each attempt.
Diagnostics: Container [pid=3042,containerID=container_1616074139205_0001_01_000001] is running beyond virtual memory limits. Current usage: 312.1 MB of 1 GB physical memory used; 2.2 GB of 2.1 GB virtual memory used. Killing container.
Dump of the process-tree for container_1616074139205_0001_01_000001 :
        |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
        |- 3059 3042 3042 3042 (java) 493 727 2293784576 79582 /usr/java/default/bin/java -Xms424m -Xmx424m -Dlog.file=/opt/bigdata/hadoop-2.7.7/logs/userlogs/application_1616074139205_0001/container_1616074139205_0001_01_000001/jobmanager.log -Dlog4j.configuration=file:log4j.properties org.apache.flink.yarn.entrypoint.YarnSessionClusterEntrypoint 
        |- 3042 3040 3042 3042 (bash) 0 2 115908608 304 /bin/bash -c /usr/java/default/bin/java -Xms424m -Xmx424m -Dlog.file=/opt/bigdata/hadoop-2.7.7/logs/userlogs/application_1616074139205_0001/container_1616074139205_0001_01_000001/jobmanager.log -Dlog4j.configuration=file:log4j.properties org.apache.flink.yarn.entrypoint.YarnSessionClusterEntrypoint 1> /opt/bigdata/hadoop-2.7.7/logs/userlogs/application_1616074139205_0001/container_1616074139205_0001_01_000001/jobmanager.out 2> /opt/bigdata/hadoop-2.7.7/logs/userlogs/application_1616074139205_0001/container_1616074139205_0001_01_000001/jobmanager.err

Command used: yarn-session.sh -n 3 -s 3 -nm flink-session -d -q

Cause: "running beyond virtual memory limits. Current usage: 312.1 MB of 1 GB physical memory used; 2.2 GB of 2.1 GB virtual memory used. Killing container" — clearly the container was killed because it ran out of memory during application startup. Note that physical memory usage was fine (312.1 MB of 1 GB); it was the virtual memory usage (2.2 GB) that exceeded the 2.1 GB limit.

Fix: the ApplicationMaster container did not get enough virtual memory. YARN computes a container's virtual memory limit as its physical memory allocation multiplied by yarn.nodemanager.vmem-pmem-ratio, which defaults to 2.1 (for MapReduce tasks the physical allocation is mapreduce.map.memory.mb). Here the AM container was allocated 1 GB, so its limit was 1 GB × 2.1 = 2.1 GB, and the 2.2 GB actually used exceeded it.
So the most direct fix is to add the following to Hadoop's yarn-site.xml, raising the ratio to 3.1:

<property>
   <name>yarn.nodemanager.vmem-pmem-ratio</name>
   <value>3.1</value>
</property>
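
With the ratio at 3.1, the same 1 GB container gets 1 GB × 3.1 ≈ 3.1 GB of virtual memory, comfortably above the 2.2 GB observed. The change only takes effect once the NodeManagers re-read the configuration; a minimal sketch, assuming the hadoop-2.7.7 install path seen in the logs above, run on every NodeManager host:

# Restart the NodeManager so the new vmem-pmem ratio is picked up
/opt/bigdata/hadoop-2.7.7/sbin/yarn-daemon.sh stop nodemanager
/opt/bigdata/hadoop-2.7.7/sbin/yarn-daemon.sh start nodemanager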

Of course, you can also set mapreduce.map.memory.mb to a larger value by editing mapred-site.xml — since the virtual memory limit is the physical allocation times the ratio, a larger allocation raises the ceiling as well. This must be sized to the machine's actual memory, though, and should not be set too large:

<property>
   <name>mapreduce.map.memory.mb</name>
   <value>2048</value>
</property>
<property>
   <name>mapreduce.reduce.memory.mb</name>
   <value>2048</value>
</property>
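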

Other approaches, such as disabling the virtual memory check entirely (yarn.nodemanager.vmem-check-enabled=false), are not recommended: with the check off, containers can overrun the node's memory and cause OOM problems.
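
For reference only, disabling the check would look like this in yarn-site.xml (again, not recommended):

<property>
   <name>yarn.nodemanager.vmem-check-enabled</name>
   <value>false</value>
</property>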

 
