WARN TaskSetManager: Lost task 44.0 in stage 1368.0 (TID 17283, 172.19.32.66, executor 1): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 167411 ms
Reference article:
https://blog.csdn.net/datadev_sh/article/details/83541153
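Jobs submitted to the Spark cluster kept timing out. The likely cause is that the computation is too heavy.
Solution: add the parameter "--conf spark.network.timeout=10000000" when submitting.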
spark-submit \
--conf spark.network.timeout=10000000 \
My own settings were as follows:
--conf spark.network.timeout=1200s --conf spark.executor.heartbeatInterval=1200s --conf spark.driver.maxResultSize=4g
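One thing worth checking before submitting: the Spark configuration documentation says spark.executor.heartbeatInterval should be significantly less than spark.network.timeout, so setting both to 1200s may not behave as intended. The sketch below is a small pre-flight check that builds the "--conf" arguments only when that condition holds. The helper names to_millis and build_conf_args are hypothetical (they are not part of Spark), and the 60s heartbeat value is purely illustrative, not a recommendation from the original post.

```python
import re

# Multipliers to milliseconds for the time suffixes this sketch understands.
_UNIT_MS = {
    "ms": 1,
    "s": 1_000,
    "m": 60_000,
    "min": 60_000,
    "h": 3_600_000,
    "d": 86_400_000,
}


def to_millis(value):
    """Parse a duration such as '1200s' into milliseconds.

    An explicit unit is required on purpose, so a bare number can never be
    interpreted in a unit the caller did not intend.
    """
    match = re.fullmatch(r"(\d+)([a-z]+)", value.strip().lower())
    if not match or match.group(2) not in _UNIT_MS:
        raise ValueError(f"duration needs a number and a known unit: {value!r}")
    return int(match.group(1)) * _UNIT_MS[match.group(2)]


def build_conf_args(conf):
    """Turn a dict of Spark settings into a list of --conf arguments.

    Hypothetical helper: rejects a heartbeat interval that is not smaller
    than the network timeout when both settings are present.
    """
    timeout = conf.get("spark.network.timeout")
    heartbeat = conf.get("spark.executor.heartbeatInterval")
    if timeout is not None and heartbeat is not None:
        if to_millis(heartbeat) >= to_millis(timeout):
            raise ValueError(
                "spark.executor.heartbeatInterval must be less than "
                "spark.network.timeout"
            )
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


# Example settings; the heartbeat interval here is an assumed value.
example_conf = {
    "spark.network.timeout": "1200s",
    "spark.executor.heartbeatInterval": "60s",
    "spark.driver.maxResultSize": "4g",
}
print(" ".join(["spark-submit"] + build_conf_args(example_conf)))
```

Calling build_conf_args with both timeouts set to 1200s raises a ValueError instead of producing a command line, which makes the mismatch visible before the job is submitted.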
The following error was also reported at the same time:
ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.