Question:

Spark standalone cluster tuning

云弘壮

2023-03-14

--driver-memory = 7GB (default - 1core is used)
--worker-memory = 43GB (all remaining cores - 7 cores)

17/12/14 03:29:39 WARN HeartbeatReceiver: Removing executor 2 with no recent heartbeats: 3658237 ms exceeds timeout 3600000 ms  
17/12/14 03:29:39 ERROR TaskSchedulerImpl: Lost executor 2 on 10.150.143.81: Executor heartbeat timed out after 3658237 ms  
17/12/14 03:29:39 WARN TaskSetManager: Lost task 23.0 in stage 316.0 (TID 9449, 10.150.143.81, executor 2): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 3658237 ms  
17/12/14 03:29:39 WARN TaskSetManager: Lost task 9.0 in stage 318.0 (TID 9459, 10.150.143.81, executor 2): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 3658237 ms  
17/12/14 03:29:39 WARN TaskSetManager: Lost task 8.0 in stage 318.0 (TID 9458, 10.150.143.81, executor 2): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 3658237 ms  
17/12/14 03:29:39 WARN TaskSetManager: Lost task 5.0 in stage 318.0 (TID 9455, 10.150.143.81, executor 2): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 3658237 ms  
17/12/14 03:29:39 WARN TaskSetManager: Lost task 7.0 in stage 318.0 (TID 9457, 10.150.143.81, executor 2): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 3658237 ms
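The important numbers are in the HeartbeatReceiver line. The following minimal Python sketch pulls out the executor id, the time since the last heartbeat and the configured timeout. The regex and the function name parse_heartbeat_warning are illustrative and written only against the log format shown above:

```python
import re

# Matches the HeartbeatReceiver warning format shown in the log above.
HEARTBEAT_RE = re.compile(
    r"Removing executor (\d+) with no recent heartbeats: "
    r"(\d+) ms exceeds timeout (\d+) ms"
)

def parse_heartbeat_warning(line):
    """Return (executor_id, elapsed_ms, timeout_ms), or None if the line does not match."""
    m = HEARTBEAT_RE.search(line)
    if m is None:
        return None
    executor_id, elapsed_ms, timeout_ms = (int(g) for g in m.groups())
    return executor_id, elapsed_ms, timeout_ms

line = ("17/12/14 03:29:39 WARN HeartbeatReceiver: Removing executor 2 with no recent "
        "heartbeats: 3658237 ms exceeds timeout 3600000 ms")
executor_id, elapsed_ms, timeout_ms = parse_heartbeat_warning(line)
print(f"executor {executor_id}: silent {elapsed_ms / 1000:.1f} s, "
      f"timeout {timeout_ms / 1000:.0f} s, over by {(elapsed_ms - timeout_ms) / 1000:.1f} s")
# prints: executor 2: silent 3658.2 s, timeout 3600 s, over by 58.2 s
```

The timeout here is 3,600,000 ms (one hour), so executor 2 went more than an hour without a heartbeat. That suggests the executor process was stalled (for example by heavy swapping, as the answer below argues) rather than merely a little late.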

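The application is not especially memory-hungry: it performs two joins and writes the dataset to a directory. The same code runs in spark-shell without any failures.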

I am looking for cluster tuning or any configuration settings that will reduce how often executors get killed.

1 answer

秦宏硕

2023-03-14

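First, I would advise against allocating a total of 50 GB of RAM to any application if your instance has exactly 50 GB of RAM. The rest of the system's applications also need some RAM to work, and RAM that applications do not use is used by the system to cache files and reduce disk reads. The JVM itself also has a small memory overhead outside its heap.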
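To make the advice concrete, here is a rough sizing sketch in Python. The helper suggest_worker_memory_gb is hypothetical, and both the OS reserve and the JVM overhead fraction are illustrative assumptions that you pick yourself, not Spark defaults. The idea is to subtract a reserve for the OS and page cache, then leave headroom for off-heap JVM memory on top of the driver and worker heaps:

```python
def suggest_worker_memory_gb(total_ram_gb, driver_gb, os_reserve_gb, jvm_overhead_fraction):
    """Hypothetical sizing helper (assumed model, not a Spark formula).

    Keeps os_reserve_gb free for the OS and page cache, and assumes every JVM
    needs an extra jvm_overhead_fraction of its heap size outside the heap.
    """
    available = total_ram_gb - os_reserve_gb
    # Solve (driver + worker) * (1 + overhead) = available for the worker heap.
    heap_budget = available / (1 + jvm_overhead_fraction)
    worker_gb = heap_budget - driver_gb
    if worker_gb <= 0:
        raise ValueError("not enough RAM left for the worker after reserves")
    return worker_gb

# 50 GB machine, 7 GB driver, 6 GB kept for the OS, 10% assumed JVM overhead.
print(round(suggest_worker_memory_gb(50, 7, 6, 0.10), 1))
```

With these assumed reserves the worker gets roughly 33 GB instead of the 43 GB in the question. With no reserve and no overhead the helper returns 43 GB, which reproduces the original "everything to Spark" split.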

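If your Spark job uses all of the memory, your instance will inevitably swap, and once it starts swapping it will start to misbehave. You can easily check memory usage, and see whether the server is swapping, by running the htop command. You should also make sure swappiness (vm.swappiness) is reduced to 0, so that the system does not swap unless it really has to.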
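htop is interactive. If you want a check you can script on each worker, the following minimal Python sketch reads the standard Linux procfs files /proc/meminfo and /proc/sys/vm/swappiness. It assumes a Linux host, and the function names are illustrative:

```python
import os

def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field_name: value_in_kB}."""
    values = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        parts = rest.split()
        if parts:
            values[key.strip()] = int(parts[0])
    return values

def swap_used_kb(meminfo):
    """Swap currently in use, in kB (SwapTotal minus SwapFree)."""
    return meminfo.get("SwapTotal", 0) - meminfo.get("SwapFree", 0)

def read_linux_memory_status():
    """Linux only: return (swap_used_kb, vm.swappiness) read from /proc."""
    with open("/proc/meminfo") as f:
        meminfo = parse_meminfo(f.read())
    with open("/proc/sys/vm/swappiness") as f:
        swappiness = int(f.read().strip())
    return swap_used_kb(meminfo), swappiness

if __name__ == "__main__" and os.path.exists("/proc/meminfo") \
        and os.path.exists("/proc/sys/vm/swappiness"):
    used, swappiness = read_linux_memory_status()
    print(f"swap in use: {used} kB, vm.swappiness = {swappiness}")
```

A non-zero swap figure while a Spark job runs is the symptom the answer describes. On most Linux systems swappiness can be lowered as root with sysctl vm.swappiness=0 and made persistent in /etc/sysctl.conf.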

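That is all I can say given the information you provided. If this does not help, consider providing more information, such as the complete and exact parameters of your Spark job.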
