I am trying to submit a Python application to a cluster (a 3-node cluster on AWS EMR) with the spark-submit command.
Surprisingly, I cannot see any of the expected output from the task. I then simplified my application to just print a fixed string, and I still don't see any of the printed messages. The application and the command are attached below. Hope someone can help me find the cause. Thanks a lot!
import sys
from pyspark import SparkContext

if __name__ == "__main__":
    sc = SparkContext(appName="sparkSubmitTest")
    for item in range(50):
        print "I love this game!"
    sc.stop()
./spark/bin/spark-submit --master yarn-cluster ./submit-test.py
[hadoop@ip-172-31-34-124 ~]$ ./spark/bin/spark-submit --master yarn-cluster ./submit-test.py
15/08/04 23:50:25 INFO client.RMProxy: Connecting to ResourceManager at /172.31.34.124:9022
15/08/04 23:50:25 INFO yarn.Client: Requesting a new application from cluster with 2 NodeManagers
15/08/04 23:50:25 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (11520 MB per container)
15/08/04 23:50:25 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
15/08/04 23:50:25 INFO yarn.Client: Setting up container launch context for our AM
15/08/04 23:50:25 INFO yarn.Client: Preparing resources for our AM container
15/08/04 23:50:25 INFO yarn.Client: Uploading resource file:/home/hadoop/.versions/spark-1.3.1.e/lib/spark-assembly-1.3.1-hadoop2.4.0.jar -> hdfs://172.31.34.124:9000/user/hadoop/.sparkStaging/application_1438724051797_0007/spark-assembly-1.3.1-hadoop2.4.0.jar
15/08/04 23:50:26 INFO metrics.MetricsSaver: MetricsConfigRecord disabledInCluster: false instanceEngineCycleSec: 60 clusterEngineCycleSec: 60 disableClusterEngine: false maxMemoryMb: 3072 maxInstanceCount: 500
15/08/04 23:50:26 INFO metrics.MetricsSaver: Created MetricsSaver j-2LU0EQ3JH58CK:i-048c1ded:SparkSubmit:24928 period:60 /mnt/var/em/raw/i-048c1ded_20150804_SparkSubmit_24928_raw.bin
15/08/04 23:50:27 INFO metrics.MetricsSaver: 1 aggregated HDFSWriteDelay 1053 raw values into 1 aggregated values, total 1
15/08/04 23:50:27 INFO yarn.Client: Uploading resource file:/home/hadoop/submit-test.py -> hdfs://172.31.34.124:9000/user/hadoop/.sparkStaging/application_1438724051797_0007/submit-test.py
15/08/04 23:50:27 INFO yarn.Client: Setting up the launch environment for our AM container
15/08/04 23:50:27 INFO spark.SecurityManager: Changing view acls to: hadoop
15/08/04 23:50:27 INFO spark.SecurityManager: Changing modify acls to: hadoop
15/08/04 23:50:27 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
15/08/04 23:50:27 INFO yarn.Client: Submitting application 7 to ResourceManager
15/08/04 23:50:27 INFO impl.YarnClientImpl: Submitted application application_1438724051797_0007
15/08/04 23:50:28 INFO yarn.Client: Application report for application_1438724051797_0007 (state: ACCEPTED)
15/08/04 23:50:28 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1438732227551
final status: UNDEFINED
tracking URL: http://172.31.34.124:9046/proxy/application_1438724051797_0007/
user: hadoop
15/08/04 23:50:29 INFO yarn.Client: Application report for application_1438724051797_0007 (state: ACCEPTED)
15/08/04 23:50:30 INFO yarn.Client: Application report for application_1438724051797_0007 (state: ACCEPTED)
15/08/04 23:50:31 INFO yarn.Client: Application report for application_1438724051797_0007 (state: ACCEPTED)
15/08/04 23:50:32 INFO yarn.Client: Application report for application_1438724051797_0007 (state: ACCEPTED)
15/08/04 23:50:33 INFO yarn.Client: Application report for application_1438724051797_0007 (state: ACCEPTED)
15/08/04 23:50:34 INFO yarn.Client: Application report for application_1438724051797_0007 (state: RUNNING)
15/08/04 23:50:34 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: ip-172-31-39-205.ec2.internal
ApplicationMaster RPC port: 0
queue: default
start time: 1438732227551
final status: UNDEFINED
tracking URL: http://172.31.34.124:9046/proxy/application_1438724051797_0007/
user: hadoop
15/08/04 23:50:35 INFO yarn.Client: Application report for application_1438724051797_0007 (state: RUNNING)
15/08/04 23:50:36 INFO yarn.Client: Application report for application_1438724051797_0007 (state: RUNNING)
15/08/04 23:50:37 INFO yarn.Client: Application report for application_1438724051797_0007 (state: RUNNING)
15/08/04 23:50:38 INFO yarn.Client: Application report for application_1438724051797_0007 (state: RUNNING)
15/08/04 23:50:39 INFO yarn.Client: Application report for application_1438724051797_0007 (state: RUNNING)
15/08/04 23:50:40 INFO yarn.Client: Application report for application_1438724051797_0007 (state: RUNNING)
15/08/04 23:50:41 INFO yarn.Client: Application report for application_1438724051797_0007 (state: RUNNING)
15/08/04 23:50:42 INFO yarn.Client: Application report for application_1438724051797_0007 (state: RUNNING)
15/08/04 23:50:43 INFO yarn.Client: Application report for application_1438724051797_0007 (state: RUNNING)
15/08/04 23:50:44 INFO yarn.Client: Application report for application_1438724051797_0007 (state: FINISHED)
15/08/04 23:50:44 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: ip-172-31-39-205.ec2.internal
ApplicationMaster RPC port: 0
queue: default
start time: 1438732227551
final status: SUCCEEDED
tracking URL: http://172.31.34.124:9046/proxy/application_1438724051797_0007/
user: hadoop
Another quick and dirty thing you can do is pipe the command output to a text file with the tee command. Note the 2>&1: spark-submit writes its client-side progress logs to stderr, so redirect that into the pipe as well or the file will capture almost nothing:
./spark/bin/spark-submit --master yarn-cluster ./submit-test.py 2>&1 | tee temp_output.file
Posting my answer here, since I didn't find it anywhere else.
My first attempt was yarn logs -applicationId application_xxxx, which told me "Log aggregation has not completed or is not enabled".
Here are the steps to dig out the printed messages:
1. Follow the tracking URL printed at the end of the run, http://172.31.34.124:9046/proxy/application_1438724051797_0007/ (reverse SSH and a proxy need to be set up to reach it).
2. On the application overview page, find the AppMaster node id: ip-172-31-41-6.ec2.internal:9035.
3. Go back to the AWS EMR cluster list and look up the public DNS for that node.
4. SSH from the driver node into this AppMaster node, using the same key pair.
5. cd /var/log/hadoop/userlogs/application_1438796304215_0005/container_1438796304215_0005_01_000001 (always choose the first container: it hosts the ApplicationMaster, and in yarn-cluster mode the driver, whose stdout holds the printed messages, runs inside it).
6. cat stdout
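The manual digging above is only needed because log aggregation was off, which is exactly what the "Log aggregation has not completed or is not enabled" message was saying. If you can edit the cluster's yarn-site.xml, enabling aggregation should let yarn logs -applicationId fetch all container logs, including the driver's stdout, once the application finishes (property name per Hadoop 2.x; the NodeManagers need a restart for it to take effect):

```xml
<!-- yarn-site.xml: have NodeManagers upload container logs to HDFS
     when the application completes, so `yarn logs` can read them -->
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
```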
As you can see, this is quite convoluted. Writing the output to a file hosted in S3 would probably be a better option.
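A minimal sketch of that approach, assuming a writable S3 bucket (the bucket name, output path, and the ON_EMR guard below are placeholders, not part of the original setup): instead of print statements on the executors, whose output lands in per-container stdout files, materialize the records and let Spark write them somewhere durable.

```python
import os

def make_lines(n=50):
    # The records we want to keep -- a pure function, easy to check locally.
    return ["I love this game!"] * n

def run_job(output_path):
    # Requires pyspark, i.e. launching via spark-submit on the cluster.
    from pyspark import SparkContext
    sc = SparkContext(appName="sparkSubmitTest")
    # Each executor writes its partition directly to the output path;
    # nothing is buried in container stdout files the way print output is.
    sc.parallelize(make_lines()).saveAsTextFile(output_path)
    sc.stop()

if __name__ == "__main__" and os.environ.get("ON_EMR"):
    # ON_EMR is a hypothetical guard so this sketch can be imported off-cluster.
    run_job("s3://your-bucket/submit-test-output")  # placeholder bucket/path
```

After the job succeeds, the output can be read back from S3 (e.g. via the AWS console or hadoop fs -cat) instead of hunting through YARN container logs.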