Question:

YARN reports a Flink job as finished and succeeded when the Flink job fails

周培

2023-03-14

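I run Flink jobs on YARN, and we submit them to YARN with "flink run" from the command line. One day a Flink job threw an exception, and because we had not enabled a Flink restart strategy, it simply failed. In the YARN application list, however, the job status was shown as "SUCCEEDED", where we expected "FAILED".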

Flink CLI log:

06/12/2018 03:13:37 FlatMap (getTagStorageMapper.flatMap)(23/32) switched to CANCELED 
06/12/2018 03:13:37 GroupReduce (ResultReducer.reduceGroup)(31/32) switched to CANCELED 
06/12/2018 03:13:37 FlatMap (SubClassEDFJoinMapper.flatMap)(29/32) switched to CANCELED 
06/12/2018 03:13:37 CHAIN DataSource (SubClassInventory.AvroInputFormat.createInput) -> FlatMap (SubClassInventoryMapper.flatMap)(27/32) switched to CANCELED 
06/12/2018 03:13:37 GroupReduce (OutputReducer.reduceGroup)(28/32) switched to CANCELED 
06/12/2018 03:13:37 CHAIN DataSource (SubClassInventory.AvroInputFormat.createInput) -> FlatMap (BIMBQMInstrumentMapper.flatMap)(27/32) switched to CANCELED 
06/12/2018 03:13:37 GroupReduce (BIMBQMGovCorpReduce.reduceGroup)(30/32) switched to CANCELED 
06/12/2018 03:13:37 FlatMap (BIMBQMEVMJoinMapper.flatMap)(32/32) switched to CANCELED 
06/12/2018 03:13:37 Job execution switched to status FAILED.
No JobSubmissionResult returned, please make sure you called ExecutionEnvironment.execute()
2018-06-12 03:13:37,625 INFO  org.apache.flink.yarn.YarnClusterClient                       - Sending shutdown request to the Application Master
2018-06-12 03:13:37,625 INFO  org.apache.flink.yarn.YarnClusterClient                       - Start application client.
2018-06-12 03:13:37,630 INFO  org.apache.flink.yarn.ApplicationClient                       - Notification about new leader address akka.tcp://flink@ip-10-97-46-149.tr-fr-nonprod.aws-int.thomsonreuters.com:45663/user/jobmanager with session ID 00000000-0000-0000-0000-000000000000.
2018-06-12 03:13:37,632 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.
2018-06-12 03:13:37,633 INFO  org.apache.flink.yarn.ApplicationClient                       - Received address of new leader akka.tcp://flink@ip-10-97-46-149.tr-fr-nonprod.aws-int.thomsonreuters.com:45663/user/jobmanager with session ID 00000000-0000-0000-0000-000000000000.
2018-06-12 03:13:37,634 INFO  org.apache.flink.yarn.ApplicationClient                       - Disconnect from JobManager null.
2018-06-12 03:13:37,635 INFO  org.apache.flink.yarn.ApplicationClient                       - Trying to register at JobManager akka.tcp://flink@ip-10-97-46-149.tr-fr-nonprod.aws-int.thomsonreuters.com:45663/user/jobmanager.
2018-06-12 03:13:37,688 INFO  org.apache.flink.yarn.ApplicationClient                       - Successfully registered at the ResourceManager using JobManager Actor[akka.tcp://flink@ip-10-97-46-149.tr-fr-nonprod.aws-int.thomsonreuters.com:45663/user/jobmanager#182802345]
2018-06-12 03:13:38,648 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.
2018-06-12 03:13:39,480 INFO  org.apache.flink.yarn.YarnClusterClient                       - Application application_1528772982594_0001 finished with state FINISHED and final state SUCCEEDED at 1528773218662
2018-06-12 03:13:39,480 INFO  org.apache.flink.yarn.YarnClusterClient                       - YARN Client is shutting down
2018-06-12 03:13:39,582 INFO  org.apache.flink.yarn.ApplicationClient                       - Stopped Application client.
2018-06-12 03:13:39,583 INFO  org.apache.flink.yarn.ApplicationClient                       - Disconnect from JobManager Actor[akka.tcp://flink@ip-10-97-46-149.tr-fr-nonprod.aws-int.thomsonreuters.com:45663/user/jobmanager#182802345].

Flink JobManager log:

FlatMap (BIMBQMEVMJoinMapper.flatMap) (32/32) (67a002e07fe799c1624a471340c8cf9d) switched from CANCELING to CANCELED.
Try to restart or fail the job Flink Java Job at Tue Jun 12 03:13:17 UTC 2018 (1086cedb3617feeee8aace29a7fc6bd0) if no longer possible.
Requesting new TaskManager container with 8192 megabytes memory. Pending requests: 1
Job Flink Java Job at Tue Jun 12 03:13:17 UTC 2018 (1086cedb3617feeee8aace29a7fc6bd0) switched from state FAILING to FAILED.
Could not restart the job Flink Java Job at Tue Jun 12 03:13:17 UTC 2018 (1086cedb3617feeee8aace29a7fc6bd0) because the restart strategy prevented it.
Unregistered task manager ip-10-97-44-186/10.97.44.186. Number of registered task managers 31. Number of available slots 31
Stopping JobManager with final application status SUCCEEDED and diagnostics: Flink YARN Client requested shutdown
Shutting down cluster with status SUCCEEDED : Flink YARN Client requested shutdown
Unregistering application from the YARN Resource Manager
Waiting for application to be successfully unregistered.

Can anyone help me understand why YARN says my Flink job "succeeded"?

1 answer

章琛

2023-03-14

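The application status reported in YARN does not reflect the status of the executed job. It reflects the status of the Flink cluster, because the cluster is the YARN application. The final status of the YARN application therefore depends only on whether the Flink cluster shut down properly. In other words, a failed job does not necessarily mean that the Flink cluster failed. These are two different things.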
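Since the YARN final state describes the cluster rather than the job, one way to detect a failed job is to look at the job status that the Flink CLI itself prints. The following is a minimal sketch, assuming the CLI output contains lines in the same form as the log quoted in the question ("Job execution switched to status ..." and "finished with state ... and final state ..."). The function names `summarize` and `job_failed` are hypothetical helpers, not part of Flink or YARN, and other Flink versions may word these log lines differently.

```python
import re

# Patterns modelled on the CLI log lines quoted in the question (assumption:
# other Flink versions may word these lines differently).
JOB_STATUS = re.compile(r"Job execution switched to status (\w+)\.")
YARN_STATUS = re.compile(r"finished with state (\w+) and final state (\w+)")


def summarize(cli_log: str) -> dict:
    """Return the last job status and the YARN final state found in the log."""
    job = JOB_STATUS.findall(cli_log)
    yarn = YARN_STATUS.findall(cli_log)
    return {
        "job_status": job[-1] if job else None,
        "yarn_final_state": yarn[-1][1] if yarn else None,
    }


def job_failed(cli_log: str) -> bool:
    """True if the last job status reported by the CLI is FAILED."""
    return summarize(cli_log)["job_status"] == "FAILED"


# Two lines copied from the CLI log in the question.
sample = (
    "06/12/2018 03:13:37 Job execution switched to status FAILED.\n"
    "2018-06-12 03:13:39,480 INFO  org.apache.flink.yarn.YarnClusterClient"
    "                       - Application application_1528772982594_0001 "
    "finished with state FINISHED and final state SUCCEEDED at 1528773218662\n"
)

print(summarize(sample))
# prints {'job_status': 'FAILED', 'yarn_final_state': 'SUCCEEDED'}
```

A wrapper script around "flink run" could capture the CLI output, call `job_failed` on it, and exit with a non-zero code, so that a scheduler sees the failure even though YARN reports SUCCEEDED.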

