Question:

Unable to start a Flink HA cluster with ZooKeeper

龙景澄
2023-03-14

I am trying to set up a Flink HA cluster (ZooKeeper mode), but the TaskManager cannot find the JobManager.

Here is the architecture:

- Machine 1 : Job Manager + Zookeeper
- Machine 2 : Task Manager

masters:

Machine1

slaves:

Machine2

flink-conf.yaml:

#jobmanager.rpc.address: localhost
jobmanager.rpc.port: 6123
blob.server.port: 50100-50200
taskmanager.data.port: 6121
high-availability: zookeeper
high-availability.zookeeper.quorum: Machine1:2181
high-availability.zookeeper.path.root: /flink-1.5.1
high-availability.cluster-id: /default_b
high-availability.storageDir: file:///shareflink/recovery
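As a reading aid for the configuration above, here is a minimal Python sketch that separates the active settings from the commented-out ones. The function name `parse_flat_conf` is hypothetical, and the parser only handles flat `key: value` lines like the ones shown; it is not how Flink itself loads the file.

```python
# The configuration exactly as posted above.
CONF_TEXT = """\
#jobmanager.rpc.address: localhost
jobmanager.rpc.port: 6123
blob.server.port: 50100-50200
taskmanager.data.port: 6121
high-availability: zookeeper
high-availability.zookeeper.quorum: Machine1:2181
high-availability.zookeeper.path.root: /flink-1.5.1
high-availability.cluster-id: /default_b
high-availability.storageDir: file:///shareflink/recovery
"""

def parse_flat_conf(text):
    """Split flat 'key: value' lines into (active, commented_out) dictionaries."""
    active, commented = {}, {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        target = active
        if line.startswith("#"):
            # A commented-out setting: remember it separately.
            line = line.lstrip("#").strip()
            target = commented
        # Split on the first ": " so values such as Machine1:2181 stay intact.
        key, sep, value = line.partition(": ")
        if sep:
            target[key.strip()] = value.strip()
    return active, commented

active, commented = parse_flat_conf(CONF_TEXT)
print("active HA mode:", active.get("high-availability"))
print("commented out:", sorted(commented))
```

With the posted file, `jobmanager.rpc.address` shows up only among the commented-out keys, so no JobManager address is set explicitly in this file.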

Here is the TaskManager log; it tries to connect to localhost instead of Machine1:

2018-08-17 10:46:44,875 INFO  org.apache.flink.runtime.util.LeaderRetrievalUtils            - Trying to select the network interface and address to use by connecting to the leading JobManager.
2018-08-17 10:46:44,876 INFO  org.apache.flink.runtime.util.LeaderRetrievalUtils            - TaskManager will try to connect for 10000 milliseconds before falling back to heuristics
2018-08-17 10:46:44,966 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Retrieved new target address /127.0.0.1:37133.
2018-08-17 10:46:45,324 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Trying to connect to address /127.0.0.1:37133
2018-08-17 10:46:45,325 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address 'Machine2/IP-Machine2': Connection refused
2018-08-17 10:46:45,325 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/127.0.0.1': Connection refused
2018-08-17 10:46:45,325 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/IP_Machine2': Connection refused
2018-08-17 10:46:45,325 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/127.0.0.1': Connection refused
2018-08-17 10:46:45,326 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/IP_Machine2': Connection refused
2018-08-17 10:46:45,326 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/127.0.0.1': Connection refused
2018-08-17 10:46:45,726 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Trying to connect to address /127.0.0.1:37133
2018-08-17 10:46:45,727 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address 'Machine2/IP-Machine2

2018-08-17 10:47:22,022 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink@127.0.0.1:36515] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@127.0.0.1:36515]] Caused by: [Connection refused: /127.0.0.1:36515]

2018-08-17 10:47:22,022 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor            - Could not resolve ResourceManager address akka.tcp://flink@127.0.0.1:36515/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@127.0.0.1:36515/user/resourcemanager..
2018-08-17 10:47:32,037 WARN  akka.remote.transport.netty.NettyTransport                    - Remote connection to [null] failed with java.net.ConnectException: Connection refused: /127.0.0.1:36515
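To see at a glance which endpoints in a log like this one point at the loopback interface, a small Python sketch along the following lines may help. The helper name `loopback_endpoints` is hypothetical, and the regular expression is an assumption based only on the `akka.tcp://flink@host:port` form visible in the lines above.

```python
import ipaddress
import re

# Matches the host and port of Akka URLs such as akka.tcp://flink@127.0.0.1:36515
AKKA_URL = re.compile(r"akka\.tcp://flink@([^:/\s\]]+):(\d+)")

def loopback_endpoints(log_lines):
    """Return the sorted (host, port) pairs in the log that are loopback endpoints."""
    found = set()
    for line in log_lines:
        for host, port in AKKA_URL.findall(line):
            try:
                is_loopback = ipaddress.ip_address(host).is_loopback
            except ValueError:
                # Not an IP literal; only the name "localhost" counts as loopback here.
                is_loopback = host == "localhost"
            if is_loopback:
                found.add((host, int(port)))
    return sorted(found)

# Two fragments taken from the log above.
sample = [
    "Association with remote system [akka.tcp://flink@127.0.0.1:36515] has failed",
    "Could not resolve ResourceManager address "
    "akka.tcp://flink@127.0.0.1:36515/user/resourcemanager, retrying in 10000 ms",
]
print(loopback_endpoints(sample))
```

A non-empty result on the TaskManager side means the address it was given for the JobManager or ResourceManager is only reachable from the machine that published it.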

P.S.: /etc/hosts contains localhost, Machine1, and Machine2.
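Since the log shows 127.0.0.1 being used, one thing that may be worth checking on each machine (an assumption, not something confirmed in this thread) is whether a host name from /etc/hosts resolves to a loopback address there. A minimal Python sketch using only the standard library; the function names are hypothetical:

```python
import ipaddress
import socket

def resolved_addresses(hostname):
    """Return the set of IP address strings the local resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None)
    # Drop any IPv6 scope suffix such as "%eth0" before the address is parsed.
    return {info[4][0].split("%")[0] for info in infos}

def resolves_to_loopback(hostname):
    """Return True if any resolved address is a loopback address (127.0.0.0/8 or ::1)."""
    return any(ipaddress.ip_address(addr).is_loopback
               for addr in resolved_addresses(hostname))

def report(hostnames):
    """Print, for each name, whether it resolves and whether it maps to loopback."""
    for name in hostnames:
        try:
            flag = resolves_to_loopback(name)
        except socket.gaierror:
            print(f"{name}: does not resolve on this machine")
            continue
        print(f"{name}: {'loopback' if flag else 'not loopback'}")

# "localhost" is used so the sketch runs anywhere; on the cluster one would
# pass the real names instead, for example report(["Machine1", "Machine2"]).
report(["localhost"])
```

If Machine1 were reported as loopback on Machine1 itself, that would be consistent with the 127.0.0.1 addresses in the log, but other causes are possible.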

Can you tell me how the TaskManager is supposed to connect to the JobManager?

Regards

1 answer

傅胡媚
2023-03-14

This is what we have on our TaskManager. Does your setup work as a cluster without HA?

root@flink-taskmanager-deployment-nonprod-597f858cb-4nmbr:/opt/flink# cat conf/masters 
flink-jobmanager-nonprod.rpds.svc.cluster.local:8081

root@flink-taskmanager-deployment-nonprod-597f858cb-4nmbr:/opt/flink# cat conf/slaves 
localhost

root@flink-taskmanager-deployment-nonprod-597f858cb-4nmbr:/opt/flink# cat conf/flink-conf.yaml 

jobmanager.rpc.address: flink-jobmanager-nonprod.rpds.svc.cluster.local
...
...