Question:

How do I programmatically configure Hazelcast for the multicast discovery mechanism?

左丘宜年
2023-03-14

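Details:

The documentation only provides an example for TCP/IP, and it is outdated: it uses Config.setPort(), which no longer exists.

My configuration looks like this, but discovery does not work (i.e. I get the output "Members: 1"):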

Config cfg = new Config();                  
NetworkConfig network = cfg.getNetworkConfig();
network.setPort(PORT_NUMBER);

JoinConfig join = network.getJoin();
join.getTcpIpConfig().setEnabled(false);
join.getAwsConfig().setEnabled(false);
join.getMulticastConfig().setEnabled(true);

join.getMulticastConfig().setMulticastGroup(MULTICAST_ADDRESS);
join.getMulticastConfig().setMulticastPort(PORT_NUMBER);
join.getMulticastConfig().setMulticastTimeoutSeconds(200);

HazelcastInstance instance = Hazelcast.newHazelcastInstance(cfg);
System.out.println("Members: " + instance.getCluster().getMembers().size());

If I fiddle around with the multicast timeout, I either get "Members: 1" or

Dec 05, 2013 8:50:42 PM com.hazelcast.nio.ReadHandler
WARNING: [192.168.0.9]:4446 [dev] hz._hzInstance_1_dev.IO.thread-in-0 Closing socket to endpoint Address[192.168.0.7]:4446, Cause:java.io.EOFException: Remote socket closed!
Dec 05, 2013 8:57:24 PM com.hazelcast.instance.Node
SEVERE: [192.168.0.9]:4446 [dev] Could not join cluster, shutting down!
com.hazelcast.core.HazelcastException: Failed to join in 300 seconds!

Config cfg = new Config();                  
NetworkConfig network = cfg.getNetworkConfig();
network.setPort(PORT_NUMBER);

JoinConfig join = network.getJoin();

join.getMulticastConfig().setEnabled(false);
join.getTcpIpConfig().addMember("192.168.0.1").addMember("192.168.0.2").
addMember("192.168.0.3").addMember("192.168.0.4").
addMember("192.168.0.5").addMember("192.168.0.6").
addMember("192.168.0.7").addMember("192.168.0.8").
addMember("192.168.0.9").addMember("192.168.0.10").
addMember("192.168.0.11").setRequiredMember(null).setEnabled(true);

//this sets the allowed connections to the cluster? necessary for multicast, too?
network.getInterfaces().setEnabled(true).addInterface("192.168.0.*");

HazelcastInstance instance = Hazelcast.newHazelcastInstance(cfg);
System.out.println("debug: joined via " + join + " with " + instance.getCluster()
        .getMembers().size() + " members.");

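More precisely, this run produces the output

debug: joined via JoinConfig{multicastConfig=MulticastConfig [enabled=false, multicastGroup=224.2.2.3, multicastPort=54327, multicastTimeToLive=32, multicastTimeoutSeconds=2, trustedInterfaces=[]], tcpIpConfig=TcpIpConfig [enabled=true, connectionTimeoutSeconds=5, members=[192.168.0.1, 192.168.0.2, 192.168.0.3, 192.168.0.4, 192.168.0.5, 192.168.0.6, 192.168.0.7, 192.168.0.8, 192.168.0.9, 192.168.0.10, 192.168.0.11], awsConfig=AwsConfig{enabled=false, region='us-east-1', securityGroupName='null', tagKey='null', tagValue='null', hostHeader='ec2.amazonaws.com', connectionTimeoutSeconds=5}} with 1 members.

My non-Hazelcast implementation uses UDP multicast and works fine. So can a firewall really be the problem?

Since I have no permission to use iptables or to install iperf, I am using com.hazelcast.examples.TestApp to check whether my network works, as described in "Getting Started with Hazelcast", Chapter 2, section "Showing Off Straight Away":

I call java -cp hazelcast-3.1.2.jar com.hazelcast.examples.TestApp on 192.168.0.1 and get the output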

...Dec 10, 2013 11:31:21 PM com.hazelcast.instance.DefaultAddressPicker
INFO: Prefer IPv4 stack is true.
Dec 10, 2013 11:31:21 PM com.hazelcast.instance.DefaultAddressPicker
INFO: Picked Address[192.168.0.1]:5701, using socket ServerSocket[addr=/0:0:0:0:0:0:0:0,localport=5701], bind any local is true
Dec 10, 2013 11:31:22 PM com.hazelcast.system
INFO: [192.168.0.1]:5701 [dev] Hazelcast Community Edition 3.1.2 (20131120) starting at Address[192.168.0.1]:5701
Dec 10, 2013 11:31:22 PM com.hazelcast.system
INFO: [192.168.0.1]:5701 [dev] Copyright (C) 2008-2013 Hazelcast.com
Dec 10, 2013 11:31:22 PM com.hazelcast.instance.Node
INFO: [192.168.0.1]:5701 [dev] Creating MulticastJoiner
Dec 10, 2013 11:31:22 PM com.hazelcast.core.LifecycleService
INFO: [192.168.0.1]:5701 [dev] Address[192.168.0.1]:5701 is STARTING
Dec 10, 2013 11:31:24 PM com.hazelcast.cluster.MulticastJoiner
INFO: [192.168.0.1]:5701 [dev] 

Members [1] {
    Member [192.168.0.1]:5701 this
}

Dec 10, 2013 11:31:24 PM com.hazelcast.core.LifecycleService
INFO: [192.168.0.1]:5701 [dev] Address[192.168.0.1]:5701 is STARTED

Then I call java -cp hazelcast-3.1.2.jar com.hazelcast.examples.TestApp on 192.168.0.2 and get the output

...Dec 10, 2013 9:50:22 PM com.hazelcast.instance.DefaultAddressPicker
INFO: Prefer IPv4 stack is true.
Dec 10, 2013 9:50:22 PM com.hazelcast.instance.DefaultAddressPicker
INFO: Picked Address[192.168.0.2]:5701, using socket ServerSocket[addr=/0:0:0:0:0:0:0:0,localport=5701], bind any local is true
Dec 10, 2013 9:50:23 PM com.hazelcast.system
INFO: [192.168.0.2]:5701 [dev] Hazelcast Community Edition 3.1.2 (20131120) starting at Address[192.168.0.2]:5701
Dec 10, 2013 9:50:23 PM com.hazelcast.system
INFO: [192.168.0.2]:5701 [dev] Copyright (C) 2008-2013 Hazelcast.com
Dec 10, 2013 9:50:23 PM com.hazelcast.instance.Node
INFO: [192.168.0.2]:5701 [dev] Creating MulticastJoiner
Dec 10, 2013 9:50:23 PM com.hazelcast.core.LifecycleService
INFO: [192.168.0.2]:5701 [dev] Address[192.168.0.2]:5701 is STARTING
Dec 10, 2013 9:50:23 PM com.hazelcast.nio.SocketConnector
INFO: [192.168.0.2]:5701 [dev] Connecting to /192.168.0.1:5701, timeout: 0, bind-any: true
Dec 10, 2013 9:50:23 PM com.hazelcast.nio.TcpIpConnectionManager
INFO: [192.168.0.2]:5701 [dev] 38476 accepted socket connection from /192.168.0.1:5701
Dec 10, 2013 9:50:28 PM com.hazelcast.cluster.ClusterService
INFO: [192.168.0.2]:5701 [dev] 

Members [2] {
    Member [192.168.0.1]:5701
    Member [192.168.0.2]:5701 this
}

Dec 10, 2013 9:50:30 PM com.hazelcast.core.LifecycleService
INFO: [192.168.0.2]:5701 [dev] Address[192.168.0.2]:5701 is STARTED

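So multicast discovery works on my cluster in general, right? Is 5701 also the port used for discovery? Is 38476 in the last output an ID or a port?

Joining still does not work for my own code with programmatic configuration :(

The modified TestApp gives the output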

joinConfig{multicastConfig=MulticastConfig [enabled=true, multicastGroup=224.2.2.3, 
multicastPort=54327, multicastTimeToLive=32, multicastTimeoutSeconds=2, 
trustedInterfaces=[]], tcpIpConfig=TcpIpConfig [enabled=false, 
connectionTimeoutSeconds=5, members=[], requiredMember=null], 
awsConfig=AwsConfig{enabled=false, region='us-east-1', securityGroupName='null', 
tagKey='null', tagValue='null', hostHeader='ec2.amazonaws.com', connectionTimeoutSeconds=5}}

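and detects the other members after a few seconds (if all instances are started at the same time, only after each instance has first listed just itself as a member), whereas

my program gives the output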

joined via JoinConfig{multicastConfig=MulticastConfig [enabled=true, multicastGroup=224.2.2.3, multicastPort=54327, multicastTimeToLive=32, multicastTimeoutSeconds=2, trustedInterfaces=[]], tcpIpConfig=TcpIpConfig [enabled=false, connectionTimeoutSeconds=5, members=[], requiredMember=null], awsConfig=AwsConfig{enabled=false, region='us-east-1', securityGroupName='null', tagKey='null', tagValue='null', hostHeader='ec2.amazonaws.com', connectionTimeoutSeconds=5}} with 1 members.

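and does not detect any other members during its running time of about 1 minute (I count the members roughly every 5 seconds).

However, if at least one TestApp instance is running on the cluster at the same time, all TestApp instances and all myProgram instances are detected, and my program works fine. If I start TestApp once and then start myProgram twice in parallel, TestApp gives the following output: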

java -cp ~/CaseStudy/jtorx-1.10.0-beta8/lib/hazelcast-3.1.2.jar:. TestApp
Dec 12, 2013 12:02:15 PM com.hazelcast.instance.DefaultAddressPicker
INFO: Prefer IPv4 stack is true.
Dec 12, 2013 12:02:15 PM com.hazelcast.instance.DefaultAddressPicker
INFO: Picked Address[192.168.180.240]:5701, using socket ServerSocket[addr=/0:0:0:0:0:0:0:0,localport=5701], bind any local is true
Dec 12, 2013 12:02:15 PM com.hazelcast.system
INFO: [192.168.180.240]:5701 [dev] Hazelcast Community Edition 3.1.2 (20131120) starting at Address[192.168.180.240]:5701
Dec 12, 2013 12:02:15 PM com.hazelcast.system
INFO: [192.168.180.240]:5701 [dev] Copyright (C) 2008-2013 Hazelcast.com
Dec 12, 2013 12:02:15 PM com.hazelcast.instance.Node
INFO: [192.168.180.240]:5701 [dev] Creating MulticastJoiner
Dec 12, 2013 12:02:15 PM com.hazelcast.core.LifecycleService
INFO: [192.168.180.240]:5701 [dev] Address[192.168.180.240]:5701 is STARTING
Dec 12, 2013 12:02:21 PM com.hazelcast.cluster.MulticastJoiner
INFO: [192.168.180.240]:5701 [dev] 


Members [1] {
    Member [192.168.180.240]:5701 this
}

Dec 12, 2013 12:02:22 PM com.hazelcast.core.LifecycleService
INFO: [192.168.180.240]:5701 [dev] Address[192.168.180.240]:5701 is STARTED
Dec 12, 2013 12:02:22 PM com.hazelcast.management.ManagementCenterService
INFO: [192.168.180.240]:5701 [dev] Hazelcast will connect to Management Center on address: http://localhost:8080/mancenter-3.1.2/
Join: JoinConfig{multicastConfig=MulticastConfig [enabled=true, multicastGroup=224.2.2.3, multicastPort=54327, multicastTimeToLive=32, multicastTimeoutSeconds=2, trustedInterfaces=[]], tcpIpConfig=TcpIpConfig [enabled=false, connectionTimeoutSeconds=5, members=[], requiredMember=null], awsConfig=AwsConfig{enabled=false, region='us-east-1', securityGroupName='null', tagKey='null', tagValue='null', hostHeader='ec2.amazonaws.com', connectionTimeoutSeconds=5}}
Dec 12, 2013 12:02:22 PM com.hazelcast.partition.PartitionService
INFO: [192.168.180.240]:5701 [dev] Initializing cluster partition table first arrangement...
hazelcast[default] > Dec 12, 2013 12:03:27 PM com.hazelcast.nio.SocketAcceptor
INFO: [192.168.180.240]:5701 [dev] Accepting socket connection from /192.168.0.8:38764
Dec 12, 2013 12:03:27 PM com.hazelcast.nio.TcpIpConnectionManager
INFO: [192.168.180.240]:5701 [dev] 5701 accepted socket connection from /192.168.0.8:38764
Dec 12, 2013 12:03:27 PM com.hazelcast.nio.SocketAcceptor
INFO: [192.168.180.240]:5701 [dev] Accepting socket connection from /192.168.0.7:54436
Dec 12, 2013 12:03:27 PM com.hazelcast.nio.TcpIpConnectionManager
INFO: [192.168.180.240]:5701 [dev] 5701 accepted socket connection from /192.168.0.7:54436
Dec 12, 2013 12:03:32 PM com.hazelcast.partition.PartitionService
INFO: [192.168.180.240]:5701 [dev] Re-partitioning cluster data... Migration queue size: 181
Dec 12, 2013 12:03:32 PM com.hazelcast.cluster.ClusterService
INFO: [192.168.180.240]:5701 [dev] 

Members [3] {
    Member [192.168.180.240]:5701 this
    Member [192.168.0.8]:5701
    Member [192.168.0.7]:5701
}

Dec 12, 2013 12:03:43 PM com.hazelcast.partition.PartitionService
INFO: [192.168.180.240]:5701 [dev] Re-partitioning cluster data... Migration queue size: 181
Dec 12, 2013 12:03:45 PM com.hazelcast.partition.PartitionService
INFO: [192.168.180.240]:5701 [dev] All migration tasks has been completed, queues are empty.
Dec 12, 2013 12:03:46 PM com.hazelcast.nio.TcpIpConnection
INFO: [192.168.180.240]:5701 [dev] Connection [Address[192.168.0.8]:5701] lost. Reason: Socket explicitly closed
Dec 12, 2013 12:03:46 PM com.hazelcast.cluster.ClusterService
INFO: [192.168.180.240]:5701 [dev] Removing Member [192.168.0.8]:5701
Dec 12, 2013 12:03:46 PM com.hazelcast.cluster.ClusterService
INFO: [192.168.180.240]:5701 [dev] 

Members [2] {
    Member [192.168.180.240]:5701 this
    Member [192.168.0.7]:5701
}

Dec 12, 2013 12:03:48 PM com.hazelcast.partition.PartitionService
INFO: [192.168.180.240]:5701 [dev] Partition balance is ok, no need to re-partition cluster data... 
Dec 12, 2013 12:03:48 PM com.hazelcast.nio.TcpIpConnection
INFO: [192.168.180.240]:5701 [dev] Connection [Address[192.168.0.7]:5701] lost. Reason: Socket explicitly closed
Dec 12, 2013 12:03:48 PM com.hazelcast.cluster.ClusterService
INFO: [192.168.180.240]:5701 [dev] Removing Member [192.168.0.7]:5701
Dec 12, 2013 12:03:48 PM com.hazelcast.cluster.ClusterService
INFO: [192.168.180.240]:5701 [dev] 

Members [1] {
    Member [192.168.180.240]:5701 this
}

Dec 12, 2013 12:03:48 PM com.hazelcast.partition.PartitionService
INFO: [192.168.180.240]:5701 [dev] Partition balance is ok, no need to re-partition cluster data... 

The only difference I see in TestApp's configuration is

config.getManagementCenterConfig().setEnabled(true);
config.getManagementCenterConfig().setUrl("http://localhost:8080/mancenter-"+version);

for(int k=1;k<= LOAD_EXECUTORS_COUNT;k++){
    config.addExecutorConfig(new ExecutorConfig("e"+k).setPoolSize(k));
}

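So, in desperation, I added this to my program as well. It did not solve the problem: each instance still detects only itself as a member for the whole run.

Could it be that the program does not run long enough (as pveentjer suggested)?

My experiments seem to confirm this: if the time t between Hazelcast.newHazelcastInstance(cfg); and initiating cleanUp() (i.e. no longer communicating via Hazelcast and no longer checking the number of members) is

  • less than 30 seconds: no communication, and Members: 1
  • more than 30 seconds: all members are found and communication happens (which, oddly, seems to go on for much longer than t - 30 seconds).

Is 30 seconds a realistic time span for a Hazelcast cluster to need, or is something strange going on? Here is the log from 4 instances of my program running concurrently (for instances 1 and 3, the windows in which they look for Hazelcast members overlap by 30 seconds):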

instance 1: 2013-12-19T12:39:16.553+0100 LOG 0 (START) engine started 
looking for members between 2013-12-19T12:39:21.973+0100 and 2013-12-19T12:40:27.863+0100  
2013-12-19T12:40:28.205+0100 LOG 35 (Torx-Explorer) Model  SymToSim is about to\  exit

instance 2: 2013-12-19T12:39:16.592+0100 LOG 0 (START) engine started 
looking for members between 2013-12-19T12:39:22.192+0100 and 2013-12-19T12:39:28.429+0100 
2013-12-19T12:39:28.711+0100 LOG 52 (Torx-Explorer) Model  SymToSim is about to\  exit

instance 3: 2013-12-19T12:39:16.593+0100 LOG 0 (START) engine started 
looking for members between 2013-12-19T12:39:22.145+0100 and 2013-12-19T12:39:52.425+0100  
2013-12-19T12:39:52.639+0100 LOG 54 (Torx-Explorer) Model  SymToSim is about to\  exit

INSTANCE 4: 2013-12-19T12:39:16.885+0100 LOG 0 (START) engine started 
looking for members between 2013-12-19T12:39:21.478+0100 and 2013-12-19T12:39:35.980+0100  
2013-12-19T12:39:36.024+0100 LOG 34 (Torx-Explorer) Model  SymToSim is about to\  exit

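How do I best start my actual distributed algorithm only once enough members are in the Hazelcast cluster? Can I set hazelcast.initial.min.cluster.size programmatically? https://groups.google.com/forum/#!topic/hazelcast/sa-lmpEDa6A sounds like this would block Hazelcast.newHazelcastInstance(cfg); until initial.min.cluster.size is reached. Is that correct? How synchronously (within what time span) will the different instances unblock?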


3 answers

梁泰
2023-03-14

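It looks like you are using TCP/IP clustering, so that is good. Try the following (taken from the Hazelcast manual):

If you use iptables, the following rule can be added to allow outbound traffic from ports 33000-31000: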

iptables -A OUTPUT -p TCP --dport 33000:31000 -m state --state NEW -j ACCEPT

And to control incoming traffic from any address to port 5701:

iptables -A INPUT -p tcp -d 0/0 -s 0/0 --dport 5701 -j ACCEPT

And to allow incoming multicast traffic:

iptables -A INPUT -m pkttype --pkt-type multicast -j ACCEPT

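Connectivity test

If you are having problems because machines will not join the cluster, you can check the network connectivity between the two machines. You can use a tool called iperf for this. On one machine, execute:

iperf -s -p 5701

This means that you are listening on port 5701.

On the other machine, execute the following command: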

iperf -c 192.168.1.107 -d -p 5701

Replace 192.168.1.107 with the IP address of your first machine. If you run this command and get output like this:

------------------------------------------------------------
Server listening on TCP port 5701
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 192.168.1.107, TCP port 5701
TCP window size: 59.4 KByte (default)
------------------------------------------------------------
[  5] local 192.168.1.105 port 40524 connected with 192.168.1.107 port 5701
[  4] local 192.168.1.105 port 5701 connected with 192.168.1.107 port 33641
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.2 sec  55.8 MBytes  45.7 Mbits/sec
[  5]  0.0-10.3 sec  6.25 MBytes  5.07 Mbits/sec

then you know that the two machines can connect to each other. However, if you see something like this:

Server listening on TCP port 5701
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
connect failed: No route to host

then you know that you might have a network connectivity problem.
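Where iperf cannot be installed (as the question mentions), a rough TCP reachability check can also be done from plain Java. The following is a minimal sketch, not part of Hazelcast or iperf: the class name TcpProbe and the method canConnect are hypothetical, and the default host and port are simply the ones from the iperf example above. It only checks that a TCP connection to the member port can be opened; it says nothing about multicast delivery.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper for illustration; uses only the Java standard library.
final class TcpProbe {

    // Returns true if a TCP connection to host:port can be opened within the timeout.
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts, unknown hosts and missing routes.
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults are taken from the iperf example; pass host and port to override.
        String host = args.length > 0 ? args[0] : "192.168.1.107";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 5701;
        System.out.println(host + ":" + port + " reachable: " + canConnect(host, port, 2000));
    }
}
```

Something must be listening on the target port (for example a running Hazelcast member) for the check to succeed, so a result of false can mean either a blocked route or simply that no process is bound to that port.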

颜志业
2023-03-14

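Your configuration is correct, but you have set a very long multicast timeout of 200 seconds, whereas the default is 2 seconds. Setting a smaller value will solve the problem.

From the Hazelcast Java API documentation: MulticastConfig.html#setMulticastTimeoutSeconds(int)

Specifies the time in seconds that a node should wait for a valid multicast response from another node running in the network before declaring itself as master node and creating its own cluster. This applies only to the startup of nodes where no master has been assigned yet. If you specify a high value, e.g. 60 seconds, it means that until a master is selected, each node is going to wait 60 seconds before continuing, so be careful with providing a high value. If the value is set too low, it might be that nodes are giving up too early and will create their own cluster.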
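As a rough illustration, the question's multicast configuration with the timeout brought back to 2 seconds might look like the sketch below. It assumes the Hazelcast 3.1.x classes used in the question and needs the Hazelcast jar on the classpath; the class name MulticastJoinSketch is hypothetical, and the group and port are the default values that appear in the question's JoinConfig output.

```java
import com.hazelcast.config.Config;
import com.hazelcast.config.JoinConfig;
import com.hazelcast.config.MulticastConfig;
import com.hazelcast.config.NetworkConfig;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

// Hypothetical class name; a sketch against the Hazelcast 3.1.x API used in the question.
public class MulticastJoinSketch {
    public static void main(String[] args) {
        Config cfg = new Config();
        NetworkConfig network = cfg.getNetworkConfig();
        JoinConfig join = network.getJoin();

        // Use multicast only; switch the other join mechanisms off.
        join.getTcpIpConfig().setEnabled(false);
        join.getAwsConfig().setEnabled(false);

        MulticastConfig multicast = join.getMulticastConfig();
        multicast.setEnabled(true);
        multicast.setMulticastGroup("224.2.2.3"); // default group from the question's output
        multicast.setMulticastPort(54327);        // default port from the question's output
        multicast.setMulticastTimeoutSeconds(2);  // the 2-second default instead of 200

        HazelcastInstance instance = Hazelcast.newHazelcastInstance(cfg);
        System.out.println("Members: " + instance.getCluster().getMembers().size());
    }
}
```

With a short timeout a node that starts alone will still report a single member at first, so the member count printed right after startup is not a reliable indication that discovery failed.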

谭富
2023-03-14

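The problem apparently is that the cluster starts (and stops) without waiting until enough members are in the cluster. You can set the hazelcast.initial.min.cluster.size property to prevent this from happening.

You can set hazelcast.initial.min.cluster.size programmatically using: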

Config config = new Config(); 
config.setProperty("hazelcast.initial.min.cluster.size","3");
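If blocking at startup is not wanted, one possible alternative is to poll the member count before starting the distributed algorithm. The sketch below is only an illustration of that idea, not a Hazelcast feature: ClusterSizeWait and awaitMembers are hypothetical names, and the member count is passed in as a supplier so that the example runs without Hazelcast (it assumes Java 8 or later for the lambda).

```java
import java.util.function.IntSupplier;

// Hypothetical helper for illustration; not part of the Hazelcast API.
final class ClusterSizeWait {

    // Polls memberCount until it reports at least minMembers or the timeout expires.
    // Returns true if the minimum was reached, false on timeout or interruption.
    static boolean awaitMembers(IntSupplier memberCount, int minMembers,
                                long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (memberCount.getAsInt() >= minMembers) {
                return true;
            }
            long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMillis <= 0) {
                return false;
            }
            try {
                // Never sleep past the deadline.
                Thread.sleep(Math.max(1, Math.min(pollMillis, remainingMillis)));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in supplier that reports one more member on every poll.
        int[] fakeMembers = {0};
        boolean ready = awaitMembers(() -> ++fakeMembers[0], 3, 1_000, 10);
        System.out.println("ready: " + ready);
    }
}
```

With a real instance, the supplier would be something like () -> instance.getCluster().getMembers().size(), using the calls already shown in the question.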