Cluster Formation and Peer Discovery
Overview

This guide covers various automation-oriented cluster formation and peer discovery features. For a general overview of RabbitMQ clustering, please refer to the Clustering Guide.

This guide assumes general familiarity with RabbitMQ clustering and focuses on the peer discovery subsystem. For example, it will not cover what ports must be open for inter-node communication, how nodes authenticate to each other, and so on. Besides discovery mechanisms and their configuration, this guide also covers closely related topics: rejoining nodes, the problem of initial cluster formation with nodes booting in parallel, and the additional health checks offered by some discovery implementations.

The guide also covers the basics of peer discovery troubleshooting.
What is Peer Discovery?

To form a cluster, new (“blank”) nodes need to be able to discover their peers. This can be done using a variety of mechanisms (backends). Some mechanisms assume all cluster members are known ahead of time (for example, listed in the config file), others are dynamic (nodes can come and go).

All peer discovery mechanisms assume that newly joining nodes will be able to contact their peers in the cluster and authenticate with them successfully. The mechanisms that rely on an external service (e.g. DNS or Consul) or API (e.g. AWS or Kubernetes) require the service(s) or API(s) to be available and reachable on their standard ports. Inability to reach the services will lead to node’s inability to join the cluster.
Available Discovery Mechanisms

The following mechanisms are built into the core and always available:

Config file
Pre-configured DNS A/AAAA records

Additional peer discovery mechanisms are available via plugins. The following peer discovery plugins ship with RabbitMQ as of 3.7.0:

AWS (EC2)
Kubernetes
Consul
etcd

The above plugins do not need to be installed but like all plugins they must be enabled or preconfigured before they can be used.

For peer discovery plugins, which must be available on node boot, this means they must be enabled before the first node boot. The example below uses rabbitmq-plugins' --offline mode:

rabbitmq-plugins --offline enable <plugin name>

A more specific example:

rabbitmq-plugins --offline enable rabbitmq_peer_discovery_k8s

A node with configuration settings that belong to a non-enabled peer discovery plugin will fail to start and report those settings as unknown.
Specifying the Peer Discovery Mechanism

The discovery mechanism to use is specified in the config file, as are various mechanism-specific settings, for example, discovery service hostnames, credentials, and so on. cluster_formation.peer_discovery_backend is the key that controls what discovery module (implementation) is used:

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_classic_config

The module has to implement the rabbit_peer_discovery_backend behaviour. Plugins therefore can introduce their own discovery mechanisms.
How Peer Discovery Works

When a node starts and detects it doesn’t have a previously initialised database, it will check if there’s a peer discovery mechanism configured. If that’s the case, it will then perform the discovery and attempt to contact each discovered peer in order. Finally, it will attempt to join the cluster of the first reachable peer.

Depending on the backend (mechanism) used, the process of peer discovery may involve contacting external services, for example, an AWS API endpoint, a Consul node or performing a DNS query. Some backends require nodes to register (tell the backend that the node is up and should be counted as a cluster member): for example, Consul and etcd both support registration. With other backends the list of nodes is configured ahead of time (e.g. config file). Those backends are said to not support node registration.

In some cases node registration is implicit or managed by an external service. AWS autoscaling groups are a good example: AWS keeps track of group membership, so nodes don't have to (and cannot) explicitly register. However, the list of cluster members is not predefined. Such backends usually include a no-op registration step and apply one of the race condition mitigation mechanisms described below.

When a cluster is first formed and there are no registered nodes yet, a natural race condition between booting nodes occurs. Different backends address this problem differently: some try to acquire a lock with an external service, others rely on randomized delays. This problem does not apply to the backends that require listing all nodes ahead of time.

When the configured backend supports registration, nodes unregister when they stop.

If peer discovery isn’t configured, or it fails, or no peers are reachable, a node that wasn’t a cluster member in the past will initialise from scratch and proceed as a standalone node.

If a node previously was a cluster member, it will try to contact its “last seen” peer for a period of time. In this case, no peer discovery will be performed. This is true for all backends.
Nodes Rejoining Their Existing Cluster

A new node joining a cluster is just one possible case. Another common scenario is when an existing cluster member temporarily leaves and then rejoins the cluster. While the peer discovery subsystem does not affect the behavior described in this section, it is important to understand how nodes behave when they rejoin their cluster after a restart or failure.

Existing cluster members will not perform peer discovery. Instead they will try to contact their previously known peers.

If a node previously was a cluster member, when it boots it will try to contact its “last seen” peer for a period of time. If the peer is not booted (e.g. when a full cluster restart or upgrade is performed) or cannot be reached, the node will retry the operation a number of times.

The defaults are 10 retries with 30 seconds between attempts, or 5 minutes in total. In environments where nodes can take a long and/or uneven time to start, it is recommended that the number of retries be increased.
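These retry settings can be adjusted in the config file. The key names below are the table loading retry keys used by 3.7.x-era releases; treat this snippet as an illustration and verify the names against the documentation for the release in use:

```
# how many times to retry contacting the last seen peer
# (default: 10)
mnesia_table_loading_retry_limit = 15

# how long to wait per attempt, in milliseconds
# (default: 30000, i.e. 30 seconds)
mnesia_table_loading_retry_timeout = 30000
```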

If a node is reset since losing contact with the cluster, it will behave like a blank node. Note that other cluster members might still consider it to be a cluster member, in which case the two sides will disagree and the node will fail to join. Such reset nodes must also be removed from the cluster using rabbitmqctl forget_cluster_node executed against an existing cluster member.

If a node was explicitly removed from the cluster by the operator and then reset, it will be able to join the cluster as a new member. In this case it will behave exactly like a blank node would.

A node rejoining after a node name or host name change can start as a blank node if its data directory path changes as a result. Such nodes will fail to rejoin the cluster. While the node is offline, its peers can also be reset or started with a blank data directory. In that case the recovering node will fail to rejoin its peer as well, since the internal data store's cluster identity would no longer match.

Consider the following scenario:

A cluster of 3 nodes, A, B and C is formed
Node A is shut down
Node B is reset
Node A is started
Node A tries to rejoin B but B's cluster identity has changed
Node B doesn't recognise A as a known cluster member because it's been reset

In this case node B will reject the clustering attempt from A with an appropriate error message in the log:

Node 'rabbit@node1.local' thinks it's clustered with node 'rabbit@node2.local', but 'rabbit@node2.local' disagrees

In this case B can be reset again and then will be able to join A, or A can be reset and will successfully join B.
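Resetting a node is done with rabbitmqctl. A sketch of resolving the scenario above by resetting B again so it can join A (node names A and B are the illustrative ones from the scenario; these commands run against a live node):

```
# on node B: stop the RabbitMQ application, wipe the node's state,
# then restart the application so the node comes back as a blank node
rabbitmqctl stop_app
rabbitmqctl reset

# optionally join A explicitly (while the app is stopped)
# if automatic peer discovery is not configured
rabbitmqctl join_cluster rabbit@A

rabbitmqctl start_app
```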
How to Configure Peer Discovery

Peer discovery plugins are configured just like the core server and other plugins: using a config file.

cluster_formation.peer_discovery_backend is the key that controls what peer discovery backend will be used. Each backend will also have a number of configuration settings specific to it. The rest of the guide will cover configurable settings specific to a particular mechanism as well as provide examples for each one.
Config File Peer Discovery Backend
Config File Peer Discovery Overview

The most basic way for a node to discover its cluster peers is to read a list of nodes from the config file. The set of cluster members is assumed to be known at deployment time.

The natural race condition during initial cluster formation is addressed by using a randomized startup delay.
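The delay range is configurable. The snippet below is a sketch: the exact default range (on the order of seconds to a minute in 3.7.x-era releases) should be verified against the release in use:

```
cluster_formation.peer_discovery_backend = rabbit_peer_discovery_classic_config

# each node delays its startup by a random value from this range, in seconds
cluster_formation.randomized_startup_delay_range.min = 5
cluster_formation.randomized_startup_delay_range.max = 60
```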
Configuration

The peer nodes are listed using the cluster_formation.classic_config.nodes config setting:

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_classic_config

cluster_formation.classic_config.nodes.1 = rabbit@hostname1.eng.example.local
cluster_formation.classic_config.nodes.2 = rabbit@hostname2.eng.example.local

The following example demonstrates the same configuration in the classic config format. The 2nd member of the rabbit.cluster_nodes tuple is the node type to use for the current node. In the vast majority of cases all nodes should be disc nodes.

[
 {rabbit, [
   {cluster_nodes, {['rabbit@hostname1.eng.example.local',
                     'rabbit@hostname2.eng.example.local'], disc}}
 ]}
].

DNS Peer Discovery Backend
DNS Peer Discovery Overview

Another built-in peer discovery mechanism as of RabbitMQ 3.7.0 is DNS-based. It relies on a pre-configured hostname (“seed hostname”) with DNS A (or AAAA) records and reverse DNS lookups to perform peer discovery. More specifically, this mechanism will perform the following steps:

Query DNS A records of the seed hostname.
For each returned DNS record's IP address, perform a reverse DNS lookup.
Append current node's prefix (e.g. rabbit in rabbit@hostname1.example.local) to each hostname and return the result.

For example, let’s consider a seed hostname of discovery.eng.example.local. It has 2 DNS A records that return two IP addresses: 192.168.100.1 and 192.168.100.2. Reverse DNS lookups for those IP addresses return node1.eng.example.local and node2.eng.example.local, respectively. Current node’s name is not set and defaults to rabbit@$(hostname). The final list of nodes discovered will contain two nodes: rabbit@node1.eng.example.local and rabbit@node2.eng.example.local.
Configuration

The seed hostname is set using the cluster_formation.dns.hostname config setting:

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_dns

cluster_formation.dns.hostname = discovery.eng.example.local

Peer Discovery on AWS (EC2)
AWS Peer Discovery Overview

An AWS (EC2)-specific discovery mechanism is available via a plugin.

As with any plugin, it must be enabled before it can be used. For peer discovery plugins it means they must be enabled or preconfigured before first node boot:

rabbitmq-plugins --offline enable rabbitmq_peer_discovery_aws

The plugin provides two ways for a node to discover its peers:

Using EC2 instance tags
Using AWS autoscaling group membership

Both methods rely on AWS-specific APIs (endpoints) and features and thus cannot work in other IaaS environments. Once a list of cluster member instances is retrieved, final node names are computed using instance hostnames or IP addresses.

When the AWS peer discovery mechanism is used, nodes will delay their startup by a randomly picked value to reduce the probability of a race condition during initial cluster formation.
Configuration and Credentials

Before a node can perform any operations on AWS, it needs to have a set of AWS account credentials configured. This can be done in a couple of ways:

Via config file
Using environment variables

EC2 Instance Metadata service for the region will also be consulted.

The following example snippet configures RabbitMQ to use the AWS peer discovery backend and provides information about AWS region as well as a set of credentials:

# note: this value is slightly different from the plugin name
cluster_formation.peer_discovery_backend = rabbit_peer_discovery_aws

cluster_formation.aws.region = us-east-1
cluster_formation.aws.access_key_id = ANIDEXAMPLE
cluster_formation.aws.secret_key = WjalrxuTnFEMI/K7MDENG+bPxRfiCYEXAMPLEKEY

If the region is left unconfigured, us-east-1 will be used by default. Sensitive values in the configuration file can optionally be encrypted.

If an IAM role is assigned to EC2 instances running RabbitMQ nodes, a policy has to be used to allow said instances to use the EC2 Instance Metadata Service. When the plugin is configured to use autoscaling group members, the policy also has to grant access to describe autoscaling group members (instances). Below is an example of a policy that covers both use cases:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:DescribeAutoScalingInstances",
        "ec2:DescribeInstances"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}

Using Autoscaling Group Membership

When autoscaling-based peer discovery is used, the members of the current node's EC2 instance autoscaling group will be listed and used to produce the list of discovered peers.

To use autoscaling group membership, set the cluster_formation.aws.use_autoscaling_group key to true:

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_aws

cluster_formation.aws.region = us-east-1
cluster_formation.aws.access_key_id = ANIDEXAMPLE
cluster_formation.aws.secret_key = WjalrxuTnFEMI/K7MDENG+bPxRfiCYEXAMPLEKEY

cluster_formation.aws.use_autoscaling_group = true

Using EC2 Instance Tags

When tag-based peer discovery is used, the plugin will list EC2 instances using the EC2 API and filter them by the configured instance tags. The resulting instance set will be used to produce the list of discovered peers.

Tags are configured using the cluster_formation.aws.instance_tags key. The example below uses three tags: region, service, and environment.

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_aws

cluster_formation.aws.region = us-east-1
cluster_formation.aws.access_key_id = ANIDEXAMPLE
cluster_formation.aws.secret_key = WjalrxuTnFEMI/K7MDENG+bPxRfiCYEXAMPLEKEY

cluster_formation.aws.instance_tags.region = us-east-1
cluster_formation.aws.instance_tags.service = rabbitmq
cluster_formation.aws.instance_tags.environment = staging

Using Private EC2 Instance IPs

By default peer discovery will use private DNS hostnames to compute node names. It is possible to opt into using private IPs instead by setting the cluster_formation.aws.use_private_ip key to true:

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_aws

cluster_formation.aws.region = us-east-1
cluster_formation.aws.access_key_id = ANIDEXAMPLE
cluster_formation.aws.secret_key = WjalrxuTnFEMI/K7MDENG+bPxRfiCYEXAMPLEKEY

cluster_formation.aws.use_autoscaling_group = true
cluster_formation.aws.use_private_ip = true

Peer Discovery on Kubernetes
Kubernetes Peer Discovery Overview

A Kubernetes-based discovery mechanism is available via a plugin.

As with any plugin, it must be enabled before it can be used. For peer discovery plugins it means they must be enabled or preconfigured before first node boot:

rabbitmq-plugins --offline enable rabbitmq_peer_discovery_k8s

Prerequisites

With this mechanism, nodes fetch a list of their peers from a Kubernetes API endpoint using a set of configured values: a URI scheme, host, port, as well as the token and certificate paths.

A RabbitMQ cluster deployed to Kubernetes will use a set of pods. The set must be a stateful set. A headless service must be used to control network identity of the pods (their hostnames), which in turn affect RabbitMQ node names.

If a stateless set is used, recreated nodes will not have their persisted data and will start as blank nodes. This can lead to data loss and higher network traffic volume due to more frequent eager synchronisation of classic queue mirrors on newly joining nodes.

Stateless sets are also prone to the natural race condition during initial cluster formation, unlike stateful sets that initialise pods one by one.

The peer discovery mechanism will filter out nodes whose pods are not yet ready (initialised) according to their readiness probe, as reported by the Kubernetes API. For example, if the pod management policy of a stateful set is set to Parallel, some nodes can be discovered but will not be joined.

It is therefore necessary to use OrderedReady pod management policy for the sets used by RabbitMQ nodes. This policy is used by default by Kubernetes.
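For illustration, below is a fragment of a StatefulSet definition with the fields relevant here; resource names and replica count are assumptions, not prescribed by RabbitMQ:

```
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: rabbitmq
spec:
  serviceName: rabbitmq-headless   # headless Service controlling pod hostnames
  podManagementPolicy: OrderedReady  # Kubernetes default; initialises pods one by one
  replicas: 3
```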

Other stateful set aspects, such as how storage is configured, are orthogonal to peer discovery.
Examples

A minimalistic runnable example of Kubernetes peer discovery mechanism can be found on GitHub.
Configuration

To use Kubernetes for peer discovery, set the cluster_formation.peer_discovery_backend to rabbit_peer_discovery_k8s (note: this value is slightly different from plugin name):

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_k8s

# Kubernetes API hostname (or IP address).
# Default value is kubernetes.default.svc.cluster.local
cluster_formation.k8s.host = kubernetes.default.example.local

It is possible to configure Kubernetes API port and URI scheme:

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_k8s

cluster_formation.k8s.host = kubernetes.default.example.local

# 443 is used by default
cluster_formation.k8s.port = 443

# https is used by default
cluster_formation.k8s.scheme = https

Kubernetes token file path is configurable via cluster_formation.k8s.token_path:

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_k8s

cluster_formation.k8s.host = kubernetes.default.example.local

# default value is /var/run/secrets/kubernetes.io/serviceaccount/token
cluster_formation.k8s.token_path = /var/run/secrets/kubernetes.io/serviceaccount/token

It must point to a local file that exists and is readable by RabbitMQ.

Certificate and namespace paths use cluster_formation.k8s.cert_path and cluster_formation.k8s.namespace_path, respectively:

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_k8s

cluster_formation.k8s.host = kubernetes.default.example.local

# default value is /var/run/secrets/kubernetes.io/serviceaccount/token
cluster_formation.k8s.token_path = /var/run/secrets/kubernetes.io/serviceaccount/token

# default value is /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
cluster_formation.k8s.cert_path = /var/run/secrets/kubernetes.io/serviceaccount/ca.crt

# default value is /var/run/secrets/kubernetes.io/serviceaccount/namespace
cluster_formation.k8s.namespace_path = /var/run/secrets/kubernetes.io/serviceaccount/namespace

Just like with the token path key, both must point to a local file that exists and is readable by RabbitMQ.

When a list of peer nodes is computed from a list of pod containers returned by Kubernetes, either hostnames or IP addresses can be used. This is configurable using the cluster_formation.k8s.address_type key:

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_k8s

cluster_formation.k8s.host = kubernetes.default.example.local

cluster_formation.k8s.token_path = /var/run/secrets/kubernetes.io/serviceaccount/token
cluster_formation.k8s.cert_path = /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
cluster_formation.k8s.namespace_path = /var/run/secrets/kubernetes.io/serviceaccount/namespace

# should the result set use hostnames or IP addresses
# of Kubernetes API-reported containers?
# supported values are "hostname" and "ip"
cluster_formation.k8s.address_type = hostname

Supported values are ip and hostname. hostname is the recommended option but has limitations: it can only be used with stateful sets (also highly recommended) and headless services. ip is used by default for better compatibility.

It is possible to append a suffix to peer hostnames returned by Kubernetes using cluster_formation.k8s.hostname_suffix:

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_k8s

cluster_formation.k8s.host = kubernetes.default.example.local

cluster_formation.k8s.token_path = /var/run/secrets/kubernetes.io/serviceaccount/token
cluster_formation.k8s.cert_path = /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
cluster_formation.k8s.namespace_path = /var/run/secrets/kubernetes.io/serviceaccount/namespace

# no suffix is appended by default
cluster_formation.k8s.hostname_suffix = rmq.eng.example.local

Service name is rabbitmq by default but can be overridden using the cluster_formation.k8s.service_name key if needed:

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_k8s

cluster_formation.k8s.host = kubernetes.default.example.local

cluster_formation.k8s.token_path = /var/run/secrets/kubernetes.io/serviceaccount/token
cluster_formation.k8s.cert_path = /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
cluster_formation.k8s.namespace_path = /var/run/secrets/kubernetes.io/serviceaccount/namespace

# overrides the Kubernetes service name. Default value is "rabbitmq".
cluster_formation.k8s.service_name = rmq-qa

As mentioned above, stateful sets are the recommended way of running RabbitMQ on Kubernetes. Stateful set pods are initialised one at a time. That effectively addresses the natural race condition during initial cluster formation. The randomized startup delay in such scenarios can use a significantly lower delay value range (e.g. 0 to 2 seconds):

cluster_formation.randomized_startup_delay_range.min = 0
cluster_formation.randomized_startup_delay_range.max = 2

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_k8s

cluster_formation.k8s.host = kubernetes.default.example.local

Peer Discovery Using Consul
Consul Peer Discovery Overview

A Consul-based discovery mechanism is available via a plugin.

As with any plugin, it must be enabled before it can be used. For peer discovery plugins it means they must be enabled or preconfigured before first node boot:

rabbitmq-plugins --offline enable rabbitmq_peer_discovery_consul

The plugin supports Consul 0.8.0 and later versions.

Nodes register with Consul on boot and unregister when they leave. Prior to registration, nodes will attempt to acquire a lock in Consul to reduce the probability of a race condition during initial cluster formation. When a node registers with Consul, it will set up a periodic health check for itself (more on this below).
Configuration

To use Consul for peer discovery, set the cluster_formation.peer_discovery_backend to rabbit_peer_discovery_consul (note: this value is slightly different from plugin name):

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_consul

# Consul host (hostname or IP address). Default value is localhost
cluster_formation.consul.host = consul.eng.example.local

Consul Endpoint

It is possible to configure Consul port and URI scheme:

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_consul

cluster_formation.consul.host = consul.eng.example.local

# 8500 is used by default
cluster_formation.consul.port = 8500

# http is used by default
cluster_formation.consul.scheme = http

Consul ACL Token

To configure Consul ACL token, use cluster_formation.consul.acl_token:

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_consul

cluster_formation.consul.host = consul.eng.example.local
cluster_formation.consul.acl_token = acl-token-value

Service name (as registered in Consul) defaults to “rabbitmq” but can be overridden:

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_consul

cluster_formation.consul.host = consul.eng.example.local

# rabbitmq is used by default
cluster_formation.consul.svc = rabbitmq

Service Address

The service hostname (address) as registered in Consul will be fetched by peers and therefore must resolve on all nodes. The hostname can be computed by the plugin or specified by the user. When computed automatically, a number of node and OS properties can be used:

Hostname (as returned by gethostname(2))
Node name (without the rabbit@ prefix)
IP address of an NIC (network interface controller)

When cluster_formation.consul.svc_addr_auto is set to false, the service address will be taken as-is from cluster_formation.consul.svc_addr. When it is set to true, other options explained below come into play.

In the following example, the service address reported to Consul is hardcoded to hostname1.rmq.eng.example.local instead of being computed automatically from the environment:

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_consul

cluster_formation.consul.host = consul.eng.example.local

cluster_formation.consul.svc = rabbitmq

# do not compute service address, it will be specified below
cluster_formation.consul.svc_addr_auto = false

# service address, will be communicated to other nodes
cluster_formation.consul.svc_addr = hostname1.rmq.eng.example.local

# use long RabbitMQ node names?
cluster_formation.consul.use_longname = true

In this example, the service address reported to Consul is parsed from the node name (the rabbit@ prefix will be dropped):

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_consul

cluster_formation.consul.host = consul.eng.example.local

cluster_formation.consul.svc = rabbitmq

# do compute service address
cluster_formation.consul.svc_addr_auto = true

# compute service address using node name
cluster_formation.consul.svc_addr_use_nodename = true

# use long RabbitMQ node names?
cluster_formation.consul.use_longname = true

cluster_formation.consul.svc_addr_use_nodename is a boolean field that instructs the Consul peer discovery backend to compute the service address using the RabbitMQ node name.

In the next example, the service address is computed using hostname as reported by the OS instead of node name:

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_consul

cluster_formation.consul.host = consul.eng.example.local

cluster_formation.consul.svc = rabbitmq

# do compute service address
cluster_formation.consul.svc_addr_auto = true

# compute service address using host name and not node name
cluster_formation.consul.svc_addr_use_nodename = false

# use long RabbitMQ node names?
cluster_formation.consul.use_longname = true

In the example below, the service address is computed by taking the IP address of a provided NIC, en0:

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_consul

cluster_formation.consul.host = consul.eng.example.local

cluster_formation.consul.svc = rabbitmq

# do compute service address
cluster_formation.consul.svc_addr_auto = true

# compute service address using the IP address of a NIC, en0
cluster_formation.consul.svc_addr_nic = en0
cluster_formation.consul.svc_addr_use_nodename = false

# use long RabbitMQ node names?
cluster_formation.consul.use_longname = true

Service Port

The service port as registered in Consul can be overridden. This is only necessary if RabbitMQ uses a non-standard port for client (technically AMQP 0-9-1 and AMQP 1.0) connections; the default value is 5672.

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_consul

cluster_formation.consul.host = consul.eng.example.local

# 5672 is used by default
cluster_formation.consul.svc_port = 6674

Service Tags and Metadata

It is possible to provide Consul service tags:

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_consul

cluster_formation.consul.host = consul.eng.example.local

# Define tags for the RabbitMQ service: "qa" and "3.8"
cluster_formation.consul.svc_tags.1 = qa
cluster_formation.consul.svc_tags.2 = 3.8

It is possible to configure Consul service metadata, which is a map of string keys to string values with certain restrictions (see Consul documentation to learn more):

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_consul

cluster_formation.consul.host = consul.eng.example.local

# Define metadata for the RabbitMQ service. Both keys and values have a
# maximum length limit enforced by Consul. This can be used to provide additional
# context about the service (RabbitMQ cluster) for operators or other tools.
cluster_formation.consul.svc_meta.owner = team-xyz
cluster_formation.consul.svc_meta.service = service-one
cluster_formation.consul.svc_meta.stats_url = https://service-one.eng.megacorp.local/stats/

Service Health Checks

When a node registers with Consul, it will set up a periodic health check for itself. Online nodes will periodically send a health check update to Consul to indicate the service is available. This interval can be configured:

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_consul

cluster_formation.consul.host = consul.eng.example.local

# health check interval (node TTL) in seconds
# default: 30
cluster_formation.consul.svc_ttl = 40

A node that fails its health check is considered to be in the warning state by Consul. Such nodes can be automatically unregistered by Consul after a period of time (note: this is a separate interval value from the TTL above). The period cannot be less than 60 seconds.

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_consul

cluster_formation.consul.host = consul.eng.example.local

# health check interval (node TTL) in seconds
cluster_formation.consul.svc_ttl = 30

# how soon should nodes that fail their health checks be unregistered by Consul?
# this value is in seconds and must not be lower than 60 (a Consul requirement)
cluster_formation.consul.deregister_after = 90

Please see a section on automatic cleanup of nodes below.

Nodes in the warning state are excluded from peer discovery results by default. It is possible to opt into including them by setting cluster_formation.consul.include_nodes_with_warnings to true:

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_consul

cluster_formation.consul.host = consul.eng.example.local

# health check interval (node TTL) in seconds
cluster_formation.consul.svc_ttl = 30

# include nodes in the warning state in the discovery result set
cluster_formation.consul.include_nodes_with_warnings = true

Node Name Suffixes

If node name is computed and long node names are used, it is possible to append a suffix to node names retrieved from Consul. The format is .node.{domain_suffix}. This can be useful in environments with DNS conventions, e.g. when all service nodes are organised in a separate subdomain. Here’s an example:

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_consul

cluster_formation.consul.host = consul.eng.example.local

cluster_formation.consul.svc = rabbitmq

# do compute service address
cluster_formation.consul.svc_addr_auto = true

# compute service address using node name
cluster_formation.consul.svc_addr_use_nodename = true

# use long RabbitMQ node names?
cluster_formation.consul.use_longname = true

# append a suffix (node.example.local) to node names retrieved from Consul
cluster_formation.consul.domain_suffix = example.local

With this setup node names will be computed to rabbit@192.168.100.1.node.example.local instead of rabbit@192.168.100.1.
Distributed Lock Acquisition

When a node tries to acquire a lock on boot and the lock is already taken, it will wait for the lock to become available for a limited amount of time. Default value is 300 seconds but it can be configured:

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_consul

cluster_formation.consul.host = consul.eng.example.local

# lock acquisition timeout in seconds
# default: 300
# cluster_formation.consul.lock_wait_time is an alias
cluster_formation.consul.lock_timeout = 60

Lock key prefix is rabbitmq by default. It can also be overridden:

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_consul

cluster_formation.consul.host = consul.eng.example.local
cluster_formation.consul.lock_timeout = 60

# should the Consul key used for locking be prefixed with something
# other than "rabbitmq"?
cluster_formation.consul.lock_prefix = environments-qa

Peer Discovery Using Etcd
Etcd Peer Discovery Overview

An etcd-based discovery mechanism is available via a plugin.

As with any plugin, it must be enabled before it can be used. For peer discovery plugins it means they must be enabled or preconfigured before first node boot:

rabbitmq-plugins --offline enable rabbitmq_peer_discovery_etcd

etcd v3 and v2 are supported.

Nodes register with etcd on boot by creating a key in a conventionally named directory. The keys have a short (say, a minute) expiration period. The keys are deleted when nodes stop cleanly. Prior to registration, nodes will attempt to acquire a lock in etcd to reduce the probability of a race condition during initial cluster formation.

Nodes contact etcd periodically to refresh their keys. Those that haven’t done so in a configurable period of time (node TTL) are cleaned up from etcd. If configured, such nodes can be forcefully removed from the cluster.
Configuration

To use etcd for peer discovery, set cluster_formation.peer_discovery_backend to rabbit_peer_discovery_etcd (note: this value is slightly different from the plugin name). The plugin requires the hostname of an etcd node to connect to:

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_etcd

# etcd host (hostname or IP address). This property is required or peer discovery won't be performed.
cluster_formation.etcd.host = etcd.eng.example.local

It is possible to configure the etcd port and URI scheme:

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_etcd

cluster_formation.etcd.host = etcd.eng.example.local
# 2379 is used by default
cluster_formation.etcd.port = 2379
# http is used by default
cluster_formation.etcd.scheme = http

Directories and keys used by the peer discovery mechanism follow a naming scheme:

/v2/keys/{key_prefix}/{cluster_name}/nodes/{node_name}

Here’s an example of a key that would be used by node rabbit@hostname1 with default key prefix and cluster name:

/v2/keys/rabbitmq/default/nodes/rabbit@hostname1
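The naming scheme can be expressed as a tiny helper. This is an illustrative sketch (the plugin constructs these keys internally); the defaults mirror the documented defaults for key prefix and cluster name:

```python
def etcd_node_key(node_name, key_prefix="rabbitmq", cluster_name="default"):
    """etcd v2 key under which a node registers itself,
    following /v2/keys/{key_prefix}/{cluster_name}/nodes/{node_name}."""
    return f"/v2/keys/{key_prefix}/{cluster_name}/nodes/{node_name}"
```

With the defaults, node rabbit@hostname1 maps to /v2/keys/rabbitmq/default/nodes/rabbit@hostname1, as in the example above.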

The default key prefix is simply “rabbitmq”. It rarely needs overriding, but that’s supported:

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_etcd

cluster_formation.etcd.host = etcd.eng.example.local
# rabbitmq is used by default
cluster_formation.etcd.key_prefix = rabbitmq_discovery

If multiple RabbitMQ clusters share an etcd installation, each cluster must use a unique name:

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_etcd

cluster_formation.etcd.host = etcd.eng.example.local
# default name: "default"
cluster_formation.etcd.cluster_name = staging

Keys used for node registration have a TTL set on them. Online nodes will periodically refresh their key(s). The TTL value can be configured:

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_etcd

cluster_formation.etcd.host = etcd.eng.example.local
# node TTL in seconds
# default: 30
cluster_formation.etcd.node_ttl = 40

Key refreshes will be performed every TTL/2 seconds. It is possible to forcefully remove the nodes that fail to refresh their keys from the cluster. This is covered later in this guide.
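The refresh cycle can be sketched as a simple loop. This is an illustration of the TTL/2 refresh schedule described above, not the plugin's actual implementation; `refresh_key` stands in for the etcd call that re-registers the key, and `should_stop`/`sleep` are injectable for testing:

```python
import time

def run_key_refresher(refresh_key, node_ttl_s, should_stop, sleep=time.sleep):
    """Refresh the node's registration key every node_ttl_s / 2 seconds
    until should_stop() returns True, so the key never expires while
    the node is online."""
    interval = node_ttl_s / 2.0
    while not should_stop():
        refresh_key()
        sleep(interval)
```

With node_ttl = 40, the key is refreshed every 20 seconds; a node that stops refreshing will see its key expire after the TTL.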

When a node tries to acquire a lock on boot and the lock is already taken, it will wait for the lock to become available for a limited amount of time. The default value is 300 seconds, but it can be configured:

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_etcd

cluster_formation.etcd.host = etcd.eng.example.local
# lock acquisition timeout in seconds
# default: 300
# cluster_formation.etcd.lock_wait_time is an alias
cluster_formation.etcd.lock_timeout = 60

Race Conditions During Initial Cluster Formation

Consider a deployment where the entire cluster is provisioned at once and all nodes start in parallel. In this case there’s a natural race condition during node registration: more than one node can become “first to register” (discover no existing peers and thus start as standalone).

Different peer discovery backends use different approaches to minimize the probability of such a scenario. Some use locking (etcd, Consul); others use a technique known as randomized startup delay, where nodes delay their startup for a randomly picked interval (between 5 and 60 seconds by default).

Some backends (config file, DNS) rely on a pre-configured set of peers and avoid the issue that way.

Effective delay interval, if used, is logged on node boot.
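The randomized startup delay technique can be sketched as follows. This is an illustration of the general idea, not RabbitMQ's implementation; `rng` and `sleep` are injectable so the behaviour can be demonstrated without real waiting:

```python
import random
import time

def randomized_startup_delay(min_s=5.0, max_s=60.0, rng=random.uniform, sleep=time.sleep):
    """Sleep for a random interval in [min_s, max_s] before attempting
    peer discovery, spreading out concurrent node boots. Returns the
    delay that was used (which an implementation would log)."""
    delay = rng(min_s, max_s)
    sleep(delay)
    return delay
```

Because each node picks its delay independently, the chance of two nodes registering at exactly the same moment drops sharply as the interval widens.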

Lastly, some mechanisms rely on ordered node startup provided by the underlying provisioning and orchestration tool. Kubernetes stateful sets are one example of an environment that offers such a guarantee.
Node Health Checks and Forced Removal

Sometimes a node is a cluster member but not known to the discovery backend. For example, consider a cluster that uses the AWS backend configured to use autoscaling group membership. If an EC2 instance in that group fails and is later re-created, it will be considered an unavailable node in the RabbitMQ cluster. With some peer discovery backends such unknown nodes can be logged or forcefully removed from the cluster. Those backends are:

AWS (EC2)
Kubernetes
Consul
etcd

Forced node removal can be dangerous and should be carefully considered. For example, a node that’s temporarily unavailable but will be rejoining (or recreated with its persistent storage re-attached from its previous incarnation) can be kicked out of the cluster permanently by automatic cleanup, thus failing to rejoin.

Before enabling the configuration keys covered below, make sure that a compatible peer discovery plugin is enabled. If that’s not the case, the node will report the settings as unknown and will fail to start.

To log warnings for the unknown nodes, cluster_formation.node_cleanup.only_log_warning should be set to true:

# Don't remove cluster members unknown to the peer discovery backend but log
# warnings.
#
# This setting can only be used if a compatible peer discovery plugin is enabled.
cluster_formation.node_cleanup.only_log_warning = true

This is the default behavior.

To forcefully delete the unknown nodes from the cluster, cluster_formation.node_cleanup.only_log_warning should be set to false.

# Forcefully remove cluster members unknown to the peer discovery backend. Once removed,
# the nodes won't be able to rejoin. Use this mode with great care!
#
# This setting can only be used if a compatible peer discovery plugin is enabled.
cluster_formation.node_cleanup.only_log_warning = false

Note that this option should be used with care, in particular with discovery backends other than AWS.

Some backends (Consul, etcd) support node health checks (or TTL). Nodes periodically notify their respective discovery service (e.g. Consul) that they are still available. If no notifications from a node come in after a period of time, the node is considered to be in the warning state. With etcd, such nodes will no longer show up in discovery results. With Consul, they can either be removed (deregistered) or their warning state can be reported. Please see documentation for those backends to learn more.

Automatic cleanup of absent nodes makes most sense in environments where failed/discontinued nodes will be replaced with brand new ones (including cases when persistent storage won’t be re-attached).

When automatic node cleanup is disabled (switched to the warning mode), operators have to explicitly remove absent cluster nodes using CLI tools such as rabbitmqctl forget_cluster_node.
HTTP Proxy Settings

Peer discovery mechanisms that use HTTP to interact with their dependencies (e.g. the AWS, Consul and etcd ones) can proxy their requests via an HTTP proxy.

There are separate proxy settings for HTTP and HTTPS:

# example HTTP and HTTPS proxy servers, values in your environment
# will vary
cluster_formation.proxy.http_proxy = 192.168.0.98
cluster_formation.proxy.https_proxy = 192.168.0.98

Some hosts can be excluded from proxying, e.g. the link-local AWS instance metadata IP address:

# example HTTP and HTTPS proxy servers, values in your environment
# will vary
cluster_formation.proxy.http_proxy = 192.168.0.98
cluster_formation.proxy.https_proxy = 192.168.0.98

# requests to these hosts won't go via proxy
cluster_formation.proxy.proxy_exclusions.1 = 169.254.169.254
cluster_formation.proxy.proxy_exclusions.2 = excluded.example.local

Troubleshooting

The peer discovery subsystem and individual mechanism implementations log important discovery procedure steps at the info log level. More extensive logging is available at the debug level. Mechanisms that depend on external services accessible over HTTP will log all outgoing HTTP requests and response codes at debug level. See the logging guide for more information about logging configuration.

If the log does not contain any entries that demonstrate peer discovery progress, for example, the list of nodes retrieved by the mechanism or clustering attempts, it may mean that the node already has an initialised data directory or is already a member of the cluster. In those cases peer discovery won’t be performed.

Peer discovery relies on inter-node network connectivity and successful authentication via a shared secret. Verify that nodes can communicate with one another and use the expected Erlang cookie value (which must be identical across all cluster nodes). See the main Clustering guide for more information.

A methodology for network connectivity troubleshooting as well as commonly used tools are covered in the Troubleshooting Network Connectivity guide.
Getting Help and Providing Feedback

If you have questions about the contents of this guide or any other topic related to RabbitMQ, don’t hesitate to ask them on the RabbitMQ mailing list.
Help Us Improve the Docs ❤️

If you’d like to contribute an improvement to the site, its source is available on GitHub. Simply fork the repository and submit a pull request. Thank you!

Copyright © 2007-2020 VMware, Inc. or its affiliates. All rights reserved. Terms of Use, Privacy and Trademark Guidelines
