Consul provides a solution for service discovery and configuration in distributed systems. It is implemented in Go and its source code is open (consul-git). Consul also ships an implementation of a distributed consensus protocol, health checking, and a management UI. Compared with ZooKeeper, Consul is more lightweight: for consistency it uses the Raft algorithm, whereas ZooKeeper uses ZAB, a Paxos-like protocol. Consul can be driven through a DNS or HTTP interface out of the box, while with ZooKeeper you have to build your own solution on top of its primitives. Another widely used service-discovery solution is etcd, which is also built on Raft but does not provide a management UI. Consul and Vagrant are both HashiCorp products. As part of a complete distributed-system stack, Consul pairs well with Docker, which is likewise implemented in Go. For a brief introduction to Docker, see the earlier article "Docker 介绍" (this article will not cover docker commands, containers, or other related concepts). Using Docker for application containers and Consul for cluster service discovery and health checking gives a setup that scales both horizontally and vertically with little effort.
Running the consul agent command starts an agent as a daemon on every node of the Consul cluster. The agent runs in either server or client mode, exposes HTTP and DNS interfaces, and is responsible for running checks and keeping services in sync. A server-mode agent maintains the cluster state, responds to RPC queries, and exchanges WAN gossip with other datacenters. Client nodes are mostly stateless; their only activity is forwarding requests to the server nodes, which keeps them low-latency and light on resources.
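As a minimal sketch of the two modes (run with the consul binary directly rather than the Docker image used below; the data directory and IP addresses are placeholders):

# Server mode: joins the Raft quorum and maintains cluster state
consul agent -server -bootstrap-expect 3 -data-dir /tmp/consul -bind 10.0.0.1

# Client mode: stateless, forwards requests to the servers it joins
consul agent -data-dir /tmp/consul -bind 10.0.0.2 -join 10.0.0.1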
The figure below, taken from the official documentation, shows a typical deployment. Consul recommends 3 to 5 server nodes per datacenter, a range that balances failure tolerance against replication performance; the number of clients can be arbitrary. The two most important concepts in the figure are the Gossip protocol and the Consensus protocol. Every node in a datacenter participates in gossip: clients reach servers over LAN gossip, and all nodes sit in the same gossip pool, using that messaging layer for failure detection between nodes, so clients never need to be configured with server addresses. Server nodes additionally join a WAN gossip pool, which lets datacenters perform simple discovery of each other through their servers; adding a datacenter is just a matter of having its servers join that pool. A request from one datacenter to another is sent over WAN gossip to a random server in the remote datacenter, and that server forwards the request to its own datacenter's leader. The server leader is elected with Consul's Raft implementation. The leader is responsible for processing all requests and replicating them to the other, non-leader servers; conversely, when a non-leader server receives an RPC request it forwards it to the leader.
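For example, adding a datacenter comes down to a single WAN join run on one of its servers (a hedged sketch; the address is a placeholder for a server node in the other datacenter):

# Run on a server node; joins the WAN gossip pool via a remote server
consul join -wan 10.1.0.10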
docker@boot2docker:~$ docker run -p 8600:53/udp -h node1 progrium/consul -server -bootstrap
==> WARNING: Bootstrap mode enabled! Do not enable unless necessary
==> WARNING: It is highly recommended to set GOMAXPROCS higher than 1
==> Starting raft data migration...
==> Starting Consul agent...
==> Starting Consul agent RPC...
==> Consul agent running!
Node name: 'node1'
Datacenter: 'dc1'
Server: true (bootstrap: true)
Client Addr: 0.0.0.0 (HTTP: 8500, HTTPS: -1, DNS: 53, RPC: 8400)
Cluster Addr: 172.17.0.1 (LAN: 8301, WAN: 8302)
Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
Atlas: <disabled>
==> Log data will now stream in as it occurs:
2015/09/29 03:13:43 [INFO] serf: EventMemberJoin: node1 172.17.0.1
2015/09/29 03:13:43 [INFO] serf: EventMemberJoin: node1.dc1 172.17.0.1
2015/09/29 03:13:43 [INFO] raft: Node at 172.17.0.1:8300 [Follower] entering Follower state
2015/09/29 03:13:43 [INFO] consul: adding server node1 (Addr: 172.17.0.1:8300) (DC: dc1)
2015/09/29 03:13:43 [INFO] consul: adding server node1.dc1 (Addr: 172.17.0.1:8300) (DC: dc1)
2015/09/29 03:13:43 [ERR] agent: failed to sync remote state: No cluster leader
2015/09/29 03:13:45 [WARN] raft: Heartbeat timeout reached, starting election
2015/09/29 03:13:45 [INFO] raft: Node at 172.17.0.1:8300 [Candidate] entering Candidate state
2015/09/29 03:13:45 [INFO] raft: Election won. Tally: 1
2015/09/29 03:13:45 [INFO] raft: Node at 172.17.0.1:8300 [Leader] entering Leader state
2015/09/29 03:13:45 [INFO] consul: cluster leadership acquired
2015/09/29 03:13:45 [INFO] raft: Disabling EnableSingleNode (bootstrap)
2015/09/29 03:13:45 [INFO] consul: New leader elected: node1
2015/09/29 03:13:45 [INFO] consul: member 'node1' joined, marking health alive
2015/09/29 03:13:45 [INFO] agent: Synced service 'consul'
Since the container's DNS port is mapped to 8600/udp on the host, we can try out the DNS interface and query it interactively with dig.
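For example (running dig on the boot2docker host, where 8600/udp is mapped to the container's DNS port), we can look up the node we just started; the answer should contain the container IP seen in the log above (172.17.0.1):

docker@boot2docker:~$ dig @0.0.0.0 -p 8600 node1.node.consul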
We just started a single server node with -bootstrap. To start a three-server cluster we use -bootstrap-expect 3 instead, and have the later servers join the first container's IP, here server1's.
docker@boot2docker:~$ docker run -d --name server1 -h server1 progrium/consul -server -bootstrap-expect 3
docker@boot2docker:~$ JOIN_IP="$(docker inspect -f '{{ .NetworkSettings.IPAddress }}' server1)"
docker@boot2docker:~$ docker run -d --name server2 -h server2 progrium/consul -server -join $JOIN_IP
docker@boot2docker:~$ docker run -d --name server3 -h server3 progrium/consul -server -join $JOIN_IP
Then list the running containers with docker. For more background on Docker, see the "Docker 介绍" article mentioned above.
docker@boot2docker:~$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
87bd80f8132d progrium/consul "/bin/start -server -" 3 seconds ago Up 2 seconds 53/tcp, 53/udp, 8300-8302/tcp, 8400/tcp, 8500/tcp, 8301-8302/udp server3
a18d0597bf2d progrium/consul "/bin/start -server -" 18 seconds ago Up 17 seconds 53/tcp, 53/udp, 8300-8302/tcp, 8400/tcp, 8301-8302/udp, 8500/tcp server2
448a550224fb progrium/consul "/bin/start -server -" About a minute ago Up About a minute 53/tcp, 53/udp, 8300-8302/tcp, 8400/tcp, 8500/tcp, 8301-8302/udp server1
Next, start a client-mode agent, mapping its RPC (8400), HTTP (8500), and DNS (53/udp, exposed as 8600) ports to the host:
docker@boot2docker:~$ docker run -d -p 8400:8400 -p 8500:8500 -p 8600:53/udp -h client1 progrium/consul -join $JOIN_IP
Check the container list again:
docker@boot2docker:~$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
0410ad7bb68c progrium/consul "/bin/start -join 172" 4 seconds ago Up 3 seconds 53/tcp, 0.0.0.0:8400->8400/tcp, 8300-8302/tcp, 8301-8302/udp, 0.0.0.0:8500->8500/tcp, 0.0.0.0:8600->53/udp focused_leakey
87bd80f8132d progrium/consul "/bin/start -server -" 3 minutes ago Up 3 minutes 53/tcp, 53/udp, 8300-8302/tcp, 8400/tcp, 8500/tcp, 8301-8302/udp server3
a18d0597bf2d progrium/consul "/bin/start -server -" 3 minutes ago Up 3 minutes 53/tcp, 53/udp, 8300-8302/tcp, 8400/tcp, 8500/tcp, 8301-8302/udp server2
448a550224fb progrium/consul "/bin/start -server -" 4 minutes ago Up 4 minutes 53/tcp, 53/udp, 8300-8302/tcp, 8400/tcp, 8301-8302/udp, 8500/tcp server1
We can exec into the containers to see how Consul manages the agent nodes and how the server leader is elected; see the sketch below. Now let's stop the server node running in the container named server1 (448a550224fb) and watch what the other nodes do.
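A hedged sketch of both steps (docker exec assumes a Docker version that supports it, and that the consul binary inside the progrium/consul image is on the PATH, which is how that image is built):

# List the members the cluster currently knows about, from inside server2
docker@boot2docker:~$ docker exec -t server2 consul members

# Gracefully stop the server1 container, then watch the remaining nodes
docker@boot2docker:~$ docker stop server1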
server1's output after the shutdown:
==> Gracefully shutting down agent...
2015/09/29 04:08:17 [INFO] consul: server starting leave
2015/09/29 04:08:17 [INFO] raft: Removed peer 172.17.0.28:8300, stopping replication (Index: 18)
2015/09/29 04:08:17 [INFO] raft: Removed peer 172.17.0.29:8300, stopping replication (Index: 18)
2015/09/29 04:08:17 [INFO] raft: Removed ourself, transitioning to follower
2015/09/29 04:08:17 [INFO] raft: Node at 172.17.0.27:8300 [Follower] entering Follower state
2015/09/29 04:08:17 [INFO] serf: EventMemberLeave: server1.dc1 172.17.0.27
2015/09/29 04:08:17 [INFO] consul: cluster leadership lost
2015/09/29 04:08:17 [INFO] raft: aborting pipeline replication to peer 172.17.0.28:8300
2015/09/29 04:08:17 [INFO] raft: aborting pipeline replication to peer 172.17.0.29:8300
2015/09/29 04:08:17 [INFO] consul: removing server server1.dc1 (Addr: 172.17.0.27:8300) (DC: dc1)
2015/09/29 04:08:18 [INFO] serf: EventMemberLeave: server1 172.17.0.27
2015/09/29 04:08:18 [INFO] consul: removing server server1 (Addr: 172.17.0.27:8300) (DC: dc1)
2015/09/29 04:08:18 [INFO] agent: requesting shutdown
2015/09/29 04:08:18 [INFO] consul: shutting down server
2015/09/29 04:08:18 [INFO] agent: shutdown complete
The server2 node's log looks like this:
docker@boot2docker:~$ docker attach server2
2015/09/29 04:08:18 [INFO] serf: EventMemberLeave: server1 172.17.0.27
2015/09/29 04:08:18 [INFO] consul: removing server server1 (Addr: 172.17.0.27:8300) (DC: dc1)
2015/09/29 04:08:20 [WARN] raft: Rejecting vote from 172.17.0.29:8300 since we have a leader: 172.17.0.27:8300
2015/09/29 04:08:20 [WARN] raft: Heartbeat timeout reached, starting election
2015/09/29 04:08:20 [INFO] raft: Node at 172.17.0.28:8300 [Candidate] entering Candidate state
2015/09/29 04:08:21 [INFO] raft: Node at 172.17.0.28:8300 [Follower] entering Follower state
2015/09/29 04:08:21 [INFO] consul: New leader elected: server3
We can see that server1 went offline and a new leader, server3, was elected. server3's log looks like this:
docker@boot2docker:~$ docker attach server3
2015/09/29 04:08:18 [INFO] serf: EventMemberLeave: server1 172.17.0.27
2015/09/29 04:08:18 [INFO] consul: removing server server1 (Addr: 172.17.0.27:8300) (DC: dc1)
2015/09/29 04:08:20 [WARN] raft: Heartbeat timeout reached, starting election
2015/09/29 04:08:20 [INFO] raft: Node at 172.17.0.29:8300 [Candidate] entering Candidate state
2015/09/29 04:08:20 [INFO] raft: Duplicate RequestVote for same term: 2
2015/09/29 04:08:21 [WARN] raft: Election timeout reached, restarting election
2015/09/29 04:08:21 [INFO] raft: Node at 172.17.0.29:8300 [Candidate] entering Candidate state
2015/09/29 04:08:21 [INFO] raft: Election won. Tally: 2
2015/09/29 04:08:21 [INFO] raft: Node at 172.17.0.29:8300 [Leader] entering Leader state
2015/09/29 04:08:21 [INFO] consul: cluster leadership acquired
2015/09/29 04:08:21 [INFO] consul: New leader elected: server3
2015/09/29 04:08:21 [INFO] raft: pipelining replication to peer 172.17.0.28:8300
2015/09/29 04:08:21 [INFO] consul: member 'server1' left, deregistering
The client node's log looks like this:
docker@boot2docker:~$ docker attach focused_leakey
2015/09/29 04:08:18 [INFO] serf: EventMemberLeave: server1 172.17.0.27
2015/09/29 04:08:18 [INFO] consul: removing server server1 (Addr: 172.17.0.27:8300) (DC: dc1)
2015/09/29 04:08:21 [INFO] consul: New leader elected: server3
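To double-check the outcome from outside the containers, the client's HTTP API, which we published on port 8500 of the boot2docker host, can be queried (a minimal sketch, assuming curl is available on the host):

# Current Raft leader, returned as "<ip>:8300"; should now point at server3
docker@boot2docker:~$ curl http://localhost:8500/v1/status/leader

# The server nodes still participating in Raft
docker@boot2docker:~$ curl http://localhost:8500/v1/status/peers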