Background:
A production cluster deployed with Ubuntu Charmed Kubernetes.
Because etcd/2 was installed on the wrong machine, I want to remove the bad etcd unit and rebuild it.
Remove the unit:
juju remove-unit etcd/2 --force --no-wait
Rebuild the node:
juju add-machine --constraints tags=etcd
machine 13
juju add-unit etcd --to 13
Show the status:
juju status
etcd/0* active idle 1 10.0.4.139 2379/tcp UnHealthy with 3 known peers
etcd/3 waiting idle 17 10.0.4.153 Waiting to retry etcd registration
After some digging, it turned out that the old etcd/2 entry was never removed from the etcd cluster's member list, so the new unit could not register.
Fetch the client credentials:
juju run-action --wait etcd/0 package-client-credentials
juju scp etcd/0:etcd_credentials.tar.gz etcd_credentials.tar.gz
Extract the archive:
tar -zxvf etcd_credentials.tar.gz
etcd_credentials/
etcd_credentials/ca.crt
etcd_credentials/README.txt
etcd_credentials/client.crt
etcd_credentials/client.key
Move into the /root/etcd_credentials/ directory:
cd /root/etcd_credentials/
Expose etcd:
juju expose etcd
Set the environment variables by hand; since the etcd version is 3.4.5, the format is as follows:
export ETCDCTL_KEY=$(pwd)/client.key
export ETCDCTL_CERT=$(pwd)/client.crt
export ETCDCTL_CACERT=$(pwd)/ca.crt
export ETCDCTL_API=3 # otherwise a 509 error occurs
export ETCDCTL_ENDPOINT=https://10.0.4.139:2379 # the IP of etcd/0
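The exports above can be collected into a small file to source before running etcdctl. This is a sketch based on the paths and leader IP used in this walkthrough; the CRED_DIR/LEADER_IP defaults are assumptions, and note that etcdctl v3 reads ETCDCTL_ENDPOINTS (plural):

```shell
# Hypothetical helper: source this to configure etcdctl v3 for the
# Charmed Kubernetes etcd cluster. Paths/IP follow the walkthrough above.
CRED_DIR="${CRED_DIR:-/root/etcd_credentials}"
LEADER_IP="${LEADER_IP:-10.0.4.139}"

export ETCDCTL_API=3                      # use the v3 API
export ETCDCTL_KEY="$CRED_DIR/client.key"
export ETCDCTL_CERT="$CRED_DIR/client.crt"
export ETCDCTL_CACERT="$CRED_DIR/ca.crt"
# etcdctl v3 reads ETCDCTL_ENDPOINTS (plural); the v2-era singular
# ETCDCTL_ENDPOINT is ignored by v3.
export ETCDCTL_ENDPOINTS="https://$LEADER_IP:2379"
```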
List the members:
etcdctl member list
This failed with the following error:
Error: dial tcp 127.0.0.1:2379: connect: connection refused
Looking into the command, the etcdctl syntax has changed in v3: the endpoint must be passed explicitly as --endpoints=https://10.0.4.139:2379 (in v3 the environment variable is ETCDCTL_ENDPOINTS, plural, so the singular ETCDCTL_ENDPOINT set above is ignored and etcdctl falls back to 127.0.0.1:2379).
# 10.0.4.139 is the IP of the etcd leader, currently the etcd/0 unit.
For example, to check endpoint health:
etcdctl --endpoints=https://10.0.4.139:2379 endpoint health
https://10.0.4.139:2379 is healthy: successfully committed proposal: took = 8.609559ms
List the members again:
etcdctl --endpoints=https://10.0.4.139:2379 member list
bb605e8c9ebece4, started, etcd0, https://10.0.4.139:2380, https://10.0.4.139:2379
54bba7baf27ccef7, started, etcd1, https://10.0.4.140:2380, https://10.0.4.140:2379
defc4e8a9c8f25bc, started, etcd2, https://10.0.4.145:2380, https://10.0.4.145:2379
The entry
defc4e8a9c8f25bc, started, etcd2, https://10.0.4.145:2380, https://10.0.4.145:2379
is the etcd node that was already removed on the Juju side; it still has to be removed with etcdctl.
Remove the stale etcd member:
etcdctl --endpoints=https://10.0.4.139:2379 member remove defc4e8a9c8f25bc
Member defc4e8a9c8f25bc removed from cluster 22b26385f89f7fa8
After a short while, the new etcd unit has joined the etcd cluster:
juju status
etcd/0* active idle 1 10.0.4.139 2379/tcp Healthy with 2 known peers
filebeat/2 active idle 10.0.4.139 Filebeat ready.
etcd/1 active idle 2 10.0.4.140 2379/tcp Healthy with 3 known peers
filebeat/1 active idle 10.0.4.140 Filebeat ready.
etcd/4 active idle 17 10.0.4.153 2379/tcp Healthy with 3 known peers
filebeat/10 active idle 10.0.4.153 Filebeat ready.
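As a final sanity check, every etcd unit in the `juju status` output should report "Healthy". A hypothetical one-liner over the status text (the sample here mirrors the output above; live usage would pipe `juju status etcd` instead):

```shell
# Sketch: count etcd units reporting Healthy in `juju status` output.
status_output='etcd/0* active idle 1  10.0.4.139 2379/tcp Healthy with 2 known peers
etcd/1  active idle 2  10.0.4.140 2379/tcp Healthy with 3 known peers
etcd/4  active idle 17 10.0.4.153 2379/tcp Healthy with 3 known peers'

healthy=$(printf '%s\n' "$status_output" | grep -c 'Healthy with')
echo "$healthy"   # -> 3, one per etcd unit
```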