Prepare four VMs and build a multi-master, highly available cluster.
Recommended: at least 2 GB RAM and a 30 GB disk per machine.
Minimal install of Ubuntu 16.04 Server or CentOS 7 Minimal.
Configure basic networking, package mirrors, SSH login, etc.
Official Kubernetes releases on GitHub: https://github.com/kubernetes/kubernetes/releases
deploy node ---x1 : the node that runs this set of ansible playbooks
etcd nodes ----x3 : the etcd cluster must have an odd number of members (1, 3, 5, 7, ...)
master nodes --x2 : add more to match the cluster size; also plan a master VIP (virtual IP)
lb nodes ------x2 : two load-balancer nodes running haproxy + keepalived
node nodes ----x2 : the nodes that actually run workloads; upgrade specs or add machines as needed
IP | hostname | roles |
---|---|---|
172.7.15.128 | master | deploy, master1, lb2, etcd |
172.7.15.129 | node1 | etcd, node1 |
172.7.15.130 | node2 | etcd, node2 |
172.7.15.131 | master2 | master2, lb1 |
172.7.15.250 | vip | (floating virtual IP, not a host) |
Run on all four machines:
# every script in this document is executed as root unless noted otherwise
yum update
# install python (CentOS commands; on Ubuntu 16.04 use the apt-get equivalents)
yum install python -y
yum install git python-pip -y
# install ansible with pip (inside China, the Aliyun PyPI mirror speeds this up)
pip install pip --upgrade -i https://mirrors.aliyun.com/pypi/simple/
pip install ansible==2.6.18 netaddr==0.7.19 -i https://mirrors.aliyun.com/pypi/simple/
# on the deploy node, generate a key pair and push it to every machine
ssh-keygen -t rsa -b 2048 -N '' -f ~/.ssh/id_rsa
for s in 128 129 130 131; do ssh-copy-id 172.7.15.$s; done
# each machine's root password will be asked for once
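Before moving on, it may be worth confirming that the keys actually landed; a small check of my own (not part of kubeasz). BatchMode makes ssh fail instead of prompting when key-based auth is not set up:

```shell
# one status line per host: either its hostname, or a FAILED marker
for s in 128 129 130 131; do
  ssh -o BatchMode=yes -o ConnectTimeout=5 root@172.7.15.$s hostname 2>/dev/null \
    || echo "172.7.15.$s: passwordless login FAILED"
done
```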
# download the easzup helper script; this walkthrough uses kubeasz release 2.0.2
export release=2.0.2
curl -C- -fLO --retry 3 https://github.com/easzlab/kubeasz/releases/download/${release}/easzup
chmod +x ./easzup
# use the helper script to download all components
./easzup -D
cd /etc/ansible && cp example/hosts.multi-node hosts
vim hosts  # set the sections below according to the plan above; leave the rest unchanged
[etcd]
172.7.15.128 NODE_NAME=etcd1
172.7.15.129 NODE_NAME=etcd2
172.7.15.130 NODE_NAME=etcd3
# master node(s)
[kube-master]
172.7.15.128
172.7.15.131
# work node(s)
[kube-node]
172.7.15.129
172.7.15.130
# [optional] harbor server, a private docker registry
# 'NEW_INSTALL': 'yes' to install a harbor server; 'no' to integrate with existed one
[harbor]
#192.168.1.8 HARBOR_DOMAIN="harbor.yourdomain.com" NEW_INSTALL=no
# [optional] loadbalance for accessing k8s from outside
[ex-lb]  # on cloud hosts this section generally has no effect, since cloud networks do not support a self-managed VIP
172.7.15.128 LB_ROLE=backup EX_APISERVER_VIP=172.7.15.250 EX_APISERVER_PORT=8443
172.7.15.131 LB_ROLE=master EX_APISERVER_VIP=172.7.15.250 EX_APISERVER_PORT=8443
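Conceptually, the [ex-lb] entries put haproxy + keepalived on the two lb nodes: keepalived holds the VIP, and haproxy forwards it to both apiservers. An illustrative haproxy fragment to show the idea (this is not the literal config kubeasz generates, and it assumes the apiservers listen on 6443):

```
listen kube-master
    bind 172.7.15.250:8443
    mode tcp
    balance roundrobin
    server master1 172.7.15.128:6443 check
    server master2 172.7.15.131:6443 check
```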
ansible all -m ping  # confirm that ansible can reach every host
# step-by-step installation
ansible-playbook 01.prepare.yml
ansible-playbook 02.etcd.yml
# verify step 02
# set the shell variable $NODE_IPS to the etcd members configured in hosts
export NODE_IPS="172.7.15.128 172.7.15.129 172.7.15.130"
for ip in ${NODE_IPS}; do
ETCDCTL_API=3 etcdctl \
--endpoints=https://${ip}:2379 \
--cacert=/etc/kubernetes/ssl/ca.pem \
--cert=/etc/etcd/ssl/etcd.pem \
--key=/etc/etcd/ssl/etcd-key.pem \
endpoint health; done
# expected result (your IPs will differ):
https://192.168.1.1:2379 is healthy: successfully committed proposal: took = 2.210885ms
https://192.168.1.2:2379 is healthy: successfully committed proposal: took = 2.784043ms
ansible-playbook 03.docker.yml
# verify step 03: docker should be installed and running on every machine
systemctl status docker   # service status
journalctl -u docker      # service logs
docker version
docker info
Check the FORWARD chain of the iptables filter table; the last rule must be a catch-all -A FORWARD -j ACCEPT:
iptables-save|grep FORWARD
# example output (the chain policy may show ACCEPT or DROP depending on the Docker version):
:FORWARD DROP [0:0]
-A FORWARD -j DOCKER-USER
-A FORWARD -j DOCKER-ISOLATION
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -j ACCEPT
-A FORWARD -j ACCEPT
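The same check can be scripted; a small helper of my own (not part of kubeasz) that reads `iptables-save` output on stdin and succeeds only when the exact catch-all rule is present:

```shell
# exit 0 only when the exact line "-A FORWARD -j ACCEPT" appears on stdin
has_forward_accept() {
  grep -qx -- '-A FORWARD -j ACCEPT'
}
# real use:
#   iptables-save | has_forward_accept || echo "FORWARD catch-all rule missing"
```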
ansible-playbook 04.kube-master.yml
# verify step 04: after the playbook succeeds, check the main master components
# service status
systemctl status kube-apiserver
systemctl status kube-controller-manager
systemctl status kube-scheduler
# service logs
journalctl -u kube-apiserver
journalctl -u kube-controller-manager
journalctl -u kube-scheduler
Running kubectl get componentstatus should show:
NAME STATUS MESSAGE ERROR
scheduler Healthy ok
controller-manager Healthy ok
etcd-0 Healthy {"health": "true"}
etcd-2 Healthy {"health": "true"}
etcd-1 Healthy {"health": "true"}
ansible-playbook 05.kube-node.yml
# verify node status
systemctl status kubelet      # service status
systemctl status kube-proxy
journalctl -u kubelet         # service logs
journalctl -u kube-proxy
Running kubectl get node shows something like this (your IPs will differ):
NAME STATUS ROLES AGE VERSION
192.168.1.42 Ready <none> 2d v1.9.0
192.168.1.43 Ready <none> 2d v1.9.0
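The readiness check can also be scripted; a small awk helper of my own (not from kubeasz) that counts entries whose STATUS column is anything other than Ready:

```shell
# counts lines whose second column (STATUS) is not exactly "Ready";
# feed it the output of `kubectl get node --no-headers`
count_not_ready() {
  awk '$2 != "Ready" {n++} END {print n+0}'
}
# real use:
#   kubectl get node --no-headers | count_not_ready   # 0 means all nodes are Ready
```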
ansible-playbook 06.network.yml
ansible-playbook 07.cluster-addon.yml
# all-in-one installation
#ansible-playbook 90.setup.yml   # runs steps 01-07 above in a single pass
List the services in the kube-system namespace:
kubectl get svc -n kube-system
Show cluster information:
kubectl cluster-info
Show node/pod resource usage:
kubectl top node
kubectl top pod --all-namespaces
a) create an nginx service
kubectl run nginx --image=nginx --expose --port=80
b) create a busybox test pod
kubectl run busybox --rm -it --image=busybox /bin/sh   # drops into a shell inside busybox
nslookup nginx.default.svc.cluster.local   # expected result:
Server: 10.68.0.2
Address: 10.68.0.2:53
Name: nginx.default.svc.cluster.local
Address: 10.68.9.156
Adding a node (untested by the author)
1) enable passwordless login from the deploy node to the new node
ssh-copy-id <new-node-ip>
2) edit /etc/ansible/hosts:
[new-node]
172.7.15.117
3) run the add-node playbook
ansible-playbook /etc/ansible/20.addnode.yml
4) verify
kubectl get node
kubectl get pod -n kube-system -o wide
5) afterwards, edit /etc/ansible/hosts again and move every IP in the [new-node] group into the [kube-node] group
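Step 5 can be done with a hypothetical helper like the following (my sketch, not part of kubeasz): it prints the inventory with every IP under [new-node] moved beneath [kube-node].

```shell
move_new_nodes() {
  # $1: path to /etc/ansible/hosts; the file is read twice (two-pass awk)
  awk '
    NR == FNR {                        # pass 1: remember the [new-node] entries
      if ($0 ~ /^\[new-node\]/) { f = 1; next }
      if ($0 ~ /^\[/)           { f = 0 }
      if (f && NF)              { ips[$1] = 1 }
      next
    }
    /^\[new-node\]/  { print; drop = 1; next }     # pass 2: rewrite the file
    /^\[/            { drop = 0 }
    drop && NF       { next }                      # drop entries from [new-node]
    { print }
    /^\[kube-node\]/ { for (ip in ips) print ip }  # re-add them under [kube-node]
  ' "$1" "$1"
}
# real use (review the output before overwriting the file):
#   move_new_nodes /etc/ansible/hosts > /etc/ansible/hosts.new
```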
Upgrading the cluster (untested by the author)
1) back up etcd
ETCDCTL_API=3 etcdctl snapshot save backup.db
Inspect the backup file:
ETCDCTL_API=3 etcdctl --write-out=table snapshot status backup.db
2) go to the root directory of the kubeasz project
cd /dir/to/kubeasz
Pull the latest code:
git pull origin master
3) download the kubernetes binaries for the target version and replace the binaries under /etc/ansible/bin/
4) upgrading docker is out of scope here; frequent docker upgrades are not recommended unless really needed
5) if a service interruption is acceptable, run:
ansible-playbook -t upgrade_k8s,restart_dockerd 22.upgrade.yml
6) if even a short interruption is not acceptable:
a) ansible-playbook -t upgrade_k8s 22.upgrade.yml
b) then on each node, one at a time:
kubectl cordon <node> && kubectl drain <node>   # migrate the workload pods
systemctl restart docker
kubectl uncordon <node>   # let pods come back
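The per-node sequence in 6b can be wrapped in a loop; a sketch of my own (node names are whatever `kubectl get node` shows, and the drain flags are the usual ones for this kubectl era, so adjust as needed):

```shell
# cordon/drain, restart docker, uncordon: one node at a time
rolling_restart() {
  for node in "$@"; do
    kubectl cordon "$node"
    kubectl drain "$node" --ignore-daemonsets --delete-local-data  # migrate pods
    ssh "root@$node" systemctl restart docker   # the brief per-node outage
    kubectl uncordon "$node"                    # let pods return
  done
}
# real use: rolling_restart 192.168.1.42 192.168.1.43
```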