
Deploying a highly available production Kubernetes cluster with Kubespray

荣声
2023-12-01

Original article: https://labdoc.cc/article/60/

Before you start

  1. All commands are run from the kubespray source directory, including when working inside the container environment.

  2. 192.168.8.60 is the Ansible client's IP; replace every occurrence of it with your own Ansible client IP.

  3. Note: sed behaves slightly differently on macOS and Linux; on macOS, -i needs an extra '' argument. Compare:

# mac
$ sed -i '' 's/old_string/new_string/' file.txt
# linux
$ sed -i 's/old_string/new_string/' file.txt

Cluster plan

Role             Hosts                     Notes
Ansible client   node60                    at least 4 GB of RAM
Control plane    node61, node62, node63
Etcd             node61, node62, node63
Worker           node61 ... node66
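
For reference, a minimal /etc/hosts sketch matching this plan (the node60-node66 hostnames and the 192.168.8.x addresses are assumptions taken from the table above; adjust them to your environment):

$ cat >> /etc/hosts << EOF
192.168.8.60 node60
192.168.8.61 node61
192.168.8.62 node62
192.168.8.63 node63
192.168.8.64 node64
192.168.8.65 node65
192.168.8.66 node66
EOF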

Basic client setup

Client firewall: stop and disable it first

$ systemctl stop firewalld.service && systemctl disable firewalld.service

Install Docker

Run the following command to install Docker:

$ curl -fsSL https://get.docker.com | bash -s docker --mirror Aliyun

Configure Docker registry mirrors and trusted (insecure) registries

$ cat <<EOF | sudo tee /etc/docker/daemon.json
{
  "registry-mirrors": [
    "https://registry.docker-cn.com",
    "https://docker.mirrors.ustc.edu.cn",
    "https://hub-mirror.c.163.com",
    "https://mirror.ccs.tencentyun.com",
    "https://reg-mirror.qiniu.com",
    "https://dockerhub.azk8s.cn"
  ],
  "insecure-registries": [
    "192.168.8.60:5000"
  ]
}
EOF

Start Docker

$ systemctl start docker
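
Optionally enable Docker at boot and sanity-check that the daemon picked up the mirror settings (not required by Kubespray, just a quick check):

$ systemctl enable docker
$ docker info | grep -A 7 -i "registry mirrors"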

Packages that will be needed later; install them up front:

$ yum install -y wget git unzip sshpass

Download the source

This requires access to GitHub; use a proxy if necessary.

# Optional step; replace with your own proxy
$ export https_proxy=http://192.168.8.3:7890 http_proxy=http://192.168.8.3:7890 all_proxy=socks5://192.168.8.3:7890

# To unset the proxy later:
unset http_proxy
unset https_proxy
unset all_proxy

Clone with git

$ yum install -y git
$ git clone https://github.com/kubernetes-sigs/kubespray.git
Cloning into 'kubespray'...
remote: Enumerating objects: 66750, done.
remote: Counting objects: 100% (5/5), done.
remote: Compressing objects: 100% (5/5), done.
remote: Total 66750 (delta 1), reused 0 (delta 0), pack-reused 66745
Receiving objects: 100% (66750/66750), 20.88 MiB | 5.91 MiB/s, done.
Resolving deltas: 100% (37545/37545), done.

$ cd kubespray

Or download with curl

$ yum install unzip
$ curl -L --max-redirs 5 -k -o kubespray-master.zip https://github.com/kubernetes-sigs/kubespray/archive/refs/heads/master.zip
$ unzip kubespray-master.zip

$ cd kubespray-master

Configure passwordless SSH login

# Generate a key pair
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa

# Install sshpass
$ yum install -y sshpass

# Set up passwordless login to 192.168.8.61-66 (replace 'password' with the actual root password)
$ for i in {61..66}; do sshpass -p 'password' ssh-copy-id -o StrictHostKeyChecking=no root@192.168.8.$i ; done
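
A quick way to confirm the keys landed on every host (assuming all six nodes are up):

$ for i in {61..66}; do ssh -o BatchMode=yes root@192.168.8.$i hostname; done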

Prepare the Kubespray runtime environment

Pull the official kubespray image

$ docker pull quay.io/kubespray/kubespray:v2.21.0

# Or use one of the mirrors below (faster inside China), but remember to tag it back
$ docker pull ju4t/kubespray:v2.21.0
$ docker pull quay.m.daocloud.io/kubespray/kubespray:v2.21.0
# Tag it back to the official name
$ docker tag ju4t/kubespray:v2.21.0 quay.io/kubespray/kubespray:v2.21.0

Start the kubespray container

$ docker run --name kubespray -it --privileged \
  -v ~/kubespray/:/kubespray/ \
  -v ~/.ssh/id_rsa:/root/.ssh/id_rsa \
  -v ~/.ssh/known_hosts:/root/.ssh/known_hosts \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /etc/docker/daemon.json:/etc/docker/daemon.json \
  -v /usr/bin/docker:/usr/bin/docker \
  quay.io/kubespray/kubespray:v2.21.0 \
  /bin/bash

# Install the requirements to avoid minor version mismatches between host and image
$ pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/
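
Inside the container it is worth a quick sanity check of the toolchain before continuing:

$ ansible --version
$ python3 --version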

Configure Kubespray

Cluster configuration

Copy the sample inventory:

$ cp -r inventory/sample inventory/mycluster

Then make the following changes:

  • Install kubectl and ~/.kube/config on the control_plane nodes
$ tee -a inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml <<EOF
# set up ~/.kube/config on the master
kubeconfig_localhost: true
# install kubectl on the master
kubectl_localhost: true
EOF
  • Set a domain name for the API-server load balancer; it makes clusters easier to tell apart when you run several
$ tee -a inventory/mycluster/group_vars/all/all.yml <<EOF
apiserver_loadbalancer_domain_name: "k8s.labdoc.cc"
# configure the load balancer itself as needed
# loadbalancer_apiserver:
#   address: 1.2.3.4
#   port: 1234
EOF
  • Enable the metrics server
# change metrics_server_enabled to true
$ sed -i 's/metrics_server_enabled: false/metrics_server_enabled: true/' inventory/mycluster/group_vars/k8s_cluster/addons.yml
# uncomment the remaining metrics_server_ options
$ sed -i '/metrics_server_/s/^# //' inventory/mycluster/group_vars/k8s_cluster/addons.yml
  • MetalLB: see docs/metallb.md
# set kube_proxy_strict_arp to true
$ sed -i 's/kube_proxy_strict_arp: false/kube_proxy_strict_arp: true/' inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml

# Enable MetalLB
$ sed -i 's/metallb_enabled: false/metallb_enabled: true/' inventory/mycluster/group_vars/k8s_cluster/addons.yml
# Set the LoadBalancer IP range
$ tee -a inventory/mycluster/group_vars/k8s_cluster/addons.yml <<EOF
metallb_speaker_enabled: true
metallb_avoid_buggy_ips: true
metallb_ip_range:
  - "192.168.8.80-192.168.8.89"
EOF
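
Once the cluster is up, a quick check that MetalLB actually started (metallb-system is assumed here as the namespace used by the addon; adjust if yours differs):

$ kubectl -n metallb-system get pods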

Enter the configuration environment

$ docker exec -it kubespray /bin/bash

Generate the hosts.yaml file

# Adjust the start/end of the IP range and the subnet below to match your environment
$ declare -a IPS=($(for i in {61..66}; do echo 192.168.8.$i; done))

# Generate the inventory/mycluster/hosts.yaml file
$ CONFIG_FILE=inventory/mycluster/hosts.yaml python3 contrib/inventory_builder/inventory.py ${IPS[@]}
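
Before continuing, it helps to confirm that Ansible can reach every host through the generated inventory:

$ ansible -i inventory/mycluster/hosts.yaml all -m ping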

(Optional) Configure DaoCloud mirrors for faster downloads in China

$ cp inventory/mycluster/group_vars/all/offline.yml inventory/mycluster/group_vars/all/mirror.yml

# Key settings for the mirror acceleration
$ sed -i '/{{ files_repo/s/^# //' inventory/mycluster/group_vars/all/mirror.yml
$ tee -a inventory/mycluster/group_vars/all/mirror.yml <<EOF
gcr_image_repo: "gcr.m.daocloud.io"
kube_image_repo: "k8s.m.daocloud.io"
docker_image_repo: "docker.m.daocloud.io"
quay_image_repo: "quay.m.daocloud.io"
github_image_repo: "ghcr.m.daocloud.io"
files_repo: "https://files.m.daocloud.io"
EOF

Offline (air-gapped) deployment

Copy the already-configured mycluster inventory and remove the mirror.yml acceleration config from the copy:

$ cp -r inventory/mycluster inventory/my_airgap_cluster
$ rm -f inventory/my_airgap_cluster/group_vars/all/mirror.yml

$ sed -i '/{{ files_repo/s/^# //' inventory/my_airgap_cluster/group_vars/all/offline.yml
$ sed -i '/{{ registry_host/s/^# //' inventory/my_airgap_cluster/group_vars/all/offline.yml
$ tee -a inventory/my_airgap_cluster/group_vars/all/offline.yml <<EOF
files_repo: "http://192.168.8.60:8080"
registry_host: "192.168.8.60:5000"
EOF

Setting up the local servers

Generate the download lists

Generate the file list files.list and the image list images.list; exit the container once they have been generated.

$ ./contrib/offline/generate_list.sh

Verify the files

$ ls -l contrib/offline/temp/
total 16
-rw-r--r--. 1 root root 2000 Mar  2 16:45 files.list
-rw-r--r--. 1 root root 2797 Mar  2 16:45 files.list.template
-rw-r--r--. 1 root root 2408 Mar  2 16:45 images.list
-rw-r--r--. 1 root root 3365 Mar  2 16:45 images.list.template

If the DaoCloud mirrors were configured, the URLs in both the file list and the image list should contain the daocloud addresses.

Download the files and deploy an Nginx server

Note: exit the kubespray container before continuing.

Option 1

manage-offline-files.sh requires wget and docker. It does two things:

  • downloads the files listed in contrib/offline/temp/files.list into ./contrib/offline/offline-files/
  • starts an nginx container that serves them for download
# install wget
$ yum install -y wget
# run the script
$ ./contrib/offline/manage-offline-files.sh

Option 2

Alternatively, put files that were downloaded elsewhere into the ./contrib/offline/offline-files/ directory:

# run on the machine that already has the files
$ scp -r ./contrib/offline/offline-files root@192.168.8.60:/root/kubespray/contrib/offline/

Start Nginx:

docker run \
 --restart=always -d -p 8080:80 \
 --volume $(pwd)/contrib/offline/offline-files/:/usr/share/nginx/html/download \
 --volume $(pwd)/contrib/offline/nginx.conf:/etc/nginx/nginx.conf \
 --name nginx nginx:alpine

Verify: http://192.168.8.60:8080/

Index of /
../
get.helm.sh/                                       02-Mar-2023 08:56       -
github.com/                                        02-Mar-2023 08:57       -
storage.googleapis.com/                            02-Mar-2023 08:56       -
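
From any of the future cluster nodes you can also confirm the file server is reachable, for example:

$ curl -sI http://192.168.8.60:8080/ | head -n 1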

Set up a private image registry with Docker

  1. Obtain the image files
  2. Create a local registry
  3. Push or import the images into the registry

Option 1

The project ships a script, manage-offline-container-images.sh, which:

  • pulls the container images from an existing, online-deployed environment
  • starts a local container registry and pushes the images into it
# pull the container images from an existing, online-deployed environment
$ ./contrib/offline/manage-offline-container-images.sh create

# start a local registry and push the images into it
$ ./contrib/offline/manage-offline-container-images.sh register

This official approach has quite a few pitfalls: it requires an already running cluster, and the resulting image set is incomplete.

Option 2

Create a registry with Docker:

$ docker run --restart=always -d -p 5000:5000 --name registry \
 -v ~/registry:/var/lib/registry \
 registry:latest
# data is persisted under /var/lib/registry (mounted from ~/registry)

Verify that push/pull against the registry works:

$ docker tag nginx:alpine localhost:5000/nginx:alpine
$ docker push localhost:5000/nginx:alpine
$ docker rmi localhost:5000/nginx:alpine
$ docker pull localhost:5000/nginx:alpine

Replace 192.168.8.60:5000 below with an address reachable from the servers where the cluster will be deployed, then run the following to generate manage-offline-container-images.py, a script that pulls and pushes the images in bulk:

cat <<EOF | sudo tee ./contrib/offline/manage-offline-container-images.py
import os
import sys

images_file = "./temp/images.list"
target_rep = os.environ.get('target_rep', "192.168.8.60:5000")
target_dir = "./container-images"
file_path = os.path.join(os.path.dirname(__file__), images_file)
target_path = os.path.join(os.path.dirname(__file__), target_dir)


def main(arg):
    if arg not in ['create', 'save', 'load', 'registry']:
        help()
        return
    file_object = open(file_path, 'r')
    try:
        while True:
            source_image = file_object.readline().rstrip()
            if source_image:
                target_image = source_image.replace('registry.k8s.io', target_rep) \
                    .replace('gcr.io', target_rep) \
                    .replace('docker.io', target_rep) \
                    .replace('quay.io', target_rep)
                save_file = '%s/%s.tar.gz' % (target_path, target_image.split('/')[-1].replace(':', '_'))
                if arg == 'create':
                    os.system('docker pull %s' % source_image)
                    os.system('docker tag %s %s' % (source_image, target_image))
                if arg == 'save':
                    os.makedirs(target_path, exist_ok=True)
                    os.system('docker save -o %s %s' % (save_file, target_image))
                if arg == 'load':
                    os.system('docker load -i %s' % save_file)
                if arg == 'registry':
                    os.system('docker push %s' % target_image)
            else:
                break
    finally:
        file_object.close()


def help():
    print("一般地,使用 create 和 registry 即可,本机无法直接推送到目标仓库时,手动迁移文件执行分步操作。")
    print("分步操作时,请确保以下文件存在:")
    print("%s\n%s\n%s" % (images_file, target_dir, sys.argv[0]), end="\n\n")
    print("\t[*] Step(1) 创建镜像")
    print("\t$ python3 %s create" % sys.argv[0], end="\n\n")
    print("\t[?] Step(2) 保存镜像到 %s 目录" % target_dir)
    print("\t$ python3 %s save" % sys.argv[0], end="\n\n")
    print("\t[?] Step(3) 导入镜像到本地")
    print("\t$ python3 %s load" % sys.argv[0], end="\n\n")
    print("\t[*] Step(4) 推送目标仓库 %s" % target_rep)
    print("\t$ python3 %s registry" % sys.argv[0], end="\n\n")
    return


if __name__ == '__main__':
    if len(sys.argv) < 2:
        help()
    else:
        main(sys.argv[1])
EOF

Because the script is written in Python, run it inside the kubespray container.

Before running it, check that /etc/docker/daemon.json contains the following; otherwise pushes will fail:

...
  "insecure-registries": [
    "192.168.8.60:5000"
  ]
...

Pull the images listed in temp/images.list and push them to the 192.168.8.60:5000 registry:

$ docker exec -it kubespray /bin/bash

# set the target registry address (the script reads the target_rep environment variable)
$ export target_rep="192.168.8.60:5000"

# pull and tag the images
$ python3 ./contrib/offline/manage-offline-container-images.py create

# The next two steps are optional.
# If this host has no internet access, or the downloaded images cannot be pushed to the private registry directly, save them to the container-images directory first:
# python3 ./contrib/offline/manage-offline-container-images.py save

# Then copy them to a server that can push,
# scp -r ./contrib/offline/container-images root@192.168.8.60:/root/kubespray/contrib/offline/

# and load the images there:
# python3 ./contrib/offline/manage-offline-container-images.py load

# push the images to the private registry
$ python3 ./contrib/offline/manage-offline-container-images.py registry
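
A simple way to confirm the pushes landed is to query the registry's catalog API (registry:2 exposes /v2/_catalog; the n parameter just raises the page size):

$ curl -s "http://192.168.8.60:5000/v2/_catalog?n=200"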

Check that the files and images are complete

  1. If the downloads were done on a different machine, make sure to copy over the latest .list files
  2. Verify the number of downloaded files and images; network problems often cause some files or images to fail to download. When the scripts finish, compare the counts with a tool such as tree (or the quick check below)
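
A rough count-based check (assuming manage-offline-files.sh placed everything under ./contrib/offline/offline-files/; the two numbers should roughly match):

$ wc -l contrib/offline/temp/files.list
$ find contrib/offline/offline-files -type f | wc -l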

Use the local servers

Local server configuration

Choose one of the two options below, depending on whether you configured the mirror acceleration and whether the images were obtained with manage-offline-container-images.sh.

  • If the mirror acceleration was not configured
sed -i '/{{ files_repo/s/^# //' inventory/my_airgap_cluster/group_vars/all/offline.yml
sed -i '/{{ registry_host/s/^# //' inventory/my_airgap_cluster/group_vars/all/offline.yml
tee -a inventory/my_airgap_cluster/group_vars/all/offline.yml <<EOF
files_repo: "http://192.168.8.60:8080"
registry_host: "192.168.8.60:5000"
EOF
  • If you used the official manage-offline-container-images.sh to fetch images from an existing cluster that had the mirror acceleration configured, the following can serve as a reference
sed -i '/{{ files_repo/s/^# //' inventory/my_airgap_cluster/group_vars/all/offline.yml
tee -a inventory/my_airgap_cluster/group_vars/all/offline.yml <<EOF
gcr_image_repo: "192.168.8.60:8080/gcr.m.daocloud.io"
kube_image_repo: "192.168.8.60:8080/k8s.m.daocloud.io"
docker_image_repo: "192.168.8.60:8080/docker.m.daocloud.io"
quay_image_repo: "192.168.8.60:8080/quay.m.daocloud.io"
github_image_repo: "192.168.8.60:8080/ghcr.m.daocloud.io"

files_repo: "http://192.168.8.60:8080/files.m.daocloud.io"
registry_host: "192.168.8.60:5000"
EOF

Configure insecure registries

# configure containerd's insecure registry
$ cat <<EOF > inventory/my_airgap_cluster/group_vars/all/containerd.yml
containerd_insecure_registries:
  "192.168.8.60:5000": "http://192.168.8.60:5000"
EOF
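
After the nodes are deployed you can sanity-check the insecure-registry setting from any node, for example by pulling the nginx image pushed earlier (the image name is just an example; use whatever is actually in your registry):

$ crictl pull 192.168.8.60:5000/nginx:alpine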

Deploy the cluster

Server firewall settings

$ ansible -i inventory/mycluster/hosts.yaml all  -m systemd -a 'name=firewalld state=stopped enabled=no'

Start the deployment

# online deployment
$ ansible-playbook -i inventory/mycluster/hosts.yaml --become --become-user=root --private-key ~/.ssh/id_rsa cluster.yml
# offline deployment
$ ansible-playbook -i inventory/my_airgap_cluster/hosts.yaml --become --become-user=root --private-key ~/.ssh/id_rsa cluster.yml

Verification

[root@node1 ~]# kubectl get node -o wide
NAME    STATUS   ROLES           AGE     VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION           CONTAINER-RUNTIME
node61   Ready    control-plane   11m     v1.26.2   192.168.8.61   <none>        CentOS Linux 7 (Core)   3.10.0-1160.el7.x86_64   containerd://1.6.19
node62   Ready    control-plane   10m     v1.26.2   192.168.8.62   <none>        CentOS Linux 7 (Core)   3.10.0-1160.el7.x86_64   containerd://1.6.19
node63   Ready    <none>          8m58s   v1.26.2   192.168.8.63   <none>        CentOS Linux 7 (Core)   3.10.0-1160.el7.x86_64   containerd://1.6.19
node64   Ready    <none>          8m58s   v1.26.2   192.168.8.64   <none>        CentOS Linux 7 (Core)   3.10.0-1160.el7.x86_64   containerd://1.6.19
node65   Ready    <none>          8m58s   v1.26.2   192.168.8.65   <none>        CentOS Linux 7 (Core)   3.10.0-1160.el7.x86_64   containerd://1.6.19
node66   Ready    <none>          8m58s   v1.26.2   192.168.8.66   <none>        CentOS Linux 7 (Core)   3.10.0-1160.el7.x86_64   containerd://1.6.19

[root@node1 ~]# kubectl get cs
Warning: v1 ComponentStatus is deprecated in v1.19+
NAME                 STATUS    MESSAGE                         ERROR
scheduler            Healthy   ok                              
controller-manager   Healthy   ok                              
etcd-0               Healthy   {"health":"true","reason":""}   
etcd-1               Healthy   {"health":"true","reason":""}   
etcd-2               Healthy   {"health":"true","reason":""}   

[root@node1 ~]# kubectl create deployment app --image=nginx --replicas=6
deployment.apps/app created

[root@node1 ~]# kubectl expose deployment app --port=80 --target-port=80 --type=LoadBalancer
service/app exposed

[root@node1 ~]# kubectl get svc
NAME         TYPE           CLUSTER-IP      EXTERNAL-IP    PORT(S)        AGE
app          LoadBalancer   10.233.58.203   192.168.8.80   80:31624/TCP   5s
kubernetes   ClusterIP      10.233.0.1      <none>         443/TCP        9m37s

[root@node1 ~]# kubectl get ep kubernetes
NAME         ENDPOINTS                             AGE
kubernetes   192.168.8.61:6443,192.168.8.62:6443   10m
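
Since MetalLB assigned 192.168.8.80 to the app Service above, a final quick check that the address actually serves traffic:

$ curl -sI http://192.168.8.80/ | head -n 1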

Scaling out worker nodes

Passwordless login

$ for i in {67..68}; do sshpass -p 'password' ssh-copy-id -o StrictHostKeyChecking=no root@192.168.8.$i ; done

Firewall

$ ansible -i inventory/mycluster/hosts.yaml all  -m systemd -a 'name=firewalld state=stopped enabled=no'

Edit hosts.yaml

$ vi inventory/mycluster/hosts.yaml
all:
  hosts:
    ...
    node67:
      ansible_host: 192.168.8.67
      ip: 192.168.8.67
      access_ip: 192.168.8.67
    node68:
      ansible_host: 192.168.8.68
      ip: 192.168.8.68
      access_ip: 192.168.8.68
  children:
    kube_node:
      hosts:
        ...
        node67:
        node68:

Scale out

  1. If apiserver_loadbalancer_domain_name was configured, make sure the name resolves to a reachable address:
cat >> /etc/hosts << EOF
192.168.8.xx k8s.labdoc.cc
EOF
  2. Scale out
$ ansible-playbook -i inventory/mycluster/hosts.yaml --become --become-user=root --private-key ~/.ssh/id_rsa scale.yml --limit=node67,node68 -b -v
  3. Verify
$ kubectl get node -o wide   
NAME     STATUS   ROLES           AGE    VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION           CONTAINER-RUNTIME
...
node80   Ready    <none>          118s   v1.26.1   192.168.8.80   <none>        CentOS Linux 7 (Core)   3.10.0-1160.el7.x86_64   containerd://1.6.18
node81   Ready    <none>          118s   v1.26.1   192.168.8.81   <none>        CentOS Linux 7 (Core)   3.10.0-1160.el7.x86_64   containerd://1.6.18