Original article: https://labdoc.cc/article/60/
All commands are run from the kubespray source directory, including inside the container environment.
192.168.8.60 is the Ansible client IP; replace every occurrence of this IP in the article with your own Ansible client IP.
Note: the sed command differs slightly between macOS and Linux; on macOS an extra '' argument is required after -i. Compare:
# mac
$ sed -i '' 's/old_string/new_string/' file.txt
# linux
$ sed -i 's/old_string/new_string/' file.txt
Role | Hostname | Notes |
---|---|---|
Ansible client | node60 | at least 4 GB RAM |
Control plane | node61, node62 | |
Etcd | node61, node62, node63 | |
Worker | node61 … node66 | |
Firewall on the nodes: stop and disable it first.
$ systemctl stop firewalld.service && systemctl disable firewalld.service
Run the following command to install Docker:
$ curl -fsSL https://get.docker.com | bash -s docker --mirror Aliyun
Configure Docker registry mirrors and the insecure (trusted) registry:
$ cat <<EOF | sudo tee /etc/docker/daemon.json
{
"registry-mirrors": [
"https://registry.docker-cn.com",
"https://docker.mirrors.ustc.edu.cn",
"https://hub-mirror.c.163.com",
"https://mirror.ccs.tencentyun.com",
"https://reg-mirror.qiniu.com",
"https://dockerhub.azk8s.cn"
],
"insecure-registries": [
"192.168.8.60:5000"
]
}
EOF
Start Docker:
$ systemctl start docker
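Optionally, enable Docker on boot and confirm the daemon picked up the mirror and insecure-registry settings (a quick check, not required by the rest of the guide):
# optional: start on boot and verify the daemon.json settings
$ systemctl enable docker
$ docker info | grep -A 10 'Registry Mirrors'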
Packages that will be needed later; install them up front:
$ yum install -y wget git unzip sshpass
GitHub access is required, so a proxy is recommended.
# optional step; replace with your own proxy
$ export https_proxy=http://192.168.8.3:7890 http_proxy=http://192.168.8.3:7890 all_proxy=socks5://192.168.8.3:7890
# to unset the proxy later
unset http_proxy
unset https_proxy
unset all_proxy
Clone the code with git:
$ yum install -y git
$ git clone https://github.com/kubernetes-sigs/kubespray.git
Cloning into 'kubespray'...
remote: Enumerating objects: 66750, done.
remote: Counting objects: 100% (5/5), done.
remote: Compressing objects: 100% (5/5), done.
remote: Total 66750 (delta 1), reused 0 (delta 0), pack-reused 66745
Receiving objects: 100% (66750/66750), 20.88 MiB | 5.91 MiB/s, done.
Resolving deltas: 100% (37545/37545), done.
$ cd kubespray
Or download with curl:
$ yum install unzip
$ curl -L --max-redirs 5 -k -o kubespray-master.zip https://github.com/kubernetes-sigs/kubespray/archive/refs/heads/master.zip
$ unzip kubespray-master.zip
$ cd kubespray-master
# generate an SSH key
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
# install sshpass
$ yum install -y sshpass
# set up passwordless login to 192.168.8.61-66
$ for i in {61..66}; do sshpass -p 'your_password' ssh-copy-id -o stricthostkeychecking=no root@192.168.8.$i ; done
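To confirm the keys were copied, each host should print its hostname without asking for a password:
$ for i in {61..66}; do ssh -o BatchMode=yes root@192.168.8.$i hostname; done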
Pull the official kubespray image:
$ docker pull quay.io/kubespray/kubespray:v2.21.0
# or use one of the mirrors below (faster in mainland China), but remember to tag it back
$ docker pull ju4t/kubespray:v2.21.0
$ docker pull quay.m.daocloud.io/kubespray/kubespray:v2.21.0
# remember to tag it back
$ docker tag ju4t/kubespray:v2.21.0 quay.io/kubespray/kubespray:v2.21.0
Start the kubespray runtime environment:
$ docker run --name kubespray -it --privileged \
-v ~/kubespray/:/kubespray/ \
-v ~/.ssh/id_rsa:/root/.ssh/id_rsa \
-v ~/.ssh/known_hosts:/root/.ssh/known_hosts \
-v /var/run/docker.sock:/var/run/docker.sock \
-v /etc/docker/daemon.json:/etc/docker/daemon.json \
-v /usr/bin/docker:/usr/bin/docker \
quay.io/kubespray/kubespray:v2.21.0 \
/bin/bash
# install requirements to avoid minor version drift from the image
$ pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/
Copy the sample configuration:
$ cp -r inventory/sample inventory/mycluster
Edit it:
$ tee -a inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml <<EOF
# set up ~/.kube/config on the master
kubeconfig_localhost: true
# install kubectl on the master
kubectl_localhost: true
EOF
$ tee -a inventory/mycluster/group_vars/all/all.yml <<EOF
apiserver_loadbalancer_domain_name: "k8s.labdoc.cc"
# configure the load balancer as needed
# loadbalancer_apiserver:
# address: 1.2.3.4
# port: 1234
EOF
# change metrics_server_enabled to true
$ sed -i 's/metrics_server_enabled: false/metrics_server_enabled: true/' inventory/mycluster/group_vars/k8s_cluster/addons.yml
# uncomment the metrics_server settings
$ sed -i '/metrics_server_/s/^# //' inventory/mycluster/group_vars/k8s_cluster/addons.yml
# set kube_proxy_strict_arp to true (needed for MetalLB)
$ sed -i 's/kube_proxy_strict_arp: false/kube_proxy_strict_arp: true/' inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml
# enable MetalLB
$ sed -i 's/metallb_enabled: false/metallb_enabled: true/' inventory/mycluster/group_vars/k8s_cluster/addons.yml
# set the LoadBalancer IP range
$ tee -a inventory/mycluster/group_vars/k8s_cluster/addons.yml <<EOF
metallb_speaker_enabled: true
metallb_avoid_buggy_ips: true
metallb_ip_range:
- "192.168.8.80-192.168.8.89"
EOF
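A quick sanity check that the sed edits and appended values above took effect:
$ grep -E 'metrics_server_enabled|metallb_enabled|metallb_ip_range' inventory/mycluster/group_vars/k8s_cluster/addons.yml
$ grep -E 'kube_proxy_strict_arp|kubeconfig_localhost|kubectl_localhost' inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml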
$ docker exec -it kubespray /bin/bash
# adjust the start/end of the IP range and the subnet below
$ declare -a IPS=($(for i in {61..66}; do echo 192.168.8.$i; done))
# generate the inventory/mycluster/hosts.yaml file
$ CONFIG_FILE=inventory/mycluster/hosts.yaml python3 contrib/inventory_builder/inventory.py ${IPS[@]}
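The generated hosts.yaml should look roughly like the sketch below. Node names such as node1 are assigned by the inventory builder (they can be renamed to match your hosts); review the kube_control_plane, etcd and kube_node groups and adjust them if they do not match your plan:
all:
  hosts:
    node1:
      ansible_host: 192.168.8.61
      ip: 192.168.8.61
      access_ip: 192.168.8.61
    ...
  children:
    kube_control_plane:
      hosts:
        node1:
        node2:
    kube_node:
      hosts:
        ...
    etcd:
      hosts:
        node1:
        node2:
        node3:
    k8s_cluster:
      children:
        kube_control_plane:
        kube_node: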
$ cp inventory/mycluster/group_vars/all/offline.yml inventory/mycluster/group_vars/all/mirror.yml
# key settings for the mainland-China mirror acceleration
$ sed -i '/{{ files_repo/s/^# //' inventory/mycluster/group_vars/all/mirror.yml
$ tee -a inventory/mycluster/group_vars/all/mirror.yml <<EOF
gcr_image_repo: "gcr.m.daocloud.io"
kube_image_repo: "k8s.m.daocloud.io"
docker_image_repo: "docker.m.daocloud.io"
quay_image_repo: "quay.m.daocloud.io"
github_image_repo: "ghcr.m.daocloud.io"
files_repo: "https://files.m.daocloud.io"
EOF
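A quick check that the uncommented files_repo URLs and the appended repos ended up in mirror.yml:
$ grep -E '_image_repo|files_repo' inventory/mycluster/group_vars/all/mirror.yml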
Copy the already configured mycluster inventory and delete the mirror.yml mirror-acceleration settings from the copy:
$ cp -r inventory/mycluster inventory/my_airgap_cluster
$ rm -f inventory/my_airgap_cluster/group_vars/all/mirror.yml
$ sed -i '/{{ files_repo/s/^# //' inventory/my_airgap_cluster/group_vars/all/offline.yml
$ sed -i '/{{ registry_host/s/^# //' inventory/my_airgap_cluster/group_vars/all/offline.yml
$ tee -a inventory/my_airgap_cluster/group_vars/all/offline.yml <<EOF
files_repo: "http://192.168.8.60:8080"
registry_host: "192.168.8.60:5000"
EOF
Generate the file list files.list and the image list images.list; exit the container when done:
$ ./contrib/offline/generate_list.sh
Verify the generated files:
$ ls -l contrib/offline/temp/
total 16
-rw-r--r--. 1 root root 2000 Mar  2 16:45 files.list
-rw-r--r--. 1 root root 2797 Mar  2 16:45 files.list.template
-rw-r--r--. 1 root root 2408 Mar  2 16:45 images.list
-rw-r--r--. 1 root root 3365 Mar  2 16:45 images.list.template
If the daocloud mirrors were configured, the URLs in both files.list and images.list should contain the daocloud addresses.
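For example, a rough count of daocloud references (exact numbers vary with the kubespray version):
$ grep -c daocloud contrib/offline/temp/files.list contrib/offline/temp/images.list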
Note: exit the kubespray container before continuing.
Method 1
manage-offline-files.sh requires wget and docker; it downloads the files listed in files.list into the ./contrib/offline/offline-files/ directory.
# install wget
$ yum install -y wget
# run it
$ ./contrib/offline/manage-offline-files.sh
Method 2
Alternatively, you can put files that were downloaded elsewhere into the ./contrib/offline/offline-files/ directory.
# run this on the machine where the files were already downloaded
$ scp -r ./contrib/offline/offline-files root@192.168.8.60:/root/kubespray/contrib/offline/
Start Nginx:
$ docker run \
  --restart=always -d -p 8080:80 \
  --volume $(pwd)/contrib/offline/offline-files/:/usr/share/nginx/html/download \
  --volume $(pwd)/contrib/offline/nginx.conf:/etc/nginx/nginx.conf \
  --name nginx nginx:alpine
Verify: http://192.168.8.60:8080/
Index of /
../
get.helm.sh/ 02-Mar-2023 08:56 -
github.com/ 02-Mar-2023 08:57 -
storage.googleapis.com/ 02-Mar-2023 08:56 -
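The same check from the command line; the nginx autoindex should answer with HTTP 200:
$ curl -sI http://192.168.8.60:8080/ | head -n 1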
Method 1
The project provides the script manage-offline-container-images.sh
# fetch container images from an environment that was deployed online
$ ./contrib/offline/manage-offline-container-images.sh create
# deploy a local container registry and push the images into it
$ ./contrib/offline/manage-offline-container-images.sh register
The official script has quite a few pitfalls: it requires an existing, already-deployed cluster, and the resulting image set is incomplete.
Method 2
Create a registry with Docker:
$ docker run --restart=always -d -p 5000:5000 --name registry \
-v ~/registry:/var/lib/registry \
registry:latest
# data is persisted under /var/lib/registry
Verify that push and pull against the registry work:
$ docker tag nginx:alpine localhost:5000/nginx:alpine
$ docker push localhost:5000/nginx:alpine
$ docker rmi localhost:5000/nginx:alpine
$ docker pull localhost:5000/nginx:alpine
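You can also query the registry's v2 API directly; the nginx repository pushed above should show up:
$ curl -s http://localhost:5000/v2/_catalog
$ curl -s http://localhost:5000/v2/nginx/tags/list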
Replace 192.168.8.60:5000 below with an address reachable from the cluster servers you are about to deploy, then run the following to generate manage-offline-container-images.py, a script for bulk pulling and pushing:
cat <<'EOF' | sudo tee ./contrib/offline/manage-offline-container-images.py
import os
import sys

images_file = "./temp/images.list"
target_rep = os.environ.get('target_rep', "192.168.8.60:5000")
target_dir = "./container-images"

file_path = os.path.join(os.path.dirname(__file__), images_file)
target_path = os.path.join(os.path.dirname(__file__), target_dir)


def main(arg):
    if arg not in ['create', 'save', 'load', 'registry']:
        help()
        return
    if arg == 'save':
        # make sure the save directory exists
        os.makedirs(target_path, exist_ok=True)
    file_object = open(file_path, 'r')
    try:
        while True:
            source_image = file_object.readline().rstrip()
            if source_image:
                # retag the image so that it points at the private registry
                target_image = source_image.replace('registry.k8s.io', target_rep) \
                    .replace('gcr.io', target_rep) \
                    .replace('docker.io', target_rep) \
                    .replace('quay.io', target_rep)
                save_file = os.path.join(target_path, '%s.tar.gz' % target_image.split('/')[-1].replace(':', '_'))
                if arg == 'create':
                    os.system('docker pull %s' % source_image)
                    os.system('docker tag %s %s' % (source_image, target_image))
                if arg == 'save':
                    os.system('docker save -o %s %s' % (save_file, target_image))
                if arg == 'load':
                    os.system('docker load -i %s' % save_file)
                if arg == 'registry':
                    os.system('docker push %s' % target_image)
            else:
                break
    finally:
        file_object.close()


def help():
    print("Normally only 'create' and 'registry' are needed; if this machine cannot push to the target registry directly, move the files by hand and run the steps one by one.")
    print("When running the steps separately, make sure the following exist:")
    print("%s\n%s\n%s" % (images_file, target_dir, sys.argv[0]), end="\n\n")
    print("\t[*] Step(1) create the images")
    print("\t$ python3 %s create" % sys.argv[0], end="\n\n")
    print("\t[?] Step(2) save the images into the %s directory" % target_dir)
    print("\t$ python3 %s save" % sys.argv[0], end="\n\n")
    print("\t[?] Step(3) load the images locally")
    print("\t$ python3 %s load" % sys.argv[0], end="\n\n")
    print("\t[*] Step(4) push to the target registry %s" % target_rep)
    print("\t$ python3 %s registry" % sys.argv[0], end="\n\n")
    return


if __name__ == '__main__':
    if len(sys.argv) < 2:
        help()
    else:
        main(sys.argv[1])
EOF
Because the script is written in Python, run it inside the kubespray container.
Before running it, check that /etc/docker/daemon.json contains the configuration below; otherwise the push will fail.
...
"insecure-registries": [
"192.168.8.60:5000"
]
...
Pull the images listed in temp/images.list and push them to the 192.168.8.60:5000 registry:
$ docker exec -it kubespray /bin/bash
# set the target registry (the script reads the target_rep environment variable)
$ export target_rep="192.168.8.60:5000"
# pull and retag the images
$ python3 ./contrib/offline/manage-offline-container-images.py create
# the next two steps are optional
# if this machine cannot reach the Internet, or the downloaded images cannot be pushed straight to the private registry, save them to the container-images directory:
# python3 ./contrib/offline/manage-offline-container-images.py save
# then copy them to a server that can reach the registry,
# scp -r ./contrib/offline/container-images root@192.168.8.60:/root/kubespray/contrib/offline/
# and load the images there:
# python3 ./contrib/offline/manage-offline-container-images.py load
# push the images to the private registry
$ python3 ./contrib/offline/manage-offline-container-images.py registry
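After the push completes, the registry catalog should list the retagged repositories (spot check from any machine that can reach it):
$ curl -s http://192.168.8.60:5000/v2/_catalog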
Pick one of the two configurations below, depending on whether you obtained the images with manage-offline-container-images.sh or with the Python script above.
sed -i '/{{ files_repo/s/^# //' inventory/my_airgap_cluster/group_vars/all/offline.yml
sed -i '/{{ registry_host/s/^# //' inventory/my_airgap_cluster/group_vars/all/offline.yml
tee -a inventory/my_airgap_cluster/group_vars/all/offline.yml <<EOF
files_repo: "http://192.168.8.60:8080"
registry_host: "192.168.8.60:5000"
EOF
sed -i '/{{ files_repo/s/^# //' inventory/my_airgap_cluster/group_vars/all/offline.yml
tee -a inventory/my_airgap_cluster/group_vars/all/offline.yml <<EOF
gcr_image_repo: "192.168.8.60:8080/gcr.m.daocloud.io"
kube_image_repo: "192.168.8.60:8080/k8s.m.daocloud.io"
docker_image_repo: "192.168.8.60:8080/docker.m.daocloud.io"
quay_image_repo: "192.168.8.60:8080/quay.m.daocloud.io"
github_image_repo: "192.168.8.60:8080/ghcr.m.daocloud.io"
files_repo: "http://192.168.8.60:8080/files.m.daocloud.io"
registry_host: "192.168.8.60:5000"
EOF
# configure containerd insecure registries
$ cat <<EOF > inventory/my_airgap_cluster/group_vars/all/containerd.yml
containerd_insecure_registries:
  "192.168.8.60:5000": "http://192.168.8.60:5000"
EOF
# stop and disable firewalld on all nodes
$ ansible -i inventory/mycluster/hosts.yaml all -m systemd -a 'name=firewalld state=stopped enabled=no'
# online deployment (mycluster)
$ ansible-playbook -i inventory/mycluster/hosts.yaml --become --become-user=root --private-key ~/.ssh/id_rsa cluster.yml
# offline / air-gapped deployment (my_airgap_cluster)
$ ansible-playbook -i inventory/my_airgap_cluster/hosts.yaml --become --become-user=root --private-key ~/.ssh/id_rsa cluster.yml
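With kubeconfig_localhost enabled earlier, kubespray normally drops a copy of admin.conf under the inventory's artifacts directory on the Ansible client; the path below assumes the repo sits at ~/kubespray and may differ in your setup:
# on the Ansible client, after the playbook finishes
$ export KUBECONFIG=~/kubespray/inventory/mycluster/artifacts/admin.conf
$ kubectl get nodes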
[root@node1 ~]# kubectl get node -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
node61 Ready control-plane 11m v1.26.2 192.168.8.61 <none> CentOS Linux 7 (Core) 3.10.0-1160.el7.x86_64 containerd://1.6.19
node62 Ready control-plane 10m v1.26.2 192.168.8.62 <none> CentOS Linux 7 (Core) 3.10.0-1160.el7.x86_64 containerd://1.6.19
node63 Ready <none> 8m58s v1.26.2 192.168.8.63 <none> CentOS Linux 7 (Core) 3.10.0-1160.el7.x86_64 containerd://1.6.19
node64 Ready <none> 8m58s v1.26.2 192.168.8.64 <none> CentOS Linux 7 (Core) 3.10.0-1160.el7.x86_64 containerd://1.6.19
node65 Ready <none> 8m58s v1.26.2 192.168.8.65 <none> CentOS Linux 7 (Core) 3.10.0-1160.el7.x86_64 containerd://1.6.19
node66 Ready <none> 8m58s v1.26.2 192.168.8.66 <none> CentOS Linux 7 (Core) 3.10.0-1160.el7.x86_64 containerd://1.6.19
[root@node1 ~]# kubectl get cs
Warning: v1 ComponentStatus is deprecated in v1.19+
NAME STATUS MESSAGE ERROR
scheduler Healthy ok
controller-manager Healthy ok
etcd-0 Healthy {"health":"true","reason":""}
etcd-1 Healthy {"health":"true","reason":""}
etcd-2 Healthy {"health":"true","reason":""}
[root@node1 ~]# kubectl create deployment app --image=nginx --replicas=6
deployment.apps/app created
[root@node1 ~]# kubectl expose deployment app --port=80 --target-port=80 --type=LoadBalancer
service/app exposed
[root@node1 ~]# kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
app LoadBalancer 10.233.58.203 192.168.8.80 80:31624/TCP 5s
kubernetes ClusterIP 10.233.0.1 <none> 443/TCP 9m37s
[root@node1 ~]# kubectl get ep kubernetes
NAME ENDPOINTS AGE
kubernetes 192.168.8.61:6443,192.168.8.62:6443 10m
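Since the app Service received the MetalLB address 192.168.8.80, a plain HTTP request from any host on that network should reach the nginx pods (a quick smoke test):
$ curl -s http://192.168.8.80/ | grep -i '<title>'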
# add new worker nodes (192.168.8.67-68): set up passwordless login first
$ for i in {67..68}; do sshpass -p 'your_password' ssh-copy-id -o stricthostkeychecking=no root@192.168.8.$i ; done
$ ansible -i inventory/mycluster/hosts.yaml all -m systemd -a 'name=firewalld state=stopped enabled=no'
$ vi inventory/mycluster/hosts.yaml
all:
  hosts:
    ...
    node67:
      ansible_host: 192.168.8.67
      ip: 192.168.8.67
      access_ip: 192.168.8.67
    node68:
      ansible_host: 192.168.8.68
      ip: 192.168.8.68
      access_ip: 192.168.8.68
  children:
    kube_node:
      hosts:
        ...
        node67:
        node68:
Make sure the address behind apiserver_loadbalancer_domain_name is reachable from the new nodes, for example:
$ cat >> /etc/hosts << EOF
192.168.8.xx k8s.labdoc.cc
EOF
$ ansible-playbook -i inventory/mycluster/hosts.yaml --become --become-user=root --private-key ~/.ssh/id_rsa scale.yml --limit=node67,node68 -b -v
$ kubectl get node -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
...
node80 Ready <none> 118s v1.26.1 192.168.8.80 <none> CentOS Linux 7 (Core) 3.10.0-1160.el7.x86_64 containerd://1.6.18
node81 Ready <none> 118s v1.26.1 192.168.8.81 <none> CentOS Linux 7 (Core) 3.10.0-1160.el7.x86_64 containerd://1.6.18