Installation guide reference:
https://www.kubernetes.org.cn/6634.html
Problem 1: initialization fails with [ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-controller-manager:v1.16.4
[root@centos-7-120 ~]# kubeadm init --config=kubeadm-config.yaml
[init] Using Kubernetes version: v1.16.4
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-controller-manager:v1.16.4: output: Error response from daemon: Get https://k8s.gcr.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
, error: exit status 1
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher
[root@centos-7-120 ~]#
Solution: configure the Aliyun registry mirror, then re-download the images with ./image.sh. (The previously downloaded k8s images can be deleted first.)
# Configure the registry mirror
# Write the daemon.json file
[root@master01 ~]# mkdir -p /etc/docker
[root@master01 ~]# tee /etc/docker/daemon.json <<-'EOF'
{
"registry-mirrors": ["https://v16stybc.mirror.aliyuncs.com"],
"exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
# Restart the Docker service so the change takes effect
[root@master01 ~]# systemctl daemon-reload
[root@master01 ~]# systemctl restart docker
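To verify the mirror actually took effect after the restart, a quick optional check is to grep the docker info output, which lists configured mirrors:
docker info | grep -A 1 'Registry Mirrors'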
[root@centos-7-120 ~]# cat image.sh
#!/bin/bash
# Pull each image kubeadm needs from the Aliyun mirror repository, retag it
# as k8s.gcr.io/<name> so kubeadm can find it, then drop the mirror tag.
url=registry.cn-hangzhou.aliyuncs.com/loong576
version=v1.16.4
# "kubeadm config images list" prints names like k8s.gcr.io/kube-apiserver:v1.16.4;
# awk strips the registry prefix.
images=(`kubeadm config images list --kubernetes-version=$version | awk -F '/' '{print $2}'`)
for imagename in ${images[@]} ; do
    docker pull $url/$imagename
    docker tag $url/$imagename k8s.gcr.io/$imagename
    docker rmi -f $url/$imagename
done
[root@centos-7-120 ~]#
[root@centos-7-120 ~]# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
k8s.gcr.io/kube-apiserver v1.16.4 3722a80984a0 4 months ago 217MB
k8s.gcr.io/kube-scheduler v1.16.4 2984964036c8 4 months ago 87.3MB
k8s.gcr.io/kube-proxy v1.16.4 091df896d78f 4 months ago 86.1MB
k8s.gcr.io/etcd 3.3.15-0 b2756210eeab 7 months ago 247MB
k8s.gcr.io/coredns 1.6.2 bf261d157914 8 months ago 44.1MB
hello-world latest fce289e99eb9 15 months ago 1.84kB
k8s.gcr.io/pause 3.1 da86e6ba6ca1 2 years ago 742kB
[root@centos-7-120 ~]#
[root@centos-7-120 ~]# ls
anaconda-ks.cfg image.sh kubeadm-config.yaml
[root@centos-7-120 ~]#
[root@centos-7-120 ~]# cat kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: v1.16.4
apiServer:
  certSANs:   # list the hostnames, IPs, and VIP of every kube-apiserver node
  - centos-7-120
  - centos-7-121
  - centos-7-122
  - 192.168.41.120
  - 192.168.41.121
  - 192.168.41.122
  - 192.168.41.222
controlPlaneEndpoint: "192.168.41.222:6443"
networking:
  podSubnet: "10.244.0.0/16"
[root@centos-7-120 ~]#
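As an aside, the exact images this config requires can be previewed before init runs, which surfaces pull problems like the one above early (a sketch; kubeadm config images list accepts --config in v1.16):
kubeadm config images list --config=kubeadm-config.yaml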
[root@centos-7-120 ~]# docker rmi 3722a80984a0
Untagged: k8s.gcr.io/kube-apiserver:v1.16.4
Deleted: sha256:3722a80984a04bd08ecf84162c8fbd46124185afb266fe5123e6e77b0e0f767b
Deleted: sha256:6413a2f7244cd652d051ac0a1c7be093cd473bedb19f3a175d9ee63481adda48
[root@centos-7-120 ~]# docker rmi fb4cca6b4e4c
Untagged: k8s.gcr.io/kube-controller-manager:v1.16.4
Deleted: sha256:fb4cca6b4e4cb2f606da76df5c3227ca1769493d1a1f89fcc7e6351908fced20
Deleted: sha256:32202db3aaddcf9350181e8d4521e985acc87867b45d3e7dfe9d5ded4be1d7b9
[root@centos-7-120 ~]# docker rmi 091df896d78f
Untagged: k8s.gcr.io/kube-proxy:v1.16.4
Deleted: sha256:091df896d78fd917aa7c09b916e23bbf2523cb1dfdab4bf7208b3ae3c6bdeccc
Deleted: sha256:72320daff230627a309d3cc086b56c6b192abd320b9e0713971466a9749ea7c0
Deleted: sha256:8d42edd79dc15688145f6936d7ab98b9e63c653dffde9e0de9488801b97208c2
[root@centos-7-120 ~]# docker rmi 2984964036c8
Untagged: k8s.gcr.io/kube-scheduler:v1.16.4
Deleted: sha256:2984964036c8189e21e0b8698bd2a801e162fc0b4ac6466ea637075f5f3ac4e1
Deleted: sha256:8e8be9fb87a206e5bcb9bd5bf2b6516b9231e44cfe94ca725d732e454562ab5e
[root@centos-7-120 ~]# docker rmi b2756210eeab
Untagged: k8s.gcr.io/etcd:3.3.15-0
Deleted: sha256:b2756210eeabf84f3221da9959e9483f3919dc2aaab4cd45e7cd072fcbde27ed
Deleted: sha256:3ecdd298bd8b0123f7c9f0c0a9396959955236ca968a935dfea8eadb08dfc03d
Deleted: sha256:01eb8ba6d2899109f2de6f37a079e6bef91b68735f278df67618e686da3fd373
Deleted: sha256:fe9a8b4f1dccd77105b8423a26536ff756f1ee99fdcae6893eb737ae1c527c7a
[root@centos-7-120 ~]# docker rmi bf261d157914
Untagged: k8s.gcr.io/coredns:1.6.2
Deleted: sha256:bf261d157914477ee1a5969d28ec687f3fbfc9fb5a664b22df78e57023b0e03b
Deleted: sha256:544728d19ece62f31b64f291a681c2867e0464f2eeffd88fa8c8908e33874c33
Deleted: sha256:225df95e717ceb672de0e45aa49f352eace21512240205972aca0fccc9612722
[root@centos-7-120 ~]# docker rmi da86e6ba6ca1
Untagged: k8s.gcr.io/pause:3.1
Deleted: sha256:da86e6ba6ca197bf6bc5e9d900febd906b133eaa4750e6bed647b0fbe50ed43e
Deleted: sha256:e17133b79956ad6f69ae7f775badd1c11bad2fc64f0529cab863b9d12fbaa5c4
[root@centos-7-120 ~]# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
hello-world latest fce289e99eb9 15 months ago 1.84kB
[root@centos-7-120 ~]# ./image.sh
v1.16.4: Pulling from loong576/kube-apiserver
39fafc05754f: Pull complete
fb5c3f053fdb: Pull complete
Digest: sha256:b24373236fff6dcc0e154433b43d53a9b2388cdf39f05fbc46ac73082c9b05f9
Status: Downloaded newer image for registry.cn-hangzhou.aliyuncs.com/loong576/kube-apiserver:v1.16.4
Untagged: registry.cn-hangzhou.aliyuncs.com/loong576/kube-apiserver:v1.16.4
Untagged: registry.cn-hangzhou.aliyuncs.com/loong576/kube-apiserver@sha256:b24373236fff6dcc0e154433b43d53a9b2388cdf39f05fbc46ac73082c9b05f9
v1.16.4: Pulling from loong576/kube-controller-manager
39fafc05754f: Already exists
cd6e7c1595a1: Pull complete
Digest: sha256:012f5029eddf4ba2abc5a8061d210a7c5ce7f15f98e5fc02862397620113ae92
Status: Downloaded newer image for registry.cn-hangzhou.aliyuncs.com/loong576/kube-controller-manager:v1.16.4
Untagged: registry.cn-hangzhou.aliyuncs.com/loong576/kube-controller-manager:v1.16.4
Untagged: registry.cn-hangzhou.aliyuncs.com/loong576/kube-controller-manager@sha256:012f5029eddf4ba2abc5a8061d210a7c5ce7f15f98e5fc02862397620113ae92
v1.16.4: Pulling from loong576/kube-scheduler
39fafc05754f: Already exists
2f2112548c87: Pull complete
Digest: sha256:c2684277741926d7ac64c5f846813f7489cc878d9ad9ab78c108d8b4938fc364
Status: Downloaded newer image for registry.cn-hangzhou.aliyuncs.com/loong576/kube-scheduler:v1.16.4
Untagged: registry.cn-hangzhou.aliyuncs.com/loong576/kube-scheduler:v1.16.4
Untagged: registry.cn-hangzhou.aliyuncs.com/loong576/kube-scheduler@sha256:c2684277741926d7ac64c5f846813f7489cc878d9ad9ab78c108d8b4938fc364
v1.16.4: Pulling from loong576/kube-proxy
39fafc05754f: Already exists
db3f71d0eb90: Pull complete
162fd2e96b64: Pull complete
Digest: sha256:14ae870f4591ac2839e1218eb7b4e3caa20a39eab930efc157a9a72e0e4580e0
Status: Downloaded newer image for registry.cn-hangzhou.aliyuncs.com/loong576/kube-proxy:v1.16.4
Untagged: registry.cn-hangzhou.aliyuncs.com/loong576/kube-proxy:v1.16.4
Untagged: registry.cn-hangzhou.aliyuncs.com/loong576/kube-proxy@sha256:14ae870f4591ac2839e1218eb7b4e3caa20a39eab930efc157a9a72e0e4580e0
3.1: Pulling from loong576/pause
cf9202429979: Pull complete
Digest: sha256:759c3f0f6493093a9043cc813092290af69029699ade0e3dbe024e968fcb7cca
Status: Downloaded newer image for registry.cn-hangzhou.aliyuncs.com/loong576/pause:3.1
Untagged: registry.cn-hangzhou.aliyuncs.com/loong576/pause:3.1
Untagged: registry.cn-hangzhou.aliyuncs.com/loong576/pause@sha256:759c3f0f6493093a9043cc813092290af69029699ade0e3dbe024e968fcb7cca
3.3.15-0: Pulling from loong576/etcd
39fafc05754f: Already exists
aee6f172d490: Pull complete
e6aae814a194: Pull complete
Digest: sha256:37a8acab63de5556d47bfbe76d649ae63f83ea7481584a2be0dbffb77825f692
Status: Downloaded newer image for registry.cn-hangzhou.aliyuncs.com/loong576/etcd:3.3.15-0
Untagged: registry.cn-hangzhou.aliyuncs.com/loong576/etcd:3.3.15-0
Untagged: registry.cn-hangzhou.aliyuncs.com/loong576/etcd@sha256:37a8acab63de5556d47bfbe76d649ae63f83ea7481584a2be0dbffb77825f692
1.6.2: Pulling from loong576/coredns
c6568d217a00: Pull complete
3970bc7cbb16: Pull complete
Digest: sha256:4dd4d0e5bcc9bd0e8189f6fa4d4965ffa81207d8d99d29391f28cbd1a70a0163
Status: Downloaded newer image for registry.cn-hangzhou.aliyuncs.com/loong576/coredns:1.6.2
Untagged: registry.cn-hangzhou.aliyuncs.com/loong576/coredns:1.6.2
Untagged: registry.cn-hangzhou.aliyuncs.com/loong576/coredns@sha256:4dd4d0e5bcc9bd0e8189f6fa4d4965ffa81207d8d99d29391f28cbd1a70a0163
[root@centos-7-120 ~]# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
k8s.gcr.io/kube-apiserver v1.16.4 3722a80984a0 4 months ago 217MB
k8s.gcr.io/kube-controller-manager v1.16.4 fb4cca6b4e4c 4 months ago 163MB
k8s.gcr.io/kube-scheduler v1.16.4 2984964036c8 4 months ago 87.3MB
k8s.gcr.io/kube-proxy v1.16.4 091df896d78f 4 months ago 86.1MB
k8s.gcr.io/etcd 3.3.15-0 b2756210eeab 7 months ago 247MB
k8s.gcr.io/coredns 1.6.2 bf261d157914 8 months ago 44.1MB
hello-world latest fce289e99eb9 15 months ago 1.84kB
k8s.gcr.io/pause 3.1 da86e6ba6ca1 2 years ago 742kB
[root@centos-7-120 ~]#
[root@centos-7-120 ~]# kubeadm init --config=kubeadm-config.yaml
[init] Using Kubernetes version: v1.16.4
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Activating the kubelet service
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [centos-7-120 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local centos-7-120 centos-7-121 centos-7-122] and IPs [10.96.0.1 192.168.41.120 192.168.41.222 192.168.41.120 192.168.41.121 192.168.41.122 192.168.41.222]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [centos-7-120 localhost] and IPs [192.168.41.120 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [centos-7-120 localhost] and IPs [192.168.41.120 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
Problem 2: port 6443 is already in use
[root@centos-7-120 ~]# kubeadm init --config=kubeadm-config.yaml
[init] Using Kubernetes version: v1.16.4
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR Port-6443]: Port 6443 is in use
[ERROR Port-10251]: Port 10251 is in use
[ERROR Port-10252]: Port 10252 is in use
[ERROR FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml]: /etc/kubernetes/manifests/kube-scheduler.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-etcd.yaml]: /etc/kubernetes/manifests/etcd.yaml already exists
[ERROR Port-10250]: Port 10250 is in use
[ERROR Port-2379]: Port 2379 is in use
[ERROR Port-2380]: Port 2380 is in use
[ERROR DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher
[root@centos-7-120 ~]#
Solution:
kubeadm reset
rm -rf $HOME/.kube/config
After the reset, re-run initialization:
(kubeadm init --config=kubeadm-config.yaml)
[root@centos-7-120 ~]# kubeadm reset
[reset] Reading configuration from the cluster...
[reset] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
W0413 15:48:33.291996 12513 reset.go:96] [reset] Unable to fetch the kubeadm-config ConfigMap from cluster: failed to get config map: Get https://192.168.41.222:6443/api/v1/namespaces/kube-system/configmaps/kubeadm-config: dial tcp 192.168.41.222:6443: connect: no route to host
[reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] Are you sure you want to proceed? [y/N]: y
[preflight] Running pre-flight checks
W0413 15:48:43.482888 12513 removeetcdmember.go:79] [reset] No kubeadm config, using etcd pod spec to get data directory
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
[reset] Deleting contents of stateful directories: [/var/lib/etcd /var/lib/kubelet /etc/cni/net.d /var/lib/dockershim /var/run/kubernetes /var/lib/cni]
The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually by using the "iptables" command.
If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.
The reset process does not clean your kubeconfig files and you must remove them manually.
Please, check the contents of the $HOME/.kube/config file.
[root@centos-7-120 ~]#
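As the reset output notes, iptables and IPVS state is not cleaned up automatically. If a later init still reports ports in use, a sketch of the usual follow-up (assumes iproute2's ss is available; run ipvsadm only if IPVS mode was used):
# See which processes still hold the control-plane ports
ss -tlnp | grep -E ':(6443|10250|10251|10252|2379|2380)'
# Flush leftover iptables rules and IPVS tables, as the reset output suggests
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
ipvsadm --clear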
Problem 3: initialization fails; "systemctl status kubelet" keeps reporting that the master node cannot be found
[root@centos-7-120 ~]# kubeadm init --config=kubeadm-config.yaml
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
Unfortunately, an error has occurred:
timed out waiting for the condition
This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI, e.g. docker.
Here is one example how you may list all Kubernetes containers running in docker:
- 'docker ps -a | grep kube | grep -v pause'
Once you have found the failing container, you can inspect its logs with:
- 'docker logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher
[root@centos-7-120 ~]# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /usr/lib/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since 一 2020-04-13 15:38:18 CST; 4min 52s ago
Docs: https://kubernetes.io/docs/
Main PID: 11980 (kubelet)
Tasks: 16
Memory: 53.7M
CGroup: /system.slice/kubelet.service
└─11980 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-driver=...
4月 13 15:43:09 centos-7-120 kubelet[11980]: E0413 15:43:09.841228 11980 kubelet.go:2267] node "centos-7-120" not found
4月 13 15:43:09 centos-7-120 kubelet[11980]: E0413 15:43:09.841228 11980 kubelet.go:2267] node "centos-7-120" not found
4月 13 15:43:09 centos-7-120 kubelet[11980]: E0413 15:43:09.942078 11980 kubelet.go:2267] node "centos-7-120" not found
[root@centos-7-120 ~]# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /usr/lib/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since 一 2020-04-13 15:59:51 CST; 5min ago
Docs: https://kubernetes.io/docs/
Main PID: 12939 (kubelet)
Tasks: 15
Memory: 30.7M
CGroup: /system.slice/kubelet.service
└─12939 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-driver=...
4月 13 16:04:54 centos-7-120 kubelet[12939]: E0413 16:04:54.771309 12939 event.go:265] Unable to write event: 'Post https://192.168.41.222:6443/api/v1/namespaces/default/events: ...ter sleeping)
4月 13 16:04:54 centos-7-120 kubelet[12939]: E0413 16:04:54.857161 12939 kubelet.go:2267] node "centos-7-120" not found
4月 13 16:04:54 centos-7-120 kubelet[12939]: E0413 16:04:54.958213 12939 kubelet.go:2267] node "centos-7-120" not found
Cause: the host had been rebooted, and the VIP was gone.
[root@centos-7-120 ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:26:3a:b4 brd ff:ff:ff:ff:ff:ff
inet 192.168.41.120/24 brd 192.168.41.255 scope global noprefixroute ens33
valid_lft forever preferred_lft forever
inet6 fe80::20c:29ff:fe26:3ab4/64 scope link
valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:c8:47:dc:12 brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
valid_lft forever preferred_lft forever
inet6 fe80::42:c8ff:fe47:dc12/64 scope link
valid_lft forever preferred_lft forever
[root@centos-7-120 ~]#
Solution: re-add the VIP.
[root@centos-7-120 ~]#
[root@centos-7-120 ~]# ifconfig ens33:2 192.168.41.222 netmask 255.255.255.0 up
[root@centos-7-120 ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:26:3a:b4 brd ff:ff:ff:ff:ff:ff
inet 192.168.41.120/24 brd 192.168.41.255 scope global noprefixroute ens33
valid_lft forever preferred_lft forever
inet 192.168.41.222/24 brd 192.168.41.255 scope global secondary ens33:2
valid_lft forever preferred_lft forever
inet6 fe80::20c:29ff:fe26:3ab4/64 scope link
valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:c8:47:dc:12 brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
valid_lft forever preferred_lft forever
inet6 fe80::42:c8ff:fe47:dc12/64 scope link
valid_lft forever preferred_lft forever
[root@centos-7-120 ~]#
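Note that an alias address added with ifconfig does not survive a reboot, which is exactly how this problem arose. A sketch for persisting it on CentOS 7 network-scripts (in a production HA setup keepalived would normally manage the VIP instead):
cat > /etc/sysconfig/network-scripts/ifcfg-ens33:2 <<'EOF'
DEVICE=ens33:2
IPADDR=192.168.41.222
NETMASK=255.255.255.0
ONBOOT=yes
EOF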
Problem 4:
Three master nodes master1, master2, master3; master1 was re-initialized and the certificates became invalid, so master2 and master3 fail to rejoin the cluster.
[root@centos-7-122 ~]# kubeadm join 192.168.41.222:6443 --token born8a.gx365vmx1vytbxxz \
> --discovery-token-ca-cert-hash sha256:086a94831bca06ea2ce2976e2d1c850702e24ea442488a0245a63abd2c249b9a \
> --control-plane
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
error execution phase preflight:
One or more conditions for hosting a new control plane instance is not satisfied.
failure loading certificate for CA: couldn't load the certificate file /etc/kubernetes/pki/ca.crt: open /etc/kubernetes/pki/ca.crt: no such file or directory
Please ensure that:
* The cluster has a stable controlPlaneEndpoint address.
* The certificates that must be shared among control plane instances are provided.
To see the stack trace of this error execute with --v=5 or higher
[root@centos-7-122 ~]#
Solution: redistribute and pick up the certificates.
On master1:
./cert-main-master.sh
On master2/master3:
./cert-other-master.sh
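The two scripts come from the referenced guide. For context, a minimal sketch of the pattern such scripts follow; the IP list and staging paths below are assumptions, not the guide's exact contents:
#!/bin/bash
# cert-main-master.sh (sketch): run on master1, push the shared CA material
# to the other control-plane nodes.
for host in 192.168.41.121 192.168.41.122; do
    scp /etc/kubernetes/pki/ca.crt /etc/kubernetes/pki/ca.key \
        /etc/kubernetes/pki/sa.key /etc/kubernetes/pki/sa.pub \
        /etc/kubernetes/pki/front-proxy-ca.crt /etc/kubernetes/pki/front-proxy-ca.key \
        root@$host:~/
    scp /etc/kubernetes/pki/etcd/ca.crt root@$host:~/etcd-ca.crt
    scp /etc/kubernetes/pki/etcd/ca.key root@$host:~/etcd-ca.key
done

#!/bin/bash
# cert-other-master.sh (sketch): run on master2/master3, move the received
# files to where 'kubeadm join --control-plane' expects them.
mkdir -p /etc/kubernetes/pki/etcd
mv ~/ca.crt ~/ca.key ~/sa.key ~/sa.pub \
   ~/front-proxy-ca.crt ~/front-proxy-ca.key /etc/kubernetes/pki/
mv ~/etcd-ca.crt /etc/kubernetes/pki/etcd/ca.crt
mv ~/etcd-ca.key /etc/kubernetes/pki/etcd/ca.key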
Problem 5:
Three master nodes master1, master2, master3; master1 was accidentally wiped. After it was reset and re-initialized, master2 and master3 fail to reset and rejoin the cluster.
On master2/master3:
[root@centos-7-122 ~]# kubeadm join 192.168.41.222:6443 --token born8a.gx365vmx1vytbxxz --discovery-token-ca-cert-hash sha256:086a94831bca06ea2ce2976e2d1c850702e24ea442488a0245a63abd2c249b9a --control-plane
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[preflight] Running pre-flight checks before initializing the new control plane instance
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [centos-7-122 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local centos-7-120 centos-7-121 centos-7-122] and IPs [10.96.0.1 192.168.41.122 192.168.41.222 192.168.41.120 192.168.41.121 192.168.41.122 192.168.41.222]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [centos-7-122 localhost] and IPs [192.168.41.122 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [centos-7-122 localhost] and IPs [192.168.41.122 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[certs] Using the existing "sa" key
[kubeconfig] Generating kubeconfig files
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[check-etcd] Checking that the etcd cluster is healthy
[kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.16" ConfigMap in the kube-system namespace
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Activating the kubelet service
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
[kubelet-check] Initial timeout of 40s passed.
error execution phase kubelet-start: error uploading crisocket: timed out waiting for the condition
To see the stack trace of this error execute with --v=5 or higher
[root@centos-7-122 ~]#
[root@centos-7-122 ~]# journalctl -xeu kubelet
4月 14 02:54:50 centos-7-122 kubelet[14497]: E0414 02:54:50.271099 14497 kubelet.go:2267] node "centos-7-122" not found
4月 14 02:54:50 centos-7-122 kubelet[14497]: W0414 02:54:50.313172 14497 cni.go:237] Unable to update cni config: no networks found in /etc/cni/net.d
4月 14 02:54:50 centos-7-122 kubelet[14497]: E0414 02:54:50.372122 14497 kubelet.go:2267] node "centos-7-122" not found
4月 14 02:54:50 centos-7-122 kubelet[14497]: E0414 02:54:50.473197 14497 kubelet.go:2267] node "centos-7-122" not found
4月 14 02:54:50 centos-7-122 kubelet[14497]: E0414 02:54:50.493788 14497 reflector.go:123] k8s.io/client-go/informers/factory.go:134: Failed to list *v1beta1.CSIDriver: Unauthorized
4月 14 02:54:50 centos-7-122 kubelet[14497]: E0414 02:54:50.574301 14497 kubelet.go:2267] node "centos-7-122" not found
4月 14 02:54:50 centos-7-122 kubelet[14497]: I0414 02:54:50.660900 14497 kubelet_node_status.go:286] Setting node annotation to enable volume controller attach/detach
4月 14 02:54:50 centos-7-122 kubelet[14497]: E0414 02:54:50.665414 14497 reflector.go:123] k8s.io/kubernetes/pkg/kubelet/kubelet.go:450: Failed to list *v1.Service: Unauthorized
4月 14 02:54:50 centos-7-122 kubelet[14497]: E0414 02:54:50.675166 14497 kubelet.go:2267] node "centos-7-122" not found
4月 14 02:54:50 centos-7-122 kubelet[14497]: E0414 02:54:50.776427 14497 kubelet.go:2267] node "centos-7-122" not found
4月 14 02:54:50 centos-7-122 kubelet[14497]: W0414 02:54:50.833841 14497 status_manager.go:529] Failed to get status for pod "kube-apiserver-centos-7-122_kube-system(9bf75fa318236ee84993dd8432d39
4月 14 02:54:50 centos-7-122 kubelet[14497]: E0414 02:54:50.877318 14497 kubelet.go:2267] node "centos-7-122" not found
4月 14 02:54:50 centos-7-122 kubelet[14497]: E0414 02:54:50.975015 14497 kubelet.go:2187] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: netw
4月 14 02:54:50 centos-7-122 kubelet[14497]: E0414 02:54:50.978170 14497 kubelet.go:2267] node "centos-7-122" not found
4月 14 02:54:51 centos-7-122 kubelet[14497]: E0414 02:54:51.033193 14497 reflector.go:123] k8s.io/kubernetes/pkg/kubelet/kubelet.go:459: Failed to list *v1.Node: Unauthorized
4月 14 02:54:51 centos-7-122 kubelet[14497]: E0414 02:54:51.079316 14497 kubelet.go:2267] node "centos-7-122" not found
4月 14 02:54:51 centos-7-122 kubelet[14497]: E0414 02:54:51.180834 14497 kubelet.go:2267] node "centos-7-122" not found
4月 14 02:54:51 centos-7-122 kubelet[14497]: E0414 02:54:51.265361 14497 reflector.go:123] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:46: Failed to list *v1.Pod: Unauthorized
Cause: after master1 was reset, rm -rf $HOME/.kube/config was never run; re-initialization generated a new CA, so the certificates previously distributed to master2 and master3 no longer validate (hence the Unauthorized errors above).
Solution:
On master1:
[root@centos-7-120 ~]# kubeadm reset
[root@centos-7-120 ~]# rm -rf $HOME/.kube/config
[root@centos-7-120 ~]# kubeadm init --config=kubeadm-config.yaml
[root@centos-7-120 ~]# kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/2140ac876ef134e0ed5af15c65e414cf26827915/Documentation/kube-flannel.yml
[root@centos-7-120 ~]
Redistribute the certificates to master2 and master3, then reset those nodes and rejoin them to the cluster.
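Note that re-initializing master1 also invalidates the token and CA hash in any previously saved join command; a fresh one can be printed on master1 and then used with --control-plane on the other masters:
kubeadm token create --print-join-command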
Problem 6:
Master node in NotReady state
[root@centos-7-120 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
centos-7-120 NotReady master 12m v1.16.4
[root@centos-7-120 ~]#
Solution:
Create the flannel network on the master.
[root@centos-7-120 ~]# kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/2140ac876ef134e0ed5af15c65e414cf26827915/Documentation/kube-flannel.yml
podsecuritypolicy.policy/psp.flannel.unprivileged created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds-amd64 created
daemonset.apps/kube-flannel-ds-arm64 created
daemonset.apps/kube-flannel-ds-arm created
daemonset.apps/kube-flannel-ds-ppc64le created
daemonset.apps/kube-flannel-ds-s390x created
[root@centos-7-120 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
centos-7-120 Ready master 16m v1.16.4
[root@centos-7-120 ~]#
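If a node stays NotReady after applying the manifest, checking that the flannel DaemonSet pod actually started on it is the natural next step:
kubectl -n kube-system get pods -o wide | grep flannel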
Problem 7:
Worker nodes in NotReady state
[root@centos-7-120 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
centos-7-120 Ready master 43h v1.16.4
centos-7-121 NotReady <none> 43h v1.16.4
centos-7-122 NotReady <none> 43h v1.16.4
[root@centos-7-120 ~]#
Cause: the virtual IP on the master is gone.
Solution: re-bind the VIP on the master.
ifconfig ens33:2 192.168.41.222 netmask 255.255.255.0 up
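The persistence note from Problem 3 applies here as well. Once the VIP is back, API-server reachability through it can be spot-checked from any node (a sketch; /healthz is anonymously readable in a default v1.16 cluster):
curl -k https://192.168.41.222:6443/healthz
kubectl get nodes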