# kubectl get node
NAME                  STATUS     ROLES                  AGE    VERSION
k8s-master.demo.com   Ready      control-plane,master   118m   v1.21.14
k8s-node1.demo.com    NotReady   <none>                 115m   v1.21.14
k8s-node2.demo.com    NotReady   <none>                 112m   v1.21.14
The output shows that the two worker nodes are not in the Ready state.
# kubectl describe node k8s-node1.demo.com
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Tue, 06 Dec 2022 13:46:49 +0800   Tue, 06 Dec 2022 13:46:49 +0800   FlannelIsUp                  Flannel is running on this node
  MemoryPressure       False   Tue, 06 Dec 2022 14:01:25 +0800   Tue, 06 Dec 2022 12:06:53 +0800   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Tue, 06 Dec 2022 14:01:25 +0800   Tue, 06 Dec 2022 12:06:53 +0800   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   Tue, 06 Dec 2022 14:01:25 +0800   Tue, 06 Dec 2022 12:06:53 +0800   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                False   Tue, 06 Dec 2022 14:01:25 +0800   Tue, 06 Dec 2022 12:06:53 +0800   KubeletNotReady              container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Addresses:
  InternalIP:  172.18.0.72
  Hostname:    k8s-node1.demo.com
...
The describe output shows the error message:
container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
That is, the CNI configuration has not been initialized.
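It is worth a quick look at the CNI directories on the node (these are the default kubeadm/flannel paths; contents will vary by environment):
# ls /etc/cni/net.d
# ls /opt/cni/bin
If the config list in /etc/cni/net.d references a plugin binary that is missing from /opt/cni/bin, the kubelet reports exactly this kind of error.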
Log in to the affected node and check the kubelet logs for details:
# journalctl -fu kubelet
-- Logs begin at Tue 2022-12-06 12:20:35 CST. --
Dec 06 14:05:50 k8s-node1.demo.com kubelet[923]: E1206 14:05:50.057223 923 kubelet.go:2211] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized"
Dec 06 14:05:50 k8s-node1.demo.com kubelet[923]: E1206 14:05:50.484753 923 file_linux.go:60] "Unable to read config path" err="path does not exist, ignoring" path="/etc/kubernetes/manifests"
Dec 06 14:05:51 k8s-node1.demo.com kubelet[923]: E1206 14:05:51.485980 923 file_linux.go:60] "Unable to read config path" err="path does not exist, ignoring" path="/etc/kubernetes/manifests"
Dec 06 14:05:52 k8s-node1.demo.com kubelet[923]: E1206 14:05:52.487097 923 file_linux.go:60] "Unable to read config path" err="path does not exist, ignoring" path="/etc/kubernetes/manifests"
Dec 06 14:05:53 k8s-node1.demo.com kubelet[923]: E1206 14:05:53.488097 923 file_linux.go:60] "Unable to read config path" err="path does not exist, ignoring" path="/etc/kubernetes/manifests"
Dec 06 14:05:54 k8s-node1.demo.com kubelet[923]: E1206 14:05:54.488270 923 file_linux.go:60] "Unable to read config path" err="path does not exist, ignoring" path="/etc/kubernetes/manifests"
Dec 06 14:05:55 k8s-node1.demo.com kubelet[923]: I1206 14:05:55.000464 923 cni.go:204] "Error validating CNI config list" configList="{\n \"name\": \"cbr0\",\n \"cniVersion\": \"0.3.1\",\n \"plugins\": [\n {\n \"type\": \"flannel\",\n \"delegate\": {\n \"hairpinMode\": true,\n \"isDefaultGateway\": true\n }\n },\n {\n \"type\": \"portmap\",\n \"capabilities\": {\n \"portMappings\": true\n }\n }\n ]\n}\n" err="[failed to find plugin \"portmap\" in path [/opt/cni/bin]]"
Dec 06 14:05:55 k8s-node1.demo.com kubelet[923]: I1206 14:05:55.000514 923 cni.go:239] "Unable to update cni config" err="no valid networks found in /etc/cni/net.d"
From the logs, the repeated "Unable to read config path /etc/kubernetes/manifests" messages are harmless on a worker node (that path holds static pod manifests and is explicitly ignored when absent); the real problem is the CNI validation error: the portmap plugin referenced by the flannel config list cannot be found in /opt/cni/bin. Some flannel-related files are missing, so the Kubernetes packages probably need to be reinstalled and the node joined again. When this node was previously removed from the cluster, the related files were deleted rather crudely by brute force, which is how some files went missing.
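As a sanity check before reinstalling, you can confirm which files are gone (the rpm query assumes the RPM-based install used on this host; rpm -V reports files that were modified or deleted after installation):
# ls /opt/cni/bin
# rpm -V kubernetes-cni
With the missing files confirmed, remove the node from the cluster (run on the master), then reset the kubeadm state on the node itself: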
# kubectl delete node k8s-node1.demo.com
node "k8s-node1.demo.com" deleted
# kubeadm reset
[reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] Are you sure you want to proceed? [y/N]: y
[preflight] Running pre-flight checks
W1206 14:12:28.674854 24214 removeetcdmember.go:79] [reset] No kubeadm config, using etcd pod spec to get data directory
[reset] No etcd config found. Assuming external etcd
[reset] Please, manually reset etcd to prevent further issues
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
[reset] Deleting contents of stateful directories: [/var/lib/kubelet /var/lib/dockershim /var/run/kubernetes /var/lib/cni]
The reset process does not clean CNI configuration. To do so, you must remove /etc/cni/net.d
The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually by using the "iptables" command.
If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.
The reset process does not clean your kubeconfig files and you must remove them manually.
Please, check the contents of the $HOME/.kube/config file.
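Following the hints at the end of the reset output, the leftover state can be cleaned up manually on the node (note that flushing iptables also drops any non-Kubernetes rules, and the ipvsadm step only applies if kube-proxy ran in IPVS mode):
# rm -rf /etc/cni/net.d
# iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
# ipvsadm --clear
# rm -f $HOME/.kube/config
Next, remove and reinstall the Kubernetes packages, pinned to the cluster version: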
# yum remove kubelet kubectl kubernetes-cni kubeadm
Removed:
kubeadm.x86_64 0:1.21.14-0 kubectl.x86_64 0:1.24.3-0 kubelet.x86_64 0:1.21.14-0 kubernetes-cni.x86_64 0:0.8.7-0
Complete!
# yum install -y kubelet-1.21.14 kubectl-1.21.14 kubernetes-cni kubeadm-1.21.14
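Note that the kubectl just removed was 1.24.3, far outside the supported version skew for a 1.21.14 cluster, which is why the reinstall pins every component to 1.21.14. Before rejoining, a quick check that the CNI plugin binaries (including portmap) are back in place (the exact file list varies by kubernetes-cni version):
# ls /opt/cni/bin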
# systemctl enable kubelet
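The join command below reuses the original bootstrap token. If it has expired (tokens are valid for 24 hours by default), a fresh join command can be generated on the master:
# kubeadm token create --print-join-command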
# kubeadm join 172.18.0.71:6443 --token vr2dry.igtxh6mmr67o8a9u --discovery-token-ca-cert-hash sha256:a3b77324c7b6aefc93daa4692b9a601106a3d326ed246b999e8b9aa910a3e788
[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
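As an aside, the IsDockerSystemdCheck warning in the preflight output above is unrelated to this problem, but the usual fix is to switch Docker to the systemd cgroup driver. A minimal sketch (the kubelet's cgroupDriver setting must match, and Docker needs a restart afterwards):
# cat /etc/docker/daemon.json
{
  "exec-opts": ["native.cgroupdriver=systemd"]
}
# systemctl restart docker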
# kubectl get nodes
NAME                  STATUS     ROLES                  AGE    VERSION
k8s-master.demo.com   Ready      control-plane,master   138m   v1.21.14
k8s-node1.demo.com    Ready      <none>                 54s    v1.21.14
k8s-node2.demo.com    NotReady   <none>                 132m   v1.21.14
Node 1 is now in the Ready state.
The takeaway from this troubleshooting session: when a node in a Kubernetes cluster is not Ready, start with kubectl describe node to narrow down the cause, then locate the corresponding error in the kubelet logs and fix it from there.