Fixing a Kubernetes node that is not Ready

金令秋
2023-12-01

Symptoms

# kubectl get node
NAME                  STATUS     ROLES                  AGE    VERSION
k8s-master.demo.com   Ready      control-plane,master   118m   v1.21.14
k8s-node1.demo.com    NotReady   <none>                 115m   v1.21.14
k8s-node2.demo.com    NotReady   <none>                 112m   v1.21.14

The output shows that the worker nodes are not in the Ready state.

# kubectl describe node k8s-node1.demo.com 

  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Tue, 06 Dec 2022 13:46:49 +0800   Tue, 06 Dec 2022 13:46:49 +0800   FlannelIsUp                  Flannel is running on this node
  MemoryPressure       False   Tue, 06 Dec 2022 14:01:25 +0800   Tue, 06 Dec 2022 12:06:53 +0800   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Tue, 06 Dec 2022 14:01:25 +0800   Tue, 06 Dec 2022 12:06:53 +0800   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   Tue, 06 Dec 2022 14:01:25 +0800   Tue, 06 Dec 2022 12:06:53 +0800   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                False   Tue, 06 Dec 2022 14:01:25 +0800   Tue, 06 Dec 2022 12:06:53 +0800   KubeletNotReady              container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Addresses:
  InternalIP:  172.18.0.72
  Hostname:    k8s-node1.demo.com
...

The describe output shows the following error:

container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized

That is, the CNI configuration has not been initialized.
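Incidentally, the same condition message can be pulled out directly with a jsonpath query instead of scanning the full describe output (a minimal sketch using the node name from above):

# kubectl get node k8s-node1.demo.com -o jsonpath='{.status.conditions[?(@.type=="Ready")].message}'

This prints the same "cni config uninitialized" message shown above.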

Investigation

Log in to the node and check the kubelet logs:

# journalctl -fu kubelet
-- Logs begin at Tue 2022-12-06 12:20:35 CST. --
Dec 06 14:05:50 k8s-node1.demo.com kubelet[923]: E1206 14:05:50.057223     923 kubelet.go:2211] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized"
Dec 06 14:05:50 k8s-node1.demo.com kubelet[923]: E1206 14:05:50.484753     923 file_linux.go:60] "Unable to read config path" err="path does not exist, ignoring" path="/etc/kubernetes/manifests"
Dec 06 14:05:51 k8s-node1.demo.com kubelet[923]: E1206 14:05:51.485980     923 file_linux.go:60] "Unable to read config path" err="path does not exist, ignoring" path="/etc/kubernetes/manifests"
Dec 06 14:05:52 k8s-node1.demo.com kubelet[923]: E1206 14:05:52.487097     923 file_linux.go:60] "Unable to read config path" err="path does not exist, ignoring" path="/etc/kubernetes/manifests"
Dec 06 14:05:53 k8s-node1.demo.com kubelet[923]: E1206 14:05:53.488097     923 file_linux.go:60] "Unable to read config path" err="path does not exist, ignoring" path="/etc/kubernetes/manifests"
Dec 06 14:05:54 k8s-node1.demo.com kubelet[923]: E1206 14:05:54.488270     923 file_linux.go:60] "Unable to read config path" err="path does not exist, ignoring" path="/etc/kubernetes/manifests"
Dec 06 14:05:55 k8s-node1.demo.com kubelet[923]: I1206 14:05:55.000464     923 cni.go:204] "Error validating CNI config list" configList="{\n  \"name\": \"cbr0\",\n  \"cniVersion\": \"0.3.1\",\n  \"plugins\": [\n    {\n      \"type\": \"flannel\",\n      \"delegate\": {\n        \"hairpinMode\": true,\n        \"isDefaultGateway\": true\n      }\n    },\n    {\n      \"type\": \"portmap\",\n      \"capabilities\": {\n        \"portMappings\": true\n      }\n    }\n  ]\n}\n" err="[failed to find plugin \"portmap\" in path [/opt/cni/bin]]"
Dec 06 14:05:55 k8s-node1.demo.com kubelet[923]: I1206 14:05:55.000514     923 cni.go:239] "Unable to update cni config" err="no valid networks found in /etc/cni/net.d"

From the logs, the CNI plugins that flannel depends on are incomplete: the portmap plugin cannot be found in /opt/cni/bin, so no valid network configuration can be loaded from /etc/cni/net.d. When this node was previously removed from the cluster, the related files had been deleted rather forcefully, which left some of them missing. The fix is to reinstall the Kubernetes packages on the node and rejoin it to the cluster; the checks below confirm the missing files.
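To see which files are actually missing, the CNI directories on the node can be inspected directly (a quick sanity check; the exact plugin list depends on the kubernetes-cni package version):

# ls /opt/cni/bin/          # the portmap binary named in the log should be here but is missing
# ls /etc/cni/net.d/        # flannel normally drops 10-flannel.conflist here
# rpm -V kubernetes-cni     # on RPM-based systems, flags package files that are missing or modified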

Resolution steps

  1. Remove the node from the master
# kubectl delete node k8s-node1.demo.com 
node "k8s-node1.demo.com" deleted
  2. Reset the node and uninstall all currently installed k8s components (the manual cleanup that kubeadm reset does not handle is sketched after this step)
   # kubeadm reset
   [reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
   [reset] Are you sure you want to proceed? [y/N]: y
   [preflight] Running pre-flight checks
   W1206 14:12:28.674854   24214 removeetcdmember.go:79] [reset] No kubeadm config, using etcd pod spec to get data directory
   [reset] No etcd config found. Assuming external etcd
   [reset] Please, manually reset etcd to prevent further issues
   [reset] Stopping the kubelet service
   [reset] Unmounting mounted directories in "/var/lib/kubelet"
   [reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
   [reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
   [reset] Deleting contents of stateful directories: [/var/lib/kubelet /var/lib/dockershim /var/run/kubernetes /var/lib/cni]
   
   The reset process does not clean CNI configuration. To do so, you must remove /etc/cni/net.d
   
   The reset process does not reset or clean up iptables rules or IPVS tables.
   If you wish to reset iptables, you must do so manually by using the "iptables" command.
   
   If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
   to reset your system's IPVS tables.
   
   The reset process does not clean your kubeconfig files and you must remove them manually.
   Please, check the contents of the $HOME/.kube/config file.
# yum remove kubelet kubectl kubernetes-cni kubeadm
Removed:
kubeadm.x86_64 0:1.21.14-0  kubectl.x86_64 0:1.24.3-0  kubelet.x86_64 0:1.21.14-0  kubernetes-cni.x86_64 0:0.8.7-0 

Complete!
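The reset output above also notes that kubeadm reset does not clean up the CNI configuration, iptables/IPVS rules, or kubeconfig files. A minimal manual cleanup sketch, assuming flannel with kube-proxy in its default iptables mode (skip the ipvsadm line if IPVS is not in use):

# rm -rf /etc/cni/net.d
# iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
# ipvsadm --clear           # only if kube-proxy was running in IPVS mode
# rm -f $HOME/.kube/config  # remove a stale kubeconfig if one exists on the node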
  3. Reinstall the k8s components, making sure the versions match the master (a quick verification follows the commands)
  # yum install -y kubelet-1.21.14 kubectl-1.21.14 kubernetes-cni kubeadm-1.21.14
  # systemctl enable kubelet
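Before rejoining, it is worth confirming that the reinstall restored the files that were missing earlier (a quick check; the exact plugin list depends on the kubernetes-cni version):

  # kubelet --version                              # should report v1.21.14, matching the master
  # ls /opt/cni/bin/ | grep -E 'flannel|portmap'   # the plugins referenced by the flannel conflist should now be present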
  4. Rejoin the K8S cluster (a note on expired join tokens follows the output)
# kubeadm join 172.18.0.71:6443 --token vr2dry.igtxh6mmr67o8a9u --discovery-token-ca-cert-hash sha256:a3b77324c7b6aefc93daa4692b9a601106a3d326ed246b999e8b9aa910a3e788
[preflight] Running pre-flight checks
	[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
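Note that the bootstrap token created by kubeadm init expires after 24 hours by default. If the original join command no longer works, a fresh one can be generated on the master:

# kubeadm token create --print-join-command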
  5. Verify the node status
# kubectl get nodes
NAME                  STATUS     ROLES                  AGE    VERSION
k8s-master.demo.com   Ready      control-plane,master   138m   v1.21.14
k8s-node1.demo.com    Ready      <none>                 54s    v1.21.14
k8s-node2.demo.com    NotReady   <none>                 132m   v1.21.14

Node 1 is now back in the Ready state.
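As a further check that the node network is healthy, the flannel pod scheduled on this node can be inspected (depending on the manifest version, flannel may run in the kube-system or kube-flannel namespace):

# kubectl get pods -A -o wide | grep flannel     # the pod on k8s-node1.demo.com should be Running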

Summary

To summarize this investigation: when a node in a Kubernetes cluster is not in the Ready state, start with kubectl describe node to narrow down the cause, then dig into the kubelet logs on that node to find the specific error and fix it.
