This article is based on Kubernetes 1.5.2.
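Nodes frequently run into the following kinds of problems:
Hardware problems: CPU, memory, disk
Kernel problems: kernel deadlock, file system corruption
Container problems: unresponsive daemons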
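Kubernetes cluster management cannot see these node health problems, so pods keep being scheduled onto faulty nodes. Deploying node-problem-detector as a DaemonSet reports each node's status to apiserver, which makes node health visible to the upstream management layers so that pods can be kept off abnormal nodes.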
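What apiserver holds for a node can be read back from the node's status conditions. The following is a minimal sketch for doing that with kubectl; the helper name node_conditions and the node name used below are assumptions made for illustration only.

```shell
# Print "type=status" for every condition apiserver holds for one node.
# node_conditions is an illustrative helper name, not a kubectl subcommand.
node_conditions() {
  kubectl get node "$1" -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'
}
```

Calling node_conditions node-1 (node-1 is a placeholder) prints one line per condition. kubectl describe node shows the same conditions together with recent events for the node.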
cat << EOF > node-problem-detector.yaml
apiVersion: extensions/v1beta1
# A DaemonSet runs one detector pod on every node
kind: DaemonSet
metadata:
  name: node-problem-detector-v0.4.1
  namespace: kube-system
  labels:
    k8s-app: node-problem-detector
    version: v0.4.1
    kubernetes.io/cluster-service: "true"
spec:
  template:
    metadata:
      labels:
        k8s-app: node-problem-detector
        version: v0.4.1
        kubernetes.io/cluster-service: "true"
    spec:
      hostNetwork: true
      containers:
      - name: node-problem-detector
        image: docker.io/googlecontainer/node-problem-detector:v0.4.1
        securityContext:
          privileged: false
        resources:
          limits:
            cpu: "200m"
            memory: "100Mi"
          requests:
            cpu: "20m"
            memory: "20Mi"
        volumeMounts:
        # Host log directory, mounted read-only at /log inside the container
        - name: log
          mountPath: /log
          readOnly: true
      volumes:
      - name: log
        hostPath:
          path: /var/log/
EOF
kubectl create -f node-problem-detector.yaml
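After creating the DaemonSet it is worth checking that a detector pod is actually running on each node. A minimal sketch follows, reusing the DaemonSet name, namespace, and k8s-app label from the manifest above; the helper names npd_daemonset and npd_pods are assumptions made for illustration.

```shell
# Illustrative helpers; the names are not part of kubectl.

# Show the desired and current pod counts for the DaemonSet.
npd_daemonset() {
  kubectl get daemonset node-problem-detector-v0.4.1 --namespace=kube-system
}

# List the detector pods together with the node each one runs on.
npd_pods() {
  kubectl get pods --namespace=kube-system -l k8s-app=node-problem-detector -o wide
}
```

If the desired and current counts reported by npd_daemonset differ, the output of npd_pods shows which nodes are missing a detector pod.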