
k8s Resource Scheduling (nodeSelector, nodeAffinity, taints & tolerations)

施越彬
2023-12-01

Container resource limits:

  • resources.limits.cpu

  • resources.limits.memory

The minimum resources the container needs; the scheduler uses these values to decide where the Pod can be placed:

  • resources.requests.cpu

  • resources.requests.memory

CPU units: either millicores (the m suffix) or a decimal number. For example, 0.5 = 500m and 1 = 1000m.
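As a minimal (hypothetical) fragment to illustrate, the two containers below request exactly the same amount of CPU, written in the two notations:

```yaml
# Hypothetical fragment; container names are invented for illustration.
containers:
- name: decimal-style
  resources:
    requests:
      cpu: "0.5"   # half a core, decimal notation
- name: millicore-style
  resources:
    requests:
      cpu: "500m"  # half a core, millicore notation
```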

Example

// K8s looks for a Node with enough free resources to satisfy the requests, and schedules the Pod there

[root@master ~]# cat tt.yml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  namespace: default
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "128Mi"
        cpu: "500m"

[root@master ~]# kubectl apply -f tt.yml

pod/nginx created

[root@master ~]# kubectl describe node node1
Name:               node1
Roles:
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=node1
                    kubernetes.io/os=linux
Annotations:        flannel.alpha.coreos.com/backend-data: {"VNI":1,"VtepMAC":"fe:b4:c1:77:05:a5"}
                    flannel.alpha.coreos.com/backend-type: vxlan
                    flannel.alpha.coreos.com/kube-subnet-manager: true
                    flannel.alpha.coreos.com/public-ip: 192.168.129.135
                    kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Sat, 18 Dec 2021 16:30:11 +0800
Taints:
Unschedulable:      false
Lease:
  HolderIdentity:  node1
  AcquireTime:
  RenewTime:       Thu, 23 Dec 2021 20:35:59 +0800
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                      Message
  ----                 ------  -----------------                 ------------------                ------                      -------
  NetworkUnavailable   False   Thu, 23 Dec 2021 19:56:36 +0800   Thu, 23 Dec 2021 19:56:36 +0800   FlannelIsUp                 Flannel is running on this node
  MemoryPressure       False   Thu, 23 Dec 2021 20:31:46 +0800   Wed, 22 Dec 2021 04:32:02 +0800   KubeletHasSufficientMemory  kubelet has sufficient memory available
  DiskPressure         False   Thu, 23 Dec 2021 20:31:46 +0800   Wed, 22 Dec 2021 04:32:02 +0800   KubeletHasNoDiskPressure    kubelet has no disk pressure
  PIDPressure          False   Thu, 23 Dec 2021 20:31:46 +0800   Wed, 22 Dec 2021 04:32:02 +0800   KubeletHasSufficientPID     kubelet has sufficient PID available
  Ready                True    Thu, 23 Dec 2021 20:31:46 +0800   Wed, 22 Dec 2021 04:32:02 +0800   KubeletReady                kubelet is posting ready status
Addresses:
  InternalIP:  192.168.129.135
  Hostname:    node1
Capacity:
  cpu:                2
  ephemeral-storage:  36731368Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             3842264Ki
  pods:               110
Allocatable:
  cpu:                2
  ephemeral-storage:  33851628693
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             3739864Ki
  pods:               110
System Info:
  Machine ID:                 d2c10a72b80c45679e2c249297ecb522
  System UUID:                5b114d56-95d2-7774-4a94-988a30aa87a6
  Boot ID:                    4ce4678f-9d93-4c07-8f8f-de94070d807f
  Kernel Version:             4.18.0-193.el8.x86_64
  OS Image:                   Red Hat Enterprise Linux 8.2 (Ootpa)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://20.10.12
  Kubelet Version:            v1.20.0
  Kube-Proxy Version:         v1.20.0
PodCIDR:                      10.244.1.0/24
PodCIDRs:                     10.244.1.0/24
Non-terminated Pods:          (3 in total)
  Namespace    Name                   CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------    ----                   ------------  ----------  ---------------  -------------  ---
  default      web                    250m (12%)    500m (25%)  64Mi (1%)        128Mi (3%)     5s
  kube-system  kube-flannel-ds-c9z87  100m (5%)     100m (5%)   50Mi (1%)        50Mi (1%)      5d3h
  kube-system  kube-proxy-9z78l       0 (0%)        0 (0%)      0 (0%)           0 (0%)         5d4h
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests    Limits
  --------           --------    ------
  cpu                350m (17%)  600m (30%)
  memory             114Mi (3%)  178Mi (4%)
  ephemeral-storage  0 (0%)      0 (0%)
  hugepages-1Gi      0 (0%)      0 (0%)
  hugepages-2Mi      0 (0%)      0 (0%)
Events:
  Type     Reason                   Age                From        Message
  ----     ------                   ---                ----        -------
  Normal   Starting                 39m                kubelet     Starting kubelet.
  Normal   NodeAllocatableEnforced  39m                kubelet     Updated Node Allocatable limit across pods
  Normal   NodeHasSufficientMemory  39m (x7 over 39m)  kubelet     Node node1 status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure    39m (x7 over 39m)  kubelet     Node node1 status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID     39m (x7 over 39m)  kubelet     Node node1 status is now: NodeHasSufficientPID
  Warning  Rebooted                 39m                kubelet     Node node1 has been rebooted, boot id: 4ce4678f-9d93-4c07-8f8f-de94070d807f
  Normal   Starting                 39m                kube-proxy  Starting kube-proxy.

nodeSelector & nodeAffinity


nodeSelector: schedules the Pod onto a Node whose labels match. If no Node has a matching label, scheduling fails and the Pod stays Pending.

Purpose:

  • Constrain a Pod to run on specific nodes, by exact match against node labels

Use cases:

  • Dedicated nodes: group and manage Nodes by business line

  • Special hardware: some Nodes carry SSD disks or GPUs

Example: make sure the Pod is scheduled onto a node that has an SSD disk

Format:  kubectl label nodes <node-name> <key>=<value>
Example: kubectl label nodes node2 app=nginx
Verify:  kubectl get nodes node2 --show-labels
Delete:  kubectl label nodes node2 app-
Verify:  kubectl get pod -o wide

Example

Successful scheduling case

[root@master ~]# kubectl label nodes node2 app=nginx

node/node2 labeled

[root@master ~]# kubectl get nodes node2 --show-labels

NAME STATUS ROLES AGE VERSION LABELS

node2 Ready 5d4h v1.20.0 app=nginx,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=node2,kubernetes.io/os=linux

[root@master ~]# cat jj.yml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  namespace: default
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  nodeSelector:
    app: nginx

[root@master ~]# kubectl apply -f jj.yml

pod/nginx created

[root@master ~]# kubectl get pod -o wide

NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES

nginx 1/1 Running 0 13s 10.244.2.48 node2

Failed scheduling case (Pending)

// remove the label

[root@master ~]# kubectl label nodes node2 app-

node/node2 labeled

// verify

[root@master ~]# kubectl get nodes node2 --show-labels

NAME STATUS ROLES AGE VERSION LABELS

node2 Ready 5d4h v1.20.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=node2,kubernetes.io/os=linux

[root@master ~]# cat jj.yml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  namespace: default
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  nodeSelector:
    app: nginx

[root@master ~]# kubectl delete -f jj.yml

pod “nginx” deleted

[root@master ~]# kubectl apply -f jj.yml

pod/nginx created

// Here the Pod simply waits (Pending): it keeps waiting until some node carries app=nginx, and is then scheduled onto that node

[root@master ~]# kubectl get pod -o wide

NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES

nginx 0/1 Pending 0 14s

// now add the label back to node2

[root@master ~]# kubectl get nodes node2 --show-labels  # confirm node2 currently has no app label

NAME STATUS ROLES AGE VERSION LABELS

node2 Ready 5d4h v1.20.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=node2,kubernetes.io/os=linux

[root@master ~]# kubectl label nodes node2 app=nginx

node/node2 labeled

// label just added

[root@master ~]# kubectl get nodes node2 --show-labels

NAME STATUS ROLES AGE VERSION LABELS

node2 Ready 5d4h v1.20.0 app=nginx,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=node2,kubernetes.io/os=linux

// the Pod is now running on node2

[root@master ~]# kubectl get pod -o wide

NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES

nginx 1/1 Running 0 5m 10.244.2.49 node2

nodeAffinity: node affinity plays the same role as nodeSelector, but is more flexible and can express more conditions:

  • Matching supports more logical combinations, not just exact string equality

  • Scheduling rules can be soft or hard policies rather than only hard requirements

  • Hard (required): must be satisfied

  • Soft (preferred): best effort, not guaranteed

  • Operators: In, NotIn, Exists, DoesNotExist, Gt, Lt
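To illustrate the operators beyond In, here is a hypothetical matchExpressions fragment (the disktype and gpu-memory labels are invented for illustration):

```yaml
# Hypothetical fragment; all three labels are made up.
- matchExpressions:
  - key: disktype
    operator: Exists      # the node merely has the label, any value
  - key: app
    operator: NotIn       # the label value must not be in the list
    values:
    - legacy
  - key: gpu-memory
    operator: Gt          # numeric comparison on the label value
    values:
    - "4096"
```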

// help

[root@master ~]# kubectl explain pod.spec.affinity.nodeAffinity

KIND: Pod

VERSION: v1

RESOURCE: nodeAffinity

DESCRIPTION:

Describes node affinity scheduling rules for the pod.

Node affinity is a group of node affinity scheduling rules.

FIELDS:

preferredDuringSchedulingIgnoredDuringExecution <[]Object>

The scheduler will prefer to schedule pods to nodes that satisfy the

affinity expressions specified by this field, but it may choose a node that

violates one or more of the expressions. The node that is most preferred is

the one with the greatest sum of weights, i.e. for each node that meets all

of the scheduling requirements (resource request, requiredDuringScheduling

affinity expressions, etc.), compute a sum by iterating through the

elements of this field and adding "weight" to the sum if the node matches

the corresponding matchExpressions; the node(s) with the highest sum are

the most preferred.

requiredDuringSchedulingIgnoredDuringExecution

If the affinity requirements specified by this field are not met at

scheduling time, the pod will not be scheduled onto the node. If the

affinity requirements specified by this field cease to be met at some point

during pod execution (e.g. due to an update), the system may or may not try

to eventually evict the pod from its node.

Example

Case 1 (will land only on node1)

node1 gets two labels (app=nginx gpu=nvdia)

node2 gets one label (app=nginx)

  • required: must be satisfied

  • preferred: best effort, not guaranteed

// give node1 two labels (app=nginx gpu=nvdia)

[root@master ~]# kubectl label nodes node1 app=nginx gpu=nvdia

node/node1 labeled

// give node2 one label (app=nginx)

[root@master ~]# kubectl label nodes node2 app=nginx

node/node2 labeled

[root@master ~]# kubectl get nodes node1 node2 --show-labels

NAME STATUS ROLES AGE VERSION LABELS

node1 Ready 5d4h v1.20.0 app=nginx,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,gpu=nvdia,kubernetes.io/arch=amd64,kubernetes.io/hostname=node1,kubernetes.io/os=linux

node2 Ready 5d4h v1.20.0 app=nginx,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=node2,kubernetes.io/os=linux

[root@master ~]# cat yy.yml
apiVersion: v1
kind: Pod
metadata:
  name: test
  namespace: default
spec:
  containers:
  - name: b1
    image: busybox
    imagePullPolicy: IfNotPresent
    command: ["/bin/sh","-c","sleep 45"]
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: app
            operator: In
            values:
            - nginx
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 3
        preference:
          matchExpressions:
          - key: gpu
            operator: In
            values:
            - nvdia

[root@master ~]# kubectl apply -f yy.yml

pod/test created

[root@master ~]# kubectl get pod -o wide

NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES

test 1/1 Running 1 87s 10.244.1.97 node1

Case 2 (default rules apply; the nodes compete equally)

node1 gets one label (app=nginx)

node2 gets one label (app=nginx)

  • required: must be satisfied

  • preferred: best effort, not guaranteed

// give node1 one label (app=nginx)

[root@master ~]# kubectl label nodes node1 app=nginx

node/node1 labeled

// give node2 one label (app=nginx)

[root@master ~]# kubectl label nodes node2 app=nginx

node/node2 labeled

[root@master ~]# kubectl get nodes node1 node2 --show-labels

NAME STATUS ROLES AGE VERSION LABELS

node1 Ready 5d5h v1.20.0 app=nginx,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=node1,kubernetes.io/os=linux

node2 Ready 5d5h v1.20.0 app=nginx,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=node2,kubernetes.io/os=linux

[root@master ~]# cat yy.yml
apiVersion: v1
kind: Pod
metadata:
  name: test
  namespace: default
spec:
  containers:
  - name: b1
    image: busybox
    imagePullPolicy: IfNotPresent
    command: ["/bin/sh","-c","sleep 45"]
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: app
            operator: In
            values:
            - nginx
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 3
        preference:
          matchExpressions:
          - key: gpu
            operator: In
            values:
            - nvdia

[root@master ~]# kubectl apply -f yy.yml

pod/test created

[root@master ~]# kubectl get pod -o wide

NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES

test 1/1 Running 0 6s 10.244.1.98 node1

Taints & Tolerations


Taints: keep Pods off particular Nodes

Tolerations: allow a Pod to be scheduled onto a Node that carries matching taints

Use cases:

  • Dedicated nodes: group Nodes by business line; by default nothing is scheduled onto them, and only Pods that tolerate the taint may be placed there

  • Special hardware: some Nodes carry SSD disks or GPUs; by default nothing is scheduled onto them, and only Pods that tolerate the taint may be placed there

  • Taint-based eviction

Adding a taint to a node

Format:  kubectl taint node [node] key=value:[effect]
Example: kubectl taint node node1 gpu=yes:NoSchedule
Verify:  kubectl describe node node1 | grep Taint
Remove:  kubectl taint node [node] key:[effect]-

// view taints

[root@master ~]# kubectl describe node node1 node2 master | grep -i taint

Taints:

Taints:

Taints: node-role.kubernetes.io/master:NoSchedule

[effect] can take one of these values:

  • NoSchedule: Pods are never scheduled onto the node

  • PreferNoSchedule: the scheduler tries to avoid the node; tolerations are not strictly required

  • NoExecute: new Pods are not scheduled, and Pods already running on the Node are evicted
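As a sketch of taint-based eviction with NoExecute: node.kubernetes.io/unreachable is a built-in taint the control plane applies to unreachable nodes, and tolerationSeconds bounds how long the Pod may stay after the taint appears.

```yaml
# Sketch: goes under spec in a Pod manifest.
tolerations:
- key: node.kubernetes.io/unreachable
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 300   # evict the Pod 300s after the taint is applied
```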

Adding a tolerations field to the Pod spec

// example

apiVersion: v1
kind: Pod
metadata:
  name: pod-taints
spec:
  containers:
  - name: pod-taints
    image: busybox:latest
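The manifest above breaks off before its tolerations stanza; to tolerate the gpu=yes:NoSchedule taint shown earlier, it would plausibly continue like this (tolerations sits under spec, alongside containers):

```yaml
# Sketch of the missing stanza; matches the gpu=yes:NoSchedule taint.
  tolerations:
  - key: gpu
    operator: Equal
    value: "yes"
    effect: NoSchedule
```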
