ovs-cni is a Kubernetes CNI plugin provided by kubevirt that attaches pod interfaces to an OVS bridge. It works by creating a veth pair: one end is added to the OVS bridge, and the other end is moved into the pod.
ovs-cni does not create the bridge itself, so the bridge must be created beforehand.
ovs-cni also does not provide cross-host pod communication; a scheme for cross-host communication over OVS must be planned in advance.
Multus must already be installed in the cluster, because the Multus CRD network-attachment-definitions is used to define the OVS configuration.
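For intuition, the per-attachment wiring that ovs-cni performs can be sketched by hand roughly as follows (hypothetical names: veth-host, net1 and POD_NETNS stand in for the names ovs-cni generates):
# create the veth pair
ip link add veth-host type veth peer name net1
# attach the host end to the OVS bridge
ovs-vsctl add-port br1 veth-host
# move the pod end into the pod's network namespace
ip link set net1 netns "$POD_NETNS"
ip link set veth-host up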
The Kubernetes environment:
root@master:~# kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
master Ready master 183d v1.17.3 192.168.122.20 <none> Ubuntu 19.10 5.3.0-62-generic docker://19.3.2
node1 Ready <none> 183d v1.17.3 192.168.122.21 <none> Ubuntu 19.10 5.3.0-62-generic docker://19.3.2
node2 Ready <none> 183d v1.17.3 192.168.122.22 <none> Ubuntu 19.10 5.3.0-62-generic docker://19.3.2
root@master:~# kubectl get pod -n kube-system
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-5b644bc49c-4vfjx 1/1 Running 2 46d
calico-node-5gtw7 1/1 Running 2 46d
calico-node-mqt6l 1/1 Running 4 46d
calico-node-t4vjh 1/1 Running 2 46d
coredns-9d85f5447-4znmx 1/1 Running 4 42d
coredns-9d85f5447-fh667 1/1 Running 2 42d
etcd-master 1/1 Running 8 183d
kube-apiserver-master 1/1 Running 0 27h
kube-controller-manager-master 1/1 Running 8 183d
kube-multus-ds-amd64-7b4fw 1/1 Running 0 5h13m
kube-multus-ds-amd64-dq2s8 1/1 Running 0 5h13m
kube-multus-ds-amd64-sqf8g 1/1 Running 0 5h13m
kube-proxy-l4wn7 1/1 Running 5 183d
kube-proxy-prhcm 1/1 Running 5 183d
kube-proxy-psxqt 1/1 Running 8 183d
kube-scheduler-master 1/1 Running 8 183d
Run the following command on all three nodes to install Open vSwitch. If cross-host pod communication is required, the host's external-facing NIC can be added to the bridge.
apt install openvswitch-switch/eoan
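For cross-host pod traffic over OVS, a sketch of wiring the uplink (eth1 is a hypothetical NIC name; if eth1 carries the host's own IP address, that address has to be moved onto the bridge interface):
# attach the host's external-facing NIC to the OVS bridge (br1 is created later)
ovs-vsctl add-port br1 eth1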
Download the ovs-cni source to get the YAML file used to install ovs-cni:
git clone https://github.com/kubevirt/ovs-cni.git
cd ovs-cni
cp manifests/ovs-cni.yml.in ./ovs-cni.yaml
Modify the following placeholders in ovs-cni.yaml (a sed sketch follows the list):
# install into the kube-system namespace
NAMESPACE -> kube-system
# image path for ovs-cni-plugin
${OVS_CNI_PLUGIN_IMAGE_REPO}/${OVS_CNI_PLUGIN_IMAGE_NAME}:${OVS_CNI_PLUGIN_IMAGE_VERSION} -> quay.io/kubevirt/ovs-cni-plugin
# CNI binary path; after ovs-cni-plugin starts, it copies the ovs binary from its image to this path
CNI_MOUNT_PATH -> /opt/cni/bin
# image pull policy; it does not have to be Always
OVS_CNI_PLUGIN_IMAGE_PULL_POLICY -> Always
# image path for ovs-cni-marker
${OVS_CNI_MARKER_IMAGE_REPO}/${OVS_CNI_MARKER_IMAGE_NAME}:${OVS_CNI_MARKER_IMAGE_VERSION} -> quay.io/kubevirt/ovs-cni-marker
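A minimal sketch of applying these substitutions with sed, assuming every placeholder appears in the ${...} form shown for the image paths (adjust the patterns if the file spells them differently):
sed -i \
  -e 's|${NAMESPACE}|kube-system|g' \
  -e 's|${OVS_CNI_PLUGIN_IMAGE_REPO}/${OVS_CNI_PLUGIN_IMAGE_NAME}:${OVS_CNI_PLUGIN_IMAGE_VERSION}|quay.io/kubevirt/ovs-cni-plugin|g' \
  -e 's|${CNI_MOUNT_PATH}|/opt/cni/bin|g' \
  -e 's|${OVS_CNI_PLUGIN_IMAGE_PULL_POLICY}|Always|g' \
  -e 's|${OVS_CNI_MARKER_IMAGE_REPO}/${OVS_CNI_MARKER_IMAGE_NAME}:${OVS_CNI_MARKER_IMAGE_VERSION}|quay.io/kubevirt/ovs-cni-marker|g' \
  ovs-cni.yaml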
Install:
root@master:~/ovs/ovs-cni-master# kubectl apply -f ovs-cni.yaml
daemonset.apps/ovs-cni-amd64 created
clusterrole.rbac.authorization.k8s.io/ovs-cni-marker-cr created
clusterrolebinding.rbac.authorization.k8s.io/ovs-cni-marker-crb created
serviceaccount/ovs-cni-marker created
As shown below, the ovs-cni pods are now Running on all three nodes.
root@master:~/ovs/ovs-cni-master# kubectl get pod -n kube-system
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-5b644bc49c-4vfjx 1/1 Running 2 46d
calico-node-5gtw7 1/1 Running 2 46d
calico-node-mqt6l 1/1 Running 4 46d
calico-node-t4vjh 1/1 Running 2 46d
coredns-9d85f5447-4znmx 1/1 Running 4 42d
coredns-9d85f5447-fh667 1/1 Running 2 42d
etcd-master 1/1 Running 8 183d
kube-apiserver-master 1/1 Running 0 28h
kube-controller-manager-master 1/1 Running 8 183d
kube-multus-ds-amd64-7b4fw 1/1 Running 0 5h26m
kube-multus-ds-amd64-dq2s8 1/1 Running 0 5h26m
kube-multus-ds-amd64-sqf8g 1/1 Running 0 5h26m
kube-proxy-l4wn7 1/1 Running 5 183d
kube-proxy-prhcm 1/1 Running 5 183d
kube-proxy-psxqt 1/1 Running 8 183d
kube-scheduler-master 1/1 Running 8 183d
ovs-cni-amd64-2wjnx 1/1 Running 0 4m53s
ovs-cni-amd64-dp7w5 1/1 Running 0 4m53s
ovs-cni-amd64-l849m 1/1 Running 0 4m53s
As ovs-cni.yaml above shows, the ovs-cni pod is configured with two containers: ovs-cni-plugin and ovs-cni-marker. Their roles are described below.
a. ovs-cni-plugin is an init container. Its job is to copy the ovs binary from the image to /opt/cni/bin on the host; the container exits as soon as the copy finishes, which is why READY shows 1/1 once the pod is Running (only one container is counted). The ovs-cni-plugin-related state can be seen in the pod's describe output:
Init Containers:
  ovs-cni-plugin:
    Container ID:  docker://b74f58af95cf2e36be9c34bc168fcf57a51643b4aeaef92dbff7eae1b25951f8
    Image:         quay.io/kubevirt/ovs-cni-plugin
    Image ID:      docker-pullable://quay.io/kubevirt/ovs-cni-plugin@sha256:4101c52617efb54a45181548c257a08e3689f634b79b9dfcff42bffd8b25af53
    Port:          <none>
    Host Port:     <none>
    Command:
      cp
      /ovs
      /host/opt/cni/bin/ovs
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Sun, 16 Aug 2020 15:03:48 +0000
      Finished:     Sun, 16 Aug 2020 15:03:48 +0000
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /host/opt/cni/bin from cnibin (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from ovs-cni-marker-token-mg682 (ro)
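Whether the copy succeeded can be confirmed directly on any node (a sketch):
# the CNI binary dropped onto the host by the init container
ls -l /opt/cni/bin/ovs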
b. ovs-cni-marker reports the OVS bridges it discovers on a node to Kubernetes, exposing them as node resources.
Create bridge br1 on all three nodes:
root@master:~# ovs-vsctl add-br br1
root@master:~# ovs-vsctl show
10e5bd4e-be5c-4f68-ba52-59d428e9dbe3
    Bridge "br1"
        Port "br1"
            Interface "br1"
                type: internal
    ovs_version: "2.12.0"
The node's Capacity and Allocatable now include an ovs-cni resource, where "1k" (i.e. 1000) is the number of OVS ports the marker reports for bridge br1 (hard-coded in the source; there is no parameter to change it).
root@master:~/ovs/ovs-cni-master# kubectl describe node master
...
Capacity:
  ovs-cni.network.kubevirt.io/br1:  1k
...
Allocatable:
  ovs-cni.network.kubevirt.io/br1:  1k
...
Looking at the marker process, the -ovs-socket flag points at the OVS database socket file, which the marker uses to obtain bridge and interface information:
root@master:~# ps -ef | grep marker | grep -v grep
root 23338 23319 0 15:03 ? 00:00:04 ./marker -v 3 -logtostderr -node-name master -ovs-socket /host/var/run/openvswitch/db.sock
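The same database can be queried by hand through that socket to see exactly what the marker sees (a sketch; the path below is the host-side location corresponding to /host/var/run/openvswitch/db.sock inside the container):
# list the bridges and ports the marker will report
ovs-vsctl --db=unix:/var/run/openvswitch/db.sock list-br
ovs-vsctl --db=unix:/var/run/openvswitch/db.sock list-ports br1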
To use ovs-cni, first create a net-attach-def. The following parameters can be used (a sketch using the optional ones follows the list):
name (string, required): the name of the network.
type (string, required): "ovs".
bridge (string, required): name of the bridge to use.
vlan (integer, optional): VLAN ID of the attached port. Trunk port if not specified.
mtu (integer, optional): MTU.
trunk (optional): list of VLAN IDs and/or ranges of accepted VLAN IDs.
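For illustration, a net-attach-def using the optional parameters might look like this (a sketch; the name ovs-trunk-conf and the values are hypothetical, and the trunk object syntax follows the plugin docs):
cat <<EOF | kubectl create -f -
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: ovs-trunk-conf
  annotations:
    k8s.v1.cni.cncf.io/resourceName: ovs-cni.network.kubevirt.io/br1
spec:
  config: '{
      "cniVersion": "0.3.1",
      "type": "ovs",
      "bridge": "br1",
      "mtu": 1450,
      "trunk": [ {"id": 100}, {"minID": 200, "maxID": 300} ]
    }'
EOF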
Creating a single OVS interface
Create a net-attach-def ovs-conf with CNI type ovs, using bridge br1 and VLAN ID 100:
cat <<EOF | kubectl create -f -
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: ovs-conf
  annotations:
    k8s.v1.cni.cncf.io/resourceName: ovs-cni.network.kubevirt.io/br1
spec:
  config: '{
      "cniVersion": "0.3.1",
      "type": "ovs",
      "bridge": "br1",
      "vlan": 100
    }'
EOF
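The new object can be listed through the Multus CRD (a sketch):
kubectl get network-attachment-definitions ovs-conf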
Create a pod whose annotations request the ovs-conf network:
cat <<EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: test
  annotations:
    k8s.v1.cni.cncf.io/networks: ovs-conf
spec:
  containers:
  - name: samplepod
    command: ["/bin/sh", "-c", "sleep 99999"]
    image: alpine
    resources:  # this may be omitted if intel/network-resources-injector is present on the cluster
      limits:
        ovs-cni.network.kubevirt.io/br1: 1
EOF
Check the pod's network interfaces: lo is the default loopback interface, tunl0 was created automatically by the Calico network, eth0 is the default pod interface created by Calico, and net1 is the newly created interface attached to the OVS bridge (a veth whose peer end is on the bridge).
root@master:~# kubectl exec -it test ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
4: eth0@if42: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1440 qdisc noqueue state UP
    link/ether 1e:6a:39:93:ba:e8 brd ff:ff:ff:ff:ff:ff
    inet 10.24.166.138/32 scope global eth0
       valid_lft forever preferred_lft forever
6: net1@if5: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue state UP
    link/ether 02:00:00:75:a6:3e brd ff:ff:ff:ff:ff:ff
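The @if5 suffix on net1 means its veth peer has ifindex 5 on the host. The pairing can be confirmed from inside the pod via sysfs (a sketch):
# prints the host-side peer ifindex of net1, i.e. 5 in the output above
kubectl exec test -- cat /sys/class/net/net1/iflink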
The pod was scheduled onto node1. Looking at bridge br1 there, a veth74ca52ee interface has been added with VLAN tag 100; its peer end is net1 inside the pod.
root@node1:~# ovs-vsctl show
14b23f58-db07-4f45-acf5-a424a31eabee
    Bridge "br1"
        Port "veth74ca52ee"
            tag: 100
            Interface "veth74ca52ee"
        Port "br1"
            Interface "br1"
                type: internal
    ovs_version: "2.12.0"
Creating multiple OVS interfaces
Multiple interfaces can be added either by listing the same net-attach-def several times in the pod's annotations, or by listing several different net-attach-defs.
a. Listing the same net-attach-def ovs-conf multiple times
cat <<EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: test
  annotations:
    k8s.v1.cni.cncf.io/networks: ovs-conf,ovs-conf
spec:
  containers:
  - name: samplepod
    command: ["/bin/sh", "-c", "sleep 99999"]
    image: alpine
    resources:  # this may be omitted if intel/network-resources-injector is present on the cluster
      limits:
        ovs-cni.network.kubevirt.io/br1: 1
EOF
On node1's bridge br1, two veth interfaces have been added, both with VLAN tag 100:
root@node1:~# ovs-vsctl show
14b23f58-db07-4f45-acf5-a424a31eabee
    Bridge "br1"
        Port "br1"
            Interface "br1"
                type: internal
        Port "veth2fa51154"
            tag: 100
            Interface "veth2fa51154"
        Port "veth99fb8572"
            tag: 100
            Interface "veth99fb8572"
    ovs_version: "2.12.0"
b. Listing multiple different net-attach-defs
First create another net-attach-def, ovs-conf1, with VLAN ID 200:
cat <<EOF | kubectl create -f -
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: ovs-conf1
  annotations:
    k8s.v1.cni.cncf.io/resourceName: ovs-cni.network.kubevirt.io/br1
spec:
  config: '{
      "cniVersion": "0.3.1",
      "type": "ovs",
      "bridge": "br1",
      "vlan": 200
    }'
EOF
When creating the pod, specify both ovs-conf and ovs-conf1:
cat <<EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: test
  annotations:
    k8s.v1.cni.cncf.io/networks: ovs-conf,ovs-conf1
spec:
  containers:
  - name: samplepod
    command: ["/bin/sh", "-c", "sleep 99999"]
    image: alpine
    resources:  # this may be omitted if intel/network-resources-injector is present on the cluster
      limits:
        ovs-cni.network.kubevirt.io/br1: 1
EOF
Two veth interfaces have been added to br1 on node1, this time with different VLAN tags: 100 from ovs-conf and 200 from ovs-conf1.
root@node1:~# ovs-vsctl show
14b23f58-db07-4f45-acf5-a424a31eabee
    Bridge "br1"
        Port "veth1d98bc6f"
            tag: 100
            Interface "veth1d98bc6f"
        Port "br1"
            Interface "br1"
                type: internal
        Port "veth2e3c55ba"
            tag: 200
            Interface "veth2e3c55ba"
    ovs_version: "2.12.0"
References:
https://github.com/kubevirt/ovs-cni
https://github.com/kubevirt/ovs-cni/blob/master/docs/cni-plugin.md
https://github.com/kubevirt/ovs-cni/blob/master/docs/marker.md