Kubernetes DNS

景成和

2023-12-01

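Our company's k8s clusters have recently been hitting intermittent domain name resolution failures, and tracking them down took longer than it should have. To keep the lessons for next time, this post goes over the basics that came up along the way.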

DNS: /etc/resolv.conf

It has four important directives:

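nameserver // IP address of a DNS server; list several by putting one per line (glibc uses at most three)
domain // the local domain name
search // search list for name lookups; several domains are allowed, separated by spaces
sortlist // sorts the addresses returned for a lookup, by network (address/netmask pairs)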
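To make the four directives concrete, here is a minimal sketch that parses resolv.conf-style text. The function name `parse_resolv_conf` is hypothetical, and the sample content (the 10.96.0.10 address, the search domains, `ndots:5`) is only an illustrative guess at what a pod's file might contain, not taken from a real cluster.

```python
def parse_resolv_conf(text):
    """Parse resolv.conf-style text into a dict of the four directives."""
    conf = {"nameserver": [], "domain": None, "search": [], "sortlist": []}
    for raw in text.splitlines():
        # Drop comments, which start with '#' or ';'.
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if not line:
            continue
        keyword, *values = line.split()
        if keyword == "nameserver" and values:
            conf["nameserver"].append(values[0])  # one server per line
        elif keyword == "domain" and values:
            conf["domain"] = values[0]
        elif keyword == "search":
            conf["search"] = values  # space-separated list of domains
        elif keyword == "sortlist":
            conf["sortlist"] = values
        # Other keywords (for example "options") are ignored in this sketch.
    return conf


# Illustrative content only; the values are assumptions.
SAMPLE = """\
# hypothetical pod resolv.conf
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
"""

conf = parse_resolv_conf(SAMPLE)
print(conf["nameserver"])
print(conf["search"])
```

On a real node or pod the same function could be fed the contents of /etc/resolv.conf to see which server and search domains are actually in effect.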

k8s DNS: how a k8s cluster handles DNS

Every k8s cluster creates DNS records for its services and pods.

DNS name

Every service in k8s is assigned a DNS name, so lookups follow these rules:

If services a and b are in the same namespace, a can look up b simply as b
If service b is in a different namespace B, a has to look it up as b.B

A normal DNS record looks like my-svc.my-namespace.svc.cluster-domain.example. This is what is usually called the FQDN, the fully qualified domain name.
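The reason short names such as b or b.B work is the search list in the pod's /etc/resolv.conf. The sketch below shows the order in which a resolver would typically try names. It assumes glibc-style behavior with an `ndots` threshold of 5; the function name `candidate_names`, the namespaces `ns-a` and `ns-b`, and the `cluster.local` domain are hypothetical.

```python
def candidate_names(name, search, ndots=5):
    """Return the fully qualified names a resolver would try, in order."""
    if name.endswith("."):
        # A trailing dot marks the name as already fully qualified.
        return [name]
    expanded = [f"{name}.{domain}." for domain in search]
    as_is = name + "."
    if name.count(".") >= ndots:
        # Enough dots: try the name as given first, then the search list.
        return [as_is] + expanded
    # Too few dots: walk the search list first, then try the name as given.
    return expanded + [as_is]


# Assumed search list for a pod running in the hypothetical namespace "ns-a".
search = ["ns-a.svc.cluster.local", "svc.cluster.local", "cluster.local"]

for candidate in candidate_names("b.ns-b", search):
    print(candidate)
```

For a service b in the same namespace, the first candidate already matches. For b.ns-b, the first candidate misses and the second one (under svc.cluster.local) matches, which is why the namespace has to be part of the name.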

k8s Pod DNS policy

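A k8s pod can use one of four DNS policies (Default, ClusterFirst, ClusterFirstWithHostNet, None), selected with the dnsPolicy field, but only two are common: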

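Default: the pod inherits the DNS configuration of the node it runs on. Despite the name, this is not the policy applied when dnsPolicy is omitted; that one is ClusterFirst.
ClusterFirst: as the name says, the pod resolves names inside the k8s cluster first. Queries that do not match the cluster domain are forwarded upstream, for example to the DNS server configured on the node it runs on, or to another DNS server.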
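As an illustration of where dnsPolicy sits in a pod spec, the following sketch builds a minimal Pod manifest as a Python dict and prints it as JSON, which kubectl accepts alongside YAML. The helper `pod_manifest`, the pod name `dns-demo`, and the image `busybox:1.36` are placeholders chosen for this example.

```python
import json

ALLOWED_POLICIES = {"Default", "ClusterFirst", "ClusterFirstWithHostNet", "None"}


def pod_manifest(name, image, dns_policy="ClusterFirst", dns_config=None):
    """Build a minimal Pod manifest with an explicit dnsPolicy."""
    if dns_policy not in ALLOWED_POLICIES:
        raise ValueError(f"unknown dnsPolicy: {dns_policy!r}")
    if dns_policy == "None" and not dns_config:
        # With "None" the pod gets no cluster DNS settings, so it must bring its own.
        raise ValueError('dnsPolicy "None" requires a dnsConfig')
    spec = {
        "dnsPolicy": dns_policy,
        "containers": [{"name": name, "image": image}],
    }
    if dns_config:
        spec["dnsConfig"] = dns_config
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": spec,
    }


# Placeholder name and image; this pod would take its DNS settings from the node.
manifest = pod_manifest("dns-demo", "busybox:1.36", dns_policy="Default")
print(json.dumps(manifest, indent=2))
```

Saving the printed JSON to a file and running kubectl apply -f on it should create the pod; comparing its /etc/resolv.conf under Default and ClusterFirst is a quick way to see the difference between the two policies.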

kube-dns

Before k8s v1.11, k8s used kube-dns as its DNS service:

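There is a kube-dns service backed by several pods
kube-dns watches service and endpoint events (much like service discovery) and updates its DNS records accordingly.
The kubelet sets the DNS server in each new pod's /etc/resolv.conf to the IP of the DNS service
After that, the pod can resolve names inside the cluster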

CoreDNS

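Since k8s v1.11, however, the default DNS has changed from kube-dns to CoreDNS. Why?
CoreDNS is written in Go, provides everything kube-dns provided, and adds features such as DNS caching and health checks. It also improves on kube-dns in the following ways: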

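It offers a round-robin load balancing strategy
For a pod record such as 100-1-1-1.namespace.pod.cluster.local, kube-dns returns 100.1.1.1 even when no pod with that IP exists, which can cause trouble. CoreDNS can add a verification step (the pods verified option of its kubernetes plugin) and resolve the name only when the pod really exists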
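The difference in pod-record handling can be sketched as follows. This is a simplified model, not CoreDNS code: the function `resolve_pod_record`, the set of known pods, and the `cluster.local` domain are assumptions, and the names use the a-b-c-d.namespace.pod.cluster-domain form that Kubernetes documents for pod records.

```python
import ipaddress


def resolve_pod_record(qname, known_pods, verified, cluster_domain="cluster.local"):
    """Return the IP for a pod record, or None if it should not resolve.

    known_pods is a set of (namespace, ip) tuples for pods that really exist.
    """
    suffix = f".pod.{cluster_domain}"
    name = qname.rstrip(".")
    if not name.endswith(suffix):
        return None
    labels = name[: -len(suffix)].split(".")
    if len(labels) != 2:
        return None
    dashed_ip, namespace = labels
    try:
        # 100-1-1-1 encodes the address 100.1.1.1
        ip = str(ipaddress.IPv4Address(dashed_ip.replace("-", ".")))
    except ValueError:
        return None
    if verified and (namespace, ip) not in known_pods:
        # Verified mode: answer only when a matching pod exists.
        return None
    # Unverified mode: the IP is derived from the name alone.
    return ip


known = {("default", "10.244.1.7")}  # hypothetical pod inventory
query = "100-1-1-1.default.pod.cluster.local"
print(resolve_pod_record(query, known, verified=False))
print(resolve_pod_record(query, known, verified=True))
```

With verified=False the lookup mirrors the kube-dns behavior described above; with verified=True a name that points at a nonexistent pod simply does not resolve.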

CoreDNS plugin & configuration

CoreDNS is built on plugins: essentially every feature is provided by one. The CoreDNS documentation lists all available plugins and explains how to configure CoreDNS.

Here is a simple CoreDNS example:

[ITPanda@panda001 ~]$ kubectl get configmap coredns -n kube-system -o yaml
apiVersion: v1
data:
  Corefile: |
    .:53 {    # port 53 is the default DNS port; everything inside these braces is a plugin
        errors #Any errors encountered during the query processing will be printed to standard output. The errors of particular type can be consolidated and printed once per some period of time.
        health #When CoreDNS is up and running this returns a 200 OK HTTP status code
        kubernetes cluster.local {    #[plugin-kubernetes](https://github.com/coredns/coredns/tree/master/plugin/kubernetes)
          pods insecure
          upstream /etc/resolv.conf    # works together with "proxy . /etc/resolv.conf" below so that the node's DNS can be used
        }
        prometheus :9153    #With prometheus you export metrics from CoreDNS and any plugin that has them. The default location for the metrics is localhost:9153
        proxy . /etc/resolv.conf
        cache 30
    }

Finally, the official k8s documentation describes how to troubleshoot DNS problems in a k8s cluster; see its "Debugging DNS Resolution" page.