
Common Issues

Editor: Xiaoniu
2023-12-01

certificate signed by unknown authority

Installation environment

  • OpenShift 3.11

  • RHEL 7.5

Problem description

Later in the installation, importing an image stream or deploying an image fails with:

Get https://registry.example.com/v2/: x509: certificate signed by unknown authority

Cause and solution

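OpenShift cannot find the self-signed certificate, which causes this error: the API server, controller and registry pods do not see the host's CA trust store under /etc/pki. To fix it: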

On the master node, edit the API server pod definition /etc/origin/node/pods/apiserver.yaml and add:
    volumeMounts:
    ...
    - mountPath: /etc/pki
      name: certs
    ...
  volumes:
  ...
  - hostPath:
      path: /etc/pki
    name: certs
On the master node, edit the controller pod definition /etc/origin/node/pods/controller.yaml and add:
    volumeMounts:
    ...
    - mountPath: /etc/pki
      name: certs
    ...
  volumes:
  ...
  - hostPath:
      path: /etc/pki
    name: certs
In the default project, create a configmap from the host CA bundle and edit the docker-registry deployment config:
# oc create configmap host-ca-trust --from-file=cert.pem=/etc/pki/tls/cert.pem -n default
# oc edit dc docker-registry -n default
...
        volumeMounts:
        ...
        - mountPath: /etc/pki/tls/certs
          name: ca-trust
        ...
      volumes:
        ...
      - configMap:
          defaultMode: 420
          name: host-ca-trust
        name: ca-trust
Note: After making these changes, restart the master services: master-restart api and master-restart controllers.
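The mounts and the host-ca-trust configmap above only help if the host trust store under /etc/pki already contains the CA that signed the registry certificate. On RHEL 7 this is usually done by copying the CA into /etc/pki/ca-trust/source/anchors/ and running update-ca-trust extract, which regenerates the bundle that /etc/pki/tls/cert.pem normally links to. The shell sketch below checks the result before you create the configmap. The function name pem_in_bundle is hypothetical, and it compares the base64 body of the certificate as text, so it assumes both files are plain PEM.

```shell
# Hypothetical helper: exit 0 if the first certificate in CA_FILE appears in
# BUNDLE (default /etc/pki/tls/cert.pem, the file copied into host-ca-trust),
# 1 if it is missing, 2 if CA_FILE contains no PEM certificate.
pem_in_bundle() {
  local ca_file="$1" bundle="${2:-/etc/pki/tls/cert.pem}" body all
  # Base64 body of the first certificate in CA_FILE, joined into one line.
  body=$(awk '/-----BEGIN CERTIFICATE-----/ { f = 1; next }
              /-----END CERTIFICATE-----/   { exit }
              f' "$ca_file" | tr -d '\r\n')
  [ -n "$body" ] || return 2
  # Join the bundle the same way so that line wrapping does not matter.
  all=$(tr -d '\r\n' < "$bundle")
  case "$all" in
    *"$body"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Usage, with a hypothetical CA path: pem_in_bundle /root/registry-ca.crt && echo "registry CA found in host bundle". A non-zero status means the configmap would be created from a bundle that still lacks the CA.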

nfs is an unsupported type

Installation environment

  • OpenShift 3.9

  • RHEL 7.5

Problem description

Running the prerequisites.yml playbook with Ansible fails the sanity check:

TASK [Run variable sanity checks] ************************************************************************************************************************************************************
fatal: [master.example.com]: FAILED! => {"failed": true, "msg": "last_checked_host: master.example.com, last_checked_var: openshift_hosted_registry_storage_kind;nfs is an unsupported type for openshift_hosted_registry_storage_kind. openshift_enable_unsupported_configurations=True mustbe specified to continue with this configuration."}

Cause and solution

The use of NFS for the core OpenShift components was never recommended, as NFS (and the NFS protocol) does not provide the proper consistency needed for the applications that make up the OpenShift infrastructure.

As a result, the installer/update playbooks now require an option to enable the use of NFS with core infrastructure components.

Specify the following in the Ansible inventory file:

openshift_enable_unsupported_configurations=True
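Before re-running prerequisites.yml, it can help to confirm that the flag sits where the playbooks read it. The sketch below assumes an INI inventory with cluster-wide variables under [OSEv3:vars], the usual layout for openshift-ansible 3.x. The name inventory_var_is is hypothetical, and the function only inspects the inventory file itself.

```shell
# Hypothetical helper: exit 0 if KEY=VALUE is set in the [OSEv3:vars] section
# of an INI inventory. Spaces around "=" are allowed and VALUE is compared
# case-insensitively (True/true). group_vars/host_vars are not consulted.
inventory_var_is() {
  awk -v key="$2" -v want="$3" '
    # A "[section]" header switches the section we are in.
    /^[ \t]*\[/ { in_vars = ($0 ~ /^[ \t]*\[OSEv3:vars\][ \t]*$/); next }
    # Inside [OSEv3:vars], split non-comment lines at the first "=".
    in_vars && $0 !~ /^[ \t]*[#;]/ {
      eq = index($0, "=")
      if (eq == 0) next
      k = substr($0, 1, eq - 1); v = substr($0, eq + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", k)
      gsub(/^[ \t]+|[ \t]+$/, "", v)
      if (k == key && tolower(v) == tolower(want)) found = 1
    }
    END { exit (found ? 0 : 1) }
  ' "$1"
}
```

For example: inventory_var_is /etc/ansible/hosts openshift_enable_unsupported_configurations True && echo ok. /etc/ansible/hosts is Ansible's default inventory; pass your own path if you run the playbooks with -i.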

Package atomic-openshift-clients-3.9.27 not available

Installation environment

  • OpenShift 3.9

  • RHEL 7.5

Problem description

Running the deploy_cluster.yml playbook with Ansible fails with:

TASK [openshift_cli : Install clients] *******************************************************************************************************************************************************
FAILED - RETRYING: Install clients (2 retries left).
FAILED - RETRYING: Install clients (1 retries left).
fatal: [master.example.com]: FAILED! => {"attempts": 2, "changed": false, "failed": true, "msg": "No package matching 'atomic-openshift-clients-3.9.27' found available, installed or updated", "rc": 126, "results": ["No package matching 'atomic-openshift-clients-3.9.27' found available, installed or updated"]}

Cause

On the master node, the atomic-openshift-clients package cannot be found because of dependency conflicts.

Solution

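  • When syncing the yum repository, make sure each package is present only once: sync with reposync -lmn, where -n downloads only the newest version of each package.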

  • Copy the package atomic-openshift-clients-3.9.27-1.git.0.964617d.el7.x86_64.rpm and its dependency atomic-openshift-3.9.27-1.git.0.964617d.el7.x86_64.rpm to the master and install them locally.

Analysis steps

  1. In the local yum repository, run find to confirm that the package exists (find . -name 'atomic-openshift-clients*'). If it does, continue with step 2.

  2. On the master node, run yum search atomic-openshift-clients. If the package is not found, dependency conflicts are masking some packages.
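Step 1 can be scripted. The sketch below (find_rpms is a hypothetical name) lists every RPM file for a package under a local repository tree, either for one version or for all versions. Several copies or versions of the same package in the tree are what reposync -n is meant to avoid. For step 2, yum --showduplicates list atomic-openshift-clients on the master shows every version yum can see from the enabled repositories.

```shell
# Hypothetical helper: find_rpms REPO_ROOT NAME [VERSION]
# Lists RPM files for package NAME under REPO_ROOT, sorted.
# With VERSION only that version is listed, otherwise every version is.
# The "-<digit>" anchor keeps "atomic-openshift" from also matching
# "atomic-openshift-clients".
find_rpms() {
  local root="$1" name="$2" ver="${3:-}"
  if [ -n "$ver" ]; then
    find "$root" -type f -name "${name}-${ver}-*.rpm" | sort
  else
    find "$root" -type f -name "${name}-[0-9]*.rpm" | sort
  fi
}
```

For example, with a hypothetical repository path: find_rpms /var/www/html/repos atomic-openshift-clients 3.9.27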

NFS mount registry-volume failed

Installation environment

  • OpenShift 3.9

  • RHEL 7.5

Problem description

Running the deploy_cluster.yml playbook with Ansible fails with:

TASK [openshift_hosted : Poll for OpenShift pod deployment success] **************************************************************************************************************************
FAILED - RETRYING: Poll for OpenShift pod deployment success (60 retries left).
FAILED - RETRYING: Poll for OpenShift pod deployment success (59 retries left).
...
FAILED - RETRYING: Poll for OpenShift pod deployment success (1 retries left).
failed: [master.example.com] (item=[{u'namespace': u'default', u'name': u'docker-registry'}, {'_ansible_parsed': True, 'stderr_lines': [], u'cmd': [u'oc', u'get', u'deploymentconfig', u'docker-registry', u'--namespace', u'default', u'--config', u'/etc/origin/master/admin.kubeconfig', u'-o', u'jsonpath={ .status.latestVersion }'], u'end': u'2018-06-17 10:04:10.045056', '_ansible_no_log': False, u'stdout': u'3', '_ansible_item_result': True, u'changed': True, 'item': {u'namespace': u'default', u'name': u'docker-registry'}, u'delta': u'0:00:00.227236', u'stderr': u'', u'rc': 0, u'invocation': {u'module_args': {u'warn': True, u'executable': None, u'_uses_shell': False, u'_raw_params': u"oc get deploymentconfig docker-registry --namespace default --config /etc/origin/master/admin.kubeconfig -o jsonpath='{ .status.latestVersion }'", u'removes': None, u'creates': None, u'chdir': None, u'stdin': None}}, 'stdout_lines': [u'3'], u'start': u'2018-06-17 10:04:09.817820', '_ansible_ignore_errors': None, 'failed': False}]) => {"attempts": 60, "changed": true, "cmd": ["oc", "get", "replicationcontroller", "docker-registry-3", "--namespace", "default", "--config", "/etc/origin/master/admin.kubeconfig", "-o", "jsonpath={ .metadata.annotations.openshift\\.io/deployment\\.phase }"], "delta": "0:00:00.196019", "end": "2018-06-17 10:14:37.184958", "failed": true, "failed_when_result": true, "item": [{"name": "docker-registry", "namespace": "default"}, {"_ansible_ignore_errors": null, "_ansible_item_result": true, "_ansible_no_log": false, "_ansible_parsed": true, "changed": true, "cmd": ["oc", "get", "deploymentconfig", "docker-registry", "--namespace", "default", "--config", "/etc/origin/master/admin.kubeconfig", "-o", "jsonpath={ .status.latestVersion }"], "delta": "0:00:00.227236", "end": "2018-06-17 10:04:10.045056", "failed": false, "invocation": {"module_args": {"_raw_params": "oc get deploymentconfig docker-registry --namespace default --config /etc/origin/master/admin.kubeconfig -o jsonpath='{ .status.latestVersion }'", "_uses_shell": false, "chdir": null, "creates": null, "executable": null, "removes": null, "stdin": null, "warn": true}}, "item": {"name": "docker-registry", "namespace": "default"}, "rc": 0, "start": "2018-06-17 10:04:09.817820", "stderr": "", "stderr_lines": [], "stdout": "3", "stdout_lines": ["3"]}], "rc": 0, "start": "2018-06-17 10:14:36.988939", "stderr": "", "stderr_lines": [], "stdout": "Failed", "stdout_lines": ["Failed"]}
	to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.retry

PLAY RECAP ***********************************************************************************************************************************************************************************
localhost                  : ok=13   changed=0    unreachable=0    failed=0
master.example.com         : ok=460  changed=69   unreachable=0    failed=1
nfs.example.com            : ok=30   changed=1    unreachable=0    failed=0
node1.example.com          : ok=120  changed=13   unreachable=0    failed=0
node2.example.com          : ok=120  changed=13   unreachable=0    failed=0


INSTALLER STATUS *****************************************************************************************************************************************************************************
Initialization             : Complete (0:00:31)
Health Check               : Complete (0:00:05)
etcd Install               : Complete (0:00:28)
NFS Install                : Complete (0:00:54)
Master Install             : Complete (0:07:44)
Master Additional Install  : Complete (0:00:33)
Node Install               : Complete (0:01:42)
Hosted Install             : In Progress (0:21:02)
	This phase can be restarted by running: playbooks/openshift-hosted/config.yml
Failure summary:


  1. Hosts:    master.example.com
     Play:     Poll for hosted pod deployments
     Task:     Poll for OpenShift pod deployment success
     Message:  All items completed

Cause

  • The docker-registry pod fails to start because mounting registry-volume from the NFS server fails.

  • mount.nfs: Protocol not supported

Solution

Solution 1: Skip the hosted registry. Setting openshift_hosted_manage_registry to false makes the installer skip installing docker-registry:
openshift_hosted_manage_registry=false

Analysis steps

1 - During the installation, check the docker-registry pods
# oc get pods | grep docker-registry
docker-registry-3-deploy   1/1       Running             0          9m
docker-registry-3-g7l84    0/1       ContainerCreating   0          9m
2 - Once the docker-registry deploy pod is running, check how the docker-registry pod is starting
# oc describe po/docker-registry-3-g7l84
...
  Warning  FailedMount  8m  kubelet, node1.example.com  MountVolume.SetUp failed for volume "registry-volume" : mount failed: exit status 32
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/origin/openshift.local.volumes/pods/aee76710-76fd-11e8-956e-5254006bf7c5/volumes/kubernetes.io~nfs/registry-volume --scope -- mount -t nfs nfs.example.com:/exports/registry /var/lib/origin/openshift.local.volumes/pods/aee76710-76fd-11e8-956e-5254006bf7c5/volumes/kubernetes.io~nfs/registry-volume
Output: Running scope as unit run-2262.scope.
mount.nfs: Protocol not supported
...
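The mount that fails in step 2 can be reproduced by hand on the node (node1.example.com here), which separates NFS problems from OpenShift ones. The sketch below is not part of the installer and both function names are hypothetical: nfs_source_from_event extracts the server:/export pair from the kubelet event text, and try_nfs_versions (run as root on the node) retries the mount with NFS versions 3, 4 and 4.1 so you can see which ones the server accepts. A Protocol not supported error can point to a version the server does not serve; rpcinfo -p nfs.example.com, which lists the RPC programs and versions registered on the server, is another way to look.

```shell
# Hypothetical helper: read `oc describe pod` output on stdin and print the
# NFS source ("server:/export") from the kubelet "mount -t nfs" arguments.
nfs_source_from_event() {
  sed -n 's/.*mount -t nfs \([^ ]*\) .*/\1/p' | sort -u
}

# Hypothetical helper (run as root on the node): try the mount with several
# NFS versions on a temporary directory and report which ones succeed.
try_nfs_versions() {
  local src="$1" mnt v
  mnt=$(mktemp -d)
  for v in 3 4 4.1; do
    if mount -t nfs -o "vers=$v" "$src" "$mnt" 2>/dev/null; then
      echo "vers=$v: OK"
      umount "$mnt"
    else
      echo "vers=$v: FAILED"
    fi
  done
  rmdir "$mnt"
}
```

For example, oc describe po/docker-registry-3-g7l84 | nfs_source_from_event prints the export to test, and try_nfs_versions nfs.example.com:/exports/registry on node1.example.com shows which mount options work.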

apiserver pod cannot resolve the etcd hostname

Installation environment

  • OpenShift 3.9

  • RHEL 7.5

Problem description

Running the deploy_cluster.yml playbook with Ansible fails with:

Error type 1
TASK [openshift_service_catalog : wait for api server to be ready] ***************************************************************************************************************************
fatal: [master.example.com]: FAILED! => {"attempts": 1, "changed": false, "connection": "close", "content": "[+]ping ok\n[+]poststarthook/generic-apiserver-start-informers ok\n[+]poststarthook/start-service-catalog-apiserver-informers ok\n[-]etcd failed: reason withheld\nhealthz check failed\n", "content_length": "180", "content_type": "text/plain; charset=utf-8", "date": "Sat, 23 Jun 2018 23:29:40 GMT", "failed": true, "msg": "Status code was not [200]: HTTP Error 500: Internal Server Error", "redirected": false, "status": 500, "url": "https://apiserver.kube-service-catalog.svc/healthz", "x_content_type_options": "nosniff"}
	to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.retry

PLAY RECAP ***********************************************************************************************************************************************************************************
localhost                  : ok=13   changed=0    unreachable=0    failed=0
master.example.com         : ok=641  changed=130  unreachable=0    failed=1
nfs.example.com            : ok=29   changed=1    unreachable=0    failed=0
node1.example.com          : ok=120  changed=13   unreachable=0    failed=0
node2.example.com          : ok=120  changed=13   unreachable=0    failed=0


INSTALLER STATUS *****************************************************************************************************************************************************************************
Initialization             : Complete (0:00:32)
Health Check               : Complete (0:00:04)
etcd Install               : Complete (0:00:30)
NFS Install                : Complete (0:00:38)
Master Install             : Complete (0:01:34)
Master Additional Install  : Complete (0:00:28)
Node Install               : Complete (0:01:37)
Hosted Install             : Complete (0:00:31)
Metrics Install            : Complete (0:01:42)
Service Catalog Install    : In Progress (0:00:48)
	This phase can be restarted by running: playbooks/openshift-service-catalog/config.yml
Failure summary:


  1. Hosts:    master.example.com
     Play:     Service Catalog
     Task:     wait for api server to be ready
     Message:  Status code was not [200]: HTTP Error 500: Internal Server Error
Error type 2
TASK [openshift_service_catalog : wait for api server to be ready] ***************************************************************************************************************************
fatal: [master.example.com]: FAILED! => {"attempts": 60, "changed": false, "connection": "close", "content": "Too many requests, please try again later.\n", "content_length": "43", "content_type": "text/plain; charset=utf-8", "date": "Sun, 24 Jun 2018 06:28:47 GMT", "failed": true, "msg": "Status code was not [200]: HTTP Error 429: Too Many Requests", "redirected": false, "retry_after": "1", "status": 429, "url": "https://apiserver.kube-service-catalog.svc/healthz", "x_content_type_options": "nosniff"}
	to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.retry

PLAY RECAP ***********************************************************************************************************************************************************************************
localhost                  : ok=13   changed=0    unreachable=0    failed=0
master.example.com         : ok=653  changed=121  unreachable=0    failed=1
nfs.example.com            : ok=29   changed=1    unreachable=0    failed=0
node1.example.com          : ok=120  changed=13   unreachable=0    failed=0
node2.example.com          : ok=120  changed=13   unreachable=0    failed=0


INSTALLER STATUS *****************************************************************************************************************************************************************************
Initialization             : Complete (0:00:57)
Health Check               : Complete (0:00:07)
etcd Install               : Complete (0:00:48)
NFS Install                : Complete (0:01:03)
Master Install             : Complete (0:02:50)
Master Additional Install  : Complete (0:00:35)
Node Install               : Complete (0:01:54)
Hosted Install             : Complete (0:11:11)
Metrics Install            : Complete (0:01:53)
Service Catalog Install    : In Progress (0:11:10)
	This phase can be restarted by running: playbooks/openshift-service-catalog/config.yml
Failure summary:


  1. Hosts:    master.example.com
     Play:     Service Catalog
     Task:     wait for api server to be ready
     Message:  Status code was not [200]: HTTP Error 429: Too Many Requests

Cause

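DNS resolution fails inside the apiserver pod: it cannot resolve master.example.com, the etcd server configured in --etcd-servers, so the health check at https://apiserver.kube-service-catalog.svc/healthz fails.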

Solutions

NOTE: Each of the following three methods solves the problem; apply any one of them.

Method 1: Add static address mappings to the DNS service

1 - Edit /etc/dnsmasq.d/openshift-cluster.conf and add static name mappings
local=/example.com/
address=/.apps.example.com/192.168.122.101
address=/master.example.com/192.168.122.101
address=/node1.example.com/192.168.122.105
address=/node2.example.com/192.168.122.106
2 - Restart the DNS server
# systemctl restart dnsmasq.service
# systemctl status dnsmasq.service
3 - Verify DNS resolution
# for i in master node1 node2 ; do ssh $i.example.com 'dig master.example.com +short' ; done
192.168.122.101
192.168.122.101
192.168.122.101

# oc rsh apiserver-n56cp
sh-4.2# curl --cacert /etc/origin/master/master.etcd-ca.crt --cert /etc/origin/master/master.etcd-client.crt --key /etc/origin/master/master.etcd-client.key -X GET https://master.example.com:2379/version
{"etcdserver":"3.2.18","etcdcluster":"3.2.0"}
4 - Reinstall the service catalog
# ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/openshift-service-catalog/config.yml
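If the cluster hosts are already listed in an /etc/hosts-style file, the address= lines for step 1 can be generated instead of typed. The awk-based sketch below (hosts_to_dnsmasq is a hypothetical name) emits one address=/name/IP line per host name and skips comments, loopback and IPv6 entries. The wildcard application entry (address=/.apps.example.com/...) still has to be added by hand.

```shell
# Hypothetical helper: convert /etc/hosts-style lines ("IP name [alias...]")
# into dnsmasq "address=/name/IP" lines, one per name. Comments, blank
# lines, loopback (127.*) and IPv6 entries are skipped.
# Reads the given file, or stdin when no file is given.
hosts_to_dnsmasq() {
  awk '
    { sub(/#.*/, "") }
    NF >= 2 && $1 !~ /:/ && $1 !~ /^127\./ {
      for (i = 2; i <= NF; i++) printf "address=/%s/%s\n", $i, $1
    }
  ' "${1:--}"
}
```

For example, review the output of hosts_to_dnsmasq /etc/hosts before appending it to /etc/dnsmasq.d/openshift-cluster.conf, then restart dnsmasq as in step 2.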

Method 2: Fix the DNS server not loading the static host mappings

A DNS server such as dnsmasq normally reads /etc/hosts and serves each static mapping in it as a DNS record, so finding and fixing the reason dnsmasq does not load /etc/hosts also resolves this issue.

In my environment, the journal on the host running the DNS server showed errors such as: failed to load names from /etc/hosts: Permission denied.
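To look for the same symptom, the dnsmasq journal can be filtered for hosts-file load errors. The sed filter below (dnsmasq_hosts_load_errors is a hypothetical name) prints each file dnsmasq failed to read together with the reason. For a Permission denied result, ls -lZ /etc/hosts shows the file mode and SELinux label to compare with a healthy host; if the label turns out to be the problem, restorecon -v /etc/hosts resets it to the policy default.

```shell
# Hypothetical helper: read dnsmasq log lines on stdin (for example from
# `journalctl -u dnsmasq`) and print "<file> <reason>" for every hosts file
# dnsmasq could not load, without duplicates.
dnsmasq_hosts_load_errors() {
  sed -n 's/.*failed to load names from \([^:]*\): \(.*\)$/\1 \2/p' | sort -u
}
```

For example: journalctl -u dnsmasq | dnsmasq_hosts_load_errors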

Method 3: Add a static host entry inside the pod

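Add a static name mapping to /etc/hosts inside the apiserver pod, for example: oc exec apiserver-nkt5k -- sh -c 'echo "192.168.122.101 master.example.com" >> /etc/hosts' (chaining oc rsh apiserver-nkt5k && echo ... would instead append to the local host's /etc/hosts after the remote shell exits).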

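NOTE: This approach suits pods that persist their results to a PV on a storage server. The pod's /etc/hosts is a Kubernetes-managed file, so the added entry does not survive the pod being recreated.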
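To apply method 3 without opening an interactive shell, the write can be run through oc exec. The sketch below assumes the pods live in the kube-service-catalog project (as the service URL apiserver.kube-service-catalog.svc suggests) and that the container user is allowed to write /etc/hosts; pod_add_host is a hypothetical name. It skips the append if the name already appears in the file.

```shell
# Hypothetical helper: pod_add_host POD IP NAME
# Appends "IP NAME" to /etc/hosts inside POD unless NAME already appears
# there as a word. IP and NAME are passed to the remote sh as positional
# arguments ($1, $2), so no nested quoting is needed.
pod_add_host() {
  local pod="$1" ip="$2" name="$3"
  oc exec -n kube-service-catalog "$pod" -- \
    sh -c 'grep -qwF "$2" /etc/hosts || echo "$1 $2" >> /etc/hosts' sh "$ip" "$name"
}
```

For example: pod_add_host apiserver-nkt5k 192.168.122.101 master.example.com. Running it twice leaves a single entry.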

Analysis steps

Verify whether DNS resolution works inside the apiserver pod

1 - Test whether the service is reachable
# curl -k https://apiserver.kube-service-catalog.svc/healthz
[+]ping ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/start-service-catalog-apiserver-informers ok
[-]etcd failed: reason withheld
healthz check failed
2 - Look up the etcd server address in the apiserver daemon set (oc edit ds/apiserver)
    spec:
      containers:
      - args:
        - apiserver
        - --storage-type
        - etcd
        - --secure-port
        - "6443"
        - --etcd-servers
        - https://master.example.com:2379
        - --etcd-cafile
        - /etc/origin/master/master.etcd-ca.crt
        - --etcd-certfile
        - /etc/origin/master/master.etcd-client.crt
        - --etcd-keyfile
        - /etc/origin/master/master.etcd-client.key
3 - Check whether etcd is reachable
# etcdctl -C https://master.example.com:2379 --ca-file /etc/origin/master/master.etcd-ca.crt --cert-file /etc/origin/master/master.etcd-client.crt  --key-file /etc/origin/master/master.etcd-client.key ls

# oc rsh apiserver-56p7q
# curl --cacert /etc/origin/master/master.etcd-ca.crt --cert /etc/origin/master/master.etcd-client.crt --key /etc/origin/master/master.etcd-client.key -X GET https://master.example.com:2379/version
curl: (6) Could not resolve host: master.example.com; Unknown error
4 - Repeat step 3 with the IP address instead of the host name
# etcdctl -C https://192.168.56.66:2379 --ca-file /etc/origin/master/master.etcd-ca.crt --cert-file /etc/origin/master/master.etcd-client.crt  --key-file /etc/origin/master/master.etcd-client.key ls

# oc rsh apiserver-56p7q
# curl --cacert /etc/origin/master/master.etcd-ca.crt --cert /etc/origin/master/master.etcd-client.crt --key /etc/origin/master/master.etcd-client.key -X GET https://192.168.56.66:2379/version
{"etcdserver":"3.2.18","etcdcluster":"3.2.0"}
5 - Check the static host entries and resolver configuration in the apiserver pod
# oc rsh apiserver-56p7q

sh-4.2# cat /etc/hosts
# Kubernetes-managed hosts file.
127.0.0.1	localhost
::1	localhost ip6-localhost ip6-loopback
fe00::0	ip6-localnet
fe00::0	ip6-mcastprefix
fe00::1	ip6-allnodes
fe00::2	ip6-allrouters
10.244.0.15	apiserver-56p7q

sh-4.2# cat /etc/resolv.conf
nameserver 192.168.122.101
search kube-service-catalog.svc.cluster.local svc.cluster.local cluster.local example.com
options ndots:5
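Steps 3 to 5 show that the apiserver pod reaches etcd by IP but cannot resolve master.example.com: its /etc/hosts has no entry for it and its only nameserver is 192.168.122.101. A compact way to repeat the check for several names is getent hosts, which resolves through the same getaddrinfo path curl uses (/etc/hosts, then DNS, as configured in nsswitch.conf). The sketch assumes getent is present in the image, which is typical for RHEL-based images but not guaranteed; check_resolves is a hypothetical name.

```shell
# Hypothetical helper: resolve each NAME through NSS (/etc/hosts, then DNS)
# and print "NAME -> ADDRESS" or "NAME -> UNRESOLVED".
# Returns 1 if any name could not be resolved.
check_resolves() {
  local rc=0 h addr
  for h in "$@"; do
    # First address field of the first getent result, if any.
    addr=$(getent hosts "$h" | awk '{ print $1; exit }')
    if [ -n "$addr" ]; then
      echo "$h -> $addr"
    else
      echo "$h -> UNRESOLVED"
      rc=1
    fi
  done
  return "$rc"
}
```

For example, after pasting the function into the shell opened by oc rsh apiserver-56p7q, run check_resolves master.example.com.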

Check the DNS configuration and whether names resolve

1 - Check the DNS setting in the ifcfg files
# for i in master node1 node2 ; do ssh $i.example.com 'hostname ; cat /etc/sysconfig/network-scripts/ifcfg-eth0 | grep DNS; echo' ; done
master.example.com
DNS1=192.168.122.101

node1.example.com
DNS1=192.168.122.101

node2.example.com
DNS1=192.168.122.101
2 - Check resolv.conf
# for i in master node1 node2 ; do ssh $i.example.com 'hostname ; cat /etc/resolv.conf ; echo' ; done
master.example.com
# nameserver updated by /etc/NetworkManager/dispatcher.d/99-origin-dns.sh
# Generated by NetworkManager
search cluster.local example.com
nameserver 192.168.122.101

node1.example.com
# nameserver updated by /etc/NetworkManager/dispatcher.d/99-origin-dns.sh
# Generated by NetworkManager
search cluster.local example.com
nameserver 192.168.122.105

node2.example.com
# nameserver updated by /etc/NetworkManager/dispatcher.d/99-origin-dns.sh
# Generated by NetworkManager
search cluster.local example.com
nameserver 192.168.122.106
3 - Ping master.example.com from each node
# for i in master node1 node2 ; do ssh $i.example.com 'hostname ; ping master.example.com -c1 ; echo' ; done
master.example.com
PING master.example.com (192.168.122.101) 56(84) bytes of data.
64 bytes from master.example.com (192.168.122.101): icmp_seq=1 ttl=64 time=0.049 ms

--- master.example.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.049/0.049/0.049/0.000 ms

node1.example.com
PING master.example.com (192.168.122.101) 56(84) bytes of data.
64 bytes from master.example.com (192.168.122.101): icmp_seq=1 ttl=64 time=0.238 ms

--- master.example.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.238/0.238/0.238/0.000 ms

node2.example.com
PING master.example.com (192.168.122.101) 56(84) bytes of data.
64 bytes from master.example.com (192.168.122.101): icmp_seq=1 ttl=64 time=0.116 ms

--- master.example.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.116/0.116/0.116/0.000 ms
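Note: In step 3, master.example.com answers ping from every node. ping resolves the name through the system resolver, presumably from the nodes' /etc/hosts, which dig and nslookup do not consult.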
4 - Analyze name resolution with dig
# for i in master node1 node2 ; do ssh $i.example.com 'dig test.apps.example.com +short ; echo' ; done
192.168.122.101

192.168.122.101

192.168.122.101


# for i in master node1 node2 ; do ssh $i.example.com 'dig master.example.com +short; echo' ; done
Note: The dig results show that the DNS server cannot resolve master.example.com but can resolve the application address test.apps.example.com.
5 - Analyze name resolution further with nslookup
# for i in master node1 node2 ; do ssh $i.example.com 'nslookup test.apps.example.com ; echo' ; done
Server:		192.168.122.101
Address:	192.168.122.101#53

Name:	test.apps.example.com
Address: 192.168.122.101


Server:		192.168.122.105
Address:	192.168.122.105#53

Name:	test.apps.example.com
Address: 192.168.122.101


Server:		192.168.122.106
Address:	192.168.122.106#53

Name:	test.apps.example.com
Address: 192.168.122.101


# for i in master node1 node2 ; do ssh $i.example.com 'nslookup master.example.com ; echo' ; done
Server:		192.168.122.101
Address:	192.168.122.101#53

** server can't find master.example.com: NXDOMAIN


Server:		192.168.122.105
Address:	192.168.122.105#53

** server can't find master.example.com: NXDOMAIN


Server:		192.168.122.106
Address:	192.168.122.106#53

** server can't find master.example.com: NXDOMAIN
Note: This confirms the conclusion of step 4: the DNS server cannot resolve master.example.com but can resolve the application address test.apps.example.com.
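The dig loops in steps 4 and 5 can be folded into one table of nodes and names, which makes a partial failure like this one (application names resolve, master.example.com does not) easy to spot. The sketch assumes password-less ssh to each node, as the loops above already do, and that dig is installed there; dns_matrix is a hypothetical name.

```shell
# Hypothetical helper: dns_matrix "NODE..." NAME...
# For every node, run `dig +short NAME` over ssh (so each node's own
# resolv.conf is used) and print the first answer or "NO ANSWER".
# Returns 1 if any lookup came back empty.
dns_matrix() {
  local nodes="$1" rc=0 n name ans
  shift
  # $nodes is split on whitespace on purpose: it is a list of host names.
  for n in $nodes; do
    for name in "$@"; do
      ans=$(ssh "$n" "dig +short $name" </dev/null | head -n 1)
      printf '%-20s %-24s %s\n' "$n" "$name" "${ans:-NO ANSWER}"
      [ -n "$ans" ] || rc=1
    done
  done
  return "$rc"
}
```

For example: dns_matrix "master.example.com node1.example.com node2.example.com" test.apps.example.com master.example.com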

Check the DNS server status

# systemctl status dnsmasq.service
● dnsmasq.service - DNS caching server.
   Loaded: loaded (/usr/lib/systemd/system/dnsmasq.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2018-07-03 10:32:22 CST; 1s ago
 Main PID: 30605 (dnsmasq)
    Tasks: 1
   Memory: 940.0K
   CGroup: /system.slice/dnsmasq.service
           └─30605 /usr/sbin/dnsmasq -k

Jul 03 10:32:22 master.example.com dnsmasq[30605]: using nameserver 127.0.0.1#53 for domain cluster.local
Jul 03 10:32:22 master.example.com dnsmasq[30605]: using nameserver 127.0.0.1#53 for domain in-addr.arpa
Jul 03 10:32:22 master.example.com dnsmasq[30605]: using nameserver 127.0.0.1#53 for domain in-addr.arpa
Jul 03 10:32:22 master.example.com dnsmasq[30605]: using nameserver 127.0.0.1#53 for domain cluster.local
Jul 03 10:32:23 master.example.com dnsmasq[30605]: setting upstream servers from DBus
Jul 03 10:32:23 master.example.com dnsmasq[30605]: using local addresses only for domain example.com
Jul 03 10:32:23 master.example.com dnsmasq[30605]: using nameserver 127.0.0.1#53 for domain cluster.local
Jul 03 10:32:23 master.example.com dnsmasq[30605]: using nameserver 127.0.0.1#53 for domain in-addr.arpa
Jul 03 10:32:23 master.example.com dnsmasq[30605]: using nameserver 127.0.0.1#53 for domain in-addr.arpa
Jul 03 10:32:23 master.example.com dnsmasq[30605]: using nameserver 127.0.0.1#53 for domain cluster.local
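The line using local addresses only for domain example.com indicates that a local=/example.com/ directive is in effect. For such a domain dnsmasq never forwards queries upstream and answers only from its own data (/etc/hosts, DHCP leases and address= entries). That fits the results above: test.apps.example.com is presumably covered by an address=/.apps.example.com/ entry like the one in method 1, while master.example.com has no address= entry, so when /etc/hosts cannot be loaded the server answers NXDOMAIN. The sketch below lists the local= and address= entries of the given config files so you can see which names depend on /etc/hosts. dnsmasq_local_domains is a hypothetical name; it assumes one domain per directive and does not follow conf-dir or conf-file includes.

```shell
# Hypothetical helper: print the local=/DOMAIN/ and address=/DOMAIN/IP lines
# found in the given dnsmasq config files, one per line:
#   local   DOMAIN
#   address DOMAIN -> IP
# Commented-out lines and all other directives are ignored.
dnsmasq_local_domains() {
  sed -n \
    -e 's|^local=/\([^/]*\)/.*|local   \1|p' \
    -e 's|^address=/\([^/]*\)/\(.*\)|address \1 -> \2|p' \
    "$@"
}
```

For example: dnsmasq_local_domains /etc/dnsmasq.conf /etc/dnsmasq.d/*.conf. Any host under a local domain that has no address= line is answered only from /etc/hosts.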