Common Issues
certificate signed by unknown authority
Installation Environment
OpenShift 3.11
RHEL 7.5
Problem Description
During post-installation steps, importing an image stream or deploying an image fails with:
Get https://registry.example.com/v2/: x509: certificate signed by unknown authority
Cause and Resolution
OpenShift cannot find the registry's self-signed certificate, which causes this error. To resolve it:
Edit the API server static pod manifest on the Master node, /etc/origin/node/pods/apiserver.yaml, and add the following volumeMounts entry and volume:
...
- mountPath: /etc/pki
  name: certs
...
volumes:
...
- hostPath:
    path: /etc/pki
  name: certs
Edit the controller static pod manifest on the Master node, /etc/origin/node/pods/controller.yaml, and add the same volumeMounts entry and volume:
...
- mountPath: /etc/pki
  name: certs
...
volumes:
...
- hostPath:
    path: /etc/pki
  name: certs
In the default project, create a configmap containing the host CA bundle, then edit the docker-registry dc to mount it:
# oc create configmap host-ca-trust --from-file=cert.pem=/etc/pki/tls/cert.pem
# oc edit dc docker-registry -n default
...
volumeMounts:
...
- mountPath: /etc/pki/tls/certs
  name: ca-trust
...
volumes:
...
- configMap:
    defaultMode: 420
    name: host-ca-trust
  name: ca-trust
Note: after making these changes, restart the services: master-restart api and master-restart controllers.
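To confirm the fix, the import that originally failed can be retried. The image stream and project names below are placeholders for whatever failed in your environment:
# oc import-image myapp --from=registry.example.com/myproject/myapp:latest --confirm -n myproject
If the registry certificate is now trusted, the command prints the imported tags instead of the x509 error.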
nfs is an unsupported type
Installation Environment
OpenShift 3.9
RHEL 7.5
Problem Description
Running the prerequisites.yml playbook with Ansible fails the variable sanity check:
TASK [Run variable sanity checks] ************************************************************************************************************************************************************ fatal: [master.example.com]: FAILED! => {"failed": true, "msg": "last_checked_host: master.example.com, last_checked_var: openshift_hosted_registry_storage_kind;nfs is an unsupported type for openshift_hosted_registry_storage_kind. openshift_enable_unsupported_configurations=True mustbe specified to continue with this configuration."}
Cause and Resolution
The use of NFS for the core OpenShift components was never recommended, as NFS (and the NFS protocol) does not provide the consistency needed by the applications that make up the OpenShift infrastructure. As a result, the installer/upgrade playbooks now require an explicit option to enable the use of NFS with core infrastructure components. In the Ansible inventory file, specify the following:
openshift_enable_unsupported_configurations=True
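For reference, a minimal sketch of how this looks in the inventory; only the last line is new, the storage variable is the one the sanity check flagged:
[OSEv3:vars]
...
openshift_hosted_registry_storage_kind=nfs
openshift_enable_unsupported_configurations=True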
Package atomic-openshift-clients-3.9.27 not available
Installation Environment
OpenShift 3.9
RHEL 7.5
Problem Description
Running the deploy_cluster.yml playbook with Ansible fails with:
TASK [openshift_cli : Install clients] ******************************************************************************************************************************************************* FAILED - RETRYING: Install clients (2 retries left). FAILED - RETRYING: Install clients (1 retries left). fatal: [master.example.com]: FAILED! => {"attempts": 2, "changed": false, "failed": true, "msg": "No package matching 'atomic-openshift-clients-3.9.27' found available, installed or updated", "rc": 126, "results": ["No package matching 'atomic-openshift-clients-3.9.27' found available, installed or updated"]}
Cause
Because of dependency conflicts in the local yum repository, the atomic-openshift-clients package cannot be found on the Master node.
Resolution
When syncing the yum repository, make sure each package appears only once: sync with reposync -lmn, where -n downloads only the newest version of each package. Then copy atomic-openshift-clients-3.9.27-1.git.0.964617d.el7.x86_64.rpm and its dependency atomic-openshift-3.9.27-1.git.0.964617d.el7.x86_64.rpm to the master and install them locally, as sketched below.
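A minimal sketch of the above, assuming the repository id rhel-7-server-ose-3.9-rpms and the mirror path /var/www/html/repos (both are placeholders; adjust them to your own mirror layout):
# reposync -l -m -n --repoid=rhel-7-server-ose-3.9-rpms --download_path=/var/www/html/repos
# createrepo /var/www/html/repos/rhel-7-server-ose-3.9-rpms
# scp /var/www/html/repos/rhel-7-server-ose-3.9-rpms/atomic-openshift-clients-3.9.27-*.rpm /var/www/html/repos/rhel-7-server-ose-3.9-rpms/atomic-openshift-3.9.27-*.rpm master.example.com:/tmp/
# ssh master.example.com 'yum localinstall -y /tmp/atomic-openshift*3.9.27*.rpm'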
Analysis Steps
1 - In the local yum repository, run find to confirm the package exists (find -name 'atomic-openshift-clients*'); if it does, go to step 2.
2 - On the Master node, run yum search; if the package does not show up there, a dependency conflict is masking it.
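Concretely, assuming the mirror lives under /var/www/html/repos (a placeholder path):
# find /var/www/html/repos -name 'atomic-openshift-clients*'
# ssh master.example.com 'yum search atomic-openshift-clients'
# ssh master.example.com 'yum --showduplicates list atomic-openshift-clients'
The --showduplicates listing also reveals whether yum only sees a different version than the one the installer pins.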
NFS mount registry-volume failed
Installation Environment
OpenShift 3.9
RHEL 7.5
Problem Description
Running the deploy_cluster.yml playbook with Ansible fails with:
TASK [openshift_hosted : Poll for OpenShift pod deployment success] **************************************************************************************************************************
FAILED - RETRYING: Poll for OpenShift pod deployment success (60 retries left).
FAILED - RETRYING: Poll for OpenShift pod deployment success (59 retries left).
...
FAILED - RETRYING: Poll for OpenShift pod deployment success (1 retries left).
failed: [master.example.com] (item=[{u'namespace': u'default', u'name': u'docker-registry'}, {'_ansible_parsed': True, 'stderr_lines': [], u'cmd': [u'oc', u'get', u'deploymentconfig', u'docker-registry', u'--namespace', u'default', u'--config', u'/etc/origin/master/admin.kubeconfig', u'-o', u'jsonpath={ .status.latestVersion }'], u'end': u'2018-06-17 10:04:10.045056', '_ansible_no_log': False, u'stdout': u'3', '_ansible_item_result': True, u'changed': True, 'item': {u'namespace': u'default', u'name': u'docker-registry'}, u'delta': u'0:00:00.227236', u'stderr': u'', u'rc': 0, u'invocation': {u'module_args': {u'warn': True, u'executable': None, u'_uses_shell': False, u'_raw_params': u"oc get deploymentconfig docker-registry --namespace default --config /etc/origin/master/admin.kubeconfig -o jsonpath='{ .status.latestVersion }'", u'removes': None, u'creates': None, u'chdir': None, u'stdin': None}}, 'stdout_lines': [u'3'], u'start': u'2018-06-17 10:04:09.817820', '_ansible_ignore_errors': None, 'failed': False}]) => {"attempts": 60, "changed": true, "cmd": ["oc", "get", "replicationcontroller", "docker-registry-3", "--namespace", "default", "--config", "/etc/origin/master/admin.kubeconfig", "-o", "jsonpath={ .metadata.annotations.openshift\\.io/deployment\\.phase }"], "delta": "0:00:00.196019", "end": "2018-06-17 10:14:37.184958", "failed": true, "failed_when_result": true, "item": [{"name": "docker-registry", "namespace": "default"}, {"_ansible_ignore_errors": null, "_ansible_item_result": true, "_ansible_no_log": false, "_ansible_parsed": true, "changed": true, "cmd": ["oc", "get", "deploymentconfig", "docker-registry", "--namespace", "default", "--config", "/etc/origin/master/admin.kubeconfig", "-o", "jsonpath={ .status.latestVersion }"], "delta": "0:00:00.227236", "end": "2018-06-17 10:04:10.045056", "failed": false, "invocation": {"module_args": {"_raw_params": "oc get deploymentconfig docker-registry --namespace default --config /etc/origin/master/admin.kubeconfig -o jsonpath='{ .status.latestVersion }'", "_uses_shell": false, "chdir": null, "creates": null, "executable": null, "removes": null, "stdin": null, "warn": true}}, "item": {"name": "docker-registry", "namespace": "default"}, "rc": 0, "start": "2018-06-17 10:04:09.817820", "stderr": "", "stderr_lines": [], "stdout": "3", "stdout_lines": ["3"]}], "rc": 0, "start": "2018-06-17 10:14:36.988939", "stderr": "", "stderr_lines": [], "stdout": "Failed", "stdout_lines": ["Failed"]}
to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.retry
PLAY RECAP ***********************************************************************************************************************************************************************************
localhost : ok=13 changed=0 unreachable=0 failed=0
master.example.com : ok=460 changed=69 unreachable=0 failed=1
nfs.example.com : ok=30 changed=1 unreachable=0 failed=0
node1.example.com : ok=120 changed=13 unreachable=0 failed=0
node2.example.com : ok=120 changed=13 unreachable=0 failed=0
INSTALLER STATUS *****************************************************************************************************************************************************************************
Initialization : Complete (0:00:31)
Health Check : Complete (0:00:05)
etcd Install : Complete (0:00:28)
NFS Install : Complete (0:00:54)
Master Install : Complete (0:07:44)
Master Additional Install : Complete (0:00:33)
Node Install : Complete (0:01:42)
Hosted Install : In Progress (0:21:02)
This phase can be restarted by running: playbooks/openshift-hosted/config.yml
Failure summary:
1. Hosts: master.example.com
Play: Poll for hosted pod deployments
Task: Poll for OpenShift pod deployment success
Message: All items completed
Cause
The docker-registry Pod fails to start because mounting the registry-volume from the NFS server fails:
mount.nfs: Protocol not supported
Resolution
Workaround 1: skip the hosted registry. Set openshift_hosted_manage_registry to false in the inventory, which skips the docker-registry installation:
openshift_hosted_manage_registry=false
Analysis Steps
1 - During the installation, list the docker-registry Pods:
# oc get pods | grep docker-registry
docker-registry-3-deploy 1/1 Running 0 9m
docker-registry-3-g7l84 0/1 ContainerCreating 0 9m
2 - Once the deploy Pod is running, describe the docker-registry Pod that is stuck in ContainerCreating:
# oc describe po/docker-registry-3-g7l84
...
Warning FailedMount 8m kubelet, node1.example.com MountVolume.SetUp failed for volume "registry-volume" : mount failed: exit status 32
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/origin/openshift.local.volumes/pods/aee76710-76fd-11e8-956e-5254006bf7c5/volumes/kubernetes.io~nfs/registry-volume --scope -- mount -t nfs nfs.example.com:/exports/registry /var/lib/origin/openshift.local.volumes/pods/aee76710-76fd-11e8-956e-5254006bf7c5/volumes/kubernetes.io~nfs/registry-volume
Output: Running scope as unit run-2262.scope.
mount.nfs: Protocol not supported
...
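To confirm the root cause outside of OpenShift, the same mount can be attempted by hand on the node that reported the failure (node1.example.com here); the mount point /mnt and the vers options below are only a test, not part of the installer:
# showmount -e nfs.example.com
# mount -t nfs -o vers=3 nfs.example.com:/exports/registry /mnt
# mount -t nfs -o vers=4 nfs.example.com:/exports/registry /mnt
# umount /mnt
Whichever protocol version succeeds (or fails) points to the NFS server and client configuration that needs to be aligned.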
apiserver Pod cannot resolve the etcd hostname
Installation Environment
OpenShift 3.9
RHEL 7.5
Problem Description
Running the deploy_cluster.yml playbook with Ansible fails with:
Error type 1:
TASK [openshift_service_catalog : wait for api server to be ready] ***************************************************************************************************************************
fatal: [master.example.com]: FAILED! => {"attempts": 1, "changed": false, "connection": "close", "content": "[+]ping ok\n[+]poststarthook/generic-apiserver-start-informers ok\n[+]poststarthook/start-service-catalog-apiserver-informers ok\n[-]etcd failed: reason withheld\nhealthz check failed\n", "content_length": "180", "content_type": "text/plain; charset=utf-8", "date": "Sat, 23 Jun 2018 23:29:40 GMT", "failed": true, "msg": "Status code was not [200]: HTTP Error 500: Internal Server Error", "redirected": false, "status": 500, "url": "https://apiserver.kube-service-catalog.svc/healthz", "x_content_type_options": "nosniff"}
to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.retry
PLAY RECAP ***********************************************************************************************************************************************************************************
localhost : ok=13 changed=0 unreachable=0 failed=0
master.example.com : ok=641 changed=130 unreachable=0 failed=1
nfs.example.com : ok=29 changed=1 unreachable=0 failed=0
node1.example.com : ok=120 changed=13 unreachable=0 failed=0
node2.example.com : ok=120 changed=13 unreachable=0 failed=0
INSTALLER STATUS *****************************************************************************************************************************************************************************
Initialization : Complete (0:00:32)
Health Check : Complete (0:00:04)
etcd Install : Complete (0:00:30)
NFS Install : Complete (0:00:38)
Master Install : Complete (0:01:34)
Master Additional Install : Complete (0:00:28)
Node Install : Complete (0:01:37)
Hosted Install : Complete (0:00:31)
Metrics Install : Complete (0:01:42)
Service Catalog Install : In Progress (0:00:48)
This phase can be restarted by running: playbooks/openshift-service-catalog/config.yml
Failure summary:
1. Hosts: master.example.com
Play: Service Catalog
Task: wait for api server to be ready
Message: Status code was not [200]: HTTP Error 500: Internal Server Error
Error type 2:
TASK [openshift_service_catalog : wait for api server to be ready] ***************************************************************************************************************************
fatal: [master.example.com]: FAILED! => {"attempts": 60, "changed": false, "connection": "close", "content": "Too many requests, please try again later.\n", "content_length": "43", "content_type": "text/plain; charset=utf-8", "date": "Sun, 24 Jun 2018 06:28:47 GMT", "failed": true, "msg": "Status code was not [200]: HTTP Error 429: Too Many Requests", "redirected": false, "retry_after": "1", "status": 429, "url": "https://apiserver.kube-service-catalog.svc/healthz", "x_content_type_options": "nosniff"}
to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.retry
PLAY RECAP ***********************************************************************************************************************************************************************************
localhost : ok=13 changed=0 unreachable=0 failed=0
master.example.com : ok=653 changed=121 unreachable=0 failed=1
nfs.example.com : ok=29 changed=1 unreachable=0 failed=0
node1.example.com : ok=120 changed=13 unreachable=0 failed=0
node2.example.com : ok=120 changed=13 unreachable=0 failed=0
INSTALLER STATUS *****************************************************************************************************************************************************************************
Initialization : Complete (0:00:57)
Health Check : Complete (0:00:07)
etcd Install : Complete (0:00:48)
NFS Install : Complete (0:01:03)
Master Install : Complete (0:02:50)
Master Additional Install : Complete (0:00:35)
Node Install : Complete (0:01:54)
Hosted Install : Complete (0:11:11)
Metrics Install : Complete (0:01:53)
Service Catalog Install : In Progress (0:11:10)
This phase can be restarted by running: playbooks/openshift-service-catalog/config.yml
Failure summary:
1. Hosts: master.example.com
Play: Service Catalog
Task: wait for api server to be ready
Message: Status code was not [200]: HTTP Error 429: Too Many Requests
Cause
DNS resolution fails inside the apiserver Pod, so the call to https://apiserver.kube-service-catalog.svc/healthz fails (the apiserver cannot reach etcd by hostname).
Resolution
NOTE: any one of the following three methods resolves this issue; pick whichever one fits.
Method 1: add static address mappings to the DNS service
1 - Edit /etc/dnsmasq.d/openshift-cluster.conf and add static hostname mappings:
local=/example.com/
address=/.apps.example.com/192.168.122.101
address=/master.example.com/192.168.122.101
address=/node1.example.com/192.168.122.105
address=/node2.example.com/192.168.122.106
2 - Restart the DNS server:
# systemctl restart dnsmasq.service
# systemctl status dnsmasq.service
3 - Verify DNS resolution:
# for i in master node1 node2 ; do ssh $i.example.com 'dig master.example.com +short' ; done
192.168.122.101
192.168.122.101
192.168.122.101
# oc rsh apiserver-n56cp
sh-4.2# curl --cacert /etc/origin/master/master.etcd-ca.crt --cert /etc/origin/master/master.etcd-client.crt --key /etc/origin/master/master.etcd-client.key -X GET https://master.example.com:2379/version
{"etcdserver":"3.2.18","etcdcluster":"3.2.0"}
4 - Re-run the service catalog installation:
# ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/openshift-service-catalog/config.yml
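After the playbook completes, the health check that failed during the install can be repeated (the same commands appear in the analysis steps below); with the hostname now resolvable it should no longer report the etcd failure:
# oc get pods -n kube-service-catalog
# curl -k https://apiserver.kube-service-catalog.svc/healthz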
Method 2: fix the DNS server so it loads the static hostname mappings
By design, a DNS server such as dnsmasq should parse /etc/hosts and serve every static mapping in it as a DNS record, so finding and fixing the root cause of dnsmasq not loading /etc/hosts also resolves this issue. Checking the journal logs on the host that runs the DNS server in my environment shows an error like:
failed to load names from /etc/hosts: Permission denied
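A sketch of chasing that down on the DNS host; the chmod/restorecon step assumes the denial comes from wrong permissions or an SELinux label on /etc/hosts, so verify the journal message in your own environment before applying it:
# journalctl -u dnsmasq.service | grep '/etc/hosts'
# ls -lZ /etc/hosts
# chmod 644 /etc/hosts
# restorecon -v /etc/hosts
# systemctl restart dnsmasq.service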
Method 3: add a static hostname entry inside the Pod
Add the static hostname resolution directly inside the apiserver Pod: run oc rsh apiserver-nkt5k, then inside the Pod run echo "192.168.122.101 master.example.com" >> /etc/hosts
NOTE: this approach suits Pods whose configuration is persisted to a PV, so that the added entries survive on the storage server.
Analysis Steps
Verify whether DNS resolution works inside the apiserver Pod
1 - Test whether the service is reachable:
# curl -k https://apiserver.kube-service-catalog.svc/healthz
[+]ping ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/start-service-catalog-apiserver-informers ok
[-]etcd failed: reason withheld
healthz check failed
2 - Run oc edit ds/apiserver to check the etcd server address:
spec:
  containers:
  - args:
    - apiserver
    - --storage-type
    - etcd
    - --secure-port
    - "6443"
    - --etcd-servers
    - https://master.example.com:2379
    - --etcd-cafile
    - /etc/origin/master/master.etcd-ca.crt
    - --etcd-certfile
    - /etc/origin/master/master.etcd-client.crt
    - --etcd-keyfile
    - /etc/origin/master/master.etcd-client.key
3 - Check whether the etcd service is reachable, first from the host and then from inside the apiserver Pod:
# etcdctl -C https://master.example.com:2379 --ca-file /etc/origin/master/master.etcd-ca.crt --cert-file /etc/origin/master/master.etcd-client.crt --key-file /etc/origin/master/master.etcd-client.key ls
# oc rsh apiserver-56p7q
# curl --cacert /etc/origin/master/master.etcd-ca.crt --cert /etc/origin/master/master.etcd-client.crt --key /etc/origin/master/master.etcd-client.key -X GET https://master.example.com:2379/version
curl: (6) Could not resolve host: master.example.com; Unknown error
4 - Repeat step 3 with the IP address in place of the hostname:
# etcdctl -C https://192.168.56.66:2379 --ca-file /etc/origin/master/master.etcd-ca.crt --cert-file /etc/origin/master/master.etcd-client.crt --key-file /etc/origin/master/master.etcd-client.key ls
# oc rsh apiserver-56p7q
# curl --cacert /etc/origin/master/master.etcd-ca.crt --cert /etc/origin/master/master.etcd-client.crt --key /etc/origin/master/master.etcd-client.key -X GET https://192.168.56.66:2379/version
{"etcdserver":"3.2.18","etcdcluster":"3.2.0"}
5 - Inspect the static hostname entries and the resolver configuration inside the apiserver Pod:
# oc rsh apiserver-56p7q
sh-4.2# cat /etc/hosts
# Kubernetes-managed hosts file.
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
fe00::0 ip6-mcastprefix
fe00::1 ip6-allnodes
fe00::2 ip6-allrouters
10.244.0.15 apiserver-56p7q
sh-4.2# cat /etc/resolv.conf
nameserver 192.168.122.101
search kube-service-catalog.svc.cluster.local svc.cluster.local cluster.local example.com
options ndots:5
Check the DNS configuration and test whether names resolve
1 - Check the DNS settings in the ifcfg files:
# for i in master node1 node2 ; do ssh $i.example.com 'hostname ; cat /etc/sysconfig/network-scripts/ifcfg-eth0 | grep DNS; echo' ; done
master.example.com
DNS1=192.168.122.101
node1.example.com
DNS1=192.168.122.101
node2.example.com
DNS1=192.168.122.101
2 - Check the resolv.conf files:
# for i in master node1 node2 ; do ssh $i.example.com 'hostname ; cat /etc/resolv.conf ; echo' ; done
master.example.com
# nameserver updated by /etc/NetworkManager/dispatcher.d/99-origin-dns.sh
# Generated by NetworkManager
search cluster.local example.com
nameserver 192.168.122.101
node1.example.com
# nameserver updated by /etc/NetworkManager/dispatcher.d/99-origin-dns.sh
# Generated by NetworkManager
search cluster.local example.com
nameserver 192.168.122.105
node2.example.com
# nameserver updated by /etc/NetworkManager/dispatcher.d/99-origin-dns.sh
# Generated by NetworkManager
search cluster.local example.com
nameserver 192.168.122.106
3 - Ping master.example.com from every node:
# for i in master node1 node2 ; do ssh $i.example.com 'hostname ; ping master.example.com -c1 ; echo' ; done
master.example.com
PING master.example.com (192.168.122.101) 56(84) bytes of data.
64 bytes from master.example.com (192.168.122.101): icmp_seq=1 ttl=64 time=0.049 ms
--- master.example.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.049/0.049/0.049/0.000 ms
node1.example.com
PING master.example.com (192.168.122.101) 56(84) bytes of data.
64 bytes from master.example.com (192.168.122.101): icmp_seq=1 ttl=64 time=0.238 ms
--- master.example.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.238/0.238/0.238/0.000 ms
node2.example.com
PING master.example.com (192.168.122.101) 56(84) bytes of data.
64 bytes from master.example.com (192.168.122.101): icmp_seq=1 ttl=64 time=0.116 ms
--- master.example.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.116/0.116/0.116/0.000 ms
Note: the ping in step 3 succeeds on every node.
# for i in master node1 node2 ; do ssh $i.example.com 'dig test.apps.example.com +short ; echo' ; done
192.168.122.101
192.168.122.101
192.168.122.101
# for i in master node1 node2 ; do ssh $i.example.com 'dig master.example.com +short; echo' ; done
Note: the dig results show the DNS servers cannot resolve master.example.com, yet they do resolve the application address test.apps.example.com.
# for i in master node1 node2 ; do ssh $i.example.com 'nslookup test.apps.example.com ; echo' ; done
Server: 192.168.122.101
Address: 192.168.122.101#53
Name: test.apps.example.com
Address: 192.168.122.101
Server: 192.168.122.105
Address: 192.168.122.105#53
Name: test.apps.example.com
Address: 192.168.122.101
Server: 192.168.122.106
Address: 192.168.122.106#53
Name: test.apps.example.com
Address: 192.168.122.101
# for i in master node1 node2 ; do ssh $i.example.com 'nslookup master.example.com ; echo' ; done
Server: 192.168.122.101
Address: 192.168.122.101#53
** server can't find master.example.com: NXDOMAIN
Server: 192.168.122.105
Address: 192.168.122.105#53
** server can't find master.example.com: NXDOMAIN
Server: 192.168.122.106
Address: 192.168.122.106#53
** server can't find master.example.com: NXDOMAIN
Note: nslookup leads to the same conclusion as dig: the DNS servers cannot resolve master.example.com but do resolve the application address test.apps.example.com.
Check the DNS server's running status
# systemctl status dnsmasq.service
● dnsmasq.service - DNS caching server.
Loaded: loaded (/usr/lib/systemd/system/dnsmasq.service; enabled; vendor preset: disabled)
Active: active (running) since Tue 2018-07-03 10:32:22 CST; 1s ago
Main PID: 30605 (dnsmasq)
Tasks: 1
Memory: 940.0K
CGroup: /system.slice/dnsmasq.service
└─30605 /usr/sbin/dnsmasq -k
Jul 03 10:32:22 master.example.com dnsmasq[30605]: using nameserver 127.0.0.1#53 for domain cluster.local
Jul 03 10:32:22 master.example.com dnsmasq[30605]: using nameserver 127.0.0.1#53 for domain in-addr.arpa
Jul 03 10:32:22 master.example.com dnsmasq[30605]: using nameserver 127.0.0.1#53 for domain in-addr.arpa
Jul 03 10:32:22 master.example.com dnsmasq[30605]: using nameserver 127.0.0.1#53 for domain cluster.local
Jul 03 10:32:23 master.example.com dnsmasq[30605]: setting upstream servers from DBus
Jul 03 10:32:23 master.example.com dnsmasq[30605]: using local addresses only for domain example.com
Jul 03 10:32:23 master.example.com dnsmasq[30605]: using nameserver 127.0.0.1#53 for domain cluster.local
Jul 03 10:32:23 master.example.com dnsmasq[30605]: using nameserver 127.0.0.1#53 for domain in-addr.arpa
Jul 03 10:32:23 master.example.com dnsmasq[30605]: using nameserver 127.0.0.1#53 for domain in-addr.arpa
Jul 03 10:32:23 master.example.com dnsmasq[30605]: using nameserver 127.0.0.1#53 for domain cluster.local