Question:

kubeadm init stuck at the health check when deploying an HA Kubernetes master with HAProxy

罗鸿畴
2023-03-14

I am deploying an HA Kubernetes control plane (stacked etcd) with kubeadm, following the instructions on the official site: https://kubernetes.io/docs/setup/independent/high-availability/. Currently, four nodes are planned for my cluster:

- One HAProxy server node for control-plane load balancing.
- Three stacked-etcd master nodes.

I deployed HAProxy with the following configuration:

global
    daemon
    maxconn 256

defaults
    mode http
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms

frontend haproxy_kube
    bind *:6443
    mode tcp
    option tcplog
    timeout client  10800s
    default_backend masters

backend masters
    mode tcp
    option tcplog
    balance leastconn
    timeout server  10800s
    server master01 <master01-ip>:6443 check
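
For reference, a minimal way to sanity-check this HAProxy setup before pointing kubeadm at it (a sketch, assuming the configuration lives at /etc/haproxy/haproxy.cfg and HAProxy runs under systemd) is:

# Validate the configuration file syntax, then restart HAProxy so the
# frontend on *:6443 starts listening (paths and service name are assumptions).
haproxy -c -f /etc/haproxy/haproxy.cfg
sudo systemctl restart haproxy
# Confirm the load-balanced endpoint is reachable (placeholder hostname):
nc -vz <haproxyserver-dns> 6443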

My kubeadm-config.yaml looks like this:

apiVersion: kubeadm.k8s.io/v1beta1
kind: InitConfiguration
nodeRegistration:
  name: "master01"
---
apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
apiServer:
  certSANs:
  - "<haproxyserver-dns>"
controlPlaneEndpoint: "<haproxyserver-dns>:6443"
networking:
  serviceSubnet: "172.24.0.0/16"
  podSubnet: "172.16.0.0/16"
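
As a side note, once master01 has been initialized through this controlPlaneEndpoint, the remaining two masters would join the cluster via the same HAProxy address. With kubeadm v1.13 (the version visible in the logs below), that would look roughly like the following sketch; the token and hash are placeholders, and the control-plane certificates are assumed to have been copied to the joining node beforehand:

# Hedged sketch for kubeadm v1.13: join an additional control-plane node
# through the load balancer. <token> and <hash> come from the 'kubeadm init'
# output or from 'kubeadm token create --print-join-command'.
kubeadm join <haproxyserver-dns>:6443 \
    --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash> \
    --experimental-control-plane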

My init command is:

kubeadm init --config=kubeadm-config.yaml -v 11

But after I ran the command above on master01, it kept logging the following messages:

I0122 11:43:44.039849   17489 manifests.go:113] [control-plane] wrote static Pod manifest for component "kube-scheduler" to "/etc/kubernetes/manifests/kube-scheduler.yaml"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
I0122 11:43:44.041038   17489 local.go:57] [etcd] wrote Static Pod manifest for a local etcd instance to "/etc/kubernetes/manifests/etcd.yaml"
I0122 11:43:44.041068   17489 waitcontrolplane.go:89] [wait-control-plane] Waiting for the API server to be healthy
I0122 11:43:44.042665   17489 loader.go:359] Config loaded from file /etc/kubernetes/admin.conf
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
I0122 11:43:44.044971   17489 round_trippers.go:419] curl -k -v -XGET  -H "Accept: application/json, */*" -H "User-Agent: kubeadm/v1.13.2 (linux/amd64) kubernetes/cff46ab" 'https://<haproxyserver-dns>:6443/healthz?timeout=32s'
I0122 11:43:44.120973   17489 round_trippers.go:438] GET https://<haproxyserver-dns>:6443/healthz?timeout=32s  in 75 milliseconds
I0122 11:43:44.120988   17489 round_trippers.go:444] Response Headers:
I0122 11:43:44.621201   17489 round_trippers.go:419] curl -k -v -XGET  -H "Accept: application/json, */*" -H "User-Agent: kubeadm/v1.13.2 (linux/amd64) kubernetes/cff46ab" 'https://<haproxyserver-dns>:6443/healthz?timeout=32s'
I0122 11:43:44.703556   17489 round_trippers.go:438] GET https://<haproxyserver-dns>:6443/healthz?timeout=32s  in 82 milliseconds
I0122 11:43:44.703577   17489 round_trippers.go:444] Response Headers:
I0122 11:43:45.121311   17489 round_trippers.go:419] curl -k -v -XGET  -H "Accept: application/json, */*" -H "User-Agent: kubeadm/v1.13.2 (linux/amd64) kubernetes/cff46ab" 'https://<haproxyserver-dns>:6443/healthz?timeout=32s'
I0122 11:43:45.200493   17489 round_trippers.go:438] GET https://<haproxyserver-dns>:6443/healthz?timeout=32s  in 79 milliseconds
I0122 11:43:45.200514   17489 round_trippers.go:444] Response Headers:
I0122 11:43:45.621338   17489 round_trippers.go:419] curl -k -v -XGET  -H "Accept: application/json, */*" -H "User-Agent: kubeadm/v1.13.2 (linux/amd64) kubernetes/cff46ab" 'https://<haproxyserver-dns>:6443/healthz?timeout=32s'
I0122 11:43:45.698633   17489 round_trippers.go:438] GET https://<haproxyserver-dns>:6443/healthz?timeout=32s  in 77 milliseconds
I0122 11:43:45.698652   17489 round_trippers.go:444] Response Headers:
I0122 11:43:46.121323   17489 round_trippers.go:419] curl -k -v -XGET  -H "Accept: application/json, */*" -H "User-Agent: kubeadm/v1.13.2 (linux/amd64) kubernetes/cff46ab" 'https://<haproxyserver-dns>:6443/healthz?timeout=32s'
I0122 11:43:46.199641   17489 round_trippers.go:438] GET https://<haproxyserver-dns>:6443/healthz?timeout=32s  in 78 milliseconds
I0122 11:43:46.199660   17489 round_trippers.go:444] Response Headers:

After exiting the loop with Ctrl-C, I ran the curl command manually, and everything seemed to be fine:

curl -k -v -XGET  -H "Accept: application/json, */*" -H "User-Agent: kubeadm/v1.13.2 (linux/amd64) kubernetes/cff46ab" 'https://<haproxyserver-dns>:6443/healthz?timeout=32s'
* About to connect() to <haproxyserver-dns> port 6443 (#0)
*   Trying <haproxyserver-ip>...
* Connected to <haproxyserver-dns> (10.135.64.223) port 6443 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* skipping SSL peer certificate verification
* NSS: client certificate not found (nickname not specified)
* SSL connection using TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
* Server certificate:
*   subject: CN=kube-apiserver
*   start date: Jan 22 03:43:38 2019 GMT
*   expire date: Jan 22 03:43:38 2020 GMT
*   common name: kube-apiserver
*   issuer: CN=kubernetes
> GET /healthz?timeout=32s HTTP/1.1
> Host: <haproxyserver-dns>:6443
> Accept: application/json, */*
> User-Agent: kubeadm/v1.13.2 (linux/amd64) kubernetes/cff46ab
> 
< HTTP/1.1 200 OK
< Date: Tue, 22 Jan 2019 04:09:03 GMT
< Content-Length: 2
< Content-Type: text/plain; charset=utf-8
< 
* Connection #0 to host <haproxyserver-dns> left intact
ok
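
Since the manual curl succeeds while kubeadm's in-process health check keeps looping, one generic thing worth comparing (a sketch, assuming Docker as the container runtime managed by systemd) is the proxy-related environment seen by the shell and by the Docker daemon:

# Show proxy variables in the current shell and in the Docker service unit.
env | grep -i _proxy
systemctl show --property=Environment docker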

I don't know how to find the root cause of this problem. I hope someone familiar with this issue can give me some advice. Thanks!

1 Answer

皇甫伟彦
2023-03-14

After a few days of searching and trying, I was once again able to solve this problem by myself. As it turns out, the problem comes from a rather rare situation:

I had set a proxy on the master node, both in /etc/profile and in the Docker systemd drop-in directory (docker.service.d), which prevented requests to the HAProxy load balancer from working properly.

I don't know exactly which setting caused the problem, but after adding no-proxy rules, the issue was resolved and kubeadm successfully initialized a master behind the HAProxy load balancer. These are my proxy settings:

/etc/profile:

...
export http_proxy=http://<my-proxy-server-dns:port>/
export no_proxy=<my-k8s-master-loadbalance-server-dns>,<my-proxy-server-dns>,localhost

/etc/systemd/system/docker.service.d/http-proxy.conf:

[Service]
Environment="HTTP_PROXY=http://<my-proxy-server-dns:port>/" "NO_PROXY<my-k8s-master-loadbalance-server-dns>,<my-proxy-server-dns>,localhost, 127.0.0.0/8, 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16"