
Kubernetes Certificate Expiration (Part 2)

Today, while deploying a Kubernetes Deployment, I noticed that deleting the Deployment did not automatically delete its ReplicaSet and Pods. At first I suspected that one of the worker nodes had failed, but restarting it or migrating the workload to another node made no difference. Later I restarted the master nodes and found that two of the masters could not register their node after starting up.
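
For context, the checks behind that observation look roughly like the following (a sketch; my-deployment is a placeholder name, not from the original incident):

# kubectl delete deployment my-deployment
# kubectl get rs,pods      # leftover ReplicaSet/Pods that should have been garbage-collected
# kubectl get nodes        # masters stuck NotReady or missing point at a control-plane problem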

My environment has three master nodes. Checking the containers, master01 started normally, but the etcd containers on master02 and master03 would not start. The kubelet logs showed the following:

Jan 14 22:29:29 master02.k8s kubelet[1998]: E0114 22:29:29.187573    1998 kubelet.go:2291] "Error getting node" err="node \"master02.k8s\" not found"
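
The inspection itself is not shown above; on each master it amounts to something like this (a sketch assuming a Docker-based kubeadm setup; with containerd, crictl ps -a and crictl logs would be the equivalents):

# docker ps -a | grep etcd                      # is the etcd static-pod container running or exiting?
# journalctl -u kubelet --since "10 min ago"    # where the "Error getting node" messages above come from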

Checking the etcd log on master01 showed:

2022-01-14 15:57:13.963722 I | embed: rejected connection from "192.168.203.4:45008" (error "tls: failed to verify client's certificate: x509: certificate has expired or is not yet valid", ServerName "")
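
On master01 the etcd log can be pulled straight from the static-pod container; a sketch, again assuming Docker as the runtime (the container ID is a placeholder):

# docker ps | grep etcd                        # find the etcd container ID on master01
# docker logs --tail 100 <etcd-container-id>   # contains the "rejected connection ... certificate has expired" lines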

This clearly points to expired certificates. Based on the experience from the previous expiration, I checked as follows:

# kubeadm certs renew
missing subcommand; "renew" is not meant to be run on its own
To see the stack trace of this error execute with --v=5 or higher
[root@master02 kubernetes]# kubeadm certs check-expiration
[check-expiration] Reading configuration from the cluster...
[check-expiration] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[check-expiration] Error reading configuration from the Cluster. Falling back to default configuration

CERTIFICATE                EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
admin.conf                 Sep 10, 2022 00:10 UTC   238d                                    no
apiserver                  Sep 10, 2022 00:20 UTC   238d            ca                      no
apiserver-etcd-client      Sep 10, 2022 00:20 UTC   238d            etcd-ca                 no
apiserver-kubelet-client   Sep 10, 2022 00:20 UTC   238d            ca                      no
controller-manager.conf    Sep 10, 2022 00:09 UTC   238d                                    no
etcd-healthcheck-client    Dec 22, 2021 23:53 UTC   <invalid>       etcd-ca                 no
etcd-peer                  Dec 22, 2021 23:53 UTC   <invalid>       etcd-ca                 no
etcd-server                Dec 22, 2021 23:53 UTC   <invalid>       etcd-ca                 no
front-proxy-client         Sep 10, 2022 00:20 UTC   238d            front-proxy-ca          no
scheduler.conf             Sep 10, 2022 00:09 UTC   238d                                    no

CERTIFICATE AUTHORITY   EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
ca                      Dec 24, 2029 07:18 UTC   7y              no
etcd-ca                 Dec 24, 2029 07:18 UTC   7y              no
front-proxy-ca          Dec 24, 2029 07:18 UTC   7y              no
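
The same conclusion can be cross-checked with openssl directly against the certificate files, independent of kubeadm; a sketch assuming the default kubeadm PKI layout under /etc/kubernetes/pki:

# openssl x509 -in /etc/kubernetes/pki/etcd/server.crt -noout -enddate
# openssl x509 -in /etc/kubernetes/pki/etcd/peer.crt -noout -enddate
# openssl x509 -in /etc/kubernetes/pki/etcd/healthcheck-client.crt -noout -enddate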

This confirms that the etcd certificates have indeed expired. Renew them:

# kubeadm certs renew etcd-healthcheck-client
# kubeadm certs renew etcd-peer
# kubeadm certs renew etcd-server

Re-running the check confirmed that the new certificates were in effect. After restarting the containers and kubelet, everything was back to normal and the problem was resolved.
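
The recheck and restart can look roughly like this (a sketch assuming a Docker runtime; restarting the static-pod containers makes them pick up the renewed certificates from the host paths):

# kubeadm certs check-expiration        # the etcd-* entries should now show a fresh expiry
# docker ps | grep -E 'etcd|kube-apiserver' | awk '{print $1}' | xargs docker restart
# systemctl restart kubelet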

The root cause is that master01 had already renewed its certificates automatically, which kept the cluster usable, while the certificates on the other master nodes were not renewed automatically and had to be handled manually. Presumably a future version will address this.
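
Since the renewal only happened on master01, the remaining control-plane nodes are easy to overlook; a quick loop like the following (a sketch assuming passwordless ssh to the master hosts) checks them all:

# for m in master01.k8s master02.k8s master03.k8s; do echo "== $m =="; ssh $m kubeadm certs check-expiration; done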
