Reinstalling a master

Taking master1 down

Delete the master1 node

With 3 masters, one can be taken down; the remaining 2 keep the cluster running without real problems, and holding out like that for a day or two is fine.
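The "two of three is fine" intuition is just etcd quorum arithmetic: a cluster of n members needs (n/2)+1 healthy members (integer division) to keep committing writes. A minimal sketch, plain shell arithmetic, no cluster required:

```shell
# etcd quorum: a cluster of n members needs (n/2)+1 (integer division)
# healthy members to commit writes.
for n in 3 2 5; do
  quorum=$(( n / 2 + 1 ))
  echo "members=$n quorum=$quorum tolerated_failures=$(( n - quorum ))"
done
# prints:
#   members=3 quorum=2 tolerated_failures=1
#   members=2 quorum=2 tolerated_failures=0
#   members=5 quorum=3 tolerated_failures=2
```

So a 3-member cluster tolerates exactly one failure; once you are down to 2 members, losing one more stops etcd, which is why you should not stay in that state longer than necessary.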

kubectl drain rrs-t-paas-master0 --delete-local-data --force --ignore-daemonsets
kubectl delete node rrs-t-paas-master0

Clean up the etcd cluster

Exec into the etcd container:

kubectl -n kube-system exec -it etcd-paas-m-k8s-master-2 -- /bin/sh

Check the member list:

etcdctl --endpoints=127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key member list

It still shows 3 members; remove the one that was taken down:

etcdctl --endpoints=127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key member remove 8aab0313a9647d4d


Then some disk repair work was done on master1, after which it can rejoin the cluster.

Rejoining master1 to the cluster

Reset master1:

kubeadm reset

Configure name resolution for apiserver.cluster.local

Edit /etc/hosts:

<IP of a surviving master> apiserver.cluster.local

This mapping is used during kubeadm join.
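As a sketch, the hosts entry can be added idempotently. `10.0.0.2` is a made-up placeholder for a surviving master's IP, and the script writes to a local test file so it is safe to dry-run; on the real node point `HOSTS_FILE` at /etc/hosts:

```shell
# MASTER_IP is a made-up placeholder; replace with a surviving master's IP.
MASTER_IP=10.0.0.2
HOSTS_FILE=./hosts.test   # use /etc/hosts on the real node

touch "$HOSTS_FILE"
# Append the mapping only if it is not already there, so re-running is harmless.
grep -q 'apiserver.cluster.local' "$HOSTS_FILE" \
  || echo "$MASTER_IP apiserver.cluster.local" >> "$HOSTS_FILE"
grep 'apiserver.cluster.local' "$HOSTS_FILE"
```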

Generate the join command on master2

[root@paas-m-k8s-master-2 ~]# kubeadm init phase upload-certs --upload-certs
I0110 10:10:11.254956 12245 version.go:252] remote version is much newer: v1.29.0; falling back to: stable-1.18
W0110 10:10:13.812440 12245 configset.go:202] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[upload-certs] Using certificate key:
23d8e27402b4f982d9ec894c37b1a3271c9f27bef2e653ca471426cc57025324
[root@paas-m-k8s-master-2 ~]# kubeadm token create --print-join-command
W0110 10:11:40.990463 14694 configset.go:202] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
kubeadm join apiserver.cluster.local:6443 --token yubedv.0rg185no5jgqwn07 --discovery-token-ca-cert-hash sha256:be87c7200420224f1f8d439a5f058de7be88282eec1fc833b346b38c62ddf482

Join master1 to the cluster

Run the following on the machine being joined.

--control-plane and --certificate-key are only needed when adding a master (control-plane) node.

kubeadm join apiserver.cluster.local:6443 \
--token yubedv.0rg185no5jgqwn07 \
--discovery-token-ca-cert-hash sha256:be87c7200420224f1f8d439a5f058de7be88282eec1fc833b346b38c62ddf482 \
--control-plane --certificate-key 23d8e27402b4f982d9ec894c37b1a3271c9f27bef2e653ca471426cc57025324

Success.


Summary of problems encountered

  1. apiserver.cluster.local fails to resolve
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
error execution phase preflight: unable to fetch the kubeadm-config ConfigMap: failed to get config map: Get https://apiserver.cluster.local:6443/api/v1/namespaces/kube-system/configmaps/kubeadm-config?timeout=10s: dial tcp: lookup apiserver.cluster.local on 1x.xx.xx.xx:53: no such host

This is a name-resolution problem: apiserver.cluster.local cannot be found.

Fix:

Add the entry directly in /etc/hosts:

<IP of a surviving master> apiserver.cluster.local
  2. kubelet port in use
Port 10250 is in use

kubelet is still alive; kubeadm join will start kubelet itself.

Reset the configuration with kubeadm reset.
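A small check along these lines tells you whether the port is still held before retrying the join (`ss` comes from iproute2; the port number is kubelet's default):

```shell
PORT=10250
# If kubelet is still running it holds this port and kubeadm join will fail.
if ss -lnt 2>/dev/null | grep -q ":$PORT "; then
  STATUS="in-use"   # stop kubelet (systemctl stop kubelet) or run kubeadm reset
else
  STATUS="free"
fi
echo "port $PORT is $STATUS"
```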

  3. etcd directory not empty
[ERROR DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty

Just delete it:

rm -rf /var/lib/etcd

kubeadm reset also takes care of this.

  4. etcd health check fails

The cause is that the old etcd member record still exists. Check:

etcdctl --endpoints=127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key member list

Remove it:

etcdctl --endpoints=127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key member remove 7eab7c23b19f6778