# Deploying a Highly Available Kubernetes Cluster with kubeadm

## Prerequisites

### Server environment

Set a unique hostname on each server.
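The post does not show the command itself; on CentOS 7 this is typically done with `hostnamectl` (the hostnames below match the ones used later in the post):

```shell
# Run the matching command on each node; e.g. on the first master:
hostnamectl set-hostname k8s-master1
# hostnamectl set-hostname k8s-master2   # on the second master, and so on

# Optionally add name/IP mappings to /etc/hosts on every node, e.g.:
# echo "1x.xx.12.181 k8s-master1" >> /etc/hosts
```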
### Passwordless SSH

Pick one master machine and set up passwordless SSH login from it to all the other machines.
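The concrete commands are not shown in the post; a typical sketch (the hostnames are the ones used later in the post) is:

```shell
# On the chosen master: generate a key pair if one does not already exist
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa

# Copy the public key to every other node (asks for each password once)
for host in k8s-master2 k8s-master3; do
  ssh-copy-id root@"$host"
done
```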
### Disable the firewall

```shell
systemctl stop firewalld
systemctl disable firewalld
```
### Disable SELinux

Run `getenforce`; if it prints `Disabled`, SELinux is already off. Otherwise:

```shell
setenforce 0
sed -i 's/^SELINUX=.*$/SELINUX=disabled/' /etc/selinux/config
```
### Disable swap

```shell
swapoff -a
echo "vm.swappiness=0" >> /etc/sysctl.conf
sysctl -p /etc/sysctl.conf
# Comment out the swap entry in /etc/fstab so it stays off after reboot
sed -i 's$/dev/mapper/centos-swap$#/dev/mapper/centos-swap$g' /etc/fstab
free -m
```
### Raise the open-file limit (ulimit)

The default open-file limit is only 1024 (`ulimit -n`). To raise it for the current session:

```shell
ulimit -SHn 65535
```

To make it permanent, add these two lines to `/etc/security/limits.conf` (`vim /etc/security/limits.conf`):

```shell
* soft nofile 65535
* hard nofile 65535
```
### Time synchronization

First check whether ntp is installed:

```shell
[root@k8s-master21 ~]# rpm -qa ntp
```

If it is not, install it:

```shell
[root@k8s-master21 ~]# yum install ntp -y
```

Set the time zone:

```shell
[root@k8s-master21 ~]# ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
[root@k8s-master21 ~]# echo "Asia/Shanghai" > /etc/timezone
```

Then sync against Aliyun's time server:

```shell
[root@k8s-master21 ~]# ntpdate time2.aliyun.com
26 Dec 15:35:57 ntpdate[7043]: step time server 203.107.6.88 offset -8.227227 sec
```

Add the sync to the system crontab (`crontab -e`), then save and exit:

```shell
*/5 * * * * ntpdate time2.aliyun.com
```

Finally, make it run at boot by adding the same command to `/etc/rc.local` (`vim /etc/rc.local`):

```shell
ntpdate time2.aliyun.com
```
### Enable IPv4 forwarding

This lets Kubernetes inspect and forward network traffic:

```shell
cat > /etc/sysctl.d/k8s.conf << EOF
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
```

```shell
modprobe br_netfilter
sysctl -p /etc/sysctl.d/k8s.conf
```
### Load the ip_vs modules

```shell
cat > /etc/sysconfig/modules/ipvs.modules <<EOF
#!/bin/bash
modprobe -- ip_vs
modprobe -- ip_vs_rr
modprobe -- ip_vs_wrr
modprobe -- ip_vs_sh
modprobe -- nf_conntrack_ipv4
EOF
```

```shell
chmod 755 /etc/sysconfig/modules/ipvs.modules && bash /etc/sysconfig/modules/ipvs.modules && lsmod | grep -e ip_vs -e nf_conntrack_ipv4
```
## Install Docker

### Remove old Docker versions

Remove any old Docker packages so they cannot interfere:

```shell
yum remove docker \
           docker-client \
           docker-client-latest \
           docker-common \
           docker-latest \
           docker-latest-logrotate \
           docker-logrotate \
           docker-engine
```
### Install dependencies

```shell
yum install -y yum-utils device-mapper-persistent-data lvm2
```
### Add the Aliyun package repo

```shell
yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
```
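The post configures the repo but never shows the Docker install command itself; with the Aliyun repo above in place it would typically be:

```shell
# Install Docker CE (and its CLI and containerd runtime) from the repo above
yum install -y docker-ce docker-ce-cli containerd.io
```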
### Configure Docker

```shell
mkdir -p /etc/docker/
cat > /etc/docker/daemon.json <<EOF
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "registry-mirrors": ["https://6kx4zyno.mirror.aliyuncs.com"]
}
EOF
```
`"exec-opts": ["native.cgroupdriver=systemd"]` is required for kubelet to start.
Then enable and restart Docker:

```shell
systemctl enable docker && systemctl daemon-reload && systemctl restart docker
```
## Install Kubernetes

### Install kubeadm, kubelet, and kubectl

Configure the package repo:

```shell
cat > /etc/yum.repos.d/kubernetes.repo <<EOF
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=0
repo_gpgcheck=0
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
```
```shell
yum install -y kubectl-1.22.8 kubelet-1.22.8 kubeadm-1.22.8

# Other versions of kubelet/kubeadm/kubectl can be installed instead,
# but they must be compatible with your Docker version.
# List the installable versions with:
# yum list kubeadm kubelet kubectl --showduplicates | sort -r
```
### Configure the kubelet pause image

By default kubelet pulls the pause image from gcr.io, which may be unreachable from inside China, so point kubelet at Aliyun's mirror instead:

```shell
cat >/etc/sysconfig/kubelet<<EOF
KUBELET_EXTRA_ARGS="--cgroup-driver=systemd --pod-infra-container-image=registry.cn-hangzhou.aliyuncs.com/google_containers/pause-amd64:3.2"
EOF
```
### Enable kubelet on boot

```shell
systemctl daemon-reload && systemctl enable --now kubelet
```
## Initialize the first master

`kubeadm init` needs a config file. You can print a default config to start from:

```shell
kubeadm config print init-defaults
```
On all master nodes:

```shell
vim /root/kubeadm-config.yaml
```

Adjust the contents to your environment; see the kubeadm configuration (v1beta3) reference on kubernetes.io.
```yaml
apiVersion: kubeadm.k8s.io/v1beta3
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 1x.xx.12.181
  bindPort: 6443
nodeRegistration:
  criSocket: /var/run/dockershim.sock
  imagePullPolicy: IfNotPresent
  name: k8s-master1
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta3
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controlPlaneEndpoint: 1x.xx.12.181:6443
controllerManager: {}
dns: {}
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.cn-hangzhou.aliyuncs.com/google_containers
kind: ClusterConfiguration
kubernetesVersion: 1.22.0
networking:
  dnsDomain: cluster.local
  podSubnet: 192.168.0.0/16
  serviceSubnet: 1x.xx.0.0/12
scheduler: {}
```
Pulling the images in advance saves time during initialization:

```shell
kubeadm config images pull --config /root/kubeadm-config.yaml
```
Initialize on the master1 node:

```shell
kubeadm init --config /root/kubeadm-config.yaml --upload-certs
```
Initialization generates the certificates and config files under /etc/kubernetes; afterwards the other master nodes just need to join master1. On success it prints:
```
Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

You can now join any number of the control-plane node running the following command on each as root:

  kubeadm join 1x.xx.12.181:6443 --token abcdef.0123456789abcdef \
        --discovery-token-ca-cert-hash sha256:30e24f2bf41b30864c9a2aff54aece349f3bdff2882b9b54145cd9c7976e8ef9 \
        --control-plane --certificate-key e1fec22ea86d49cf3f74bab846075cf234fb19f4de18243bcdd220537fa4285f

Please note that the certificate-key gives access to cluster sensitive data, keep it secret!
As a safeguard, uploaded-certs will be deleted in two hours; If necessary, you can use
"kubeadm init phase upload-certs --upload-certs" to reload certs afterward.

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 1x.xx.12.181:6443 --token abcdef.0123456789abcdef \
        --discovery-token-ca-cert-hash sha256:30e24f2bf41b30864c9a2aff54aece349f3bdff2882b9b54145cd9c7976e8ef9
```
kubeadm runs all control-plane components as pods: each one is a container started from a YAML file under /etc/kubernetes/manifests. That directory is kubelet's static-pod directory — place a pod YAML there and kubelet manages that pod's lifecycle for you. Inside it you can see the following files:

```shell
cd /etc/kubernetes/manifests
ls
etcd.yaml  kube-apiserver.yaml  kube-controller-manager.yaml  kube-scheduler.yaml
```
On master1, configure the environment variable kubectl uses to reach the cluster — it talks to Kubernetes via the admin.conf file:

```shell
cat <<EOF >> /root/.bashrc
export KUBECONFIG=/etc/kubernetes/admin.conf
EOF
source /root/.bashrc
```
Check the node status:

```shell
[root@k8s-master1 ~]# kubectl get nodes
NAME          STATUS     ROLES                  AGE   VERSION
k8s-master1   NotReady   control-plane,master   13m   v1.22.8
```
Check the services:

```shell
[root@k8s-master1 ~]# kubectl get svc -A
NAMESPACE     NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE
default       kubernetes   ClusterIP   1x.xx.0.1    <none>        443/TCP                  19m
kube-system   kube-dns     ClusterIP   1x.xx.0.10   <none>        53/UDP,53/TCP,9153/TCP   18m
```
With this installation method, all system components run as containers in the kube-system namespace.
Check the pods:

```shell
[root@k8s-master1 ~]# kubectl get pod -A -owide
NAMESPACE     NAME                                  READY   STATUS    RESTARTS       AGE     IP             NODE          NOMINATED NODE   READINESS GATES
kube-system   coredns-7d89d9b6b8-6xpvs              0/1     Pending   0              49m     <none>         <none>        <none>           <none>
kube-system   coredns-7d89d9b6b8-d5q2h              0/1     Pending   0              49m     <none>         <none>        <none>           <none>
kube-system   etcd-k8s-master1                      1/1     Running   0              50m     1x.xx.12.181   k8s-master1   <none>           <none>
kube-system   kube-apiserver-k8s-master1            1/1     Running   1 (34m ago)    51m     1x.xx.12.181   k8s-master1   <none>           <none>
kube-system   kube-controller-manager-k8s-master1   1/1     Running   1 (108s ago)   3m10s   1x.xx.12.181   k8s-master1   <none>           <none>
kube-system   kube-proxy-kb9w9                      1/1     Running   0              49m     1x.xx.12.181   k8s-master1   <none>           <none>
kube-system   kube-scheduler-k8s-master1            1/1     Running   1 (109s ago)   3m10s   1x.xx.12.181   k8s-master1   <none>           <none>
```

Everything is Running except the coredns pods, which stay Pending until a pod network plugin is installed.
## Join the other masters

Use the join command printed by the successful init above:

```shell
kubeadm join 1x.xx.12.181:6443 --token abcdef.0123456789abcdef \
    --discovery-token-ca-cert-hash sha256:e8158944761027754f2ff819e8d299af1095bbc12699a2f34b6d8d80a67d8b6f \
    --control-plane --certificate-key 707aa3ccb1d514ea0270683f7c0ed351f08976617cebe902f1771f6198b6b344
```
Note that the token is only valid for 24 hours by default; once it expires, a new one has to be generated.
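The regeneration commands are not shown in the post; with kubeadm they are:

```shell
# Print a fresh worker join command (a new token is created as part of it)
kubeadm token create --print-join-command

# For joining control-plane nodes, also re-upload the certs to get
# a new --certificate-key (uploaded certs expire after two hours)
kubeadm init phase upload-certs --upload-certs
```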
```shell
[root@k8s-master1 ~]# kubectl get nodes
NAME          STATUS     ROLES                  AGE     VERSION
k8s-master1   NotReady   control-plane,master   59m     v1.22.8
k8s-master2   NotReady   control-plane,master   2m38s   v1.22.8
k8s-master3   NotReady   control-plane,master   2m37s   v1.22.8
```
## Join the worker nodes

```shell
kubeadm join 1x.xx.12.181:6443 --token abcdef.0123456789abcdef \
    --discovery-token-ca-cert-hash sha256:e8158944761027754f2ff819e8d299af1095bbc12699a2f34b6d8d80a67d8b6f
```
## Install Calico

Run this on the master1 node.
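The actual install command is not shown in the post; Calico is typically installed by applying a single manifest (the URL here is an assumption — pick a Calico release compatible with Kubernetes 1.22 from the Calico docs):

```shell
# Apply the Calico manifest on master1; the podSubnet 192.168.0.0/16 in
# kubeadm-config.yaml matches Calico's default IP pool
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
```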
## Troubleshooting

### Installing Docker requires container-selinux >= 2:2.74

```
Error: Package: containerd.io-1.6.6-3.1.el7.x86_64 (docker-ce-stable)
       Requires: container-selinux >= 2:2.74
```

One fix is to install the package directly from the CentOS extras repo (http://mirror.centos.org/centos/7/extras/x86_64/Packages/):

```shell
yum install -y http://mirror.centos.org/centos/7/extras/x86_64/Packages/container-selinux-2.119.2-1.911c772.el7_8.noarch.rpm
```
A better fix:

```shell
# Enter the yum repo directory
cd /etc/yum.repos.d
mv CentOS-Base.repo CentOS-Base.repo_bak
```

Add an entry at the top of /etc/yum.repos.d/docker-ce.repo:

```
[centos-extras]
name=Centos extras - $basearch
baseurl=http://mirror.centos.org/centos/7/extras/x86_64
enabled=1
gpgcheck=0
```

Save and exit, then install:

```shell
yum -y install slirp4netns fuse-overlayfs container-selinux
```
### controller-manager and scheduler unhealthy

```shell
[root@k8s-master1 ~]# kubectl get pod -A -owide
NAMESPACE     NAME                                  READY   STATUS             RESTARTS        AGE   IP             NODE          NOMINATED NODE   READINESS GATES
kube-system   coredns-7d89d9b6b8-6xpvs              0/1     Pending            0               28m   <none>         <none>        <none>           <none>
kube-system   coredns-7d89d9b6b8-d5q2h              0/1     Pending            0               28m   <none>         <none>        <none>           <none>
kube-system   etcd-k8s-master1                      1/1     Running            0               30m   1x.xx.12.181   k8s-master1   <none>           <none>
kube-system   kube-apiserver-k8s-master1            1/1     Running            1 (14m ago)     30m   1x.xx.12.181   k8s-master1   <none>           <none>
kube-system   kube-controller-manager-k8s-master1   0/1     CrashLoopBackOff   7 (4m31s ago)   30m   1x.xx.12.181   k8s-master1   <none>           <none>
kube-system   kube-proxy-kb9w9                      1/1     Running            0               28m   1x.xx.12.181   k8s-master1   <none>           <none>
kube-system   kube-scheduler-k8s-master1            0/1     CrashLoopBackOff   6 (4m27s ago)   30m   1x.xx.12.181   k8s-master1   <none>           <none>
```

Both kube-controller-manager-k8s-master1 and kube-scheduler-k8s-master1 are in CrashLoopBackOff.
Checking the cause:

```
Liveness probe failed: Get "https://127.0.0.1:10257/healthz": dial tcp 127.0.0.1:10257: connect: connection refused
```
This happens because kube-controller-manager.yaml and kube-scheduler.yaml set the port to 0 by default; commenting that line out fixes it. (This must be done on every master node.)

```shell
vim /etc/kubernetes/manifests/kube-scheduler.yaml
vim /etc/kubernetes/manifests/kube-controller-manager.yaml
```
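Editing by hand works; a sed one-liner makes the same change. The sketch below demonstrates it on a scratch copy rather than the live manifests (the `- --port=0` line is the one described above):

```shell
# Create a scratch fragment that mimics the relevant manifest lines
cat > /tmp/kube-scheduler-fragment.yaml <<'EOF'
    - --leader-elect=true
    - --port=0
EOF

# Comment out the "- --port=0" line, preserving indentation, exactly as you
# would in kube-scheduler.yaml and kube-controller-manager.yaml
sed -i 's/^\( *\)- --port=0/\1# - --port=0/' /tmp/kube-scheduler-fragment.yaml
cat /tmp/kube-scheduler-fragment.yaml
```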
Then restart kubelet:

```shell
systemctl restart kubelet.service
```
### Error from server: etcdserver: request timed out

```shell
docker ps -a
docker logs -f <etcd container ID>
```

The etcd logs show an error:

```
leader failed to send out heartbeat on time; took too long, leader is overloaded likely from slow disk
```
Check the server's disk I/O performance. It does not look good; confirming with sar:

```
Average:      DEV           tps   rd_sec/s   wr_sec/s   avgrq-sz   avgqu-sz    await   svctm   %util
Average:      vda         12.80       0.00     248.00      19.38       2.28   177.77   35.69   45.68
Average:      vdb          0.00       0.00       0.00       0.00       0.00     0.00    0.00    0.00
Average:      centos-root 24.70       0.00     252.00      10.20       2.29    92.52   40.52  100.09
```
- await: average wait time per device I/O operation
- svctm: average service time per device I/O operation
- %util: percentage of each second spent on I/O

If svctm is close to await, there is almost no I/O queueing and the disk is performing well. If await is far higher than svctm, the I/O queue is too long and applications on the system will slow down; the fix is to switch to a faster disk.
iostat shows the same picture — the disk performance really is inadequate.

For comparison, another machine:

```
Average:      DEV     tps   rd_sec/s   wr_sec/s   avgrq-sz   avgqu-sz   await   svctm   %util
Average:      vda   51.10       0.00     554.40      10.85       0.03    0.65    0.70    3.56
Average:      vdb    0.00       0.00       0.00       0.00       0.00    0.00    0.00    0.00
```

The gap is enormous.
These test machines are old servers virtualized with OpenStack, with aging, untuned SATA disks. That level of performance cannot support an etcd cluster, so only a single-node etcd can be installed.
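Besides sar and iostat, a quick way to gauge synchronous write latency — the number etcd actually cares about — is a small dd test (the path and sizes here are arbitrary):

```shell
# Write 1000 x 512-byte blocks with a sync per block, roughly what etcd's
# write-ahead log does; a disk healthy enough for etcd finishes this quickly
dd if=/dev/zero of=/tmp/etcd-disk-test bs=512 count=1000 oflag=dsync

# Clean up the test file
rm -f /tmp/etcd-disk-test
```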
### The connection to the server was refused

```
The connection to the server 1x.xx.12.181:6443 was refused - did you specify the right host or port?
```
Check the service states:

```shell
systemctl status docker.service
systemctl status kubelet.service
systemctl status firewalld.service   # should be inactive
```

All normal.
Check whether anything is listening on the port:

```shell
netstat -pnlt | grep 6443
```

Nothing is listening.
Check the kubelet logs.
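The log commands themselves are not shown; on a systemd host they would be:

```shell
# Follow the kubelet unit's logs live
journalctl -u kubelet -f

# Or dump just the most recent entries
journalctl -u kubelet --no-pager -n 100
```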
## Uninstall

```shell
kubeadm reset -f
modprobe -r ipip
yum -y remove kubeadm* kubectl* kubelet* docker*
rm -rf ~/.kube/
rm -rf /etc/kubernetes/
rm -rf /etc/systemd/system/kubelet.service.d
rm -rf /etc/systemd/system/kubelet.service
rm -rf /usr/bin/kube*
rm -rf /var/lib/etcd
rm -rf /var/etcd
```