
Installing a k8s Cluster with Sealos 4.0.0


Lessons learned:

  1. The / (root) filesystem needs to be large; the images are big. 300G.
  2. Because the images are large, budget extra time for the installation.
  3. Run the install under nohup so a dropped connection doesn't kill it.

If the big disk is not mounted at the root, create symlinks:

mkdir -p /data/run/containerd
mkdir -p /data/var/lib/containers
mkdir -p /data/var/lib/kubelet
ln -s /data/run/containerd /run/containerd
ln -s /data/var/lib/containers /var/lib/containers
ln -s /data/var/lib/kubelet /var/lib/kubelet
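As a sanity check, this small read-only helper (my own addition, not part of sealos) verifies that each runtime path actually resolves to its /data counterpart:

```shell
# check_link LINK EXPECTED_TARGET -> succeeds if LINK resolves to the target.
# readlink -f canonicalizes both sides so nested links compare equal.
check_link() {
    [ "$(readlink -f "$1")" = "$(readlink -f "$2")" ]
}

for d in /run/containerd /var/lib/containers /var/lib/kubelet; do
    if check_link "$d" "/data$d"; then
        echo "ok: $d -> /data$d"
    else
        echo "MISSING/WRONG: $d"
    fi
done
```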

Download the sealos CLI

Install the jq tool first:

yum install -y jq

List the available sealos CLI versions:

curl --silent "https://api.github.com/repos/labring/sealos/releases" | jq -r '.[].tag_name'

Pick a stable version from the list:

v5.0.0-beta5
v5.0.0-beta4
v4.4.0-beta3
v5.0.0-beta3
v5.0.0-beta2
v5.0.0-beta1
v4.3.7
v5.0.0-alpha2
v4.3.7-rc1
v4.3.6

I'll use the latest stable release here, v4.3.7.
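Picking the newest stable tag can be automated: drop anything with a pre-release suffix (-beta, -rc) and version-sort what's left. A sketch using the tags listed above (assumes GNU `sort -V`; with jq you could feed `.[].tag_name` straight in):

```shell
# Filter out pre-release tags (anything containing "-") and take the
# highest remaining version. Prints v4.3.7 for this list.
printf '%s\n' v5.0.0-beta5 v5.0.0-beta4 v4.4.0-beta3 v4.3.7 v4.3.7-rc1 v4.3.6 \
  | grep -v -- '-' \
  | sort -V \
  | tail -n 1
```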

Download

wget https://github.com/labring/sealos/releases/download/v4.3.7/sealos_4.3.7_linux_amd64.tar.gz

Install

tar zxvf sealos_4.3.7_linux_amd64.tar.gz sealos && chmod +x sealos && mv sealos /usr/bin

Configure the server environment

Set up passwordless SSH

The sealos command runs on one of the masters, so that master needs passwordless SSH login to every other server.

# On master0: generate a key pair
ssh-keygen

# On every other server: trust master0's public key
cat >> ~/.ssh/authorized_keys << EOF
<contents of master0's ~/.ssh/id_rsa.pub>
EOF

# If firewalld restricts ssh, allow master0's address, and permit it in hosts.allow
firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='10.xx.xx.194' service name='ssh' accept"
firewall-cmd --reload
echo "sshd: <master node IP>" >> /etc/hosts.allow
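To confirm the key exchange worked before running sealos, a small helper (my own; the host list in the comment is illustrative) that fails fast instead of prompting for a password:

```shell
# check_ssh HOST: succeeds quietly if passwordless root login works.
# BatchMode=yes makes ssh fail immediately instead of asking for a password.
check_ssh() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "root@$1" true \
        && echo "ok: $1" || echo "FAILED: $1"
}

# Usage:
#   for h in 10.xx.xx.194 1x.xx.12.209 1x.xx.12.216; do check_ssh "$h"; done
```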

Disable selinux

Disable selinux so containers can access the host's filesystem.

# Check it; my servers were delivered with it already off
[root@my-paas-master0 ~]# getenforce
Permissive

To disable it:

setenforce 0  # disable immediately (until reboot)
sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config # permanent; takes effect after reboot

Disable swap

On OOM we want the application killed outright, not limping along on swap and triggering cascading failures.

# Swap total is 0; my servers were delivered with swap already off
[root@my-paas-master0 ~]# free -g
              total  used  free  shared  buff/cache  available
Mem:              7     0     6       0           0          7
Swap:             0     0     0

To disable it:

swapoff -a
sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab
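To see exactly what that sed expression does, here it is run against a throwaway copy of a typical fstab (the sample content is illustrative, not from this post):

```shell
# Make a scratch copy of a typical fstab and comment out the swap line.
fstab=$(mktemp)
cat > "$fstab" << 'EOF'
/dev/mapper/centos-root /     xfs  defaults 0 0
/dev/mapper/centos-swap swap  swap defaults 0 0
EOF
# Any line containing " swap " gets a leading "#"; other lines are untouched.
sed -i '/ swap / s/^\(.*\)$/#\1/g' "$fstab"
cat "$fstab"
```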

Enable NAT forwarding

With packet forwarding off, the system only handles packets addressed to itself and will not forward packets bound for other addresses. The symptom: with a NodePort service, only the server the pod runs on can reach it; the other servers cannot.

firewall-cmd --add-masquerade --permanent
firewall-cmd --reload

# Check whether NAT forwarding (masquerade) is on
firewall-cmd --query-masquerade

Firewall ports

In test environments I just turn the firewall off entirely; that is not appropriate in production.

Official docs: Ports and Protocols | Kubernetes

![image-20231019172733543](D:\github\docs\云原生\k8s\Sealos 4.0.0.assets\image-20231019172733543.png)

![image-20231019172850695](D:\github\docs\云原生\k8s\Sealos 4.0.0.assets\image-20231019172850695.png)

The k8s masters need the following ports open


firewall-cmd --permanent --add-port=6443/tcp

firewall-cmd --permanent --add-port=2379-2380/tcp

firewall-cmd --permanent --add-port=10250-10259/tcp

# already covered by the 10250-10259 range above:
# firewall-cmd --permanent --add-port=10259/tcp
# firewall-cmd --permanent --add-port=10257/tcp

# if the masters should also serve NodePort traffic
firewall-cmd --permanent --add-port=30000-32767/tcp

# A few more the official docs don't list, learned from experience and
# hard-won lessons; opening a few extras does no harm
# firewall-cmd --permanent --add-port=10251/tcp
# firewall-cmd --permanent --add-port=10252/tcp
# firewall-cmd --permanent --add-port=10255/tcp
firewall-cmd --permanent --add-port=8472/udp
firewall-cmd --permanent --add-port=443/tcp
firewall-cmd --permanent --add-port=443/udp
firewall-cmd --permanent --add-port=53/udp
firewall-cmd --permanent --add-port=53/tcp
firewall-cmd --permanent --add-port=9153/tcp

firewall-cmd --reload
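The long run of commands above can also be written as a small loop (my own refactor, same firewall-cmd calls), which keeps the port list editable in one place:

```shell
# open_ports PROTO PORT... : add each port (or range) for the given protocol.
open_ports() {
    proto=$1; shift
    for p in "$@"; do
        firewall-cmd --permanent --add-port="$p/$proto"
    done
}

# Usage (run as root on a master):
#   open_ports tcp 6443 2379-2380 10250-10259 30000-32767 443 53 9153
#   open_ports udp 8472 443 53
#   firewall-cmd --reload
```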

The k8s nodes need the following ports open

firewall-cmd --permanent --add-port=10250/tcp

firewall-cmd --permanent --add-port=30000-32767/tcp

firewall-cmd --reload

If you use istio, also add the istio-pilot ports to the firewall:

firewall-cmd --permanent --add-port=15010-15014/tcp

Configure the Linux maximum thread count and maximum open-file limits:

echo "fs.file-max = 655350" >> /etc/sysctl.conf
echo "kernel.pid_max = 655350" >> /etc/sysctl.conf

echo "root soft nofile 655350" >> /etc/security/limits.conf
echo "root hard nofile 655350" >> /etc/security/limits.conf
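Note that appending to /etc/sysctl.conf alone doesn't change the running kernel; run `sysctl -p` (or reboot), then confirm with a read-only check:

```shell
# Read back the live kernel values from /proc (no root needed).
cat /proc/sys/fs/file-max
cat /proc/sys/kernel/pid_max
```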

Install socat

socat is a networking tool; k8s uses it for pod data exchange (port forwarding).

yum install -y socat

Load the br_netfilter module

modprobe  br_netfilter
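modprobe only lasts until reboot. To make it persistent, the usual systemd approach (my addition; the file names are conventional, not mandated) is a modules-load.d entry plus the bridge sysctls that br_netfilter exists to support. /etc/modules-load.d/k8s.conf:

```
br_netfilter
```

/etc/sysctl.d/k8s.conf:

```
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
```

Run `sysctl --system` afterwards to apply without rebooting.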

Install the cluster. Passwordless login is configured, so no passwords are needed:

nohup sealos run \
--masters 10.xx.xx.194 \
--nodes 1x.xx.12.209,1x.xx.12.216 \
labring/kubernetes:v1.25.0 \
labring/helm:v3.8.2 \
labring/calico:v3.24.1 >sealos.log 2>&1 &

If anything errors out, clean up before reinstalling:

sealos reset

Start an nginx to test:

kubectl run ng --image=harbor-test.xxx.net/base/nginx:1.25.2
kubectl expose pod ng --port=80 --target-port=80 --type=NodePort
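Rather than reading the assigned NodePort off the web console, a tiny helper (my own; assumes kubectl is configured for the cluster) fetches it for the curl test:

```shell
# get_nodeport SERVICE: print the NodePort Kubernetes assigned to the
# service's first port.
get_nodeport() {
    kubectl get svc "$1" -o jsonpath='{.spec.ports[0].nodePort}'
}

# Usage against a live cluster:
#   curl -I "http://10.xx.xx.194:$(get_nodeport ng)"
```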

![image-20231020141624870](D:\github\docs\云原生\k8s\Sealos 4.0.0.assets\image-20231020141624870.png)

Turn the firewall back on

systemctl start firewalld

Still reachable, no problems.

Install kubesphere

Install nfs

Check the nfs ports

# rpcinfo -p
program vers proto port service
100000 4 tcp 111 portmapper
100000 3 tcp 111 portmapper
100000 2 tcp 111 portmapper
100000 4 udp 111 portmapper
100000 3 udp 111 portmapper
100000 2 udp 111 portmapper
100024 1 udp 59758 status
100024 1 tcp 33494 status
100005 1 udp 20048 mountd
100005 1 tcp 20048 mountd
100005 2 udp 20048 mountd
100005 2 tcp 20048 mountd
100005 3 udp 20048 mountd
100005 3 tcp 20048 mountd
100003 3 tcp 2049 nfs
100003 4 tcp 2049 nfs
100227 3 tcp 2049 nfs_acl
100003 3 udp 2049 nfs
100003 4 udp 2049 nfs
100227 3 udp 2049 nfs_acl
100021 1 udp 35373 nlockmgr
100021 3 udp 35373 nlockmgr
100021 4 udp 35373 nlockmgr
100021 1 tcp 22351 nlockmgr
100021 3 tcp 22351 nlockmgr
100021 4 tcp 22351 nlockmgr

The nfs server needs these 5 services open: mountd, nfs, nlockmgr, portmapper, and rquotad.

nfs and portmapper use fixed ports (2049 for nfs, 111 for portmapper). The other three use random ports.

Add all of these ports to the firewall:

firewall-cmd --permanent --add-port=111/tcp
firewall-cmd --permanent --add-port=111/udp
firewall-cmd --permanent --add-port=2049/tcp
firewall-cmd --permanent --add-port=2049/udp
firewall-cmd --permanent --add-port=59758/udp
firewall-cmd --permanent --add-port=33494/tcp
firewall-cmd --permanent --add-port=20048/tcp
firewall-cmd --permanent --add-port=20048/udp
firewall-cmd --permanent --add-port=22351/tcp
firewall-cmd --permanent --add-port=35373/udp
firewall-cmd --reload

The random ports differ on every server, and adding them by hand is painful, so this one-liner adds them all automatically:

for i in $(rpcinfo -p | awk 'NR>1 {print $4}' | sort -u); do firewall-cmd --permanent --add-port=$i/tcp; firewall-cmd --permanent --add-port=$i/udp; done
firewall-cmd --reload
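Alternatively, the floating ports can be pinned so the firewall rules survive NFS restarts. On CentOS 7 the stock /etc/sysconfig/nfs exposes variables for this (my suggestion, not something the original setup did; the port numbers below just reuse the ones observed above, any free ports work):

```
MOUNTD_PORT=20048
STATD_PORT=59758
LOCKD_TCPPORT=22351
LOCKD_UDPPORT=35373
```

Restart rpcbind and nfs-server afterwards for the pinned ports to take effect.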

Then mount the NFS export on the clients and verify it:

mkdir -p /data/nfsstorage
mount -t nfs 10.xx.xx.194:/data/nfsstorage /data/nfsstorage
showmount -e 10.xx.xx.194

Port kubesphere seems to need (a guess, found while troubleshooting; still needs verification):

firewall-cmd --permanent --add-port=9115/tcp

Ports calico needs

Per the official docs, the following ports must be opened:

System requirements | Calico Documentation (tigera.io)

![image-20231024092357785](D:\github\docs\云原生\k8s\Sealos 4.0.0.assets\image-20231024092357785.png)

firewall-cmd --permanent --add-port=179/tcp
firewall-cmd --permanent --add-port=4789/udp
firewall-cmd --permanent --add-port=5473/tcp
firewall-cmd --permanent --add-port=51820/udp
firewall-cmd --permanent --add-port=51821/udp
firewall-cmd --permanent --add-port=2379/tcp

But that list doesn't seem complete: scanning with nmap -v -A from another machine, I saw that port 9999 is also in use.

firewall-cmd --permanent --add-port=9999/tcp

Problems

It turned out port 5443 is also needed:

firewall-cmd --permanent --add-port=5443/tcp

Ports kubesphere needs

The rest were already opened above; add these few:

firewall-cmd --permanent --add-port=9090/tcp
firewall-cmd --permanent --add-port=9099-9100/tcp
firewall-cmd --permanent --add-port=8443/tcp

Still not working. For now the only option is to keep the firewall off.

kubesphere default password: P@88w0rd

Changed it to xxxPaasNo1

Enable plugins

clusterconfiguration

![image-20220817161651571](D:\github\docs\paas平台建设\云原生\k8s\Sealos 4.0.0.assets\image-20220817161651571.png)

nerdctl

wget http://1x.xx.66.1/4zLKJ/nerdctl-full-0.22.2-linux-amd64.tar.gz

mkdir nerdctl
tar -zxvf nerdctl-full-0.22.2-linux-amd64.tar.gz -C nerdctl
cp nerdctl/lib/systemd/system/*.service /etc/systemd/system/

systemctl enable buildkit containerd
systemctl start buildkit containerd

mkdir -p /usr/local/containerd/bin && tar -zxvf nerdctl-full-0.22.2-linux-amd64.tar.gz nerdctl && mv nerdctl /usr/local/containerd/bin

wget http://transfer.paas.xxx.net/1usiGhY/buildkit-v0.10.3.linux-amd64.tar.gz

tar -zxvf buildkit-v0.10.3.linux-amd64.tar.gz -C /usr/local/containerd/
ln -s /usr/local/containerd/bin/buildkitd /usr/local/bin/buildkitd
ln -s /usr/local/containerd/bin/buildctl /usr/local/bin/buildctl

Create the systemd unit files. /etc/systemd/system/buildkit.socket:

[Unit]
Description=BuildKit
Documentation=https://github.com/moby/buildkit

[Socket]
ListenStream=%t/buildkit/buildkitd.sock

[Install]
WantedBy=sockets.target

/etc/systemd/system/buildkit.service

[Unit]
Description=BuildKit
Requires=buildkit.socket
After=buildkit.socket
Documentation=https://github.com/moby/buildkit

[Service]
ExecStart=/usr/local/bin/buildkitd --oci-worker=false --containerd-worker=true

[Install]
WantedBy=multi-user.target

Start buildkitd

systemctl daemon-reload
systemctl enable buildkit
systemctl start buildkit

Installation with external network access

The router needs to be configured for traffic forwarding.

yum install traceroute.x86_64 -y
yum install net-tools -y

Generate the config file

sealos gen labring/kubernetes:v1.22.11 \
labring/calico:v3.22.1 \
labring/openebs:v1.9.0 \
registry.cn-shenzhen.aliyuncs.com/cnmirror/kubesphere:v3.3.0 \
--masters 1x.xx.232.237,1x.xx.232.238,1x.xx.232.239 \
--nodes 1x.xx.232.233,1x.xx.232.234,1x.xx.232.236 > Clusterfile

Edit the config file

apiVersion: apps.sealos.io/v1beta1
kind: Cluster
metadata:
  creationTimestamp: null
  name: default
spec:
  hosts:
  - ips:
    - 1x.xxx.5.6:22
    roles:
    - master
    - amd64
  - ips:
    - 1x.xxx.5.7:22
    - 1x.xxx.5.8:22
    - 1x.xxx.5.9:22
    - 1x.xxx.5.10:22
    roles:
    - node
    - amd64
  image:
  - labring/kubernetes:v1.22.11
  - labring/calico:v3.22.1
  - labring/openebs:v1.9.0
  - registry.cn-shenzhen.aliyuncs.com/cnmirror/kubesphere:v3.3.0
  ssh:
    pk: /root/.ssh/id_rsa
    port: 22
    user: root
status: {}
---
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
networking:
  podSubnet: 1x.xxx.36.0/22
---
apiVersion: apps.sealos.io/v1beta1
kind: Config
metadata:
  name: calico
spec:
  path: manifests/calico.yaml
  data: |
    apiVersion: operator.tigera.io/v1
    kind: Installation
    metadata:
      name: default
    spec:
      # Configures Calico networking.
      calicoNetwork:
        bgp: Enabled
        # Note: The ipPools section cannot be modified post-install.
        ipPools:
        - blockSize: 26
          # Note: Must be the same as podCIDR
          cidr: 1x.xxx.36.0/22
          encapsulation: None
          natOutgoing: Enabled
          nodeSelector: all()
        nodeAddressAutodetectionV4:
          interface: "eth.*|en.*"
Subnet assignments: 1x.xxx.40.0/21 is for production dubbo (gateway 1x.xxx.47.254, netmask 255.255.248.0); 1x.xxx.36.0/22 is for test dubbo (gateway 1x.xxx.39.254).

Deploy

nohup sealos apply -f Clusterfile  >sealos.log 2>&1 &
  1. prometheus failed to start

    The logs showed: MountVolume.SetUp failed for volume "secret-kube-etcd-client-certs" : secret "kube-etcd-client-certs" not found

    Fix:

kubectl -n kubesphere-monitoring-system create secret generic kube-etcd-client-certs

ldap

1
kubectl -n kubesphere-system edit cc ks-installer

Configure the authentication section:

authentication:
  jwtSecret: ''
  maximumClockSkew: 10s
  multipleLogin: true
  oauthOptions:
    accessTokenMaxAge: 0
    accessTokenInactivityTimeout: 30m
    identityProviders:
    - name: ldap
      type: LDAPIdentityProvider
      mappingMethod: auto
      provider:
        host: 1x.xxx.7.142:389
        managerDN: cn=hopuser,o=services
        managerPassword: hopuser@2014
        userSearchBase: o=xxx
        loginAttribute: cn
        mailAttribute: mail

Watch the ks-installer logs:

kubectl logs -n kubesphere-system $(kubectl get pod -n kubesphere-system -l 'app in (ks-install, ks-installer)' -o jsonpath='{.items[0].metadata.name}') -f

After it finishes

Restart the ks-apiserver component

kubectl -n kubesphere-system rollout restart deploy/ks-apiserver

If it still doesn't work: openldap and controller-manager (not sure which of these actually fixed it)

kubectl -n kubesphere-system delete pod openldap-0
kubectl -n kubesphere-system rollout restart deploy/ks-controller-manager

The files are on 1x.xxx.2.103 in /root/ks; get onto it via 7.21.

http://transfer.paas.xxx.net/1QJ9T6W/config.yaml

Dockerfile

FROM harbor-test.xxx.net/kubesphere/ks-console:v3.3.0
COPY ./logo.svg /opt/kubesphere/console/dist/assets/
COPY ./login-logo.svg /opt/kubesphere/console/dist/assets/
COPY ./favicon.ico /opt/kubesphere/console/dist/assets/
COPY ./locale-zh.03f0fb248751b0a3bd2d.json /opt/kubesphere/console/dist/
COPY ./config.yaml /opt/kubesphere/console/server/
docker build -t kubesphere/ks-console:v3.3.rrs .
docker tag kubesphere/ks-console:v3.3.rrs harbor-test.xxx.net/kubesphere/ks-console:v3.3.rrs
docker push harbor-test.xxx.net/kubesphere/ks-console:v3.3.rrs

Not needed anymore:

nerdctl pull harbor-test.xxx.net/kubesphere/ks-console:v3.3.rrs

Then change the ks-console image

![image-20220818094957032](D:\github\docs\paas平台建设\云原生\k8s\Sealos 4.0.0.assets\image-20220818094957032-16607873976021.png)

![image-20220818095047981](D:\github\docs\paas平台建设\云原生\k8s\Sealos 4.0.0.assets\image-20220818095047981-16607874485993.png)

podman pull problem

* Error initializing source docker://registry.fedoraproject.org/java:openjdk-8-jre-alpine: Error reading manifest openjdk-8-jre-alpine in registry.fedoraproject.org/java: manifest unknown: manifest unknown

Cluster onboarding

Run the following on the host cluster to get the jwtSecret

kubectl -n kubesphere-system get cm kubesphere-config -o yaml | grep -v "apiVersion" | grep jwtSecret

vakPJMEze4ws8mHgCq2jlvpVD3piOBhp
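For scripting, a small variant (my own) of the command above that prints only the secret value:

```shell
# get_jwt_secret: extract the bare jwtSecret value from the kubesphere-config
# ConfigMap (assumes the value is unquoted, as in the output above).
get_jwt_secret() {
    kubectl -n kubesphere-system get cm kubesphere-config -o yaml \
      | awk '/jwtSecret:/ {print $2; exit}'
}
```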

Run on the cluster being onboarded:

kubectl edit cc ks-installer -n kubesphere-system

In the ks-installer YAML, enter the jwtSecret shown above:

authentication:
  jwtSecret: vakPJMEze4ws8mHgCq2jlvpVD3piOBhp

Scroll down and set clusterRole to member:

multicluster:
  clusterRole: member

Run this to check progress:

kubectl logs -n kubesphere-system $(kubectl get pod -n kubesphere-system -l 'app in (ks-install, ks-installer)' -o jsonpath='{.items[0].metadata.name}') -f

Add the cluster

Get the kubeconfig of the cluster being onboarded

cat $HOME/.kube/config

On the host cluster's web console, go to the cluster management page and click "Add Cluster".

Problem: token not found in cache

The real cause is that the token expired.

The auth config in ks-installer:

authentication:
  jwtSecret: vakPJMEze4ws8mHgCq2jlvpVD3piOBhp
  maximumClockSkew: 10s
  multipleLogin: true
  oauthOptions:
    accessTokenInactivityTimeout: 30m
    accessTokenMaxAge: 0
    identityProviders:
    - mappingMethod: auto
      name: ldap
      provider:
        host: 1x.xxx.7.142:389
        loginAttribute: cn
        mailAttribute: mail
        managerDN: cn=hopuser,o=services
        managerPassword: hopuser@2014
        userSearchBase: o=xxx
      type: LDAPIdentityProvider

Set accessTokenMaxAge to 0, meaning the token never expires.

It had previously been set to 10h, so once the member kubesphere had been up for more than 10 hours, onboarding failed with an expired token.