Introduction
Kubespray is a tool for installing Kubernetes clusters. Compared with using kubeadm directly, it is more streamlined: it integrates kubeadm and Ansible internally, and uses ansible-playbook to define the tasks that prepare the OS and deploy the k8s cluster.
- Official site: https://kubespray.io
- GitHub: https://github.com/kubernetes-sigs/kubespray
- Ansible docs: https://kubespray.io/#/docs/ansible
- Kubespray parameter tuning for large-scale K8s deployments
- Kubespray offline deployment
Kubespray removed kubeadm_enabled in v2.9 (the switch that controlled whether kubeadm deploys kube-apiserver and the other control-plane components), which means kube-apiserver and friends can no longer be deployed as plain binaries. etcd still defaults to a binary deployment, with kubeadm-managed deployment as an option.
From v2.9 onwards, Kubespray uses kubeadm to deploy kube-apiserver, kube-scheduler, and kube-controller-manager as static pods.
Preparing the deployment environment
Dedicated kubespray deployment machine: 192.168.10.220
Option 1: Install on the host
# Ansible requires Python 3.8 or later
yum -y install zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel libffi-devel gcc make
wget https://www.python.org/ftp/python/3.9.15/Python-3.9.15.tar.xz
tar xf Python-3.9.15.tar.xz
cd Python-3.9.15/
./configure --enable-optimizations --prefix=/usr/local/python39
make install
# ansible gets installed under /usr/local/python39/bin/
cat << \EOF > /etc/profile.d/python39.sh
export PATH=$PATH:/usr/local/python39/bin
EOF
ln -sv /usr/local/python39/bin/python3 /usr/local/bin/
ln -sv /usr/local/python39/bin/pip3 /usr/local/bin/
# Upgrade pip and setuptools
python3 -m pip install --upgrade pip
pip3 install --upgrade setuptools
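A quick sanity check that the interpreter and pip on PATH are the freshly built ones (assuming the symlinks above):
python3 -V   # should report Python 3.9.15
pip3 -V      # should point into /usr/local/python39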
Option 2: Run from Docker
docker pull quay.io/kubespray/kubespray:v2.20.0
docker run --rm -it --mount type=bind,source="$(pwd)"/inventory/sample,dst=/inventory \
--mount type=bind,source="${HOME}"/.ssh/id_rsa,dst=/root/.ssh/id_rsa \
quay.io/kubespray/kubespray:v2.20.0 bash
# Run the kubespray playbooks inside the container:
ansible-playbook -i /inventory/inventory.ini --private-key /root/.ssh/id_rsa cluster.yml
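The example above mounts the stock inventory/sample; once you have customized your own inventory (created in the next section), bind-mount that directory instead. A sketch, assuming the inventory/mycluster layout used later in this guide:
docker run --rm -it --mount type=bind,source="$(pwd)"/inventory/mycluster,dst=/inventory \
  --mount type=bind,source="${HOME}"/.ssh/id_rsa,dst=/root/.ssh/id_rsa \
  quay.io/kubespray/kubespray:v2.20.0 bash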
Kubespray configuration
Download and dependencies
https://github.com/kubernetes-sigs/kubespray/releases/tag/v2.20.0
Kubespray v2.20.0 deploys Kubernetes 1.24.x.
Note: check out a tagged release; don't build from master directly, as it may contain unknown bugs.
git clone https://github.com/kubernetes-sigs/kubespray.git -b v2.20.0
cd kubespray
pip3 install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/
# or, without a mirror: pip3 install -r requirements.txt
Customizing the cluster
# Copy the sample inventory
cp -rfp inventory/sample inventory/mycluster
Configuration files to edit:
- inventory/mycluster/group_vars/all/*.yml
- inventory/mycluster/group_vars/k8s_cluster/*.yml
Cluster networking
vim inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml
# Choose the network plugin; cilium, calico, weave, and flannel are supported
kube_network_plugin: cilium
# Service CIDR
kube_service_addresses: 10.233.0.0/18
# Pod CIDR
kube_pods_subnet: 10.233.64.0/18
Container runtime configuration
Related configuration files:
- inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml
- inventory/mycluster/group_vars/all/containerd.yml
- inventory/mycluster/group_vars/all/cri-o.yml
- inventory/mycluster/group_vars/all/docker.yml
cat inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml
# docker, crio, and containerd are supported; containerd is recommended.
container_manager: containerd
# Whether to enable Kata Containers
kata_containers_enabled: false
Changing the container data directory
vim ./inventory/mycluster/group_vars/all/containerd.yml
containerd_storage_dir: "/data/containerd"
Configuring containerd registry mirrors
vim ./inventory/mycluster/group_vars/all/containerd.yml
containerd_registries:
"docker.io":
- "http://hub-mirror.c.163.com"
- "https://mirror.aliyuncs.com"
On CentOS 7 you must enable containerd_snapshotter: "native", otherwise kubelet errors out and fails to start:
sed -i 's@# containerd_snapshotter: "native"@containerd_snapshotter: "native"@g' inventory/mycluster/group_vars/all/containerd.yml
# After the change
cat inventory/mycluster/group_vars/all/containerd.yml
containerd_snapshotter: "native"
Changing the etcd data directory
vim inventory/mycluster/group_vars/all/etcd.yml
etcd_data_dir: /data/etcd
Cluster certificates (one-year validity by default)
sed -i 's@auto_renew_certificates: false@auto_renew_certificates: true@g' inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml
# After the change
cat inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml
# Whether to auto-renew certificates; recommended.
auto_renew_certificates: true
Enabling log output for troubleshooting
vim inventory/mycluster/group_vars/all/all.yml
unsafe_show_logs: true
Using an external load balancer
By default kubespray does not set up HA load balancing for kube-apiserver HTTPS.
Here an external HAProxy instance load-balances kube-apiserver HTTPS.
Example HAProxy configuration:
listen kubernetes-apiserver-https
  bind 0.0.0.0:8443   # matches loadbalancer_apiserver.port below
  mode tcp
  option tcplog
  option log-health-checks
  balance roundrobin
  timeout client 3h
  timeout server 3h
  server k8s-master01 192.168.10.221:6443 check check-ssl verify none inter 10000
  server k8s-master02 192.168.10.222:6443 check check-ssl verify none inter 10000
  server k8s-master03 192.168.10.223:6443 check check-ssl verify none inter 10000
Defining loadbalancer_apiserver automatically disables loadbalancer_apiserver_localhost.
vim ./inventory/mycluster/group_vars/all/all.yml
apiserver_loadbalancer_domain_name: "apiserver.sundayhk.com"
loadbalancer_apiserver:
address: 192.168.10.220
port: 8443
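With HAProxy up on 192.168.10.220, a quick check that the balanced endpoint answers once the cluster is deployed (-k skips certificate verification; any TLS response from a backend apiserver shows the chain works):
curl -k https://192.168.10.220:8443/healthz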
Configuring the host inventory
vim inventory/mycluster/inventory.ini
[all]
master1 ansible_host=192.168.10.221 # ip=10.3.0.1 etcd_member_name=etcd1
master2 ansible_host=192.168.10.222 # ip=10.3.0.2 etcd_member_name=etcd2
master3 ansible_host=192.168.10.223 # ip=10.3.0.3 etcd_member_name=etcd3
node1 ansible_host=192.168.10.224 # ip=10.3.0.4 etcd_member_name=etcd4
node2 ansible_host=192.168.10.225 # ip=10.3.0.5 etcd_member_name=etcd5
node3 ansible_host=192.168.10.226 # ip=10.3.0.6 etcd_member_name=etcd6
[kube_control_plane]
master1
master2
master3
[etcd]
master1
master2
master3
[kube_node]
master1
master2
master3
node1
node2
node3
[calico_rr]
[k8s_cluster:children]
kube_control_plane
kube_node
calico_rr
Checking the cluster deployment configuration
ansible-inventory -i inventory/mycluster/inventory.ini --list
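To confirm a single group resolved as expected, you can also list just its hosts:
ansible -i inventory/mycluster/inventory.ini kube_control_plane --list-hosts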
Installing from within mainland China
Inside mainland China, installs tend to fail because the GFW blocks the default download sources.
- Option 1: use the official mirror docs
- Option 2: use kubespray's offline-deployment configuration
- Option 3: set http_proxy/https_proxy in inventory/mycluster/group_vars/all/all.yml
Option 2 is used here, with the DaoCloud mirrors.
Edit offline.yml
# Back up the original
cp inventory/mycluster/group_vars/all/offline.yml{,.bak}
# Set files_repo
sed -i 's@^# files_repo: .*@files_repo: "https://files.m.daocloud.io"@g' inventory/mycluster/group_vars/all/offline.yml
# Set the image repos
sed -i 's@^# kube_image_repo: .*@kube_image_repo: "k8s.m.daocloud.io"@g' inventory/mycluster/group_vars/all/offline.yml
sed -i 's@^# gcr_image_repo: .*@gcr_image_repo: "gcr.m.daocloud.io"@g' inventory/mycluster/group_vars/all/offline.yml
sed -i 's@^# github_image_repo: .*@github_image_repo: "ghcr.m.daocloud.io"@g' inventory/mycluster/group_vars/all/offline.yml
sed -i 's@^# docker_image_repo: .*@docker_image_repo: "docker.m.daocloud.io"@g' inventory/mycluster/group_vars/all/offline.yml
sed -i 's@^# quay_image_repo: .*@quay_image_repo: "quay.m.daocloud.io"@g' inventory/mycluster/group_vars/all/offline.yml
# Uncomment to enable the URLs that use files_repo and registry_host
sed -i -E '/# .*\{\{ files_repo/s/^# //g' inventory/mycluster/group_vars/all/offline.yml
sed -i -E '/# .*\{\{ registry_host/s/^# //g' inventory/mycluster/group_vars/all/offline.yml
The result after these edits; you can also copy the block below directly into offline.yml:
cat inventory/mycluster/group_vars/all/offline.yml
files_repo: "https://files.m.daocloud.io"
## Container Registry overrides
kube_image_repo: "k8s.m.daocloud.io"
gcr_image_repo: "gcr.m.daocloud.io"
github_image_repo: "ghcr.m.daocloud.io"
docker_image_repo: "docker.m.daocloud.io"
quay_image_repo: "quay.m.daocloud.io"
## Kubernetes components
kubeadm_download_url: "{{ files_repo }}/storage.googleapis.com/kubernetes-release/release/{{ kubeadm_version }}/bin/linux/{{ image_arch }}/kubeadm"
kubectl_download_url: "{{ files_repo }}/storage.googleapis.com/kubernetes-release/release/{{ kube_version }}/bin/linux/{{ image_arch }}/kubectl"
kubelet_download_url: "{{ files_repo }}/storage.googleapis.com/kubernetes-release/release/{{ kube_version }}/bin/linux/{{ image_arch }}/kubelet"
## CNI Plugins
cni_download_url: "{{ files_repo }}/github.com/containernetworking/plugins/releases/download/{{ cni_version }}/cni-plugins-linux-{{ image_arch }}-{{ cni_version }}.tgz"
## cri-tools
crictl_download_url: "{{ files_repo }}/github.com/kubernetes-sigs/cri-tools/releases/download/{{ crictl_version }}/crictl-{{ crictl_version }}-{{ ansible_system | lower }}-{{ image_arch }}.tar.gz"
## [Optional] etcd: only if you **DON'T** use etcd_deployment=host
etcd_download_url: "{{ files_repo }}/github.com/etcd-io/etcd/releases/download/{{ etcd_version }}/etcd-{{ etcd_version }}-linux-{{ image_arch }}.tar.gz"
# [Optional] Calico: If using Calico network plugin
calicoctl_download_url: "{{ files_repo }}/github.com/projectcalico/calico/releases/download/{{ calico_ctl_version }}/calicoctl-linux-{{ image_arch }}"
calicoctl_alternate_download_url: "{{ files_repo }}/github.com/projectcalico/calicoctl/releases/download/{{ calico_ctl_version }}/calicoctl-linux-{{ image_arch }}"
# [Optional] Calico with kdd: If using Calico network plugin with kdd datastore
calico_crds_download_url: "{{ files_repo }}/github.com/projectcalico/calico/archive/{{ calico_version }}.tar.gz"
# [Optional] Cilium: If using Cilium network plugin
ciliumcli_download_url: "{{ files_repo }}/github.com/cilium/cilium-cli/releases/download/{{ cilium_cli_version }}/cilium-linux-{{ image_arch }}.tar.gz"
# [Optional] Flannel: If using the Flannel network plugin
flannel_cni_download_url: "{{ files_repo }}/kubernetes/flannel/{{ flannel_cni_version }}/flannel-{{ image_arch }}"
# [Optional] helm: only if you set helm_enabled: true
helm_download_url: "{{ files_repo }}/get.helm.sh/helm-{{ helm_version }}-linux-{{ image_arch }}.tar.gz"
# [Optional] crun: only if you set crun_enabled: true
crun_download_url: "{{ files_repo }}/github.com/containers/crun/releases/download/{{ crun_version }}/crun-{{ crun_version }}-linux-{{ image_arch }}"
# [Optional] kata: only if you set kata_containers_enabled: true
kata_containers_download_url: "{{ files_repo }}/github.com/kata-containers/kata-containers/releases/download/{{ kata_containers_version }}/kata-static-{{ kata_containers_version }}-{{ ansible_architecture }}.tar.xz"
# [Optional] cri-dockerd: only if you set container_manager: docker
cri_dockerd_download_url: "{{ files_repo }}/github.com/Mirantis/cri-dockerd/releases/download/v{{ cri_dockerd_version }}/cri-dockerd-{{ cri_dockerd_version }}.{{ image_arch }}.tgz"
# [Optional] cri-o: only if you set container_manager: crio
# crio_download_base: "download.opensuse.org/repositories/devel:kubic:libcontainers:stable"
# crio_download_crio: "http://{{ crio_download_base }}:/cri-o:/"
# [Optional] runc,containerd: only if you set container_runtime: containerd
runc_download_url: "{{ files_repo }}/github.com/opencontainers/runc/releases/download/{{ runc_version }}/runc.{{ image_arch }}"
containerd_download_url: "{{ files_repo }}/github.com/containerd/containerd/releases/download/v{{ containerd_version }}/containerd-{{ containerd_version }}-linux-{{ image_arch }}.tar.gz"
nerdctl_download_url: "{{ files_repo }}/github.com/containerd/nerdctl/releases/download/v{{ nerdctl_version }}/nerdctl-{{ nerdctl_version }}-{{ ansible_system | lower }}-{{ image_arch }}.tar.gz"
Deploying the cluster
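Before the full run, it is worth confirming that Ansible can reach every host with the deploy credentials (a minimal connectivity check using Ansible's ping module; the user and key match the command below):
ansible -i inventory/mycluster/inventory.ini all -m ping \
  --private-key=id_rsa --user=ubuntu -b
Then run the full deployment: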
ansible-playbook -i inventory/mycluster/inventory.ini \
--private-key=id_rsa --user=ubuntu -b -v cluster.yml
Deployment complete
[root@master1 ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
master1 Ready control-plane 64m v1.24.6
master2 Ready control-plane 64m v1.24.6
master3 Ready control-plane 64m v1.24.6
node1 Ready <none> 62m v1.24.6
node2 Ready <none> 62m v1.24.6
node3 Ready <none> 62m v1.24.6
Fetching the kubeconfig
After deployment, the kubeconfig is available at /root/.kube/config on the master nodes. Here Ansible's fetch module is used to copy it down:
# Fetch from master1
ansible -i inventory/mycluster/inventory.ini master1 \
-m fetch -a 'src=/root/.kube/config dest=kubeconfig flat=yes' \
-b --user=ubuntu --private-key id_rsa
master1 | CHANGED => {
"changed": true,
"checksum": "bbe7e6462702d1bd4a0414a3e97053fa63eaab62",
"dest": "/root/kubespray/kubeconfig",
"md5sum": "859683a1484a802d4859db115bb42a16",
"remote_checksum": "bbe7e6462702d1bd4a0414a3e97053fa63eaab62",
"remote_md5sum": null
}
$ ls -l kubeconfig
-rw------- 1 root root 5645 Nov 12 22:11 kubeconfig
Once you have the kubeconfig, change https://127.0.0.1:6443 to the address:port of the kube-apiserver load balancer, or to one of the masters.
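For example, to point the fetched kubeconfig at the external load balancer configured earlier (this assumes apiserver.sundayhk.com resolves to 192.168.10.220, e.g. via /etc/hosts):
sed -i 's@https://127.0.0.1:6443@https://apiserver.sundayhk.com:8443@' kubeconfig
kubectl --kubeconfig=kubeconfig get node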
Adding nodes
https://kubespray.io/#/docs/nodes
To add nodes, put their internal IPs into the existing inventory file, then run ansible-playbook again, this time with scale.yml instead of cluster.yml:
ansible-playbook -i inventory/mycluster/inventory.ini \
--private-key=id_rsa --user=ubuntu -b \
scale.yml --limit=NEW_NODE_NAME
You can use --limit=NODE_NAME to restrict Kubespray so it does not touch the other nodes in the cluster. Without --limit, the playbook first runs facts.yml to refresh the fact cache for every node.
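For example, to add a hypothetical worker node4 (the IP is illustrative), append it to the inventory and limit the run to it:
[all]
node4 ansible_host=192.168.10.227

[kube_node]
node4

ansible-playbook -i inventory/mycluster/inventory.ini \
  --private-key=id_rsa --user=ubuntu -b \
  scale.yml --limit=node4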
Removing nodes
If a node is no longer needed, it can be removed from the cluster. The usual steps are:
- 1. kubectl cordon NODE and drain it, so the workloads on it move to other nodes (see the docs on safely maintaining or decommissioning a node).
- 2. Stop the k8s components on the node (kubelet, kube-proxy, etc.).
- 3. kubectl delete node NODE to remove it from the cluster.
- 4. If the node is a virtual machine and is no longer needed, simply destroy it.
The first three steps can also be done in one shot with the remove-node.yml playbook that kubespray provides:
ansible-playbook \
-i inventory/mycluster/inventory.ini \
--private-key=id_rsa --user=ubuntu -b \
-e "node=node2,node3" \
remove-node.yml
The -e takes the list of node names to remove. If the nodes you want to delete are offline, you should add reset_nodes=false and allow_ungraceful_removal=true to your extra variables.
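For instance, if node3 is already offline, the removal run would look like this:
ansible-playbook -i inventory/mycluster/inventory.ini \
  --private-key=id_rsa --user=ubuntu -b \
  -e "node=node3 reset_nodes=false allow_ungraceful_removal=true" \
  remove-node.yml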
Upgrading
https://kubespray.io/#/docs/upgrades
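A minimal sketch of a graceful upgrade run, per the linked doc; the target kube_version is illustrative and must be one supported by your kubespray release:
ansible-playbook -i inventory/mycluster/inventory.ini \
  --private-key=id_rsa --user=ubuntu -b \
  -e kube_version=v1.24.6 \
  upgrade-cluster.yml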
Troubleshooting
FAILED - RETRYING: [master1]: download_file | Validate mirrors (4 retries left).
failed: [master1] (item=None) => {"attempts": 4, "censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": false}
fatal: [master1 -> {{ download_delegate if download_force_cache else inventory_hostname }}]: FAILED! => {"censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": false}
vim inventory/mycluster/group_vars/all/all.yml
# Enable log output to see the underlying error
unsafe_show_logs: true
https://github.com/containerd/containerd/issues/4581
"Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"
# Enable containerd_snapshotter: "native"
sed -i 's@# containerd_snapshotter: "native"@containerd_snapshotter: "native"@g' inventory/mycluster/group_vars/all/containerd.yml
cat inventory/mycluster/group_vars/all/containerd.yml
containerd_snapshotter: "native"
Re-run kubespray.
ModuleNotFoundError: No module named '_ctypes'
yum install -y libffi-devel, then rebuild Python 3.9.
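A sketch of the rebuild, reusing the build steps from the environment-preparation section above:
yum install -y libffi-devel
cd Python-3.9.15/
./configure --enable-optimizations --prefix=/usr/local/python39
make install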
References:
- https://github.com/kubernetes-sigs/kubespray/blob/master/docs/mirror.md
- https://kubernetes.io/zh-cn/docs/setup/production-environment/tools/kubespray/