前言
整个k8s诸多组件几乎都是无状态的,所有的数据保存在etcd里,可以说etcd是整个k8s集群的数据库。可想而知,etcd的重要性。因而做好etcd数据备份工作至关重要。这篇主要讲一下我司的相关的实践。
备份etcd数据到s3
能做etcd的备份方案很多,但是大同小异,基本上都是利用了etcdctl命令来完成。
为什么选择s3那?
- 因为我们单位对于aws使用比较多,另外我们希望我们备份到一个高可用的存储中,而不是部署etcd的本机中。
- 此外,s3支持存储的生命周期的设置。设置一下,就可以aws帮助我们定时删除旧数据,保留新的备份数据。
具体方案
我们基本上用了etcd-backup这个项目,当然也fork了,做了稍微的更改,主要是更改了dockerfile。将etcdctl 修改为我们线上实际的版本。
修改之后的dockerfile如下:
FROM alpine:3.8
RUN apk add --no-cache curl
# Get etcdctl
ENV ETCD_VER=v3.2.24
RUN \
cd /tmp && \
curl -L https://storage.googleapis.com/etcd/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz | \
tar xz -C /usr/local/bin --strip-components=1
COPY ./etcd-backup /
ENTRYPOINT ["/etcd-backup"]
CMD ["-h"]
之后就是docker build之类了。
k8s部署方案
选择k8s中的cronjob比较合适,我的备份策略是每三小时备份一次。
cronjob.yaml:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
name: etcd-backup
namespace: kube-system
spec:
schedule: "0 */4 * * *"
successfulJobsHistoryLimit: 2
failedJobsHistoryLimit: 2
jobTemplate:
spec:
# Job timeout
activeDeadlineSeconds: 300
template:
spec:
tolerations:
# Tolerate master taint
- key: node-role.kubernetes.io/master
operator: Exists
effect: NoSchedule
# Container creates etcd backups.
# Run container in host network mode on G8s masters
# to be able to use 127.0.0.1 as etcd address.
# For etcd v2 backups container should have access
# to etcd data directory. To achive that,
# mount /var/lib/etcd3 as a volume.
nodeSelector:
node-role.kubernetes.io/master: ""
containers:
- name: etcd-backup
image: iyacontrol/etcd-backup:0.1
args:
# backup guest clusters only on production instalations
# testing installation can have many broken guest clusters
- -prefix=k8s-prod-1
- -etcd-v2-datadir=/var/lib/etcd
- -etcd-v3-endpoints=https://172.xx.xx.221:2379,https://172.xx.xx.83:2379,https://172.xx.xx.246:2379
- -etcd-v3-cacert=/certs/ca.crt
- -etcd-v3-cert=/certs/server.crt
- -etcd-v3-key=/certs/server.key
- -aws-s3-bucket=mybucket
- -aws-s3-region=us-east-1
volumeMounts:
- mountPath: /var/lib/etcd
name: etcd-datadir
- mountPath: /certs
name: etcd-certs
env:
- name: ETCDBACKUP_AWS_ACCESS_KEY
valueFrom:
secretKeyRef:
name: etcd-backup
key: ETCDBACKUP_AWS_ACCESS_KEY
- name: ETCDBACKUP_AWS_SECRET_KEY
valueFrom:
secretKeyRef:
name: etcd-backup
key: ETCDBACKUP_AWS_SECRET_KEY
- name: ETCDBACKUP_PASSPHRASE
valueFrom:
secretKeyRef:
name: etcd-backup
key: ETCDBACKUP_PASSPHRASE
volumes:
- name: etcd-datadir
hostPath:
path: /var/lib/etcd
- name: etcd-certs
hostPath:
path: /etc/kubernetes/pki/etcd/
# Do not restart pod, job takes care on restarting failed pod.
restartPolicy: Never
hostNetwork: true
注意:容忍 和 nodeselector配合,让pod调度到master节点上。
然后secret.yaml:
apiVersion: v1
kind: Secret
metadata:
name: etcd-backup
namespace: kube-system
type: Opaque
data:
ETCDBACKUP_AWS_ACCESS_KEY: QUtJTI0TktCT0xQRlEK
ETCDBACKUP_AWS_SECRET_KEY: aXJ6eThjQnM2MVRaSkdGMGxDeHhoeFZNUDU4ZGRNbgo=
ETCDBACKUP_PASSPHRASE: ""
总结
之前我们尝试过,etcd-operator来完成backup。实际使用过程中,发现并不好,概念很多,组件复杂,代码很多写法太死。
最后选择etcd-backup。主要是因为简单,less is more。看源码,用golang编写,扩展自己的一些需求,也比较简单。