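Our work runs on a k8s platform, and deploying each component of the Prometheus monitoring stack separately is tedious, so I used kube-prometheus, the automated deployment project developed by CoreOS. The project does not configure persistent data storage out of the box, and since documentation was scarce I had to read through the official project and work things out bit by bit. I wrote these notes along the way in the hope that they help others using the project.
Prerequisite: a k8s cluster with Ceph storage deployed.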
In Path/to/kube-prometheus/manifests/prometheus-prometheus.yaml, make the following changes:
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  labels:
    prometheus: k8s
  name: k8s
  namespace: monitoring
spec:
  alerting:
    alertmanagers:
    - name: alertmanager-main
      namespace: monitoring
      port: web
  #-----storage-----
  storage: # this section configures persistent storage
    volumeClaimTemplate:
      spec:
        storageClassName: csi-cephfs
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 100Gi
  #-----------------
  baseImage: quay.azk8s.cn/prometheus/prometheus
  nodeSelector:
    kubernetes.io/os: linux
  podMonitorSelector: {}
  replicas: 2
  resources:
    requests:
      memory: 400Mi
  ruleSelector:
    matchLabels:
      prometheus: k8s
      role: alert-rules
  securityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  serviceAccountName: prometheus-k8s
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  version: v2.11.0
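After editing the YAML file, run kubectl apply -f /path/to/manifests/prometheus-prometheus.yaml. Two PVs of the specified size are created automatically, one for each Prometheus replica (prometheus-k8s-0 and prometheus-k8s-1). I used CephFS here, but another storage backend can be substituted. Once the PVs have been created they cannot be freely modified, so settle on suitable parameters, such as access mode and capacity, beforehand. The next time prometheus-prometheus.yaml is applied, data continues to be written to the PVs that already exist.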
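When checking the result, it helps to know what the generated claims look like. The following is a rough sketch of the PVC expected for the first replica, assuming the operator's default naming scheme (prometheus-<name>-db-<pod name>). The metadata on a real cluster may differ, and the object is generated by the operator rather than applied by hand.

```yaml
# Sketch only: created by prometheus-operator from volumeClaimTemplate,
# not meant to be applied manually.
# Assumption: the name follows the operator's default volume naming.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-k8s-db-prometheus-k8s-0
  namespace: monitoring
spec:
  storageClassName: csi-cephfs   # taken from the volumeClaimTemplate above
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
```

A second claim with the suffix -1 would exist for the other replica.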
In Path/to/kube-prometheus/manifests/0prometheus-operator-deployment, make the following changes:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/component: controller
    app.kubernetes.io/name: prometheus-operator
    app.kubernetes.io/version: v0.33.0
  name: prometheus-operator
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/component: controller
      app.kubernetes.io/name: prometheus-operator
  template:
    metadata:
      labels:
        app.kubernetes.io/component: controller
        app.kubernetes.io/name: prometheus-operator
        app.kubernetes.io/version: v0.33.0
    spec:
      containers:
      - args:
        - --kubelet-service=kube-system/kubelet
        - --logtostderr=true
        - --config-reloader-image=quay.azk8s.cn/coreos/configmap-reload:v0.0.1
        - --prometheus-config-reloader=quay.azk8s.cn/coreos/prometheus-config-reloader:v0.33.0
        - storage.tsdb.retention.time=30d # add the time parameter here
        image: quay.azk8s.cn/coreos/prometheus-operator:v0.33.0
        name: prometheus-operator
        ports:
        ...........
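Note: the argument is written as storage.tsdb.retention.time=30d. I first used --storage.tsdb.retention.time=30d, and after applying, the operator reported "flag provided but not defined". That error is expected, because the flag belongs to Prometheus itself and the operator binary does not define it. Be aware that without the leading dashes the value is most likely treated as a plain positional argument and ignored, rather than passed on to Prometheus. In prometheus-operator, retention is normally set through the retention field in the spec of the Prometheus resource (prometheus-prometheus.yaml).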
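As an alternative to the operator argument, retention can be declared on the Prometheus resource. A minimal sketch, assuming the operator version in use supports the retention field of the monitoring.coreos.com/v1 Prometheus spec. Only the relevant part of prometheus-prometheus.yaml is shown, and 30d mirrors the value used above.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s
  namespace: monitoring
spec:
  # Assumption: the operator translates this field into the matching
  # storage.tsdb retention flag on the Prometheus container.
  retention: 30d
  # ... remaining spec fields (storage, replicas, and so on) stay unchanged
```

After applying, the arguments of the prometheus container in the prometheus-k8s StatefulSet should show the retention value, which is a quick way to confirm the setting took effect.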
For Grafana, first create a PVC:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-pvc
  namespace: monitoring
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: csi-cephfs
Then, in grafana-deployment.yaml, change the storage from emptyDir to the PVC:
#- emptyDir: {}
#  name: grafana-storage
- name: grafana-storage
  persistentVolumeClaim:
    claimName: grafana-pvc
- name: grafana-datasources
  secret:
    secretName: grafana-datasources
- configMap:
    name: grafana-dashboards
  name: grafana-dashboards
- configMap:
    name: grafana-dashboard-apiserver
  name: grafana-dashboard-apiserver
- configMap:
    name: grafana-dashboard-controller-manager
........
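That's all for now. Introductory material is covered in plenty of other blog posts, so I won't repeat it here and only include the deployment files.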