
ROOK-03 Rook Ceph Cluster Usage and Management

闾丘诚
2023-12-01

I. Consuming RBD through a StorageClass

1. Adjust the StorageClass parameters
rook/cluster/examples/kubernetes/ceph/csi/rbd/storageclass.yaml

apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool  # RBD pool name; change to suit your needs
  namespace: rook-ceph
spec:
  failureDomain: host  # failure isolation is at the host level by default
  replicated:  # 3 replicas to guarantee data availability
    size: 3
    requireSafeReplicaSize: true
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  pool: replicapool  # the RBD pool name must match the CephBlockPool name above
  imageFormat: "2"
  imageFeatures: layering
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/fstype: ext4  # xfs can also be used
allowVolumeExpansion: true
reclaimPolicy: Delete

2. Deploy the StorageClass

cd ~/rook/cluster/examples/kubernetes/ceph/csi/rbd
kubectl apply -f storageclass.yaml
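
A quick check (not part of the original steps) that the pool and StorageClass were created, using the default names from the manifest above:

kubectl get storageclass rook-ceph-block
kubectl -n rook-ceph get cephblockpool replicapool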

3. Deploy WordPress to verify the StorageClass

cd ~/rook/cluster/examples/kubernetes
kubectl apply -f mysql.yaml -f wordpress.yaml
kubectl get pvc
kubectl get pv

kubectl -n rook-ceph exec -it rook-ceph-tools-6dd46f6946-p66xl -- bash
ceph osd lspools
rbd -p replicapool ls

II. Deploying and Using CephFS

1. Modify the CephFS deployment parameters; resource limits, scheduling and other settings are adjusted in the same file
rook/cluster/examples/kubernetes/ceph/filesystem.yaml

  metadataServer:
    activeCount: 2  # 2 active + 2 standby MDS daemons for service high availability
    activeStandby: true

2. Configure node affinity for the MDS service; MDS is CPU intensive, so dedicated nodes are recommended to keep the service on reserved machines

kubectl label node rook01 ceph-mds=enable
kubectl label node rook02 ceph-mds=enable
kubectl label node rook03 ceph-mds=enable
kubectl label node rook04 ceph-mds=enable
    placement:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: ceph-mds
              operator: In
              values:
              - enable
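
A quick check (not in the original steps) that the labels landed on the intended nodes:

kubectl get nodes -l ceph-mds=enable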

3. Deploy CephFS

kubectl apply -f rook/cluster/examples/kubernetes/ceph/filesystem.yaml
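
After the manifest is applied, the MDS pods and filesystem health can be checked roughly as follows; the filesystem name myfs is the default from the example manifest, and the pod label follows Rook's usual naming:

kubectl -n rook-ceph get pods -l app=rook-ceph-mds
# inside the toolbox pod:
ceph fs status myfs
ceph -s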

4. Deploy a CephFS-backed StorageClass and consume it from Kubernetes

cd rook/cluster/examples/kubernetes/ceph/csi/cephfs
kubectl apply -f storageclass.yaml
kubectl apply -f kube-registry.yaml
kubectl -n kube-system get pvc|grep cephfs

5. Mounting CephFS from outside the cluster

# read the monitor endpoints and the admin key (assumes /etc/ceph/ceph.conf and /etc/ceph/keyring exist on this host)
mon_endpoints=$(grep mon_host /etc/ceph/ceph.conf | awk '{print $3}')
my_secret=$(grep key /etc/ceph/keyring | awk '{print $3}')

# Mount the filesystem (create the mountpoint first)
mkdir -p /tmp/registry
mount -t ceph -o mds_namespace=myfs,name=admin,secret=$my_secret $mon_endpoints:/ /tmp/registry
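
If the mount succeeds it shows up in the mount table; a quick check:

df -h /tmp/registry
mount | grep /tmp/registry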

6. CephFS operational checkpoints

  1. Check pod status: MDS runs as pods, so make sure the pods are healthy
  2. Check the Ceph cluster status (ceph -s)
  3. Check the CephFS filesystem (ceph fs ls)
  4. Check container logs, both the MDS logs and the CSI driver logs, when the service misbehaves; the provisioner pod contains several containers (example commands follow this list):
     csi-attacher: attaching volumes
     csi-snapshotter: snapshots
     csi-resizer: volume resizing
     csi-provisioner: volume provisioning
     csi-cephfsplugin: driver agent
     liveness-prometheus: liveness monitoring
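
The checks above map to commands roughly like the following; the provisioner deployment name follows Rook's usual CSI naming and may differ between versions:

kubectl -n rook-ceph get pods -l app=rook-ceph-mds
# inside the toolbox pod:
ceph -s
ceph fs ls
# logs of one provisioner container, e.g. csi-provisioner:
kubectl -n rook-ceph logs deploy/csi-cephfsplugin-provisioner -c csi-provisioner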

III. Ceph RGW Object Storage

1. Modify the RGW deployment parameters to deploy a highly available RGW cluster: rook/cluster/examples/kubernetes/ceph/object.yaml

    instances: 2
    placement:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: ceph-rgw
              operator: In
              values:
              - enabled
      podAntiAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - rook-ceph-rgw
              topologyKey: kubernetes.io/hostname

2. Configure node affinity for the RGW service; dedicated nodes are recommended to keep the service on reserved machines

kubectl label node rook01 ceph-rgw=enabled
kubectl label node rook02 ceph-rgw=enabled
kubectl label node rook03 ceph-rgw=enabled
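
With the parameters and labels in place, apply the object store manifest and check the RGW pods; the file name object.yaml and the store name my-store are the defaults from the Rook examples:

kubectl apply -f rook/cluster/examples/kubernetes/ceph/object.yaml
kubectl -n rook-ceph get pods -l app=rook-ceph-rgw
# inside the toolbox pod, the rgw daemons show up under services:
ceph -s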

3. Managing an external RGW cluster with Rook (optional)
rook/cluster/examples/kubernetes/ceph/object-external.yaml

4. Create a bucket (via a Kubernetes StorageClass)

# create the StorageClass
kubectl apply -f rook/cluster/examples/kubernetes/ceph/storageclass-bucket-delete.yaml

# create a bucket to verify
kubectl apply -f rook/cluster/examples/kubernetes/ceph/object-bucket-claim-delete.yaml
# list the ObjectBucketClaims
kubectl get objectbucketclaims.objectbucket.io
# list buckets (run inside the rook-ceph-tools pod)
radosgw-admin bucket list

5. Accessing the object store from inside the cluster
1) Get the access information

# get the endpoint
kubectl get cm ceph-delete-bucket -o jsonpath='{.data.BUCKET_HOST}'|awk '{print $1":80"}'

kubectl get cm ceph-delete-bucket -o jsonpath='{.data.BUCKET_HOST}'|awk '{print $1":80/%(bucket)"}'

# get the access key and secret key
kubectl get secrets ceph-delete-bucket -o jsonpath='{.data.AWS_ACCESS_KEY_ID}'|base64 -d
kubectl get secrets ceph-delete-bucket -o jsonpath='{.data.AWS_SECRET_ACCESS_KEY}'|base64 -d

2) Configure the client inside the container

kubectl -n rook-ceph exec -it rook-ceph-tools-6dd46f6946-j6qnp -- bash
yum -y install s3cmd

s3cmd --configure

  Access Key: 8014201KV9K0E6RFR7OW
  Secret Key: gYBzNuIRxXpSy9N24V4slKC7Lm6YBTNwsQdNucBv
  Default Region: US
  S3 Endpoint: rook-ceph-rgw-my-store.rook-ceph.svc:80
  DNS-style bucket+hostname:port template for accessing a bucket: rook-ceph-rgw-my-store.rook-ceph.svc:80/%(bucket)
  Encryption password: 
  Path to GPG program: /usr/bin/gpg
  Use HTTPS protocol: False
  HTTP Proxy server name: 
  HTTP Proxy server port: 0

3) Upload and download test

s3cmd ls
s3cmd put /etc/passwd* s3://ceph-bkt-84449e86-0fcb-4436-bd98-dbcc4e4bde0e
s3cmd ls s3://ceph-bkt-84449e86-0fcb-4436-bd98-dbcc4e4bde0e
s3cmd get s3://ceph-bkt-84449e86-0fcb-4436-bd98-dbcc4e4bde0e/passwd

6. Accessing the Rook object store from outside the cluster
1) Expose a NodePort or Ingress and look up the NodePort

kubectl apply -f rook/cluster/examples/kubernetes/ceph/rgw-external.yaml
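
The assigned NodePort can then be read from the external Service; the service name below follows the my-store naming used by rgw-external.yaml and may differ in your setup:

kubectl -n rook-ceph get svc rook-ceph-rgw-my-store-external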

2) Configure the client

yum -y install s3cmd
s3cmd --configure

New settings:
  Access Key: 8014201KV9K0E6RFR7OW
  Secret Key: gYBzNuIRxXpSy9N24V4slKC7Lm6YBTNwsQdNucBv
  Default Region: US
  S3 Endpoint: 192.168.86.36:32172
  DNS-style bucket+hostname:port template for accessing a bucket: 192.168.86.36:32172/%(bucket)
  Encryption password: 
  Path to GPG program: /usr/bin/gpg
  Use HTTPS protocol: False
  HTTP Proxy server name: 
  HTTP Proxy server port: 0

7. Create a dedicated user for access
1) Create the user

kubectl apply -f rook/cluster/examples/kubernetes/ceph/object-user.yaml

2) Get the user credentials

kubectl -n rook-ceph get secrets rook-ceph-object-user-my-store-my-user -o jsonpath='{.data.AccessKey}'|base64 -d
kubectl -n rook-ceph get secrets rook-ceph-object-user-my-store-my-user -o jsonpath='{.data.SecretKey}'|base64 -d
kubectl -n rook-ceph get secrets rook-ceph-object-user-my-store-my-user -o jsonpath='{.data.Endpoint}'|base64 -d
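
These credentials can also be passed to s3cmd directly instead of running the interactive --configure; a minimal sketch, using the in-cluster endpoint shown earlier:

AK=$(kubectl -n rook-ceph get secrets rook-ceph-object-user-my-store-my-user -o jsonpath='{.data.AccessKey}' | base64 -d)
SK=$(kubectl -n rook-ceph get secrets rook-ceph-object-user-my-store-my-user -o jsonpath='{.data.SecretKey}' | base64 -d)
s3cmd ls --access_key="$AK" --secret_key="$SK" --no-ssl \
  --host=rook-ceph-rgw-my-store.rook-ceph.svc:80 \
  --host-bucket="rook-ceph-rgw-my-store.rook-ceph.svc:80/%(bucket)"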

3) Configure the client following the previous step

8. RGW maintenance
1) Watch the rook-ceph-rgw-my-store pod status and spread the pods so they do not end up on the same node
2) Keep spare nodes so the pods have somewhere to be rescheduled when a node fails
3) Watch the rgw status in ceph -s
4) When debugging, check the logs of the rook-ceph-rgw-my-store pods and the rook-ceph-mgr pod

IV. Day-to-Day OSD Management

1. Scaling out OSDs
1) Add disks or machines directly

#!/bin/sh
## After adding a new disk, run this script to rescan the SCSI bus so the disk is detected without rebooting the server
for host in /sys/class/scsi_host/host*
do
    echo "- - -" > "${host}/scan"
done

Add the new device (or node entry) under the corresponding node in rook/cluster/examples/kubernetes/ceph/cluster.yaml, then re-apply the manifest:
kubectl apply -f rook/cluster/examples/kubernetes/ceph/cluster.yaml
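
After re-applying cluster.yaml the operator prepares and starts the new OSDs, which can be watched as follows:

kubectl -n rook-ceph get pods -l app=rook-ceph-osd
# inside the toolbox pod:
ceph osd tree
ceph osd df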

2. Using BlueStore acceleration (not verified)

      - name: "sdd"
        config:
          storeType: bluestore
          metadataDevice: "/dev/sde"
          databaseSizeMB: "4096"
          walSizeMB: "4096"

3. Removing OSDs
Precautions:
Make sure the cluster still has enough capacity after the OSD is removed
Make sure the PG states return to normal after the removal
Avoid removing too many OSDs in a single operation
When removing multiple OSDs, wait for data rebalancing to finish before removing the next one (see the example checks after this list)
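
Example checks for the points above (run inside the toolbox pod):

ceph df            # overall capacity headroom
ceph osd df tree   # per-OSD usage
ceph -s            # PG health and any ongoing recovery/backfill
ceph pg stat       # compact PG state summary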

1) Removal via Kubernetes

kubectl -n rook-ceph scale deploy rook-ceph-osd-5 --replicas=0
Edit osd-purge.yaml and set the OSD id in the --osd-ids field:
args: ["ceph", "osd", "remove", "--preserve-pvc", "false", "--osd-ids", "5"]

kubectl apply -f rook/cluster/examples/kubernetes/ceph/osd-purge.yaml
kubectl -n rook-ceph delete deployments.apps rook-ceph-osd-5
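
To confirm the OSD is gone from the cluster map afterwards (inside the toolbox pod):

ceph osd tree
ceph osd stat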

2) Manual removal

# mark the osd out; metadata synchronization starts
ceph osd out osd.5
# once the osd is out, backfilling and rebalancing migrate its data; wait for the migration to finish (watch ceph -s), then purge the osd
ceph osd purge 5
# delete the corresponding deployment and remove the entry from cluster.yaml
kubectl -n rook-ceph delete deployments.apps rook-ceph-osd-5

4. Replacing an OSD
The idea behind a replacement: remove the OSD from the Ceph cluster (either the Kubernetes way or the manual way), wait for data resynchronization to complete, then add the new disk back via the scale-out procedure above; when adding it back, make sure the leftover LVM volumes on the disk are removed first (see the sketch below).
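
A minimal sketch of cleaning the replaced disk before adding it back, following the usual Rook disk-zapping procedure; /dev/sdX is a placeholder, and the ceph-* device-mapper entries only exist if the old OSD was LVM-based:

DISK="/dev/sdX"   # placeholder: the replaced device
# wipe the partition table and any leftover ceph metadata
sgdisk --zap-all "$DISK"
dd if=/dev/zero of="$DISK" bs=1M count=100 oflag=direct,dsync
# remove leftover ceph LVM/device-mapper entries
ls /dev/mapper/ceph-* 2>/dev/null | xargs -r -I% dmsetup remove %
rm -rf /dev/ceph-*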
