The problem initially showed up as follows:
[root@rook-ceph-tools-78cdfd976c-dhrlx /]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 15.00000 root default
-11 3.00000 host master1
4 hdd 1.00000 osd.4 up 1.00000 1.00000
9 hdd 1.00000 osd.9 down 0 1.00000
14 hdd 1.00000 osd.14 up 1.00000 1.00000
Checking the Ceph cluster status then showed: 37 daemons have recently crashed
[root@rook-ceph-tools-78cdfd976c-dhrlx osd]# ceph -s
cluster:
id: f65c0ebc-0ace-4181-8061-abc2d1d581e9
health: HEALTH_WARN
37 daemons have recently crashed
services:
mon: 3 daemons, quorum a,c,g (age 9m)
mgr: a(active, since 13d)
mds: 1/1 daemons up, 1 hot standby
osd: 15 osds: 14 up (since 10m), 14 in (since 2h)
data:
volumes: 1/1 healthy
pools: 4 pools, 97 pgs
objects: 20.64k objects, 72 GiB
usage: 216 GiB used, 14 TiB / 14 TiB avail
pgs: 97 active+clean
io:
client: 8.8 KiB/s rd, 1.2 MiB/s wr, 2 op/s rd, 49 op/s wr
This looked like historical failure information, so list the recent crash records:
ceph crash ls-new
2022-05-13T01:46:58.600474Z_11da8241-7462-49b5-8ab6-83e96d0dd1d9
View the crash log:
ceph crash info 2022-05-13T01:46:58.600474Z_11da8241-7462-49b5-8ab6-83e96d0dd1d9
-2393> 2020-05-13 10:24:55.180 7f5d5677aa80 -1 Falling back to public interface
-1754> 2020-05-13 10:25:07.419 7f5d5677aa80 -1 osd.2 875 log_to_monitors {default=true}
-1425> 2020-05-13 10:25:07.803 7f5d48d7c700 -1 osd.2 875 set_numa_affinity unable to identify public interface 'eth0' numa node: (2) No such file or directory
-2> 2020-05-13 10:25:23.731 7f5d4436d700 -1 rocksdb: submit_common error: Corruption: block checksum mismatch: expected 717694145, got 2263389519 in db/001499.sst offset 43727772 size 3899 code = 2 Rocksdb transaction:
-1> 2020-05-13 10:25:23.735 7f5d4436d700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.9/rpm/el7/BUILD/ceph-14.2.9/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_kv_sync_thread()' thread 7f5d4436d700 time 2020-05-13 10:25:23.733456
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.9/rpm/el7/BUILD/ceph-14.2.9/src/os/bluestore/BlueStore.cc: 11016: FAILED ceph_assert(r == 0)
ceph version 14.2.9 (581f22da52345dba46ee232b73b990f06029a2a0) nautilus (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x56297aa20f7d]
2: (()+0x4cb145) [0x56297aa21145]
3: (BlueStore::_kv_sync_thread()+0x11c3) [0x56297af95233]
4: (BlueStore::KVSyncThread::entry()+0xd) [0x56297afba3fd]
5: (()+0x7e65) [0x7f5d537bfe65]
6: (clone()+0x6d) [0x7f5d5268388d]
0> 2020-05-13 10:25:23.735 7f5d4436d700 -1 *** Caught signal (Aborted) **
in thread 7f5d4436d700 thread_name:bstore_kv_sync
ceph version 14.2.9 (581f22da52345dba46ee232b73b990f06029a2a0) nautilus (stable)
1: (()+0xf5f0) [0x7f5d537c75f0]
2: (gsignal()+0x37) [0x7f5d525bb337]
3: (abort()+0x148) [0x7f5d525bca28]
4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x199) [0x56297aa20fcc]
5: (()+0x4cb145) [0x56297aa21145]
6: (BlueStore::_kv_sync_thread()+0x11c3) [0x56297af95233]
7: (BlueStore::KVSyncThread::entry()+0xd) [0x56297afba3fd]
8: (()+0x7e65) [0x7f5d537bfe65]
9: (clone()+0x6d) [0x7f5d5268388d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
The root cause is the error "rocksdb: submit_common error: Corruption: block checksum mismatch: expected 717694145, got 2263389519 in db/001499.sst offset 43727772 size 3899 code = 2 Rocksdb transaction"; the resulting assert failure means the OSD process can never start. So how do we fix this block checksum mismatch?
A key word in the error above is rocksdb. What is that? Ceph's storage backend was originally FileStore; to improve performance the default is now BlueStore, and BlueStore keeps its metadata in RocksDB. In other words, the metadata of this OSD's BlueStore backend is corrupted!
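Optionally, the corruption can be confirmed before tearing the OSD down. The following is only a sketch with ceph-bluestore-tool: it assumes a shell where the OSD's data directory is mounted while the ceph-osd process itself is stopped (with Rook this usually means scaling the OSD deployment to 0 and using a debug container), and the path shown is the conventional one for osd.9:
# run a BlueStore consistency check against osd.9's data directory (OSD must be stopped)
ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-9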
It cannot be repaired in place, so the affected OSD is deleted and then added back.
1. Check the current OSD status
[root@rook-ceph-tools-7bbsyszux-584k5 /]# ceph osd status
+----+------+-------+-------+--------+---------+--------+---------+----------------+
| id | host | used | avail | wr ops | wr data | rd ops | rd data | state |
+----+------+-------+-------+--------+---------+--------+---------+----------------+
| 0 | ai05 | 299G | 3426G | 0 | 0 | 5 | 382k | exists,up |
| 1 | ai05 | 178G | 3547G | 0 | 18 | 2 | 1110k | exists,up |
| 2 | ai03 | 108G | 3617G | 0 | 944 | 5 | 84.0k | exists,up |
| 3 | ai01 | 438G | 3287G | 0 | 763 | 7 | 708k | exists,up |
| 4 | ai03 | 217G | 3508G | 0 | 339 | 7 | 63.6k | exists,up |
| 5 | ai02 | 217G | 2576G | 1 | 10.9k | 6 | 403k | exists,up |
| 6 | ai04 | 300G | 3425G | 15 | 100k | 7 | 161k | exists,up |
| 7 | ai03 | 109G | 3616G | 0 | 0 | 0 | 0 | exists,up |
| 8 | ai02 | 246G | 3479G | 1 | 23.6k | 2 | 813k | exists,up |
| 9 | | 0 | 0 | 0 | 0 | 0 | 0 | autoout,exists |
| 10 | ai03 | 136G | 3589G | 0 | 741 | 4 | 679k | exists,up |
| 11 | ai03 | 162G | 3563G | 0 | 22.2k | 4 | 824k | exists,up |
| 12 | ai03 | 55.7G | 3670G | 0 | 0 | 2 | 952k | exists,up |
| 13 | ai01 | 194G | 3531G | 0 | 130k | 3 | 37.9k | exists,up |
+----+------+-------+-------+--------+---------+--------+---------+----------------+
2. Mark the failed OSD as out
[root@rook-ceph-tools-7gemfield-584k5 /]# ceph osd out osd.9
osd.9 is already out.
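Before going any further, it is worth confirming that every placement group that lived on osd.9 has already been re-replicated elsewhere. A hedged check with the standard Ceph CLI, run from the toolbox pod:
# succeeds only when destroying osd.9 would not reduce data durability or availability
ceph osd safe-to-destroy osd.9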
3. Find the disk backing the OSD
[root@master1 ~]# kubectl get po rook-ceph-osd-9-7dd6fc544c-4vhtm -n rook-ceph -o yaml |grep UUID
- name: ROOK_OSD_UUID
-o xtrace\n\nOSD_ID=\"$ROOK_OSD_ID\"\nOSD_UUID=052383d6-90ca-4ea1-a9c0-bcb0c43d8317\nOSD_STORE_FLAG=\"--bluestore\"\nOSD_DATA_DIR=/var/lib/ceph/osd/ceph-\"$OSD_ID\"\nCV_MODE=lvm\nDEVICE=\"$ROOK_BLOCK_PATH\"\n\n#
\"$OSD_ID\" \"$OSD_UUID\"\n\n\t# copy the tmpfs directory to a temporary directory\n\t#
[root@master1 ~]# lsblk |grep -C2 052383d6
rbd8 251:128 0 5G 0 disk /var/lib/kubelet/pods/7b39990a-ea1c-4f00-a767-a9fbc4a19ecd/volumes/kubernetes.io~csi/pvc-f78f0dd9-188c-4d02-aed0-03f25ed4d0a0/mount
vdc 252:32 0 1T 0 disk
└─ceph--66c4c661--cf98--417b--afda--f79c3de1204c-osd--block--052383d6--90ca--4ea1--a9c0--bcb0c43d8317 253:3 0 1024G 0 lvm
rbd12 251:192 0 10G 0 disk /var/lib/kubelet/pods/bfc62153-6844-498c-92f0-e86d09e8a7cc/volumes/kubernetes.io~csi/pvc-051b9632-fe52-4201-9572-79a75793ffb5/mount
rbd6 251:96 0 5G 0 disk /var/lib/kubelet/pods/b36acdab-1a0c-4ce4-b5a6-7aca039514ed/volumes/kubernetes.io~csi/pvc-7f6a160b-0e8e-46f8-989e-531667a13a3a/mount
Check whether there are any hardware errors. As shown below, no specific hardware error was found:
[root@master1 ~]# dmesg |grep vdc
[ 2.630026] virtio_blk virtio3: [vdc] 2147483648 512-byte logical blocks (1.10 TB/1.00 TiB)
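Here the disk is a virtio device, so there is little SMART data to query; on a physical disk an additional health check could look like the following (hypothetical device name, requires smartmontools):
# quick SMART health verdict for a physical disk (hypothetical /dev/sdX)
smartctl -H /dev/sdX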
Check the device information recorded for this OSD:
[root@rook-ceph-tools-78cdfd976c-dhrlx /]# ceph device ls-by-daemon osd.9
DEVICE HOST:DEV EXPECTED FAILURE
4033036832428-3 master1:vdc
4. Double-check that the disk is the right one
Be meticulous here; do not wipe the wrong disk.
gemfield@ai04:~$ sudo hdparm -I /dev/vdc | grep 4033036832428-3
Serial Number: 4033036832428-3
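lsblk can also report the serial directly, giving a second way to cross-check the device against the ceph device ls-by-daemon output (note that a virtio disk only exposes a serial if one was configured):
# print device name, serial number and size for cross-checking
lsblk -o NAME,SERIAL,SIZE /dev/vdc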
5. Purge osd.9
The --force flag has to be added:
[root@rook-ceph-tools-7bb5797c8-ns4bw /]# ceph osd purge osd.9 --force
[root@rook-ceph-tools-7bb5797c8-ns4bw /]# ceph auth del osd.9 # remove its auth credentials
6. Remove the OSD's Pod
Because removeOSDsIfOutAndSafeToRemove is left at its default of false, the failed OSD is not cleaned up automatically, so the rook-ceph-osd-9 deployment has to be deleted manually:
[root@master1 ~]# kubectl -n rook-ceph delete deployment rook-ceph-osd-9
deployment.apps "rook-ceph-osd-9" deleted
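If you would rather have Rook clean up such OSD deployments on its own, the CephCluster CR exposes removeOSDsIfOutAndSafeToRemove. A hedged example, assuming the CR is named rook-ceph in the rook-ceph namespace:
# let the operator remove OSD deployments once the OSD is out and safe to remove
kubectl -n rook-ceph patch cephcluster rook-ceph --type merge -p '{"spec":{"removeOSDsIfOutAndSafeToRemove":true}}'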
7. Completely wipe vdc
[root@master1 ~]# DISK="/dev/vdc"
[root@master1 ~]# sudo sgdisk --zap-all $DISK
[root@master1 ~]# sudo dd if=/dev/zero of="$DISK" bs=1M count=100 oflag=direct,dsync
# Note: for an SSD, use blkdiscard /dev/vdc instead
[root@master1 ~]# ls /dev/mapper/ceph-*
/dev/mapper/ceph--971efece--8880--4e81--90c6--621493c66294-osd--data--7775b10e--7a0d--4ddd--aaf7--74c4498552ff
/dev/mapper/ceph--a7d7b063--7092--4698--a832--1cdd1285acbd-osd--data--ec2df8ee--0a7a--407f--afe3--41d045e889a9
# Clean up the LVM leftovers: remove the corresponding logical volume
[root@master1 ~]# sudo dmsetup remove /dev/mapper/ceph--a7d7b063--7092--4698--a832--1cdd1285acbd-osd--data--ec2df8ee--0a7a--407f--afe3--41d045e889a9
# Check: one mapping remains
[root@master1 ~]# ls /dev/mapper/ceph-*
/dev/mapper/ceph--971efece--8880--4e81--90c6--621493c66294-osd--data--7775b10e--7a0d--4ddd--aaf7--74c4498552ff
# Make sure only one is left under /dev as well
[root@master1 ~]# ls -l /dev/ceph-*
total 0
lrwxrwxrwx 1 root root 7 May 15 20:14 osd-data-7775b10e-7a0d-4ddd-aaf7-74c4498552ff ->
[root@master1 ~]# partprobe /dev/vdc
8. Restart the Ceph operator so that it re-detects the freshly wiped OSD disk; once the new OSD starts, the Ceph cluster rebalances data automatically.
kubectl rollout restart deploy rook-ceph-operator -n rook-ceph
This makes the operator re-run the rook-ceph detection and provisioning process.
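Progress can be followed while the operator re-provisions the OSD; a sketch, assuming the standard Rook labels and deployment name:
# watch the osd-prepare pods created by the operator
kubectl -n rook-ceph get pods -l app=rook-ceph-osd-prepare -w
# or follow the operator log
kubectl -n rook-ceph logs deploy/rook-ceph-operator -f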
Once it completes, check the cluster status again:
[root@master1 ~]# kubectl get po -n rook-ceph
NAME READY STATUS RESTARTS AGE
csi-cephfsplugin-6rrgv 3/3 Running 15 167d
csi-cephfsplugin-6t7kg 3/3 Running 15 167d
csi-cephfsplugin-7ksh2 3/3 Running 15 167d
csi-cephfsplugin-mr5z7 3/3 Running 21 167d
csi-cephfsplugin-provisioner-7bcbf457c5-hv5nv 6/6 Running 284 167d
csi-cephfsplugin-provisioner-7bcbf457c5-qk9t6 6/6 Running 23 45d
csi-cephfsplugin-zsf6w 3/3 Running 30 167d
csi-rbdplugin-5tsqc 3/3 Running 19 167d
csi-rbdplugin-8d6m5 3/3 Running 15 167d
csi-rbdplugin-998lx 3/3 Running 15 167d
csi-rbdplugin-jx676 3/3 Running 30 167d
csi-rbdplugin-njmtd 3/3 Running 21 167d
csi-rbdplugin-provisioner-69f65b7897-jh88t 6/6 Running 54 45d
csi-rbdplugin-provisioner-69f65b7897-qxpdr 6/6 Running 65 45d
rook-ceph-crashcollector-master1-84899f577b-fnf5f 1/1 Running 3 45d
rook-ceph-crashcollector-master2-6f7c4fb8d5-lzkf7 1/1 Running 3 45d
rook-ceph-crashcollector-master3-695b549f6b-gtpx7 1/1 Running 3 128d
rook-ceph-crashcollector-node1-67458cc896-pf6nx 1/1 Running 3 49d
rook-ceph-crashcollector-node2-5458f6f68c-nsd84 1/1 Running 3 42d
rook-ceph-mds-myfs-a-58f484bd6b-wxzts 1/1 Running 86 45d
rook-ceph-mds-myfs-b-669b684d78-mqfct 1/1 Running 13 128d
rook-ceph-mgr-a-85954dfbc5-zxtmk 1/1 Running 8 128d
rook-ceph-mon-a-5ff4694d9-dc6v6 1/1 Running 4 54m
rook-ceph-mon-c-868f4547cc-s97vv 1/1 Running 12 167d
rook-ceph-mon-g-fb46bdf77-g5k98 1/1 Running 10 49d
rook-ceph-operator-74646576d7-bkcq7 1/1 Running 0 67m
rook-ceph-osd-0-5d94784b45-xr5fr 1/1 Running 6 51d
rook-ceph-osd-1-98b84c76-5w6s8 1/1 Running 4 42d
rook-ceph-osd-10-75c65bc759-wkzjz 1/1 Running 4 42d
rook-ceph-osd-11-855495cf97-dvwp9 1/1 Running 7 51d
rook-ceph-osd-12-7d55b9ddbd-hqbb4 1/1 Running 10 49d
rook-ceph-osd-13-6bfc5b744-mhxw9 1/1 Running 13 167d
rook-ceph-osd-14-7cd656d799-shtnr 1/1 Running 118 45d
rook-ceph-osd-2-56c45f9db4-lzgbn 1/1 Running 9 49d
rook-ceph-osd-3-6d9bdb7fd6-r6cgw 1/1 Running 13 167d
rook-ceph-osd-4-5c8fb468c7-c6v9x 1/1 Running 61 45d
rook-ceph-osd-5-85b7ff6578-zjgmw 1/1 Running 6 51d
rook-ceph-osd-6-67dfcbc7c9-5vtjx 1/1 Running 5 42d
rook-ceph-osd-7-5d86487c7-dnmkv 1/1 Running 9 49d
rook-ceph-osd-8-5648594c55-gs7bb 1/1 Running 13 167d
rook-ceph-osd-9-7dd6fc544c-7pw8t 1/1 Running 0 16s
rook-ceph-osd-prepare-master1-qh9j9 0/1 Completed 0 58m
rook-ceph-osd-prepare-master2-2d9q7 0/1 Completed 0 58m
rook-ceph-osd-prepare-master3-pndv9 0/1 Completed 0 58m
rook-ceph-osd-prepare-node1-5dbdq 0/1 Completed 0 58m
rook-ceph-osd-prepare-node2-4lk9l 0/1 Completed 0 58m
rook-ceph-tools-78cdfd976c-dhrlx 1/1 Running 3 45d
[root@rook-ceph-tools-78cdfd976c-dhrlx /]# ceph -s
cluster:
id: f65c0ebc-0ace-4181-8061-abc2d1d581e9
health: HEALTH_OK
[root@rook-ceph-tools-78cdfd976c-dhrlx /]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 15.00000 root default
-11 3.00000 host master1
4 hdd 1.00000 osd.4 up 1.00000 1.00000
9 hdd 1.00000 osd.9 up 1.00000 1.00000
14 hdd 1.00000 osd.14 up 1.00000 1.00000
-7 3.00000 host master2
0 hdd 1.00000 osd.0 up 1.00000 1.00000
5 hdd 1.00000 osd.5 up 1.00000 1.00000
11 hdd 1.00000 osd.11 up 1.00000 1.00000
-9 3.00000 host master3
3 hdd 1.00000 osd.3 up 1.00000 1.00000
8 hdd 1.00000 osd.8 up 1.00000 1.00000
13 hdd 1.00000 osd.13 up 1.00000 1.00000
-5 3.00000 host node1
2 hdd 1.00000 osd.2 up 1.00000 1.00000
7 hdd 1.00000 osd.7 up 1.00000 1.00000
12 hdd 1.00000 osd.12 up 1.00000 1.00000
-3 3.00000 host node2
1 hdd 1.00000 osd.1 up 1.00000 1.00000
6 hdd 1.00000 osd.6 up 1.00000 1.00000
10 hdd 1.00000 osd.10 up 1.00000 1.00000
At this point the rook-ceph cluster is back to normal.
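One last housekeeping note: the "daemons have recently crashed" warning is driven by the stored crash reports and, by default, only clears once the recent-crash window (two weeks) has passed. After the root cause has been dealt with, the old records can be acknowledged explicitly:
# acknowledge a single crash report
ceph crash archive 2022-05-13T01:46:58.600474Z_11da8241-7462-49b5-8ab6-83e96d0dd1d9
# or acknowledge all of them at once
ceph crash archive-all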