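Environment: ceph-12.2.1
3-node performance-test cluster with 60 OSDs
Recently, two OSDs in the Ceph cluster hit the following problem after a restart: the OSD could not get the cluster osdmap and dumped core: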
ceph version 12.2.1.06 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)
1: (()+0xa2bf21) [0x7fcd91627f21]
2: (()+0xf6d0) [0x7fcd8e42d6d0]
3: (gsignal()+0x37) [0x7fcd8d44e277]
4: (abort()+0x148) [0x7fcd8d44f968]
5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7fcd8dd5d9d5]
6: (()+0x5e946) [0x7fcd8dd5b946]
7: (()+0x5e973) [0x7fcd8dd5b973]
8: (()+0x5eb9f) [0x7fcd8dd5bb9f]
9: (entity_addr_t::decode(ceph::buffer::list::iterator&)+0x31d) [0x7fcd9111f1bd]
10: (void decode<entity_addr_t, mempool::pool_allocator<(mempool::pool_index_t)15, std::shared_ptr<entity_addr_t> > >(std::vector<std::shared_ptr<entity_addr_t>, mempool::pool_allocator<(mempool::pool_index_t)15, std::shared_ptr<entity_addr_t> > >&, ceph::buffer::list::iterator&)+0xfc) [0x7fcd91743ddc]
11: (OSDMap::decode_classic(ceph::buffer::list::iterator&)+0x66a) [0x7fcd9172daca]
12: (OSDMap::decode(ceph::buffer::list::iterator&)+0x8c) [0x7fcd9173ae5c]
13: (OSDMap::decode(ceph::buffer::list&)+0x2e) [0x7fcd9173c41e]
14: (OSDService::try_get_map(unsigned int)+0x664) [0x7fcd910ea6a4]
15: (OSD::load_pgs()+0x1209) [0x7fcd910ee909]
16: (OSD::init()+0x2186) [0x7fcd91106606]
17: (main()+0x2e05) [0x7fcd9100bd75]
18: (__libc_start_main()+0xf5) [0x7fcd8d43a445]
19: (()+0x4ae306) [0x7fcd910aa306]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Raise the debug level of the affected OSD (osd.58 here) to get more detail in its log:
ceph daemon osd.58 config set debug_osd 20
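Use ceph-objectstore-tool to fetch the osdmap of the matching epoch from a healthy OSD. The tool requires that OSD to be stopped first (id below is the ID of the healthy OSD):
systemctl stop ceph-osd@id
ceph-objectstore-tool --op get-osdmap --epoch 314 --data-path /var/lib/ceph/osd/ceph-id --type bluestore --file osdmap314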
Then write that osdmap into the faulty OSD (osd.58) and start it again:
ceph-objectstore-tool --op set-osdmap --epoch 314 --data-path /var/lib/ceph/osd/ceph-58 --type bluestore --file osdmap314
systemctl start ceph-osd@58
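
The manual steps above could be wrapped in a small helper, sketched below. This is only an illustration, not part of the original procedure: the function names run and repair_osdmap are hypothetical, the script assumes both OSDs live on the same host (otherwise the osdmap file has to be copied between hosts), and restarting the healthy OSD and explicitly stopping the faulty one are added assumptions. By default it only prints the commands; set RUN=1 to execute them.

```shell
#!/usr/bin/env bash
# Sketch: copy one osdmap epoch from a healthy OSD into a faulty OSD.
# Hypothetical helper; assumes both OSDs are on this host and use bluestore.
set -euo pipefail

run() {
    # Print the command; execute it only when RUN=1 (dry run by default).
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

repair_osdmap() {
    local good_id=$1 bad_id=$2 epoch=$3
    local file="osdmap${epoch}"

    # ceph-objectstore-tool needs the source OSD stopped while it reads the store.
    run systemctl stop "ceph-osd@${good_id}"
    run ceph-objectstore-tool --op get-osdmap --epoch "$epoch" \
        --data-path "/var/lib/ceph/osd/ceph-${good_id}" --type bluestore --file "$file"
    # Assumption: bring the healthy OSD back as soon as the map is extracted.
    run systemctl start "ceph-osd@${good_id}"

    # Assumption: make sure the faulty OSD is not running (e.g. in a restart loop).
    run systemctl stop "ceph-osd@${bad_id}"
    run ceph-objectstore-tool --op set-osdmap --epoch "$epoch" \
        --data-path "/var/lib/ceph/osd/ceph-${bad_id}" --type bluestore --file "$file"
    run systemctl start "ceph-osd@${bad_id}"
}

# Usage (dry run): repair_osdmap <healthy-osd-id> 58 314
```

Running it in dry-run mode first makes it easy to check the OSD IDs and the epoch before anything is stopped. If more than one epoch is damaged, the function can be called once per epoch.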