
Fixing an OSD that goes down because ceph-osd cannot get the osdmap

端木宏盛
2023-12-01

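Environment: ceph-12.2.1, a 3-node performance-test cluster with 60 OSDs
Recently, two OSDs in this Ceph cluster hit the following problem after a restart: the OSD could not get the cluster osdmap and produced a core dump: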

 ceph version 12.2.1.06 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)
 1: (()+0xa2bf21) [0x7fcd91627f21]
 2: (()+0xf6d0) [0x7fcd8e42d6d0]
 3: (gsignal()+0x37) [0x7fcd8d44e277]
 4: (abort()+0x148) [0x7fcd8d44f968]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7fcd8dd5d9d5]
 6: (()+0x5e946) [0x7fcd8dd5b946]
 7: (()+0x5e973) [0x7fcd8dd5b973]
 8: (()+0x5eb9f) [0x7fcd8dd5bb9f]
 9: (entity_addr_t::decode(ceph::buffer::list::iterator&)+0x31d) [0x7fcd9111f1bd]
 10: (void decode<entity_addr_t, mempool::pool_allocator<(mempool::pool_index_t)15, std::shared_ptr<entity_addr_t> > >(std::vector<std::shared_ptr<entity_addr_t>, mempool::pool_allocator<(mempool::pool_index_t)15, std::shared_ptr<entity_addr_t> > >&, ceph::buffer::list::iterator&)+0xfc) [0x7fcd91743ddc]
 11: (OSDMap::decode_classic(ceph::buffer::list::iterator&)+0x66a) [0x7fcd9172daca]
 12: (OSDMap::decode(ceph::buffer::list::iterator&)+0x8c) [0x7fcd9173ae5c]
 13: (OSDMap::decode(ceph::buffer::list&)+0x2e) [0x7fcd9173c41e]
 14: (OSDService::try_get_map(unsigned int)+0x664) [0x7fcd910ea6a4]
 15: (OSD::load_pgs()+0x1209) [0x7fcd910ee909]
 16: (OSD::init()+0x2186) [0x7fcd91106606]
 17: (main()+0x2e05) [0x7fcd9100bd75]
 18: (__libc_start_main()+0xf5) [0x7fcd8d43a445]
 19: (()+0x4ae306) [0x7fcd910aa306]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Fix: extract the osdmap from a healthy OSD and write it back into the failed OSD

  • Set the debug level of the failed OSD to 20 and check which osdmap epoch (version number) it was reading when the core dump occurred, for example 314
    ceph daemon osd.58 config set debug_osd 20
  • Use ceph-objectstore-tool to get the osdmap of that epoch from a healthy OSD
    systemctl stop ceph-osd@id    # id of the healthy OSD; the tool requires the OSD to be stopped first
    ceph-objectstore-tool --op get-osdmap --epoch 314 --data-path /var/lib/ceph/osd/ceph-id --type bluestore --file osdmap314
  • Write the extracted osdmap back into the failed OSD
    ceph-objectstore-tool --op set-osdmap --epoch 314 --data-path /var/lib/ceph/osd/ceph-58 --type bluestore --file osdmap314
  • Start the failed OSD
    systemctl start ceph-osd@58
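
The steps above can be collected into a small script. This is only a sketch under some assumptions: the function names `run` and `restore_osdmap` and the `DRY_RUN` switch are hypothetical helpers, and restarting the healthy OSD after the export and stopping the failed OSD before the import are additions that the original steps do not show. By default it only prints the commands so they can be reviewed first.

```shell
#!/usr/bin/env bash
# Sketch of the recovery steps above. Defaults to a dry run that only prints
# the commands; set DRY_RUN=0 to actually execute them.
set -eu

DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# restore_osdmap GOOD_ID BAD_ID EPOCH
# Copies the osdmap of EPOCH from a healthy OSD into the failed OSD.
restore_osdmap() {
    local good_id="$1" bad_id="$2" epoch="$3"
    local map_file="osdmap${epoch}"

    # ceph-objectstore-tool requires the source OSD to be stopped.
    run systemctl stop "ceph-osd@${good_id}"
    run ceph-objectstore-tool --op get-osdmap --epoch "$epoch" \
        --data-path "/var/lib/ceph/osd/ceph-${good_id}" \
        --type bluestore --file "$map_file"
    # Assumption: bring the healthy OSD back once the map is exported.
    run systemctl start "ceph-osd@${good_id}"

    # Assumption: make sure the failed OSD is stopped before writing to it.
    run systemctl stop "ceph-osd@${bad_id}"
    run ceph-objectstore-tool --op set-osdmap --epoch "$epoch" \
        --data-path "/var/lib/ceph/osd/ceph-${bad_id}" \
        --type bluestore --file "$map_file"
    run systemctl start "ceph-osd@${bad_id}"
}
```

For example, after sourcing the script, `restore_osdmap 12 58 314` prints the six commands for copying epoch 314 from osd.12 (a made-up healthy OSD id) to osd.58. Exporting `DRY_RUN=0` before sourcing makes the same call execute them.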