Question:

Help troubleshooting a rook-ceph cluster created on a single node

呼延沈义
2023-03-14

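I know you shouldn't run a Ceph cluster on a single node. But this is just a small private project, so I have neither the resources for nor the need for a real multi-node cluster.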

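Still, I want to set up a cluster, and I'm running into problems. Right now my cluster is down, and I'm getting the following health warnings.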

[root@rook-ceph-tools-6bdcd78654-vq7kn /]# ceph status
  cluster:
    id:     12d9fbb9-73f3-4229-9ef4-6b7670324629
    health: HEALTH_WARN
            Reduced data availability: 33 pgs inactive
            68 slow ops, oldest one blocked for 26686 sec, osd.0 has slow ops
 
  services:
    mon: 1 daemons, quorum g (age 15m)
    mgr: a(active, since 44m)
    osd: 1 osds: 1 up (since 8m), 1 in (since 9m)
 
  data:
    pools:   2 pools, 33 pgs
    objects: 0 objects, 0 B
    usage:   1.0 GiB used, 465 GiB / 466 GiB avail
    pgs:     100.000% pgs unknown
             33 unknown

[root@rook-ceph-tools-6bdcd78654-vq7kn /]# ceph health detail
HEALTH_WARN Reduced data availability: 33 pgs inactive; 68 slow ops, oldest one blocked for 26691 sec, osd.0 has slow ops
[WRN] PG_AVAILABILITY: Reduced data availability: 33 pgs inactive
    pg 2.0 is stuck inactive for 44m, current state unknown, last acting []
    pg 3.0 is stuck inactive for 44m, current state unknown, last acting []
    pg 3.1 is stuck inactive for 44m, current state unknown, last acting []
    pg 3.2 is stuck inactive for 44m, current state unknown, last acting []
    pg 3.3 is stuck inactive for 44m, current state unknown, last acting []
    pg 3.4 is stuck inactive for 44m, current state unknown, last acting []
    pg 3.5 is stuck inactive for 44m, current state unknown, last acting []
    pg 3.6 is stuck inactive for 44m, current state unknown, last acting []
    pg 3.7 is stuck inactive for 44m, current state unknown, last acting []
    pg 3.8 is stuck inactive for 44m, current state unknown, last acting []
    pg 3.9 is stuck inactive for 44m, current state unknown, last acting []
    pg 3.a is stuck inactive for 44m, current state unknown, last acting []
    pg 3.b is stuck inactive for 44m, current state unknown, last acting []
    pg 3.c is stuck inactive for 44m, current state unknown, last acting []
    pg 3.d is stuck inactive for 44m, current state unknown, last acting []
    pg 3.e is stuck inactive for 44m, current state unknown, last acting []
    pg 3.f is stuck inactive for 44m, current state unknown, last acting []
    pg 3.10 is stuck inactive for 44m, current state unknown, last acting []
    pg 3.11 is stuck inactive for 44m, current state unknown, last acting []
    pg 3.12 is stuck inactive for 44m, current state unknown, last acting []
    pg 3.13 is stuck inactive for 44m, current state unknown, last acting []
    pg 3.14 is stuck inactive for 44m, current state unknown, last acting []
    pg 3.15 is stuck inactive for 44m, current state unknown, last acting []
    pg 3.16 is stuck inactive for 44m, current state unknown, last acting []
    pg 3.17 is stuck inactive for 44m, current state unknown, last acting []
    pg 3.18 is stuck inactive for 44m, current state unknown, last acting []
    pg 3.19 is stuck inactive for 44m, current state unknown, last acting []
    pg 3.1a is stuck inactive for 44m, current state unknown, last acting []
    pg 3.1b is stuck inactive for 44m, current state unknown, last acting []
    pg 3.1c is stuck inactive for 44m, current state unknown, last acting []
    pg 3.1d is stuck inactive for 44m, current state unknown, last acting []
    pg 3.1e is stuck inactive for 44m, current state unknown, last acting []
    pg 3.1f is stuck inactive for 44m, current state unknown, last acting []
[WRN] SLOW_OPS: 68 slow ops, oldest one blocked for 26691 sec, osd.0 has slow ops

If anyone knows where to start or how to fix this, please help!

1 answer

华浩壤
2023-03-14

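Yes, I agree with what eblock mentioned above. If you want at least 3 replicas of each object, you need at least 3 OSDs (a minimum of 3 disks, or 3 volumes... whatever). The contents of the objects in a placement group are stored on a set of OSDs. Placement groups do not own those OSDs; they share them with other placement groups from the same pool, or even from other pools.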
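As a rough illustration of why one OSD cannot serve a pool that wants three replicas, here is a minimal Python sketch. It assumes a heavily simplified model of CRUSH placement: each replica must land in a distinct failure domain (in Rook this is typically the host, or individual OSDs if the pool's failure domain is `osd`), and a PG only serves I/O once at least `min_size` replicas are mapped. The names `ReplicatedPool`, `mapped_replicas` and `pg_state` are hypothetical and are not Ceph or Rook APIs; `size=3`, `min_size=2` are assumed to mirror common replicated-pool defaults.

```python
from dataclasses import dataclass


@dataclass
class ReplicatedPool:
    # Hypothetical, simplified model of a Ceph replicated pool (not a real API).
    size: int = 3      # desired number of replicas per object
    min_size: int = 2  # replicas required before the PG will serve I/O


def mapped_replicas(pool: ReplicatedPool, failure_domains: int) -> int:
    """Replicas that can be placed if each must go to a distinct failure domain."""
    return min(pool.size, failure_domains)


def pg_state(pool: ReplicatedPool, failure_domains: int) -> str:
    """Very rough PG state under this toy model (real Ceph states are richer)."""
    placed = mapped_replicas(pool, failure_domains)
    if placed < pool.min_size:
        return "inactive"  # too few copies can be mapped, so I/O is blocked
    if placed < pool.size:
        return "active+undersized"  # serving I/O, but short of the target size
    return "active+clean"


if __name__ == "__main__":
    default_pool = ReplicatedPool()
    for domains in (1, 2, 3):
        print(domains, pg_state(default_pool, domains))
    # For comparison: a single-replica pool on a single failure domain.
    print(1, pg_state(ReplicatedPool(size=1, min_size=1), 1))
```

Under this model, a size-3 pool with a single failure domain can map only one replica, which is below `min_size`, so its PGs never become active no matter how long you wait; adding OSDs in separate failure domains (or lowering the pool's replica requirements) is what changes the outcome.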

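> If an OSD fails, all copies of the objects it contains are lost. For all objects in the placement group, the number of replicas suddenly drops from three to two. Ceph starts recovery of this placement group by choosing a new OSD on which to re-create the third copy of all objects.
>
> If another OSD in the same placement group fails before the new OSD has been fully populated with the third copy, some objects will be left with only one surviving copy.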
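The failure sequence quoted above can be traced with a toy model. This Python sketch is an illustration under stated assumptions, not Ceph's recovery logic: the `simulate` helper and its `("fail", osd)` / `("recover", osd)` event format are hypothetical, and a replacement OSD is counted as holding a full copy only once its recovery event has run.

```python
from typing import List, Set, Tuple


def simulate(acting: List[int], events: List[Tuple[str, int]]) -> List[int]:
    """Return the number of complete copies of a PG's objects after each event.

    Hypothetical event format:
      ("fail", osd)    -> every copy stored on that OSD is lost
      ("recover", osd) -> a new OSD has been fully populated with a copy
    """
    holders: Set[int] = set(acting)
    history: List[int] = []
    for kind, osd in events:
        if kind == "fail":
            holders.discard(osd)
        elif kind == "recover":
            holders.add(osd)
        else:
            raise ValueError(f"unknown event {kind!r}")
        history.append(len(holders))
    return history


if __name__ == "__main__":
    # Second OSD fails before recovery onto OSD 3 finishes: one copy survives.
    print(simulate([0, 1, 2], [("fail", 0), ("fail", 1)]))
    # Recovery completes in time: the PG is back to three copies before losing one.
    print(simulate([0, 1, 2], [("fail", 0), ("recover", 3), ("fail", 1)]))
```

The only difference between the two runs is whether the recovery onto a new OSD finishes before the second failure, which is exactly the window the quoted passage warns about. With a single OSD there is no other OSD to recover onto at all.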
