Question:

Elasticsearch alerts about unassigned shards every few hours

年文柏

2023-03-14

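Our cluster has 3 Elasticsearch data pods, 3 master pods, 1 client, and 1 exporter. The problem is the alert "Elasticsearch unassigned shards due to circuit breaking exception". You can check further details in this question.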

The heap utilization of elasticsearch-data-0, 1, and 2 is 68%, 61%, and 63%, respectively.
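Heap figures like these can be read per node from the `_cat/nodes` API. The following is a minimal sketch, assuming Elasticsearch is reachable on localhost:9200; the helper name `nodes_over_heap` and the threshold value are hypothetical, chosen only for illustration.

```shell
# Hypothetical helper: reads "<name> <heap.percent>" rows on stdin and
# prints the nodes whose heap usage is at or above the threshold given in $1.
nodes_over_heap() {
  awk -v t="$1" '$2 + 0 >= t + 0 { print $1, $2 }'
}

# Usage against a live cluster (assumed address):
#   curl -s 'http://localhost:9200/_cat/nodes?h=name,heap.percent' | nodes_over_heap 75
```

Note that `heap.percent` is a point-in-time reading, so a node can look healthy here and still trip the circuit breaker during a short spike.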

I made the API call below and can see that the shards are almost evenly distributed.

curl -s http://localhost:9200/_cat/shards | grep elasticsearch-data-0 | wc -l
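To compare all data nodes in one pass instead of grepping one node name at a time, the same count can be tallied per node. This is a minimal sketch, assuming the `h=node` parameter is used so that each output row starts with the node name; the helper name `count_shards_per_node` is hypothetical.

```shell
# Hypothetical helper: reads one node name per line on stdin and prints
# "<count> <node>" for each node, sorted by node name.
count_shards_per_node() {
  # Empty lines (unassigned shards have no node) are skipped before counting.
  awk 'NF { n[$1]++ } END { for (k in n) print n[k], k }' | sort -k2
}

# Usage against a live cluster (assumed address):
#   curl -s 'http://localhost:9200/_cat/shards?h=node' | count_shards_per_node
```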

{
    "allocate_explanation": "cannot allocate because allocation is not permitted to any of the nodes",
    "can_allocate": "no",
    "current_state": "unassigned",
    "index": "graph_24_18549",
    "node_allocation_decisions": [
        {
            "deciders": [
                {
                    "decider": "max_retry",
                    "decision": "NO",
                    "explanation": "shard has exceeded the maximum number of retries [50] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2020-10-31T09:18:44.115Z], failed_attempts[50], delayed=false, details[failed shard on node [nodeid1]: failed to perform indices:data/write/bulk[s] on replica [graph_24_18549][0], node[nodeid1], [R], recovery_source[peer recovery], s[INITIALIZING], a[id=someid], unassigned_info[[reason=ALLOCATION_FAILED], at[2020-10-31T09:16:42.146Z], failed_attempts[49], delayed=false, details[failed shard on node [nodeid2]: failed to perform indices:data/write/bulk[s] on replica [graph_24_18549][0], node[nodeid2], [R], recovery_source[peer recovery], s[INITIALIZING], a[id=someid2], unassigned_info[[reason=ALLOCATION_FAILED], at[2020-10-31T09:15:05.849Z], failed_attempts[48], delayed=false, details[failed shard on node [nodeid1]: failed to perform indices:data/write/bulk[s] on replica [tsg_ngf_graph_1_mtermmetrics1_vertex_24_18549][0], node[nodeid1], [R], recovery_source[peer recovery], s[INITIALIZING], a[id=someid3], unassigned_info[[reason=ALLOCATION_FAILED], at[2020-10-31T09:11:50.143Z], failed_attempts[47], delayed=false, details[failed shard on node [nodeid2]: failed to perform indices:data/write/bulk[s] on replica [graph_24_18549][0], node[o_9jyrmOSca9T12J4bY0Nw], [R], recovery_source[peer recovery], s[INITIALIZING], a[id=someid4], unassigned_info[[reason=ALLOCATION_FAILED], at[2020-10-31T09:08:10.182Z], failed_attempts[46], delayed=false, details[failed shard on node [nodeid1]: failed to perform indices:data/write/bulk[s] on replica [graph_24_18549][0], node[nodeid1], [R], recovery_source[peer recovery], s[INITIALIZING], a[id=someid6], unassigned_info[[reason=ALLOCATION_FAILED], at[2020-10-31T09:07:03.102Z], failed_attempts[45], delayed=false, details[failed shard on node [nodeid2]: failed to perform indices:data/write/bulk[s] on replica 
[graph_24_18549][0], node[nodeid2], [R], recovery_source[peer recovery], s[INITIALIZING], a[id=someid7], unassigned_info[[reason=ALLOCATION_FAILED], at[2020-10-31T09:05:53.267Z], failed_attempts[44], delayed=false, details[failed shard on node [nodeid2]: failed to perform indices:data/write/bulk[s] on replica [graph_24_18549][0], node[nodeid2], [R], recovery_source[peer recovery], s[INITIALIZING], a[id=someid8], unassigned_info[[reason=ALLOCATION_FAILED], at[2020-10-31T09:04:24.507Z], failed_attempts[43], delayed=false, details[failed shard on node [nodeid1]: failed to perform indices:data/write/bulk[s] on replica [graph_24_18549][0], node[nodeid1], [R], recovery_source[peer recovery], s[INITIALIZING], a[id=someid9], unassigned_info[[reason=ALLOCATION_FAILED], at[2020-10-31T09:03:02.018Z], failed_attempts[42], delayed=false, details[failed shard on node [nodeid2]: failed to perform indices:data/write/bulk[s] on replica [graph_24_18549][0], node[nodeid2], [R], recovery_source[peer recovery], s[INITIALIZING], a[id=someid10], unassigned_info[[reason=ALLOCATION_FAILED], at[2020-10-31T09:01:38.094Z], failed_attempts[41], delayed=false, details[failed shard on node [nodeid1]: failed recovery, failure RecoveryFailedException[[graph_24_18549][0]: Recovery failed from {elasticsearch-data-2}{}{} into {elasticsearch-data-1}{}{}{IP}{IP:9300}]; nested: RemoteTransportException[[elasticsearch-data-2][IP:9300][internal:index/shard/recovery/start_recovery]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [2012997826/1.8gb], which is larger than the limit of [1972122419/1.8gb], real usage: [2012934784/1.8gb], new bytes reserved: [63042/61.5kb]]; ], allocation_status[no_attempt]], expected_shard_size[4338334540], failure RemoteTransportException[[elasticsearch-data-0][IP:9300][indices:data/write/bulk[s][r]]]; nested: AlreadyClosedException[engine is closed]; ], allocation_status[no_attempt]], expected_shard_size[5040039519], failure 
RemoteTransportException[[elasticsearch-data-1][IP:9300][indices:data/write/bulk[s][r]]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [2452709390/2.2gb], which is larger than the limit of [1972122419/1.8gb], real usage: [2060112120/1.9gb], new bytes reserved: [392597270/374.4mb]]; ], allocation_status[no_attempt]], expected_shard_size[2606804616], failure RemoteTransportException[[elasticsearch-data-0][IP:9300][indices:data/write/bulk[s][r]]]; nested: AlreadyClosedException[engine is closed]; ], allocation_status[no_attempt]], expected_shard_size[4799579998], failure RemoteTransportException[[elasticsearch-data-0][IP:9300][indices:data/write/bulk[s][r]]]; nested: AlreadyClosedException[engine is closed]; ], allocation_status[no_attempt]], expected_shard_size[4012459974], failure RemoteTransportException[[elasticsearch-data-1][IP:9300][indices:data/write/bulk[s][r]]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [2045921066/1.9gb], which is larger than the limit of [1972122419/1.8gb], real usage: [1770141176/1.6gb], new bytes reserved: [275779890/263mb]]; ], allocation_status[no_attempt]], expected_shard_size[3764296412], failure RemoteTransportException[[elasticsearch-data-0][IP:9300][indices:data/write/bulk[s][r]]]; nested: AlreadyClosedException[engine is closed]; ], allocation_status[no_attempt]], expected_shard_size[2631720247], failure RemoteTransportException[[elasticsearch-data-1][IP:9300][indices:data/write/bulk[s][r]]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [2064366222/1.9gb], which is larger than the limit of [1972122419/1.8gb], real usage: [1838754456/1.7gb], new bytes reserved: [225611766/215.1mb]]; ], allocation_status[no_attempt]], expected_shard_size[3255872204], failure RemoteTransportException[[elasticsearch-data-0][IP:9300][indices:data/write/bulk[s][r]]]; nested: 
CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [2132674062/1.9gb], which is larger than the limit of [1972122419/1.8gb], real usage: [1902340880/1.7gb], new bytes reserved: [230333182/219.6mb]]; ], allocation_status[no_attempt]], expected_shard_size[2956220256], failure RemoteTransportException[[elasticsearch-data-1][IP:9300][indices:data/write/bulk[s][r]]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [2092139364/1.9gb], which is larger than the limit of [1972122419/1.8gb], real usage: [1855009224/1.7gb], new bytes reserved: [237130140/226.1mb]]; ], allocation_status[no_attempt]]]"
                },
                {
                    "decider": "same_shard",
                    "decision": "NO",
                    "explanation": "the shard cannot be allocated to the same node on which a copy of the shard already exists [[graph_24_18549][0], node[nodeid2], [P], s[STARTED], a[id=someid]]"
                }
            ],
            "node_decision": "no",
            "node_id": "nodeid2",
            "node_name": "elasticsearch-data-2",
            "transport_address": "IP:9300"
        }
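In the circuit breaker messages above, the "would be" figure is the real usage plus the new bytes reserved, and the parent limit is 1972122419 bytes throughout. The following sketch pulls those numbers out of such text and prints how far each request overshot the limit. The helper name `breaker_overshoot` is hypothetical, and `grep -o` is assumed to be available (it is in GNU and BSD grep).

```shell
# Hypothetical helper: reads text containing CircuitBreakingException
# messages on stdin and prints, for each one, the limit, the real usage,
# the new bytes reserved, and how far (in bytes) the total exceeds the limit.
breaker_overshoot() {
  grep -oE 'limit of \[[0-9]+/[^]]+\], real usage: \[[0-9]+/[^]]+\], new bytes reserved: \[[0-9]+' |
    sed -E 's/limit of \[([0-9]+)[^]]*\], real usage: \[([0-9]+)[^]]*\], new bytes reserved: \[([0-9]+)/\1 \2 \3/' |
    while read -r limit real reserved; do
      echo "limit=$limit real=$real reserved=$reserved over=$(( real + reserved - limit ))"
    done
}

# Usage against a live cluster (assumed address):
#   curl -s 'http://localhost:9200/_cluster/allocation/explain?pretty' | breaker_overshoot
# Current breaker usage per node is also available from:
#   curl -s 'http://localhost:9200/_nodes/stats/breaker?pretty'
```

A small overshoot with a small reservation suggests the heap was already close to the limit before the request arrived, while a large reservation points at the size of the bulk request itself.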

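What needs to be done now? I don't see the heap shooting up. I have already tried the API call below; it helps and assigns all the unassigned shards, but the problem reappears every few hours.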

curl -XPOST ':9200/_cluster/reroute?retry_failed'
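Before retrying, it can help to record which shards are unassigned and why, so that recurring failures can be compared over time. This is a minimal sketch, assuming Elasticsearch on localhost:9200 and the column order shown in the usage comment; the helper name `list_unassigned` is hypothetical.

```shell
# Hypothetical helper: reads `_cat/shards` rows in the column order
# "index shard prirep state unassigned.reason" on stdin and prints only
# the unassigned shards together with their reason.
list_unassigned() {
  awk '$4 == "UNASSIGNED" { print $1, $2, $3, $5 }'
}

# Usage against a live cluster (assumed address):
#   curl -s 'http://localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason' | list_unassigned
#   curl -s -XPOST 'http://localhost:9200/_cluster/reroute?retry_failed=true'
```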

司寇星海

2023-03-14

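Which Elasticsearch version are you using? 7.9.1 and 7.10.1 handle retries of replication that failed due to CircuitBreakingException better, and have improved indexing pressure handling.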

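I suggest you try upgrading your cluster. Version 7.10.1 seems to have fixed this issue for me. See more: Help with unassigned shards / CircuitBreakingException / Values less than -1 bytes are not supported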
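To check whether a cluster is already on the version mentioned above, the version number can be read from the root endpoint and compared. This is a minimal sketch, assuming Elasticsearch on localhost:9200 and plain numeric x.y.z versions; the helper names `extract_version` and `version_at_least` are hypothetical.

```shell
# Hypothetical helper: reads the JSON from the root endpoint on stdin
# and prints the value of the "number" field under "version".
extract_version() {
  sed -n 's/.*"number" *: *"\([^"]*\)".*/\1/p'
}

# Hypothetical helper: succeeds if version $1 is greater than or equal
# to version $2, comparing major, minor, and patch numerically.
version_at_least() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    split(a, x, "."); split(b, y, ".")
    for (i = 1; i <= 3; i++) {
      if (x[i] + 0 > y[i] + 0) exit 0
      if (x[i] + 0 < y[i] + 0) exit 1
    }
    exit 0
  }'
}

# Usage against a live cluster (assumed address):
#   v=$(curl -s 'http://localhost:9200' | extract_version)
#   version_at_least "$v" 7.10.1 || echo "cluster is on $v, older than 7.10.1"
```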
