Redis trouble15 -- unable to failover: check 'cluster-replica-validity-factor' configuration option

黎承颜
2023-12-01

1. Problem Description

The title is too short to hold the whole message; the complete error is:

Currently unable to failover: Disconnected from master for longer than allowed. Please check the 'cluster-replica-validity-factor' configuration option.

This was originally a three-master, three-replica cluster. One master and its replica went down at the same time, and the cluster state became fail. After the replica was started again, it did not promote itself to master as we expected, and the cluster state stayed fail. This is expected behavior rather than a bug with a fix; the goal of this article is to analyze under which conditions a replica will refuse to take over from its master.

1:S 18 Aug 2021 17:00:07.622 # Error condition on socket for SYNC: Connection refused
1:S 18 Aug 2021 17:00:08.626 * Connecting to MASTER 127.0.0.1:8518
1:S 18 Aug 2021 17:00:08.626 * MASTER <-> REPLICA sync started
1:S 18 Aug 2021 17:00:08.626 # Error condition on socket for SYNC: Connection refused
1:S 18 Aug 2021 17:00:09.528 # Currently unable to failover: Disconnected from master for longer than allowed. Please check the 'cluster-replica-validity-factor' configuration option.
1:S 18 Aug 2021 17:00:09.628 * Connecting to MASTER 127.0.0.1:8518
1:S 18 Aug 2021 17:00:09.628 * MASTER <-> REPLICA sync started
1:S 18 Aug 2021 17:00:09.628 # Error condition on socket for SYNC: Connection refused
1:S 18 Aug 2021 17:00:10.631 * Connecting to MASTER 127.0.0.1:8518
1:S 18 Aug 2021 17:00:10.631 * MASTER <-> REPLICA sync started
1:S 18 Aug 2021 17:00:10.631 # Error condition on socket for SYNC: Connection refused
1:S 18 Aug 2021 17:00:11.636 * Connecting to MASTER 127.0.0.1:8518
1:S 18 Aug 2021 17:00:11.636 * MASTER <-> REPLICA sync started
1:S 18 Aug 2021 17:00:11.636 # Error condition on socket for SYNC: Connection refused
1:S 18 Aug 2021 17:00:12.640 * Connecting to MASTER 127.0.0.1:8518

2. Under What Conditions Will Failover Not Happen

The source-code analysis below shows that once the replication link between replica and master has been broken long enough (with default settings, once the computed data_age exceeds 160 s), the replica will no longer promote itself; a manual failover (e.g. issuing CLUSTER FAILOVER FORCE on the replica) is then required to switch over.
The rule is that the replication link between slave and master must not have been down for longer than a configurable bound, which ensures the slave's data is reasonably complete. Operationally this means you cannot leave a slave unavailable for a long time; monitoring should detect an abnormal slave and restore it promptly.

Computing the failover timeout
The timeout is max(cluster_node_timeout*2, 2000) milliseconds.
With the default cluster-node-timeout of 15000 ms this is 15000*2, so auth_timeout defaults to 30 s and auth_retry_time (twice the timeout) defaults to 60 s. A quick numeric check follows the source below.

 /* Compute the failover timeout (the max time we have to send votes
     * and wait for replies), and the failover retry time (the time to wait
     * before trying to get voted again).
     *
     * Timeout is MAX(NODE_TIMEOUT*2,2000) milliseconds.
     * Retry is two times the Timeout.
     */
    auth_timeout = server.cluster_node_timeout*2;
    if (auth_timeout < 2000) auth_timeout = 2000;
    auth_retry_time = auth_timeout*2;
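
As a sanity check, here is a standalone C sketch (not Redis code; the only input is the stock default cluster-node-timeout of 15000 ms) that reproduces the arithmetic above:

    #include <stdio.h>

    int main(void) {
        long long cluster_node_timeout = 15000; /* stock default, in ms */

        long long auth_timeout = cluster_node_timeout * 2;
        if (auth_timeout < 2000) auth_timeout = 2000; /* floor of 2000 ms */
        long long auth_retry_time = auth_timeout * 2;

        printf("auth_timeout    = %lld ms\n", auth_timeout);    /* 30000 */
        printf("auth_retry_time = %lld ms\n", auth_retry_time); /* 60000 */
        return 0;
    }

Compiled and run, it prints auth_timeout = 30000 ms and auth_retry_time = 60000 ms, matching the 30 s / 60 s figures above.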

Failover will not run in any of the following cases:
1) The current node is not a replica but a master.
2) The master is not flagged as fail and this is not a manual failover.
3) The no-failover option is set (cluster-slave-no-failover yes) and this is not a manual failover.
4) The master is serving no slots (so there is nothing to take over).

 /* Pre conditions to run the function, that must be met both in case
     * of an automatic or manual failover:
     * 1) We are a slave.
     * 2) Our master is flagged as FAIL, or this is a manual failover.
     * 3) We don't have the no failover configuration set, and this is
     *    not a manual failover.
     * 4) It is serving slots. */
    if (nodeIsMaster(myself) ||
        myself->slaveof == NULL ||
        (!nodeFailed(myself->slaveof) && !manual_failover) ||
        (server.cluster_slave_no_failover && !manual_failover) ||
        myself->slaveof->numslots == 0)
    {
        /* There are no reasons to failover, so we set the reason why we
         * are returning without failing over to NONE. */
        server.cluster->cant_failover_reason = CLUSTER_CANT_FAILOVER_NONE;
        return;
    }
// data_age: how long we have been disconnected from the master, in ms
	/* Set data_age to the number of seconds we are disconnected from
     * the master. */
    if (server.repl_state == REPL_STATE_CONNECTED) {
        data_age = (mstime_t)(server.unixtime - server.master->lastinteraction)
                   * 1000;
    } else {
        data_age = (mstime_t)(server.unixtime - server.repl_down_since) * 1000;
    }

// Subtract the cluster node timeout from data_age
    /* Remove the node timeout from the data age as it is fine that we are
     * disconnected from our master at least for the time it was down to be
     * flagged as FAIL, that's the baseline. */
    if (data_age > server.cluster_node_timeout)
        data_age -= server.cluster_node_timeout;

Next we check whether our data is recent enough according to the user-configured cluster_slave_validity_factor; the check is bypassed for manual failovers.
Default parameter values:
    cluster-node-timeout 15000
    cluster-replica-validity-factor 10
    repl-ping-replica-period 10
    repl-timeout 60
Failover is refused only when all of the following hold (see the worked sketch after the source below):
    1. cluster_slave_validity_factor is non-zero;
    2. data_age > 10*1000 + 15000*10 = 160000 ms, i.e. 160 s;
    3. this is not a manual failover.

    /* Check if our data is recent enough according to the slave validity
     * factor configured by the user.
     * Check bypassed for manual failovers. */
    if (server.cluster_slave_validity_factor &&
        data_age >
        (((mstime_t)server.repl_ping_slave_period * 1000) +
         (server.cluster_node_timeout * server.cluster_slave_validity_factor)))
    {
        if (!manual_failover) {
            clusterLogCantFailover(CLUSTER_CANT_FAILOVER_DATA_AGE);
            return;
        }
    }
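
To make the 160 s figure concrete, here is a standalone C sketch (not Redis code; the defaults are the stock values listed above, and the outage length is a made-up example) that mirrors the data_age bookkeeping and the validity check:

    #include <stdio.h>

    int main(void) {
        /* Stock defaults, in the units Redis uses internally. */
        long long cluster_node_timeout = 15000;       /* ms */
        long long repl_ping_slave_period = 10;        /* seconds */
        long long cluster_slave_validity_factor = 10;

        /* Hypothetical outage: the replica lost its master 200 s ago. */
        long long disconnected_ms = 200 * 1000;

        long long data_age = disconnected_ms;
        /* As in the source above, the node timeout is subtracted first:
         * being disconnected at least as long as it took to flag the
         * master FAIL is the expected baseline. */
        if (data_age > cluster_node_timeout)
            data_age -= cluster_node_timeout;

        long long max_age = repl_ping_slave_period * 1000 +
                            cluster_node_timeout * cluster_slave_validity_factor;

        printf("max tolerated data_age = %lld ms\n", max_age);   /* 160000 */
        printf("current data_age       = %lld ms -> %s\n", data_age,
               data_age > max_age ? "failover refused" : "failover allowed");
        return 0;
    }

With a 200 s outage, data_age comes out to 185000 ms, above the 160000 ms bound, so automatic failover is refused. Note that because of the node-timeout subtraction, the wall-clock disconnect actually has to exceed 160 s + 15 s = 175 s before the check trips.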

If none of the blocking conditions above applies, the code below implements the actual failover scheduling.
If the previous failover attempt timed out and the retry time has elapsed, a new failover attempt can be scheduled.
A replica starts an election only when its master is in the fail state, and even then not immediately: the election is delayed by a random amount so that multiple replicas do not start elections at the same time (the delay is at least 0.5 s; a small worked sketch follows the source below):
    500 milliseconds + random delay between 0 and 500 milliseconds + SLAVE_RANK * 1000 milliseconds

    /* If the previous failover attempt timedout and the retry time has
     * elapsed, we can setup a new one. */
    if (auth_age > auth_retry_time) {
        server.cluster->failover_auth_time = mstime() +
            500 + /* Fixed delay of 500 milliseconds, let FAIL msg propagate. */
            random() % 500; /* Random delay between 0 and 500 milliseconds. */
        server.cluster->failover_auth_count = 0;
        server.cluster->failover_auth_sent = 0;
        server.cluster->failover_auth_rank = clusterGetSlaveRank();
        /* We add another delay that is proportional to the slave rank.
         * Specifically 1 second * rank. This way slaves that have a probably
         * less updated replication offset, are penalized. */
        server.cluster->failover_auth_time +=
            server.cluster->failover_auth_rank * 1000;
        /* However if this is a manual failover, no delay is needed. */
        if (server.cluster->mf_end) {
            server.cluster->failover_auth_time = mstime();
            server.cluster->failover_auth_rank = 0;
            clusterDoBeforeSleep(CLUSTER_TODO_HANDLE_FAILOVER);
        }
        serverLog(LL_WARNING,
            "Start of election delayed for %lld milliseconds "
            "(rank #%d, offset %lld).",
            server.cluster->failover_auth_time - mstime(),
            server.cluster->failover_auth_rank,
            replicationGetSlaveOffset());
        /* Now that we have a scheduled election, broadcast our offset
         * to all the other slaves so that they'll updated their offsets
         * if our offset is better. */
        clusterBroadcastPong(CLUSTER_BROADCAST_LOCAL_SLAVES);
        return;
    }
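
The delay formula is easy to see in isolation. Here is a standalone C sketch (not Redis code; the ranks are made-up examples) of the scheduling delay:

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    int main(void) {
        srandom((unsigned)time(NULL));
        /* Rank 0 is the replica with the freshest replication offset;
         * staler replicas get a higher rank and thus a longer delay. */
        for (int rank = 0; rank < 3; rank++) {
            long delay = 500 + random() % 500 + rank * 1000L;
            printf("rank #%d: election delayed by %ld ms\n", rank, delay);
        }
        return 0;
    }

Rank 0 lands in the 500-999 ms window, rank 1 in 1500-1999 ms, and so on, which is how replicas with a probably older offset are penalized.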
