聊聊cortex的Backoff

吕钧

2023-12-01

序

本文主要研究一下cortex的Backoff

Backoff

github.com/cortexproject/cortex/pkg/util/backoff.go

// Backoff implements exponential backoff with randomized wait times
type Backoff struct {
	cfg          BackoffConfig
	ctx          context.Context
	numRetries   int
	nextDelayMin time.Duration
	nextDelayMax time.Duration
}

// NewBackoff creates a Backoff object. Pass a Context that can also terminate the operation.
func NewBackoff(ctx context.Context, cfg BackoffConfig) *Backoff {
	return &Backoff{
		cfg:          cfg,
		ctx:          ctx,
		nextDelayMin: cfg.MinBackoff,
		nextDelayMax: doubleDuration(cfg.MinBackoff, cfg.MaxBackoff),
	}
}

Backoff定义了cfg、ctx、numRetries、nextDelayMin、nextDelayMax属性；NewBackoff提供了基于BackoffConfig的工厂方法，默认的nextDelayMin为cfg.MinBackoff

BackoffConfig

github.com/cortexproject/cortex/pkg/util/backoff.go

// BackoffConfig configures a Backoff
type BackoffConfig struct {
	MinBackoff time.Duration `yaml:"min_period"`  // start backoff at this level
	MaxBackoff time.Duration `yaml:"max_period"`  // increase exponentially to this level
	MaxRetries int           `yaml:"max_retries"` // give up after this many; zero means infinite retries
}

BackoffConfig定义了MinBackoff、MaxBackoff、MaxRetries属性

Ongoing

github.com/cortexproject/cortex/pkg/util/backoff.go

// Reset the Backoff back to its initial condition
func (b *Backoff) Reset() {
	b.numRetries = 0
	b.nextDelayMin = b.cfg.MinBackoff
	b.nextDelayMax = doubleDuration(b.cfg.MinBackoff, b.cfg.MaxBackoff)
}

// Ongoing returns true if caller should keep going
func (b *Backoff) Ongoing() bool {
	// Stop if Context has errored or max retry count is exceeded
	return b.ctx.Err() == nil && (b.cfg.MaxRetries == 0 || b.numRetries < b.cfg.MaxRetries)
}

// Err returns the reason for terminating the backoff, or nil if it didn't terminate
func (b *Backoff) Err() error {
	if b.ctx.Err() != nil {
		return b.ctx.Err()
	}
	if b.cfg.MaxRetries != 0 && b.numRetries >= b.cfg.MaxRetries {
		return fmt.Errorf("terminated after %d retries", b.numRetries)
	}
	return nil
}

// NumRetries returns the number of retries so far
func (b *Backoff) NumRetries() int {
	return b.numRetries
}

// Wait sleeps for the backoff time then increases the retry count and backoff time
// Returns immediately if Context is terminated
func (b *Backoff) Wait() {
	// Increase the number of retries and get the next delay
	sleepTime := b.NextDelay()

	if b.Ongoing() {
		select {
		case <-b.ctx.Done():
		case <-time.After(sleepTime):
		}
	}
}

func (b *Backoff) NextDelay() time.Duration {
	b.numRetries++

	// Handle the edge case the min and max have the same value
	// (or due to some misconfig max is < min)
	if b.nextDelayMin >= b.nextDelayMax {
		return b.nextDelayMin
	}

	// Add a jitter within the next exponential backoff range
	sleepTime := b.nextDelayMin + time.Duration(rand.Int63n(int64(b.nextDelayMax-b.nextDelayMin)))

	// Apply the exponential backoff to calculate the next jitter
	// range, unless we've already reached the max
	if b.nextDelayMax < b.cfg.MaxBackoff {
		b.nextDelayMin = doubleDuration(b.nextDelayMin, b.cfg.MaxBackoff)
		b.nextDelayMax = doubleDuration(b.nextDelayMax, b.cfg.MaxBackoff)
	}

	return sleepTime
}

func doubleDuration(value time.Duration, max time.Duration) time.Duration {
	value = value * 2

	if value <= max {
		return value
	}

	return max
}

Backoff主要提供了Ongoing及Wait方法；Ongoing返回bool用于表示是否可以继续，如果err为nil且b.cfg.MaxRetries或者b.numRetries < b.cfg.MaxRetries返回true；Wait方法会等待执行完成或者是b.NextDelay()时间到；NextDelay方法会递增numRetries然后计算sleepTime；Err方法返回ctx的Err或者是重试次数超限的错误

实例

// NewBackoffRetry gRPC middleware.
func NewBackoffRetry(cfg util.BackoffConfig) grpc.UnaryClientInterceptor {
	return func(ctx context.Context, method string, req, reply interface{}, cc *grpc.ClientConn, invoker grpc.UnaryInvoker, opts ...grpc.CallOption) error {
		backoff := util.NewBackoff(ctx, cfg)
		for backoff.Ongoing() {
			err := invoker(ctx, method, req, reply, cc, opts...)
			if err == nil {
				return nil
			}

			if status.Code(err) != codes.ResourceExhausted {
				return err
			}

			backoff.Wait()
		}
		return backoff.Err()
	}
}

NewBackoffRetry展示了如何使用backoff，通过for循环，条件为backoff.Ongoing()，中间执行要重试的操作，最后执行backoff.Wait()，如果没有提前返回最后返回backoff.Err()

小结

cortex提供了Backoff，可以基于MinBackoff、MaxBackoff、MaxRetries来进行重试。

doc

cortex

聊聊cortex的Backoff

序

Backoff

BackoffConfig

Ongoing

实例

小结

doc

相关阅读

相关文章

相关问答

相关文档