
Keeping Many Long-Lived Connections with the Go HTTP Client (Custom Client Settings and a Brief Source Walkthrough)

墨翔宇
2023-12-01

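I. Origin of the problem

Our production server-to-server service was accumulating a large number of sockets in TIME_WAIT. netstat showed that new connections were being established continuously instead of existing ones being kept open. A TCP capture confirmed that keep-alive was set in both the request and the response, yet each TCP connection was closed after handling about six HTTP requests.

While working on this problem I read through part of the http client source code.

II. Basic structure of the HTTP client

1. A simple HTTP client definition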

httpClient := &http.Client{
	Transport: trans,
	Timeout:   config.Client_Timeout * time.Millisecond,
}

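Timeout: the time limit for the whole exchange, from issuing the request until the response has been read completely.

Transport: an http.RoundTripper, the interface responsible for dispatching HTTP requests. The concrete implementation is the Transport struct in net/http/transport.go, which, besides dispatching requests, also manages idle connections. If no Transport is specified when the client is created, the default configuration (DefaultTransport) is used.

2. DefaultTransport definition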

// DefaultTransport is the default implementation of Transport and is
// used by DefaultClient. It establishes network connections as needed
// and caches them for reuse by subsequent calls. It uses HTTP proxies
// as directed by the $HTTP_PROXY and $NO_PROXY (or $http_proxy and
// $no_proxy) environment variables.
var DefaultTransport RoundTripper = &Transport{
	Proxy: ProxyFromEnvironment,
	DialContext: (&net.Dialer{
		Timeout:   30 * time.Second,
		KeepAlive: 30 * time.Second,
		DualStack: true,
	}).DialContext,
	MaxIdleConns:          100,
	IdleConnTimeout:       90 * time.Second,
	TLSHandshakeTimeout:   10 * time.Second,
	ExpectContinueTimeout: 1 * time.Second,
}
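net.Dialer.Timeout: the timeout for establishing the connection.

net.Dialer.KeepAlive: enables keep-alive on the TCP connection and sets its period. Note that this is keep-alive at the TCP level; reuse of connections between HTTP requests is on by default in Transport and is switched off only through DisableKeepAlives.

http.Transport.TLSHandshakeTimeout: limits the time spent on the TLS handshake.
http.Transport.ExpectContinueTimeout: limits how long the client waits for the server's go-ahead response after sending request headers that contain "Expect: 100-continue".

http.Transport.MaxIdleConns: the maximum number of idle (keep-alive) connections across all hosts. Zero means no limit.

http.Transport.IdleConnTimeout: the maximum time a connection may stay idle; after that it is closed.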

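III. Tracking the problem: the keepAlive setting

1. After defining a custom Transport modeled on DefaultTransport, no amount of parameter tuning fixed the production problem. I suspected that I had misunderstood the keepAlive parameter, so I went on to read how the source code uses it.

2. In net/dial.go of the net package, new connections are created by func (d *Dialer) DialContext(), which contains the following snippet: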

	if tc, ok := c.(*TCPConn); ok && d.KeepAlive > 0 {
		setKeepAlive(tc.fd, true)
		setKeepAlivePeriod(tc.fd, d.KeepAlive)
		testHookSetKeepAlive()
	}
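The single keepalive value set on the Dialer is split into two parts: an on/off switch and a keep-alive period. Following the source further down leads to system calls; the key lines are: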

setKeepAlive():
syscall.SetsockoptInt(fd.sysfd, syscall.SOL_SOCKET, syscall.SO_KEEPALIVE, boolint(keepalive))
setKeepAlivePeriod():  
syscall.SetsockoptInt(fd.sysfd, syscall.IPPROTO_TCP, sysTCP_KEEPINTVL, secs)  
syscall.SetsockoptInt(fd.sysfd, syscall.IPPROTO_TCP, syscall.TCP_KEEPALIVE, secs) 
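Roughly: first the SO_KEEPALIVE option is enabled on the socket (at the SOL_SOCKET level); then TCP_KEEPINTVL and TCP_KEEPALIVE are both set to the same duration. (TCP_KEEPALIVE is the macOS name of the idle-time option; on Linux the corresponding option is TCP_KEEPIDLE.)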

3. Linux exposes three TCP keep-alive parameters; they are documented in man 7 tcp:

 tcp_keepalive_intvl (integer; default: 75; since Linux 2.4)
     The number of seconds between TCP keep-alive probes.

 tcp_keepalive_probes (integer; default: 9; since Linux 2.2)
     The  maximum number of TCP keep-alive probes to send before giving up and killing the connection if no response is obtained
     from the other end.

tcp_keepalive_time (integer; default: 7200; since Linux 2.2)
     The number of seconds a connection needs to be idle before TCP begins sending out keep-alive probes.  Keep-alives are  only
     sent  when  the SO_KEEPALIVE socket option is enabled.  The default value is 7200 seconds (2 hours).  An idle connection is
     terminated after approximately an additional 11 minutes (9 probes an interval of  75  seconds  apart)  when  keep-alive  is
     enabled.

     Note that underlying connection tracking mechanisms and application timeouts may be much shorter.
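In short: to use the keep-alive mechanism, SO_KEEPALIVE must be enabled first. The system then starts sending probes once the connection has been idle for tcp_keepalive_time, and closes the connection when tcp_keepalive_probes consecutive probes go unanswered. tcp_keepalive_intvl is the interval between two probes.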

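With Go's keepalive understood, the application's settings were in theory fine, and in practice raising or lowering the parameter did nothing for the connections that would not stay open. With no better option, I went back to reading the source, starting from Client...

IV. Transport

A client request usually starts at the Do(req *Request) (*Response, error) method, but the request is actually dispatched by the transport's RoundTrip(*Request) (*Response, error) method. Transport is defined as follows: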

// Transport is an implementation of RoundTripper that supports HTTP,
// HTTPS, and HTTP proxies (for either HTTP or HTTPS with CONNECT).
//
// By default, Transport caches connections for future re-use.
// This may leave many open connections when accessing many hosts.
// This behavior can be managed using Transport's CloseIdleConnections method
// and the MaxIdleConnsPerHost and DisableKeepAlives fields.
//
// Transports should be reused instead of created as needed.
// Transports are safe for concurrent use by multiple goroutines.
//
// A Transport is a low-level primitive for making HTTP and HTTPS requests.
// For high-level functionality, such as cookies and redirects, see Client.
//
// Transport uses HTTP/1.1 for HTTP URLs and either HTTP/1.1 or HTTP/2
// for HTTPS URLs, depending on whether the server supports HTTP/2,
// and how the Transport is configured. The DefaultTransport supports HTTP/2.
// To explicitly enable HTTP/2 on a transport, use golang.org/x/net/http2
// and call ConfigureTransport. See the package docs for more about HTTP/2.
type Transport struct {
	idleMu     sync.Mutex
	wantIdle   bool                                // user has requested to close all idle conns
	idleConn   map[connectMethodKey][]*persistConn // most recently used at end
	idleConnCh map[connectMethodKey]chan *persistConn
	idleLRU    connLRU

	reqMu       sync.Mutex
	reqCanceler map[*Request]func(error)

	altMu    sync.Mutex   // guards changing altProto only
	altProto atomic.Value // of nil or map[string]RoundTripper, key is URI scheme

	// Proxy specifies a function to return a proxy for a given
	// Request. If the function returns a non-nil error, the
	// request is aborted with the provided error.
	// If Proxy is nil or returns a nil *URL, no proxy is used.
	Proxy func(*Request) (*url.URL, error)

	// DialContext specifies the dial function for creating unencrypted TCP connections.
	// If DialContext is nil (and the deprecated Dial below is also nil),
	// then the transport dials using package net.
	DialContext func(ctx context.Context, network, addr string) (net.Conn, error)

	// Dial specifies the dial function for creating unencrypted TCP connections.
	//
	// Deprecated: Use DialContext instead, which allows the transport
	// to cancel dials as soon as they are no longer needed.
	// If both are set, DialContext takes priority.
	Dial func(network, addr string) (net.Conn, error)

	// DialTLS specifies an optional dial function for creating
	// TLS connections for non-proxied HTTPS requests.
	//
	// If DialTLS is nil, Dial and TLSClientConfig are used.
	//
	// If DialTLS is set, the Dial hook is not used for HTTPS
	// requests and the TLSClientConfig and TLSHandshakeTimeout
	// are ignored. The returned net.Conn is assumed to already be
	// past the TLS handshake.
	DialTLS func(network, addr string) (net.Conn, error)

	// TLSClientConfig specifies the TLS configuration to use with
	// tls.Client.
	// If nil, the default configuration is used.
	// If non-nil, HTTP/2 support may not be enabled by default.
	TLSClientConfig *tls.Config

	// TLSHandshakeTimeout specifies the maximum amount of time waiting to
	// wait for a TLS handshake. Zero means no timeout.
	TLSHandshakeTimeout time.Duration

	// DisableKeepAlives, if true, prevents re-use of TCP connections
	// between different HTTP requests.
	DisableKeepAlives bool

	// DisableCompression, if true, prevents the Transport from
	// requesting compression with an "Accept-Encoding: gzip"
	// request header when the Request contains no existing
	// Accept-Encoding value. If the Transport requests gzip on
	// its own and gets a gzipped response, it's transparently
	// decoded in the Response.Body. However, if the user
	// explicitly requested gzip it is not automatically
	// uncompressed.
	DisableCompression bool

	// MaxIdleConns controls the maximum number of idle (keep-alive)
	// connections across all hosts. Zero means no limit.
	MaxIdleConns int

	// MaxIdleConnsPerHost, if non-zero, controls the maximum idle
	// (keep-alive) connections to keep per-host. If zero,
	// DefaultMaxIdleConnsPerHost is used.
	MaxIdleConnsPerHost int

	// IdleConnTimeout is the maximum amount of time an idle
	// (keep-alive) connection will remain idle before closing
	// itself.
	// Zero means no limit.
	IdleConnTimeout time.Duration

	// ResponseHeaderTimeout, if non-zero, specifies the amount of
	// time to wait for a server's response headers after fully
	// writing the request (including its body, if any). This
	// time does not include the time to read the response body.
	ResponseHeaderTimeout time.Duration

	// ExpectContinueTimeout, if non-zero, specifies the amount of
	// time to wait for a server's first response headers after fully
	// writing the request headers if the request has an
	// "Expect: 100-continue" header. Zero means no timeout and
	// causes the body to be sent immediately, without
	// waiting for the server to approve.
	// This time does not include the time to send the request header.
	ExpectContinueTimeout time.Duration

	// TLSNextProto specifies how the Transport switches to an
	// alternate protocol (such as HTTP/2) after a TLS NPN/ALPN
	// protocol negotiation. If Transport dials an TLS connection
	// with a non-empty protocol name and TLSNextProto contains a
	// map entry for that key (such as "h2"), then the func is
	// called with the request's authority (such as "example.com"
	// or "example.com:1234") and the TLS connection. The function
	// must return a RoundTripper that then handles the request.
	// If TLSNextProto is not nil, HTTP/2 support is not enabled
	// automatically.
	TLSNextProto map[string]func(authority string, c *tls.Conn) RoundTripper

	// ProxyConnectHeader optionally specifies headers to send to
	// proxies during CONNECT requests.
	ProxyConnectHeader Header

	// MaxResponseHeaderBytes specifies a limit on how many
	// response bytes are allowed in the server's response
	// header.
	//
	// Zero means to use a default limit.
	MaxResponseHeaderBytes int64

	// nextProtoOnce guards initialization of TLSNextProto and
	// h2transport (via onceSetNextProtoDefaults)
	nextProtoOnce sync.Once
	h2transport   *http2Transport // non-nil if http2 wired up

	// TODO: tunable on max per-host TCP dials in flight (Issue 13957)
}
In short, Transport is a RoundTripper that supports HTTP, HTTPS, and HTTP proxies, is safe for concurrent use by multiple goroutines, and pools connections by default.
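To make the relationship between Client.Do and RoundTrip concrete, here is a small sketch that wraps a Transport in a custom RoundTripper and counts the calls. countingTransport and doRequests are hypothetical names, and a local httptest server stands in for the real backend.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// countingTransport is a hypothetical RoundTripper that counts calls and
// delegates the real work to the wrapped transport.
type countingTransport struct {
	next  http.RoundTripper
	calls int // not goroutine-safe; sufficient for this sequential sketch
}

func (c *countingTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	c.calls++
	return c.next.RoundTrip(req)
}

// doRequests sends n sequential GET requests through client.Do and returns
// how many times RoundTrip was invoked.
func doRequests(n int) (int, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok")
	}))
	defer srv.Close()

	tr := &http.Transport{}
	defer tr.CloseIdleConnections()
	ct := &countingTransport{next: tr}
	client := &http.Client{Transport: ct}

	for i := 0; i < n; i++ {
		req, err := http.NewRequest(http.MethodGet, srv.URL, nil)
		if err != nil {
			return 0, err
		}
		resp, err := client.Do(req)
		if err != nil {
			return 0, err
		}
		resp.Body.Close()
	}
	return ct.calls, nil
}

func main() {
	calls, err := doRequests(3)
	fmt.Println(calls, err)
}
```

Because every request passes through RoundTrip, the Transport is the single place where connection pooling decisions are made, which is why the investigation continues there.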

The source shows that after a request has been processed, the connection is handed back to the idle pool through the tryPutIdleConn method, which contains this logic:

	idles := t.idleConn[key]
	if len(idles) >= t.maxIdleConnsPerHost() {
		return errTooManyIdleHost
	}
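In other words, idle connections are limited not only by MaxIdleConns but also by MaxIdleConnsPerHost. DefaultTransport does not set the latter, so DefaultMaxIdleConnsPerHost applies, and its value is 2. A connection that cannot be put back into the pool is closed instead of being kept.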
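The effect of the per-host limit can be illustrated with a toy model. This is not the Transport's implementation: idlePool, tryPut, and simulate are hypothetical names, connections are plain integers, and the host key is made up.

```go
package main

import (
	"errors"
	"fmt"
)

var errTooManyIdleHost = errors.New("too many idle connections for host")

// idlePool is a toy stand-in for the Transport's per-host idle list.
type idlePool struct {
	maxPerHost int
	idle       map[string][]int
}

// tryPut keeps conn for host unless the per-host limit is already reached.
func (p *idlePool) tryPut(host string, conn int) error {
	if len(p.idle[host]) >= p.maxPerHost {
		return errTooManyIdleHost // the caller would close the connection
	}
	if p.idle == nil {
		p.idle = make(map[string][]int)
	}
	p.idle[host] = append(p.idle[host], conn)
	return nil
}

// simulate returns how many of n connections to a single host are kept and
// how many are closed when all of them finish their requests.
func simulate(maxPerHost, n int) (kept, closed int) {
	p := &idlePool{maxPerHost: maxPerHost}
	for i := 0; i < n; i++ {
		if err := p.tryPut("backend:8080", i); err != nil {
			closed++
		} else {
			kept++
		}
	}
	return kept, closed
}

func main() {
	fmt.Println(simulate(2, 6))   // 2 4
	fmt.Println(simulate(100, 6)) // 6 0
}
```

With a single destination host and more than two requests in flight, the default limit discards most connections as soon as they become idle, so later requests have to dial again.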

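Our traffic is server to server, so every request goes to the same fixed host. After raising this parameter, the servers now hold a stable set of long-lived connections.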
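One way to check such an adjustment is to count reused connections with net/http/httptrace. The sketch below is an assumption-laden experiment rather than the original service's code: countReused is a hypothetical helper, a local httptest server plays the backend, and overlapping requests are simulated by holding response bodies unread.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httptrace"
)

// countReused runs two rounds of n overlapping requests against one host and
// returns how many connections in the second round came from the idle pool.
func countReused(maxIdlePerHost, n int) (int, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok")
	}))
	defer srv.Close()

	tr := &http.Transport{
		MaxIdleConns:        100,
		MaxIdleConnsPerHost: maxIdlePerHost, // the parameter adjusted in this article
	}
	defer tr.CloseIdleConnections()
	client := &http.Client{Transport: tr}

	reused := 0
	trace := &httptrace.ClientTrace{
		GotConn: func(info httptrace.GotConnInfo) {
			if info.Reused {
				reused++
			}
		},
	}

	for round := 0; round < 2; round++ {
		reused = 0
		bodies := make([]io.ReadCloser, 0, n)
		// An unread body keeps its connection checked out, so the n
		// requests of one round need n distinct connections.
		for i := 0; i < n; i++ {
			req, err := http.NewRequest(http.MethodGet, srv.URL, nil)
			if err != nil {
				return 0, err
			}
			req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))
			resp, err := client.Do(req)
			if err != nil {
				return 0, err
			}
			bodies = append(bodies, resp.Body)
		}
		// Draining a body to EOF returns its connection to the idle
		// pool, or closes it when the per-host limit is reached.
		for _, b := range bodies {
			io.Copy(io.Discard, b)
			b.Close()
		}
	}
	return reused, nil
}

func main() {
	fmt.Println(countReused(2, 6)) // limit equal to the default of 2
	fmt.Println(countReused(6, 6)) // limit raised to the concurrency level
}
```

With the limit at 2, only two connections of the second round are expected to be reused; with the limit raised to the number of concurrent requests, all of them should be. A reasonable starting point for MaxIdleConnsPerHost is therefore the expected number of concurrent requests to the fixed backend.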










