Nginx Performance Tuning: Tips & Tricks

施飞昂

2023-12-01

Tuning NGINX Configuration

Please refer to the NGINX reference documentation for details about supported values, default settings, and the scope within which each setting is supported.

SSL

This section describes how to remove slow and unnecessary ciphers from OpenSSL and NGINX.

When SSL performance is paramount, it’s always a good idea to try different key sizes and types in your environment. Find the right balance for your specific security needs between longer keys for increased security and shorter keys for faster performance. An easy test is to move from traditional RSA keys to Elliptic Curve Cryptography (ECC), which uses smaller keys and is therefore computationally faster for the same level of security.

To generate quick, self‑signed ECC P‑256 keys for testing, run these commands:

# openssl ecparam -out ./nginx-ecc-p256.key -name prime256v1 -genkey
# openssl req -new -key ./nginx-ecc-p256.key -out ./nginx-ecc-p256-csr.pem -subj '/CN=localhost'
# openssl req -x509 -nodes -days 30 -key ./nginx-ecc-p256.key -in ./nginx-ecc-p256-csr.pem -out ./nginx-ecc-p256.pem
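
To try the ECC certificate, point a TLS server block at the files generated above. This is a minimal sketch that assumes the key and certificate were copied to a hypothetical /etc/nginx/ssl/ directory. Adjust the paths and server name for your system.

```nginx
# Hypothetical test server using the self-signed ECC P-256 certificate generated above
server {
    listen      443 ssl;
    server_name localhost;

    ssl_certificate     /etc/nginx/ssl/nginx-ecc-p256.pem;   # output of the openssl req -x509 step
    ssl_certificate_key /etc/nginx/ssl/nginx-ecc-p256.key;   # output of the openssl ecparam step

    location / {
        return 200 "ok\n";
    }
}
```

Since NGINX 1.11.0, ssl_certificate and ssl_certificate_key can be specified more than once. This lets you load an RSA and an ECC certificate side by side while you compare them.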

Compression

Gzip parameters provide granular control over how NGINX delivers content, so setting them incorrectly can decrease NGINX performance. Enabling gzip can save bandwidth, improving page load time on slow connections. (In local, synthetic benchmarks, enabling gzip might not show the same benefits as in the real world.) Try these settings for optimum performance:

Enable gzip only for relevant content, such as text, JavaScript, and CSS files
Do not increase the compression level, as this costs CPU effort without a commensurate increase in throughput
Evaluate the effect of enabling compression by enabling and disabling gzip for different types and sizes of content
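
The guidelines above might translate into a configuration like the following sketch. The MIME type list and the gzip_min_length threshold are assumptions; adjust them for your content.

```nginx
http {
    gzip on;

    # Compress only text-like content; text/html is always compressed when gzip is on
    gzip_types text/plain text/css application/javascript application/json;

    # Keep the default (lowest) compression level instead of raising it
    gzip_comp_level 1;

    # Assumed threshold: skip responses whose Content-Length is under 1000 bytes
    gzip_min_length 1000;
}
```

To evaluate the effect, toggle gzip between on and off and compare response sizes and timings for different types and sizes of content.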

More information on granular gzip control can be found in the reference documentation for the NGINX gzip module.

Connection Handling

There are several tuning options related to connection handling. For details on proper syntax, the configuration blocks where each directive applies (http, server, location), and so on, see the reference documentation for each directive.

accept_mutex off – All worker processes are notified about new connections (the default in NGINX 1.11.3 and later, and NGINX Plus R10 and later). If enabled, worker processes accept new connections by turns.

We recommend keeping the default value (off) unless you have extensive knowledge of your app’s performance and the opportunity to test under a variety of conditions. Be aware that the default can lead to inefficient use of system resources when the volume of new connections is low, because all worker processes are notified about each new connection. Changing the value to on might be beneficial under some high loads.

keepalive 128 – Enables keepalive connections from NGINX Plus to upstream servers, and defines the maximum number of idle keepalive connections preserved in the cache of each worker process. When this number is exceeded, the least recently used connections are closed. Without keepalives, NGINX opens a new upstream connection for each request. This adds overhead and uses both connections and ephemeral ports inefficiently.

For HTTP traffic, when you include this directive in an upstream block, you must also include the following directives so that they apply to every location block that proxies traffic to that upstream group. You can place them in each such location block, in the parent server blocks, or at the http level:
- proxy_http_version 1.1 – NGINX Plus uses HTTP/1.1 for proxied requests
- proxy_set_header Connection "" – NGINX Plus strips any Connection headers from the proxied request
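
The following minimal sketch combines the keepalive directive with the two required proxy directives. The upstream name backend and the server addresses are hypothetical.

```nginx
upstream backend {
    server 192.0.2.10:8080;   # hypothetical upstream servers
    server 192.0.2.11:8080;

    # Keep up to 128 idle connections to this group open in each worker process
    keepalive 128;
}

server {
    listen 80;

    location / {
        proxy_pass http://backend;

        # Both are required for upstream keepalive with HTTP
        proxy_http_version 1.1;
        proxy_set_header   Connection "";
    }
}
```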

multi_accept off – A worker process accepts one new connection at a time (the default). If enabled, a worker process accepts all new connections at once.

We recommend keeping the default value (off), unless you’re sure there’s a benefit to changing it. Start performance testing with the default value to better measure predictable scale.
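
Both accept_mutex and multi_accept belong in the events context. The sketch below is a starting configuration that keeps the defaults explicit, with the high-load variant commented out for later testing. The worker_connections value is an assumption.

```nginx
events {
    worker_connections 1024;   # assumed value; size it for your expected load

    # Defaults (NGINX 1.11.3 and later): all workers are notified of new connections,
    # and each worker accepts one new connection at a time
    accept_mutex off;
    multi_accept off;

    # Variant to test only under high load:
    # accept_mutex       on;
    # accept_mutex_delay 500ms;   # max wait before a worker retries accepting (default)
    # multi_accept       on;
}
```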

proxy_buffering on – NGINX Plus receives a response from the proxied server as soon as possible and buffers it (the default). If disabled, NGINX Plus passes the response to the client synchronously, as soon as it is received, which increases the load on NGINX Plus.

Disabling response buffering is necessary only for applications that need immediate access to the data stream.
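
This sketch keeps the default buffering for most traffic and disables it only for one endpoint that streams data. The upstream group backend and the /stream/ path are hypothetical.

```nginx
server {
    listen 80;

    # Default behavior: buffer responses from the proxied server
    location / {
        proxy_pass      http://backend;
        proxy_buffering on;
    }

    # Hypothetical endpoint whose clients need immediate access to the data stream
    location /stream/ {
        proxy_pass      http://backend;
        proxy_buffering off;
    }
}
```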

listen 80 reuseport – Enables port sharding. An individual listening socket is created for each worker process (using the SO_REUSEPORT socket option), which allows the kernel to distribute incoming connections among worker processes. For details, see the post Socket Sharding in NGINX Release 1.9.1 on the NGINX blog.
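
Here is a sketch of port sharding with two hypothetical virtual servers sharing port 80. Socket options such as reuseport can be set only once for a given address:port pair, so the second server uses a plain listen directive.

```nginx
http {
    server {
        # One listening socket per worker process (SO_REUSEPORT);
        # the kernel distributes new connections among them
        listen 80 reuseport;
        server_name www.example.com;
    }

    server {
        # Same address:port, so reuseport is not repeated here
        listen 80;
        server_name api.example.com;
    }
}
```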

Logging

Logging is an important tool for managing and auditing your system. Logging large amounts of data, and storing large logs, can strain system resources, but we recommend that you disable logging only in very specific cases or for performance troubleshooting.

access_log off – Disables access logging.
access_log /path/to/access.log main buffer=16k – Enables buffering to access logs.
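
The main format used above must be defined with log_format. The sketch below uses a typical definition. It enables buffering globally and turns logging off only for a hypothetical health-check endpoint. The flush=5s value is an assumption that bounds how long entries can wait in the buffer.

```nginx
http {
    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"';

    # Write log entries in 16 KB batches, or at least every 5 seconds
    access_log /var/log/nginx/access.log main buffer=16k flush=5s;

    server {
        listen 80;

        # Hypothetical health-check endpoint whose requests need not be logged
        location = /healthz {
            access_log off;
            return 200;
        }
    }
}
```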

You may benefit from a centralized logging system based on the syslog protocol, available from many open source projects and commercial vendors. If you need metrics (which aggregate information initially recorded in logs) for NGINX and NGINX Plus servers, you can use NGINX Amplify.
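
This minimal sketch sends logs to a central syslog server. The server address is hypothetical, and main must be defined with log_format.

```nginx
http {
    # Hypothetical central syslog collector; 514 is the standard syslog port
    access_log syslog:server=192.0.2.50:514,tag=nginx main;
    error_log  syslog:server=192.0.2.50:514 warn;
}
```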

Thread Pooling

Thread pooling consists of a task queue and a number of threads that handle the queue. When a worker process needs to do a potentially long operation, instead of processing the operation by itself, it puts a task in the pool’s queue, from which it can be taken and processed by any free thread.

To enable thread pooling, include the aio threads directive. Note that the way thread pools are managed can be affected by other buffer-related configuration settings. For complete information on tweaking other settings to support thread pooling, see the NGINX blog.
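
This minimal sketch assumes NGINX was built with thread support (the --with-threads configure option) and that large files are served from a hypothetical /downloads/ location. The pool name disk_pool and its sizes are illustrative.

```nginx
# Main context: define a named pool
# (the built-in "default" pool uses threads=32 max_queue=65536)
thread_pool disk_pool threads=16 max_queue=65536;

http {
    server {
        listen 80;

        location /downloads/ {
            root /var/www;            # hypothetical document root

            # Offload blocking file reads to threads from disk_pool
            aio threads=disk_pool;
        }
    }
}
```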

CPU Affinity

CPU affinity is used to control which CPUs NGINX Plus utilizes for individual worker processes (for background information, see the reference documentation for worker_cpu_affinity).

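In most cases, we recommend setting the worker_processes directive to auto, which sets the number of worker processes to match the number of available CPU cores. The directive’s built-in default is 1, so set auto explicitly unless your packaged configuration already does so.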

However, when NGINX Plus runs in a containerized environment such as Docker, a system administrator might choose to assign fewer cores to the container than are available on the host machine. In this case, NGINX Plus detects the number of cores on the host and rotates workers among the cores that are actually available within the container. To avoid this, set worker_processes to the number of cores available in the container.
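
Suppose a container is limited to two cores, for example with docker run --cpuset-cpus="0,1" (an assumed setup). The main-context settings might then be:

```nginx
# Match the number of workers to the cores the container can actually use
worker_processes 2;

# Optional: bind each worker to one of the available CPUs
worker_cpu_affinity auto;
```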

Testing CPU Affinity

It’s always best to load NGINX Plus with traffic similar to your production traffic. However, for basic testing, you can use a load generator such as wrk, as described here.

Load NGINX Plus with a quick wrk session:

# wrk -t 1 -c 50 -d 20s http://localhost/1k.bin

If necessary, you can create a simple 1 KB test file with the name used in the URL above. Run the command in the directory that NGINX serves as its document root:

# dd if=/dev/zero of=1k.bin bs=1024 count=1

While the test runs, start top and press 1 to switch to the per-CPU view.

You can repeat the test with different numbers of worker processes and affinity bindings to check whether throughput scales linearly with the number of cores. This is an effective way to find the appropriate subset of available cores to use.
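
You can express explicit bindings for such test runs as worker_cpu_affinity bitmasks. Each mask belongs to one worker, and the rightmost bit is CPU 0. The two runs below are hypothetical examples for a host with at least four cores. Enable one at a time and reload NGINX between runs.

```nginx
# Run 1: two workers, pinned to CPU 0 and CPU 1
worker_processes    2;
worker_cpu_affinity 01 10;

# Run 2: four workers, pinned to CPUs 0 through 3
# worker_processes    4;
# worker_cpu_affinity 0001 0010 0100 1000;
```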

Sizing Recommendations

Here’s a very rough sizing approximation for general web serving and load balancing. The values might be less appropriate for VOD streaming or CDNs.

CPU

Allocate 1 CPU core per 1–2 Gbps of unencrypted traffic.

Small (1–2 KB) responses and 1 response per connection increase CPU load.
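
As a worked example of the CPU rule of thumb, assume a hypothetical 10 Gbps of unencrypted traffic:

```latex
\text{cores} \approx \frac{10\ \text{Gbps}}{2\ \text{Gbps per core}} = 5
\quad\text{to}\quad
\frac{10\ \text{Gbps}}{1\ \text{Gbps per core}} = 10
```

Workloads dominated by small responses or one response per connection push the estimate toward the upper end of this range.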

RAM

Allocate 1 GB for the OS and other general needs.

The rest is divided among NGINX Plus buffers, socket buffers, and virtual memory cache, with a rough estimate of 1 MB per connection.
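
For example, with an assumed 10,000 concurrent connections:

```latex
\text{RAM} \approx 1\ \text{GB} + 10\,000 \times 1\ \text{MB} \approx 1\ \text{GB} + 10\ \text{GB} = 11\ \text{GB}
```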

Details

proxy_buffers (per connection)
Choose the proxy_buffers size to avoid disk I/O. If a response is larger than the total of proxy_buffers plus proxy_buffer_size, it may be written to a temporary file on disk, which increases I/O, response time, and so on.
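
The following sketch shows the arithmetic. It assumes an upstream group named backend, and the buffer sizes are chosen only for illustration.

```nginx
location / {
    proxy_pass http://backend;

    proxy_buffer_size 8k;      # first part of the response (the headers)
    proxy_buffers     16 8k;   # 16 x 8 KB = 128 KB per connection

    # Responses up to roughly 8 KB + 128 KB = 136 KB can be held in memory;
    # larger ones may be written to a temporary file on disk.
}
```

Larger buffer settings increase per-connection memory, so weigh them against the per-connection RAM estimate in the sizing recommendations.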

Shared Memory Zones

At the most visible level, zones store shared data about upstream servers, such as status, metrics, cookies, health checks, and so on.

However, zones can also affect how NGINX Plus distributes load between various components, such as worker processes. For full documentation on what a zone stores and affects, see the NGINX Plus Admin Guide.

It’s not possible to prescribe exact settings because usage patterns differ so widely. Each feature affects the zone size, for example session persistence with the sticky directive, health checks, or DNS re-resolving. As an example, with the sticky route session-persistence method and a single health check enabled, a 256-KB zone can accommodate information about the following numbers of upstream servers:

128 servers (each defined as an IP‑address:port pair)
88 servers (each defined as hostname:port pair where the hostname resolves to a single IP address)
12 servers (each defined as hostname:port pair where the hostname resolves to multiple IP addresses)

When creating zones, note that each shared memory area is identified by the name of its zone. If you use the same name for all zones, data from all upstream groups is stored in that one zone, and its size may be exceeded.
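
The sketch below gives each upstream group its own zone name, so each group has a separate shared memory area. The group names, addresses, zone sizes, and the route cookie are hypothetical. sticky and health_check are NGINX Plus features.

```nginx
upstream app_backend {
    zone app_backend 256k;          # separate zone for this group
    server 192.0.2.10:8080 route=a;
    server 192.0.2.11:8080 route=b;

    # Route ID taken from a cookie named "route" (an assumption)
    sticky route $cookie_route;
}

upstream api_backend {
    zone api_backend 64k;           # different name, so data is not mixed with app_backend
    server 192.0.2.20:9000;
}

server {
    listen 80;

    location / {
        proxy_pass http://app_backend;
        health_check;               # active health checks require the group's zone
    }

    location /api/ {
        proxy_pass http://api_backend;
    }
}
```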

Disk I/O

The limiting factor for disk I/O is the number of I/O operations per second (IOPS).

NGINX Plus depends on disk I/O and IOPS for a number of functions, including logging and caching.
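
When caching is enabled, you can reduce disk I/O by letting NGINX write cached responses directly into the cache directory instead of a separate temporary directory. This minimal sketch assumes a hypothetical cache path, zone name, and sizes, plus an upstream group named backend.

```nginx
http {
    # use_temp_path=off puts temporary files directly in the cache directory,
    # avoiding a copy when the temp and cache paths are on different file systems
    proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=app_cache:10m
                     max_size=10g inactive=60m use_temp_path=off;

    server {
        listen 80;

        location / {
            proxy_pass  http://backend;
            proxy_cache app_cache;
        }
    }
}
```

For logging, the buffer= parameter of access_log similarly reduces the number of write operations.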