Using NGINX as an Accelerating Proxy for HTTP Servers


Have you ever benchmarked a server in the lab and then deployed it for real traffic, only to find that it can’t achieve anything close to the benchmark performance? CPU utilization is low and there are plenty of free resources, but clients complain of slow response times and you can’t figure out how to get better utilization from the server?

What you’re observing is one effect of “HTTP Heavy Lifting”. In this blog post, we’ll investigate how HTTP operates and how common HTTP servers process HTTP transactions. We’ll look at some of the performance problems that can occur, and see how NGINX’s event‑driven model makes it a very effective accelerating proxy for these HTTP servers. With NGINX, you can transform your real‑world performance so it’s back to the level of your local benchmarks.

To learn how NGINX can improve speed and scalability of your applications, read our blog post Tuning NGINX for Performance for a breakdown of configurations.

An Introduction to HTTP and Keepalive Connections

HTTP keepalive connections are a necessary performance feature that reduce latency and allow web pages to load faster.

HTTP is a simple, text‑based protocol. If you’ve not done so before, take a look at the output from an HTTP debugging tool such as the one in your web browser, and check out the standard request and response structure:

developer tools screenshot

In its simplest implementation, an HTTP client creates a new TCP connection to the destination server, writes the request, and receives the response. The server then closes the TCP connection to release resources.

This mode of operation can be very inefficient, particularly for complex web pages with a large number of elements or when network links are slow. Creating a new TCP connection requires a ‘three‑way handshake’, and tearing it down also involves a two‑way shutdown procedure. Repeatedly creating and closing TCP connections, one for each message, is akin to hanging up and redialing a phone call between each exchange in a conversation.

HTTP uses a mechanism called keepalive connections to hold open the TCP connection between the client and the server after an HTTP transaction has completed. If the client needs to conduct another HTTP transaction, it can use the idle keepalive connection rather than creating a new TCP connection.

Clients generally open a number of simultaneous TCP connections to a server and conduct keepalive transactions across them all. These connections are held open until either the client or the server decides they are no longer needed, generally as a result of an idle timeout.

Modern web browsers typically open 6 to 8 keepalive connections and hold them open for several minutes before timing them out. Web servers may be configured to time these connections out and close them sooner.

What is the Effect of Keepalives on the HTTP Server?

If lots of clients use HTTP keepalives and the web server has a concurrency limit or scalability problem, then performance plummets once that limit is reached.

The approach above is designed to give the best possible performance for an individual client. Unfortunately, in a ‘tragedy of the commons’‑like scenario, if all clients operate in this way, it can have a detrimental effect on the performance of many common web servers and web applications.

The reason is that many servers have a fixed concurrency limit. For example, in common configurations, the Apache HTTP Server can only process 150 (with the worker multiprocessing module [MPM]) or 256 (with the prefork MPM) concurrent TCP connections. Each idle HTTP keepalive connection consumes one of these concurrency slots, and once all of the slots are occupied, the server cannot accept any more HTTP connections.

Conventional wisdom says to turn off keepalives on the web server, or limit them to a very short timeout. They provide a very simple vector for the Keep-Dead and Slowloris denial-of-service attacks (for a quick solution, see Protecting against Keep-Dead Denial of Service at

Furthermore, these web and application servers typically allocate an operating system thread or process for each connection. A TCP connection is a very lightweight operating system object, but a thread or process is very heavyweight. Threads and processes require memory, they must be actively managed by the operating system, and ‘context switching’ between threads or processes consumes CPU. Assigning each connection its own thread or process is hugely inefficient.

The large number of concurrent client connections and the assignment of a thread or process to each connection produces the phenomenon known as “HTTP Heavy Lifting” – a disproportionately large effort is required to process a lightweight HTTP transaction.

What Does This Mean in Practice?

It does not take many clients to exhaust the concurrency limit in many contemporary web and application servers.

If a client opens 8 TCP connections, and keeps them alive for 15 seconds after they are needed, the client consumes 8 concurrency slots for 15 seconds. If clients arrive at your website at the rate of 1 per second, 120 concurrency slots are continually occupied by idle keepalive connections. If the rate is 2 clients per second, 240 concurrency slots are occupied. Once the slots are exhausted, clients can no longer connect until the current connections time out.

This can result in very uneven levels of service. Clients who successfully acquire a keepalive connection can browse your service at will. Clients who are locked out have to wait in a queue.

Why Do You Not See These Effects During Benchmark Testing?

These problems only manifest themselves in slow networks with many clients. They don’t appear when benchmarking with a single client over a fast local network.

There are a couple of reasons why you may not see these effects in a benchmark.

Note that most benchmark tools only report on successful transactions. Connections that are stalled because of resource exhaustion may not be reported, or may only appear to be a tiny fraction of the successful connections. This conceals the true nature of the problem with real‑world traffic.

How Common Is The Problem?

Any thread‑ or process‑based web or application server is vulnerable to concurrency limitations.

This problem is inherent to any web or application platform that assigns a thread or process to each connection. It’s not easy to detect in an optimized benchmark environment, but it manifests itself as poor performance and poor CPU utilization in a real‑world environment.

There are several measures you can take to address this problem:

Use NGINX as an Accelerating HTTP Proxy

NGINX uses a different architecture that does not suffer from the concurrency problems described above. It transforms slow client connections to optimized benchmark‑like connections to extract the best performance from your servers.

NGINX uses a highly efficient event‑driven model to manage connections.

Each NGINX process can handle multiple connections at the same time. When a new connection is accepted, the overhead is very low (a new file descriptor and a new event to poll for), unlike the per‑process or per‑thread model described above. NGINX has a very effective event loop:

This allows each NGINX process to easily scale to tens or thousands (or hundreds of thousands) of connections simultaneously.

NGINX then proxies these requests to the upstream server. When it does so, NGINX uses a local pool of keepalive connections. There’s no TCP open or close overhead, and the TCP stacks quickly adapt to the optimal window size and retry parameters. Writing requests and reading responses is much more rapid over the local, optimized network:


The net effect is that the upstream server finds itself talking to a single local client over a fast network; a client that makes optimal use of HTTP keepalive connections to minimize connection setup without holding connections open unnecessarily. This puts the server back into its optimal, benchmark‑like environment.

With NGINX acting as an HTTP proxy, you see:

Other Ways NGINX Can Accelerate Services

Removing the burden of HTTP heavy lifting is only one of the performance‑transforming measures that NGINX can bring to bear on your overloaded application infrastructure.

NGINX’s HTTP‑caching feature can cache responses from the upstream servers, following the standard cache semantics to control what is cached and for how long. If several clients request the same resource, NGINX can respond from its cache and not burden upstream servers with duplicate requests.

NGINX can also offload other operations from the upstream server. You can offload compression to reduce bandwidth, centralize SSL decryption, perform initial authentication, and apply all manner of rules to rate limit traffic when necessary.

Not Your Typical Load Balancer or ADC

Finally, do not forget that unlike other accelerating proxies, load balancers or application delivery controllers (ADCs), NGINX is also a full web server. You can use NGINX to serve static content, execute PHP, Java, Ruby, Python and other applications, deliver media (audio and video), integrate with authentication and security systems, and even respond to transactions directly using rules embedded in the NGINX configuration.

With no built‑in performance limitations, NGINX and NGINX Plus take full advantage of the hardware you deploy it on, now and in the future.

To try NGINX Plus, start your free 30-day trial today or contact us for a demo.

Retrieved by Nick Shadrin from website.