Nginx – Configuring Reverse Proxy


A reverse proxy is an intermediary service that takes a client request, passes it on to one or more servers, and subsequently delivers the server’s response to the client. The client communicates only with the proxy: there is no direct traffic between the client and the backend servers.

NOTE: during implementation you may prefer to run the reverse proxy on a system separate from the other servers (for security, and so that it stays available to other web applications), but you can also run Nginx alongside other web servers, like Apache, on the same server.

A common reverse proxy configuration on Linux systems is to put Nginx in front of an Apache web server.

The main benefits of setting up an Nginx reverse proxy in front of a public web server are:

  • Load Balancing: an Nginx reverse proxy can perform load balancing between several servers, distributing client requests evenly across them and improving overall performance. This kind of configuration helps avoid the scenario where a particular server becomes overloaded due to a sudden spike in requests (DDoS attacks, for example). In terms of failover, load balancing also provides redundancy and availability in case of server failure, as the reverse proxy will simply re-route requests to a different server.
  • Increased Security: A reverse proxy also acts as a line of defense for your backend servers. Configuring a reverse proxy ensures that the identity of your backend servers remains unknown. This can greatly help in protecting your servers from attacks such as DDoS for example.
  • Better Performance: Nginx is known to perform better than Apache at delivering static content. Therefore, with an Nginx reverse proxy, requests for static content can be handled by Nginx itself while requests for dynamic content are passed on to the backend Apache server. This helps improve performance by optimizing the delivery of assets based on their type. Additionally, reverse proxies can serve cached content and perform SSL encryption to take load off the web server(s).
  • Easy Logging and Auditing: Since there is only one single point of access when a reverse proxy is implemented, this makes logging and auditing much simpler. Using this method, you can easily monitor what goes in and out through the reverse proxy.
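As a rough illustration of the load balancing and static content points above, the following sketch shows one way they could look in an Nginx configuration. The upstream name backend_pool, the 192.0.2.x addresses and the /var/www/example path are hypothetical placeholders, not values used elsewhere in this article.

```nginx
# Hypothetical pool of backend servers (for example Apache instances).
# Requests are distributed round-robin by default.
upstream backend_pool {
    server 192.0.2.10:8080;
    server 192.0.2.11:8080;
    # Only receives traffic when the servers above are unavailable.
    server 192.0.2.12:8080 backup;
}

server {
    listen 80;
    server_name example.com;

    # Static assets are served directly by Nginx
    # (files are expected under /var/www/example/static/).
    location /static/ {
        root /var/www/example;
    }

    # Everything else is passed on to the backend pool.
    location / {
        proxy_pass http://backend_pool;
    }
}
```

If one backend fails to respond, Nginx by default tries the next server in the pool, which is the failover behaviour described in the first bullet.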

Install Nginx

These steps install NGINX Mainline on Ubuntu/Debian from NGINX Inc’s official repository.

We need to open the /etc/apt/sources.list file in a text editor (nano, vi, vim, etc.) and add the following line to the bottom. Replace CODENAME in this example with the codename of your Ubuntu release.

For example, for Ubuntu 18.04, named Bionic Beaver, insert bionic in place of CODENAME below:

/etc/apt/sources.list

deb http://nginx.org/packages/mainline/ubuntu/ CODENAME nginx

Import the repository’s package signing key and add it to apt:

wget https://nginx.org/keys/nginx_signing.key
sudo apt-key add nginx_signing.key

Now install the Nginx package:

sudo apt update
sudo apt install nginx

Ensure Nginx is running and enabled to start automatically on reboot:

sudo systemctl start nginx
sudo systemctl enable nginx

Basic Configuration

The following procedure makes your web portal (web site), which listens on a specific port other than 80, reachable through the Nginx reverse proxy service.

Let's begin by creating a configuration file for the app in /etc/nginx/conf.d/. Replace the example.com domain in this example with your app’s domain or public IP address:

/etc/nginx/conf.d/webapp.conf

server {
    listen 80;
    listen [::]:80;

    server_name example.com;

    location / {
        proxy_pass http://localhost:3000/;
    }
}

The proxy_pass directive is what makes this configuration a reverse proxy. It specifies that all requests which match the location block (in this case the root / path) should be forwarded to port 3000 on localhost, where the web app or portal is running.

Disable or delete the default Welcome to Nginx page:

sudo mv /etc/nginx/conf.d/default.conf /etc/nginx/conf.d/default.conf.disabled

Test the configuration:

sudo nginx -t

If no errors are reported, reload the new configuration:

sudo nginx -s reload

In a browser, navigate to your public IP address or fully qualified domain name (FQDN). You should see the web page or portal displayed, even though the application itself is listening on a different port.

For a simple app or web portal, the proxy_pass directive should be sufficient. However, more complex apps may need additional directives. For example, for apps that require a lot of real-time interaction we should disable NGINX’s buffering feature:

/etc/nginx/conf.d/webapp.conf

location / {
    proxy_pass http://localhost:3000/;
    proxy_buffering off;
}

You can also modify or add the headers that are forwarded along with the proxied requests with proxy_set_header:

/etc/nginx/conf.d/webapp.conf

location / {
    proxy_pass http://localhost:3000/;
    proxy_set_header X-Real-IP $remote_addr;
}

This configuration uses the built-in $remote_addr variable to send the IP address of the original client to the proxied host.

The proxy_set_header directive should always be considered when reverse proxying; the following list explains the most common headers:

  • The X-Real-IP header is set to the IP address of the client so that the proxied server can correctly make decisions or log based on this information.
  • The X-Forwarded-For header is a list containing the IP addresses of every server the client has been proxied through up to this point. It is typically set to the $proxy_add_x_forwarded_for variable, which takes the value of the X-Forwarded-For header received in the client request and appends $remote_addr (the address of the client connecting to Nginx) to the end.
  • The X-Forwarded-Proto header gives the proxied server information about the scheme of the original client request (whether it was an http or an https request).
  • The Host header, set to the $host variable, should contain information about the original host being requested.
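Putting the headers from this list together, a location block could look like the sketch below. It reuses the localhost:3000 backend from the earlier examples; the choice of which headers to send is an assumption and depends on what your application actually reads.

```nginx
location / {
    proxy_pass http://localhost:3000/;

    # Original host requested by the client.
    proxy_set_header Host $host;
    # Address of the client that connected to Nginx.
    proxy_set_header X-Real-IP $remote_addr;
    # Incoming X-Forwarded-For value with $remote_addr appended.
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    # http or https, as seen by Nginx.
    proxy_set_header X-Forwarded-Proto $scheme;
}
```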

Another important directive to use in a reverse proxy configuration is proxy_http_version, which sets the HTTP protocol version used for requests to the proxied server. Nginx has traditionally used version 1.0 for proxied requests; version 1.1 is recommended for use with keepalive connections and NTLM authentication.
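A minimal sketch of proxy_http_version combined with keepalive connections to the backend is shown below. The upstream name webapp_backend and the value 16 are hypothetical choices for illustration.

```nginx
# Hypothetical upstream for the app on port 3000.
upstream webapp_backend {
    server localhost:3000;
    # Keep up to 16 idle connections to the backend open per worker process.
    keepalive 16;
}

server {
    listen 80;
    server_name example.com;

    location / {
        proxy_pass http://webapp_backend;
        # Keepalive connections to the backend need HTTP/1.1 ...
        proxy_http_version 1.1;
        # ... and the "Connection: close" header must not be sent.
        proxy_set_header Connection "";
    }
}
```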

Configuring a Proxy Cache

To speed up the delivery of web pages to hundreds or thousands of concurrent users, we can enable our reverse proxy to serve a "copy" of an already requested page to other remote users without overwhelming our backend server with multiple (potentially identical) requests.

To set up a cache to use for proxied content, we can use the proxy_cache_path directive. This will create an area where data returned from the proxied servers can be kept. The proxy_cache_path directive must be set in the http context.

So, open /etc/nginx/nginx.conf using your favorite text editor and add the following lines inside the http block:

proxy_cache_path /var/www/cache levels=1:2 keys_zone=my-cache:8m max_size=1000m inactive=600m;
proxy_temp_path /var/www/cache/tmp;
real_ip_header X-Forwarded-For;

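The first two lines in this example set up the cache directory and the directory for temporary files. The real_ip_header directive (provided by the realip module) tells Nginx to take the client address from the X-Forwarded-For request header instead of the source address of the connection; it only takes effect for connections from addresses trusted with set_real_ip_from. To forward the original client IP address to the backend server (for example on port 8080), use the proxy_set_header directive described earlier, or else all traffic would seem to come from the same IP address.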
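As a sketch of how real_ip_header is usually paired with set_real_ip_from, the fragment below assumes that Nginx itself sits behind another trusted proxy or load balancer at the hypothetical address 192.0.2.1, and that the realip module is available in your Nginx build.

```nginx
# Only trust X-Forwarded-For when the connection comes from this address.
set_real_ip_from 192.0.2.1;
# Take the client address from the X-Forwarded-For request header.
real_ip_header X-Forwarded-For;
# Skip trusted addresses in the header and use the last untrusted one.
real_ip_recursive on;
```

Without a set_real_ip_from entry, the real_ip_header directive has no effect, because no sender is trusted to supply the header.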

The Proxy Cache

The following directives can be placed in the http block:

proxy_cache_path /var/lib/nginx/cache levels=1:2 keys_zone=backcache:8m max_size=50m;
proxy_cache_key "$scheme$request_method$host$request_uri$is_args$args";
proxy_cache_valid 200 302 10m;
proxy_cache_valid 404 1m;

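With the proxy_cache_path directive, we have defined a directory on the filesystem where we would like to store our cache. In this example, we've chosen the /var/lib/nginx/cache directory. If this directory does not exist, you can create it with the correct permissions and ownership using the commands below. The owner must be the user that the Nginx worker processes run as: www-data with the Debian/Ubuntu distribution packages, nginx with the packages from nginx.org. Check the user directive in /etc/nginx/nginx.conf and adjust the chown command accordingly: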

sudo mkdir -p /var/lib/nginx/cache
sudo chown www-data /var/lib/nginx/cache
sudo chmod 700 /var/lib/nginx/cache

Basic caching needs only two directives: proxy_cache_path and proxy_cache. The proxy_cache_path directive sets the path and configuration of the cache, and the proxy_cache directive activates it.
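A minimal sketch of the two directives working together is shown below. It reuses the backcache zone and the localhost:3000 backend from the earlier examples; the X-Cache-Status response header is an optional addition, assumed here only to make cache hits and misses visible while testing.

```nginx
# In the http context: where the cache lives and how it is sized.
proxy_cache_path /var/lib/nginx/cache levels=1:2 keys_zone=backcache:8m max_size=50m;

server {
    listen 80;
    server_name example.com;

    location / {
        # Activate the cache zone defined above for this location.
        proxy_cache backcache;
        proxy_pass http://localhost:3000/;

        # Optional: expose HIT, MISS, BYPASS, EXPIRED, ... to the client.
        add_header X-Cache-Status $upstream_cache_status;
    }
}
```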

The proxy_cache_valid directive sets how long cached data remains valid. NGINX does not cache responses that have no expiration. The following example caches all pages returning HTTP 200 (status OK) or 302 (temporary redirect) for 10 minutes, those returning HTTP 301 (permanent redirect) for an entire hour, and anything else (such as 404, error 500 and so on) for one minute.

proxy_cache_valid 200 302 10m;
proxy_cache_valid 301 1h;
proxy_cache_valid any 1m;

For PHP projects that use PHP sessions and sit behind an Nginx reverse proxy, the following directives are useful (to manage multiple PHP sessions and to avoid information leakage):

  • proxy_no_cache $cookie_PHPSESSID forbids the reverse proxy cache from storing responses to requests that carry a PHPSESSID cookie. Otherwise you will end up with your logged-in users’ pages cached and displayed to other people. If you’re using a PHP framework that uses a cookie name other than the default PHPSESSID, make sure to replace it.
  • proxy_cache_bypass $cookie_PHPSESSID instructs the proxy to bypass the cache and forward the request to the backend if the incoming request contains a PHPSESSID cookie. Otherwise you’ll end up showing logged-in users the logged-out version (served from the cache).

NOTE: remember that a PHP session involves setting a cookie, by default called PHPSESSID, with a unique identification string as the value. Common examples of session data are shopping cart contents, recently viewed items, or authentication state kept across multiple pages.
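The two session-related directives could be combined in a cached location as in the sketch below. It assumes a cache zone named backcache has been defined with proxy_cache_path and that the PHP backend listens on localhost:8080; both are placeholders to adapt to your setup.

```nginx
location / {
    proxy_cache backcache;

    # Do not store responses for requests that carry a session cookie.
    proxy_no_cache $cookie_PHPSESSID;
    # Do not answer such requests from the cache either.
    proxy_cache_bypass $cookie_PHPSESSID;

    proxy_pass http://localhost:8080/;
}
```

Both directives treat any value that is non-empty and not "0" as true, so the mere presence of the cookie is enough to skip the cache.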

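The following directives are also often used when dealing with cookies and other user-related information that the web portal uses to show user-specific data. Note that proxy_ignore_headers Set-Cookie makes Nginx cache responses even when they carry a Set-Cookie header (by default such responses are not cached), so it should only be used together with rules such as proxy_no_cache and proxy_cache_bypass that keep user-specific responses out of the cache:

# We ignore the Set-Cookie header
proxy_ignore_headers Set-Cookie;

# When a custom X-No-Cache request header is set, we hit the origin
# (proxy_cache_bypass) and do not store the response (proxy_no_cache)
proxy_cache_bypass $http_x_no_cache;
proxy_no_cache $http_x_no_cache;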

The following directive does not name a temporary folder (a directory for storing temporary files with data received from proxied servers). Because the use_temp_path parameter is omitted, Nginx writes temporary files to the directory set by proxy_temp_path (or to a built-in default location when that directive is not set) and then moves them into the cache directory.

proxy_cache_path /var/lib/nginx/cache levels=1:2 keys_zone=backcache:8m max_size=50m inactive=60m;

To specify another folder in which to save the temporary data received by the reverse proxy, use the proxy_temp_path directive. Omitting the use_temp_path parameter in the proxy_cache_path directive is equivalent to use_temp_path=on, so temporary files go to the proxy_temp_path directory; only with use_temp_path=off does the cache directory also hold the temporary data.

So, to use a dedicated temp folder, add the proxy_temp_path directive and specify the target folder on your filesystem (use_temp_path=on can be stated explicitly), for example:

proxy_cache_path /var/lib/nginx/cache use_temp_path=on levels=1:2 keys_zone=backcache:8m max_size=50m inactive=60m;

proxy_temp_path /var/cache/nginx/tmp;

Finally, let's look at each part of the proxy_cache_path directive, using the above example:

  • /var/lib/nginx/cache is the local disk directory in which the cache resides.
  • levels=1:2 is a parameter that sets up a two‑level directory hierarchy under /var/lib/nginx/cache/. Having a large number of files in a single directory can slow down file access, so the usual recommendation is a two‑level directory hierarchy for most deployments. If the levels parameter is omitted, Nginx will put all files in the same directory, in this case /var/lib/nginx/cache.
  • The keys_zone parameter defines the name and size of the shared memory zone that is used to store metadata about cached items, such as keys and usage timers. Having a copy of the keys in memory enables Nginx to quickly determine if a request is a HIT or a MISS without having to go to disk, greatly speeding up the check. For example, a 1‑MB zone can store data for about 8,000 keys, while a 10‑MB zone can store about 80,000 keys.
  • The max_size parameter sets the upper limit of the size of the cache, 50 megabytes in this example. This parameter is optional; not specifying a value allows the cache to grow to use all available disk space. When the cache size reaches the limit, an Nginx process called the cache manager removes the files that were least recently used to bring the cache size back under the limit.
  • The inactive parameter specifies how long an item can remain in the cache without being accessed. In this example, a file that has not been requested for 60 minutes is automatically deleted from the cache by the cache manager process, regardless of whether or not it has expired. The default value is 10 minutes (10m). Inactive content differs from expired content. Nginx does not automatically delete content that has expired as defined by a cache control header (Cache-Control:max-age=120 for example). Expired (stale) content is deleted only when it has not been accessed for the time specified by inactive. When expired content is accessed, Nginx refreshes it from the origin server and resets the inactive timer.

Nginx first writes files that are destined for the cache to a temporary storage area, and the use_temp_path=off parameter instructs Nginx to write them to the same directories where they will be cached. We recommend that you set this parameter to off to avoid unnecessary copying of data between file systems. The use_temp_path parameter was introduced in NGINX version 1.7.10 and NGINX Plus R6.
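For comparison with the use_temp_path=on example, a variant of the same proxy_cache_path directive with the recommended setting might look like this sketch (all other parameters are simply carried over from the example above):

```nginx
# Temporary files are written directly inside /var/lib/nginx/cache,
# so no separate proxy_temp_path directory is involved for this cache.
proxy_cache_path /var/lib/nginx/cache levels=1:2 keys_zone=backcache:8m max_size=50m inactive=60m use_temp_path=off;
```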

To give more "intelligence" to our Nginx reverse proxy, we can instruct it not to cache a response the first time a request appears, but only after the same request has been received a fixed number of times. To define the minimum number of times that a request with the same key must be made before the response is cached, include the proxy_cache_min_uses directive:

proxy_cache_min_uses 5;

A quick benchmark

The effect of the Nginx reverse proxy can be measured with the Apache Bench (ab) utility. Reviewing the output shows that serving content from the cache is a much easier task for Nginx than spawning PHP processes, interpreting PHP libraries and executing bytecode is for the backend. The following command uses ab to simulate a total of 1000 requests, in blocks of 100 concurrent requests, against the web portal using the reverse proxy URL, http://120.0.0.1:80:

ab -n 1000 -c 100 http://120.0.0.1:80/

On the other hand, the following command runs the same test directly against the URL of our web portal, http://120.0.0.1:8080, without going through the Nginx reverse proxy (in normal operation the proxy avoids exposing port 8080 outside our LAN):

ab -n 1000 -c 100 http://120.0.0.1:8080/

Comparing the total time taken in the two ab outputs, we can see that Nginx takes 0.2 seconds to serve 1000 requests on port 80 compared to 2.5 seconds on port 8080, so it is 12.5 times faster.

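The mean time per request is only 23 ms on port 80 compared to 252 ms on port 8080, so roughly 11 times faster. These figures are measured at a concurrency of 100; dividing by 100 gives the mean across all concurrent requests, about 2.52 ms per request for the backend on port 8080 and about 0.23 ms through Nginx (very low).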