Saturday, 19 October 2013

Cache HTTP reverse proxy using Varnish on Ubuntu with Drupal 7


Varnish Cache is a web application accelerator, also known as a caching HTTP reverse proxy. You install it in front of any server that speaks HTTP and configure it to cache the contents. Varnish Cache is really, really fast. It typically speeds up delivery by a factor of 300 to 1000, depending on your architecture.

Varnish speeds up Drupal sites through caching. In a single-server setup, Varnish listens on port 80 (in place of Apache) and calls out to Apache only if the requested URL isn't cached or if certain other conditions are met. Those conditions are usually spelled out in the default.vcl configuration file.

In order to set up Varnish on my latest Drupal 7 project, I needed to put together documentation from a lot of different places.

Varnish installation

To install the latest version of Varnish using the varnish-cache.org repository, run the following:

    1. curl http://repo.varnish-cache.org/debian/GPG-key.txt | sudo apt-key add -
    2. echo "deb http://repo.varnish-cache.org/ubuntu/ precise varnish-3.0" | sudo tee -a /etc/apt/sources.list
    3. sudo apt-get update
    4. sudo apt-get install varnish

Apache configuration

  1. Change port 80 to port 8080 in /etc/apache2/ports.conf
  2. Change port 80 to port 8080 in /etc/apache2/sites-available/default
  3. Make a Drupal directory for Varnish (mkdir -p /var/lib/varnish/drupal; chown varnish:varnish /var/lib/varnish/drupal)
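
Steps 1 and 2 above can also be scripted. The following is a minimal sketch, assuming the stock Ubuntu files use the usual "Listen 80", "NameVirtualHost *:80" and "<VirtualHost *:80>" directives; switch_apache_port is a hypothetical helper name, not something Apache or Ubuntu provides. It keeps a .bak copy of each file it edits.

```shell
#!/bin/sh
# Sketch: rewrite port 80 to 8080 in an Apache config file.
# switch_apache_port is a hypothetical helper, not part of Apache.
switch_apache_port() {
    file="$1"
    # Only touch directives where 80 is the whole port number, so that
    # values such as 443 or 8000 are left alone. A backup is saved as FILE.bak.
    sed -i.bak -E \
        -e 's/^([[:space:]]*Listen[[:space:]]+)80[[:space:]]*$/\18080/' \
        -e 's/^([[:space:]]*NameVirtualHost[[:space:]]+[^[:space:]]*:)80[[:space:]]*$/\18080/' \
        -e 's/^([[:space:]]*<VirtualHost[[:space:]]+[^>]*:)80>/\18080>/' \
        "$file"
}

# Usage (as root):
#   switch_apache_port /etc/apache2/ports.conf
#   switch_apache_port /etc/apache2/sites-available/default
```

It is worth diffing each file against its .bak copy afterwards, since a customised config may contain port references this sketch does not cover.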

Varnish configuration

Start by editing the default configuration at /etc/default/varnish. Note that Varnish only handles HTTP requests, not HTTPS.
Make sure the uncommented lines match the following:

START=yes
NFILES=131072
MEMLOCK=82000

INSTANCE=drupal
DAEMON_OPTS="-a :80 \
             -T localhost:6082 \
             -f /etc/varnish/default.vcl \
             -u varnish -g varnish \
             -S /etc/varnish/secret \
             -p thread_pool_add_delay=2 \
             -p thread_pools=4 \
             -p thread_pool_min=2 \
             -p thread_pool_max=4000 \
             -p session_linger=50 \
             -p sess_workspace=262144 \
             -s malloc,1G"
VARNISH_LISTEN_PORT=80

This configures Varnish to listen on port 80 (HTTP) for requests and to use 1 GB of memory for caching (the "-s malloc,1G" option). My system has 4 GB of RAM, and as a rule of thumb Varnish should only be allowed to work with about 25% of system memory.
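
If you want to work out the 25% figure for a different machine, here is a small sketch. It assumes a Linux system where /proc/meminfo reports MemTotal in kB; quarter_mem_mb is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: derive a malloc size for the -s option from the 25% rule of thumb.
# quarter_mem_mb is a hypothetical helper; it takes total memory in kB
# and prints a quarter of it in whole megabytes.
quarter_mem_mb() {
    kb="$1"
    echo $(( kb / 4 / 1024 ))
}

# On Linux, MemTotal in /proc/meminfo is given in kB.
if [ -r /proc/meminfo ]; then
    total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    echo "-s malloc,$(quarter_mem_mb "$total_kb")M"
fi
```

On a real 4 GB machine the kernel reports slightly less than 4 GB as MemTotal, so the result will come out a little under 1024M.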

The default.vcl (which is usually located at /etc/varnish/) should be edited like so:


For more in-depth information about the Varnish Configuration Language (VCL) options, see the documentation at www.varnish-cache.org/docs

If you don't want to use basic authentication, just comment out lines 54-57 and 184-187.
Assuming that you do, though, note that the encoded string on line 55 should be replaced with one for your own username:password. You can generate one by entering the following at the command line (the -n flag stops echo from appending a newline, which would otherwise end up in the encoded value):

$ echo -n username:password | base64
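
One detail worth checking here is the trailing newline: if it gets encoded along with the credentials, the resulting string will not match what the browser sends. The sketch below uses printf, which never appends a newline, and shows both results side by side; basic_auth_token is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: build the Basic auth token without a trailing newline.
# basic_auth_token is a hypothetical helper name.
basic_auth_token() {
    # printf '%s' emits the argument exactly, with nothing appended.
    printf '%s' "$1" | base64
}

basic_auth_token 'username:password'   # prints dXNlcm5hbWU6cGFzc3dvcmQ=
echo 'username:password' | base64      # prints dXNlcm5hbWU6cGFzc3dvcmQK (newline included)
```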

If you haven't set up basic authentication in Apache, you can quickly do this by running the following:

$ htpasswd -c /home/secure/.htpasswd bob

Then edit the .htaccess file in your Drupal directory and add the following lines to the top:

AuthName "Secure Area"
AuthType Basic
AuthUserFile /home/secure/.htpasswd
require valid-user

Replace "/home/secure" with whatever directory you'd like to store your credentials in (but it better be secure!).

Drupal configuration

We need to tell Drupal that it's behind a proxy/caching server or it will not set the correct headers and Varnish will not cache pages. You should add the lines below to your settings.php.

// Tell Drupal it's behind a proxy.
$conf['reverse_proxy'] = TRUE;
 
// Tell Drupal what addresses the proxy server(s) use.
$conf['reverse_proxy_addresses'] = array('127.0.0.1');
 
// Bypass Drupal bootstrap for anonymous users so that Drupal sets max-age > 0.
$conf['page_cache_invoke_hooks'] = FALSE;
 
// Make sure that page cache is enabled.
$conf['cache'] = 1;
$conf['cache_lifetime'] = 0;
$conf['page_cache_maximum_age'] = 21600;

Installing the Varnish integration module for Drupal

I usually use drush for stuff like this.

$ drush dl varnish
$ drush en varnish -y
$ sudo cat /etc/varnish/secret


Go to the Varnish configuration screen (/admin/config/development/varnish) and enter the following values:

Field                                        Value
Flush page cache on cron?                    Disabled
Varnish version                              3.x
Varnish Control Terminal                     127.0.0.1:6082
Varnish Control Key                          The Varnish "secret" that was just printed by the last command above
Varnish connection timeout (milliseconds)    100
Varnish Cache Clearing                       Drupal Default

Varnish determines what should and shouldn't be cached based on Drupal's response headers, which in turn are controlled by Drupal's own native caching system. So, in order to really get Varnish caching working, you also have to turn on Drupal's own page caching. Weird, right?

To do this, go to Performance (/admin/config/development/performance) and switch on Drupal's page caching by ticking the checkbox "Cache pages for anonymous users". The other settings are really up to you and your particular needs, but for my sites I like to use a minimum cache lifetime of 3 minutes and a maximum cache lifetime of 15 minutes. Keep in mind that any of these values set through $conf in settings.php take precedence over what you enter in the form. Or, if you like tables (and really, who doesn't?):

Field                                Value
Cache pages for anonymous users      [yes]
Minimum cache lifetime               3 min
Maximum cache lifetime               15 min

When all that is done, run the following at the command line to restart Varnish and Apache:

$ sudo /etc/init.d/varnish restart
$ sudo /etc/init.d/apache2 restart

If you look at your site in Firefox with Firebug's Net panel open, you should see something like this in the main page's response headers:

Via 1.1 varnish
X-Cacheable YES
X-Drupal-Cache MISS
X-Generator Drupal 7 (http://drupal.org)
X-Powered-By PHP/5.3.10-1ubuntu3.4
X-Varnish 1844931166 1844930927
X-Varnish-Cache HIT
X-Varnish-Hits 1

If you see that, you're in business! Your site will now be better protected from huge influxes of traffic, even on a single-server setup.
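
The same check can be done from the command line without a browser. This is a rough sketch that assumes your VCL sets the X-Varnish-Cache header shown above and that the site answers on localhost; is_varnish_hit is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: report whether a set of response headers shows a Varnish cache hit.
# is_varnish_hit is a hypothetical helper; it reads raw headers on stdin
# and succeeds only if X-Varnish-Cache is HIT.
is_varnish_hit() {
    # HTTP header lines end in CRLF, so strip the carriage returns first.
    tr -d '\r' | grep -qi '^X-Varnish-Cache:[[:space:]]*HIT$'
}

# Usage, assuming the site answers on localhost:
#   curl -sI http://localhost/ | is_varnish_hit && echo "served from cache"
```

Run it twice in a row: the first request for a page is normally a miss that fills the cache, and the second one should report a hit.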

Helpful Varnish commands

Varnish comes with a range of utility programs that help you keep an eye on its performance and resolve caching issues.

Varnish administration

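At some point you will have to flush the cache, and the command below will do that. You can change "." to a regular expression matching the paths that you wish to flush from the cache. Note that in Varnish 3.x the purge commands were renamed to bans, so what used to be url.purge in 2.x is now ban.url. Because Varnish was started with a secret file (-S), varnishadm needs to be pointed at the same file.

~$ sudo varnishadm -T 127.0.0.1:6082 -S /etc/varnish/secret ban.url "."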

Varnish history

The Varnish history command shows a continuously updated histogram of the last N requests, distributed by their processing time. Hits are marked with a pipe character ("|"), and misses are marked with a hash character ("#").

~$ varnishhist

Varnish top

The command below shows a list of the most frequent requests sent to the back end, that is, the URLs that are not being served from the cache.

~$ varnishtop -b -i TxURL