Tuesday, December 08, 2009

Memcached Cache Invalidation Made Easy

There are only two hard problems in computer science - cache invalidation, and naming things
Phil Karlton

It's an oft-quoted truism that brings a knowing smile to most hardened programmers, but it's oft-quoted precisely because it's true - and during a recent enforced rush job to implement a cache, I came across a nifty solution to the first problem by judicious use of the second.

First, the problem - someone posted Cragwag on StumbleUpon, which led to an immediate spike in traffic on top of the slow increase I've been getting since I made it Tweet the latest news. All the optimisation work that I knew I needed to do at some point was more than a few hours' work, and I had to get something out quickly - enter memcached.

Memcached is a simple, distributed-memory caching server that basically stores whatever data you give it in memory, associated with a given key. Rails has a built-in client that you can use simply as follows:

my_data = Cache.get(key) do
  # ... do stuff to generate data
end

If the cache has an entry for the given key, it will return it straight from the cache. If not, the block will be called, and whatever is returned from the block will be cached with that key.
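To make that get-or-compute behaviour concrete, here is a minimal sketch that mimics it with a plain Ruby Hash rather than a real memcached server. TinyCache and its fetch method are hypothetical stand-ins for illustration, not the memcache client's API.

```ruby
# Hypothetical in-memory stand-in for a memcached client (illustration only).
class TinyCache
  def initialize
    @store = {}
  end

  # Return the cached value for key; on a miss, run the block,
  # store its result under key, and return it.
  def fetch(key)
    return @store[key] if @store.key?(key)
    @store[key] = yield
  end
end

cache = TinyCache.new
calls = 0

first  = cache.fetch("homepage") { calls += 1; "<h1>News</h1>" } # miss: block runs
second = cache.fetch("homepage") { calls += 1; "<h1>News</h1>" } # hit: block skipped

puts "block ran #{calls} time(s)"
```

The second call never touches the block, which is the whole point: the expensive work only happens on a miss.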

So far so good - but what exactly should you cache, and how should you do it?

The Complicated Way To Do It

A common pattern is to cache ActiveRecord objects, say by wrapping the finder method in a cache call, and generating a key of the class name and primary key. But this only works for single objects, which are usually pretty quick to retrieve anyway, and is no use for the more expensive queries, such as lists of objects plus related objects and metadata, or - often particularly slow - searches.

So you could extend that simple mechanism to cache lists of objects and search results, say by using the method name and the given parameters. But then you have an all-new headache - an object might be cached in many different collections, so how do you know which cache keys to purge? You have two options:

  • Try and keep track of which cache keys are caching which objects? Eep - that's starting to sound nasty - you're effectively creating a meta-index of cached entries and keys, which would almost certainly be comparable in size to your actual cache... and where's that index going to live and how are you going to make sure that it's faster to search this potentially large and complex index than to just hit the damn database?

  • Sidestep the invalidation problem by invalidating the entire cache whenever data is updated. This is much simpler, but there doesn't seem to be a "purge all" method - so you'd need to keep track of what keys are generated somewhere, then loop round them and delete them individually. You could do this with, say, an ActiveRecord class and delete the cache keys on a destroy_all - but still, that's icky.

The Easy Way To Do It

After a few minutes Googling, I found this post on the way Shopify have approached it, and suddenly it all became clear. You can solve the problem of Cache Invalidation by being cunning about Naming Things - in particular, your cache keys.

The idea is very simple - Be Specific about exactly what you're caching. Read that post for more details, or read on for how I've done it.

So I ripped out all of my increasingly-over-complicated caching code from the model, and went for a simple approach of caching the generated html in the controllers. At the start of each request, in a before_filter, I have one database hit - load the current CacheVersion - which just retrieves one integer from a table with only one record. Super fast - and if the data is cached, that's the only db hit for the whole request.

The current cache version number is stored as an instance variable of the application controller, and prepended to all cache keys. The rest of the key is generated from the controller name, the action, and a string constructed out of the passed parameters. Any model method that changes data, rather than just retrieving it, can simply bump up the current cache version, and hey presto - everything then gets refreshed on the next hit, and the old version just gets expired on the least-recently-used-goes-first rule.
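The version-bump trick can be sketched without Rails or memcached at all. In this rough illustration a Hash plays the part of memcached and an Integer plays the single-row version table; VersionedCache, key_for and bump! are made-up names, not part of any library.

```ruby
# Hypothetical stand-ins: a Hash plays memcached, an Integer plays the
# single-row cache version table.
class VersionedCache
  attr_reader :version

  def initialize
    @store = {}
    @version = 0
  end

  # Prefix every key with the current version number.
  def key_for(name)
    "#{@version}_#{name}"
  end

  def fetch(name)
    key = key_for(name)
    return @store[key] if @store.key?(key)
    @store[key] = yield
  end

  # "Invalidate" everything: old entries stay in the store but are
  # never asked for again. A real memcached would evict them over time.
  def bump!
    @version += 1
  end
end

cache = VersionedCache.new
renders = 0
page = lambda { renders += 1; "page v#{renders}" }

a = cache.fetch("news_index", &page) # miss: rendered once
b = cache.fetch("news_index", &page) # hit: served from the store
cache.bump!                          # some data changed
c = cache.fetch("news_index", &page) # miss again under the new version

puts a, b, c
```

Nothing is ever deleted explicitly; bumping the number simply makes every old key unreachable.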

This has a few very nice architectural benefits:

  • The caching code is then in the "right" place - in the bit you want to speed up - i.e. the interface
  • You also eliminate the overhead of rendering any complicated views - you just grab the html (or xml, or json) straight from the cache and spit it back.
  • It utilises, and fits in with, one of the fundamental ideas of resource-based IA - that the URL (including the query string) should uniquely identify the resource(s) requested
  • The application controller gives you a nice central place to generate your keys
  • If you have to display different data to users, no problem - just put the user id as part of the key.
  • Rails conveniently puts the controller and action names into the params hash, so your cache key generation is very simple
  • The admin interface can then easily work off up-to-date data
  • You can also provide an admin "Clear the cache" button that just has to bump up the current cache version number.
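As a rough sketch of the key-generation points above, here is one way a key could be built from the version number, an optional user id, and the request parameters. build_cache_key is a hypothetical helper operating on a plain Hash, not the code Cragwag actually runs, and the parameter values are invented.

```ruby
# Hypothetical helper: build a cache key from the version number, an
# optional user id, and the request parameters (a plain Hash here).
def build_cache_key(version, params, user_id = nil)
  # Sort by parameter name so the same request always yields the same key,
  # whatever order the parameters arrived in.
  parts = params.sort_by { |name, _| name.to_s }
                .map { |name, value| "#{name}=#{value}" }
  prefix = user_id ? "#{version}_u#{user_id}" : version.to_s
  # memcached keys must not contain spaces, so swap them for underscores.
  "#{prefix}_#{parts.join('&')}".gsub(' ', '_')
end

params = { "controller" => "posts", "action" => "index", "q" => "ben nevis" }

shared_key   = build_cache_key(7, params)
per_user_key = build_cache_key(7, params, 42)

puts shared_key   # 7_action=index&controller=posts&q=ben_nevis
puts per_user_key # 7_u42_action=index&controller=posts&q=ben_nevis
```

One thing to watch in practice: memcached limits key length (250 bytes), so very long query strings may need hashing before use.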

Etc etc - I could go on, but I won't. The net result is that pages which used to take several seconds to render now take just a few milliseconds, it's much much simpler and more elegant this way, and if you're not convinced by now, just give it a try. <mrsdoyle>Go on - ah go on now, ah you will now, won't you Father?</mrsdoyle>


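class CacheVersion < ActiveRecord::Base
  # the most recent version record, or an unsaved one starting at 0
  def self.current
    CacheVersion.find(:last) || CacheVersion.new(:version => 0)
  end

  # bump the version, which effectively expires every cached page
  def self.increment
    cv = current
    cv.version = cv.version + 1
    cv.save # persist the new number so later requests see it
  end
end
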

require 'memcache_util'

class ApplicationController < ActionController::Base
  # load the current cache_version from the db
  # this is used to enable easy memcache "expiration"
  # by simply bumping up the current version whenever data changes
  include Cache
  before_filter :get_current_cache_version

  def cache_key
    "#{@cache_version.version}_#{params.sort.to_s.gsub(/ /, '_')}"
  end

  def get_current_cache_version
    @cache_version = CacheVersion.current
  end

  # serve the response from the cache; on a miss, run the block
  # (which must set @content) and cache what it produced
  def with_cache( &block )
    @content, @content_type = Cache.get(cache_key) do
      block.call
      [@content, @content_type]
    end
    render :text => @content, :content_type => (@content_type || "text/html")
  end
end

In your actual controller:

def index
  with_cache {
    # get data
    # NOTE: you must render to string and store it in @content
    respond_to do |format|
      format.html {
        @content = render_to_string :action => "index", :layout => "application"
      }
      format.xml {
        @content_type = "text/xml"
        @content = render_to_string :xml => @whatever, :layout => false
      }
    end
  }
end