Tuesday, December 08, 2009

Memcached Cache Invalidation Made Easy

There are only two hard problems in computer science - cache invalidation, and naming things
Phil Karlton

It's an oft-quoted truism that brings a knowing smile to most hardened programmers, but it's oft-quoted precisely because it's true - and during a recent enforced rush job to implement a cache, I came across a nifty solution to the first problem by judicious use of the second.

First, the problem - someone posted Cragwag on StumbleUpon, which led to an immediate spike in traffic on top of the slow increase I've been getting since I made it Tweet the latest news. All the optimisation work that I knew I needed to do at some point was more than a few hours work, and I had to get something out quickly - enter memcached.

Memcached is a simple, distributed-memory caching server that basically stores whatever data you give it in memory, associated with a given key. Rails has a built-in client that you can use simply as follows:

my_data = Cache.get(key) do
  # ... do stuff to generate data
end

If the cache has an entry for the given key, it will return it straight from the cache. If not, the block will be called, and whatever is returned from the block will be cached with that key.
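That get-or-set behaviour is easy to see in miniature. Here's a toy sketch using a plain Hash in place of a real memcached client (`cache_get` is a hypothetical stand-in for the Rails `Cache.get`, not the actual API):

```ruby
# A Hash standing in for memcached: miss -> run the block and store
# its result; hit -> return the stored value without running the block.
CACHE = {}

def cache_get(key)
  return CACHE[key] if CACHE.key?(key)
  CACHE[key] = yield
end

calls = 0
first  = cache_get("news") { calls += 1; "expensive result" }  # miss: block runs
second = cache_get("news") { calls += 1; "expensive result" }  # hit: straight from cache
# calls is 1 - the expensive block only ran once
```

The point being: the expensive work happens at most once per key, and the caller's code looks identical on a hit or a miss.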

So far so good - but what exactly should you cache, and how should you do it?

The Complicated Way To Do It

A common pattern is to cache ActiveRecord objects, say by wrapping the finder method in a cache call, and generating a key of the class name and primary key. But this only works for single objects, which are usually pretty quick to retrieve anyway, and is no use for the more expensive queries, such as lists of objects plus related objects and metadata, or - often particularly slow - searches.

So you could extend that simple mechanism to cache lists of objects and search results, say by using the method name and the given parameters. But then you have an all-new headache - an object might be cached in many different collections, so how do you know which cache keys to purge? You have two options:

  • Try and keep track of which cache keys are caching which objects? Eep - that's starting to sound nasty - you're effectively creating a meta-index of cached entries and keys, which would almost certainly be comparable in size to your actual cache... and where's that index going to live and how are you going to make sure that it's faster to search this potentially large and complex index than to just hit the damn database?

  • Sidestep the invalidation problem by invalidating the entire cache whenever data is updated. This is much simpler, but there doesn't seem to be a "purge all" method - so you'd need to keep track of what keys are generated somewhere, then loop round them and delete them individually. You could do this with, say, an ActiveRecord class and delete the cache keys on a destroy_all - but still, that's icky.

The Easy Way To Do It

After a few minutes Googling, I found this post on the way Shopify have approached it, and suddenly it all became clear. You can solve the problem of Cache Invalidation by being cunning about Naming Things - in particular, your cache keys.

The idea is very simple - Be Specific about exactly what you're caching. Read that post for more details, or read on for how I've done it.

So I ripped out all of my increasingly-over-complicated caching code from the model, and went for a simple approach of caching the generated html in the controllers. At the start of each request, in a before_filter, I have one database hit - load the current CacheVersion - which just retrieves one integer from a table with only one record. Super fast - and if the data is cached, that's the only db hit for the whole request.

The current cache version number is stored as an instance variable of the application controller, and prepended to all cache keys. The rest of the key is generated from the controller name, the action, and a string constructed out of the passed parameters. Any model methods that aren't just simple retrievals but affect data, can just bump up the current cache version, and hey presto - everything then gets refreshed on next hit, and the old version just gets expired on the least-recently-used-goes-first rule.
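The versioned-key trick in miniature (illustrative names, not the app's actual code):

```ruby
# Bumping the version silently orphans every old key: nothing is
# deleted, stale entries just age out of memcached under the LRU rule.
cache_version = 1
make_key = ->(params) { "#{cache_version}_#{params.sort.to_s.gsub(/ /, '_')}" }

old_key = make_key.call("controller" => "news", "action" => "index")  # "1_..."
cache_version += 1  # some model method changed data - "invalidate" everything
new_key = make_key.call("controller" => "news", "action" => "index")  # "2_..."
# old_key != new_key, so the next request misses the cache and regenerates
```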

This has a few very nice architectural benefits:

  • The caching code is then in the "right" place - in the bit you want to speed up - i.e. the interface
  • You also eliminate the overhead of rendering any complicated views - you just grab the html (or xml, or json) straight from the cache and spit it back.
  • It utilises, and fits in with, one of the fundamental ideas of resource-based IA - that the URL (including the query string) should uniquely identify the resource(s) requested
  • The application controller gives you a nice central place to generate your keys
  • If you have to display different data to users, no problem - just put the user id as part of the key.
  • Rails conveniently puts the controller and action names into the params hash, so your cache key generation is very simple
  • The admin interface can then easily work off up-to-date data
  • You can also provide an admin "Clear the cache" button that just has to bump up the current cache version number.

Etc etc - I could go on, but I won't. The net result is that pages which used to take several seconds to render now take just a few milliseconds, it's much much simpler and more elegant this way, and if you're not convinced by now, just give it a try. <mrsdoyle>Go on - ah go on now, ah you will now, won't you Father?</mrsdoyle>


class CacheVersion < ActiveRecord::Base
  def self.current
    CacheVersion.find(:last) || CacheVersion.new(:version => 0)
  end

  def self.increment
    cv = current
    cv.version = cv.version + 1
    cv.save  # persist the bump so every request sees the new version
  end
end


require 'memcache_util'

class ApplicationController < ActionController::Base
  # load the current cache_version from the db
  # this is used to enable easy memcache "expiration"
  # by simply bumping up the current version whenever data changes
  include Cache
  before_filter :get_current_cache_version

  def cache_key
    "#{@cache_version.version}_#{params.sort.to_s.gsub(/ /, '_')}"
  end

  def get_current_cache_version
    @cache_version = CacheVersion.current
  end

  def with_cache( &block )
    @content, @content_type = Cache.get(cache_key) do
      block.call  # the block must populate @content (and optionally @content_type)
      [@content, @content_type]
    end
    render :text => @content, :content_type => (@content_type || "text/html")
  end
end

In your actual controller:

def index
  with_cache do
    # get data
    # NOTE: you must render to string and store it in @content
    respond_to do |format|
      format.html do
        @content = render_to_string :action => "index", :layout => "application"
      end
      format.xml do
        @content_type = "text/xml"
        @content = render_to_string :xml => @whatever, :layout => false
      end
    end
  end
end

Monday, October 26, 2009

Specs failing with Daylight Saving Time change?

So I've been banging my head for the past hour or so, trying to work out why some of our specs have suddenly started failing without the code having been touched, and it comes down to an inconsistency in how Rails is handling UTC offsets when Time objects are manipulated:

>> Time.now
=> Mon Oct 26 15:55:20 +0000 2009
>> Time.now.utc
=> Mon Oct 26 15:55:26 UTC 2009

OK, that's fine. Now let's try a calculation:

>> (Time.now - 1.day).utc
=> Sun Oct 25 14:55:41 UTC 2009

What? Where did that one hour offset come from??

It turns out THAT is the source of all the specs which suddenly failed for no apparent reason this morning.

The good news is, it's easy to fix:

>> Time.now.utc - 1.day
=> Sun Oct 25 15:56:08 UTC 2009

MUCH better!
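The fix is easy to pin down in a spec, too. A minimal sketch in plain Ruby (using 86,400 seconds in place of ActiveSupport's `1.day`): convert to UTC *before* doing the arithmetic, and the result is exact seconds, immune to DST changes in the local zone.

```ruby
# Arithmetic on a UTC Time is plain seconds, so subtracting a day
# can't be skewed by a DST transition in the local time zone.
ONE_DAY = 24 * 60 * 60

t = Time.utc(2009, 10, 26, 15, 55, 20)
yesterday = t - ONE_DAY
# yesterday is exactly Time.utc(2009, 10, 25, 15, 55, 20)
```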

Ooh, we're in IDC's "10 most innovative software companies"!

Oooh, shiny! We just got named as one of IDC's 10 most innovative sub-$100m software companies to watch

I guess we scored more highly on the "Web 2.0-like functionality moves into the enterprise" category than on the other two, and by itself it might not mean much, but it's still great to be thought of in those terms. Makes me feel all warm and fuzzly.

Tuesday, October 13, 2009

Al's Ultimate Lasagne Recipe

OK, ok, so I've never posted a recipe on here before. BUT, pretty much everyone I've ever made this lasagne for has said wow, you've GOT to give me the recipe for that! - the most recent example being a guy who had spent years living in Italy, no less - and I can't sleep tonight, so here goes.

Last year I added some refinements from Heston Blumenthal's 'Perfect' bolognese, but mercifully this version takes about 3-4hrs, rather than 3 days. You can add whatever embellishments you like, here I'm just going to describe the basic sauce preparation.

You will need:


  • A heavy-bottomed frying pan (our Le Creuset pan was perfect for this) that can get really hot

  • A large sauce pot / casserole dish

  • A large lasagne dish - deep enough for at least 3 layers of sauce, plus about half an inch of bechamel sauce

  • A medium sauce pan for the bechamel sauce


NOTE: all quantities are approximate. Don't be the kind of cook who has to measure everything to the 3rd significant figure - taste often and see what you think it needs, it's much more fun!

  • About 1kg of lean minced beef (preferably organic, or at least free range - we like Waitrose's, and 2 of their 500g packs works nicely)

  • About 400g of lardons (again, 2 packs of the Waitrose Free-Range lardons work nicely)

  • 1 red onion, 1 white onion

  • 1 stick of celery

  • 1 or 2 carrots, depending on size - we're aiming for roughly equal quantities of diced carrot, red onion and celery, so adjust as needed

  • 3 tins of chopped tomatoes

  • 3 or 4 mushrooms - we like Portabellini
  • a bulb of garlic, the fresher the better

  • 2 large bay leaves

  • 2 large/4 small star anise

  • about 1tbsp Thai fish sauce (we like Squid Brand, which you should be able to get from any good Chinese food store)

  • about 1tbsp Lea & Perrins Worcester Sauce

  • about a third of a bottle of red wine

  • Maldon sea salt & freshly ground black pepper

  • A tablespoon of Marmite

  • Butter. Probably about 100g, maybe a bit more

  • Olive oil - it's best to NOT use extra virgin, you're going to be frying with it - but really, it doesn't make that much difference in the end

  • Lasagne sheets

  • A decent handful of basil leaves (MUST be fresh - dried is no good here)

  • A decent handful of fresh oregano (ditto)

  • The vine from some vine tomatoes (that's where most of the smell comes from, not the tomato itself)

  • For the bechamel sauce - maybe 50g of plain flour and about 1/3 pint of milk, plus about 50g of good mature cheddar (Cathedral City Extra Mature works well).

Preparation (20 mins)

  1. Dice the carrot, the red onion and the celery. We're aiming for equal quantities of each, in equal-sized pieces.

  2. Chop the white onion - these pieces don't need to be the same size as the previous lot, they can be bigger and rougher.

  3. Slice the mushrooms, so they're maybe half a centimetre thick

  4. Lightly crush and peel all the cloves of garlic. I use a good whack with my fist on top of a large knife on top of the clove. It doesn't need to be obliterated, just kind of half-crushed, so the skin comes off easily.

  5. Take half of the semi-crushed garlic and chop it finely.

  6. In a small bowl, crush (with your fingers is fine) a good tablespoon of sea salt and grind about an equal quantity of black pepper (you're going to be handling raw beef next - you don't want the juices from your hands to hang around on the pepper mill, do you?)

  7. Season the beef - I tip one pack of beef onto the other, then separate the strands back into the empty pack. Each time you make a full layer, sprinkle on some salt and pepper

Cooking stage 1 (20 mins)

  1. In the big pot, pour a good layer of olive oil. Heat over a low-to-medium heat.

  2. Add the semi-crushed (not chopped) garlic cloves. Cook until they're just going golden but still soft (burnt garlic is bitter and grim), then remove them and save for later

  3. Increase the heat slightly, and add the diced red onion, carrot, and celery. Stir these occasionally while they soften.

  4. Put the heavy-bottom frying pan over a medium-to-high heat, and add some olive oil

  5. In the frying pan, add the chopped white onion and star anise. Fry until the onions are caramelising - i.e. going golden brown. Then tip the contents of the frying pan into the big pot (before the onions start to burn and go bitter).

  6. In the frying pan, turn the heat up to full and start to sear the beef. Do this layer-by-layer - ALL the beef must be touching the pan, otherwise it'll broil instead of searing, so just do a little at a time. Keep it moving until it's browned outside but pink in the middle, then tip into the big pot. Repeat until all the beef is seared.

  7. ...remember to keep stirring the big pot every so often!

  8. In the frying pan, add a big lump of butter - probably 50g or so. This will sizzle and spit for a while. Once it's stopped sizzling, all the water has gone out of it, so then add the chopped garlic and mushrooms and saute these until they're golden brown at the edges but still plump and juicy. Then tip them into the large pot (the butter will help give the sauce a nice sheen)

  9. Make sure the frying pan is hot, then sear the lardons. You're looking for golden crispy edges, but plump and juicy pinkness. These will probably release a fair amount of fat into the pan - this is fine, it's all good for the sauce. When done, tip everything into the big pot.

  10. Now de-glaze the frying pan: tip the red wine into it and keep stirring and scraping the tasty bits off the bottom until the wine is reduced by about half. Tip into the large pot. You can now wash the frying pan - everything from now on takes place in the large pot (except the bechamel sauce)

  11. In the large pot, add the tomatoes and stir well.

  12. Add the Thai fish sauce and Worcester sauce, stir, and reduce the heat to low-to-medium

  13. Crack the bay leaves in half, and add to the pot

  14. Chop and add the lightly-browned garlic that you used to flavour the oil with, right back at the start

Reducing the sauce (1hr-2hrs)

  1. Leave the sauce pot simmering with the lid half-on for about an hour, stirring occasionally and adding pepper if needed. (Taste!)

  2. Sometimes I add a tablespoon of Marmite if the sauce needs more depth, sometimes not - depends on the ingredients

  3. It's done when it's done - i.e. when it looks and tastes like really good bolognese-style sauce, and isn't too runny or too dry.

  4. When it *is* done, turn the heat off and leave it to cool with the lid on for about 45mins

  5. Once it's cool, chop the basil and oregano and stir through. At this point, you can add the vine of the tomatoes and leave it to rest and infuse for about half an hour, then take the vine out again.

Layer the lasagne (5 mins)

  1. Put the oven onto 180 degrees C to heat up

  2. Put a layer of sauce in the bottom of the lasagne dish, then add a layer of lasagne

  3. Repeat at least once, preferably twice (depending on the size of your dish and how much sauce you have) until you've used all the sauce - but make sure that your top layer is sauce, not pasta!

Bechamel Sauce (5 mins)

  1. In the saucepan, melt about 50g of butter over a medium heat

  2. Add the flour and stir quickly until you have a good consistency - still "smearable", not a dry lump

  3. Begin adding milk a little at a time and stirring until absorbed. Make sure you don't add too much too quickly or it'll curdle.

  4. When you have a nice medium-thick-but-still-easily-pourable sauce, add the grated cheese and grind some more pepper into it.

  5. Stir until the cheese is all melted, then pour over the top of your lasagne, making sure it's all covered

  6. Grate a bit more cheese on top

Final baking (45 mins)

  1. Bake the lasagne in the oven at 180 degrees C for about 45 mins, or until the top is golden brown and ever so slightly crispy around the edges.

  2. Remove from the oven and leave to rest for about ten minutes before serving

And that's it! Quality of ingredients counts for a lot, but the biggest clincher is the care taken to make sure that each ingredient is individually well prepared and cooked before adding to the pot. Enjoy!

Saturday, October 03, 2009

Blogspot ATOM in Feed-normalizer / Simple-RSS

Both Cragwag and Sybilline are using the excellent Feed-normalizer for parsing RSS and ATOM feeds, but there's been a niggling problem with the ATOM generated by Blogger / Blogspot in particular - the resulting links on each entry end up pointing to the comments, not the post itself.

So I just forked simple-rss at github and fixed this.

Turns out that simple-rss is just taking the first link tag that it comes across and using that as the link for a post, which in the case of Blogspot ATOM is the comments link.

On inspection of the ATOM RFC, it says:

atom:link elements MAY have a "rel" attribute that indicates the link relation type. If the "rel" attribute is not present, the link element MUST be interpreted as if the link relation type is "alternate".

Looking at the Blogspot ATOM, it looks like every entry has a link rel="alternate" that points to the URL you would see if you navigated to the post from the blog homepage, so I've made it choose that link if it exists.
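The selection logic boils down to something like this (a hedged sketch, not the actual simple-rss patch; `Link` and `post_url` are illustrative names):

```ruby
# Prefer the rel="alternate" link; per the ATOM RFC, a link element
# with no rel attribute must also be treated as rel="alternate".
# Fall back to the first link only if no alternate exists.
Link = Struct.new(:rel, :href)

def post_url(links)
  alternate = links.find { |l| l.rel.nil? || l.rel == "alternate" }
  (alternate || links.first).href
end

links = [
  Link.new("replies",   "http://example.blogspot.com/feeds/1/comments/default"),
  Link.new("alternate", "http://example.blogspot.com/2009/10/some-post.html"),
]
post_url(links)  # the post URL, not the comments link
```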

Github should build the gem automatically - but it's taking a long time to do it, so in the meantime, you can download it from http://github.com/aldavidson/simple-rss and build it locally:

gem uninstall simple-rss
cd (source root)
rake gem
cd pkg
gem install -l simple-rss

That should fix the problem

Tuesday, September 08, 2009

How to create and save an AMI image from a running instance

One snag I encountered early on in my migration of Cragwag and Sybilline to Amazon's EC2 Cloud, was that I needed to take a snapshot of my running instance and save it as a new Amazon Machine Image (AMI).

I'd created a bare-bones Debian image from a public AMI (32-bit Lenny, 5.0, not much else) and then installed a few standard software packages on it - mysql, ruby, apache, etc etc etc. Once I'd got them configured the way I wanted, it had taken a couple of hours (I'll go into the configuration relating to EBS in a separate post) so I wanted to snapshot this instance as a new AMI image. That way, if and when I needed to create a new instance, all of this work would already have been done.

It actually took a fair amount of time to find out (well, more than a few seconds Googling, which is just eternity these days, y'know?) so I'll save you the pain and just give you the solution.

First, install Amazon's AMI tools, and API tools:

export EC2_TOOLS_DIR=~/.ec2 #(or choose a directory here)
mkdir ec2-ami-tools
cd ec2-ami-tools
wget http://s3.amazonaws.com/ec2-downloads/ec2-ami-tools.zip
unzip ec2-ami-tools.zip
ln -s ec2-ami-tools-* current
cd ..
mkdir ec2-api-tools
cd ec2-api-tools
wget http://s3.amazonaws.com/ec2-downloads/ec2-api-tools.zip
unzip ec2-api-tools.zip
ln -s ec2-api-tools-* current

echo "export EC2_AMITOOL_HOME=`dirname $EC2_TOOLS_DIR`/ec2-ami-tools/current" >> ~/.bashrc
echo "export EC2_APITOOL_HOME=`dirname $EC2_TOOLS_DIR`/ec2-api-tools/current" >> ~/.bashrc
echo "export PATH=${PATH}:`dirname $EC2_TOOLS_DIR`/ec2-ami-tools/current/bin:`dirname $EC2_TOOLS_DIR`/ec2-api-tools/current/bin" >> ~/.bashrc
source ~/.bashrc

Next, you'll need to get your security credentials. You can get a reminder of these - or create them as needed - on the AWS "Your Account" > "Security Credentials" page.

I recommend saving your X.509 certificate and your private key somewhere under /mnt/ - this directory is excluded from the bundled image. Quite important that, as otherwise your credentials would be bundled up in the image - and if you ever shared that image with anyone else, you'd be sharing your credentials too!

You'll also need to note your AWS access details - especially your access key and secret key - plus your Amazon account ID.

Now, we're at the main event.

To take a snapshot of your running instance:

First, choose a name for your AMI snapshot. We'll call it ami-instance-name :)

# make a directory for your image:
mkdir /mnt/ami-instance-name

# create the image (this will take a while!)
ec2-bundle-vol -d /mnt/ami-instance-name -k /path/to/your/pk-(long string).pem -c /path/to/your/cert-(long string).pem -u YOUR_AMAZON_ACCOUNT_ID_WITHOUT_DASHES

Once that's done, you should have a file called image.manifest.xml in your /mnt/ami-instance-name directory, along with all the bundle parts. Sometimes it will say Unable to read instance meta-data for product-codes - but this doesn't seem to cause any problems, and I've successfully ignored it so far :)

Next, upload the AMI image to S3. This command will create an S3 bucket of the given name if it doesn't exist - I've found it convenient to call my buckets the same as the instance name:

ec2-upload-bundle -b ami-instance-name -m /mnt/ami-instance-name/image.manifest.xml -a YOUR_AWS_ACCESS_KEY -s YOUR_AWS_SECRET_KEY

You should then be able to register the instance. I've done that using the rather spiffy AWS Management Console web UI, but you can also do it from the command line using:

ec2-register ami-instance-name/image.manifest.xml

And that's it!

Of course, you could be cunning and create a script that does it all in one. I've got my AWS/EC2 credentials stored in environment variables from my .bashrc:

export EC2_PRIVATE_KEY=/mnt/keys/pk-(long string).pem
export EC2_CERT=/mnt/keys/cert-(long string).pem
export AWS_ACCOUNT_ID=(my account id)
export AWS_ACCESS_KEY=(my AWS access key)
export AWS_SECRET_KEY=(my AWS secret key)

which means I can make, upload and register an instance in one, by running this script:



#!/bin/sh
ec2-bundle-vol -d /mnt/images/$1 -k $EC2_PRIVATE_KEY -c $EC2_CERT -u $AWS_ACCOUNT_ID
ec2-upload-bundle -b $1 -m /mnt/images/$1/image.manifest.xml -a $AWS_ACCESS_KEY -s $AWS_SECRET_KEY
ec2-register $1/image.manifest.xml

...and giving it a parameter of ami-instance-name. I have that script saved as make_ami.sh, so I can just call, for instance:

make_ami.sh webserver-with-sites-up-and-running

...and go have a cup of coffee while it does its thing.

Moving a site to The Cloud

Last week I did a lot of reading and research into cloud hosting. The "Cloud" has been a buzzword for a while now, often bandied about by those who know no better as a simple sprinkle-on solution for all of your scale problems - much in the same way as AJAX was touted around a few years ago as a magic solution to all of your interface problems.

The perception can sometimes seem to be "Hey, if we just shift X to The Cloud, we can scale it infinitely!". The reality, of course, is something rather more qualified. Yes, in theory, the cloud has all the capacity you're likely to need, unless you're going to be bigger than, say, Amazon - (are you? Are you reeeeeeeally? C'mon, be honest...) - provided - and it's a big proviso - that you architect it correctly. You can't just take an existing application and dump it into the cloud and expect to never have a transaction deadlock again, for instance. That's an application usage pattern issue that needs to be dealt with in your application, and no amount of hardware, physical or virtual, will solve it.

There are also some constraints that you'll need to work around, that may seem a little confusing at first. But once I got it, the light went on, and I became increasingly of the opinion that the cloud architecture is just sheer bloody genius.

What kind of constraints are they? Well, let's focus on Amazon's EC2, as it's the most well-known....

  • Your cloud-hosted servers are instances of an image
    They're not physical machines - you can think of them as copies of a template Virtual Machine, if you like. Like a VMWare image. OK, that one's fairly straightforward. Got it? Good. Next:

  • Instances are transient - they do not live for ever
    Bit more of the same, this means that you create and destroy instances as you need. The flipside is that there is no guarantee that the instance you created yesterday will still be there today. It should be, but it might not be. EC2 instances do die, and when they do, they can't be brought back - you need to create a new one. This is by design. Honestly!

  • Anything you write to an instance's disk after creation is non-persistent
    Now we're getting down to it. This means that if you create an instance of, say, a bare-bones Linux install, and then install some more software onto it, and set up a website, then the instance dies - everything you've written to that instance's disk is GONE. There are good strategies for dealing with this, which we'll come onto next, but this is also by design. Yes, it is...

  • You can attach EBS persistent storage volumes to an instance - but only to one instance per volume
    This one is maybe the most obscure constraint but is quite significant. Take a common architecture of two load-balanced web servers with a separate database server. It's obvious that the database needs to be stored on a persistent EBS volume - but what if the site involves users uploading files? Where do they live? A common pattern would be to have a shared file storage area mounted onto both web servers - but if an EBS volume can only be attached to one instance, you can't do that.

Think about that for a few seconds - this has some pretty serious implications for the architecture of a cloud-hosted site. BUT - and here's the sheer bloody genius - these are the kind of things you'd have to deal with for scaling out a site on physical servers anyway. Physical hardware and especially disks are not infallible and shouldn't be relied on. Servers can and do go down. Disks conk out. Scaling out horizontally needs up-front thought put into the architecture. The cloud constraints simply force you to accept that, and deal with it by designing your applications with horizontal scaling in mind from the start. And, coincidentally, provide some kick-ass tools to help you do that.

Take, for example, the last bullet point above - that EBS volumes can only be attached to one instance. So how do you have file storage shared between N load-balanced web servers? Well, the logical thing to do is to have a separate instance with a big persistent EBS volume attached to it, and have the web servers access it by some defined API - WebDAV, say, or something more application-specific.

But hang on.... isn't that what you should be doing anyway? Isn't that a more scalable model? So that when your fileserver load becomes large, you could, say, create more instances to service your file requests, and maybe load-balance those, and....

See? It forces you to do the right thing - or, at least, put in the thought up front as to how you'll handle it. And if you then decide to stubbornly go ahead and do the wrong thing, then that's up to you... :)

So, anyway, I wanted to get my head round it, and thought I'd start by shifting Cragwag and Sybilline onto Amazon's EC2 cloud hosting service. I did this over a two day period - most of which, it has to be said, was spent setting up Linux the way I was used to, rather than the cloud config - and I'll be blogging a few small, self-contained articles with handy tips I've learned along the way. Stay tuned....

Monday, August 17, 2009

Styling the Rails auto_complete plugin

Over the weekend I was having a fiddle around with the layout on Cragwag, as the existing design was a bit, shall we say, emergent. I'd been trying to avoid a LHS column, because everything always seems to end up with one, but in the end I had to give up and just go with it. The auto-tag cloud just makes more sense over there, and there's no point fighting it.

Anyway, the next question was what else to put there? I was looking for an intermediate stop-gap search until I got round to putting a proper Lucene-based search in there, and so the thought struck - I'd been looking for a way to browse tags that weren't necessarily in the top 20 (e.g. show me all the posts about Dave McLeod's 2006 E11 route), so why not try an auto-suggest tag search?

So I found DHH's auto_complete plugin (it used to be part of Rails core until version 2) and got to work. This should be easy, right?


I'll cut out the frustrations and skip to the solutions. :)

The documentation on this plugin is virtually non-existent, and there are some extra bits you'll need - but I found this post very helpful.

One minor irritation I found was that it writes out a bunch of css attributes directly into your HTML inside a style tag - including a hard-coded absolute width of 350px. (see the auto_complete_stylesheet method)

Argh! said I, as my search needed to fit into a 150px width.

So how can you get round this? Simple - you can override the inline CSS in your stylesheet, provided you supply a CSS selector with higher specificity.

Now, CSS specificity can be a fairly complicated topic, but I usually just remember it like this - if you've got two rules that apply to a particular thing, the more specific rule wins.

In this case, the inline CSS selector from the plugin:
div.auto_complete {
  width: 350px;
  background: #fff;
}

gets trumped by Cragwag's more specific selector:
div#content div#lh_sidebar div.auto_complete {
  width: 150px;
  background: #fff;
}

The first applies to any div with class="auto_complete"; the second applies only to divs with class="auto_complete" inside a div with id="lh_sidebar", itself inside a div with id="content". That's the more specific rule, so it wins.


Wednesday, August 12, 2009

OutOfMemoryError in ActiveRecord-JDBC on INSERT SELECT

During some scale testing the other day, we came across this unusual / mildly amusing error in a database-bound command that just funnels INSERT SELECT statements down the ActiveRecord JDBC driver:

java/util/Arrays.java:2734:in `copyOf': java.lang.OutOfMemoryError: Java heap space (NativeException)
from java/util/ArrayList.java:167:in `ensureCapacity'
from java/util/ArrayList.java:351:in `add'
from com/mysql/jdbc/StatementImpl.java:1863:in `getGeneratedKeysInternal'
from com/mysql/jdbc/StatementImpl.java:1818:in `getGeneratedKeys'
from org/apache/commons/dbcp/DelegatingStatement.java:318:in `getGeneratedKeys'
from jdbc_adapter/JdbcAdapterInternalService.java:668:in `call'
from jdbc_adapter/JdbcAdapterInternalService.java:241:in `withConnectionAndRetry'
from jdbc_adapter/JdbcAdapterInternalService.java:662:in `execute_insert'
... 25 levels...

Now, there's a couple of things here that are worth pointing out.

  1. I REALLY LOVE the fact that it blew heap space in a method called ensureCapacity. That makes me smile.
  2. Why is it calling getGeneratedKeys() for an INSERT SELECT?

The getGeneratedKeys() method retrieves all the primary keys that are generated when you execute an INSERT statement. Fair enough - BUT the issue here is that we'd specifically structured the process and the SQL statements involved so as to be done with INSERT SELECTS, and hence avoid great chunks of data being transferred backwards and forwards between the app and the database.

It turns out that the ActiveRecord JDBC adapter is doing this:

@JRubyMethod(name = "execute_insert", required = 1)
public static IRubyObject execute_insert(final IRubyObject recv, final IRubyObject sql) throws SQLException {
    return withConnectionAndRetry(recv, new SQLBlock() {
        public IRubyObject call(Connection c) throws SQLException {
            Statement stmt = null;
            try {
                stmt = c.createStatement();
                stmt.executeUpdate(rubyApi.convertToRubyString(sql).getUnicodeValue(), Statement.RETURN_GENERATED_KEYS);
                return unmarshal_id_result(recv.getRuntime(), stmt.getGeneratedKeys());
            } finally {
                if (null != stmt) {
                    try {
                        stmt.close();
                    } catch (Exception e) {
                    }
                }
            }
        }
    });
}
...in other words, explicitly telling the driver to return all the generated keys.
Hmm, OK, can we get round this by NOT calling the execute_insert method, and instead calling a raw execute method that doesn't return all the keys?

Well, no, unfortunately, because it also turns out that the ruby code is doing this:

# we need to do it this way, to allow Rails stupid tests to always work
# even if we define a new execute method. Instead of mixing in a new
# execute, an _execute should be mixed in.
def _execute(sql, name = nil)
  if JdbcConnection::select?(sql)
    @connection.execute_query(sql)
  elsif JdbcConnection::insert?(sql)
    @connection.execute_insert(sql)
  else
    @connection.execute_update(sql)
  end
end

...and the JdbcConnection::insert? method is detecting if something's an insert by doing this:
(JdbcAdapterInternalService.java again)

@JRubyMethod(name = "insert?", required = 1, meta = true)
public static IRubyObject insert_p(IRubyObject recv, IRubyObject _sql) {
    ByteList bl = rubyApi.convertToRubyString(_sql).getByteList();

    int p = bl.begin;
    int pend = p + bl.realSize;

    p = whitespace(p, pend, bl);

    if (pend - p >= 6) {
        switch (bl.bytes[p++]) {
        case 'i':
        case 'I':
            switch (bl.bytes[p++]) {
            case 'n':
            case 'N':
                switch (bl.bytes[p++]) {
                case 's':
                case 'S':
                    switch (bl.bytes[p++]) {
                    case 'e':
                    case 'E':
                        switch (bl.bytes[p++]) {
                        case 'r':
                        case 'R':
                            switch (bl.bytes[p++]) {
                            case 't':
                            case 'T':
                                return recv.getRuntime().getTrue();
                            }
                        }
                    }
                }
            }
        }
    }
    return recv.getRuntime().getFalse();
}

...in other words, if the SQL starts with the word INSERT (after any leading whitespace), then it's an INSERT, and should be executed with an execute_insert call.
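In Ruby terms, all that byte-walking boils down to a case-insensitive prefix check on the first non-whitespace characters - something like this sketch (my own translation, not the adapter's actual code):

```ruby
# A Ruby rendering of the adapter's insert? check - my own sketch, not
# the adapter's real code. Skip leading whitespace, then test whether
# the next characters spell INSERT, case-insensitively.
def insert_statement?(sql)
  !!(sql =~ /\A\s*insert/i)
end
```

Which is exactly why our INSERT SELECTs get routed down the execute_insert path - they start with INSERT, so as far as the adapter is concerned, generated keys must be fetched.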

So, it looks like we're a bit knackered here. There are two possible solutions:

  1. The proper solution - fix the AR JDBC adapter (and, arguably, the MySQL connector/J as well, to stop it blowing heap space), submit a patch, wait for it to be accepted and make it into the next release.

  2. Or, the pragmatic solution - rewrite the SQL-heavy command as a stored procedure and just call it with an EXECUTE and sod the DHH dogma.

We went with option 2 :)
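For the curious, the workaround amounts to something like this (procedure name invented for illustration). Because the statement starts with CALL rather than INSERT, the adapter's insert? check fails, and getGeneratedKeys never gets involved:

```ruby
# Sketch of the pragmatic workaround - the procedure name is made up
# for illustration. The SQL-heavy logic lives in a MySQL stored
# procedure, invoked via the raw connection with a plain CALL; since
# the statement doesn't start with INSERT, execute_insert (and hence
# getGeneratedKeys) is bypassed entirely.
def run_heavy_inserts(connection)
  connection.execute("CALL build_message_items()")
end
```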

Big kudos to D. R. MacIver for tracking down the source of the fail in double-quick time.

Monday, August 03, 2009

Introducing Cragwag.com!

You know those conversations you end up having in the pub, where after a couple of beers someone says "you know what there should be? There should be a site that does X" (where X can be anything at all).

I've had so many of those over the years, and never quite managed to work up the free time / motivation to actually get on and put the ideas into practice..... (and then what's tended to happen is that a couple of years later, someone else goes and does them and makes a fortune, but that's just sod's law).

Well, a few months ago, I decided enough was enough, and that the next time I had one of those ideas, I should just stop talking about it and actually do it - so in my evenings and weekends here and there, I've been noodling away on a couple of ideas, mainly just for my own amusement, and to keep me in the habit of actually following through on things.

So here's one of them - Cragwag - a climbing news aggregator.

Those of you who know me in person know that I'm a keen amateur climber. I'm under no illusions - I'm definitely in the "amateur" category for good reason :) - but it struck me that although there's a "definitive" go-to site for UK-centric climbing news (UKClimbing.com), it's still editorially filtered - an editor keeps himself up to date on everything that's going on in the scene, and then publishes what he thinks is significant.

That's all well and good, but typically what's significant is what's going on at the forefront of ability. I felt there was also a place for the (admittedly by-now-a-little-cliched) long tail of climbing-related blogs - unfiltered, un-edited, everybody's tales. Whether of heart-stopping epics in the Himalaya, or scrabbling up a Stanage slab. If you felt it enough to blog about it, then someone wants to hear about it.

Plus it was an experiment in automatic news tagging and cross-relating of posts based on content, so it was kind of techie fun too. Which is important :) I'd like to do something with a crowd-sourced google map and iphone gps too, but that's much more experimental. Need to learn the iPhone SDK first...

So that's Cragwag - all the climbing news from punters and pros alike. Just for fun, for the sake of an experiment, and - to paraphrase the most famous answer to "why?" of all time - because it wasn't there :) Yay!

Wednesday, July 29, 2009

VC 2.0 - Crowdfunding is go!

The Series B VC market sucks right now. The boom times of the pre-credit-crunch 2.0 world seem very far away. Although there's still enthusiasm to invest in early-stage companies at Angel or Series A rounds (small investment, large slice of equity, potential for large return) and Series C/D+ (where the company is commercialised and probably generating revenue or damn close to it, and therefore the risk is much lower) there's been a huge withdrawal of funds and appetite for Series B investments.

Which is where we are right now.

So in typical Trampo style, we've decided not to go down that route... and today we launched Trampoline's Crowdfunding process.

The idea, simply put, is that rather than persuading a few people to part with a large sum of cash, we go the Obama way and offer tranches of shares in much smaller amounts to many more people. This also opens up great possibilities for utilising the power of the extended network of many investors, all the way from getting a large network of potential contacts (it's almost like hiring a hundred part-time salesmen) to exploiting the wisdom of crowdfunders on key strategic decisions.

Of course, nothing's ever quite as simple as it sounds, and there are many FSA regulations about this kind of thing, designed to protect Mom and Pop investors from the plethora of possible boiler-room-type scams that could be dressed up in this light. For instance:

  1. We can't actively solicit investment in any public setting.
    So this post is FYI only, ok? Ok...

  2. We can't discuss any details with anyone, unless they're
    1. A certified Sophisticated Investor, or
    2. A certified High Net Worth Individual, or
    3. A friend or family member (yes, apparently that's ok!)

  3. So this is emphatically NOT an exchange-less IPO! 'Cause those are illegal! It's more of a self-build consortium, kind of thing.

Anyway, bearing in mind the FSA regulations, that's about as much as I can say, apart from pointing you at Trampoline's Crowdfunding article in today's Financial Times (also on page 12 of the print edition), and the Crowdfunding Website if you want more information.

Friday, May 22, 2009

This should not be possible - transactional fail?

We're having a bizarre problem with MySQL 5.0.32 on Debian in a highly-concurrent environment, using the ActiveRecord JDBC adapter. We've also seen it on 5.0.51 on Ubuntu.

There's a particular sequence of 3 queries, all executed from within an ActiveRecord::Base.transaction { } block:

// Statement 1:
INSERT INTO (InnoDB table 1)
(type, ...loads of other fields...)
SELECT 'ProtoMessageItem', ...other fields...
FROM (MyISAM table 1) LEFT OUTER JOIN (InnoDB table 2)
WHERE (anti-duplication clause)
GROUP BY (synthetic identifier on MyISAM table 1, to be a unique key on InnoDB table 1)

// Statement 2:
INSERT INTO (InnoDB table 2)
SELECT (fields)
FROM (MyISAM table 1) LEFT OUTER JOIN (InnoDB table 3) INNER JOIN (InnoDB table 1)
WHERE (anti-duplication clause)

// Statement 3:
UPDATE (InnoDB table 1)
SET type = 'MessageItem'
WHERE type = 'ProtoMessageItem'

Now, all of this takes place within an explicit transaction, and works well - this approach provides huge performance gains over the more simple and obvious methods.

The problem is that just very very occasionally, it breaks. We don't know how or why, but once in a blue moon, statement 2 throws a DUPLICATE KEY exception, and the transaction DOESN'T get rolled back. Statement 1 still gets COMMITTED. We can see ProtoMessageItems in the DB, and this blocks up the pipeline and stops further MessageItems from being created.

ProtoMessageItems should NEVER be visible - they're only ever created and then updated to full MessageItems in the same 3-statement transaction. Seeing them from outside that transaction is like seeing a naked singularity - it just shouldn't be possible.
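To be concrete about what we're relying on, here's a toy model of the semantics (a stub, purely for illustration - nothing like the real adapter or driver): statements only become visible on COMMIT, and any exception discards the lot.

```ruby
# Toy model of the atomicity we're relying on - purely illustrative,
# not the real adapter or driver. Statements accumulate in a pending
# buffer and only become visible on COMMIT; any exception inside the
# block acts as a ROLLBACK, discarding the whole buffer.
class ToyConnection
  attr_reader :committed

  def initialize
    @committed = []   # what the outside world can see
    @pending   = nil  # statements issued since BEGIN
  end

  def transaction
    @pending = []                # BEGIN
    yield
    @committed.concat(@pending)  # COMMIT
  ensure
    @pending = nil               # anything uncommitted is discarded
  end

  def execute(sql)
    raise "Duplicate entry" if sql.include?("dup")  # simulate statement 2 failing
    @pending << sql
  end
end
```

With that model, a DUPLICATE KEY error from statement 2 means statement 1's ProtoMessageItems never become visible - which is what the MySQL logs say should be happening, and yet there they are in the database.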

This is all kinds of wrong in a very wrong way, as it means one of the following:

  1. The DB is getting the correct BEGIN/COMMIT/ROLLBACK statements, but transactional atomicity is not holding for some reason

  2. The JDBC driver is not sending the correct ROLLBACK statements because it thinks it's in autocommit mode

  3. There's an obscure concurrency edge case somewhere with the ActiveRecord JDBC adapter, that means once in a while, for some reason, it starts that transaction in autocommit mode.

All of these possibilities are worrying. We've eliminated the possibility of it being some quirk of the particular data that causes this problem. We've turned on MySQL statement logging and watched it for hours, and as far as we can see, the correct ROLLBACK statements seem to be issued every time.

Has anyone encountered a similar problem? Any known issues with INSERT/SELECT on combinations of (transactional) InnoDB and (non-transactional) MyISAM tables?

Tuesday, May 19, 2009

Suggestions for GruntFuddlr!

Following on from the previous post on The imminent death of Twitter, after several comments along the lines of "Oh Al, you have SOOOO got to buy that domain", followed by a brief moment of hastiness with the credit card, I am now the proud owner of GruntFuddlr.com!

So all I need to do now is decide what to do with it.... add your suggestions to the comments! The winner will win - erm - something to be decided!

Friday, May 15, 2009

Twitter Will Soon Die - the second Tipping Point approaches

Maybe I'm sticking my neck out, but it's inevitable - hear me out here. By now you're surely familiar with the concept of The Tipping Point as espoused by Malcolm Gladwell -
the moment of critical mass, the threshold, the boiling point
the point at which momentum for change becomes unstoppable

This has been adopted as a central tenet of the Web 2.0+ business model, and is surely now almost as ubiquitous on a web startup's business plan as a long tail reference or hockey stick graph. This post is about the Tipping Point concept specifically as applied to social networking services.

It's certainly true in my experience that every successful social software service goes through some standard stages of growth:

  1. Early adoption
    The service gets adopted by a set of keenly curious early users. This is often through an initial restricted beta program, or through the conference circuit, or friends and families of those involved in the development. Many services never get past this stage - maybe it's just not a particularly good idea, or a useful service, or well-executed, or maybe it's just unlucky. Assuming the service does get past this stage, it moves on to...

  2. Viral spread
    The early adopters have judged and found it worthy. Here's where they get to gain geek cool points by name-dropping it amongst their friends, on the conference circuit, in Friday-night bars, etc... often taking advantage of the opportunity to faintly patronise those who are still using an inferior / more mass-market / older service - I mean, how many times have you heard lines like these?

    ....Oh Digg totally jumped the shark, I moved off there to Reddit and now I've started using MangWurdlr (or whatever)
    ...Yeah I use GruntFuddlr for all my (whatever)-ing now, it's nice, they've got a really cool UI, and ....etc...etc...
    ...Oh, you should have a look at PigNudgr - they're still in beta, but it looks really cool, and they're a great bunch of guys - I used to work with Andy at (name of now-defunct late-nineties web agency, usually some combination of colour and animal) and he's doing some really neat stuff now...
    etc etc.

    Every time I have one of those conversations I'm reminded of when I was in the sixth form (that's school from 16-18, when you've passed your first exams - sorry I don't know what the US equivalent is) and people would compete for alpha-cool status by getting into ever-more obscure bands on ever-smaller labels.

    Remember those? Bands like "Anna", "Spontaneous Cattle Combustion", "Generic me-too grunge band #241" - when they appeared in NME for the first time they were the coolest thing since sliced bread, and if you didn't have a copy of their debut EP you were nobody, maaannn - ? It's the same phenomenon. Web startups / bands - same thing, just slightly older protagonists. Same silly haircuts though.

  3. The first Tipping Point

    This is the Holy Grail for earlier-stage web services. This is where you succeed in creating enough word-of-mouth buzz that suddenly, you're unstoppable. You tip over into the mainstream. People's parents have heard of you (well, those that read the more techno-savvy broadsheets, anyway). A whole new wave of joiners is poised to come on-board, often purely because of who is already on there. Prime example - Stephen Fry on Twitter.

  4. Inflation

    (apologies to Alan Guth!)
    This is where the whole thing goes crazy. Faster growth than they ever thought possible. All those times the engineers said "Well, that's a high-class problem to have, and we'll just have to deal with that when we get there.." well - they're there right now. Seems like every week there's another digit on the hit counts. Often characterised by regular outages due to scale fail in completely unexpected ways ("Holy crap, we've run out of inodes??? That's possible?"). They're sticking in new servers by the dozen all the time, burning through all that VC funding and not making any money yet..

    Meanwhile the user base seems to take on a life of its own. It's no longer a community, it instead takes on the characteristics of a population - splits, fragments, in-fighting, single-issue activism - often against the decisions of the founders, who in their increasingly urgent need to get revenue to support the spiralling infrastructure costs are taking commercially-led decisions and seen as increasingly out-of-touch with the still-trying-to-be-faithful users.

    There are conflicting senses of excitement and disappointment - that it's just not the same as it used to be. And who are all these ****wits joining now anyway? What - my MUM just joined? And she wants to be my friend? Oh crap, better get rid of all of those tales of drunken debauchery and the undeniable photographic evidence. Your boss probably joins up, adding further fuel to the conflict of identity. Is this personal? A work tool? Both?

  5. The second Tipping Point

    All good things must come to an end, and this is the end of the line for the early adopters. It's not new, edgy or cool anymore, now that it's talked about on Question Time and even the Government are starting to use it in some god-awful, half-assed but completely cringe-making attempt to be Down With Tha Kids. Meanwhile, you keep getting friend requests from people at school who you weren't friends with in the first place and have no intention of pretending to be friends with now.

    The whole world and his hamster seems to be trying to spam you about it or on it. The number of z-list celebrities on it just keeps growing - Peter Andre? Really? Christ... - and they don't even post themselves, it's done via their PR person, or if they do they're just pushing their warez (yes, @WilliamShatner, I'm looking at you).


    You've had enough. You can barely be bothered to check it any more, and all the third-party apps built on the API are just getting on your nerves, and all the cool kids have moved on to DangDurdlr.com anyway... apparently it's really cool, and apparently Dave from AvocadoAardvaark.com got together with Si from PurplePlatypus to start it up, and they've got a really neat UI, and it doesn't have all of these ****wits on it...

So that's the Second Tipping Point. When all of the people who joined before the first Tipping Point leave, partly due to their constant need for something new, partly because services just naturally evolve, but mostly because of all the people who joined just after the first. Who, in their turn, had mainly joined because of all the people who were already on it...

Facebook got there a few months ago. Twitter is rapidly approaching it, and you know what? It's natural. It's part of the inevitable evolution of ideas and the services built upon them. It's not that Twitter has anything fundamentally wrong with it, it's mainly that the problem with software that puts people in touch with people..... is people. The cliques and cooler-than-thouism that ruled school society never went away, they just changed form and went online.

MySpace took over from first-gen social software like SmartGroups, and while its user numbers may technically still be growing, in new-frontier terms, it's now dying. It was supplanted by Facebook, which has become supplanted by Twitter, which will soon be supplanted by some other new kid on the block. No-one knows what it will be yet, but that's part of the whole fun of this business - natural selection in the full glare of the web, with all the competition, cross-breeding, malformed offspring and occasional apparently-arbitrary successes. I love it.

Tuesday, April 07, 2009

Another day, another deployment

Posts have been increasingly thin on the ground recently, due to a couple of things: partly my discovery of a reasonably lightweight API client for Twitter, but mainly lack of time - which itself is down to two things:
  1. I'm organising my wedding to the ludicrously lovely Lise, and
  2. we've been getting SONAR installed and running at two large corporates recently (who shall remain nameless)

It's a good feeling to have seen a product go right through from initial idea on a post-it note to being installed at a super-huge financial client, and it makes all the work worthwhile when you see people actually using it to do their job. I wish I could talk more about the details, but confidentiality dictates not, sadly.

I've also been meaning to blog about the app I whipped up to manage the football team I organise - being a geek at heart, it's amazing the amount of time and effort I devote to being effectively lazy, and this little Rails app, which started out as a quick time-saver, ended up being a case study in the power of stupidity as applied to a linear optimisation problem - "given this set of players, each of whom has a set of positions they prefer to play in, what is the best possible line-up?" I'll get round to blogging about that at some point, honest..
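For the record, the "stupid" approach is nothing cleverer than brute force - roughly this sketch (names and scoring invented for illustration): try every assignment of players to positions and keep the one where the most players end up somewhere they actually want to play.

```ruby
# Brute-force line-up picker - a sketch of the "power of stupidity"
# approach, with invented names and a toy scoring rule (not the real
# app's code). prefs maps each player to the positions they're happy
# in; an assignment scores one point per player in a preferred slot.
def best_lineup(players, positions, prefs)
  best, best_score = nil, -1
  players.permutation(positions.size) do |chosen|
    score = chosen.zip(positions).count { |player, pos| prefs[player].include?(pos) }
    if score > best_score
      best, best_score = Hash[positions.zip(chosen)], score
    end
  end
  best  # { position => player }
end
```

Factorial time, of course, but for eleven-a-side it finishes before the kettle boils, which is the whole point of the stupid approach.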

Another little personal project that I started as a tool for myself, and then ended up generalising and meaning to add to whenever I get time (chortle), is Cragwag - an RSS aggregator, plus some extras - aggregating many and various climbing-related blogs, with an only-just-started map of climbing areas plus B&Bs, parking and convenient delis, which I may end up mashing up with weather reports to answer questions like "where should we go climbing this weekend?"

Bleh. So that's why I've ended up micro-blogging rather than macro-blogging these days. I will try to get used to writing in more than 140 characters again soon...

Tuesday, March 24, 2009

And a jolly good time was had by all

Originally uploaded by Dr Snooks
Stag do - survived!
Physically unscathed.
Some of the mental scars may take a little longer to expunge....

Thursday, January 29, 2009

What was that about imitation / flattery?

So, way back in April 2008, Techcrunch ran this story about us: The Enterprise Social Network Auto-Generated and Visually Mapped

Imagine our surprise here when we discovered this post today on TechcrunchIT:
When IBM Beats Facebook and Twitter

So IBM are now also doing a product called SONAR, that does almost exactly the same thing as our existing product called SONAR (Social Networks And Relevance)??? Spooky!
The article says that it's called SAND (Social Networks And Discovery), but the screenshots clearly show SONAR everywhere, and that's how it's referred to on the IBM Research Labs page.

I felt compelled to respond to the claim that "No one else in consumer or enterprise is doing this yet.", so I posted a comment about an hour ago. Hasn't appeared yet though.....

Wednesday, January 07, 2009

...but can Evel jump it?

Happy New Year to all!

It's time for my first bold tech prediction* of 2009 - this year's "....but will it blend?" will be "...but can Evel jump it?"

Get your suggestions in - the more surreal, the better!

* Disclaimer: OK, OK, so for "bold tech prediction", read "blatant attempt to spread a meme and drive some traffic to my mate's site" :)