Tuesday, November 22, 2011

Porting a Rails 2.3 app to Ruby 1.9

We finally managed to get enough space in the schedule to take the plunge and port our monolithic Rails 2.3 app to Ruby 1.9, with a view to increasing the scalability of our app. An upgrade to Rails 3 is also on the cards for later, but... one thing at a time.

As ever, the path to true Ruby nirvana is paved with good intentions, and tends to detour into dependency hell for a good portion of the way. Here's a quick shortlist of some of the issues we found along the way, and what we did to get round them.

In no particular order, here we go...


MySQL2 version should be no later than 0.2.x 

If you get the dreaded "Please install the mysql2 adapter: `gem install activerecord-mysql2-adapter`" error message, what it really means is: you can't use MySQL2 0.3+ with Rails versions earlier than 3 (a Gemfile pin for this is sketched just after the list below).
If you're already using MySQL2 0.2.x, you're on Mac OS X, and you're still getting the error, then the other thing it can really mean is: "I couldn't find the dynamic library libmysqlclient.18.dylib". There are two proposed fixes for this:
  1. sudo install_name_tool -change libmysqlclient.18.dylib /usr/local/mysql/lib/libmysqlclient.18.dylib /Users/YOUR_USER_NAME/.rvm/gems/1.8/gems/mysql2-0.2 - I couldn't get this working
  2. sudo ln -s /usr/local/mysql/lib/libmysqlclient.18.dylib /usr/lib/libmysqlclient.18.dylib - This worked for me on Lion
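
As promised above, the Gemfile pin for the version problem (the exact version number here is illustrative - any 0.2.x release should do):

  gem 'mysql2', '~> 0.2.11'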

Use Bundler

If you weren't using it before (we weren't), use it now. Stop fighting the inevitable: just give in, bend over and take it - and use Bundler. Seriously, it makes things easier in the long run.
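
Rails 2.3 predates Bundler's native Rails integration, so there's a little wiring to do: the usual recipe (per the Bundler docs of the era) is a config/preinitializer.rb that loads the bundle, plus a small override in config/boot.rb that calls Bundler.require. A sketch of the preinitializer, assuming a standard Rails 2.3 layout:

  # config/preinitializer.rb
  begin
    require 'rubygems'
    require 'bundler'
  rescue LoadError
    raise "Could not load the bundler gem. Install it with `gem install bundler`."
  end

  begin
    # Point Bundler at the app's Gemfile and set up the load paths
    ENV['BUNDLE_GEMFILE'] = File.expand_path('../../Gemfile', __FILE__)
    Bundler.setup
  rescue Bundler::GemNotFound
    raise "Bundler couldn't find some gems. Did you run `bundle install`?"
  end

The boot.rb override is a few more lines of the same flavour - the full recipe is in the Bundler documentation for Rails 2.3.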

EventMachine does not compile (update: ..easily..) on 1.9.2

...at least, not on my Lion MacBook Pro. It kept giving me the compiler error: cc1plus: error: unrecognized command line option ‘-Wshorten-64-to-32’. Under 1.9.3, however - no problems, compiled first time, every time.

Update: actually, I did eventually get EventMachine 0.12.10 to compile under 1.9.2, with an evil hack. The "-Wshorten-64-to-32" error above made me think: the compiler is not part of RVM, so the only way a command-line option could be recognised under one Ruby but not the other... is if they're not using the same compiler! So I tried this:

$ rvm use 1.9.3@1.9.3rails2.3
Using /Users/aldavidson/.rvm/gems/ruby-1.9.3-p0 with gemset 1.9.3rails2.3
$ irb
ruby-1.9.3-p0 :002 > require 'mkmf'
 => true 
ruby-1.9.3-p0 :004 >  CONFIG['CXX']
 => "g++-4.2" 
ruby-1.9.3-p0 :005 > exit
Now let's see what 1.9.2 is using:
$ rvm use 1.9.2@1.9.2rails2.3
Using /Users/aldavidson/.rvm/gems/ruby-1.9.2-p290 with gemset 1.9.2rails2.3
$ irb 
ruby-1.9.2-p290 :001 > require 'mkmf'
 => true 
ruby-1.9.2-p290 :002 > CONFIG['CXX']
 => "g++" 
ruby-1.9.2-p290 :006 > exit
Ah-hah! So they are in fact using different C++ compilers! And although many native gems (e.g. MySQL) will respect and pass through the CXX environment variable to the Makefile, sadly EventMachine is not one of them. So the easiest fix was a bit of evil hackery: move g++ out of the way and symlink it to g++-4.2:
$ which g++
/usr/local/bin/g++

$ which g++-4.2
/usr/bin/g++-4.2

$ sudo mv /usr/local/bin/g++ /usr/local/bin/g++.bak && sudo ln -s /usr/bin/g++-4.2 /usr/local/bin/g++
Looking good - let's go for it:

$ gem install eventmachine
Building native extensions.  This could take a while...
Successfully installed eventmachine-0.12.10
1 gem installed

Yay!
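
One more bit of housekeeping: once the gem has compiled, it's worth undoing the hack so that the newer g++ goes back where it was:

$ sudo rm /usr/local/bin/g++ && sudo mv /usr/local/bin/g++.bak /usr/local/bin/g++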

Rails 2.3 is NOT SUPPORTED under Ruby 1.9.3!

There was a big fuss about Rails loading times under 1.9.3. The way 'require' statements work has been changed, and Rails no longer checks whether a helper file exists before requiring it. As a result, every controller MUST have a corresponding helper file, even if it's just a stub. This is majorly annoying, as we have 73 controllers across 3 namespaces, and only 17 helpers. The Rails team have "frozen" the 2.3 branch and say that only security fixes will go in after 2.3.14 - in other words, they're not going to fix this. The good old monkeypatch-via-plugin method doesn't work either, as the files get required AS Rails is loaded, not once it's initialised.

So there are two options, either:

  1. Create a stub helper for each controller
    ...which seems a bit fugly to me (though it can at least be scripted - see the sketch below), or..
  2. Fork Rails, fix the issue, use that fork in the meantime until we can port to Rails 3.

This seems better - and who knows, if enough people complain about it, maybe we'll get a patch release (2.3.15?). I'm not holding my breath, mind... but if we're in this situation, I'm sure others are too, so hopefully this will help some other people who are hitting the same problem.
So, here's the forked Rails 2.3.14 on which we will fix the helper-requires problem (note: WILL fix, not "have already fixed"! :)
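
If you do end up going the stub-helper route instead, there's no need to hand-create 50-odd files - a quick one-off script can do it. Here's a sketch (hypothetical, but the kind of thing you'd run from the Rails root via script/runner, so that ActiveSupport's String#camelize is loaded):

  require 'fileutils'

  # Create an empty helper module for every controller that lacks one
  Dir.glob('app/controllers/**/*_controller.rb').each do |controller|
    helper = controller.sub('app/controllers', 'app/helpers').
                        sub('_controller.rb', '_helper.rb')
    next if File.exist?(helper)
    # e.g. "app/helpers/admin/users_helper.rb" => "Admin::UsersHelper"
    module_name = helper[%r{app/helpers/(.+)\.rb}, 1].camelize
    FileUtils.mkdir_p(File.dirname(helper))
    File.open(helper, 'w') { |f| f.puts "module #{module_name}\nend" }
  end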

UPDATE: after getting round the EventMachine issue above, we no longer need to do this - so we're carrying on with 1.9.2 and standard Rails 2.3.14.

Thursday, September 29, 2011

MySQL idle connections still holding locks

We had an interesting problem today. We were seeing very slow single-row updates (>30s) on our (InnoDB) scheduled_jobs table, and a large number of the update queries were failing with a LOCK WAIT TIMEOUT. The updates were using the primary key, so they should have been pretty fast - but they weren't.

So we fired up a console and ran SHOW INNODB STATUS, and saw lots of transactions with long-held locks -

---TRANSACTION 0 200086649, ACTIVE 3000 sec, process no 29791, OS thread id 140353331377936
12 lock struct(s), heap size 3024, 6 row lock(s), undo log entries 10
MySQL thread id 243, query id 314676 ip-10-94-245-79.ec2.internal 10.94.245.79 tn


- however, when we cross-referenced the MySQL thread IDs with a SHOW PROCESSLIST, we found that lots of the threads weren't actually doing anything:

+-----+------+------------+---------------+---------+------+-------+------+
| Id  | User | Host       | db            | Command | Time | State | Info |
+-----+------+------------+---------------+---------+------+-------+------+
| 243 | tn   | (..snip..) | tn_production | Sleep   | 3000 |       | NULL |
+-----+------+------------+---------------+---------+------+-------+------+



This was strange - but after a bit of Googling and a few minutes' thought, we realised the cause.

When one of our worker processes dies - e.g. because monit has killed it for taking up too much resource for too long, or because the EC2 instance it was running on became unresponsive to ssh and was automatically restarted by our detect-and-rescue script - it may be in the middle of a transaction at the point it's killed.

If the transaction was holding locks, and never got to the COMMIT stage, and the connection wasn't closed normally, then that connection will persist - and maintain its locks - until it gets timed out by the MySQL server... and the default wait_timeout is 28800s: 8 hours!

So, we killed the idle connections, and the update queries started going through much more quickly again. We've now brought our wait_timeout down to 10 minutes, and we'll see how we get on.
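
For reference, the immediate fix and the config change, from a MySQL console (243 being the offending thread id from the SHOW PROCESSLIST above; note that SET GLOBAL only affects connections opened from that point on, so add wait_timeout to my.cnf as well if you want it to survive a restart):

mysql> KILL 243;
mysql> SET GLOBAL wait_timeout = 600;  -- 10 minutes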

Wednesday, May 11, 2011

New EU Cookie / Privacy Law - Should you panic?

There's been a lot of fuss recently about the new "EU Cookie Law", and what effect it will have on EU- and UK-based online businesses. The EU directive has been around, and discussed with varying degrees of hyperbole, for a while; what's caused the recent kerfuffle is its adoption into UK legislation pretty much as-is.

So, should you be panicking in order to meet the implementation date of 26th May 2011?

Well... maybe, but I'm not.

Allow me to explain.

There is cause for concern - just like we saw with the RIPA Act circa 2000, the legislation is clearly well-intentioned, but has just as clearly been implemented by people with little understanding of the problem domain. The full legislation is available from the horse's mouth here: The Privacy and Electronic Communications (EC Directive) (Amendment) Regulations 2011, but I would recommend the implementation guidelines from the Information Commissioner's Office for a slightly lighter read.

Credit where it's due, the guidelines give a welcome degree of balance and reasonableness to the situation, but even so they manage to contradict themselves.

The important section is "What do the new rules say?", where it quotes the relevant section of the new legislation:

6 (1) Subject to paragraph (4), a person shall not store or gain access to information stored, in the terminal equipment of a subscriber or user unless the requirements of paragraph (2) are met.
(2) The requirements are that the subscriber or user of that terminal equipment--
(a) is provided with clear and comprehensive information about the purposes of the storage of, or access to, that information; and
(b) has given his or her consent.
(3) Where an electronic communications network is used by the same person to store or access information in the terminal equipment of a subscriber or user on more than one occasion, it is sufficient for the purposes of this regulation that the requirements of paragraph (2) are met in respect of the initial use.

(3A) For the purposes of paragraph (2), consent may be signified by a subscriber who amends or sets controls on the internet browser which the subscriber uses or by using another application or programme to signify consent.

(4) Paragraph (1) shall not apply to the technical storage of, or access to, information--
(a) for the sole purpose of carrying out the transmission of a communication over an electronic communications network; or
(b) where such storage or access is strictly necessary for the provision of an information society service requested by the subscriber or user.


Now, correct me if I'm wrong, but that seems fairly clear - if the user sets the controls on their internet browser to accept cookies, consent may be taken to be signified, right? Riiiiight?

Well, apparently not. A little bit further down the document, it says:

I have heard that browser settings can be used to indicate consent – can I rely on that?
(...)
At present, most browser settings are not sophisticated enough to allow you to assume that the user has given their consent to allow your website to set a cookie

Er... wut? Most browsers are set to accept cookies by default, but can be changed to reject all cookies, or reject third-party cookies, or prompt for each one. How is that not giving consent? And how does this guidance interact with paragraph (3A) of the regulations themselves?

There's another implementation question as well. Let's walk through a workflow.

  1. User X arrives at a site
  2. Site wants to set a cookie so that it can identify that User X's next click comes from User X, and not any of its other users.
  3. ..So Site has to ask for consent, presumably by an irritating and obtrusive pop-up window, or some other interstitial means. Remember that cookies are sent back as part of the HTTP response, so the decision regarding whether or not to set a cookie is taken on the server, before a response is sent back to the user.
  4. If the user says "YES", all well and good - Site sets a cookie and remembers that choice.
  5. BUT... what if the user says no? How do you remember that choice, without using a cookie? You can't, right? So you're going to have to ask on every request... (a sketch of this catch-22 follows below)
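
To make the catch-22 concrete, here's a hedged, Rails-flavoured sketch - the consent path, param and cookie names are all hypothetical:

  class ApplicationController < ActionController::Base
    before_filter :check_cookie_consent

    private

    def check_cookie_consent
      return if request.path == '/cookie_consent' # don't block the consent page itself
      if params[:cookie_consent] == 'yes'         # the user has just clicked "accept"
        cookies[:cookie_consent] = 'yes'          # remember the "yes"... with a cookie
      elsif cookies[:cookie_consent] != 'yes'
        redirect_to '/cookie_consent'             # ask (again) via an interstitial page
      end
      # Note there is no branch that remembers a "no": storing that choice
      # would itself require a cookie, so refusers get asked on every request.
    end
  end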

OK, that's admittedly a simplified scenario for the purposes of making a point. But the point is valid - it's going to be extremely tricky to think of ways of implementing this directive in any meaningful way without destroying your user experience. And what about the ubiquitous third-party cookies that form the basis of services such as Google Analytics? Again, the guidelines are specific:

An analytic cookie might not appear to be as intrusive as others that might track a user across multiple sites but you still need consent

And THAT, paradoxically, is why I'm not panicking. Laws that are very difficult to comply with are, in practice, difficult to enforce. This law, if it ever does get enforced, is most likely to be used as a political club to attack a high-profile mega-corporate that couldn't be brought down any other way - think MS and the eventually farcical browser-bundling lawsuit. Think Al Capone, and the fact that the only crime he ever got convicted of was tax evasion.

There have also been some surprisingly reasonable quotes from the Information Commissioner's Office and the Government:

[Information Commissioner Christopher] Graham said the ICO was clear the changes should not have a detrimental impact on consumers nor cause “an unnecessary burden on UK businesses.”

and

we do not expect the ICO to take enforcement action in the short term against businesses and organisations as they work out how to address their use of cookies (Ed Vaizey, Culture Minister)

Regardless, there may well end up being a high-profile test case at some point, which will garner huge amounts of publicity and huge amounts of lawyers saying things like "this is a very interesting case" - which, to paraphrase Terry Pratchett, is lawyer-speak for "at least six months at three grand a day, plus expenses, per lawyer, with a minimum team of ten". It will eventually establish a precedent for judicial interpretation of the act, and at that point the necessary course of action will become clear.

Until then, I'm not panicking. And neither should you.

Tuesday, February 22, 2011

Rails requests not timing out in time?

The web server CPU was mostly idle, the five Thin instances didn't seem to be doing much, but requests were still taking forever to return anything. So what the flimmin' flip was taking so long?

Nginx was configured to time out requests after 60s, but that didn't seem to be working. In fact, the key to figuring out what was going on was that, while tailing the logs, I saw an occasional query come through with a ridiculously long execution time - 400 seconds or more... Yowch!

So, if requests were supposed to time out after 60 seconds, how come a query was allowed to stay executing for 400 or more?

It all comes down to threads. We're still on CRuby 1.8.7 (MRI) in production - we'll move to 1.9 and JRuby at some point, but previous experience has made me careful to test thoroughly, test some more, then test again and again and again before making the move, and we're not quite ready to shift yet.

Now, the default CRuby (MRI) interpreter uses green threads, and the eventmachine gem underlying the Thin server uses - like most Ruby libraries - timeout.rb for its timeout mechanism. This article gives the full details, but it boils down to the fact that:
it is a well-known limitation of green threads that when a green thread performs a blocking system call to the underlying operating system, none of the green threads in the virtual machine will run until the system call returns ... From the operating system perspective, there is only a single thread in the Ruby interpreter (the native one)

So essentially, the monitor thread responsible for timing out the request has to wait for the blocking IO call to return before it can resume. This means that when you have a long-running, blocking IO call (say, for instance, an ActiveRecord query that takes 400 seconds to run) under MRI, the default timeout mechanism is useless.

The solution? If you can't just switch to JRuby, then here's a simple 5-minute fix that did the trick for us. It uses the SystemTimer gem, which works where timeout.rb doesn't because it schedules a POSIX signal (SIGALRM) - and the OS delivers that even while the interpreter is stuck inside a blocking system call.

In environment.rb:


config.gem 'SystemTimer', :lib => 'system_timer'
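
(Or, if you've taken the Bundler advice from the more recent post above, the equivalent Gemfile line would be gem 'SystemTimer', :require => 'system_timer'.)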


In application_controller.rb:

  around_filter :force_timeout

  # Wraps every action in a SystemTimer timeout. Unlike timeout.rb, this
  # can interrupt a blocking system call, e.g. a long-running query.
  def force_timeout( timeout=nil, &block )
    timeout ||= 60 # <= or whatever value is appropriate
    if defined?(SystemTimer) && timeout.to_i > 0
      begin
        SystemTimer.timeout_after(timeout) do
          yield
        end
      rescue Timeout::Error => e
        logger.error( "Timed out request after #{timeout}s!" )
        raise "RequestTimeout"
      end
    else
      yield # no SystemTimer available - run the action un-wrapped
    end
  end