Nginx was configured to timeout requests after 60s, but it didn't seem to be working. In fact, the key to figuring out what was going on, was that while tailing the logs, I saw an occasional query come through with a ridiculously long execution time - 400 seconds or more... Yowch!
So, if requests were supposed to time out after 60 seconds, how come a query was allowed to stay executing for 400 or more?
It all comes down to threads. We're still on CRuby 1.8.7 (MRI) in production - we'll move to 1.9 and JRuby at some point, but previous experience has made me careful to test thoroughly, test some more, then test again and again and again before making the move, and we're not quite ready to shift yet.
Now, the default (MRI) CRuby interpreter uses green threads, and the eventmachine gem underlying Thin server uses - like most Ruby libraries - timeout.rb for its timeout mechanism. This article gives the full details, but it boils down to the fact that:
it is a well-known limitations of green threads that when a green thread performs a blocking system call to the underlying operating systems, none of the green threads in the virtual machine will run until the system call returns ... From the operating system perspective, there is only a single thread in the Ruby interpreter (the native one)
So essentially, the monitor thread which is responsible for timing out the request has to wait for the blocking IO call to return, before it can resume. This means that when you have a long-running, block IO call (say, for instance, an ActiveRecord query that takes 400 seconds to run) running under MRI, the default timeout mechanism is useless.
The solution? If you can't just switch to JRuby, then here's a simple 5-minute fix that did the trick for us:
in environment.rb :
config.gem 'SystemTimer', :lib => 'system_timer'
in application_controller.rb :
around_filter :force_timeout
def force_timeout( timeout=nil, &block )
timeout ||= 60 # <= or whatever value is appropriate
if defined?(SystemTimer) && timeout.to_i > 0
begin
SystemTimer.timeout_after(timeout) do
yield
end
rescue Timeout::Error => e
logger.error( "Timed out request after #{timeout}s!")
raise "RequestTimeout"
end
else
yield
end
end