Instant Badger: ruby


Monday, February 06, 2012

Errno::EPIPE: Broken pipe when accessing S3 ?

Quick tech tip - if you're trying to access Amazon S3 from Ruby, even with the official aws-sdk gem, and you get errors like Errno::EPIPE: Broken pipe when trying to upload, the issue is probably that you need to explicitly set the endpoint for the region of the bucket you're trying to access. By default, the endpoint is US-specific (US-EAST, I believe).

Yes, I know the docs say you don't need to, but try it - if, like me, you find your problem instantly goes away, then Robert is your mother's brother, as they say. Well, they do round here, anyway. You might find this comprehensive list of AWS endpoints useful.

Oh, and it's not immediately obvious from the docs how to set the endpoint - I found it easiest to pass in a :s3_endpoint param when initialising the S3 object, like so:


 my_s3_object = AWS::S3.new( :s3_endpoint => 's3-eu-west-1.amazonaws.com', :access_key_id=>'my access key', :secret_access_key=>'my secret access key')
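If you deal with buckets in more than one region, a small helper can build the endpoint string for you. This is only a sketch: s3_endpoint_for is a made-up name (it is not part of aws-sdk), and the s3-<region>.amazonaws.com pattern is an assumption extrapolated from the eu-west-1 hostname above, so check the result against the endpoint list for your region.

```ruby
# Hypothetical helper (not part of aws-sdk): guess the S3 endpoint hostname
# for a region. ASSUMPTION: regions follow the "s3-<region>.amazonaws.com"
# pattern seen above, with US-EAST using the plain default hostname.
def s3_endpoint_for(region)
  return 's3.amazonaws.com' if region.nil? || region == 'us-east-1'
  "s3-#{region}.amazonaws.com"
end

puts s3_endpoint_for('eu-west-1')
```

The result would then go in as the :s3_endpoint param in the AWS::S3.new call shown above.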

Tuesday, November 22, 2011

Porting a Rails 2.3 app to Ruby 1.9

We finally managed to get enough space in the schedule to take the plunge and port our monolithic Rails 2.3 app to Ruby 1.9, with a view to increasing scalability of our app. An upgrade to Rails 3 is also on the cards for later, but... one thing at a time.

As ever, the path to true Ruby nirvana is paved with good intentions, and tends to detour into dependency hell for a good portion of the way. Here's a quick shortlist of some of the issues we found along the way, and what we did to get round them.

In no particular order, here we go.....

MySQL2 version should be no later than 0.2.x

If you get the dreaded "Please install the mysql2 adapter: `gem install activerecord-mysql2-adapter`" error message, what it really means is "You can't use MySQL2 version 0.3+ with Rails version less than 3".
If you're already using MySQL2 0.2.x, and you're on Mac OS X, and you're still getting the error, then the other thing that it really means is "I couldn't find the dynamic library libmysqlclient.18.dylib". There are two proposed fixes for this:

sudo install_name_tool -change libmysqlclient.18.dylib /usr/local/mysql/lib/libmysqlclient.18.dylib /Users/YOUR_USER_NAME/.rvm/gems/1.8/gems/mysql2-0.2 - I couldn't get this working
sudo ln -s /usr/local/mysql/lib/libmysqlclient.18.dylib /usr/lib/libmysqlclient.18.dylib - This worked for me on Lion

Use Bundler

If you weren't using it before (we weren't), use it now. Stop fighting the inevitable, just give in, bend over and take it - and use Bundler. Seriously, it makes things easier in the long run.
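With Bundler in place, the MySQL2 restriction above becomes a one-line version constraint in the Gemfile. The sketch below uses RubyGems' own requirement class to show what a "~> 0.2.0" constraint would and wouldn't let through; the constant and method names are mine, purely for illustration, and the version numbers are just examples of a 0.2.x and a 0.3.x release.

```ruby
require 'rubygems'

# In the Gemfile this would read:  gem 'mysql2', '~> 0.2.0'
# "~> 0.2.0" means ">= 0.2.0 and < 0.3.0", i.e. stay on the 0.2.x series.
MYSQL2_CONSTRAINT = Gem::Requirement.new('~> 0.2.0')

# Hypothetical helper: would this mysql2 version satisfy the constraint?
def mysql2_ok_for_rails2?(version)
  MYSQL2_CONSTRAINT.satisfied_by?(Gem::Version.new(version))
end

puts mysql2_ok_for_rails2?('0.2.18')  # a 0.2.x version number => true
puts mysql2_ok_for_rails2?('0.3.0')   # too new for Rails 2.3  => false
```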

EventMachine does not compile (update: ..easily..) on 1.9.2

...at least not on my Lion Macbook Pro. It kept giving me the compiler error: cc1plus: error: unrecognized command line option ‘-Wshorten-64-to-32’. Under 1.9.3, however - no problems, compiled first time every time.

Update: actually, I did eventually get EventMachine 0.12.10 to compile under 1.9.2 with an evil hack. The error "-Wshorten-64-to-32" above made me think - the compiler is not part of RVM, so the only way the compiler would recognise a command-line option in one Ruby but not in another... is if it's not the same compiler! So I tried this:

$ rvm use 1.9.3@1.9.3rails2.3
Using /Users/aldavidson/.rvm/gems/ruby-1.9.3-p0 with gemset 1.9.3rails2.3
$ irb
ruby-1.9.3-p0 :002 > require 'mkmf'
 => true 
ruby-1.9.3-p0 :004 >  CONFIG['CXX']
 => "g++-4.2" 
ruby-1.9.3-p0 :005 > exit

Now let's see what 1.9.2 is using:

$ rvm use 1.9.2@1.9.2rails2.3
Using /Users/aldavidson/.rvm/gems/ruby-1.9.2-p290 with gemset 1.9.2rails2.3
$ irb 
ruby-1.9.2-p290 :001 > require 'mkmf'
 => true 
ruby-1.9.2-p290 :002 > CONFIG['CXX']
 => "g++" 
ruby-1.9.2-p290 :006 > exit

Ah-hah! So they are in fact using different C++ compilers! And although many native gems (e.g. MySQL) will respect and pass-through the CXX environment variable to the Makefile, sadly EventMachine is not one of them. So, the easiest fix was a bit of evil hackery - move g++ out of the way and symlink it to g++-4.2:

$ which g++
/usr/local/bin/g++

$ which g++-4.2
/usr/bin/g++-4.2

$ sudo mv /usr/local/bin/g++ /usr/local/bin/g++.bak && sudo ln -s /usr/bin/g++-4.2 /usr/local/bin/g++

Looking good - let's go for it:

$ gem install eventmachine
Building native extensions.  This could take a while...
Successfully installed eventmachine-0.12.10
1 gem installed

Yay!
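If you want to compare compilers across several Rubies without opening irb each time, something like the following should do it. It's a sketch that reads RbConfig::CONFIG directly rather than going through mkmf as I did above, on the assumption that both report the same build settings; build_compilers is just a name I've made up.

```ruby
require 'rbconfig'

# Returns the C and C++ compilers this Ruby was configured with at build time.
def build_compilers
  { :cc => RbConfig::CONFIG['CC'], :cxx => RbConfig::CONFIG['CXX'] }
end

build_compilers.each do |name, compiler|
  puts "ruby #{RUBY_VERSION} #{name}: #{compiler.inspect}"
end
```

Run it once under each Ruby (after the relevant rvm use) and compare the output.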

Rails 2.3 is NOT SUPPORTED under Ruby 1.9.3!

There was a big fuss about Rails loading times under 1.9.3. The way that 'require' statements work has been changed, and it no longer checks to see if the file is there before requiring it. As a result, every controller MUST have a corresponding helper file, even if it's just a stub. This is majorly annoying, as we have 73 controllers across 3 namespaces, and only 17 helpers. The Rails team have "frozen" the 2.3 branch, and say that only security fixes will go in after 2.3.14 - in other words, they're not going to fix this. The good-old monkeypatch-via-plugin method doesn't work either, as the files get required AS Rails is loaded, not once it's initialised.

So there are two options, either:

Create a stub helper for each controller
...which seems a bit fugly to me, or..
Fork Rails, fix the issue, use that fork in the meantime until we can port to Rails 3.

This seems better, and who knows, if enough people complain about it, maybe we'll get a patch release (2.3.15?). I'm not holding my breath, mind.... but if we're in this situation, I'm sure others are too, so this will hopefully help some other people who are having the same problem.
So, here's the forked Rails 2.3.14 that we will fix the helper requires problem on (note: WILL fix, not "have already fixed"! :)

UPDATE: After having got round the EventMachine issue above, we no longer need to do this - so we're carrying on with 1.9.2 and standard rails 2.3.14.
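If you do end up stuck on 1.9.3 and decide to take the stub-helper route after all, you don't have to create the files by hand. Here's a rough sketch of a generator, assuming the conventional app/controllers and app/helpers layout; create_stub_helpers is a hypothetical name, and I haven't run this against our own app.

```ruby
require 'fileutils'

# Creates an empty helper module for every controller that lacks one.
# ASSUMPTION: conventional Rails layout (app/controllers, app/helpers).
# Returns the list of helper files that were created.
def create_stub_helpers(app_root)
  controllers_dir = File.join(app_root, 'app', 'controllers')
  helpers_dir     = File.join(app_root, 'app', 'helpers')
  created = []

  Dir.glob(File.join(controllers_dir, '**', '*_controller.rb')).sort.each do |path|
    # e.g. "admin/users" for app/controllers/admin/users_controller.rb
    relative    = path[(controllers_dir.length + 1)..-1].sub(/_controller\.rb\z/, '')
    helper_path = File.join(helpers_dir, "#{relative}_helper.rb")
    next if File.exist?(helper_path)

    # "admin/users" becomes "Admin::UsersHelper"
    module_name = relative.split('/').map { |part|
      part.split('_').map { |word| word.capitalize }.join
    }.join('::') + 'Helper'

    FileUtils.mkdir_p(File.dirname(helper_path))
    File.open(helper_path, 'w') { |file| file.puts "module #{module_name}", 'end' }
    created << helper_path
  end

  created
end

# Usage, from the root of the app:
#   create_stub_helpers(Dir.pwd).each { |path| puts "created #{path}" }
```

Existing helpers are left alone, so it should be safe to re-run whenever a new controller appears.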

Tuesday, December 08, 2009

Memcached Cache Invalidation Made Easy

There are only two hard problems in computer science - cache invalidation, and naming things
Phil Karlton

It's an oft-quoted truism that brings a knowing smile to most hardened programmers, but it's oft-quoted precisely because it's true - and during a recent enforced rush job to implement a cache, I came across a nifty solution to the first problem by judicious use of the second.

First, the problem - someone posted Cragwag on StumbleUpon, which led to an immediate spike in traffic on top of the slow increase I've been getting since I made it Tweet the latest news. All the optimisation work that I knew I needed to do at some point was more than a few hours' work, and I had to get something out quickly - enter memcached.

Memcached is a simple, distributed-memory caching server that basically stores whatever data you give it in memory, associated with a given key. Rails has a built-in client that you can use simply as follows:

my_data = Cache.get(key) do
  # ... do stuff to generate data
end

If the cache has an entry for the given key, it will return it straight from the cache. If not, the block will be called, and whatever is returned from the block will be cached with that key.
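To make that get-or-generate behaviour concrete, here's a toy stand-in that uses a plain Hash where memcached would be. TinyCache is a made-up class for illustration only, not the real client, and it ignores expiry and eviction entirely.

```ruby
# Toy stand-in for a memcached client: a Hash in place of the server.
class TinyCache
  def initialize
    @store = {}
  end

  # Return the cached value for key; on a miss, run the block,
  # store whatever it returns under that key, and return it.
  def get(key)
    return @store[key] if @store.key?(key)
    @store[key] = yield
  end
end

cache = TinyCache.new
calls = 0
2.times { cache.get('front_page') { calls += 1; '<html>...</html>' } }
puts calls  # => 1, the block only ran on the first (miss) call
```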

So far so good - but what exactly should you cache, and how should you do it?

The Complicated Way To Do It

A common pattern is to cache ActiveRecord objects, say by wrapping the finder method in a cache call, and generating a key of the class name and primary key. But this only works for single objects, which are usually pretty quick to retrieve anyway, and is no use for the more expensive queries, such as lists of objects plus related objects and metadata, or - often particularly slow - searches.

So you could extend that simple mechanism to cache lists of objects and search results, say by using the method name and the given parameters. But then you have an all-new headache - an object might be cached in many different collections, so how do you know which cache keys to purge? You have two options:

Try and keep track of which cache keys are caching which objects? Eep - that's starting to sound nasty - you're effectively creating a meta-index of cached entries and keys, which would almost certainly be comparable in size to your actual cache... and where's that index going to live and how are you going to make sure that it's faster to search this potentially large and complex index than to just hit the damn database?

Sidestep the invalidation problem by invalidating the entire cache whenever data is updated. This is much simpler, but there doesn't seem to be a "purge all" method - so you'd need to keep track of what keys are generated somewhere, then loop round them and delete them individually. You could do this with, say, an ActiveRecord class and delete the cache keys on a destroy_all - but still, that's icky.

The Easy Way To Do It

After a few minutes Googling, I found this post on the way Shopify have approached it, and suddenly it all became clear. You can solve the problem of Cache Invalidation by being cunning about Naming Things - in particular, your cache keys.

The idea is very simple - Be Specific about exactly what you're caching. Read that post for more details, or read on for how I've done it.

So I ripped out all of my increasingly-over-complicated caching code from the model, and went for a simple approach of caching the generated html in the controllers. At the start of each request, in a before_filter, I have one database hit - load the current CacheVersion - which just retrieves one integer from a table with only one record. Super fast - and if the data is cached, that's the only db hit for the whole request.

The current cache version number is stored as an instance variable of the application controller, and prepended to all cache keys. The rest of the key is generated from the controller name, the action, and a string constructed out of the passed parameters. Any model methods that aren't just simple retrievals but affect data, can just bump up the current cache version, and hey presto - everything then gets refreshed on next hit, and the old version just gets expired on the least-recently-used-goes-first rule.
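Stripped of Rails and memcached, the versioned-key idea looks something like the sketch below. VersionedPageCache and its methods are hypothetical names, an integer stands in for the CacheVersion record, and a Hash stands in for memcached (so, unlike the real thing, stale entries are never evicted here).

```ruby
class VersionedPageCache
  attr_reader :version

  def initialize
    @version = 0   # stands in for the single CacheVersion row
    @store   = {}  # stands in for memcached
  end

  # Version prefix plus the sorted request params, so the same request
  # always maps to the same key for as long as the version is unchanged.
  def key_for(params)
    pairs = params.sort_by { |name, _| name.to_s }
    "#{@version}_" + pairs.map { |name, value| "#{name}=#{value}" }.join('_')
  end

  def fetch(params)
    key = key_for(params)
    return @store[key] if @store.key?(key)
    @store[key] = yield
  end

  # Called whenever data changes: keys built from the old version
  # are simply never asked for again.
  def bump!
    @version += 1
  end
end

cache   = VersionedPageCache.new
request = { :controller => 'news', :action => 'index', :page => 2 }
renders = 0

cache.fetch(request) { renders += 1; "<ul>render #{renders}</ul>" }  # miss
cache.fetch(request) { renders += 1; "<ul>render #{renders}</ul>" }  # hit
cache.bump!                                                          # data changed
cache.fetch(request) { renders += 1; "<ul>render #{renders}</ul>" }  # miss again

puts renders                 # => 2
puts cache.key_for(request)  # => 1_action=index_controller=news_page=2
```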

This has a few very nice architectural benefits:

The caching code is then in the "right" place - in the bit you want to speed up - i.e. the interface
You also eliminate the overhead of rendering any complicated views - you just grab the html (or xml, or json) straight from the cache and spit it back.
It utilises, and fits in with, one of the fundamental ideas of resource-based IA - that the URL (including the query string) should uniquely identify the resource(s) requested
The application controller gives you a nice central place to generate your keys
If you have to display different data to users, no problem - just put the user id as part of the key.
Rails conveniently puts the controller and action names into the params hash, so your cache key generation is very simple
The admin interface can then easily work off up-to-date data
You can also provide an admin "Clear the cache" button that just has to bump up the current cache version number.

Etc etc - I could go on, but I won't. The net result is that pages which used to take several seconds to render now take just a few milliseconds, it's much much simpler and more elegant this way, and if you're not convinced by now, just give it a try. <mrsdoyle>Go on - ah go on now, ah you will now, won't you Father?</mrsdoyle>

app/models/cache_version.rb

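class CacheVersion < ActiveRecord::Base
  # the most recent version record, or a new unsaved one starting at 0
  def self.current
    CacheVersion.find(:last) || CacheVersion.new(:version => 0)
  end

  # bump the version, which effectively invalidates every existing cache key
  def self.increment
    cv = current
    cv.version = cv.version + 1
    cv.save
  end
end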

app/controllers/application_controller.rb

require 'memcache_util'

class ApplicationController < ActionController::Base
  # load the current cache_version from the db
  # this is used to enable easy memcache "expiration"
  # by simply bumping up the current version whenever data changes
  include Cache
  before_filter :get_current_cache_version

private

  def cache_key
    "#{@cache_version.version}_#{params.sort.to_s.gsub(/ /, '_')}"
  end

  def get_current_cache_version
    @cache_version = CacheVersion.current
  end

  def with_cache( &block )
    @content, @content_type = Cache.get(cache_key) do
      block.call
      [@content, @content_type]
    end
    render :text=>@content, :content_type=>(@content_type||"text/html") 
  end
end

in your actual controller:

  def index 
    with_cache {
      # get data
      # NOTE: you must render to string and store it in @content
      respond_to do |format|
        format.html { 
          @content = render_to_string :action => "index", :layout => "application" 
        }
        format.xml {
          @content_type = "text/xml"
          @content = render_to_string :xml => @whatever, :layout=> false 
        }
      end
    }
  end

Thursday, June 26, 2008

Atheistic Error Message Of The Day

While trying to install GOD, the Ruby process management gem, I got a sudden attack of poignance:

[sonar@tryfan sonar-solr]$ gem install god
ERROR:  While executing gem ... (Gem::RemoteSourceException)
    HTTP Response 404

Discuss, in not less than 3000 words...

Thursday, April 03, 2008

Rake gotcha on Windows

We use Rake as our build system of choice for all of our projects, as it's much more flexible and pleasant to work with than make or ANT, and by and large it's pretty cross-platform. Except.....

I just spent about two hours tearing my hair out over something which should have worked transparently on Windows as well as *NIX, but it just wasn't playing.

For reasons which would be tedious to go into, inside one particular rake task I needed to cd to a different directory and execute rake in there, to pick up a completely different set of tasks and ActiveRecord model classes, etc, and then resume execution of the containing task.

The following code worked fine on UNIX, but just completely failed to do anything on Windows:

Dir.chdir("../sonar-web") { system( "rake", "db:name_of_other_task" ) }

No error messages, nothing, it just wasn't doing anything.

I'll spare you the headbanging frustration and exhaustive list of things I tried that didn't work, and jump straight to the solution -

rake on Windows needs to execute c:/ruby/bin/rake.bat, not c:/ruby/bin/rake !

rake.bat will in turn call ruby.exe c:/ruby/bin/rake and pass on the command line parameters.

So at the top of my rakefile, I just added:

# on Windows, a "system" call has to invoke rake.bat rather than rake
# (match mswin/mingw explicitly: a bare /win/ would also match "darwin" on Mac OS X)
RAKE_CMD      = RUBY_PLATFORM =~ /mswin|mingw/ ? "rake.bat" : "rake"

and then changed my system call to:

system( RAKE_CMD, "db:name_of_other_task" )

....and everything worked fine.
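One thing to watch with this kind of platform sniffing: the substring "win" also turns up in "darwin", so a loose match will pick rake.bat on a Mac. Here's a small sketch of a stricter check; rake_command is a hypothetical helper name, and the platform strings are just typical examples of what RUBY_PLATFORM can contain.

```ruby
# Hypothetical helper: choose the rake executable name for a platform string.
# Matches the Windows builds (mswin, mingw) explicitly rather than /win/.
def rake_command(platform = RUBY_PLATFORM)
  platform =~ /mswin|mingw/ ? 'rake.bat' : 'rake'
end

puts rake_command('i386-mswin32')     # => rake.bat
puts rake_command('x86_64-darwin11')  # => rake
puts 'a bare /win/ would match darwin too' if 'x86_64-darwin11' =~ /win/
```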

Tuesday, February 26, 2008

I still miss CFOUTPUT

It's been about a year now since I last coded CF in anger, and what coding I've done since then has been mainly in Java and Ruby On Rails. These days, most of my coding is done in my spare time, which is a resource in increasingly short supply - so coding actually tends to be done during my random bouts of insomnia. Like many similar turncoats, I've found RoR development to be settling into a fairly steady cycle:

1. Do some really complicated stuff really quickly.
2. Nod appreciatively and make coo-ing noises.
3. Stop yourself just before standing up and announcing that your name is Boris and you are invincible
4. Try to do something ostensibly simple, get stuck
5. Spend ages scouring the web for the simple solution that surely must be out there somewhere
6. Get pissed off because you found a blog post telling you that you shouldn't want to do that because it's "not the rails way"
7. Figure out a really ugly cumbersome way of doing it step-by-step
8. Fail to get to sleep because it's just bugging you that something so simple should be so hard in a framework that makes so many other more complicated things so simple
9. Two days later, discover entirely by accident that the fifty lines of hackery could have been done in one long line of twenty method calls ( things.each do{ |foo| foo.do.some.other.thing }.and.then.do.some.thing.else( bar ) ) if only you'd known that the method you needed was called (insert counter-intuitive method name here) and that there was a plugin called (insert name of obscure plugin here)
10. Go back and redo your hack with the one long line of twenty method calls. Feel like a l33t h4x0r d00d because you just replaced fifty lines with one.
11. Discover that somewhere in your one long line of twenty method calls, one of them is returning nil.
12. Think "hey, it's ok, I can debug this easily with the console!". Feel smug.
13. Spend what seems like an aeon starting the console, reproducing the conditions in which you get your nil, changing something, then restarting the console so you can test your change.
14. Resolve to write better unit tests in future.
15. Pine a little for the good old days of "make your change then hit f5" to see if something works.
16. Start a trawl through the source code looking for the cause of the problem
17. Reflect that while duck typing is indeed an orgy of sheer loveliness, in this particular case it would be nice if just this once you could know for sure that this particular object is a Foo and therefore all you need to know would be found in foo.rb
18. Discover that the cause of your woe is that at least one of your magical plugins doesn't work on Oracle, or SQL Server, or indeed anything other than MySQL.
19. Go back to the blog on which you found the plugin to see if it's a known problem with a new version
20. Discover that the blog is down.
21. Write a plugin to patch the plugin to work on Oracle
22. Feel vaguely uneasy that you now have a chain of umpteen plugins patching plugins patching plugins patching plugins patching ActiveRecord to do something that you apparently shouldn't want to do, but dammit, you needed to get it done by 5pm.
23. Go out and pull scary faces at small children to make them cry for a while until you feel better.
24. Come back feeling much better now that you've spread a little frustration around. Reflect that you're probably just not thinking about things in the "right" way.
25. Adjust your thought-angle, and come at it again
26. Repeat from step 1

To be clear, I do think that Ruby has some wonderful features. Blocks, open classes, method_missing - all of these little niceties make some fantastically cool things possible. The Enumerable#sort_by method in particular was one discovery that just gave me a wonderful warm fuzzy feeling - e.g.

some_collection.sort_by { |element| [ element.method1, element.method2, element.method3 ] }
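A concrete version, for anyone who hasn't met it: the block returns an array, and arrays compare element by element, so you get a multi-key sort for free. The Person struct and the names in it are invented purely for this example.

```ruby
Person = Struct.new(:surname, :forename, :age)

people = [
  Person.new('Smith', 'Zoe', 30),
  Person.new('Jones', 'Bob', 41),
  Person.new('Smith', 'Al', 25)
]

# Sort by surname first, then forename, then age.
sorted = people.sort_by { |person| [person.surname, person.forename, person.age] }

puts sorted.map { |person| "#{person.surname}, #{person.forename}" }.inspect
# => ["Jones, Bob", "Smith, Al", "Smith, Zoe"]
```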

Rails, also, has some really great features, and makes some of the donkey work so easy it's almost laughable. But sometimes it feels like all the thinking went into the elegance of the back-end design, and not enough thought went into the templating. RHTML feels like a tacked-on afterthought. HAML is better in some respects, but it still feels awkward. I haven't yet found any templating language that even comes close to the sheer simplicity and ease-of-use of CFOUTPUT.

The example that triggered this rant was grouped output. Let's keep this simple for the purpose of example - say I have a recordset with three fields:

Type	Sub-type	Title
Type 1	Sub-type 1	Foo 1
Type 1	Sub-type 1	Foo 2
Type 1	Sub-type 1	Foo 3
Type 1	Sub-type 2	Bar 1
Type 1	Sub-type 2	Bar 2
Type 1	Sub-type 2	Bar 3
Type 2	Sub-type 3	Foobar 1
Type 2	Sub-type 3	Foobar 2
Type 2	Sub-type 4	Foobar 3

- etc etc.

If you wanted to output these with headers and sub-headers whenever the type or sub-type changed, it would be almost trivially easy in CF:

<cfoutput query="myRecordset" group="type">
  <h2>#myRecordset.type#</h2>
  <cfoutput group="subtype">
    <h3>#myRecordset.subtype#</h3>
      <ul>
      <cfoutput>
         <li>#myRecordset.title#</li>
      </cfoutput>
      </ul>
  </cfoutput>
</cfoutput>

- but in Rails? I'm still at step 5. I know I'm not the first person to need to do this, not by a long shot, but I just don't yet know what the method that surely must exist would be called. I know you can use Enumerable#group_by to group an array of objects into something approximating the raw recordset above, but that's kind of working backwards to me.

I also know that someone will probably post the answer in a comment, probably with some kind of dismissive one-word instruction like "Read." or "Learn." linking to the relevant part of the docs. And that's all well and good. I'm just in a temporary bout of misty-eyed nostalgia for the things that CF made easy - particularly CFOUTPUT.
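For the record, here's roughly what the group_by route looks like in plain Ruby. It's only a sketch: Row and grouped_html are names I've made up, there's no HTML escaping, and it assumes Ruby 1.9+ so that the groups come out in first-seen order. It's also not quite the same thing as CFOUTPUT's group attribute, which only groups adjacent rows, although for a recordset sorted by type and sub-type the result should be the same.

```ruby
Row = Struct.new(:type, :subtype, :title)

# Builds nested headers and lists from a flat, pre-sorted recordset.
def grouped_html(rows)
  html = []
  rows.group_by { |row| row.type }.each do |type, type_rows|
    html << "<h2>#{type}</h2>"
    type_rows.group_by { |row| row.subtype }.each do |subtype, subtype_rows|
      html << "  <h3>#{subtype}</h3>"
      html << "  <ul>"
      subtype_rows.each { |row| html << "    <li>#{row.title}</li>" }
      html << "  </ul>"
    end
  end
  html.join("\n")
end

rows = [
  Row.new('Type 1', 'Sub-type 1', 'Foo 1'),
  Row.new('Type 1', 'Sub-type 1', 'Foo 2'),
  Row.new('Type 1', 'Sub-type 2', 'Bar 1'),
  Row.new('Type 2', 'Sub-type 3', 'Foobar 1')
]

puts grouped_html(rows)
```

In a view you'd do the same two nested group_by loops in the template rather than building a string, which is still a fair bit more ceremony than the CF version.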
