Monday, September 03, 2012

Getting Accurate Count on Grouped Arel Relations with Squeel

Let's say you have a scope on a model that joins onto potentially several related models:


class Post < ActiveRecord::Base
  has_many :comments

  scope :commented_on_by, lambda{ |user|
    .joins(:comments).where( :comments=>{:author_id => user.id} )
  } 
end

( OK, so this is a contrived artificial example, there are better ways of doing this - but the real example which triggered this post is way too complex to go into in detail, and we're all familiar with blogs, posts and comments, right? )

So if you call Post.commented_on_by(user) you'll get all the posts which the given user has commented on. All well and good. BUT - what if the same user comments multiple times on the same post? You'll then get the same post repeated as many times in your resultset.

So the logical thing to do here is to introduce a group by clause, to make sure you only get each post once, right?


class Post < ActiveRecord::Base
  has_many :comments

  scope :commented_on_by, lambda{ |user|
    .joins(:comments).where( :comments=>{:author_id => user.id} ).group(:id)
  } 
end

OK, that's that problem solved. Except.... what if you now want to run an aggregate query over that result set? Say, you want to count the number of posts the user has commented on?


> Post.commented_on_by(user).count
> {1=>2, 2=>1, 3=>2}

Er... whut? We get a Hash?

Well yes - because we've grouped the query, ActiveRecord will group the count results as well. It's saying "post id 1 has two comments by this user, post id 2 has one, and post id 3 has two". All of which is entirely correct, but a bit annoying if you just want to get the number of posts. OK, you could do Post.commented_on_by(user).count.size, but that kind of defeats the intended purpose of a count call, right?

The fundamental problem is the underlying structure of your query. You're essentially collapsing multiple rows into one with the group clause, and your SQL parser interprets it as "get me the count of rows for each row BEFORE it's collapsed" rather than after.

So, how do we get it to treat these multiple rows per-post as if they were one? IN() is your friend!

The good news is, you can get round this using Ernie Miller's rather excellent gem Squeel. We've been using Squeel a lot at iTrigga, as it helps us keep on top of syntax for some pretty complex queries.

One of the many nice things about Squeel is that it lets you supply ActiveRecord::Relation objects as a value for a predicate - so if you can restructure your query to use an IN() clause, you can supply your already-built Relation object as the value to it.

In this case, you can rewrite:


> Post.commented_on_by(user).count

as


> Post.where{ id.in(Post.commented_on_by(user).select(:id)) }.count

and the result should be correct

1 comment:

Tara said...

Great reading your bllog post