Friday, December 21, 2007

SQL QOTW - SELECT all rows up to a cumulative total

Here's a little SQL test that arose here the other day. Let's say you have a DB table called documents, in which you have the following fields:


full_content is a BLOB, containing all the text extracted from whatever document the record represents.
bytesize is, cunningly enough, the size of the content in bytes.

You want to get all the documents, and do something to them - what you want to do is not important here, just that it involves the full_content. However, a moment of pondering will show the nasty bit of this problem... Let's say you may have tens of millions of documents, and the average bytesize is around a meg. Clearly attempting to read the full table into memory is not a good idea (at least until we get Terabytes of RAM in our servers).

A far better idea is to limit the maximum memory usage to some value, and only process up to X MB of documents in one batch. The next time round, you'll get the next X MB of docs, and so on.

We turned this problem over and over for a while, and we thought of a few ways of doing it, but all involved multiple queries, and/or things like temporary tables... and, dammitall, it SHOULD be do-able in one query, dammit man! (cue much indignant harumphing and twizzling of large mutton chop side whiskers, in a Victorian man-of-letters kind of way) Oh yeah, and here's the killer - it has to be portable across Oracle 10g and MySQL.

So the problem is this - given this table, can you construct a single query that will read all the documents up to a given maximum number of megabytes, that will run on Oracle 10g AND MySQL?

Well, after pondering this in several isolated attempts for a few days, I went back to it yesterday, and the answer just popped out straight away - and it's actually deceptively simple. I was going to just give you the answer, but I thought it might be a bit more fun to leave it as a challenge, so a big fat Brucie Bonus goes to the first person who gets it - I'll put the answer up here later on today, if no-one gets it in the meantime.


Mike Bravo said...

Just wanted to let you know that you will be getting an invoice for the medical bills that have accumulated from my head injuries, which in turn were caused by this challenge. :) j/k

I have to go back to work now as I don't have time to do all the research necessary to even take a respectable stab at this one.

Will be looking forward to the solution! Thanks!

Alistair Davidson said...

Hi Mike

Hehe - i'm glad that after taking several days to work it out, i was't instantly deluged by comments saying "pffft! That's eeeeaaaasy!"

I decided to put the solution into a separate post, as it needed a bit of explanation. So the answer, should you want it, is here