Thursday, March 16, 2006

Tip of the day : Selecting Random Rows

A common feature request is to show a random selection of X rows from a table. The basic premise of how it's done is simple, but subtle inconsistencies mean that the syntax which works on MySQL won't work on SQL Server, and vice-versa. Here's how we've done it on each platform.

Basic Premise

In your query, as well as the columns you want to select, also select a calculated column which contains a random, non-sequential element, and sort on that column first.

MySQL

In MySQL v.4.1.2, the UUID() function was introduced. So why can't you select a UUID() and sort on that? Well, because although the UUID() function creates a unique identifier, it always seems to create them in a sequential order - the following query :
SELECT UUID(), name
FROM `help_keyword`
ORDER BY 1
LIMIT 5

produces these results on first run:




UUID()name
ba109f04-064f-1029-82dd-fe32006a1320<>
ba109f35-064f-1029-82dd-fe32006a1320ACTION
ba109f4b-064f-1029-82dd-fe32006a1320ADD
ba109f5e-064f-1029-82dd-fe32006a1320AES_ENCRYPT
ba109f73-064f-1029-82dd-fe32006a1320AFTER

and these results on the next run:




UUID()name
5d7f49d4-0650-1029-82dd-fe32006a1320<>
5d7f4a08-0650-1029-82dd-fe32006a1320ACTION
5d7f4a1f-0650-1029-82dd-fe32006a1320ADD
5d7f4a33-0650-1029-82dd-fe32006a1320AES_ENCRYPT
5d7f4a48-0650-1029-82dd-fe32006a1320AFTER
so you can see that although each UUID is unique, they're always created sequentially, which means it's no use to use in our random selection.

The solution is to use a random number, created via the RAND() function. Using this query:
SELECT RAND(), name
FROM `help_keyword`
ORDER BY 1
LIMIT 5

we get these results:




uidname
0.00856768061273515OUTFILE
0.0149379475181344COMPACT
0.0210173549326298RENAME
0.0219931677188661LAST
0.0236370014246898MOD

on the first run, and on the second, we get :






uidname
0.00319886301010741MASTER_PORT
0.00735198055147378PACK_KEYS
0.00786910020547835DISTINCT
0.00797068235275399BIGINT
0.0113145066530579LINESTRINGFROMWKB

So that's how a random selection can be made on MySQL.

SQL Server

On SQL Server, there is also a random number generator, also called RAND() :
select TOP 5
RAND() AS randomNumber,
tblUser.nvcSCreenName
FROM tblUser
ORDER BY 1

but this produces, shall we say, unexpected results:




randomNumbernvcScreenName
0.33363907482884941Al
0.33363907482884941AlanPartridge
0.33363907482884941AlD
0.33363907482884941Ali dahodza
0.33363907482884941Andy
In SQL Server, the random number generator only gets seeded once per query, not once per row - which makes it of no use for our purposes.

The solution is to create a UUID instead - but on SQL Server the funciton you will need is NEWID():
select TOP 5
NEWID(),
tblUser.nvcSCreenName
FROM tblUser
ORDER BY 1

which gives us these results on first run:






randomNumbernvcScreenName
8B7CC32F-20EC-4111-86DE-17B4183B4C94Al
DCF6B780-55B5-4C2D-A7FA-27D9C8C44F33neil
B5C5F469-ADC2-4866-99F0-297355DFBF22AlanPartridge
C7DBE10F-94A0-420E-8894-33511DE46C1Ddave10
A3360F90-9B0F-4C4F-9438-3B6F58BF5670RuairĂ­
and these results on second run:
randomNumbernvcScreenName
C27A9E7F-9D51-45A3-8661-0FABFA511617dave3
BF9AB329-F863-4330-BBBC-105F1C19717FAlanPartridge
D45DE252-FF96-4D44-8D76-116981211A93RuairĂ­
D8EAA1B7-9558-41CF-B844-14E5E792CCA9dave5
9A34C260-10A9-40E6-B020-156737A11C43Al


So bingo, bob's yer firkin, job's a good'un, etc etc - that's how it can be achieved on SQL Server.

7 comments:

Anonymous said...

You could actually just do ORDER BY newid() in the t-sql example without actually selecting the newid(), small difference I suppose but it looks cleaner to me. There are some drawbacks to the technique which the commenters on my similar post mention at http://www.remotesynthesis.com/blog/index.cfm/2006/1/30/Random-Records-in-TSQL

Kristofer Krause said...

Works great. Thanks!

Amboy Observer said...

Nice trick. Thanks.

Anonymous said...

Visit http://technoexperts.blogspot.com/2008/08/random-records-from-sqlserver-database.html for More Details.

Anonymous said...

Visit Select Random Records for More Details.

Bala said...

I tried this query

select top 5 RAND() as RandomNumber,Title from HumanResources.Employee
order by 1

but unfortunately i get the same random numbers and the resultset doesn't seem to be changed

Alistair Davidson said...

Bala

Are you on SQL Server? If so, then as the article says, the random number generator only gets seeded at the start of execution

Al