Editorials

Retrieving Random Records

Random record selection is a rather involved topic. You can use random number generators when creating data, and use the randomly created values for selection. You can select records at runtime, randomly assign a value, and select a few meeting your criteria. You can also purchase third party tools that optimize and automate the process for you. And much more.

Today I am presenting a simple solution that often is adequate. It uses a modulus on a sequential value. If the value Mod [test number] = 0, then the record is included in the output. My strategy determines how many rows are in a table, and, using a value for the number of records to return, creates a modulus that applied to all the records in the table, returns the desired number of records, evenly parsed throughout the whole set. Of course, on a very large table, this technique would be quite slow. Let’s start out with how I determine the Mod comparison value. I’m using the Adventureworks 2012 database for this example.

DECLARE @Records INT = 5

DECLARE @ModAmount INT =

(SELECT COUNT(1) FROM Production.Product) / (@Records + 2)

SELECT @Records as Records, @ModAmount as ModAmount

Once this query is done it returns

Records ModAmount
5 72

My products table has 504 records. By using a modulus of 72, it breaks the table up into 7 sections. I get the record that divides each of those sections. Since the first section starts at 1 it will be excluded. I cap the number of records returned to 5, excluding the last selection. This results in 5 records, evenly disbursed in the middle of the entire set.

Now that I have my segments, I run three different queries, demonstrating how you can get different random results from the same set, using different sequences contained in the table. Some of the sequences you have to generate yourself through the use of a ROW_NUMBER() function. The trick is to have a sequential number upon which you can apply your modulus comparison.

This first query does not require a ROW_NUMBER function. Instead it uses the incremental primary key column generated as records are inserted into the table. The problem with this method is that the records are not guaranteed to be sequential, as is the case when using ROW_NUMBER. However, the sequence already exists, and no processing ower is used to create a new one just for your query.

SELECT TOP (@Records) ProductId, ProductNumber, Name, ListPrice

FROM production.Product

WHERE ProductID % @ModAmount = 0

This second query uses the same Modulus comparison. This time it generates a sequential number based on the ProductNumber column. This is a string value, and a MOD operation can’t be applied to it. However, it can be sorted, and a number assigned to each unique instance. Then you can apply the MOD operation to the generated sequence.

;WITH Products

AS

(

SELECT ProductId, ProductNumber, Name, ListPrice,

Row_Number() OVER (PARTITION BY 1 order by ProductNumber) as RowId

FROM production.Product

)

SELECT TOP (@Records) ProductId, ProductNumber, Name, ListPrice, RowId

FROM Products

WHERE RowId % @ModAmount = 0

Like the previous query, the following query generates a sequence based on the ListPrice. This is a nice query because it gets records from a range of prices throughout the entire table. That’s a pretty good representation. It is also weighted in a fashion. If there are more prices at the low end, there will be more records with low prices. Then the MOD comparison returns more low price records than high priced ones.

;WITH Products

AS

(

SELECT ProductId, ProductNumber, Name, ListPrice,

Row_Number() OVER (PARTITION BY 1 order by ListPrice) as RowId

FROM production.Product

)

SELECT TOP (@Records) ProductId, ProductNumber, Name, ListPrice, RowId

FROM Products

WHERE RowId % @ModAmount = 0

Here are the results I had from these queries on my instance of AdventureWorks.

Query One – Order By ProductId
Product
Id
Product
Number
Name List
Price
Row
Id
420 LI-7160 Internal Lock Washer 8 0.00 0
504

RA-2345

Cup-Shaped Race 0.00 0
756 BK-R68R-44 Road-450 Red, 44 1457.99 0
840 FR-R92B-52 HL Road Frame – Black, 52 1431.50 0
924 FR-M21B-42 LL Mountain Frame – Black, 42 249.79 0

Query One – Order By ProductNumber
Product Id Product Number Name List Price Row Id
749 BK-R93R-62 Road-150 Red, 62 3578.27 84
737 FR-R38B-48 LL Road Frame – Black, 48 337.22 168
359 HJ-1213 Thin-Jam Hex Nut 9 0.00 252
432 LJ-5811 Thin-Jam Lock Nut 13 0.00 336
907 RB-9231 Rear Brakes 106.50 420

Query One – Order By List Price
Product Id Product Number Name List Price Row Id
405 LE-1201 External Lock Washer 9 0.00 84
489 MT-1000 Metal Tread Plate 0.00 168
715 LJ-0192-L Long-Sleeve Logo Jersey, L 49.99 252
820 FW-R820 HL Road Front Wheel 336 336
888 FR-T98Y-50 HL Touring Frame – Yellow, 50 420 420

If you wish to dig into more sophisticated methods of random record selection, I found a good sampling of different techniques here. https://www.mssqltips.com/sqlservertip/3157/different-ways-to-get-random-data-for-sql-server-data-sampling/

Cheers,

Ben