Editorials

A Little ROW_NUMBER Magic

I really like the ROW_NUMBER() function in SQL Server as a means of getting the first or last instance of a record for each specific group within record set containing more than one group. Ok, let me break that down a little bit.

It’s pretty easy to get the last record of the line items for a single purchase order. But, what if you wanted to get the last line item for all purchase orders? You can; they are all in the same table. But, without using ROW_NUMBER() the task is a little more complicated. Since this editorial is about ROW_NUMBER, I won’t take the time to show you how we used to do it.

For my example today I am using the EmployeePayHistory table found in the AdventureWorks2012 database, because it demonstrates the problem, and I have a copy of that database handy. The table has BusinessEntityID, RateChangeDate, Rate, PayFrequency and ModifiedDate columns. The first thing I wanted to do was to determine if there were any groups within that table having multiple records. To find that out, I did a GROUP BY query on BusinessEntityID.

SELECT      BusinessEntityID, Count(1) as Records
FROM        AdventureWorks2012.HumanResources.EmployeePayHistory 
GROUP BY    BusinessentityId  
HAVING      count(1) > 1

This query returned 13 BusinessEntityID records, each having 3 unique entries. I wanted to see what those specific records looked like, so I used the results of the previous input in a new query, joining back to the detail table to get the records for only those 13 BusinessEntities.

WITH MyEntities
AS
(
    SELECT      BusinessentityId, Count(1) as Records
    FROM        AdventureWorks2012.HumanResources.EmployeePayHistory 
    GROUP BY    BusinessentityId  
    HAVING      count(1) > 1
)
SELECT      eph.*
FROM        MyEntities  e
JOIN        AdventureWorks2012.HumanResources.EmployeePayHistory eph
            ON  e.BusinessEntityID = eph.BusinessEntityID
ORDER BY    eph.BusinessEntityID, eph.RateChangeDate

Following is a table with the first BusinessEntityID

BusinessEntityID RateChangeDate Rate PayFrequency ModifiedDate
4 2002-01-05 8.62 2 2001-12-22
4 2004-07-01 23.72 2 2004-06-16
4 2006-01-15 29.8462 2 2006-01-01

The same kind of history was demonstrated for the other 12 Business Entities. I now want to write a query that returns only one record for every Business Entity in the detail table, regardless of how many records may be present for a single Business Entity. Using ROW_NUMBER() it is easy to get the first or last record for each individual Business Entity. In my query I want to get the last record for each BusinessEntityID based on the RateChangeDate. From the table above, I want to get the third record, having the last RateChangeDate.

WITH MyPay
AS
(
    SELECT      BusinessentityId
               ,RateChangeDate
               ,Rate
               ,PayFrequency
               ,ModifiedDate
               ,ROW_NUMBER() OVER 
                    (
                        PARTITION BY BusinessEntityID
                        ORDER BY RateChangeDate DESC
                    ) as RowId
    FROM        AdventureWorks2012.HumanResources.EmployeePayHistory 
)
SELECT      BusinessentityId
           ,RateChangeDate
           ,Rate
           ,PayFrequency
           ,ModifiedDate
FROM        MyPay
WHERE       RowId = 1

This query returns the last record for each Business Entity, based on the RateChangeDate. I enforce this behavior by Partitioning by each BusinessEntityID. Using this partition causes the ROW_NUMBER function to start counting from 1 each time the BusinessEntityID changes. In the ROW_NUMBER function I also sort the records with the ORDER BY RateChangeDate DESC clause, so the data is sorted by the latest to the oldest RateChangeDate. I wrap all of that magic in a CTE I call MyPay, which I may then use to write my query, returning only the records I want with the where clause WHERE RowId = 1, the value of my ROW_NUMBER function in the CTE.

Your final results are now 290 records in my database, 26 less than the total number of records in the table.

There you have it, a way to get max out of a group within a set of groups.

Cheers,

Ben