Editorials

OVER() ++

Yesterday we talked about how using the OVER operation in conjunction with an Aggregate function results in an aggregate calculation unique to each detail row without requiring a GROUP BY clause in the query.

We demonstrated how you could use the OVER operation without any parameters, resulting in an Aggregate calculation covering the entire set of data being processed. This is a very powerful, and useful, capability that was more complicated to implement prior to the inclusion of the OVER operation.

There are two parameters you can supply when using the OVER operation that make it possible to generate more granular summarization results. Those parameters are probably familiar to you as “PARTITION BY” and “ORDER BY”. The values you provide for these two parameters come from the columns of the data in your select statement. They may consist of one or more columns, or even a constant value.

PARTITION BY does exactly what it says. When you return your aggregate data, it will begin a new aggregate calculation for each unique value of the columns data defined in your PARTITION BY clause. I like using the ROW_NUMBER() aggregate function with partition by to find a series of records within a giving group.



SELECT poi.*,

ROW_NUMBER() OVER (PARTITION BY poi.PurchaseOrderId ORDER BY 1)

FROM PurchaseOrderItems poi

ORDER BY poi.PurchaseOrderId

This query returns all purchase order items in the database, and computes a sequence starting at 1 for each unique PurchaseOrderId. It doesn’t take into account anything else from the items because the ORDER BY clause is set to a constant value, and will not change from row to row.

Let’s say you want the same query to compute the ROW_NUMBER for each PurchaseOrderItem, starting at 1 for each Purchase Order, but you want to have it calculate from the Highest Prices Item to the Lowest Price Item. You can modify your query to replace the constant with the price and the computed row number will be based on the PurchaseOrderId for when a new series is started, and will order the records within a PurchaseOrderId group from the highest price to the lowest price…

SELECT poi.*,

ROW_NUMBER() OVER (PARTITION BY poi.PurchaseOrderId ORDER BY poi.Price DESC)

FROM PurchaseOrderItems poi

ORDER BY poi.PurchaseOrderId

Notice that the ORDER BY clause works just like sorting data from a SELECT query. You may use the DESCENDING or DESC parameter to sort the data from highest to lowest. Also like the ORDER BY clause, ASCENDING is the default operation if you don’t supply the parameter.

Just a final note…if you don’t want to partition by anything specific, but wish to use the ORDER BY operation, you may use a constant value in the PARTITION BY parameter as well.

Take a deeper dive into the OVER Parameter in your aggregate functions. It has a lot of power, and when you get your brain wrapped around it, you can do some really cool things.

Cheers,

Ben

Facebooktwittergoogle_plusredditpinterestlinkedinmail
  • Mattias Lind

    Key here is to understand that PARTITION BY is the important part of grouping similar rows so you can execute the aggregate on a partitioned set of the data. That is PARTITION BY is grouping in the same manner as GROUP BY, but the group is applied from the granulary level. If you create a nested subquery, running the aggregate filtered by the grouping column, as part of the expression list you get the same result. Nothing magical, but really awesome.

  • Peter Heller

    Microsoft SQL Server 2012 High-Performance T-SQL Using Window Functions (Developer Reference) 1st Edition (ISBN-13: 978-0735658363) by Itzik Ben-Gan (Author) is a great book covering this topic.