ARTICLES

Home  > Articles  >  Use SQL 2K Functions To Optimize Your Queries - Part 2

Use SQL 2K Functions To Optimize Your Queries - Part 2
by Ben Taylor

 

Introduction

 

What do you do when you need to run a query that returns multiple rows and for each row you need to run an additional query returning a single value? For example, you may need to compute the Median cost per item for each order in a purchase order system. SQL Server does not provide a Median aggregation function. It has SUM, COUNT, Min, MAX, STDEV and a few others. But you require the Median value for a series of rows represented by an SQL Query. Here come SQL 2K User Defined Functions to the rescue. This is the second article in a three part series demonstrating the use of User Defined Functions to perform a correlated query returning a single value that can be encapsulated (correlcated) into another query.

The first article demonstrated a method of returning multiple values for each row in a query as a single comma separated list value. This article demonstrates an User Defined Function that extends SQL Servers aggregate functions providing a median aggregation.The third article (still to come) demonstrates a method for computing percentage of investment return over a period of time specified at execution time.

 


Use UDF Median Function

 

SQL Server does not provide a Median Aggregate Function. If you don’t do a lot of statistical work you may not remember that the Median is the middle record in a set of records. The value 6 is the median of the following list:

 

1

4

6

9

23

 

The median of a list with an even number records is the record just before the middle. The value 6 is the median of the following list:

 

1

6

9

23

 

 

Finding a specific row from a set of records is one of the most challenging kinds of queries that can be performed in SQL Server. This process becomes even more complicated when you need to obtain the Median for multiple records. What if you wanted to return the Median Item_Amt of the Order_Items table for each record in the Orders table (See figure 1)? This can be performed with some sophisticated queries. However, an UDF makes this process much more painless, the queries are easier to understand and the query still performs quite well.





Figure 1


Create the Function

 

Here is the syntax for an UDF that computes the Median.

 

CREATE FUNCTION fnMedianOrderDesc (

      @Order_ID INT)

 

      RETURNS MONEY

 

AS


BEGIN

 

DECLARE @MEDIAN         MONEY

DECLARE @RECCOUNT       SMALLINT

DECLARE @ITEM_NO       SMALLINT

 

SELECT @RECCOUNT = COUNT(*)

FROM  Order_Items

WHERE Order_ID = @Order_ID

 

IF @RECCOUNT = 0

BEGIN

      SET @MEDIAN = NULL

END

ELSE

BEGIN

IF @RECCOUNT ^ 2 = 0

BEGIN

      SET @ITEM_NO = @RECCOUNT / 2

END

ELSE

BEGIN

      SET @ITEM_NO = (@RECCOUNT / 2) + 1

END

 

SELECT @MEDIAN = Item_Amt

FROM  Order_Items

WHERE Order_ID = @Order_ID

AND   ITEM_NO = @Item_NO

END

 

RETURN @MEDIAN

 

END

 


How It Works

 

First the UDF is defined. A parameter is passed to the UDF to identify the Order_ID that will be used to identify the Order_Items records to be included in the determination of the Median.

 

CREATE FUNCTION fnMedianOrderDesc (

      @Order_ID INT)

 

The function is further defined to return a Money data type.

 

RETURNS MONEY

 

AS


BEGIN

 

Next the UDF declares working variables needed to find the median within the records meeting the criteria. We are going to have to return the actual median value so that is tracked in the variable @Median. We need to find the number of records that match our criteria so we can determine what is the middle (MEDIAN) record. @RECCOUNT stores this value. We need to know what record to actually retrieve. This value is stored in @ITEM_NO.

 

DECLARE @MEDIAN         MONEY

DECLARE @RECCOUNT       SMALLINT

DECLARE @ITEM_NO SMALLINT

 

At this point the UDF retrieves the number of records from Order_Items that match the Order_ID sent to the function in the variable @Order_ID

 

SELECT

@RECCOUNT = COUNT(*)

FROM  Order_Items

WHERE Order_ID = @Order_ID

 

If no records are in the database for @Order_ID then we want to return a MEDIAN value of NULL. So that we have only one point of return we always store the value to return in the variable @MEDIAN. In this case we store NULL.

 

IF @RECCOUNT = 0

BEGIN

      SET @MEDIAN = NULL

END

 

When there are records in the database for @Order_ID we process the result differently. There are two different formulas to determine the MEDIAN record. One formula is used if the number of records is Even.

 

ELSE

BEGIN

 

A different formula is used if the number of records is Odd. A test of the MODULO of the record count with 2 determines if the number of records is even or odd. If the remainder of the variable @RECCOUNT divided by 2 is zero then there is an even number of records. Otherwise there is an odd number of records.

 

ELSE

BEGIN

IF @RECCOUNT ^ 2 = 0

BEGIN

ELSE 

 

If there is an even number of records then we want to get the value from the Order_Item record that is the same as @RECCOUNT / 2.      

 

BEGIN

      SET @ITEM_NO = @RECCOUNT / 2

END

 

Otherwise we want to get the value from the Order_Item record that is the same as (@RECCOUNT / 2) + 1. This works because the datatype for @RECCOUNT is an int. Therefore the result of @RECCOUNT/2 will be rounded down for the fraction. We can then add 1 to point to the exact middle record of an ODD numbered recordset.

 

ELSE

BEGIN

      SET @ITEM_NO = (@RECCOUNT / 2) + 1

END

 

Now that we know what record contains the median value we can retrieve the Item_Amt value for that record and store it in @MEDIAN as the value to return for the function.

 

SELECT @MEDIAN = Item_Amt

FROM  Order_Items

WHERE Order_ID = @Order_ID

AND   Item_NO = @Item_NO

 

END

 

The function has computed @MEDIAN based on the criteria of No Records, Even Records or Odd Records. It now completes passing the resulting formula as a Money value through the Variable @MEDIAN.

 

RETURN @MEDIAN

 

END


Use the Function

 

Now that we have the function written we can include it in a query. Consider the following query:

 

SELECT

      Order_ID,

      Order_Date,

      Median_Amount = dbo.fnMedianOrderDesc (Order_ID)

FROM  Orders

WHERE Orders.Order_Date > ’4/2/2004’

 

This query returns a record for every Order in the Orders Table that has a date > ‘4/2/2004 00:00:00’. It has three columns for each row showing the Order_ID, Order_Date and the MEDIAN Item_Amount for each Order_ID.



About The Author

 

Benjamin (Ben) Taylor has been working with Relational Databases since 1984 in multiple languages and platforms. He has worked with some of the earliest Relational Databases prior to the publication of SQL such as Bloom and Harvest from the DEC world.

 

Since 1993 Ben has worked primarily with Microsoft SQL Server starting with SQL Server 4.21. He has designed SQL solutions working with both Transaction Databases and Data Warehouses. His Query and Database Optimization techniques are in great demand in the Indianapolis area. Ben has written both software publications and white papers. Ben is currently employed by CTI Group, Inc. in Indianapolis Indiana designing an Electronic Telephone Billing Database Warehouse estimated to exceed 6 Terabytes and support 500+ concurrent users

 

Contact Ben at benstaylor@sbcglobal.net.

 

Copyright – Benjamin S. Taylor (December 2004)