ARTICLES

Home  > Articles  >  Create Your Own User Defined Aggregate

Create Your Own User Defined Aggregate

By Ben Taylor
btaylor@sswug.org

Introduction

SQL Server ships with a number of Aggregate functions built in. These functions cover most of the normal statistical requirements by providing AVG, Checksum, Count, Sum, Min, Max, Stdev, Stdevp, Var and VarP. Microsoft has released the ability to integrate with the Group By clause in SQL Server 2005, providing the savvy developer with the ability to create their own User Defined Aggregate for those formulas that are not part of the standard set shipping with SQL Server. This article explores how this is done and defines a few aggregate functions as examples.

The Problem

Each of the functions listed above are useful when working with large sets of data in SQL Server. As the size of the dataset grows the value of aggregate functions becomes more apparent.

The problem is that when you perform statistical analysis there are often a number of aggregate functions that are not part of SQL Server as it is shipped which are necessary. In this case you have a few options.

1) You can extract all of the detail into another program for computing the aggregate. This option requires a lot of overhead transferring data to an external service. Often Excel is used for this purpose, but has a limit to the number of rows that may be returned.

2) You can write a correlated sub-select query (sometimes) which will compute the aggregate value for each row returned.

3) You can write a User Defined Function passing the parent value as a parameter. The User Defined Function returns a single value for each parent row and computes the appropriate aggregate (see my article computing a Median value using a User Defined Function).

4) You can write a User Defined Aggregate. This article will demonstrate how to write and use a User Defined Aggregate. The User Defined Aggregate option allows the developer to hook into the Grouping function of SQL Server, and define an Aggregation formula which is called for each set of values meeting the Group By criteria.

By tying into SQL Server aggregation routines you are able to write a new aggregation formula that is not specific to any particular data structure (like options 2 and 3) and does not require the use of an external program (like option 1). This allows you to use your user defined aggregate inline with other SQL Server aggregation functions in a single query.

Create A User Defined Aggregate Framework

In this article I provide you with the basic framework for creating a User Defined Aggregate. I do not spend a lot of time telling you how it works. Apress has a good study for this topic by Robin Dewson and Julian Skinner entitled, "Pro SQL Server 2005 Assemblies." Instead, I focus on the aggregate functions themselves. In order that you may use this study as is let me provide the basic framework here (figure 1)

The basic UDA requires four methods to be enabled based on the UDA interface specification. They are:

  • Init
  • Accumulate
  • Merge
  • Terminate

In addition you are going to need a variable for your accumulation value.

using System;
using System.Data;
using System.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

[Serializable]
[Microsoft.SqlServer.Server.SqlUserDefindedAggregate(Format.Native)]

public struct functionName
{    public void Init()
      { //Put your code here...
      }

    public void Accumulate(SqlString Value)
      { //Put your code here...
      }

    public void Merge(functionName Group)
      { //Put your code here...
      }

    public void Terminate()
      { //Put your code here...
         return new SqlString("");
      }

      // Accumulator variable
      private int var1;

}

Figure 1 - User Defined Attribute Framework

This framework will be used for the examples provided in this article today. Some of the data types may be modified to meet the requirements of the aggregate being defined.

How to Compute the Sum of Percentage Gain

Summarizing percentages is not complicated if you know the formula. This is a good example of the User Defined Aggregate capability. Given the set of data in figure 2 you can not use any standard SQL Server aggregate function to compute gain/loss over time.

Date Value Pct Gain Running Gain
Jan 100 0 0
Feb 200 1 1
Mar 300 .5 2
Apr 400 .333333 3

Figure 2 - Percentage of Gain Table

Formulas

Value = Value of an asset measured at some point in time
CV = Value of the asset for a Point In Time
PV = Value of the asset for the previous period
PRG = Previous Running Gain
CPG = Percentage Gain for a point in time

Pct Gain = (CV-PV)/PV
Running Gain = (PRG + 1) * (CPG + 1) - 1

Check

Period Gain Running Gain
Feb (200-100)/100 = 1 (0+1) * (1+1) - 1 = 1
MAR (300-200)/200 = .5 (1+1) * (.5+1) - 1 = 2
APR (400-300)/300 = .333333 (2+1) * (.333333+1) - 1 = 3

Figure 3 - Formula Check

Total Gain Computed From Value

(400-100)/100 = 3

As you can see, if you have the value available you can readily compute the gain at using the value from the first and last period to plug into your formula. However, if all you have available is the percentage of gain for each valuation you must use a different aggregation than that provided by SQL Server.

The reason that this formula works well for a User Defined Aggregate is because it does not matter what the order is for the percentage gain when computing the total. You can mix up the gain starting with April through January or the other way around. It always returns the same result.

So, now that you know what it is we are going to try and accomplish, let's dig into the C# code to implement this UDA.

Create SumPct User Defined Aggregate Function

Figure 4 presents the code I use to compute the Sum of Percentage Gain from a series of valuations over time. This function differs from the prototype in that the input and output parameters are changed from SqlString to SqlDouble. This is because we are working with numbers. In SQL Server this will be displayed as FLOAT.


using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;


[Serializable]
[Microsoft.SqlServer.Server.SqlUserDefinedAggregate(Format.Native)]
public struct SumPct
{
   // Initialize the value for the accumulator to zero
  public void Init()
  {
     this.pctAccumulator = 0;
  }

  // Add the Percentage Gain from Value to the Total in the Accumulator
  public void Accumulate(SqlDouble Value)
  {
     if (!Value.IsNull)
     {
        this.pctAccumulator = (this.pctAccumulator + 1) * (Value + 1) - 1;
     }
  }

  // For Parallel processing...if sub totals exist sum them together
  public void Merge(SumPct Group)
  {
        this.pctAccumulator = (this.pctAccumulator + 1) * (Group.pctAccumulator + 1) - 1;
  }

  // Done computing...return the results
  public SqlDouble Terminate()
  {
        return (this.pctAccumulator);
  }

  // This is the accumulator for each roll up
  private SqlDouble pctAccumulator;

Figure 4 - SumPct C# Code

How it works

The class first instantiates a private variable pctAccumulator. When the Init method is called (the first time the class is executed per group) then pctAccumulator is initialized to zero.

Each time a new value is merged into the aggregate value the Accumulate method is called. During this call we use our formula (pctAccumulator + 1) * (Value + 1) - 1 resulting in the merge of the current accumulation with the current percentage.

If SQL Server breaks the query up into multiple processes for performance reasons there may be a need to merge two different instances. This is done using the Merge method. The formula is the same as that for a single value. The current instance is merged with the value from the Merge method.

Finally the Terminate method is called once all of the records have been processed for a group value in the Group by clause. When this is called the UDA returns the SqlDouble value found in the pctAccumulator member variable.

Use the SumPct UDA

Before you can use your new UDA you must first register it with SQL Server. This is done by registering the Assembly and then registering the UDA.

CREATE ASSEMBLY UDA
FROM 'C:\Program Files\SQL Server Assemblies\UDA.dll'
WITH PERMISSION_SET = SAFE
GO

This command registers the assembly (DLL) with SQL Server. Using the PERMISSION_SET = SAFE clause tells SQL Server it will use only managed code and that this assembly does not call any functions outside of SQL Server. Since we are not addressing any disk files or other operating system objects it is a best practice to use the most restrictive permission set possible.

CREATE AGGREGATE SumPct(@Value FLOAT)
RETURNS FLOAT
EXTERNAL NAME UDA.SumPct
GO

This command links a SQL Server UDA Prototype to the function call in the DLL that implements it. A new aggregate function is created (dbo.SumPct). This UDA expects a float data type as value sent to the function. It returns a float data type as a result. It identifies it with the appropriate assembly by using the registered assembly name (UDA) and the UDA class contained in the assembly (SumPct). You may have noticed that you can have more than one UDA in a single assembly. In fact, you cn combine any UDA, UDF and UDT in the same assembly. I anticipate researching best practices for how to break up assemblies and user defined objects for SQL Server 2005.

Now that you have your function you can simply execute it. Note that the function is only available in the database(s) where you have registered it.

CREATE TABLE T (pctGain FLOAT)
INSERT INTO T SELECT 0
INSERT INTO T SELECT 1
INSERT INTO T SELECT .5
INSERT INTO T SELECT .333333
INSERT INTO T SELECT .25

SELECT Min(pctGain), Max(pctGain), dbo.SumPct(pctGain) FROM T


As you can see, the creation of the UDA allows us to to mix our custom UDA with the default aggregation functions Min and Max. The gain is computed correctlyat nearly 4.0. There is some room for rounding error. In this case the rounding is at the 6th decimal place which is quite acceptable.

Final Notes

This article assumes that you have already turned on the ability to use CLR code in your instance of SQL Server 2005. If you have not enabled the CLR you can do so with the following code:

SP_CONFIGURE 'CLR ENABLED', 1
GO

RECONFIGURE
GO

As always, test your code before deploying it in a production environment. In this case, be sure to test for performance and resource usage.

This article demonstrates only the most basic aggregation technique. There are other methods which are better suited for different kinds of aggregation, but require a lot more implementation. Once again, when you find a need for deeper understanding I recommend the Apress resource, "Pro SQL Server 2005 Assemblies."

Conclusion

A user defined aggregate allows a developer to attach to the grouped data from the GROUP BY clause in an SQL statement allowing for enhanced aggregation capabilities. Now that you understand how to create your own user defined aggregate, you can continue to enhance your database capabilities. Be sure to include error trapping in your code for those un-expected errors in order to maintain your database stability.

About The Author

Ben has been working with relational databases since 1982. Primarily as a consultant he has worked with databases in many different industries from service, manufacturing, statistical analysis, high transaction sales, marketing and warehousing. He utilized many different relational platforms but focused on SQL based engines due to the growing demand and popularity of the language. Ben began using Microsoft SQL Server 4.21 running under Windows NT in 1993 and has focused on SQL Server since that time working with each progressive Microsoft release. While Ben still works full time exercising his database skills he also teaches others. He is a regular contributor to www.sswug.org with practical articles full of the “HOW TO” kind of support. He has taught college courses in database management systems and focus groups on SQL Server techniques.

Ben resides in Indianapolis, IN with his wife and two children where he hides from the cold and thrives in the summer. He gardens, camps, plays guitar, builds model airplanes and studies computer programming and software development techniques. Sometimes he just mows the lawn.

Please contact Ben if you have an SQL question you would like to see addressed in future articles or if you would like to engage his services in some other fashion. You can contact Ben at btaylor@sswug.org.

Goto Page 1  2  3  4  5  6  Vote