ARTICLES

Home  > Articles  >  High Speed Data Import Using Synthetic Keys

High Speed Data Import Using Synthetic Keys

by Ben Taylor

 

Synthetic keys are a common database design in SQL Server. A synthetic key is a system assigned value that has nothing to do with any of the data contained in the record it represents. Integers or GUIDs are often used as synthetic keys. Many methods are used to generate syntetic keys. Most built in methods are not efficient for the persistence of high volume data (ie. Auto Number or Identity columns), requiring complicated coding or multiple round trips, especially when synthetic keys provide foreign key values in other tables. While GUIDS are a popular option which may be generated in any layer, many developers prefer a sequential synthetic key; especially when no natural key exists in the data being stored. This article demonstrates one method for centrally assigning sequential synthetic keys prior to data persistence, resulting in highly scalable data generation and persistence.
 

With the explosion of grid software platforms it is easy to create a system that outperforms your database. This becomes more apparent as you persist complex objects that represent multiple records in multiple tables in your database. For example, if you had a perchase order object like figure 1 that had ten line items in the lineItems list,  this would translate to a record in the PurchaseOrder table and ten records in the PurchaseOrderLineItem table in figure 2. In order to persist this data you must first persist the PurchaseOrder record and return the synthetic key assigned. Then, each PurchaseOrderLineItem record may be saved with the key assigned. This can be done easily with a stored procedure in SQL Server 2008 by passing all the line items as a set. In a real time OLTP system both the round trip or stored procedure method work efficiently.
 


Figure 1 - Purchase Order Objects

 

 


Figure 2 - Purchase Order Tables
 

 When your application layer becomes much faster than your database, a different solution is needed. This methodology breaks down when you have to persist hundreds, thousands, or even more purchase orders all at the same time. A grid system may cache large amounts of data and persist them at a later time using more efficient methods.
 

What if your application layer assigned the primary keys keys for your PurchaseOrder table? It could then assign the foreign key value in the PurchaseOrderLineItem table without further communication with the database. In fact, you could use BulkCopy to persist your data quickly, bypassing primary key assignment in the database. I am not talking about ignoring referential integrity. Referential integrity is still in force, and the order of persistence is mandatory. I am talking about modifying the techniques used to assign synthetic keys so that data can be persisted quickly, accurately representing the objects in your application.
 

Now we have a different problem. In a grid based application you have more than one process or thread, on one or more machines, performing the same task. You have to synchronize the assignment of those keys accross all processes, assuring different values for each PurchaseOrder. Indeed, each of those processes must be managed to sychronize the incremented synthetic key values, This article demonstrates how you can manage the key assignment for multiple tables, assuring unique keys for all processes in a high speed environment, using native features built into SQL Server.
 

This solution provides the following requirements for assigning synthetic keys in a grid/cloud environment:  

  • State mangement through persistent storage of the current synthetic key value
  • Reset of State at system startup
  • A centralized service available to the grid
  • Key Locking assuring sequential access
  • ACID Transactions (no two processes may receive the same key)
  • High speed response time

The Key Manager Solution


The key manager solution is not rocket science. It consists of a single table containing records for tables that have key manged primary keys, and a stored procedure called by your applications requesting key assignment. Below are scripts to create both the table and the stored procedure for key management. All that is left for you to do is decide which tables you will manage this way, and set the seed values in the KeyManager table. 
Figure 3 Key Manager Table

 

Create the KeyManager table with the following SQL code. 
 

CREATE TABLE KeyManager (
    TableName    VARCHAR(128)  NOT NULL
   ,KeyValue     BIGINT        NOT NULL
   ,CONSTRAINT PK_KeyManager PRIMARY KEY CLUSTERED (TableName)
)

 

Once the table is in place, update the KeyValue column for the appropriate record having a TableName value matching the Table into which you are inserting records. Our example woudl use the PurchaseOrder table. I prefer to do this with a stored procedure, taking advantage of high speed SQL Server techniques for locking and gathering the end results of the update. I reduce the number of calls to the KeyManger by reserving as many keys as I know I need. If I intend to insert 100 purchase orders, then I will increment the KeyManager record by 100, reserving 100 unique Keys. The stored procedure then returns to me the number of the first key, and I may safely increment up to 100 without fear of overlap. Because the keys are synthetic, I am not concerned with Gaps in the sequences. If I reserve a key segment and don't use it, then I drop it on the floor.
 

Let's say you had another table that used the PurchaseOrderLineItemID from the PurchaseOrderLineItem table as a foreign key. In this case, you could use the KeyManager method to assign key values not only to the PurchaseOrder object and table, but to the PurchaseOrderLineItem objects and table. In our example of a purchase order with ten line items we would require two calls to the key manager. One call to reserver the key for the PurchaseOrder table, and one call to reserve ten keys for the PurchaseOrderLineItem table. A system with a collection of PurchaseOrder objects could count the purchase orders, and the line items in each, and make two calls reserving all the keys needed for all purchase orders and their line items in two calls; on call for the purchase orders and another call for the line items. This capability of generating a large volume of keys in an extremely lightway method results in extremely high performance data persistence.
 

Use this code to create your stored procedure to maintain KeyValues.
 

IF OBJECT_ID('usp_KeyManager_Get_KeyValue') IS NOT NULL
BEGIN 
    DROP PROCEDURE usp_KeyManager_Get_KeyValue
END
GO
 
/* Stored Procedure usp_KeyManager_Get_KeyValue 

   By Benjamin Taylor 

   Test 

   -- Test For Non-existent Table 
   DECLARE @KeyValue BIGINT 

   EXEC usp_KeyManager_Get_KeyValue
       @TableName = 'BogusTable' 
      ,@RowsToReserve = 20 
      ,@KeyValue = @KeyValue OUTPUT 

   IF @KeyValue IS NOT NULL RAISERROR('Test Failed for Bogus Table...Expected No Returned Value', 12, 12)    

   -- Test For Pass and Incremented Value 
   BEGIN TRAN 

   INSERT INTO KeyManager (TableName, KeyValue)
   SELECT 'Bogus', 10 

   DECLARE @KeyValue BIGINT 

   EXEC usp_KeyManager_Get_KeyValue
       @TableName = 'Bogus' 
      ,@RowsToReserve = 20 
      ,@KeyValue = @KeyValue OUTPUT 

   IF ISNULL(@KeyValue, 0) != 10 RAISERROR('Returned Key Incorrect...expected %d but was %d', 12, 12, 10, @KeyValue)        

   SELECT   @KeyValue = KeyValue
   FROM     KeyManager
   WHERE    TableName = 'Bogus'

   IF @KeyValue != 30 RAISERROR('Returned Key Incorrect...expected %d but was %d', 12, 12, 30, @KeyValue) 

   ROLLBACK
*/
 
CREATE PROCEDURE usp_KeyManager_Get_KeyValue ( 
    @TableName            VARCHAR(128) 
   ,@RowsToReserve        INT 
   ,@KeyValue             BIGINT                 OUTPUT)
AS 

   SET NOCOUNT ON 

   DECLARE @RIDS TABLE (KeyValue BIGINT NOT NULL) 

   UPDATE    KeyManager WITH (ROWLOCK) 
   SET       -- KeyValue = KeyValue + @RowsToReserve -- SQL Server 2K5 Syntax 
             KeyValue += @RowsToReserve              -- SQL Server 2K8 Syntax 
   OUTPUT    DELETED.KeyValue 
   INTO      @RIDS 
   WHERE     TableName = @TableName 

   SELECT    @KeyValue = KeyValue 
   FROM      @Rids
GO 

 

I include in line Unit tests in the code as a standard practice. These tests provide examples for TSQL usage and demonstrate the expected results of the stored procedure. Placing these unit tests in comments prior to the CREATE PROCEDURE command embeds the unit tests for users who may veiw it in SQL Server Enterprise Manager.

I am using the OUTPUT clause, introduced in SQL Server 2K5, in this update statement to return to the caller the current key value. The output clause works like a trigger. It provides two virtual tables that have the same schema as the table being modified. The INSERTED table has the record that were added, or the after image of records that were updated. The DELETED table as the records that were deleted, or the before image of records that were updated. In this case, I want the current key value to be assigned, and use the DELETED image for the before value. Using this OUTPUT clause allows me to Lock the record during the update, and get out quickly so as to not cause contention with other processes needing to reserve key values. 
 

I also use a lock hint in this query to assure the least locking impact on the table using WITH (ROWLOCK). This lock hint assures that my update statement only locks the row for the table I wish to increment the KeyValue. Other processes should be able to update the KeyValue of other records without contention.

Since the OUTPUT clause returns a data set, I use a temporary memory table which only has one column, and after the update statement has only one row. I convert that data table into the scalar output paramenter @KeyValue by simply selecting the value from the temp table into the output variable @KeyValue. I could have simply returned the results in the temp table and used that in the consuming program. I chose an scalar parameter with the direction of output because of the overhead of using a table when none is needed. Even the ADO.Net method ExecuteScalar() method requires more overhead than simply returning an output parameter. Since I am after speed here, I have optimized to the fullest.
 

Using the Key Manager


While this technique is provided as a Microsoft SQL Server implementation, it can be applied to any relational database that supports locking of records. MS SQL Server syntax makes this easy for me with the Output clause. The sytax can be converted to other RDBMS engines. There is no reason, however, that you could not use this system to assign keys in some other database, including non SQL Server data stores.
 

Following is a simple C# code snippet utilizing ADO.Net with the usp_KeyManager_Get_KeyValue stored procedure. The method assumes you are passing to it a connection object that is already opened, pointing to the database where the usp_KeyManager_Get_KeyValue stored procedure is located. In this example I am using an output parameter .
 

public long GetKeyValues(string tableName, int rowsToReserve, SqlConnection conn)
{
      long KeyValue = 0;
      SqlParameter parm = new SqlParameter("@KeyValue", SalDbType.BigInt);
      parm.Direction = ParameterDirection.Output;

      SqlTransaction tran = conn.BeginTransaction(IsolationLevel.ReadCommitted, "GetKeys");
      SqlCommand cmd = new SqlCommand("usp_KeyManager_Get_KeyValue", conn, tran);
      cmd.Parameters.AddWithValue("@TableName", tableName);
      cmd.Parameters.AddWithValue("@RowsToReserve", rowsToReserve);
      cmd.Parameters.Add(parm);

      try
      {
            object ret = cmd.ExecuteNonQuery();
            tran.Commit();
            if (cmd.Parameters["@KeyValue"].Value != null)
            {
                  KeyValue = (long)cmd.Parameters["@KeyValue"].Value;
            }
            else
            {
                  throw new Exception("Key Manager Did Not Return a New Key Value");
            }
      }
      catch (Exception e)
      {
            if (tran!=null)
            {
                  tran.Rollback();
            }
            throw;
      }

      return KeyValue;
}


One of the things you may note about this C# code snippet is the use of a transaction. I intentionally use a transaction and isolate it here from all other transactions by using the isolation level of ReadCommitted. I commit or rollback the transaction as quickly as possible. I do this so that all locks on the shared database resource are removed as quickly as possible. During the execution of this query the record in the KeyManager table is locked until the transaction is committed or rolled back. It is not important if the transaction is committed or rolled back as far as other processes are concerned. The point is to resolve the transaction, thereby removing the record lock, as quickly as possible so that other processes are not blocked from assigning key values.
 

Regarding the process requesting the key value, the code either returns a valid value or throws an error. The calling application must handle any exeception returned. I use a try/catch block so that I can either commit a successful transaction, or rollback a failed transaction.
 

Ideas for Enhancement

  • Run your key assignment utility in an embedded version of SQL Server running completely in memory. Using this method you are going to have to develop a method to establish the current key values at startup.
  • Use PIN TABLE on the KeyManager table. This step is probably not necessary if the KeyManager process is called often
  • Create this system in a separate database. Alter the KeyManger table schema to fully qualify tables under managment. Add columns for the database, owner and table for which a key is being generated. You could even add the database instance as well. I would not do this on a system where there is a lot of activity; you need the key assignment to be near insteantaneous, and putting lots of systems on this same key manager could produce a bottleneck.
  • Modify the stored procedure to initialize the starting key value for empty tables
     

Conclusion


Use the KeyManger system to centrally coordinate key values, while assigning those values in a distributed environment. This technique is effective in systems that persist high volumes of data generated externally from your database. Feel free to send your comments about the KeyManger to btaylor@sswug.org.