Editorials

The Keys to the Kingdom

So how do you create a primary key in a table? Believe it or not, there are a few different methods that have proven useful in different scenarios. Let’s start with some of the traditional methods supported by most SQL Engines.in

First there is the database generated sequential number generation from the SQL Engine itself. In SQL Server we call it an IDENETITY column, where the table automatically increments the next value for a record in the table based on the sequence specified when the table is created. The IDENTITY column does not have to increment each value by only 1. It can increment by any number. Of course, the larger the number you use, the less options will be available before the table has used all possible values. One interesting thing many people don’t know is that you can increment using a negative value.

Sometimes when multiple databases are merged through replication or something of that kind, you will find IDENTITY Values working differently in order to assure there are no key collisions between different databases. Let’s say you had ten different databases you were going to merge into a central warehouse, and you wanted to retain the original key assigned to the data in the individual warehouse. You could seed the key for each database at a different place from 1 to 10. Then if each database had an increment of 10, there would never be any collision when data was merged together.

Some databases such as Oracle and SQL Server also support the creation of Sequences. This is the ability to have a Sequence engine that provides values, just like an IDENTIY column, except the sequences are not attached to a single, specific table. Using Sequences you can create unique identifiers, and use them in different tables. It really doesn’t make sense to us a sequence in a single table. In that case you would simply use an IDENTITY column.

UNIQUE IDENTIFIERS, or GUIDs as we generally recognize them, are also a popular key mechanism. They are used a lot in replication processes. Still, they may be used for key values as well. 64 it machines make the use of GUIDs less of an issue when it comes to performance.

One technique I haven’t used for years is a random number generator. Back when SQL Server had locking only at the page level, one trick to gain performance was to pad your data to consume a whole page (it was only 2k back then). Using a random number generator we would put the records in different locations of the table, reducing hot spots on the disk. I haven’t heard of anyone using the random number technique for a long time. The issues that were resolved with this implementation have been solved through the improvement of the data engines so it is no longer necessary.

Another technique used is to assign your keys externally. There are many mechanisms to do this. The biggest issue for external keys is how to assure they are always unique, and do not clash with any keys already assigned. I’ll unpack that concept a bit tomorrow.

Cheers,

Ben