Editorials

Data Types Matter for Key Columns

When considering data types and the desire to constrain the values to the minimum required, there is a tangent question you might consider. The question, “Is this column part of the primary key, or a foreign key?”

Before we get into a heated battle regarding primary keys, and always having that key be a sequential integer, let me suggest that we put that discussion on the table for now. There are many proponents of different keys such as external, natural keys, GUIDs, strings, etc. For years I have been satisfied with IDENTITY columns as primary keys, and SQL Server is tuned specially to support that implementation. However, I have done some non-scientific comparisons and found that any difference in performance for different options is difficult to measure.

Joining a million record table to a one hundred thousand row table based on a GUID both return with a mere difference of milliseconds. With 64 bit processors, some of these things that we complained about in the past have simply faded away. They can be a thing of the past.

Others don’t like composite indexes. Again, the difference is more in how much you have to type when joining tables using a composite index, than a performance consideration.

Even taking all of these things into consideration, I have found the following to be ideas I consider.

When using an Identity column, I use an INT data type if I am sure the number of records will be lower than the max value allowed for a 32 bit signed int. Otherwise, I simply use a BIGINT so I don’t have to worry about reaching the max value for a long time.

I don’t mind using GUIDs for keys. What is more important is where the value is assigned for the GUID. Theoretically, two different systems could generate the same key. So, I prefer to have a single authority assigning a GUID, just like IDENTITY values.

I don’t generally use String data types as keys. If I do, I like to keep the max number of characters low for indexing and join purposes.16 characters or less can often be quite unique. Any of the MAX data types are not used for keys because they cannot be indexed.

To summarize, the data types you use for index columns are important. Data Types that can be compared in a single 64 bit comparison will have better performance. Composite indexes consisting of a few 64 bit values can provide flexibility, data integrity, and good performance.

Cheers,

Ben