Editorials

Detailed Table Design Questions

Today I want to talk about some very specific philosophical questions when it comes to defining tables. The choices we make have implications in both the short and long term.

Let me start with a simple premise. I have been taught that when you create a table, for each column in the table, it should be constrained to allow only the maximum value possible. For example, if you have a column that may be only true or false, then use a bit data type. If you have a column where the data domain is a number from 0 thru 255, use a TINY INT. Use INT or BIG INT when it applies. The same can be applied to DECIMAL or MONEY data types. Only define column storage for the max value needed.

The same approach is true when creating string or binary variables. You could simply use VARBINARY(MAX) or VARCHAR(MAX) for all of your string data types. Your performance may be quite good if you don’t have a lot of large data values.

In SQL Server you will see a lot of examples where the second approach is taken. Instead of using VARCHAR for a string data type, you will often see NVARCHAR in its place. NVARCHAR takes twice as much space as VARCHAR. However, NVARCHAR supports many more language character sets than VARCHAR. Do you know for certain that your application will never be used outside a limited set of languages?

Two questions are raised about this philosophy of restricting data storage.

  • What about when I find that I didn’t reserve enough space, and I find I need to store data bigger than the design allows?
  • What’s the problem with just allowing more space than I need?

If you tightly manage the space used for your data you can optimize storage, potentially increase performance by reading more records in a single read, and keep your users from abusing your database storage. When your users can’t enter data because there isn’t enough space allowed, you’ll have to address it, or they make compromises in the data they enter. Extending the space allowed can be painful.

If you don’t restrict space, you open yourself up for abuse of your system storage. Potentially, the extra space can be used for hacking. If you are using fixed length data types such as CHAR or NCHAR, then there is the potential for a lot of wasted space and reduced performance. When it comes to integers, I’m finding there is less value in minimizing data types there, especially on 64bit processors, even though it does take more storage.

When it comes to strings, some people ask the question, “should you have a VARCHAR(1) column definition?” It takes extra space for VARCHAR(#) data types to store the number for the bytes allocated. So, storing a BYTE(1), even if it is empty, takes less space than storing VARCHAR(1) if it has a value.

How much do you find yourself being concerned about these detailed questions? What solutions have you found work for you, and why? Let’s hear from you in our comments today.

Cheers,

Ben