Optimize High Volume With a Staging Table

There are many things you can do to improve the performance of an OLTP system when exceptionally high volume causes it to slow down noticeably. Normalization, file separation, index and statistics tuning, memory management, and report separation are your first targets for any OLTP database. We even talked about pre-assigning keys externally last week as a more extreme measure for boosting performance. Sometimes this just isn’t enough.

Today I want to talk a little more about using multi-stage persistence to speed up high-volume loads without losing any work. When you think about a normalized database and writing data to multiple tables, physics plays a big part in the performance you can expect. Let’s say you have a business transaction that needs to write to four tables as a single ACID transaction. A lot has to happen before that transaction can complete.
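As a concrete illustration, here is a minimal sketch of that kind of four-table write. The tables, columns, and values are invented for the example, not a prescribed design:

    BEGIN TRANSACTION;

        INSERT INTO dbo.Orders (OrderID, CustomerID, OrderDate)
        VALUES (1001, 42, SYSUTCDATETIME());

        INSERT INTO dbo.OrderLines (OrderID, LineNumber, ProductID, Quantity)
        VALUES (1001, 1, 7, 2);

        INSERT INTO dbo.Payments (OrderID, Amount, Method)
        VALUES (1001, 19.98, N'card');

        INSERT INTO dbo.OrderAudit (OrderID, ChangedAt, ChangedBy)
        VALUES (1001, SYSUTCDATETIME(), SUSER_SNAME());

    COMMIT TRANSACTION;

Every one of those inserts carries its own keys, constraints, and locking, and all of them must succeed or fail together.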

In SQL Server, most of the heavy lifting is done in memory. The data pages where the data is to be stored are read from disk (if they are not already in memory). Entries are added to the transaction log for the work to be done. Locks are taken on the tables being modified. The data is added or updated in memory. Eventually, the lazy writer gathers the changes made in memory and flushes them to disk, and the entries in the transaction log are marked as complete. Locks are released. I may not have the order of execution exactly right, but you can see how much needs to be accomplished.

When you persist to multiple tables, these tasks become more complicated. As the overhead on a very busy system increases, the duration of this process impacts other processes, increasing the wait time for everything else. So, what can you do to simplify the process? This is where a two-stage persistence implementation provides additional performance.

Instead of writing everything to four normalized tables, as is the desired ultimate state, write the data into a de-normalized data store of some kind. Most high-volume systems do not require immediate access to the normalized data, much like a data warehouse. However, if the transaction is considered complete from the perspective of the client, then you must guarantee that it will ultimately be persisted in long-term storage.
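One way to sketch such a staging store in SQL Server is a single narrow table with a sequential key and a payload column. The names here are my own invention for illustration:

    CREATE TABLE dbo.TransactionStaging
    (
        StagingID   BIGINT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
        ReceivedAt  DATETIME2(3) NOT NULL DEFAULT (SYSUTCDATETIME()),
        Payload     NVARCHAR(MAX) NOT NULL,  -- JSON or XML blob holding the whole business transaction
        ProcessedAt DATETIME2(3) NULL        -- set once the row has been shredded into the normalized tables
    );

Clustering on the identity column keeps new rows appended at the end of the table, which suits in-order processing later.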

If you write the entire transaction as a single record, something like an XML or JSON blob, or even a binary serialized representation of your data, it requires an insert lock on a single table and far less complexity for keys, foreign keys, and all the other overhead. Saving to multiple tables requires disk head movement across the various tables touched by that single transaction; writing one record to a single table means less disk contention. This single write can be restricted to a row lock, and thus does not block other processes.
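With the hypothetical staging table above, the front-end write collapses to one short insert. The JSON below is only an example shape for the payload:

    INSERT INTO dbo.TransactionStaging (Payload)
    VALUES (N'{"orderId": 1001, "customerId": 42,
               "lines": [{"productId": 7, "qty": 2}],
               "payment": {"amount": 19.98, "method": "card"}}');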

Once your data is in the staging table, you can process the records first-in, first-out. The data enters your normalized OLTP tables in the same order as expected, but the client has already moved on without waiting for the final persistence.
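A background job could drain the staging table in arrival order and shred each payload into the normalized tables. This is only a sketch that assumes the hypothetical tables above and uses OPENJSON, which is available in SQL Server 2016 and later:

    DECLARE @StagingID BIGINT, @Payload NVARCHAR(MAX);

    -- Pick the oldest unprocessed row (first in, first out).
    SELECT TOP (1) @StagingID = StagingID, @Payload = Payload
    FROM dbo.TransactionStaging
    WHERE ProcessedAt IS NULL
    ORDER BY StagingID;

    IF @StagingID IS NOT NULL
    BEGIN
        BEGIN TRANSACTION;

            -- Shred the blob into the normalized tables.
            INSERT INTO dbo.Orders (OrderID, CustomerID, OrderDate)
            SELECT OrderID, CustomerID, SYSUTCDATETIME()
            FROM OPENJSON(@Payload)
                 WITH (OrderID INT '$.orderId', CustomerID INT '$.customerId');

            INSERT INTO dbo.OrderLines (OrderID, LineNumber, ProductID, Quantity)
            SELECT JSON_VALUE(@Payload, '$.orderId'),
                   ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
                   ProductID, Quantity
            FROM OPENJSON(@Payload, '$.lines')
                 WITH (ProductID INT '$.productId', Quantity INT '$.qty');

            -- The payment and audit tables would follow the same pattern.

            -- Mark the staging row as done so it is not picked up again.
            UPDATE dbo.TransactionStaging
            SET ProcessedAt = SYSUTCDATETIME()
            WHERE StagingID = @StagingID;

        COMMIT TRANSACTION;
    END;

The drain runs on its own schedule, so a spike in client traffic only lengthens the staging backlog rather than slowing the client-facing write.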

Cheers,

Ben