Search SSWUG.ORG:

EDITORIALS

Editorial Rss
12345678910...
TitleDate

Staging Tables Optimize SSIS



I recently interviewed for the opportunity to do some work for a new company. They were asking a number of questions around performance tuning and optimization. One question specifically targeted SSIS, and the performance of importing data. My immediate reply, based on personal experience, was that having staging tables to bring the data into SQL Server makes the work easier to perform, and optimizing easier to trace and manage.

I was pleased to hear them say, “Thank you for validating our implementation.” 

But there’s more. Staging tables provide a lot of value when merging new data into existing data. Even if the new data is completely autonomous, and might even be placed in its own table, and merged into a partitioned table, there is value to bringing it into a staging area first. 

The question is when do you bring the data into your staging area? You can bring it into SQL immediately. You can pre-process or cleanse the data prior to importing, or have different stages within your staging area. Sometimes there is aggregation, de-duplication, normalization, de-normalization, pivoting or un-pivoting. All of these processes take a lot of resources.

Sometimes you best break the data up into smaller chunks so that it can fit within the memory and processing constraints of the SSIS server. It may be easier to do that with staging tables, which might bring staging tables into the process at an earlier stage. The primary consideration is performance. Since SQL Server tables utilize transactions, it may be more efficient to do the transformation outside of the database storage. Maybe even split the importing data before doing any processing.

With all of the different ways to approach importing data, I have always found that just before the final stage of merging new data into existing data, it is much easier to develop when the data is already inside the database, as you try and merge the new data into the existing sets. Most often, this has the best performance.

Are you a fan of staging tables? Do you have different processes that work for you? Feel free to leave a comment for our readers. It’s always nice to hear from multiple voices.

Cheers,

Ben
 
4/17/2015
12345678910...