Editorials

Integrating Dot Net Code in SSIS

SSIS (SQL Server Integration Services) is a really great platform for ETL processes. Most of you have used it at one time or another. It has some really amazing capabilities such as the ability to use fuzzy logic for comparisons when your data isn’t highly standardized. For example, the English names Robert, Rob, Robby, Bob, Bobby are all interchangeable to some degree. So, when you are combining data from multiple sources it can be difficult to identify the records as being the same person. Fuzzy logic is one of the tools that help with those kinds of situations.

Although there are some interesting tools that have been added to SSIS, there may be times when you want to integrate your own packages using external tools, calling them from within your SSIS process, with the purpose of creating code in a more maintainable way.

SSIS packages are difficult to unit test. It is harder to create MOCK implementations, and test smaller parts of the package code. It’s not impossible, but the feature isn’t as mature as simply writing Dot Net code. Since it is easy to include Dot Net code into your package, sometimes it makes sense to some aspects of your ETL in C# or VB.net.

When you use a Dot Net dll for your ETL logic, you can create mock representations of your input data, covering the different ranges of unique problems your ETL must solve. As a result, you can use object oriented techniques and patterns to resolve those different scenarios, and be sure that when large volumes of data are introduced, you have already proven you code handles those situations. You can write code to handle multiple edge cases much faster, and be able maintain the code more efficiently than if the different branches for edge cases are encapsulated in the SSIS GUID driven logic.

This doesn’t mean SSIS goes away. It’s still a great tool with a lot of infrastructure wrapped around it that you don’t want to re-create. In fact, it may be all that you need. I have found that when things start becoming complicated, and you have dozens, hundreds or thousands of edge cases, at that time I begin to ask myself if a Dot Net package may be the more efficient implementation for accuracy, maintainability, testability, and extensibility.

Have you integrated Dot Net code in your ETL process? Please share a comment with how that has worked for you?

Cheers,

Ben