ARTICLES

Home  > Articles  >  Tips for using SQL Server 2005 Integration Services (Part 3)
Rate and Comment

 Articles in this Series

Rate and Comment

Tips for using SQL Server 2005 Integration Services (Part 3)

Integration Services (SSIS) is a new SQL Server 2005 service that replaces Data Transformation
Services (DTS), which was first introduced as a component of SQL Server 7.0. By using the SSIS
you can load data into or out from SQL Server relational databases or data warehousing. In this
article, you can find some tips to boost the SQL Server 2005 Integration Services performance.


Include annotations to the package control flow, data flows, and event handlers.
Using annotations make it easy to understand the details of the SSIS packages and help to correct
the packages later when you forgot the details.


Use breakpoints when you debug the packages.
SQL Server 2005 Integration Services provide the ability to set breakpoints on packages,
containers, and task events. Using breakpoints can help you debug your packages and find
potential problems during the testing before transferring to production environments.


If you need to modify column values, you can use the Data Conversion and
Derived Column transformations.

These transformations were first added in SQL Server 2005 Integration Services. The Data
Conversion transformation is used to convert the data type of a column to a different data type.
The Derived Column transformation is used to populate columns with the results of expressions.


Use SQL Server 2005 Integration Services containers.
Containers are SSIS objects that provide structure to packages and services to tasks. Containers
can include tasks and other containers. SSIS supports four containers: Foreach Loop Container,
For Loop Container, Sequence Container, and Task Host Container. The Foreach Loop Container is
used to run a control flow repeatedly by using an enumerator. The For Loop Container is used to
run a control flow repeatedly by testing a condition. The Sequence Container is used to group
tasks and containers into control flows that are subsets of the package control flow. The Task
Host Container is used to encapsulate a single task (it allows to use variables and event
handlers on the task level).
 
Use Integration Services tasks that shipped with SQL Server 2005.
Though you can create your own task to build packages (you can write custom tasks using a
programming language, such as Visual Basic or C#) it is a good idea to use tasks that shipped
with SQL Server 2005. If it is possible to use internal SSIS tasks, you should use them.


Try to avoid using the "Table or view" access mode in the OLE DB Source Adapter.
You should use the SELECT statement to retrieve data as more quickly way to perform the same
operation.


Use the File System task to upload, download and copy files and folders in the
file system.

This task was first added in SQL Server 2005 Integration Services. SQL Server 2005 Integration
Services include some tasks that perform data preparation operations, such as performing
operations on files and folders in the file system.


If you need to distribute data rows to multiple downstream data flow components,
you can use the Conditional Split and Multicast transformation.

This transformation was first added in SQL Server 2005 Integration Services and make easy for
developers to build packages with complex data flow without writing any code.


Consider increasing the EngineThreads property.
This data flow property controls the number of worker threads the Execution Engine will use.
The default value for this property is 5. To improve the SSIS performance, check how many threads
the data flow naturally needs and try to keep the EngineThreads value reasonably close to it.


Use the Raw File source and destination, for reading and writing raw data to files.
In SQL Server 7.0 and SQL Server 2000 DTS packages can extract data from SQL Server, OLE DB,
and flat file sources and destinations. In comparison with the previous versions of SQL Server,
in SQL Server 2005, Integration Services packages can extract data from many additional types
of data sources.


Consider increasing the MaxConcurrentExecutables property.
This data flow property controls the number of data flow tasks that can be executed simultaneously.
To improve the SSIS performance, check how many tasks the data flow naturally needs to execute
simultaneously and try to keep the MaxConcurrentExecutables value reasonably close to it.


Try to eliminate the data that needed to be transferred using the SSIS data flow.
To eliminate the data that needed to be transferred, you should remove unneeded columns and rows
from the source. Specify in the SELECT statement only the necessary columns and use the WHERE
clause to specify only the necessary rows.


If you need to correct terms in a transformation input column, you can use the
Term Extraction and Term Lookup transformations.

These transformations were first added in SQL Server 2005 Integration Services. The Term Extraction
transformation is used to extract terms from text in a transformation input column, and then writes
the terms to a transformation output column. The Term Lookup transformation is used to match terms
extracted from text in a transformation input column with terms in a reference table.


Try to eliminate logging.
Use logging during debugging and troubleshooting and eliminate unneeded logging when deploying
completed packages to production.


Use the XML source for exacting data from XML documents.
In SQL Server 7.0 and SQL Server 2000 DTS packages can extract data from SQL Server, OLE DB,
and flat file sources and destinations. In comparison with the previous versions of SQL Server,
in SQL Server 2005, Integration Services packages can extract data from many additional types
of data sources.


Use Partial cache mode or Full cache mode with the Lookup transformation.
Consider using Partial cache mode with the Lookup transformation when the incoming data is
repetitive and only references a small percentage of the total reference table. Consider using
Full cache mode with the Lookup transformation when the reference table is relatively small and
the incoming data references the full spectrum of reference table rows.


Try to avoid using the Aggregate and Sort transformations.
Because using these transformations may be very resource expensive, you should try to sort and
aggregate data at the source.