Editorials

Using Indexes on Temp Tables

The thought of using indexes on temp tables is counterintuitive. The whole point of temp tables is to be temporary, short-lived, intermediate steps in query execution, created to optimize some aspect of a query. If a temp table has a short life, why would you take the time to create an index on its contents? The answer: you do it when it helps SQL Server build the best query plan for the work at hand.

First, why do we create temp tables at all? As you know, SQL Server does it all the time in an automated fashion. Work from one step of a multi-step query plan is often cached in a working table, then brought forward for the next step. For example, if I have a query that joins sales to customers and products, a common query plan will first join sales to customers and save the result in a working table (a temp table, probably in memory). Next it takes the results from the working table and joins them to products. Depending on the join technique it determines is best, it may build a hash table from the contents of the working table and probe it with rows from the products table to do a hash join. If the working table is sorted by product, it could do a merge join, or perhaps, if there are not many products, a nested loop join.
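As a concrete sketch of that scenario, assuming hypothetical Sales, Customers, and Products tables (the names and columns are illustrative, not from any particular database):

```sql
-- Hypothetical schema: dbo.Sales, dbo.Customers, dbo.Products.
-- The optimizer may join Sales to Customers first, spool the result
-- into a worktable, and then join that intermediate set to Products.
SELECT c.CustomerName,
       p.ProductName,
       s.Quantity,
       s.SaleDate
FROM dbo.Sales     AS s
JOIN dbo.Customers AS c ON c.CustomerID = s.CustomerID
JOIN dbo.Products  AS p ON p.ProductID  = s.ProductID;
```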

As you can see, the query engine cannot do everything in a single step. It is excellent at making good choices when there are reasonable indexes and statistics on the base tables. Sometimes, and those times get fewer and farther between with each enhancement of the SQL Server query engine, you can build queries that perform even better on your own, if you have a good understanding of your data distribution and the task at hand. In those cases you can create and use temp tables of your own design.

A good example is when you need to join the contents of two very large tables (millions of records), filtering each by a highly selective predicate. You need indexes to execute the filters efficiently, and an index to join the tables. This is not a very common scenario; most often you will have one large table you wish to join to one or more smaller tables, and in that case other techniques will probably prove more useful.

Back to my big two-table filter and join. I have had a few of these kinds of queries. When they come up, I have found it is sometimes best to extract the records meeting my filter criteria from the two large tables into temp tables. Then I create indexes on those temp tables and produce my final result by joining the two temp tables. Usually that final query uses a merge join, because both sets are organized in the same sort order, which is really fast. If you're wondering how I came to use this method, it was by reading SQL Server query plans and observing that the engine will sometimes reorganize data so that it can use a more efficient join technique.
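The pattern might look something like this sketch; the table and column names are hypothetical, and the filters stand in for whatever selective criteria your query uses:

```sql
-- Step 1: extract the selectively filtered rows from each large table.
SELECT OrderID, CustomerID, OrderDate
INTO #BigOrders
FROM dbo.Orders
WHERE OrderDate >= '20240101';        -- stand-in for a selective filter

SELECT OrderID, ProductID, Quantity
INTO #BigDetails
FROM dbo.OrderDetails
WHERE Quantity >= 100;                -- stand-in for a selective filter

-- Step 2: index both temp tables on the join key. With both sides
-- sorted the same way, the optimizer can choose a merge join.
CREATE CLUSTERED INDEX IX_BigOrders  ON #BigOrders  (OrderID);
CREATE CLUSTERED INDEX IX_BigDetails ON #BigDetails (OrderID);

-- Step 3: join the two much smaller, pre-sorted sets.
SELECT o.CustomerID, o.OrderDate, d.ProductID, d.Quantity
FROM #BigOrders  AS o
JOIN #BigDetails AS d ON d.OrderID = o.OrderID;
```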

If you're thinking you can do that with a table variable, you are right that you sometimes can. However, you cannot create an index directly on a table variable. You can create unique constraints, which are implemented by an index. So if your join criteria can be implemented by a unique constraint, then a table variable may work just as well. I have found that creating a temp table without the unique constraint, and adding an index after populating it with data, performs faster than a table variable with a unique constraint, which has to be enforced during the data insert. Your mileage may vary.
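The two approaches can be sketched like this (illustrative names; the aggregate is just a placeholder workload):

```sql
-- Table variable: the only index comes from a constraint, declared
-- up front and enforced while the rows are inserted.
DECLARE @Totals TABLE
(
    OrderID int   NOT NULL,
    Total   money NOT NULL,
    UNIQUE CLUSTERED (OrderID)
);
INSERT INTO @Totals (OrderID, Total)
SELECT OrderID, SUM(LineTotal)
FROM dbo.OrderDetails
GROUP BY OrderID;

-- Temp table: load the data first, then build the index in one pass.
SELECT OrderID, SUM(LineTotal) AS Total
INTO #Totals
FROM dbo.OrderDetails
GROUP BY OrderID;

CREATE UNIQUE CLUSTERED INDEX IX_Totals ON #Totals (OrderID);
```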

I follow this process when optimizing queries. First, observe the query plan, and create indexes and statistics as needed. If that doesn't suffice, I may break the query down along the lines of the original plan into my own explicitly sequenced steps, and then optimize each step individually. The query plan shows which steps have the highest cost, so address those first. Since SQL Server 2012 was released, I rarely go beyond the first step; good indexes and statistics usually result in great performance.
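For the observation step, SSMS's "Include Actual Execution Plan" option shows the per-operator costs; the SET options below add I/O and timing detail (the query here is just a placeholder):

```sql
SET STATISTICS IO ON;    -- logical reads per table
SET STATISTICS TIME ON;  -- CPU and elapsed time per statement

-- Placeholder query: substitute the statement under investigation.
SELECT COUNT(*) FROM dbo.Orders;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```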

If you want to build your skills in this area, focus on learning the three physical join methods (hash, merge, and nested loop) and how they behave under different conditions. You'll also want to understand how indexes and statistics enable efficient joins. Then you can begin to optimize your queries when ultimate performance is mandated.
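One way to study the join methods is to force each one with a query hint and compare the resulting plans and costs. The tables here are illustrative, and hints like these are for experimentation, not production code:

```sql
-- Run the same join three ways and compare the actual execution plans.
SELECT o.OrderID, d.ProductID
FROM dbo.Orders      AS o
JOIN dbo.OrderDetails AS d ON d.OrderID = o.OrderID
OPTION (HASH JOIN);

SELECT o.OrderID, d.ProductID
FROM dbo.Orders      AS o
JOIN dbo.OrderDetails AS d ON d.OrderID = o.OrderID
OPTION (MERGE JOIN);

SELECT o.OrderID, d.ProductID
FROM dbo.Orders      AS o
JOIN dbo.OrderDetails AS d ON d.OrderID = o.OrderID
OPTION (LOOP JOIN);
```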

Cheers,

Ben