Editorial Rss

Performance Through Parallism

When tuning a database environment for performance, Jim has found contradicting advice for configuring SQL Server for parallel performance. Jim writes:

The question that plagues me is what to do about parallelism in an OLTP environment. I know two eminent SQL Server experts that take polar opposite views on the issue. The one expert says that the "Cost Threshold for Parallelism" default value of 5 was set back when processors were nowhere as robust as they are now. We are basically letting a good machine thrash when we don't bump this up to 20, 30, or even 50. The other expert says (and this falls in line with what you are saying with the Optimizer) to let the experts at Microsoft manage this. Bump up the threshold value only when necessary, and you will find that in the long run it will manage itself very well. Which is correct? I dunno. But I think I am leaning to the side that says to leave it alone until necessary.

Before taking on the question of how to configure for parallelism, let me give a simple explanation of what is going on. When executing a query, it is often possible to break the tasks of the query up into multiple threads, and then bring the results back together. A good example might be a merge join. It can retrieve data from two different pipelines concurrently on different threads, and then feed them to another thread performing the merge of the two different sets. I don’t know that this is exactly what SQL Server is doing. But, the concept represents breaking up a big task into multiple smaller, concurrent tasks.

Let’s consider the problem mathematically. How many cores are available for your SQL Engine? If you have 4 cores, and your query is broken up into more than 4 parallel tasks, SQL Server will have to do context switching, a very expensive process, in order to maintain the separate threads.. 

The problem exacerbates. How often does SQL Server perform more than a single concurrent query? So, if two or more queries are able to be divided up into multiple threads, they all compete for the limited number of cores. After some point, the cost of context switching is more expensive than the benefit achieved by parallel threads.

Since you can only set the max degrees of parallelism to a single value, I have found that the only way to get the optimum setting is to test under peak load. You need to track query performance, change the setting, and then do the same test again. 

I compare the setting in my mind to that of using the Parallel library in dot net. You can specify, in a parallel foreach loop, the number of parallel threads to allow in a process. On my laptop, with 4 cores, if I set the value greater than 4, the performance is worse. The higher the number, the slower it is to perform all of the work. Sometime, depending on the number of concurrent threads in the application already, it is better to set it to 2 or less. The reason, once again, is that there are only so many concurrent threads available at one time. Requesting more than that in your application causes it to slow down everything due to context switching.

So, unless you have a massive number of cores, or rarely run concurrent queries, I can’t imagine a value greater than 5 being useful. Remember, multi-threading doesn’t necessarily speed things up. It just makes it look like multiple things are running at the same time. You only experience performance boost if you have more available processors than you have concurrent threads.

I would like to note that my thoughts here are from logical reasoning, and personal experience. I have had a time where I set the max parallel configuration to 2 on a production machine with 24 cores. We had instrumented query performance on the machine so that we could track execution times. As a result, we could change the configuration, and observer the trend change for query performance. Over a period of 5 to 10 minutes of heavy usage, we could easily see a difference in the configuration setitngs.

While a higher degree of parallism resulted in a much lower MIN execution time over the period, the MEDIAN and AVERGE times were much higher. We were more concerned with having a consistent performance, than peaks and valleys. So, the lower setting was preferable. I cannot telly you the technical reasons for the difference. However, I find the pragmatic approach is often more valuable than a deep understanding of the internals in this case.