Editorials

What is a Correlated Sub Select?

What is a correlated Sub Select? Historically, when it comes to performance, it was a dirty word. Let me give you an example as it will more easily explain what a correlated sub select is.

SELECT

Customer.Id

,Customer.Name

,Customer.Region

-- Here Comes the Sub Select

,(SELECT SUM(TotalAmount)

FROM Sales

WHERE Sales.CustomerId = Customer.Id

) AS TotalSales

-- Back to the outer select

FROM Customer

The inner query in parentheses is what we call a sub select. It is run separate from the outer query. What makes it a correlated sub select is the clause

WHERE Sales.CustomerId = Customer.Id

Because the WHERE clause of the sub select includes data from both select statements, it correlates the two queries together. Without this where statement, the inner query will be run once, getting the total of all Sales. Then the outer query will be run once, and the total for all sales will be added to the result set of each outer row.

Because of the correlation, the outer query is executed, and then the inner query is executed once per row. This mean that if the outer query returns 100 rows, then the inner query is executed 100 times, once per outer row. Of course this is a more meaningful result, so it is more likely the query we will want to run.

You may say to yourself, what if I create scalar function returning the total sales for a customer? Then I’ll simply call the function once per row. While there are times when you want to make that kind of a function, in this case, it doesn’t work any differently than using the correlated sub select. Both implementations work exactly the same.

The preferred way to return this result is to join both tables, and then use a SUM aggregate with a GROUP BY clause. It could look like this:

SELECT

Customer.Id

,Customer.Name

,Customer.Region

,SUM(Sales.TotalAmount)

FROM Customer

JOIN Sales ON Customer.Id = Sales.CustomerId

GROUP BY

Customer.Id

,Customer.Name

,Customer.Region

Perhaps this is old school. They keep optimizing the SQL Engines to adapt to our habits. I prefer the later syntax in that I find the intention to be more easily read. The choice is up to you. If you’re getting the performance you want then go nuts.

Cheers,

Ben