Editorials

Entity Framework Using Include

Have you ever run a trace on Entity Framework query execution? It’s really interesting on the SQL Server side, when you see the SQL that Entity Framework designs.

I have an application that causes a Stack Overflow if I try and use lazy loading with my Entity Framework classes. This is due to a recursive (hierarchical) design in the data, where the table self joins for parents or children (think of the directories of your file system). So, I have to turn off lazy loading, and include manually, all of the child objects I wish to retrieve from the database, based on the foreign key relationships.

So, if I have a sales object, consisting of a collection of line items, each of which has a product, I have to write a statement such as:

var mySalesOrder = context.Set()
.Include(t=> t.SaleslineItem.Product)
.Where (f=> f.SalesOrderId == salesOrderId)
.FirstOrDefault();

Because I have included the SalesLineItem and the Product, Entity Framework creates a very interesting query. The short story is that it creates a single table result. It defines the table with all of the columns from all of the sources in the Include operation. All the column names are simple a sequence from COL01 to COLn depending on what is being returned.

Let’s say we have the following simple objects for an example:

  • SalesOrder (SalesOrderId, CustomerId, SalesDate)
  • SalesLineItem (SalesLineItemid, SalesOrderId, ProductId, Qty, ExtendedPrice)
  • Product (ProdictId, ProductName, Price)

When Entity Framework returns all of these objects it creates a virtual table such as:

Col01, Col02, Col03, Col04, Col05, Col06, Col07, Col08, Col09, Col10, Col11, Col12

Col01 thru Col03 are populated with data from SalesOrder.

Col04 thru Col09 are populated with data from SalesLineItem.

Col10 thru Col13 are populated with data from Product.

I’m not getting the exactly perfect, but it gives you the basic idea. If you had one SalesLineItem in a SalesOrder, you would return a table with three rows. The first row will populate the columns defined by the SalesOrder. The second row will have the columns populated for the SalesLineItem data. The third row will have the columns populated for the Product data. For each child item, there is a column created that is populated with the foreign key to the parent table.

What is nice about this implementation is that it allows a single database query to return data in any shape or form, based on the tables included on the EF query. What is not so nice is that the more tables you add, the wider this virtual table becomes, and the complexity of the query to shape the data to this virtual design increases. At some point, the performance of your query diminishes.

Tomorrow we’ll take a look at how this compares with other ORM techniques.

Cheers,

Ben