Editorials

Thoughts on Entity Framework (EF)

I’ve been using Entity Framework for a few years now. Having some production applications based on EF I have found some of the following guidelines to make it work for me.

EF is fine for Many to Many relationships. In doing so, you create your data models with having virtual collections of the other object. Then when you map the opposite object, it handles the relationship. You don’t have to create a physical link object representing the link table. EF does this virtually for you.

However, if you need to have additional data in the link table, the easiest approach is to actually create a link object. Then each of the outer tables have a list of the Link object, which has an instance of the other table in the Many to Many relationship. In this case, both tables on either side have an instance of the object representing the link table. Since you now have a link object, you can add additional properties that are only relevant in the joining of two separate tables. This works really nicely.

You can optimize EF by not using the Include method when retrieving data. Include results in HORRIBLE SQL statements and isn’t really any faster than making multiple round trips to populate details.

Let’s take a simple example. You want to get a list of purchase orders for a customer. For each purchase order it will have a collection of purchase order line items. How can you do this efficiently through EF? Currently I have been using a filter on the purchase order to only return a list of purchase orders for customerID = somevariable. If I use .Include, then the results right out of the EF system is a list of purchase order objects, each with its own collection of purchase order line items. So, what’s wrong with that? Well, nothing really. The only problem is when you have more objects you want returned such as Customer, Product, Pricing, etc. Then using .Include for each one of those continues to a point where the application begins to crawl. So, one word of advice…if you have to include more than one or two other objects, consider using a different method to populate your lists.

One of the coolest things about EF is that if you build it correctly, your filter data becomes a part of the where clause for your actual executed SQL. So, in the case of purchase orders, you could simply get the list of purchase orders. Then you could have a query that returns purchase order line items by PurchaseOrderId = somevariable. This query would actually run on SQL and not return a bunch of unmapped data. You could call this purchase order line items within a Linq expression walking through the purchase orders. From the database perspective, SQL will see a query to get a list of purchase orders for a customer. Then for each record returned, it will have a query to get the line items. This is a lot less load on SQL than the implementation executed using .Include.

I mentioned writing a query with a lot of .Include statements. That isn’t the only criteria. You can have horrible performance just using a single .Include if your application (for a good or bad reason) returns a lot of records.

What I have come to understand is that EF is not always the best or only solution. I have also found that if you take the time to learn how EF works and what it does, you can change the techniques for which you use EF to optimize performance.

That’s enough thoughts for today.

Cheers,

Ben