Death to the DAO and How to Test LINQ

Occasionally I hear complaints that LINQ is hard to unit test. These complaints aren’t about LINQ to objects, mind you, they’re specific to the complexities of the flavors of LINQ that turn C# code into something else like SQL or CAML using expression trees. The most common technologies are LINQ to SQL, the Entity Framework, or in my case at the moment LINQ to SharePoint. In this post I’m going to propose a technique that makes testing LINQ not just easy, but downright elegant – assuming you’re ok with extension methods – lots of extension methods. And assuming you’re ready to kill your Data Access Objects (DAO) tier.

The Unit Testing Problem

Any architecture needs a place to put code that finds entities. For instance FindBySocialSecurityNumber(). In a traditional architecture we might put a method like this is in a DAO layer. If so our method will look something like this:

public class EmployeesDao {
    public Employee FindBySSN(Context ctx, string ssn) {
        return ctx.Employees.SingleOrDefault(e => e.Ssn == ssn);
    }
}

So how would we go about unit testing this?

One fairly typical solution would be to use an in-memory database. That approach works if our data store is a database, but it certainly doesn’t work if the data store is something less traditional like SharePoint. But even if our store is a database, we’ll still have the hassle of setting up the in-memory database.

Another solution might be to use a mock Context that returns an IQueryable. But wouldn’t it be wonderful if we could avoid mocking all together?

Killing the DAO

The first question is why we even have a DAO tier to begin with. The original idea was that we wanted a place to put code specific to a particular data store. In other words we wanted to isolate the code that will need to be changed should the data store switch from SQL Server to Oracle. But isn’t that exactly what LINQ does? I’d be pretty surprised if there wasn’t a decent LINQ provider for just about any data store at this point that required more than minimal code changes. So why not embrace LINQ and reconsider alternatives to a DAO tier?

One alternative that I’ve been using for over a month now is to switch to extension methods. To give credit where it’s due the idea originated with a conversation with fellow Near Infinity employee Joe Ferner. And I'm sure the idea isn't particularly original (please post in the comments if you know others that use this approach).

Using this technique our code changes from something like this:

var employeeDao = new EmployeesDao(); // or use IOC of course
employeeDao.FindBySSN(ctx, "111-11-1111");

To something like this:

ctx.Employees.FindBySSN("111-11-1111");

Among other things I find this far more aesthetically pleasing because each of the three elements to the statement represent a subsequent filtering of data. It's a more functional way of looking at things.

We could implement this off of the Employees property of the context if we have control over that (which I don't with spmetal). But if we implement this as an extension method like this:

public static class EmployeeExtensions {
    public static Employee FindBySSN(this IQueryable<Employee> employees, string ssn) {
        return employees.SingleOrDefault(e => e.Ssn == ssn);
    }
}

We now have something that’s considerable easier to unit test.

Testing It

Once we’ve refactored our function as an extension method that filters down the corpus of entities, we can test the code using in-memory objects with a call to .AsQueryable(). For instance:

public void FindBySSN_OneSsnExists_EmployeeReturned() {
       var employees = new [] { new Employee { Ssn = "111-11-1111" } };
       var actual = employees.AsQueryable().FindBySSN("111-11-1111");
       Assert.IsNotNull(actual);
}

Notice we didn’t have to mock anything.

Testability, but at What Cost?

This technique works great for the example above, but how does it scale to harder problems and what other downsides are there?

As far as scalability I’ve found this technique works great for every scenario I’ve run across in the month I’ve been doing it. It works for joins, aggregations, and even for inserts, update, and deletes.

As far as downsides the astute reader may be wondering about mockability. For instance what if we want to mock the call to FindBySSN and give it the exact Employee that will be returned. This scenario is admittedly harder. But what I've found is that far more often than not I don’t really need to mock the types of things that used to live in the DAO tier. Instead I just mock the Employee object off of context to return in-memory objects and make my tests slightly larger in scope. Most of the time I find the larger scope increases the usefulness of the test. In the occasional case where I do really want to mock the "DAO" tier I use a technique described in either this post or this post by Daniel Cazzulino.

Conclusion

Obviously there is more to this architecture, for instance how do you handle insert and update operations? The short answer is it’s easy, but I’ll save that topic for a future post. For now why not give this approach a try? You weren’t really happy with that useless old DAO tier anyway, were you? I say we eradicate it and never look back.

Comments

Dmytrii said…
Two main problems here from top of my head:
1. Impossible to preload related objects and optimise query with query options and fetch plans.
2. In many non-trivial cases LINQ extensions with the fake queryable collection (LINQ to Objects) will work, but when running application will fail because the context is different and is not supported/implemented (EF). This draws such tests useless and harmful.

This is not different from having a bunch of static methods in a large bin (which often becomes rubbish one). Anything gets thrown into it.

Ragards,
Dmitriy.
Boyan Mihailov said…
You'd better use Repository pattern. I don't like DAO objects. When using a repostiroy, you have an interface which you can easily mock.
Lee Richardson said…
@Dmitriy

1. You can handle this the same way you handle inserts and updates (InsertOnDelete etc) which is, when necessary, to encapsulate the "Employees" item off of context and have a Fake for your tests that ignores the preload instructions and implements the inserts, updates, and deletes for in-memory objects. I was going to write about this in a future post, but that's the gist of it.

2. I'd like to see an example of what doesn't work. This approach is working well for me, but I don't use the EF.

3. Yes failing to organize your extension methods would be a terrible idea. Fortunately I group mine by Entity.
Anonymous said…
Your link to Daniel Cazzulino's post is broken. Would you mind updating it? I'd love to read it to get more context for your comparison.
Lee Richardson said…
@Anonymous Thanks, fixed the link. Actually I found a second post by him which includes more detail so I linked to that too.
Erlis Vidal said…
Hi Lee,

You asked Dmitriy for an example that doesn't work. I have one for you.

If using EF with MySql (this was what I used) you call .SingleOrDefault() you will have an error, you must use FirstOrDefault()

This will work beautiful in your tests because Linq to Object will simply accept .SingleOrDefault and you will only notice this on production. That's why some integration tests are needed also.

--Erlis
Lee Richardson said…
Erlis Vidal,

Thanks, that's a great example and now I see what Dmitriy meant. I guess that's an issue specific to the EF. LINQ to SharePoint fails over to LINQ to Objects if it can't convert something to CAML, so performance may be bad, but you'd never have a problem where your unit tests work, but your production code wouldn't.

Regardless you're absolutely right this technique is not a replacement for integraiton tests.

- Lee
dave.dolan said…
This isn't really a huge change though. I mean you still have to implement the static class for the extensions, and that's almost like any old repository... Also, the objects that come directly from the hopper are bound to the change events and the context so you have to stomp on them accordingly if you want to do a lot of 'normal' things. This is not to mention the lack of serializability of the stuff you get from most LINQ to X (again because of the context dependancy.) The easy way around all of that is to .. encapsulate them inside DAOs that produce POCOs, and translate them back again. Even using the new extensions to EF that allow POCO integration, you're really not 'removing a layer' like you'd think, you're just moving everything into the same project file. Potato - Pototo. Different yes, somewhat shorter in simple cases, sure. Huge revolutionary new feature and less code? No, not really.