NHibernate – removing duplicates combined with paging
The criteria API for NHibernate gives us a really easy way to perform paged queries. Combine this with .Net extension methods and you can get something quite neat. I use the following extension method to page my criteria queries:
public static ICriteria Page(this ICriteria criteria, int pageIndex, int pageSize) { return criteria.SetFirstResult(pageIndex*pageSize) .SetMaxResults(pageSize); }
This means that I can write queries like this:
_session.CreateCriteria(typeof(IResource))
.Page(pageIndex, pageSize)
.List<IResource>());
Which is nice. However, when paging a result set you may encounter a problem in that some queries will generate duplicates. In the example above I am returning “resources”. Let’s assume that resources can belong to one or more “groups”. If I want to restrict my query to resources that belong to groups with some characteristic then I am potentially going to get duplicate results. If I don’t mind duplicates then I have no problem, but if I want to remove them then I have to be careful.
The easiest way to remove duplicates in NHibernate is to use a DistinctRootEntityTransformer like this:
_session.CreateCriteria(typeof(IResource)) .CreateCriteria("Groups") .SetResultTransformer(new DistinctRootEntityResultTransformer()) .List<IResource>();
The problem is that this removes duplicates in the code after the query has been executed. This is problematic if you are combining it with paging because the query will return you a certain number of records (a page full) and then the transformer will potentially remove some of those records and spoil your paging. So if you want to take advantage of the database for paging rather than doing the paging in memory (not always an option for large datasets) then you need a different strategy.
Obviously it is necessary to do the “de-duping” as part of the query. One way to do this is to use the DetachedCriteria capability in NHibernate. The idea is to create a DetachedCriteria which returns a projection of just the distinct ids of the entities that you want to retrieve (in this case resources). All of your restrictions are applied to the DetachedCriteria:
var resIds = DetachedCriteria.For<Resource>()
.Add(Restrictions.EqProperty("id", "res.id"))
.CreateCriteria("Groups")
.SetProjection(Projections.Distinct(Projections.Id()));
So this gets us the list of Ids we want without any duplicates. But how to translate that into a list of domain objects with no duplicates, paged correctly.
_session.CreateCriteria(typeof(IResource), "res") .Page(pageIndex, pageSize) .Add(Subqueries.PropertyIn("res.id", resIds)) .List<IResource>());
Using a correlated subquery we can restrict the query to just the distinct Ids returned by our DetachedCriteria and perform the paging all as part of the same query. Nice. The resultant sql looks something like this:
SELECT TOP 10 this_.resourceid AS resourceid13_0_, this_.name AS name13_0_, this_.firstname AS firstname13_0_ FROM rm_resource this_ WHERE EXISTS (SELECT DISTINCT this_0_.resourceid AS y0_ FROM rm_resource this_0_ INNER JOIN rm_resourcegroup resourcegr1_ ON this_0_.resourceid = resourcegr1_.resourceid WHERE this_0_.resourceid > @p0 AND this_0_.resourceid = this_.resourceid)
I’ve removed some of the bits form the select clause just to save space but you get the idea.
3 comments