Performance of content aggregation queries on multiple lists


Recently I got the task of creating a custom content aggregation Web Part which would roll up content from two lists. As this can be done in many different ways, I’ve decided to do a little research on which of these methods would perform the best.

The case

As I’ve already mentioned, the task was to aggregate some content from two lists (in the future it might be even more). One of the lists would contain general information while the other, stored within a topic site (child SPWeb), would contain information about that single topic. As in some cases the general information can be relevant to a particular topic, they should appear in the content roll up called by that particular topic.

The Site Collection where the content aggregation would run on is a medium size Publishing Site with a couple hundreds Publishing Pages and Documents.

Content Aggregation approaches

If you want to aggregate content from multiple lists in SharePoint, there are numerous ways to do this. Depending on your background you might either immediately go for separate aggregation of each list and then merging the retrieved DataTables or creating one query for the whole process. Here are the approaches I have tested:

  • Two CrossListQueryInfo queries with DataTable merge
  • One CrossListQueryInfo query on specific lists
  • One CrossListQueryInfo query on Pages Libraries
  • Two SPSiteDataQuery queries with DataTable merge
  • One SPSiteDataQuery query on specific lists
  • One SPSiteDataQuery query on Pages Libraries
  • Two SPQuery queries with DataTable merge

One approach I haven’t tested was using the SharePoint search engine. Because it wasn’t supported at the particular configuration available at the customer I haven’t considered it. Depending on your environment you might want to have an additional look at it as well.

Both CrossListQueryInfo and SPSiteDataQuery classes allow you to either retrieve data from one particular site or run one query over the whole Site Collection. Additionally, using the Lists property, you can specify either a List Template, List Type or specific lists which should be queried (lists targeting).

Personally I like the idea of lists targeting. Depending on how the particular query class works internally, querying items from specific lists only instead of all Page Libraries should narrow the initial data set and therefore improve the performance of the query. On the other hand you need to obtain the ID’s of these lists first which costs some time as well. To check what the difference between these two approaches are I have tested them both as well.

The Test

To test the various approaches I have created a simple Console Application. Because the CrossListQueryInfo query requires the context information, I have provided it using the following code:

#region Context
using (SPSite site = new SPSite("http://moss"))
{
    using (SPWeb web = site.RootWeb)
    {
        // provide context information
        if (HttpContext.Current == null)
        {
            HttpRequest request =
        new HttpRequest("", "http://moss", "");
            HttpContext.Current =
        new HttpContext(request,
          new HttpResponse(new StringWriter()));
            HttpContext.Current.Items["HttpHandlerSPWeb"] = web;
        }
    }
} 
#endregion

I have run each query 30 times. Each query calling method contained all the logic required to perform the query, eg. retrieving references to ID’s, opening SPWeb instances, etc.

I have run different approaches separately, so the test application would have to load all the assemblies and create the required references.

The results

After running the test I got the following results:

Test results: the first run differs too much from the following results, so that it is impossible to read the values

As you can see the initial run costs huge amount of time to run comparing to the subsequent calls. To examine the data I have removed the first call from the chart.

Test results excluding the first run

Having the first run removed gives us the possibility to have a closer look at the results of the subsequent queries. The first thing to notice is that the CrossListQueryInfo approach is the best performing one. The Content Query Web Part shipped with MOSS 2007 uses this approach. No matter whether you’re querying all Pages Libraries or only specific ones the results are similar. While the CrossListQueryInfo performs really great, running two (or more) separate queries and merging the results it’s a bad idea. It decreases the performance almost 7 times!

Another frequently used aggregation method is the SPSiteDataQuery class. In some cases you cannot avoid it, especially when you’re using WSS and it’s the only thing available to you for performing Site Collection wide queries. A piece of advice here: you can either run multiple queries separately or create one to query content from specific lists. The results are similar (except a few peaks). The one thing not to do is to query all Page Libraries within the Site Collection. It’s going to cost you performance: it’s three times slower than the two approaches I have just mentioned.

Running multiple queries based on the SPQuery class came out of my test as the least performing aggregation method. It provides you little flexibility so the only way is to get the references to all the lists, run the query and merge the retrieved DataTables.

To provide you some more overview of the results, I have compared the average duration time of all calls and excluding the first one.

Average results

On the chart above you can clearly see the difference between the first and the subsequent calls. Generally the performance of your queries should be close to the red bar.

Summary

There are many situations which might require aggregating data across the whole Site Collection. MOSS 2007 provides you with several various approaches to deliver that functionality. Depending on the configuration of your environment you might have to choose for one of the other approach. However, if you’re free to choose, and the performance is really important, like in every WCM solution, you are very likely to choose for the CrossListQueryInfo method running one single query.

Technorati Tags: SharePoint, SharePoint 2007, MOSS 2007, WSS 3.0, WCM

Others found also helpful: