Performance of content aggregation queries on multiple lists

, , , ,

Recently I got the task of creating a custom content aggregation Web Part which would roll up content from two lists. As this can be done in many different ways, I've decided to do a little research on which of these methods would perform the best.

The case

As I've already mentioned, the task was to aggregate some content from two lists (in the future it might be even more). One of the lists would contain general information while the other, stored within a topic site (child SPWeb), would contain information about that single topic. As in some cases the general information can be relevant to a particular topic, they should appear in the content roll up called by that particular topic.

The Site Collection where the content aggregation would run on is a medium size Publishing Site with a couple hundreds Publishing Pages and Documents.

Content Aggregation approaches

If you want to aggregate content from multiple lists in SharePoint, there are numerous ways to do this. Depending on your background you might either immediately go for separate aggregation of each list and then merging the retrieved DataTables or creating one query for the whole process. Here are the approaches I have tested:

  • Two CrossListQueryInfo queries with DataTable merge
  • One CrossListQueryInfo query on specific lists
  • One CrossListQueryInfo query on Pages Libraries
  • Two SPSiteDataQuery queries with DataTable merge
  • One SPSiteDataQuery query on specific lists
  • One SPSiteDataQuery query on Pages Libraries
  • Two SPQuery queries with DataTable merge

One approach I haven't tested was using the SharePoint search engine. Because it wasn't supported at the particular configuration available at the customer I haven't considered it. Depending on your environment you might want to have an additional look at it as well.

Both CrossListQueryInfo and SPSiteDataQuery classes allow you to either retrieve data from one particular site or run one query over the whole Site Collection. Additionally, using the Lists property, you can specify either a List Template, List Type or specific lists which should be queried (lists targeting).

Personally I like the idea of lists targeting. Depending on how the particular query class works internally, querying items from specific lists only instead of all Page Libraries should narrow the initial data set and therefore improve the performance of the query. On the other hand you need to obtain the ID's of these lists first which costs some time as well. To check what the difference between these two approaches are I have tested them both as well.

The Test

To test the various approaches I have created a simple Console Application. Because the CrossListQueryInfo query requires the context information, I have provided it using the following code:

#region Context
using (SPSite site = new SPSite("http://moss"))
{
    using (SPWeb web = site.RootWeb)
    {
        // provide context information
        if (HttpContext.Current == null)
        {
            HttpRequest request =
        new HttpRequest("", "http://moss", "");
            HttpContext.Current =
        new HttpContext(request,
          new HttpResponse(new StringWriter()));
            HttpContext.Current.Items["HttpHandlerSPWeb"] = web;
        }
    }
}
#endregion

I have run each query 30 times. Each query calling method contained all the logic required to perform the query, eg. retrieving references to ID's, opening SPWeb instances, etc.

I have run different approaches separately, so the test application would have to load all the assemblies and create the required references.

The results

After running the test I got the following results:

Test results: the first run differs too much from the following results, so that it is impossible to read the values

As you can see the initial run costs huge amount of time to run comparing to the subsequent calls. To examine the data I have removed the first call from the chart.

Test results excluding the first run

Having the first run removed gives us the possibility to have a closer look at the results of the subsequent queries. The first thing to notice is that the CrossListQueryInfo approach is the best performing one. The Content Query Web Part shipped with MOSS 2007 uses this approach. No matter whether you're querying all Pages Libraries or only specific ones the results are similar. While the CrossListQueryInfo performs really great, running two (or more) separate queries and merging the results it's a bad idea. It decreases the performance almost 7 times!

Another frequently used aggregation method is the SPSiteDataQuery class. In some cases you cannot avoid it, especially when you're using WSS and it's the only thing available to you for performing Site Collection wide queries. A piece of advice here: you can either run multiple queries separately or create one to query content from specific lists. The results are similar (except a few peaks). The one thing not to do is to query all Page Libraries within the Site Collection. It's going to cost you performance: it's three times slower than the two approaches I have just mentioned.

Running multiple queries based on the SPQuery class came out of my test as the least performing aggregation method. It provides you little flexibility so the only way is to get the references to all the lists, run the query and merge the retrieved DataTables.

To provide you some more overview of the results, I have compared the average duration time of all calls and excluding the first one.

Average results

On the chart above you can clearly see the difference between the first and the subsequent calls. Generally the performance of your queries should be close to the red bar.

Summary

There are many situations which might require aggregating data across the whole Site Collection. MOSS 2007 provides you with several various approaches to deliver that functionality. Depending on the configuration of your environment you might have to choose for one of the other approach. However, if you're free to choose, and the performance is really important, like in every WCM solution, you are very likely to choose for the CrossListQueryInfo method running one single query.

Possibly related posts

7 Responses to “Performance of content aggregation queries on multiple lists”

  1. Avoid One Thing » Blog Archive » Performance of content aggregation queries on multiple lists Says:

    [...] Performance of content aggregation queries on multiple lists A piece of advice here: you can either run multiple queries separately or create one to query content from specific lists. The results are similar (except a few peaks). The one thing not to do is to query all Page Libraries within the … [...]

  2. Amit Says:

    Hi Waldek,

    I'm working on extending the Content Query webpart to fetch the content from across the Site Collections. I've to show up the content from 3 different site collections (SiteCollA, SiteCollB & SiteCollC).

    Since the query remains the same for these 3 content sources, I build the query first in my code using CrossListQueryInfo object.

    CrossListQueryInfo queryInfo = new CrossListQueryInfo();
    and assign all the properties of CrossListQueryInfo.

    Now I pass this query to the three content sources, using separate CrossListQueryCache for each content source and merge the DataTable.

    for e.g.
    CrossListQueryCache queryCacheSiteCollA = new CrossListQueryCache(queryInfo);
    dtResults.Merge(queryCacheSiteCollA.GetSiteData(<SPSite object of SiteCollA>));

    CrossListQueryCache queryCacheSiteCollB = new CrossListQueryCache(queryInfo);
    dtResults.Merge(queryCacheSiteCollB.GetSiteData(<SPSite object of SiteCollB>));

    CrossListQueryCache queryCacheSiteCollC = new CrossListQueryCache(queryInfo);
    dtResults.Merge(queryCacheSiteCollC.GetSiteData(<SPSite object of SiteCollC>));

    I'm experiencing a strange behavior, If I set the UseCache property of CrossListQueryInfo as false, then I get the results immediately, of all three site collections.

    Whereas, if I set the UseCache property of CrossListQueryInfo as true, then

    On first page load, I got File Not Found Exception on webpage. After debugging, i found out that I got results of only first Site Collection i.e. SiteCollA, and got an exception at queryCacheSiteCollA.GetSiteData(<SPSite object of SiteCollA>), which says "Invalid URL".

    On second page load, I again got File Not Found Exception on webpage. After debugging, i found out that I get results of first Site Collection i.e. SiteCollA, and second Site Collection i.e. SiteCollB. I got an exception at queryCacheSiteCollC.GetSiteData(<SPSite object of SiteCollC>), which says "Invalid URL".

    On third page load, finally I got the correct result set on webpage.

    Have you experience anything like this, while doing custom aggregation using CrossListQueryInfo from across sites?
    & plz let me know What value did you use for the property UseCache of CrossListQueryInfo?

  3. Waldek Mastykarz Says:

    @Amit: so far I haven't used the CQWP to perform cross Site Collection queries. Personally if I had to do aggregate content from multiple Site Collections I would use SharePoint Search as it's better performing than running three queries and merging the DataTables with results.

  4. Roy Says:

    Hi Waldek,
    In some cases we need to retrieve data from multi-value fields (lookup/user), in such cases the SPSiteDataQuery will fail to retrieve the values.
    What would you say is the best approach to fetch this data?

    Thanks,
    Roy

  5. Waldek Mastykarz Says:

    @Roy: unfortunately, the SPSiteDataQuery doesn't support Multivalue Lookup Fields. Have you tried using search for that purpose?

  6. Shakti Srivastava Says:

    Hi,

    I also have the same issue…I want data to be fetched from all the site collections in a web application…Can you plz suggest any way for this…Apart from Search…
    I have tried a lot…BUt no luck….

    Also i have another problem…if you may help me…
    Problem:
    I am trying to filter a list using Data View Web part…and CHoice Filter web parts…BUT the problem is that Data View Web Part (being a consumer for choice filters) CANNOT accept more than one value from choice filter web parts…
    Is there any workaround or any alternative to this…
    I thought to do this using SPQuery approach BUT i face 2 challenges then:
    1. i have to show the result as a table BUT i want the look and feel like that of a Data View Web Part
    2. CAML queries do not support more than 2 AND operators…even if they do support then generating that query dynamically is a tough task…

    -Shakti

  7. Waldek Mastykarz Says:

    @Shakti: aggregating across Web Applications can be only done with Search. One more approach is to do aggregation per Site Collection and aggregate those, but it would perform very poorly.
    As for the other question, your scenario is pretty complex, and so is the solution. Unfortunately you cannot do it any easier than you already know.

Leave a Reply

Security Code:

WP Theme & Icons by N.Design Studio
Entries RSS Comments RSS
Copyright © 2007 - 2010 Waldek Mastykarz

Creative Commons License