Performance of various data merging methods
Recently I was given the task of developing a Web Part that would aggregate the contents of a couple of RSS Feeds, sort the items in descending order of publication date and display the top n of them. While thinking about how to build it, I found that there are several ways to get the job done. Since the Web Part was going to run on an Internet site, I decided to take a closer look at the performance of the various methods.
RSS Feeds are just an example: you can apply all of the methods presented here to merge any kind of XML documents, as long as they share the same schema.
Data merging methods
While researching the possible methods, I identified three approaches to merging the data:
- Retrieving the XML documents (RSS Feeds) separately, reading them into DataTables and then iterating through the rows to copy them all into a single DataTable.
- Retrieving the XML documents separately, reading them into DataTables and then merging the DataTables into one using the DataTable.Merge method.
- Using a pure XSLT-based approach.
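The first approach boils down to collecting every item into one table, sorting it and taking the top n. DataTable is a .NET type, so the sketch below is only a rough Java analogue of that idea (Java 16+), not the code from the Web Part: it parses each feed with the JDK's DOM parser, copies the items one by one into a single list, sorts them newest first and keeps the top n. The names RowByRowMerge, Item and mergeTop are hypothetical, and it assumes RSS 2.0 feeds whose pubDate values are in RFC 822/1123 format.

```java
import java.io.IOException;
import java.io.StringReader;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Rough Java analogue of approach 1 (not the original .NET DataTable code):
// parse each feed separately, copy its items one by one into a single list,
// then sort newest first and keep the top n.
class RowByRowMerge {

    // One merged "row": the item's title and parsed publication date.
    record Item(String title, ZonedDateTime published) {}

    static List<Item> mergeTop(List<String> feedXmls, int n) {
        List<Item> all = new ArrayList<>();
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            for (String xml : feedXmls) {
                Document doc = builder.parse(new InputSource(new StringReader(xml)));
                NodeList items = doc.getElementsByTagName("item");
                // Iterate through the items and add them one at a time.
                for (int i = 0; i < items.getLength(); i++) {
                    Element item = (Element) items.item(i);
                    // Assumes RSS 2.0 pubDate values in RFC 822/1123 format.
                    ZonedDateTime published = ZonedDateTime.parse(
                            childText(item, "pubDate"), DateTimeFormatter.RFC_1123_DATE_TIME);
                    all.add(new Item(childText(item, "title"), published));
                }
            }
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse feed", e);
        }
        // Sort descending on the publication date and keep the n newest items.
        all.sort(Comparator.comparing(Item::published).reversed());
        return new ArrayList<>(all.subList(0, Math.min(n, all.size())));
    }

    private static String childText(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String feed = "<rss><channel><item><title>Hello</title>"
                + "<pubDate>Tue, 3 Jun 2008 11:05:30 GMT</pubDate></item></channel></rss>";
        for (Item item : mergeTop(List.of(feed), 5)) {
            System.out.println(item.published() + " " + item.title());
        }
    }
}
```

Every item is touched individually on the way into the list, which is why the cost of this style of merge grows with the number of feeds and items involved.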
I worked quite a lot with the first two approaches at the beginning of my SharePoint adventure. I knew they were suitable for the job; the challenging part was making XSLT take care of the whole process with very little custom code.
To test the performance of the various data merging methods, I prepared a few test runs. Because some of the methods iterate through the rows, I wanted to check whether performance depends on the number of RSS Feeds to merge and the number of articles to display. I defined the following tests:
- Merging two feeds, displaying the 5 latest articles
- Merging two feeds, displaying the 20 latest articles
- Merging four feeds, displaying the 5 latest articles
- Merging four feeds, displaying the 20 latest articles
In a real-life scenario you would download the RSS Feeds from the Internet. Because download speed and duration vary, however, this could make the test results inconclusive. To eliminate this factor, I used local copies of the RSS Feeds in the tests.
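A timing harness for test runs like these only needs to keep setup out of the measurement and average several runs per configuration. The Java sketch below is an assumption of how that could look, not the original test code: averageMillis is a hypothetical helper that runs the task once untimed as a warm-up and then averages the wall-clock time of the timed runs, and the merge under test is a placeholder Runnable that would read the local feed copies.

```java
// Minimal timing harness (a sketch, not the original test code).
class MergeBenchmark {

    // Runs the task once untimed (warm-up, one-off setup, JIT compilation),
    // then returns the average wall-clock time of `runs` timed executions in ms.
    static double averageMillis(Runnable task, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        task.run(); // warm-up run, excluded from the result
        long totalNanos = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / 1_000_000.0 / runs;
    }

    public static void main(String[] args) {
        int[] feedCounts = {2, 4};
        int[] articleCounts = {5, 20};
        for (int feeds : feedCounts) {
            for (int articles : articleCounts) {
                // Hypothetical placeholder: plug in the merge method under test here,
                // reading `feeds` local feed copies and keeping `articles` items.
                Runnable merge = () -> { };
                System.out.printf("%d feeds, %d articles: %.3f ms%n",
                        feeds, articles, averageMillis(merge, 100));
            }
        }
    }
}
```

Reading the feeds from disk (or from memory) rather than over the network keeps the timed section limited to parsing, merging and sorting.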
The image below presents the average results of the various data merging methods.
The first thing you notice is that the XSLT-based approach is much faster than the methods using DataTables. So if you need a well-performing data merging solution, it is worth checking whether you can benefit from XSLT.
Another thing that becomes clear is that, when you're using XSLT, performance depends little on the number of Feeds to parse and articles to display. The performance of the DataTable-based methods, by contrast, depends heavily on the number of items you need to parse and display. While that might be acceptable when you're working with only a few items, processing multiple Feeds and displaying more than a couple of items might have a negative effect on performance.
In the test I used the XslCompiledTransform class. The values presented above exclude the initial instantiation of the XslCompiledTransform object: in a real-life scenario you would create a static instance of that object and reuse it for subsequent calls, so including its creation in the results would give an unfairly negative picture of how that approach really performs.
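In Java, the closest counterpart of a static XslCompiledTransform is a javax.xml.transform.Templates object: it is compiled once, is thread-safe, and hands out a lightweight Transformer per call. The sketch below (Java 15+) combines that pattern with one possible pure-XSLT merge; it is not the stylesheet from the original Web Part. The input is a small, hypothetical feeds document listing absolute feed URIs, the XSLT 1.0 document() function loads them all, and xsl:sort orders the items by year, month, day and time cut out of pubDate. XSLT 1.0 has no date type, so this assumes every pubDate has the fixed-width form "Ddd, DD Mon YYYY HH:MM:SS GMT". XsltFeedMerger and mergeTop are hypothetical names.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.net.URI;
import java.util.List;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Sketch of a pure-XSLT merge with a stylesheet compiled once and reused.
class XsltFeedMerger {

    // Assumes every pubDate has the fixed-width form "Ddd, DD Mon YYYY HH:MM:SS GMT".
    private static final String XSLT = """
        <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:output method="xml" indent="yes"/>
          <xsl:variable name="top" select="number(/feeds/@top)"/>
          <xsl:variable name="months" select="'JanFebMarAprMayJunJulAugSepOctNovDec'"/>
          <xsl:template match="/">
            <channel>
              <!-- document() loads every listed feed; their items form one node-set -->
              <xsl:for-each select="document(/feeds/feed/@href)/rss/channel/item">
                <xsl:sort select="substring(pubDate, 13, 4)" data-type="number" order="descending"/>
                <xsl:sort select="string-length(substring-before($months, substring(pubDate, 9, 3)))"
                          data-type="number" order="descending"/>
                <xsl:sort select="substring(pubDate, 6, 2)" data-type="number" order="descending"/>
                <xsl:sort select="substring(pubDate, 18, 8)" order="descending"/>
                <!-- position() follows the sorted order, so this keeps the newest $top items -->
                <xsl:if test="position() &lt;= $top">
                  <xsl:copy-of select="."/>
                </xsl:if>
              </xsl:for-each>
            </channel>
          </xsl:template>
        </xsl:stylesheet>
        """;

    // Compiled once per process and shared by all callers (Templates is thread-safe).
    private static final Templates TEMPLATES = compile();

    private static Templates compile() {
        try {
            return TransformerFactory.newInstance()
                    .newTemplates(new StreamSource(new StringReader(XSLT)));
        } catch (TransformerConfigurationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Merges the feeds at the given absolute URIs and returns the newest `top` items.
    static String mergeTop(List<URI> feeds, int top) {
        StringBuilder list = new StringBuilder("<feeds top=\"").append(top).append("\">");
        for (URI feed : feeds) {
            list.append("<feed href=\"").append(escape(feed.toString())).append("\"/>");
        }
        list.append("</feeds>");
        try {
            // Transformer is not thread-safe, so each call gets its own cheap instance.
            Transformer transformer = TEMPLATES.newTransformer();
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(list.toString())),
                    new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT merge failed", e);
        }
    }

    // Minimal escaping for use inside a double-quoted XML attribute.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("\"", "&quot;").replace("<", "&lt;");
    }
}
```

Apart from building the feed list, all of the merging, sorting and trimming is done by the XSLT processor, and the expensive stylesheet compilation happens only once, when the class is first used.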
While you might expect merging the DataTables with the DataTable.Merge method to be faster than enumerating the rows and adding them one by one, there is some extra work required to make the Merge method do the job. First, you have to get rid of the primary keys and constraints, which, depending on the schema of the XML file, might require you to access other tables in the DataSet.
What many people use is not always the best. While creating the XSLT for data aggregation, I had to search the Internet for answers to a couple of questions. Surprisingly, quite a few other people were looking for similar answers, but eventually they would either fall back on the familiar, trusted DataTable-based approaches or use very complicated XSL transforms.
XSLT is a great way to merge data from different sources, and you benefit from its full power in particular when both the input and the output are XML. If you need to provide a data merging solution, considering XSLT might improve its overall performance.