How to do it: RSS aggregation (merging multiple XML files using XSLT)


In my previous article I presented a small test case I built to compare the performance of various data merging approaches. An XSLT-based method turned out to be the best-performing way of merging multiple XML files.

Combining multiple XML files into one is not difficult, but some parts of the process might not be obvious at first glance. Below you will find a step-by-step guide to creating your own XML data merging solution.

Let’s start with an empty XSLT file:

Empty XSLT document
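As a starting point, a minimal sketch of such an empty stylesheet could look as follows. It assumes XSLT 1.0 and XML output; the single template is only a placeholder for the steps that follow.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Entry point: matches the root of whatever document the transformation is run against -->
  <xsl:template match="/">
    <!-- The merging logic goes here -->
  </xsl:template>
</xsl:stylesheet>
```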

The first thing to do is to read the XML documents you want to aggregate. In this example I’m using very simplified XML files which reflect the structure of an RSS feed. To read the XML files I’m using the XSLT document() function, which takes the path to the XML file as a parameter. The path can be absolute or relative, or it can point directly to an RSS feed located somewhere on the Internet. From the documents I read, I store only the item nodes (/rss/channel/item) in a variable. That allows me to process the nodes further.

Importing XML documents into XSL
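A minimal sketch of this step, assuming two local files with the placeholder names feed1.xml and feed2.xml (they are not the files from the original test). The items of both documents are combined with the XPath union operator and stored in the Aggregation variable; the items wrapper element is only there to make the result visible.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- feed1.xml and feed2.xml are placeholder names. Relative paths are
       resolved against the location of this stylesheet. -->
  <xsl:variable name="Aggregation"
      select="document('feed1.xml')/rss/channel/item
            | document('feed2.xml')/rss/channel/item"/>

  <xsl:template match="/">
    <!-- Temporary wrapper element, only to inspect what was read -->
    <items>
      <xsl:copy-of select="$Aggregation"/>
    </items>
  </xsl:template>
</xsl:stylesheet>
```

Because the feeds are loaded with document(), the XML document the transformation is run against is not used at all; any well-formed XML file will do as the main input.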

After the nodes have been stored in the Aggregation variable, it’s time to sort them. The most common scenario is to sort all items in descending order by the publishing date stored in the pubDate node. The date uses the following format: ddd, dd MMM yyyy hh:mm:ss GMT. You can sort nodes in XSL using the xsl:sort element. Unfortunately, it can only sort data as numbers or as text. Because the publishing date, as included in RSS, cannot be sorted properly using either of the two types, you have to split the sorting into a couple of steps.

Sorting retrieved articles descending on the publishing date. Sorting on months remains an issue as months are stored as strings
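The numeric part of the sorting might be sketched as below. The substring offsets assume that every pubDate is fixed-width, for example "Mon, 02 Jan 2006 15:04:05 GMT", and the file names are placeholders. The month is deliberately left out here, so items from different months of the same year are not yet ordered correctly.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Placeholder file names -->
  <xsl:variable name="Aggregation"
      select="document('feed1.xml')/rss/channel/item
            | document('feed2.xml')/rss/channel/item"/>

  <!-- Sorted copy of all items; each xsl:sort is one sort key, most significant first -->
  <xsl:variable name="RSS">
    <xsl:for-each select="$Aggregation">
      <!-- year: characters 13 to 16 -->
      <xsl:sort select="substring(pubDate, 13, 4)" data-type="number" order="descending"/>
      <!-- day: characters 6 to 7 -->
      <xsl:sort select="substring(pubDate, 6, 2)" data-type="number" order="descending"/>
      <!-- hour, minute, second -->
      <xsl:sort select="substring(pubDate, 18, 2)" data-type="number" order="descending"/>
      <xsl:sort select="substring(pubDate, 21, 2)" data-type="number" order="descending"/>
      <xsl:sort select="substring(pubDate, 24, 2)" data-type="number" order="descending"/>
      <xsl:copy-of select="."/>
    </xsl:for-each>
  </xsl:variable>

  <xsl:template match="/">
    <!-- Temporary wrapper element, only to inspect the order -->
    <items>
      <xsl:copy-of select="$RSS"/>
    </items>
  </xsl:template>
</xsl:stylesheet>
```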

To do the sorting you have to obtain all the parts of the date (year, month, day, hour, minute and second) separately. All of these values except the month are numbers, so you can sort the nodes using the standard xsl:sort element with data-type set to number. The only challenge is sorting on the month, as it’s stored as text. To do this, I have created a new variable which contains all possible month names separated with semicolons. By comparing the position of the searched month name within that string, you can actually sort on months!

Sorting retrieved articles correctly on months
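One way to express the month lookup in XSLT 1.0 is to measure how much of the month list precedes the month name: the later the month, the longer the preceding text. The sketch below assumes this approach; the variable name Months, the file names and the fixed-width date offsets are assumptions, not taken from the original listing.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Placeholder file names -->
  <xsl:variable name="Aggregation"
      select="document('feed1.xml')/rss/channel/item
            | document('feed2.xml')/rss/channel/item"/>

  <!-- All month abbreviations in calendar order (hypothetical variable name) -->
  <xsl:variable name="Months"
      select="'Jan;Feb;Mar;Apr;May;Jun;Jul;Aug;Sep;Oct;Nov;Dec'"/>

  <xsl:variable name="RSS">
    <xsl:for-each select="$Aggregation">
      <!-- year -->
      <xsl:sort select="substring(pubDate, 13, 4)" data-type="number" order="descending"/>
      <!-- month: length of the text before the month name (0 for Jan, 4 for Feb, ...) -->
      <xsl:sort select="string-length(substring-before($Months, substring(pubDate, 9, 3)))"
                data-type="number" order="descending"/>
      <!-- day, hour, minute, second -->
      <xsl:sort select="substring(pubDate, 6, 2)" data-type="number" order="descending"/>
      <xsl:sort select="substring(pubDate, 18, 2)" data-type="number" order="descending"/>
      <xsl:sort select="substring(pubDate, 21, 2)" data-type="number" order="descending"/>
      <xsl:sort select="substring(pubDate, 24, 2)" data-type="number" order="descending"/>
      <xsl:copy-of select="."/>
    </xsl:for-each>
  </xsl:variable>

  <xsl:template match="/">
    <!-- Temporary wrapper element, only to inspect the order -->
    <items>
      <xsl:copy-of select="$RSS"/>
    </items>
  </xsl:template>
</xsl:stylesheet>
```

Note that an unrecognised month name also yields 0 with this key, so such items would sort together with January.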

The RSS variable does nothing besides sorting the nodes in descending order by publishing date, so all you need to do inside it is copy each complete node using the xsl:copy-of element.

The last thing to do is to build the output XML. I have chosen to build RSS-like XML so that the output can be reused. For brevity I have skipped some information required by an RSS feed, like title, description, etc.

The RSS variable contains a set of item nodes. The first thing you need to do is to create a document element for the output XML document (rss element). Getting the nodes to the output is as easy as copying them into the output.

Producing the XML RSS-like output
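Putting the pieces together, the output step might be sketched like this. The channel wrapper inside rss and the version attribute are assumptions made to keep the result RSS-like; the file names, the Months variable name and the date offsets are assumptions as well.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Placeholder file names -->
  <xsl:variable name="Aggregation"
      select="document('feed1.xml')/rss/channel/item
            | document('feed2.xml')/rss/channel/item"/>

  <!-- Hypothetical variable name -->
  <xsl:variable name="Months"
      select="'Jan;Feb;Mar;Apr;May;Jun;Jul;Aug;Sep;Oct;Nov;Dec'"/>

  <!-- Items sorted newest first; assumes fixed-width pubDate values -->
  <xsl:variable name="RSS">
    <xsl:for-each select="$Aggregation">
      <xsl:sort select="substring(pubDate, 13, 4)" data-type="number" order="descending"/>
      <xsl:sort select="string-length(substring-before($Months, substring(pubDate, 9, 3)))"
                data-type="number" order="descending"/>
      <xsl:sort select="substring(pubDate, 6, 2)" data-type="number" order="descending"/>
      <xsl:sort select="substring(pubDate, 18, 2)" data-type="number" order="descending"/>
      <xsl:sort select="substring(pubDate, 21, 2)" data-type="number" order="descending"/>
      <xsl:sort select="substring(pubDate, 24, 2)" data-type="number" order="descending"/>
      <xsl:copy-of select="."/>
    </xsl:for-each>
  </xsl:variable>

  <xsl:template match="/">
    <!-- Document element of the output, followed by the sorted items -->
    <rss version="2.0">
      <channel>
        <xsl:copy-of select="$RSS"/>
      </channel>
    </rss>
  </xsl:template>
</xsl:stylesheet>
```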

In many scenarios you might need to limit the display to the n latest items. Because you are aggregating multiple feeds, the raw output is very likely to be far too long. To provide this functionality I have added another variable (FeedsNumber) to the XSLT, which stores the number of articles you want to display. Furthermore, I have extended the template responsible for creating the XML output with a for-each loop. To pick the first n items, I’m using the position() XSL function, which returns the position of the current node in the RSS variable.

To keep the possibility to display all articles, I have provided a check: if the number of articles is equal to 0, the XSLT will display the complete result of the aggregation.

Selecting the n top articles
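A possible sketch of the limit is shown below. It differs from the description above in one respect: the position() test is applied inside the sorted loop over Aggregation rather than in a loop over the RSS variable, because looping over a variable built from copied nodes in XSLT 1.0 typically needs a node-set() extension function (for example the one from EXSLT). Inside a sorted xsl:for-each, position() already reflects the sorted order. The value 5, the file names, the Months variable name and the channel wrapper are assumptions.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Number of items to output; 0 means all of them (example value) -->
  <xsl:variable name="FeedsNumber" select="5"/>

  <!-- Placeholder file names -->
  <xsl:variable name="Aggregation"
      select="document('feed1.xml')/rss/channel/item
            | document('feed2.xml')/rss/channel/item"/>

  <!-- Hypothetical variable name -->
  <xsl:variable name="Months"
      select="'Jan;Feb;Mar;Apr;May;Jun;Jul;Aug;Sep;Oct;Nov;Dec'"/>

  <xsl:template match="/">
    <rss version="2.0">
      <channel>
        <xsl:for-each select="$Aggregation">
          <xsl:sort select="substring(pubDate, 13, 4)" data-type="number" order="descending"/>
          <xsl:sort select="string-length(substring-before($Months, substring(pubDate, 9, 3)))"
                    data-type="number" order="descending"/>
          <xsl:sort select="substring(pubDate, 6, 2)" data-type="number" order="descending"/>
          <xsl:sort select="substring(pubDate, 18, 2)" data-type="number" order="descending"/>
          <xsl:sort select="substring(pubDate, 21, 2)" data-type="number" order="descending"/>
          <xsl:sort select="substring(pubDate, 24, 2)" data-type="number" order="descending"/>
          <!-- position() is the rank of the item after sorting -->
          <xsl:if test="$FeedsNumber = 0 or position() &lt;= $FeedsNumber">
            <xsl:copy-of select="."/>
          </xsl:if>
        </xsl:for-each>
      </channel>
    </rss>
  </xsl:template>
</xsl:stylesheet>
```

Declaring FeedsNumber with xsl:param instead of xsl:variable would allow the calling application to pass the limit in, if your XSLT processor supports stylesheet parameters.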

That’s all. By now everything should be working and you should get a result similar to the one below.

Results of the XSL transformation

The first column contains the XSLT file responsible for the transformation. In the next column you see the result of the data merging. The last two columns contain the two XML files used as input.
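For reference, an input file of the simplified shape assumed throughout this article might look like the sketch below. The titles and dates are invented for illustration and are not the data from the original test files.

```xml
<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
  <channel>
    <item>
      <title>First article</title>
      <pubDate>Mon, 02 Jan 2006 15:04:05 GMT</pubDate>
    </item>
    <item>
      <title>Second article</title>
      <pubDate>Tue, 14 Feb 2006 09:30:00 GMT</pubDate>
    </item>
  </channel>
</rss>
```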

To spare you the typing, I have also posted the source files I used for testing. You can download them to run some tests or to extend them further to fit your requirements.

Summary

There are multiple ways of merging multiple XML documents. Many of them require custom code which can be difficult to maintain, scale poorly and perform poorly. Using XSLT for this purpose gives you a great overview of the merging process and allows you to design the merging in an easy and intuitive way. What’s even more important: in the tests from my previous article, the XSLT-based merging approach performed a couple of times faster than the custom-code-based methods.
