Remove HTML markup in the Content Query Web Part


How many times have you tried to generate a short preview out of a Rich HTML field using the Content Query Web Part (CQWP)? How many times did you end up making your customers use the Description field instead, just because the CQWP doesn't provide an out-of-the-box mechanism for removing HTML markup? Guess what: it does.

Removing the HTML in the real world

What would you do if someone asked you to remove all HTML from some string in your custom application? Depending on your skills, you would either write an overweight IndexOf/Replace method or use a simple regular expression (RegEx) such as <[a-zA-Z/][^>]*>. Requiring a letter or / right after the < (instead of the common <[^>]+>, which would also match "< 3 and 4 >" in "2 < 3 and 4 > 3") keeps a plain < in the text from being treated as the start of a tag, and using * rather than + still matches one-letter tags such as <b> or <p>. Either way, a solution is just around the corner. But what if someone asked you to do the same with the output of the Content Query Web Part?

As we all know, the CQWP uses XSLT in its presentation layer. And while XSLT itself is very extensible, the CQWP ships with only a minimal implementation: XSLT 1.0 extended with the ddwrt namespace. The downside is that neither XSLT 1.0 nor the ddwrt namespace provides a string replace function (XPath's translate() only maps single characters to other characters).

Does that mean you have to subclass the Content Query Web Part and extend its XSLT just to do something as simple as removing all HTML markup from a string? You could, but there is an easier way…

Introducing XSLT templates

You can think of XSLT templates as functions. You will find many of them in the out-of-the-box XSLT files shipped with the Content Query Web Part, such as ItemStyle.xsl and ContentQueryMain.xsl. You can apply templates to particular nodes using XPath expressions, but you can also call them by name and pass them parameters, and a template can even call itself. Knowing that, you can create a template that removes all HTML markup from any string:

<xsl:template name="Imtech.RemoveHtml">
  <xsl:param name="String"/>
  <xsl:choose>
    <xsl:when test="contains($String, '&lt;')">
      <xsl:value-of select="substring-before($String, '&lt;')"/>
      <xsl:call-template name="Imtech.RemoveHtml">
        <xsl:with-param name="String"
          select="substring-after($String, '&gt;')"/>
      </xsl:call-template>
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="$String"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

…and then call it like this:

<xsl:call-template name="Imtech.RemoveHtml">
  <xsl:with-param name="String"
    select="'Some content with HTML'"/>
</xsl:call-template>
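Since the goal stated at the start is a short preview, one way to use the template in an item style is to capture its output in a variable, collapse the leftover whitespace and cut it to a fixed length. The fragment below is a sketch for ItemStyle.xsl under a few assumptions: the item style name Imtech.Preview is hypothetical, the rich HTML field is assumed to reach the item style as @PublishingPageContent (for example by adding it to the web part's CommonViewFields property), the 200-character limit is arbitrary, and the Imtech.RemoveHtml template is assumed to live in the same file.

```xml
<!-- Hypothetical item style for ItemStyle.xsl; assumes Imtech.RemoveHtml
     is defined in the same file -->
<xsl:template name="Imtech.Preview" match="Row[@Style='Imtech.Preview']"
  mode="itemstyle">
  <!-- Assumption: the rich HTML field arrives as @PublishingPageContent -->
  <xsl:variable name="PlainText">
    <xsl:call-template name="Imtech.RemoveHtml">
      <xsl:with-param name="String" select="@PublishingPageContent"/>
    </xsl:call-template>
  </xsl:variable>
  <!-- Collapse the whitespace left behind by the removed tags -->
  <xsl:variable name="Normalized" select="normalize-space($PlainText)"/>
  <p>
    <!-- Arbitrary 200-character limit for the preview -->
    <xsl:value-of select="substring($Normalized, 1, 200)"/>
    <!-- Add an ellipsis only when the text was actually truncated -->
    <xsl:if test="string-length($Normalized) &gt; 200">
      <xsl:text>...</xsl:text>
    </xsl:if>
  </p>
</xsl:template>
```

The variable holds a result tree fragment, which XSLT 1.0 converts to a string when it is passed to normalize-space, so no extension functions are needed. Note that substring may cut the last word in half.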

How does it work?

The template first checks whether String contains a < (the test on the xsl:when element; inside an XML attribute the character has to be written as &lt;). If it doesn't, the string contains no HTML markup and the xsl:otherwise branch copies it to the output as-is.

If there is a <, however, the template does two things. First, it writes everything before the first < to the output (the substring-before call). Second, it calls itself recursively, passing as the String parameter everything after the first > (written as &gt; in the stylesheet; see the xsl:with-param element). The recursion ends as soon as the remaining string contains no <.
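To see the recursion in action, the following self-contained XSLT 1.0 stylesheet runs the same template on an inline sample string (made up for illustration), so it works with any input document, for example with xsltproc. The comments trace each recursive call.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Entry point: ignores the input document and strips a sample string.
       Trace for "<p>Hello <b>world</b></p>":
       call 1 writes ""       and recurses on "Hello <b>world</b></p>"
       call 2 writes "Hello " and recurses on "world</b></p>"
       call 3 writes "world"  and recurses on "</p>"
       call 4 writes ""       and recurses on ""
       call 5 finds no "<" and writes "" (end of recursion)
       Result: "Hello world" -->
  <xsl:template match="/">
    <xsl:call-template name="Imtech.RemoveHtml">
      <xsl:with-param name="String"
        select="'&lt;p&gt;Hello &lt;b&gt;world&lt;/b&gt;&lt;/p&gt;'"/>
    </xsl:call-template>
  </xsl:template>

  <!-- The Imtech.RemoveHtml template shown above, unchanged -->
  <xsl:template name="Imtech.RemoveHtml">
    <xsl:param name="String"/>
    <xsl:choose>
      <xsl:when test="contains($String, '&lt;')">
        <xsl:value-of select="substring-before($String, '&lt;')"/>
        <xsl:call-template name="Imtech.RemoveHtml">
          <xsl:with-param name="String"
            select="substring-after($String, '&gt;')"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$String"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```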

While the above template is simple and sufficient for most situations, it unfortunately isn't foolproof. Imagine a string like "2 < 3 and 4 > 3". Passing it through the template results in "2 3", because the template treats "< 3 and 4 >" as a tag, which is not really what you meant. If you're looking for a more powerful and flexible mechanism, you will end up with a subclassed Content Query Web Part and a custom XSLT function, at least until we find out how to make the above template smarter.
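One possible refinement, sketched below as a self-contained XSLT 1.0 stylesheet, borrows the idea behind the stricter regular expression mentioned earlier: a < only counts as the start of a tag when it is directly followed by a letter or a /, and some > occurs after it. It also searches for the closing > after the <, not from the start of the string. The original template repeats part of the text when a > comes before the first < (given "4 > 3 <b>x</b>", the "3" appears twice) and drops everything after a < that has no matching >. The template name Imtech.RemoveHtmlStrict and the sample string are hypothetical. This is still a heuristic rather than an HTML parser: a < directly followed by a letter in plain text, or a > inside an attribute value, will still trip it up, and HTML comments are kept as text because ! is not a letter.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Sample run: expected output is "2 < 3 and 4 > 3" -->
  <xsl:template match="/">
    <xsl:call-template name="Imtech.RemoveHtmlStrict">
      <xsl:with-param name="String"
        select="'2 &lt; 3 and &lt;b&gt;4&lt;/b&gt; &gt; 3'"/>
    </xsl:call-template>
  </xsl:template>

  <!-- Hypothetical stricter variant of Imtech.RemoveHtml -->
  <xsl:template name="Imtech.RemoveHtmlStrict">
    <xsl:param name="String"/>
    <xsl:choose>
      <xsl:when test="contains($String, '&lt;')">
        <xsl:variable name="After" select="substring-after($String, '&lt;')"/>
        <!-- Text before the first < is never markup -->
        <xsl:value-of select="substring-before($String, '&lt;')"/>
        <xsl:choose>
          <!-- Treat it as a tag only when a letter or / follows the <
               and a > occurs somewhere after it -->
          <xsl:when test="translate(substring($After, 1, 1),
              'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ/', '') = ''
              and contains($After, '&gt;')">
            <!-- Skip up to the first > that follows the < -->
            <xsl:call-template name="Imtech.RemoveHtmlStrict">
              <xsl:with-param name="String"
                select="substring-after($After, '&gt;')"/>
            </xsl:call-template>
          </xsl:when>
          <xsl:otherwise>
            <!-- A plain < such as in "2 < 3": keep it and continue -->
            <xsl:text>&lt;</xsl:text>
            <xsl:call-template name="Imtech.RemoveHtmlStrict">
              <xsl:with-param name="String" select="$After"/>
            </xsl:call-template>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$String"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

Every recursive call receives a strictly shorter string, so the recursion always terminates. With the html output method used in item styles, the kept < is written as &lt;, which is what the page needs anyway.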

