Semantic URLs in MOSS 2007 (Imtech SharePoint Semantic URL's free Feature)


Semantic URLs are one of the things I have always wanted in Microsoft Office SharePoint Server 2007. Ever since I first opened an Internet-facing site built on SharePoint 2007, I knew it would be great if the Pages part of the URL could be removed, along with the .aspx extension.

I believe that everyone working on Web Content Management (WCM) solutions would agree that semantic URLs are very important on today’s Web. First of all, they are much more user friendly: visitors can memorize a URL when it reflects the information behind it. Compare for example http://mycompany.com/Division/Services with http://www.mycompany.com/Division/Pages/Services.aspx. The second one is obviously much harder to remember for someone who doesn’t know SharePoint.

Another reason why semantic URLs are important is Search Engine Optimization (SEO). I like to think of it as a set of best practices for getting the most out of your site and letting search engines rank its content as high as possible. At the end of the day it’s all about finding and being found, right? The most popular and most important search engines take the URL into account: the shorter and more meaningful the URL is, the higher the page will end up in the query results.

Getting back to MOSS 2007… SharePoint 2007 is built on ASP.NET 2.0. Theoretically, configuring a URL rewriting engine should get you there, right? Well, not quite. It will take you only halfway. For URLs to be picked up by the URL rewriting engine, they have to match a certain pattern. So you either need an extra level in the URL (for example http://mycompany.com/p/Division/Services, where you would pick up everything after /p/ and remap it to the real page) or you need to define an extension that tells the rewriting engine to remap the request. The second solution is used for example by Rapid for SharePoint, which uses the .htmx and .htmc extensions.
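To make the pattern requirement concrete, here is a minimal sketch of the first approach in Python (chosen only for brevity; a real rule would live in an ASP.NET rewriting engine). The rule, the function name and the assumption that the last segment is always the page name are all hypothetical and are not taken from any existing product.

```python
import re

# Hypothetical rule: everything after /p/ is read as "<site path>/<page name>".
PREFIX_RULE = re.compile(r"^/p/(?P<site>(?:[^/]+/)*)(?P<page>[^/]+)$")


def rewrite_prefixed(path):
    """Return the physical page path for a /p/ URL, or None if the rule does not match."""
    match = PREFIX_RULE.match(path)
    if match is None:
        return None  # no /p/ marker, so the engine has nothing to anchor on
    return "/{0}Pages/{1}.aspx".format(match.group("site"), match.group("page"))


print(rewrite_prefixed("/p/Division/Services"))  # /Division/Pages/Services.aspx
print(rewrite_prefixed("/Division/Services"))    # None
```

The second call shows the problem: without the extra /p/ level (or a dedicated extension) the rule has no way of recognizing the URL as one that should be rewritten.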

In my opinion neither of the above is the right solution. In the first case you create an extra level in the hierarchy. Just like sticking /Pages/ between the hierarchy and the page, putting something else meaningless in the URL will definitely not improve either the user friendliness or the SEO of your site. The second approach gets you only halfway: while the URLs are more semantic, the extension is still there.

The Challenge

URL rewriting in SharePoint 2007 is quite challenging in my opinion. There are a few different types of URLs you must consider.

First of all there are the Publishing Pages. These are the URLs you want to rewrite, because they are used by search engines and are exposed to visitors.

Then there are all the resources like CSS, JavaScript, images and so on. You don’t need to rewrite these URLs.

Then there are a couple of SharePoint resources which you don’t want to rewrite, for example Application Pages (.aspx pages which reside in the _layouts directory), View Forms (.aspx pages used with lists which reside in the /Forms/ folder of a list) and SharePoint Web Services, to name a few.

As we want to use extensionless URLs, another challenge appears: how to distinguish a Publishing Page from a Publishing Site or a list? Probably the only way would be to take the URL and try to open an instance of SPWeb from it. If that succeeds it’s a Site, otherwise it should be a Publishing Page. Can you imagine the performance penalty of such an approach?

Introducing the Imtech SharePoint Semantic URL’s Feature

As I’ve already mentioned, I’ve been thinking about solving this challenge for quite some time now. Just recently I succeeded in creating a working proof of concept.

The main functionality of the Imtech SharePoint Semantic URL’s Feature is to allow you to use semantic URLs on your WCM sites. With it, you can refer to your pages using for example http://mycompany.com/Division/About instead of http://mycompany.com/Division/Pages/About.aspx.

The Feature doesn’t use a redirect, so the pages will get indexed by the search engine.

There is a small performance penalty for using semantic URLs in SharePoint. There is no magic to it: I have to do the check as well, but in the Feature I have tried to postpone it as long as possible and first run a few very lightweight checks to decide whether the URL should be remapped or not.

First of all I check whether the last part of the URL contains a dot. If so, the URL won’t be remapped. So for example http://mycompany.com/Division/Images/image.jpg won’t be remapped, while http://mycompany.com/Division/About will point to http://mycompany.com/Division/Pages/About.aspx. This is actually the only limitation, and it seemed to me like a fair price to pay for being able to use semantic URLs in SharePoint 2007.
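The dot check is cheap because it only looks at the string. A minimal Python sketch of the idea follows; the function name is hypothetical and this is not the Feature's actual code. Note that only the path is inspected, so the dots in the host name do not count.

```python
from urllib.parse import urlsplit


def last_segment_has_dot(url):
    """True when the last path segment contains a dot, i.e. the URL is left alone."""
    path = urlsplit(url).path.rstrip("/")
    last_segment = path.rsplit("/", 1)[-1]
    return "." in last_segment


print(last_segment_has_dot("http://mycompany.com/Division/Images/image.jpg"))  # True
print(last_segment_has_dot("http://mycompany.com/Division/About"))             # False
```

The limitation mentioned above follows directly from this test: a page whose name contains a dot would be skipped and never remapped.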

The next thing I do is check whether the URL refers to one of the SharePoint system paths, like _layouts, Forms, Lists, _vti_bin, etc. These URLs are not rewritten either.
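A system path check of this kind can be sketched as a simple set lookup. In the Python sketch below the set contains only the four names mentioned above; the complete list used by the Feature is not given here, and the case-insensitive comparison is my assumption.

```python
# Only the names mentioned in the text; the real list is presumably longer.
SYSTEM_SEGMENTS = frozenset(["_layouts", "forms", "lists", "_vti_bin"])


def is_system_path(path):
    """True when any segment of the path is a known SharePoint system folder."""
    segments = [segment.lower() for segment in path.split("/") if segment]
    return any(segment in SYSTEM_SEGMENTS for segment in segments)


print(is_system_path("/Division/Lists/Tasks"))  # True
print(is_system_path("/Division/About"))        # False
```

One consequence of matching on segment names is that a page or site literally named Forms or Lists would also be treated as a system path in this sketch.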

Finally, the last thing is to check whether the URL is a valid Site URL or whether it points to a Publishing Page and needs to be remapped. This check is the biggest bottleneck of the Feature. Because WCM solutions are all about performance and great User Experience (UX), I have developed some extra functionality to reduce the penalty of the remapping.
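The decision itself can be sketched as follows. In this Python illustration the expensive lookup (trying to open the Site behind the URL) is replaced by a hypothetical set of known site paths, so it shows only the logic and not the cost; the function and parameter names are assumptions.

```python
def remap(path, known_site_paths):
    """Leave Site URLs untouched; map anything else to <parent>/Pages/<name>.aspx.

    known_site_paths is a hypothetical stand-in (lower-cased paths) for the
    expensive "does a Site exist at this URL?" lookup.
    """
    normalized = path.rstrip("/")
    if normalized == "" or normalized.lower() in known_site_paths:
        return path  # root or an existing Site: nothing to remap
    parent, _, page = normalized.rpartition("/")
    return "{0}/Pages/{1}.aspx".format(parent, page)


sites = {"/division"}
print(remap("/Division", sites))        # /Division
print(remap("/Division/About", sites))  # /Division/Pages/About.aspx
```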

Introducing the Imtech SharePoint Semantic URL’s Crawler

In order to improve the overall performance of the remapping I have decided to keep track of the rewritten URLs. Each time a URL is processed by the Imtech SharePoint Semantic URL’s Feature, the result is stored in a dictionary. The second time the URL is requested, the Feature already knows what to do with it and the URL doesn’t have to be processed again.
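The dictionary idea is plain memoization. Below is a minimal Python sketch, assuming a hypothetical resolver function that performs the expensive check; the class name, the lower-cased key and the miss counter are illustrative only. A real module serving concurrent requests would also need to guard the dictionary against simultaneous writes, which is left out here.

```python
class UrlMap:
    """Remembers the outcome for every URL so each one is resolved only once."""

    def __init__(self, resolver):
        self._resolver = resolver  # hypothetical: the expensive remapping logic
        self._map = {}
        self.misses = 0            # how often the expensive path was taken

    def resolve(self, path):
        key = path.lower()         # assumption: URLs are compared case-insensitively
        if key not in self._map:
            self.misses += 1
            self._map[key] = self._resolver(path)
        return self._map[key]


url_map = UrlMap(lambda path: path + ".aspx")  # toy resolver for the demo
url_map.resolve("/Division/About")
url_map.resolve("/Division/About")
print(url_map.misses)  # 1
```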

Another thing I have included in the Feature is a crawler for creating the semantic URLs. Basically it enumerates all sites and pages within the current Site Collection and creates a semantic URL for each one of them. These URLs are stored in the cache so that they can be used by the remapping module. In practice you would want to run the crawler each time you create a new Site in order to avoid the extra penalty.
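Conceptually the crawler is a recursive walk that fills the URL map up front. The Python sketch below assumes a hypothetical nested dictionary standing in for the Site Collection (a real crawler would enumerate the sites and Pages libraries through the SharePoint object model) and lower-cased keys for the map.

```python
def crawl(site_path, site, url_map):
    """Walk a site tree and record a semantic URL for every site and page."""
    # A Site maps to itself; the root is stored as "/".
    url_map[site_path.lower() or "/"] = site_path or "/"
    for page in site.get("pages", []):
        semantic = "{0}/{1}".format(site_path, page)
        url_map[semantic.lower()] = "{0}/Pages/{1}.aspx".format(site_path, page)
    for name, child in site.get("sites", {}).items():
        crawl("{0}/{1}".format(site_path, name), child, url_map)
    return url_map


# Hypothetical Site Collection: a root with one page and one subsite.
tree = {"pages": ["Home"], "sites": {"Division": {"pages": ["About"], "sites": {}}}}
for semantic, physical in sorted(crawl("", tree, {}).items()):
    print(semantic, "->", physical)
```

After such a run every known site and page already has an entry, so the first request for it does not have to pay for the expensive check.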

[Image: Imtech SharePoint Semantic URL's crawler]

And what about all the other URLs which are not being crawled? These URLs are stored as well, so eventually, as long as the Application Pool is alive, each URL will be processed only once and then stored in the URL map to avoid any further performance penalty.

The Imtech SharePoint Semantic URL’s Feature is fully functional. At this moment, however, there are a number of things you have to take care of yourself. While the Feature allows you to use semantic URLs to request SharePoint content, you have to make sure yourself that you’re actually using such links. This requires not only adjusting any existing links within the content of the web site but also slightly modifying the navigation. Eventually the Imtech SharePoint Semantic URL’s Feature will contain all the components required to render semantic URLs in the HTML.

The Imtech SharePoint Semantic URL’s Feature is free and you can use it without any restrictions. As I’m sharing it with you at an early stage, it would be great if you let me know that you’re using it and what your experience is.

Download: Imtech SharePoint Semantic URL’s v1.0.0.0 (11KB)

