Standard SharePoint Publishing Site accessed using a semantic URL: /Press-Releases/Press-Release

We all know Microsoft Office SharePoint Server (MOSS) 2007 Web Content Management (WCM) solutions for their Pages URLs. Purist web designers and developers dislike SharePoint not only because it injects something into URLs, but mostly because there is no way to change it. And while many people think that SharePoint and semantic URLs just don’t get along, it turns out that there is a solution, one that doesn’t involve a single line of custom code.

It’s almost history

This isn’t the first time I’ve thought about SharePoint and semantic URLs. Last year I accepted the challenge and came up with a solution: a custom HttpModule that allowed you to use semantic URLs in SharePoint WCM solutions. The logic inside the module rewrote the requested URL to a valid page in SharePoint.

While that module did the job, it had quite a few limitations. First of all, it used custom logic to determine which page or URL had been requested. The way I built the proof of concept, you had no access to that logic from outside the module: changing anything about the way URLs were rewritten meant diving into the code and redeploying the module. You could mitigate this by providing some kind of storage for the rewrite rules, but that would take quite some development.

Another serious drawback was that it was an HttpModule. As far as I know, using custom modules in your SharePoint solutions puts the Web Application into an unsupported state. That’s because HttpModules are registered in the ASP.NET request/response pipeline and can conflict with the SharePoint runtime. As HttpModules are easy to disable (a single line in web.config), the change is reversible. Still, wouldn’t it be better if we had something, let’s say, available out of the box?

Semantic URLs vNext – totally codeless

A couple of months ago Scott Guthrie’s team released the Microsoft Web Platform Installer, a set of components for configuring web servers and web development. The installer allows you to extend your server with quite a few tools and applications, including the URL Rewrite module (a stable 1.1 and a beta 2.0 are available at the moment). It turns out that with the URL Rewrite module you can turn the default /Press-Releases/Pages/Press-Release.aspx into a semantic /Press-Releases/Press-Release URL without a single line of code.

Semantic URLs with the IIS7 URL Rewrite module – before we start

In the following steps I’m assuming that you have already installed the URL Rewrite module. The installation is really straightforward: navigate to http://www.microsoft.com/web/ and select the apps you want to install on your server. So I thought I’d better get to the point and show you how to make SharePoint URLs semantic.

To make semantic URLs work in SharePoint you will have to take quite a few things into account. The URL Rewrite module uses Rules to determine whether a URL should be processed and, if so, how. Basically, all you have to do is create a set of Rules that covers all the different sorts of URLs that are part of your web site.

The rewriting Rules use Regular Expressions to define the URL pattern. If you really want to understand how it all works, you should consider spending some time on the basics of Regular Expressions.

Step 1: Rewrite /Press-Releases/Pages/Press-Release.aspx into /Press-Releases/Press-Release

Let’s start off with the basics. The first thing we’ll do is to rewrite a URL of a Publishing Page located under a subsite.

Let’s add a new Rule:

IIS7 URL Rewrite module Add Rule dialog

The way the Rules work in the URL Rewrite module is that you get a semantic URL and have to rewrite it so that it points to a valid SharePoint URL. That means that we will be rewriting /Press-Releases/Press-Release to /Press-Releases/Pages/Press-Release.aspx rather than the other way around.

Let’s create the first pattern:

^(.*?)([^/]+)$

This pattern separates the subsites from the page:

Testing the pattern returns three back references: 0 to the whole URL, 1 to the path (subsites) and 2 to the page URL

It is important to note that the URL Rewrite module receives URLs without the leading /, so a URL like /Press-Releases/Press-Release enters the module’s processing engine as Press-Releases/Press-Release.
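The back-references can also be checked outside IIS. The following is a minimal Python sketch, not part of the URL Rewrite module: the function name split_semantic_url is made up for illustration, and it assumes Python's re engine handles this particular pattern the same way the module's regex engine does.

```python
import re

# The pattern from Step 1: a lazy path prefix followed by the last segment.
PAGE_PATTERN = re.compile(r"^(.*?)([^/]+)$")


def split_semantic_url(url):
    """Return (whole, subsites, page) for a semantic URL, or None if no match.

    The leading slash is stripped first, mimicking how the URL Rewrite
    module hands the URL to its Rules.
    """
    match = PAGE_PATTERN.match(url.lstrip("/"))
    if match is None:
        return None
    return match.group(0), match.group(1), match.group(2)


whole, subsites, page = split_semantic_url("/Press-Releases/Press-Release")
print(whole)     # back-reference 0
print(subsites)  # back-reference 1
print(page)      # back-reference 2
```

A URL that ends with a slash, such as /Press-Releases/, does not match this pattern at all, which is why the Welcome Page needs its own Rule later on.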

Let’s finish the rule by providing the Rewrite URL and applying the changes:

Completed Rule including the Rewrite URL
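The Rewrite URL itself is only visible in the screenshot. As a rough sketch of what the finished Rule does, the Python below assumes the Rewrite URL is {R:1}Pages/{R:2}.aspx, using the module's {R:n} back-reference placeholders; the expand and rewrite helpers are hypothetical names used only for this illustration.

```python
import re
from typing import Optional

PAGE_PATTERN = re.compile(r"^(.*?)([^/]+)$")

# Assumption: the actual Rewrite URL is shown only in the screenshot.
REWRITE_TEMPLATE = "{R:1}Pages/{R:2}.aspx"


def expand(template, match):
    """Replace every {R:n} placeholder with capture group n (empty if unset)."""
    return re.sub(
        r"\{R:(\d+)\}",
        lambda placeholder: match.group(int(placeholder.group(1))) or "",
        template,
    )


def rewrite(path: str) -> Optional[str]:
    """Map a semantic path (no leading slash) to the assumed SharePoint path."""
    match = PAGE_PATTERN.match(path)
    return expand(REWRITE_TEMPLATE, match) if match else None


print(rewrite("Press-Releases/Press-Release"))
# Press-Releases/Pages/Press-Release.aspx
```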

and let’s have a look at the results so far:

The default SharePoint Publishing Site after applying the first rewriting Rule: the CSS and JavaScript files are missing

Although you can see the contents of the page, there are still some things that need to be configured. At this moment the page is missing all of its CSS and JavaScript files and images.

As I mentioned before, properly implementing a URL rewriting solution is all about recognizing all the different types of URLs and the patterns they use. The pattern we have just provided is too general: it rewrites not only the URLs of Publishing Pages but those of other resources as well. We need to add some more information to the Rule to make it work.

The conditions

If you look at the HTML source, you will find quite a few URLs like /Style%20Library/en-US/Core%20Styles/controls.css. When you test such a URL against the expression we’ve just entered, you’ll see that it passes the test: the Rule thinks it’s a semantic URL:

Results of testing a URL to a CSS file against the first expression: the URL is incorrectly recognized as a semantic URL

The above URL is an example of a rewriting exception. While working with URL rewriting you will not only have to recognize such patterns but also apply them as either Rules or exceptions, and put them in the right order to get things working correctly.

The URL Rewrite module allows you not only to provide a URL pattern but also to define conditions which exclude certain URLs from being processed by a specific Rule.

Let’s add our first condition, which will exclude all URLs that end with an extension:

^.*?[^/]+\.[^/]+$

Adding a condition to ignore all URLs which end with an extension
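The effect of the condition can be sketched in Python. This assumes the condition is configured so that the Rule applies only when the URL does not match the extension pattern; the function name is_semantic_page_url and the sample paths other than the CSS file are illustrative.

```python
import re

PAGE_PATTERN = re.compile(r"^(.*?)([^/]+)$")
# The condition pattern: the last segment contains a dot, i.e. an extension.
EXTENSION_PATTERN = re.compile(r"^.*?[^/]+\.[^/]+$")


def is_semantic_page_url(path):
    """True when the Rule should rewrite the path: it matches the Rule
    pattern and does NOT match the extension condition."""
    return (
        PAGE_PATTERN.match(path) is not None
        and EXTENSION_PATTERN.match(path) is None
    )


for path in (
    "Press-Releases/Press-Release",
    "Style%20Library/en-US/Core%20Styles/controls.css",
    "_layouts/settings.aspx",
):
    print(path, is_semantic_page_url(path))
```

Only the first path is treated as a semantic URL; the stylesheet and the application page keep their original URLs because their last segment carries an extension.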

and let’s have a look at the results:

Standard Publishing Site after applying the condition: the page looks as it’s supposed to

At this point the Publishing Page looks correct. Besides fixing the branding of the Publishing Page, the above condition automatically excludes URLs of application pages (_layouts/.., etc.) and list forms (Lists/…/… or Forms/…) from rewriting.

There are however still some scenarios that the first Rule doesn’t cover.

Step 2: The Welcome Page

If you type /Press-Releases/ in your browser, you’ll notice that the URL is picked up by SharePoint, which automatically redirects to the Welcome Page. That redirect is quite bad from the SEO perspective as it is a temporary (302) redirect. If a search engine finds such a URL it may not follow it, as the target might change in the near future, meaning everything underneath that URL may not get crawled.

We can fix the problem of the Welcome Page redirect using a simple rule:

^(.*/)?$

Adding the rule to fix redirecting to the Welcome Page

If you navigate to /Press-Releases/ now you should see the Welcome Page directly without being redirected to Pages/default.aspx:

Welcome Page accessed by using the parent site’s URL ending with a slash

The above Rule also covers redirecting from the Site Collection root:

Browsing the Root Welcome Page with a semantic URL
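To see what the Welcome Page Rule matches, here is a small Python sketch. The rewrite target is an assumption: the text only mentions Pages/default.aspx as the page SharePoint redirects to, so the sketch rewrites to {R:1}Pages/default.aspx. The function name rewrite_welcome is hypothetical.

```python
import re
from typing import Optional

# Matches the empty path (site collection root) or any path ending in "/".
WELCOME_PATTERN = re.compile(r"^(.*/)?$")


def rewrite_welcome(path: str) -> Optional[str]:
    """Rewrite a site URL to its assumed Welcome Page, or return None."""
    match = WELCOME_PATTERN.match(path)
    if match is None:
        return None
    # Group 1 is unset for the empty root path, so fall back to "".
    prefix = match.group(1) or ""
    return prefix + "Pages/default.aspx"


print(rewrite_welcome("Press-Releases/"))  # Press-Releases/Pages/default.aspx
print(rewrite_welcome(""))                 # Pages/default.aspx
```

A page URL such as Press-Releases/Press-Release does not end with a slash, so it falls through to the Rule from Step 1.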

Step 3: The links

So far we have added support for semantic URLs to our web site. We can access both sites and pages using semantic URLs, and the Rules we have defined rewrite them into valid SharePoint URLs. But what about all the links included on pages?

From the content point of view

As you might’ve expected, this is something you will have to handle yourself. Most of the links are included inside the content, meaning you can edit them to point to semantic URLs. As for the navigation, you would have to either create your own controls or create Control Adapters to rewrite the URLs rendered by the menu control.

Another approach you might consider is to create a Page Adapter that rewrites all URLs in the whole page once it’s rendered. The downside is that this approach requires running Regular Expressions on a large string (the HTML source of the complete page), and the performance penalty might be too large to accept. Once again, it all depends on your scenario and requirements.
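The regex step of such a page-wide rewrite could look roughly like the Python sketch below. It illustrates the idea only: a real Page Adapter would be .NET code, and the pattern, the function names and the decision to touch only double-quoted, slash-prefixed href values without a query string are all assumptions made for this example.

```python
import re

# href values that point at a Publishing Page: <prefix>Pages/<page>.aspx
HREF_PATTERN = re.compile(
    r'href="([^"?#]*/)Pages/([^/"?#]+)\.aspx"', re.IGNORECASE
)


def to_semantic(match):
    """Build the semantic href; Welcome Pages collapse to the site URL."""
    prefix, page = match.group(1), match.group(2)
    if page.lower() == "default":
        return 'href="{0}"'.format(prefix)
    return 'href="{0}{1}"'.format(prefix, page)


def rewrite_links(html):
    """Rewrite every matching href in the rendered HTML of a page."""
    return HREF_PATTERN.sub(to_semantic, html)


html = '<a href="/Press-Releases/Pages/Press-Release.aspx">Press Release</a>'
print(rewrite_links(html))
# <a href="/Press-Releases/Press-Release">Press Release</a>
```

Because the substitution scans the complete page on every request, this is exactly where the performance cost mentioned above would show up.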

From the SEO point of view

Making your URLs semantic is even more important from the SEO point of view. At this point in our solution, SharePoint accepts both semantic and out-of-the-box URLs, meaning a search bot can reach the same content through two different URLs. To improve the SEO of your web site, each page should be accessible through only one URL, which uniquely identifies it.

While rewriting the URLs for users might get quite challenging, making the URLs semantic for search engines is a bit easier and can be done using the URL Rewrite module.

The URL Rewrite module allows you to define not only Rules for rewriting URLs but also redirect Rules.

Let’s create a Rule that will redirect all requests for Publishing Pages that use the .aspx URLs to their semantic equivalents:

^(.*/)?Pages/([^/]+)\.aspx$

Adding the Rule to redirect all requests to Publishing Pages using the .aspx URLs to their semantic equivalents

There are a couple of things you should note here. First of all, while defining this Rule we have checked the Ignore Case checkbox. The Regular Expression we used to define the pattern contains the word Pages, which could also be spelled pages. Since IIS isn’t case-sensitive, we have to make the Regular Expression case-insensitive as well.

Another thing is the redirect itself. The Action properties group has two fields, Redirect type and Redirect URL, whose labels appear to be swapped in the dialog: the field labeled Redirect URL is the one that lets you pick the type of redirect. The Permanent (301) redirect is the one we’re interested in, as we want the search engine to index the semantic URL and not the .aspx one.

Let’s apply the changes and have a look at the results. Let’s navigate to our Press Release using the menu:

Navigating to the Press Release using the non-semantic link in the navigation redirects to the semantic URL

While it seems to work correctly, try to navigate to the Press Releases site using the menu:

Navigating to the Press Releases site using the non-semantic link in the navigation redirects to the semantic URL, but it ends with default

Have you noticed the default word at the end of the URL? That’s because our Rule doesn’t distinguish between Welcome Pages and regular Publishing Pages.
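The redirect Rule and the stray default can be reproduced with a short Python sketch. The Redirect URL is only shown in the screenshot, so the sketch assumes it is built from back-references 1 and 2; the function name redirect is hypothetical.

```python
import re

# Ignore Case is checked in the Rule, hence re.IGNORECASE here.
REDIRECT_PATTERN = re.compile(r"^(.*/)?Pages/([^/]+)\.aspx$", re.IGNORECASE)


def redirect(path):
    """Return (status, location) for a Publishing Page URL, or None."""
    match = REDIRECT_PATTERN.match(path)
    if match is None:
        return None
    prefix = match.group(1) or ""  # unset for pages in the root site
    return 301, "/" + prefix + match.group(2)


print(redirect("Press-Releases/Pages/Press-Release.aspx"))
# (301, '/Press-Releases/Press-Release')
print(redirect("Press-Releases/Pages/default.aspx"))
# (301, '/Press-Releases/default')
```

The second call shows the problem: a Welcome Page is just another .aspx file under Pages as far as this pattern is concerned.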

Step 4: The Welcome Page link

Let’s define a new Rule which will help us fix redirecting the links to Welcome Pages:

^(.*/)?Pages/default\.aspx$

Adding the Rule for redirecting Welcome Pages to their semantic equivalents

If you navigate to the Press Releases site right after applying the changes you won’t see any difference. Nothing changes because the previous Rule sits above the new one: the URL matches the earlier pattern first and is redirected before it ever reaches the Rule we’ve just defined.

We can fix this easily by moving the last Rule up:

Moving the Rule from Step 4 up

Let’s have a look at the Press Releases site once again:

Requesting a Welcome Page using a non-semantic link correctly redirects to the URL of the parent site with a trailing /

And that’s all: in a few easy steps we have turned standard SharePoint URLs into semantic, web-ready URLs. And we did all that without a single line of code.
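Why the order matters can be shown with a tiny first-match-wins loop in Python. This is a simplified model of the two redirect Rules, not the module's actual processing engine; the redirect targets are assumptions, and the dot in default\.aspx is escaped here so that it matches only a literal dot.

```python
import re

# Each rule is a (pattern, build_location) pair.
WELCOME_REDIRECT = (
    re.compile(r"^(.*/)?Pages/default\.aspx$", re.IGNORECASE),
    lambda m: "/" + (m.group(1) or ""),
)
PAGE_REDIRECT = (
    re.compile(r"^(.*/)?Pages/([^/]+)\.aspx$", re.IGNORECASE),
    lambda m: "/" + (m.group(1) or "") + m.group(2),
)


def first_redirect(path, rules):
    """Apply the rules top to bottom; the first matching rule wins."""
    for pattern, build_location in rules:
        match = pattern.match(path)
        if match:
            return build_location(match)
    return None


url = "Press-Releases/Pages/default.aspx"
print(first_redirect(url, [PAGE_REDIRECT, WELCOME_REDIRECT]))  # /Press-Releases/default
print(first_redirect(url, [WELCOME_REDIRECT, PAGE_REDIRECT]))  # /Press-Releases/
```

With the more specific Welcome Page rule first, regular Publishing Pages still fall through to the general rule, so nothing else changes.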

Site vs. Page URLs

Implementing semantic URLs in SharePoint using the URL Rewrite module is pretty straightforward. There are however a couple of things you should keep in mind.

One such thing is distinguishing between site and page URLs. The only way to tell a page from a site is the trailing / in the URL. After implementing the semantic URLs, /Press-Releases became a page URL, whereas before you could use that URL to navigate to the Press Releases site. As the URL Rewrite module has no knowledge of SharePoint being on the other side of the line, it’s up to you to provide correct URLs.
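The difference the trailing slash makes can be sketched by combining the two rewrite Rules in Python. The rewrite targets are the same assumptions as before (Pages/default.aspx for sites, Pages/<page>.aspx for pages), and resolve is a hypothetical name.

```python
import re
from typing import Optional

WELCOME = re.compile(r"^(.*/)?$")
PAGE = re.compile(r"^(.*?)([^/]+)$")
HAS_EXTENSION = re.compile(r"^.*?[^/]+\.[^/]+$")


def resolve(path: str) -> Optional[str]:
    """Return the SharePoint path a semantic path is rewritten to, if any."""
    welcome = WELCOME.match(path)
    if welcome:
        # Trailing slash (or empty path): treated as a site.
        return (welcome.group(1) or "") + "Pages/default.aspx"
    page = PAGE.match(path)
    if page and not HAS_EXTENSION.match(path):
        # No trailing slash and no extension: treated as a page.
        return "{0}Pages/{1}.aspx".format(page.group(1), page.group(2))
    return None  # left untouched, e.g. CSS files and application pages


print(resolve("Press-Releases/"))  # Press-Releases/Pages/default.aspx
print(resolve("Press-Releases"))   # Pages/Press-Releases.aspx
```

Without the trailing slash, the request is sent to a page called Press-Releases in the root site, which most likely doesn't exist.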

Summary

The URL Rewrite module provides you with an easy way to do something many considered very challenging, if not impossible: making the URLs of Web Content Management solutions built on top of Microsoft Office SharePoint Server 2007 semantic. Using nothing more than a few simple Rules and no custom code, you can make your URLs web-ready and increase the findability of your content.