Optimizing Sitecore 8 XP for Internet search engines with canonical URLs


Canonical URLs help you optimize the ranking of your pages in Internet search engines. Here is how to render canonical URLs in Sitecore 8 XP.

One page, one URL?

On a website, each page has its own URL through which it can be accessed. Ideally, only one URL is associated with every page, so when Internet search engines crawl the contents of your website, they know which URL to return for that page in the search results. On a simple website this might be the case, but on more dynamic websites things look slightly different.

On dynamic websites, different URLs might return the same or nearly the same content. Whether you are reusing existing content across your website or returning dynamic content based on some parameters, you end up with duplicated content available via different URLs. Although that is not an issue for your visitors, it confuses Internet search engines. As a result, you lose control over how Internet search engines rank your content and let them decide how to divide the page ranking.

The good news is that there is a way to regain control over your content and instruct Internet search engines how they should index your pages.

Canonical URLs

A canonical URL is the URL under which Internet search engines should index your page. Even if the same page is published via many URLs on your website, if all those URLs specify the same canonical URL, the page will be indexed only once, using the URL you specified.

Canonical URLs are rendered in the head section of your webpage:

<link rel="canonical" href="http://www.mavention.com/">

So how would you get the canonical URL of your page in Sitecore?

Rendering canonical URLs in Sitecore 8 XP

As I mentioned before, if you don’t reuse content across your website or serve content dynamically, you might not need canonical URL information embedded in your pages. If, however, you do use the more advanced web content management capabilities of the Sitecore platform, including canonical URLs in your pages’ HTML might help you optimize the ranking of your pages in Internet search engines.

Getting an item’s friendly URL

You can get the URL of your page in Sitecore by using the LinkManager class from the Sitecore API:

Sitecore.Links.LinkManager.GetItemUrl(Sitecore.Context.Item);

Getting an item’s friendly absolute URL

Calling the GetItemUrl method will return a server-relative URL of your page.

To avoid problems with relative URLs and base paths, Google recommends using absolute URLs for canonical URLs.

We can easily turn the server-relative URL into an absolute one by passing options to the GetItemUrl method:

Sitecore.Links.UrlOptions urlOptions = Sitecore.Links.LinkManager.GetDefaultUrlOptions();
urlOptions.AlwaysIncludeServerUrl = true;
Sitecore.Links.LinkManager.GetItemUrl(Sitecore.Context.Item, urlOptions);

This time Sitecore will return an absolute friendly URL of the current page.
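If you want to verify this before rendering the value, a small guard can help. The following is only a sketch using the standard System.Uri class. The class name CanonicalUrlGuard and the method IsAbsoluteHttpUrl are hypothetical names chosen for illustration and are not part of the Sitecore API:

```csharp
using System;

public static class CanonicalUrlGuard
{
    // Hypothetical helper (not part of Sitecore): returns true only when
    // the given string parses as an absolute URL with the http or https scheme.
    public static bool IsAbsoluteHttpUrl(string url)
    {
        Uri parsed;

        // The scheme check also rejects strings that some platforms
        // would otherwise interpret as absolute file paths.
        return Uri.TryCreate(url, UriKind.Absolute, out parsed)
            && (parsed.Scheme == Uri.UriSchemeHttp || parsed.Scheme == Uri.UriSchemeHttps);
    }
}
```

For a server-relative URL such as /en/products the check fails, which would indicate that AlwaysIncludeServerUrl has not been applied.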

Rendering the canonical URL on a page

Assuming that you are building your website using MVC, you could create a custom HTML helper method to render the canonical URL in your layout:

public static System.Web.HtmlString CanonicalUrl(this Sitecore.Mvc.Helpers.SitecoreHelper sitecoreHelper) {
    Sitecore.Links.UrlOptions urlOptions = Sitecore.Links.LinkManager.GetDefaultUrlOptions();
    urlOptions.AlwaysIncludeServerUrl = true;
    
    string canonicalUrl = Sitecore.Links.LinkManager.GetItemUrl(Sitecore.Context.Item, urlOptions);
    
    // Razor does not encode an HtmlString, so encode the URL for use in the href attribute
    return new System.Web.HtmlString(System.Web.HttpUtility.HtmlAttributeEncode(canonicalUrl));
}

Then in your layout you would call the CanonicalUrl helper to render the canonical URL in your page:

<link rel="canonical" href="@Html.Sitecore().CanonicalUrl()">

This will render the canonical URL in your page’s HTML. Unfortunately there are a few more things that you need to take into account if you want to truly leverage Sitecore’s capabilities.

Multilingual and canonical URLs

Sitecore supports a number of different ways of building multilingual websites. Depending on your requirements, you might for example choose to associate a different top-level domain with each language or have the languages published as a path (e.g. /en, /nl).

Sitecore takes your configuration into account and automatically includes it in the URL. If you want to control how your canonical URLs are built yourself, you might want to specify it explicitly in the URL options passed to the GetItemUrl method.

For example, by setting the UrlOptions.LanguageEmbedding property to Never you would instruct Sitecore never to include the language as a part of the URL:

Sitecore.Links.UrlOptions urlOptions = Sitecore.Links.LinkManager.GetDefaultUrlOptions();
urlOptions.AlwaysIncludeServerUrl = true;
urlOptions.LanguageEmbedding = Sitecore.Links.LanguageEmbedding.Never;

Sitecore.Links.LinkManager.GetItemUrl(Sitecore.Context.Item, urlOptions);

URL aliases and canonical URLs

One of the web content management capabilities provided with Sitecore 8 XP is the ability to specify URL aliases for items. Using aliases you can have multiple URLs associated with the same item. The important part here is that aliases are not redirects: the whole page is accessible through the alias and the alias remains visible as the URL of the page.

Although it might be desirable to use this capability for some scenarios, it results in making the same content available via multiple URLs and fragmenting your page’s ranking in Internet search results.

The good news is that if you are rendering the canonical URL using the code snippet above, Sitecore will automatically revert to the original URL of the item rather than the alias used to view it. As a result, no additional effort is needed to deal properly with aliases from the SEO point of view.

Inconvenient item clones and canonical URLs

Another web content management capability available with Sitecore 8 XP is the ability to clone items. Cloning items is similar to duplicating them, with the slight difference that there is a one-way relationship between the source and its clones. Changes to the content of the source item can easily be pushed to its clones, making it very easy for you to reuse content in your website.

Clones differ from aliases in that cloning creates a new item in the content hierarchy. Ideally, assuming that the majority of the content of the cloned item is the same as its source’s, you would want the canonical URL of a cloned page to point to the URL of the source item.

Before publishing an item and its clones, it’s easy to determine whether an item is a clone or not. If an item is a clone, the item from which it has been cloned can be retrieved through the Sitecore.Context.Item.Source property. Unfortunately, when retrieving this property for the published item, its value is empty.

One workaround for this limitation would be to create a new field in your data template and, in the template’s Standard values, set the value of that field to the $id token. When a new item is created, that field is set to the ID of the newly created item. Creating a clone of that item would, however, leave that value unaltered, allowing you to check whether the item is a clone or not. Using this approach you could create the following function to get the source item:

private static Item GetSourceItem(Item contextItem, string sourceItemIdFieldName) {
    Item sourceItem = contextItem;
    
    if (!String.IsNullOrEmpty(sourceItemIdFieldName)) {
        Guid sourceItemGuid = Guid.Empty;
        
        if (Guid.TryParse(sourceItem[sourceItemIdFieldName], out sourceItemGuid)) {
            ID sourceItemId = new ID(sourceItemGuid);
            
            if (sourceItem.ID != sourceItemId) {
                Item item = contextItem.Database.GetItem(sourceItemId);
                
                if (item != null) {
                    sourceItem = item;
                }
            }
        }
    }
    
    return sourceItem;
}

Then, in order to retrieve the source item to render the canonical URL you would call the method:

Sitecore.Data.Items.Item item = GetSourceItem(Sitecore.Context.Item, "SourceItemId");

Canonical query string parameters

One of the reasons why you might want to use canonical URLs in your website is that you are loading dynamic content based on some query string parameters. Depending on the parameter, the rendered content might differ to a greater or lesser extent. To control how your pages are indexed and ranked in Internet search results, you would want to tell search engines which query string parameters they should treat as if they returned different pages.

To include canonical query string parameters in the canonical URL we would first need to tell the HTML helper which query string parameters should be included in the canonical URL. You could have those stored along with your website’s configuration but you could as well pass them directly into the HTML helper:

public static HtmlString CanonicalUrl(this Sitecore.Mvc.Helpers.SitecoreHelper sitecoreHelper, string[] canonicalQueryStringParameters) {
    // ...
}

In your layout you would extend the call to the helper by passing the list of canonical query string parameters:

<link rel="canonical" href="@Html.Sitecore().CanonicalUrl(new string[] { "a" })">

The following helper method filters the query string parameters of the current request and removes all but the canonical query string parameters:

// Requires: using System.Linq; (Contains on the array) and using System.Web; (HttpUtility)
private static string GetCanonicalQueryString(string rawUrl, string[] canonicalQueryStringParameters) {
    string canonicalQueryString = null;
    int queryStringStart = rawUrl.IndexOf('?');
    
    if (queryStringStart > 0 &&
        canonicalQueryStringParameters != null &&
        canonicalQueryStringParameters.Length > 0) {
        string queryString = rawUrl.Substring(queryStringStart);
        List<string> canonicalUrlQueryString = new List<string>();
        // ParseQueryString URL-decodes the parameter names and values
        NameValueCollection queryStringParameters = HttpUtility.ParseQueryString(queryString);
        
        foreach (string queryStringParameterName in queryStringParameters.AllKeys) {
            if (canonicalQueryStringParameters.Contains(queryStringParameterName)) {
                // Re-encode the decoded name and value so that characters such as & or = do not break the URL
                canonicalUrlQueryString.Add(String.Format("{0}={1}", HttpUtility.UrlEncode(queryStringParameterName), HttpUtility.UrlEncode(queryStringParameters[queryStringParameterName])));
            }
        }
        
        if (canonicalUrlQueryString.Count > 0) {
            canonicalQueryString = "?" + String.Join("&", canonicalUrlQueryString);
        }
    }
    
    return canonicalQueryString;
}
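To experiment with this kind of filtering outside of Sitecore, a standalone variant might look like the sketch below. It is not the helper from this article: the class name CanonicalQueryStringSketch and the method Filter are hypothetical. As an additional assumption, it emits the kept parameters in a fixed (sorted) order, so that two requests that differ only in the order of their parameters end up with the same canonical query string:

```csharp
using System;
using System.Collections.Generic;
using System.Collections.Specialized;
using System.Linq;
using System.Web;

public static class CanonicalQueryStringSketch
{
    // Hypothetical standalone helper, not part of the Sitecore API.
    // Keeps only the allowed parameters and returns them in sorted order.
    public static string Filter(string rawUrl, string[] allowedNames)
    {
        if (rawUrl == null || allowedNames == null || allowedNames.Length == 0)
        {
            return string.Empty;
        }

        int start = rawUrl.IndexOf('?');
        if (start < 0)
        {
            return string.Empty;
        }

        // ParseQueryString URL-decodes the parameter names and values.
        NameValueCollection parsed = HttpUtility.ParseQueryString(rawUrl.Substring(start + 1));
        var kept = new List<string>();

        // Iterate over the allowed names (sorted) rather than over the request,
        // so the output order does not depend on the incoming URL.
        foreach (string name in allowedNames
            .Where(n => !string.IsNullOrEmpty(n))
            .OrderBy(n => n, StringComparer.Ordinal))
        {
            string[] values = parsed.GetValues(name);
            if (values == null)
            {
                continue; // parameter not present in the request
            }

            foreach (string value in values)
            {
                // Re-encode so that characters such as & or = stay part of the value.
                kept.Add(Uri.EscapeDataString(name) + "=" + Uri.EscapeDataString(value));
            }
        }

        return kept.Count > 0 ? "?" + string.Join("&", kept) : string.Empty;
    }
}
```

With page and category as the allowed names, both /products?utm_source=mail&page=2&category=a%26b and /products?category=a%26b&page=2 yield ?category=a%26b&page=2.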

Before rendering the canonical URL we would call this helper method and append the canonical query string parameters to the canonical URL if applicable:

string canonicalQueryString = GetCanonicalQueryString(Sitecore.Context.RawUrl, canonicalQueryStringParameters);

if (!String.IsNullOrEmpty(canonicalQueryString)) {
    canonicalUrl += canonicalQueryString;
}

Although you might need to adapt this code to your specific requirements, it should be sufficient for the most common scenarios.

Summary

Canonical URLs help you optimize the ranking of your pages in Internet search engines. Sitecore offers rich web content management capabilities that give you great flexibility with regard to publishing content on your website, but which also require you to instruct Internet search engines how to index your web pages. Using canonical URLs you can prevent the same pages on your website from being indexed multiple times and fragmenting their page ranking.
