Search engine optimization of multilingual URLs of bucketed items in Sitecore


When building multilingual public-facing websites with Sitecore you want to ensure that your content can easily be found using Internet search engines. Although localizing URLs in Sitecore just works, things get a bit more complicated when working with item buckets.

Item buckets in Sitecore

When dealing with a large number of items in Sitecore, it’s good practice not to store them all under a single parent but to use an arbitrary hierarchy that stores items in sets of around 100. Splitting a large number of items into smaller sets not only improves the performance of the Content Editor but also makes content queries run faster and improves the content management experience in general.

Opinions differ on whether you should split a large number of items into smaller sets using the standard Item Bucket functionality provided with Sitecore or build your own solution based on wildcard items. Either way, you will end up with that arbitrary hierarchy being included in the URLs of all bucketed items. If your hierarchy is meaningful to your visitors and adds value from a search engine optimization point of view, there is nothing you have to do at this point. More often than not, however, you will use a hierarchy based on date and time, or something else equally meaningless to your visitors, and will prefer not to include it in the public URLs of your items.

Excluding bucket hierarchy from the URLs of bucketed items

In order to exclude the bucket hierarchy from the URLs of bucketed items you need two things: a custom Link Provider, which builds URLs of bucketed items without the bucket hierarchy, and a custom Item Resolver processor in the httpRequestBegin pipeline, which resolves bucketed items from those shortened URLs. The following sample custom Link Provider excludes the bucket hierarchy from the URLs of bucketed items:

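using System;
using Sitecore.Buckets.Extensions;
using Sitecore.Buckets.Managers;
using Sitecore.Data.Items;
using Sitecore.IO;
using Sitecore.Links;

public class BucketItemLinkProvider : LinkProvider {
    public override string GetItemUrl(Item item, UrlOptions options) {
        if (BucketManager.IsItemContainedWithinBucket(item)) {
            // Walk up to the bucket that contains the item
            Item bucketItem = item.GetParentBucketItemOrParent();

            if (bucketItem != null && bucketItem.IsABucket()) {
                // Let the default provider build the (possibly localized) URL of the bucket
                string bucketUrl = base.GetItemUrl(bucketItem, options);

                // Strip the extension from the bucket URL; it is appended again after the slug
                if (options.AddAspxExtension) {
                    bucketUrl = bucketUrl.Replace(".aspx", String.Empty);
                }

                string slug = options.UseDisplayName ? item.DisplayName : item.Name;

                // Bucket URL + slug, skipping the bucket folders in between
                return FileUtil.MakePath(bucketUrl, slug) +
                    (options.AddAspxExtension ? ".aspx" : String.Empty);
            }
        }

        // Items outside of a bucket keep their default URL
        return base.GetItemUrl(item, options);
    }
}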
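To make the effect concrete, the string handling of the provider can be tried without Sitecore. The following is a minimal sketch and not part of the original solution: the class BucketedUrlExample, its Build method and the sample URLs are hypothetical, and it only covers the extension handling and the joining of the bucket URL and the slug.

```csharp
using System;

public static class BucketedUrlExample
{
    // Hypothetical helper: mirrors the string handling of the Link Provider
    // above without depending on any Sitecore types.
    public static string Build(string bucketUrl, string slug, bool addAspxExtension)
    {
        if (bucketUrl == null) throw new ArgumentNullException("bucketUrl");
        if (slug == null) throw new ArgumentNullException("slug");

        const string extension = ".aspx";

        // Remove the extension from the bucket URL; it is re-added after the slug.
        if (addAspxExtension && bucketUrl.EndsWith(extension, StringComparison.OrdinalIgnoreCase))
        {
            bucketUrl = bucketUrl.Substring(0, bucketUrl.Length - extension.Length);
        }

        // Join the bucket URL and the slug with exactly one slash.
        return bucketUrl.TrimEnd('/') + "/" + slug +
            (addAspxExtension ? extension : string.Empty);
    }
}
```

Under these assumptions, Build("/en/news.aspx", "my-article", true) gives /en/news/my-article.aspx, whereas the default provider would have placed the bucket folders between news and my-article.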

For a custom Link Provider to work in Sitecore it has to be registered in the configuration. This can be done using the following include file:

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <linkManager>
      <providers>
        <add name="sitecore">
          <patch:attribute name="type">Mavention.Sitecore.Publishing.Providers.BucketItemLinkProvider, Mavention.Sitecore.Publishing</patch:attribute>
        </add>
      </providers>
    </linkManager>
  </sitecore>
</configuration>

Although building a custom Link Provider and Item Resolver requires some custom development and knowledge of Sitecore, it’s not overly complex. Things get slightly more complicated, however, when your website builds URLs using the Display Name.

Building URLs using Display Name

By default, Sitecore builds an item’s URL using the item’s Name. By changing the settings of the Link Manager in web.config, however, you can choose to use the item’s Display Name instead. One reason to do this is to support localized URLs for multilingual items. Whereas an item’s Name is shared across all languages, its Display Name can be translated, which offers additional search engine optimization benefits.
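As a sketch of what that setting could look like, the include file below switches the standard link provider to Display Names by patching its useDisplayName attribute. It assumes the provider is registered under the name sitecore, as in the include file shown earlier; adjust the name if your solution registers a different provider.

```xml
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <linkManager>
      <providers>
        <add name="sitecore">
          <!-- Build URLs from the item's Display Name instead of its Name -->
          <patch:attribute name="useDisplayName">true</patch:attribute>
        </add>
      </providers>
    </linkManager>
  </sitecore>
</configuration>
```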

While no additional effort is required to have Sitecore use an item’s Display Name for building its URL, there are some additional considerations when building a custom Item Resolver for bucketed items.

Resolving bucketed items using simplified URLs based on item’s Display Name

When working with multilingual websites, it’s very likely that not only URLs of bucketed items will be localized but URLs of item buckets as well. When resolving bucketed items using their localized simplified URLs the first step is therefore to retrieve the item bucket.

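using Sitecore;
using Sitecore.Data.ItemResolvers;
using Sitecore.Data.Items;
using Sitecore.Data.Managers;
using Sitecore.Globalization;
using Sitecore.Links;
using Sitecore.Pipelines.HttpRequest;
using Sitecore.SecurityModel;
using Sitecore.Sites;
using SC = Sitecore; // alias used for SC.Data.Version below

public class BucketItemResolver : HttpRequestProcessor {
    public override void Process(HttpRequestArgs args) {
        // Only step in when the standard ItemResolver could not resolve the item
        if (Context.Item == null) {
            string requestUrl = args.LocalPath;

            // Everything before the last slash is the URL of the item bucket
            int pos = requestUrl.LastIndexOf('/');
            if (pos > 0) {
                string bucketPath = requestUrl.Substring(0, pos);
                bool useDisplayName = LinkManager.GetDefaultUrlOptions().UseDisplayName;

                Item bucketItem = null;
                if (useDisplayName) {
                    bucketItem = GetBucketItemUsingDisplayName(bucketPath, args);
                }
                else {
                    string bucketFullPath = args.Url.ItemPath.Substring(0, args.Url.ItemPath.LastIndexOf('/'));
                    bucketItem = args.GetItem(bucketFullPath);
                }

                // The lookup of the bucketed item itself goes here (second snippet)
            }
        }
    }

    private static Item GetBucketItemUsingDisplayName(string itemPath, HttpRequestArgs args) {
        Item item = null;

        // Resolve without security checks first; security is applied on return
        using (new SecurityDisabler()) {
            SiteContext site = Context.Site;

            if (site != null) {
                Item startItem = ItemManager.GetItem(site.RootPath + site.StartItem, Language.Current, SC.Data.Version.Latest, Context.Database, SecurityCheck.Disable);

                if (startItem != null) {
                    // Resolves the path relative to the start item, matching Display Names as well
                    ContentItemPathResolver pathResolver = new ContentItemPathResolver();
                    item = pathResolver.ResolveItem(itemPath, startItem);
                }
            }

            if (item == null) {
                return null;
            }
        }

        return args.ApplySecurity(item);
    }
}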

First, we remove the page’s slug from the URL to get the URL of the item bucket. Based on that URL we then retrieve the item bucket. Since the HttpRequestArgs.GetItem method doesn’t support retrieving items by their Display Name, we have to resort to a custom method, which first retrieves the start item of the site and then uses the ContentItemPathResolver to resolve the item bucket relative to it.
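The first of these steps, splitting the request path into the bucket URL and the slug, is plain string handling and can be sketched on its own. The BucketUrl class and its TrySplit method below are hypothetical names introduced for illustration; ignoring a trailing slash is an added assumption that the resolver above does not make.

```csharp
using System;

public static class BucketUrl
{
    // Hypothetical helper: splits a request path such as "/news/my-article"
    // into the bucket path ("/news") and the slug ("my-article").
    // Returns false when the path has no parent segment, which corresponds
    // to the `pos > 0` check in the resolver.
    public static bool TrySplit(string localPath, out string bucketPath, out string slug)
    {
        bucketPath = null;
        slug = null;

        if (string.IsNullOrEmpty(localPath))
        {
            return false;
        }

        // Assumption: a trailing slash is ignored, so "/news/my-article/"
        // yields the same result as "/news/my-article".
        string path = localPath.TrimEnd('/');

        int pos = path.LastIndexOf('/');
        if (pos <= 0)
        {
            return false;
        }

        bucketPath = path.Substring(0, pos);
        slug = path.Substring(pos + 1);
        return true;
    }
}
```

Keeping the split in a separate method makes it easy to cover edge cases, such as a request for a top-level page, with unit tests before wiring the logic into the pipeline processor.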

Once we have retrieved the item bucket, the next step is to retrieve the requested bucketed item using its slug.

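// Continues inside the `if (pos > 0)` block of BucketItemResolver.Process.
// Requires System.Linq, Sitecore.Buckets.Managers, Sitecore.ContentSearch
// and Sitecore.ContentSearch.SearchTypes.
if (bucketItem != null && BucketManager.IsBucket(bucketItem)) {
    // The last URL segment identifies the bucketed item
    string slug = requestUrl.Substring(pos + 1);

    // Open a search context on the index that contains the bucket
    using (IProviderSearchContext searchContext = 
        ContentSearchManager.GetIndex((SitecoreIndexableItem)bucketItem).CreateSearchContext()) {
        SearchResultItem result = null;

        // Note: the queries match on the slug only and are not restricted to this bucket
        if (useDisplayName) {
            result = searchContext.GetQueryable<SearchResultItem>()
                .Where(x => x["_displayname"] == slug).FirstOrDefault();
        }
        else {
            result = searchContext.GetQueryable<SearchResultItem>()
                .Where(x => x.Name == slug).FirstOrDefault();
        }

        if (result != null) {
            // Hand the resolved item over to the rest of the pipeline
            Context.Item = result.GetItem();
        }
    }
}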

As mentioned before, when URLs are built from Display Names the slug corresponds to the item’s Display Name. Since an item bucket may contain hundreds of items, it’s essential to retrieve items with as little performance impact as possible. In Sitecore the Content Search API can be used for this purpose. Using the retrieved item bucket reference we first create a search context. Then, using a query, we search for the first item whose Display Name is equal to the slug passed in the URL. If an item is found, it is set as the context item for further processing by the Sitecore runtime.

By default the Display Name is excluded from the search index. For the above code to work it has to be included in the index, and the index has to be rebuilt afterwards so that existing items pick up the field. The exclusion can be removed using the following include file:

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <contentSearch>
      <indexConfigurations>
        <defaultLuceneIndexConfiguration>
          <exclude hint="list:ExcludeField">
            <__display_name>
              <patch:delete />
            </__display_name>
          </exclude>
        </defaultLuceneIndexConfiguration>
      </indexConfigurations>
    </contentSearch>
  </sitecore>
</configuration>

The last thing left is to register the custom Item Resolver with Sitecore using an include file:

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <pipelines>
      <httpRequestBegin>
        <processor patch:after="processor[@type='Sitecore.Pipelines.HttpRequest.ItemResolver, Sitecore.Kernel']" type="Mavention.Sitecore.Publishing.Pipelines.BucketItemResolver, Mavention.Sitecore.Publishing" />
      </httpRequestBegin>
    </pipelines>
  </sitecore>
</configuration>

Summary

When building multilingual public-facing websites with Sitecore you want to ensure that your content can easily be found using Internet search engines. A good practice is to translate the URLs of your items along with their content. In order to optimize localized and simplified URLs of bucketed items for Internet search engines, a custom Link Provider and Item Resolver are needed. Particular attention is required when building the custom Item Resolver, where retrieving items using their Display Names requires additional effort.
