When crawling content with SharePoint Search you might stumble upon the following warning: “The item has been truncated in the index because it exceeds the maximum size.” So what is it and, more importantly, how do you get rid of it?
If you want to leverage the new Web Content Management (WCM) capabilities of SharePoint 2013 the odds are high that you will be looking into implementing search-driven publishing. The idea behind the search-driven publishing model is that all of your content is indexed by SharePoint Search and from there can be published to any other location that has access to the Search Service Application that crawled the content.
Because you will be crawling all of your content, it’s very likely that you will get some short pieces of content as well as some longer ones, let’s say longer than… 16384 characters.
When crawling long content SharePoint will trim it by default to the first 16384 characters. When examining the Crawl Log you will see warnings similar to the following:
The item has been truncated in the index because it exceeds the maximum size. ( Item truncated. Field=PublishingPageContentOWSHTML, Occurrences=27931, Chars=16384; )
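If several properties are affected, the field name and the character limit can be pulled out of the warning text. The following is only a minimal sketch: it assumes the message keeps the layout of the sample above, and the variable names and the pattern are illustrative rather than anything SharePoint provides.

```powershell
# Sample warning text copied from the Crawl Log entry above.
$warning = 'The item has been truncated in the index because it exceeds the maximum size. ( Item truncated. Field=PublishingPageContentOWSHTML, Occurrences=27931, Chars=16384; )'

# Assumption: the warning always uses the "Field=<name>, Occurrences=<n>, Chars=<n>;" layout.
$pattern = 'Field=(?<field>\w+),\s*Occurrences=(?<occurrences>\d+),\s*Chars=(?<chars>\d+)'

if ($warning -match $pattern) {
    # -match fills the automatic $Matches table with the named groups.
    $truncatedField = $Matches['field']
    $storedChars    = [int]$Matches['chars']
    "Managed Property: $truncatedField, stored characters: $storedChars"
}
```

The extracted field name is the Managed Property whose limit needs to be raised.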
Additionally, if you search for one of the items reported as truncated, you will see that only the first 16384 characters are rendered on the page.
To avoid content truncation you have to increase the number of characters that the particular Managed Property can store. To do this, run the following PowerShell snippet for each property reported as truncated in the Search Crawl Log:
$ssa = Get-SPEnterpriseSearchServiceApplication
$mp = Get-SPEnterpriseSearchMetadataManagedProperty -SearchApplication $ssa -Identity "PublishingPageContentOWSHTML"
$mp.MaxCharactersInPropertyStoreForRetrieval = 2097152
$mp.Update()
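When more than one property shows up as truncated, the same steps can be wrapped in a loop. This is a sketch under a few assumptions: it runs in the SharePoint 2013 Management Shell, the farm has a single Search Service Application, and the list of property names is a placeholder that you fill in from your own Crawl Log.

```powershell
# Assumes the SharePoint 2013 Management Shell and a single Search Service Application.
$ssa = Get-SPEnterpriseSearchServiceApplication

# Placeholder list: replace with the properties reported as truncated in your Crawl Log.
$truncatedProperties = @("PublishingPageContentOWSHTML")
$newLimit = 2097152

foreach ($name in $truncatedProperties) {
    $mp = Get-SPEnterpriseSearchMetadataManagedProperty -SearchApplication $ssa -Identity $name

    # Only raise the limit; leave the property alone if it is already at or above the target.
    if ($mp.MaxCharactersInPropertyStoreForRetrieval -lt $newLimit) {
        $mp.MaxCharactersInPropertyStoreForRetrieval = $newLimit
        $mp.Update()
    }
}
```

Comparing against the current value first avoids accidentally lowering a limit that someone has already raised.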
After updating the Managed Property, execute a Full Crawl on your Content Source. With that change applied you shouldn’t see this warning in your Crawl Log anymore.
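The Full Crawl can also be started from PowerShell instead of Central Administration. The sketch below assumes the Content Source still has the default name "Local SharePoint sites"; use the name of your own Content Source if it differs.

```powershell
# Assumes the SharePoint 2013 Management Shell and a single Search Service Application.
$ssa = Get-SPEnterpriseSearchServiceApplication

# Assumption: "Local SharePoint sites" is the Content Source that holds the truncated items.
$cs = Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa -Identity "Local SharePoint sites"

# Start a Full Crawl only if no other crawl is currently running on this Content Source.
if ($cs.CrawlStatus -eq "Idle") {
    $cs.StartFullCrawl()
}
```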