Sitecore Azure Search Issues

As soon as Sitecore 8.2 came out with a PaaS option, I adopted it immediately for a client. The main driving factor was that the client was a Microsoft shop, and the idea of running a Java search tool (Solr) was hard for them to swallow. They loved the idea of Azure Search and bought into it immediately.

I had my concerns about using a new technology in Sitecore, but I decided to give it a try anyway. I found a number of trouble spots, both in Sitecore’s implementation and in the tool itself.

Sitecore API bugs

Sitecore Search API unable to query by datetimes.

If you’re using Sitecore’s API to access the search index, which is Sitecore’s recommended approach, you cannot query by datetime. So if you’re building an event search tool, you might want to either reconsider using Sitecore’s search API or query Azure Search directly.
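
If you do query Azure Search directly for dates, the filter is an OData expression. Here is a minimal sketch of how a date range filter might be formatted. The field name event_date is hypothetical, and the sketch assumes the field is indexed as Edm.DateTimeOffset, which you should confirm against your own index schema.

```csharp
using System;
using System.Globalization;

public static class DateFilterSketch
{
    // OData date-time literals are unquoted ISO 8601 values, here always written in UTC.
    private const string ODataUtcFormat = "yyyy-MM-dd'T'HH:mm:ss'Z'";

    // Builds "<field> ge <from> and <field> le <to>" for use as a filter string.
    // Note: DateTime values with Kind == Unspecified are treated as local time by ToUniversalTime().
    public static string BetweenUtc(string field, DateTime from, DateTime to)
    {
        string lower = from.ToUniversalTime().ToString(ODataUtcFormat, CultureInfo.InvariantCulture);
        string upper = to.ToUniversalTime().ToString(ODataUtcFormat, CultureInfo.InvariantCulture);
        return field + " ge " + lower + " and " + field + " le " + upper;
    }
}
```

The resulting string would go into the Filter property of the search parameters, the same place a facet filter goes.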

Cannot query for ID right after an app pool recycle

Immediately following an app pool recycle, the Sitecore search API is unable to query by item ID. This can, however, be avoided if you go direct to the index.

Azure Search shortcomings

Azure Search only facets with AND logic, never OR logic

A very common search scenario is to use OR logic within a facet group and AND logic between groups. For example, if you are shopping for a new PC, you might want one that has an i7 processor and costs between 600 and 1200 dollars. If you’re given range facets of 600 – 800, 800 – 1000, and 1000 – 1200, you would logically select all three ranges along with the processor you want. However, the faceting in Azure Search makes this impossible: as soon as you select 600 – 800, all the other options disappear, because no computer can fall under two separate price ranges.

Hypothetically it would be possible to overcome this by making multiple queries to the index; however, that would be quite a complex solution and could multiply the load on your index.
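
To give a feel for what the multiple-query approach might involve, here is a rough sketch of a filter builder that ORs values within a group and ANDs the groups together. It can also leave one group out, which is what a separate query for that group's own facet counts would need. The class, the method, and the field names in the usage note are all hypothetical, and it assumes each facet is stored as a discrete string value.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FacetFilterSketch
{
    // Builds an OData filter string: OR within a facet group, AND between groups.
    // excludeField leaves one group out, for the query that fetches that group's facet counts.
    public static string BuildFilter(
        IEnumerable<KeyValuePair<string, string[]>> selections,
        string excludeField = null)
    {
        IEnumerable<string> groups = selections
            .Where(g => g.Key != excludeField && g.Value.Length > 0)
            .Select(g => "(" + string.Join(" or ",
                // Single quotes inside an OData string literal are escaped by doubling them.
                g.Value.Select(v => g.Key + " eq '" + v.Replace("'", "''") + "'")) + ")");
        return string.Join(" and ", groups);
    }
}
```

Given a list holding price_range with the values 600-800 and 800-1000, followed by processor with the value i7, BuildFilter(selections) produces (price_range eq '600-800' or price_range eq '800-1000') and (processor eq 'i7'), while BuildFilter(selections, "price_range") produces just (processor eq 'i7'). You would still need one extra query per facet group, which is where the added load comes from.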

There is no ability to use wild cards in filters

The only wild cards that are accepted are in the text search query, not in filters. Say, for example, you’re creating a restaurant search and you want to let the user filter by city and also run a text search. You have to assume that the user typed the entire city name, not just a fragment, or they won’t get any results.

As far as I can figure, there isn’t a reasonable way to overcome this issue.

Recommendation

As it stands, I would certainly recommend using Solr in the cloud if you want to do any amount of work with the index. A good cloud setup I have used is an IaaS VM running Solr with a virtual network into your PaaS Sitecore environment for fast connectivity. Strangely enough, this also seems to cost less than Azure Search.

Direct To Azure Search API

While I don’t recommend it at this time, the tool itself was quite easy to work with directly. Take a look at the documentation to get started.

Here is an example of taking the connection string Sitecore uses and creating an Azure Search API object.

			ConnectionStringSettings search = ConfigurationManager.ConnectionStrings["cloud.search"];
			if (search == null)
				throw new Exception("Missing connection string for Azure Search");
			// Split "key=value;key=value" into a case-insensitive lookup,
			// skipping empty segments such as the one left by a trailing ';'.
			Dictionary<string, string> connStringParts = search.ConnectionString
				.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries)
				.Select(t => t.Split(new char[] { '=' }, 2))
				.Where(t => t.Length == 2)
				.ToDictionary(t => t[0].Trim(), t => t[1].Trim(), StringComparer.InvariantCultureIgnoreCase);
			try
			{
				SearchServiceClient client = new SearchServiceClient(new Uri(connStringParts["serviceUrl"]),
					new SearchCredentials(connStringParts["apiKey"]));
				//use or cache client
			}
			catch (Exception e)
			{
				throw new Exception("Unable to use connection string values", e);
			}

This is an example of using the index to set up a search with faceting, pagination, and sorting.  Note that I’m using a constants class to abstract away string literals.

			SearchParameters parameters = new SearchParameters
			{
				QueryType = QueryType.Simple,
				Skip = page * 10,
				IncludeTotalResultCount = true,
				Top = 10,
				SearchFields = new List<string>
				{
					AzureSearchConstants.FirstName,
					AzureSearchConstants.LastName,
					AzureSearchConstants.GroupName,
					AzureSearchConstants.LocationName,
					AzureSearchConstants.Address,
					AzureSearchConstants.City,
					AzureSearchConstants.County,
					AzureSearchConstants.State,
					AzureSearchConstants.Zip
				},
				Filter = facetQuery.ToString(),
				Facets = new List<string> { AzureSearchConstants.TypeDescription },
				OrderBy = new List<string> { AzureSearchConstants.LastName }
			};
			var index = _client.Indexes.GetClient(AzureSearchConstants.IndexName);
			var results = index.Documents.Search(query.Trim().Replace(" ", "* ") + '*', parameters);
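
The last line above turns each word the user typed into a prefix term by appending a * to it. As a small sketch of the same idea, the hypothetical helper below splits on any whitespace so that doubled spaces don't leave a stray * on its own, and falls back to * (match everything) for empty input, which is what the inline version yields as well.

```csharp
using System;
using System.Linq;

public static class PrefixQuerySketch
{
    // Turns "john  smi" into "john* smi*": each whitespace-separated term becomes a prefix term.
    public static string ToPrefixQuery(string userInput)
    {
        // Passing a null separator array splits on any whitespace.
        string[] terms = (userInput ?? string.Empty)
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        if (terms.Length == 0)
            return "*";
        return string.Join(" ", terms.Select(t => t + "*"));
    }
}
```

It would be used as index.Documents.Search(PrefixQuerySketch.ToPrefixQuery(query), parameters).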

No Speak Experience Profile Tab

For a client, I was asked to collect some extra data, store it in xDB, and use it to personalize content.  The approach was simple: create a facet like Pete Navarra outlines here, then build some custom rules like I outlined here.  Finally, I was asked to add the collected data to the Experience Profile.  That’s when the pain came.

Adam Conn has outlined how to do it in the official Sitecore way here.  As you can see, the process involves building a SPEAK component for the tab, which is long and very tedious.  This led me to estimate it as a large task; the client wasn’t willing to take the extra time needed to set it all up properly, and the request was abandoned.

This led me to think that there must be a better way, which I have found!

Enter the Experience Profile Express Tab

You can find the NuGet package here, and the source code and developer documentation here.

This module automates the construction of a SPEAK component and wraps it in a proper MVC structure: you build a controller class that generates a POCO model from the contact and passes it to a view.

For my first application of this module I built a tab to show Demandbase data collected by the Sitecore Demandbase module (which you can contact your Demandbase sales rep to acquire).

[Image: Demandbase tab]

This tab can be accomplished with a single C# class. First we take the data from the Demandbase facet, which is a JSON object. We deserialize it to a dictionary and dump it out as HTML.

	public class DemandbaseTab : EPExpressTab.Data.EpExpressModel
	{
		public override string RenderToString(Contact contact)
		{
			dynamic o = JsonConvert.DeserializeObject<ExpandoObject>(
				contact.GetFacet<IXdbFacetDemandbaseData>("Demandbase Data").DemandBaseData ?? "");
			StringBuilder sb = new StringBuilder();
			if (o == null)
				return "<div>Demandbase information not available.</div>";
			IDictionary<string, object> tst = (IDictionary<string, object>) o;
			bool even = false;
			foreach (string attr in tst.Keys)
			{
				if (tst[attr] is string)
				{
					sb.Append(
						$"<div style='background-color:{(even ? "#fff" : "#eee")}'><span style='width:200px;display:inline-block;font-weight:bold;font-size:medium;'>{UppercaseWords(attr)}</span>{tst[attr]}</div>");
					even = !even;
				}
			}
			return sb.ToString();
		}

		public override string Heading => "Demandbase Attributes";
		public override string TabLabel => "Demandbase";
		private string UppercaseWords(string value)
		{
			char[] array = value.ToCharArray();
			// Handle the first letter in the string.
			if (array.Length >= 1)
			{
				if (char.IsLower(array[0]))
				{
					array[0] = char.ToUpper(array[0]);
				}
			}
			// Scan through the letters, checking for spaces.
			// ... Uppercase the lowercase letters following spaces.
			for (int i = 1; i < array.Length; i++)
			{
				if (array[i - 1] == ' ')
				{
					if (char.IsLower(array[i]))
					{
						array[i] = char.ToUpper(array[i]);
					}
				}
				if (array[i] == '_')
				{
					array[i] = ' ';
				}
			}
			return new string(array);
		}
	}

TokenManager View Tokens

Likely fitting in the wheelhouse of most Sitecore developers is building a view model and passing it to a view to be rendered.  That’s what the ViewAutoToken class achieves.  The idea being that you collect data from the content authors at the time of token insertion, then use that data to build a view model and pass that model to a view cshtml.

Unique Aspects

When implementing a new view token, you should extend the ViewAutoToken base class.  This is very similar to an AutoToken, except that instead of implementing a method to render the raw HTML output by the token, you define two methods: one to generate the view model and one to determine the view.

		public override object GetModel(TokenDataCollection extraData)
		{
			return extraData;
		}

		public override string GetViewPath(TokenDataCollection extraData)
		{
			return "/views/myToken.cshtml";
		}

AutoToken Features

All features of AutoTokens are available for ViewAutoTokens as well, such as gathering data from content authors at insertion time for use during rendering, and filtering where the token may be used.

As usual with AutoTokens, you need only implement it in a loaded assembly and TokenManager will pick it up and wire it for use in RTEs.

Complete Example

	public class tokentest : ViewAutoToken
	{
		//Make sure you have a parameterless constructor.
		public tokentest() : base("test", "people/16x16/cubes_blue.png", "terkan")
		{
		}
		//This will add a button to the RTE.
		public override TokenButton TokenButton()
		{
			return new Data.TokenExtensions.TokenButton("test", "people/16x16/cubes_blue.png", 1000);
		}
		//These are the different fields that will be collected by the content authors at the time of insertion.
		public override IEnumerable<ITokenData> ExtraData()
		{
			yield return new GeneralLinkTokenData("LINK", "link", true);
			yield return new DroplistTokenData("Droplist", "droplist", true, new []
			{
				new KeyValuePair<string, string>("Text Label", "Value Passed"),
				new KeyValuePair<string, string>("Blue", "blue"),
			});
			yield return new BooleanTokenData("bool", "bool");
			yield return new IdTokenData("id", "id", true);
			yield return new IntegerTokenData("int", "int", true);
		}
		//These are the templates where the token may be used.
		public override IEnumerable<ID> ValidTemplates() {
			yield return new ID("{78816AC8-4FD7-43C4-A899-17829B4F3B72}");
		}
		//These are the root nodes that make a subtree where the token may be used.
		public override IEnumerable<ID> ValidParents()
		{
			yield return new ID("{A1E1342E-6836-4E20-A2C4-B1A38444B079}");
		}
		//Use the data gathered by the content author to assemble a view model.
		public override object GetModel(TokenDataCollection extraData)
		{
			return extraData;
		}
		//Use the data gathered by the content authors to define a path to the view cshtml.
		public override string GetViewPath(TokenDataCollection extraData)
		{
			return "/views/MyToken.cshtml";
		}
	}

And my view found at [webroot]/views/MyToken.cshtml

@using TokenManager.Data.TokenDataTypes.Support
@model TokenDataCollection
<div><strong>@Model.GetLink("link").Href</strong></div>
<div><strong>@Model.GetString("droplist")</strong></div>
<div><strong>@Model.GetBoolean("bool")</strong></div>
<div><strong>@Model.GetId("id")</strong></div>
<div><strong>@Model.GetInt("int")</strong></div>

Persistent Site and Lang Query string

I’ve always wondered why the default link provider of Sitecore doesn’t carry over site and language parameters.  Quite often I’ve found myself in a situation where the official site resolution for a Sitecore site relies on domain pattern matching.  This makes it difficult to test things on an authoring or development server without the proper DNS names.

There is, however, a solution that takes only a few minor tweaks to the default link provider.  The logic is simple: if the current URL has an sc_site or sc_lang query string parameter, generate all links with these parameters too.

Enter the SiteStaticLinkProvider.

	public class SiteStaticLinkProvider : LinkProvider
	{
		public override string GetItemUrl(Item item, UrlOptions options)
		{
			string urlString = base.GetItemUrl(item, options);
			if (HttpContext.Current?.Request.QueryString == null)
				return urlString;
			string[] urlParts = urlString.Split('?');
			NameValueCollection qs = null;
			NameValueCollection currentqs = HttpContext.Current.Request.QueryString;
			if (!string.IsNullOrWhiteSpace(currentqs["sc_site"]))
			{
				qs = HttpUtility.ParseQueryString(urlParts.Length >= 2 ? urlParts[1] : "");
				if (string.IsNullOrWhiteSpace(qs["sc_site"]))
				{
					qs.Add("sc_site", currentqs["sc_site"]);
				}
			}
			if (!string.IsNullOrWhiteSpace(currentqs["sc_lang"]))
			{
				if (qs == null)
					qs = HttpUtility.ParseQueryString(urlParts.Length >= 2 ? urlParts[1] : "");
				if (string.IsNullOrWhiteSpace(qs["sc_lang"]))
				{
					qs.Add("sc_lang", currentqs["sc_lang"]);
				}
			}
			if (qs != null)
			{
				return urlParts[0] + '?' + qs;
			}
			return urlString;
		}
	}

This provider is a good all purpose link provider because if there are no pertinent parameters present it will not do anything.

The end result here is that to test any site in a pre-prod environment you need to only add the sc_lang or sc_site parameter once and it will follow you around the site, making this very easy for content approvers.
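
To make the behavior concrete, here is a sketch of the same merge logic pulled out into a pure function that takes the generated URL and the current request's query string. The class and method names are hypothetical. It is meant to illustrate the rule and to be easy to unit test, not to replace the provider above.

```csharp
using System;
using System.Collections.Specialized;
using System.Web;

public static class PersistentQuerySketch
{
    private static readonly string[] PersistentKeys = { "sc_site", "sc_lang" };

    // Copies sc_site / sc_lang from the current request onto a generated URL,
    // unless the URL already sets them. Returns the URL untouched when nothing applies.
    public static string Apply(string url, NameValueCollection currentQuery)
    {
        string[] parts = url.Split(new[] { '?' }, 2);
        NameValueCollection qs = HttpUtility.ParseQueryString(parts.Length == 2 ? parts[1] : string.Empty);
        bool changed = false;
        foreach (string key in PersistentKeys)
        {
            string current = currentQuery[key];
            if (!string.IsNullOrWhiteSpace(current) && string.IsNullOrWhiteSpace(qs[key]))
            {
                qs.Add(key, current);
                changed = true;
            }
        }
        // The collection returned by ParseQueryString renders itself as an encoded query string.
        return changed ? parts[0] + "?" + qs : url;
    }
}
```

For example, with a current request of ?sc_site=website&sc_lang=en, a generated link of /about becomes /about?sc_site=website&sc_lang=en, and with no such parameters on the request it stays /about.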

Wire it up!

There are a few options for overriding a link provider. You can add a new provider, then change the reference of the providers node to point to your new provider. Slightly simpler, however, is to straight up override the default Sitecore provider like I’ve done below.

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
	<sitecore>
		<linkManager>
			<providers>
				<add name="sitecore">
					<patch:attribute name="type">[Namespace].SiteStaticLinkProvider, [Binary Name]</patch:attribute>
				</add>
			</providers>
		</linkManager>
	</sitecore>
</configuration>

Search PDF content in Sitecore

To people who have not tried to do this themselves, this seems like an easy task. All we need to do is get all the text content and load it into the search index. Initially I thought I had a good solution with PdfSharp using code that I found from this Stack Overflow post.  It seemed to be working fine until I attempted to run my site on Azure.  It apparently uses lower-level OS-based API calls that are just not available on Azure with the new Sitecore PaaS setup.

There are several paid libraries that claim to accomplish just this; however, like most developers, I wasn’t about to pitch buying a license to read PDF content to my clients. So the search continued.  After many hours (which I hope to save you from here), I came across a solution that did the trick (for the most part).

Reading PDF content

This code does require PdfSharp as a dependency; get it here on NuGet.

NOTE: this code was adapted from this Stack Overflow post and is not entirely my own, although I don’t think the poster on Stack Overflow originated the code either.  Credit is due somewhere, but I’m not quite sure where.

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.Web;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;
using Sitecore.Data.Items;

namespace IHN.Feature.Component
{
	/// <summary>
	/// Adapted from code found here http://stackoverflow.com/questions/83152/reading-pdf-documents-in-net
	/// </summary>
	public class SitecorePdfParser
	{
		private int _numberOfCharsToKeep = 15;
		private PdfDocument _doc;

		public SitecorePdfParser(Item item): this(new MediaItem(item))
		{
		}
		public SitecorePdfParser(MediaItem item)
		{
			if (item.MimeType != "application/pdf")
				return;
			Stream s = item.GetMediaStream();
			_doc = PdfReader.Open(s);
		}

		public SitecorePdfParser(PdfDocument document)
		{
			_doc = document;
		}

		public IEnumerable<string> ExtractText()
		{
			if (_doc == null)
				yield break;
			foreach (PdfPage page in _doc.Pages)
			{
				for (int index = 0; index < page.Contents.Elements.Count; index++)
				{

					PdfDictionary.PdfStream stream = page.Contents.Elements.GetDictionary(index).Stream;
					foreach (string text in ExtractTextFromPdfBytes(stream.Value))
					{
						yield return text;
					}
				}
			}
		}
		/// <summary>
		/// This method processes an uncompressed Adobe (text) object
		/// and extracts text.
		/// </summary>
		/// <param name="input">Uncompressed content stream bytes.</param>
		/// <returns>The extracted text fragments, in the order they were read.</returns>
		public IEnumerable<string> ExtractTextFromPdfBytes(byte[] input)
		{
			if (input == null || input.Length == 0) yield break;
			StringBuilder resultString = new StringBuilder();
			bool inTextObject = false;
			bool nextLiteral = false;
			int bracketDepth = 0;
			char[] previousCharacters = new char[_numberOfCharsToKeep];
			for (int j = 0; j < _numberOfCharsToKeep; j++) previousCharacters[j] = ' ';
			foreach (byte t in input)
			{
				char c = (char)t;
				if (inTextObject)
				{
					// Position the text
					if (bracketDepth == 0)
					{
						if (CheckToken(new[] { "TD", "Td" }, previousCharacters) ||
							CheckToken(new[] { "'", "T*", "\"" }, previousCharacters) ||
							CheckToken(new[] { "Tj" }, previousCharacters))
						{
							if (resultString.Length > 0)
							{
								yield return CleanupContent(resultString.ToString());
								resultString.Clear();
							}
						}
					}

					if (bracketDepth == 0 &&
						CheckToken(new string[] { "ET" }, previousCharacters))
					{
						inTextObject = false;
						if (resultString.Length > 0)
						{
							yield return CleanupContent(resultString.ToString());
							resultString.Clear();
						}
						continue;
					}

					if (c == '(' && bracketDepth == 0 && !nextLiteral)
					{
						bracketDepth = 1;
					}
					else if (c == ')' && bracketDepth == 1 && !nextLiteral)
					{
						bracketDepth = 0;
					}
					else if (bracketDepth == 1)
					{
						if (c == '\\' && !nextLiteral)
						{
							nextLiteral = true;
						}
						else
						{
							if (c == ' ')
							{
								if (resultString.Length > 0)
								{
									yield return CleanupContent(resultString.ToString());
									resultString.Clear();
								}
							}
							else if ((c >= '!' && c <= '~') ||
									 (c >= 128 && c < 255))
							{
								resultString.Append(c);
							}
							nextLiteral = false;
						}
					}
				}

				// Store the recent characters for
				// when we have to go back for a checking
				for (int j = 0; j < _numberOfCharsToKeep - 1; j++)
				{
					previousCharacters[j] = previousCharacters[j + 1];
				}
				previousCharacters[_numberOfCharsToKeep - 1] = c;

				if (!inTextObject && CheckToken(new string[] { "BT" }, previousCharacters))
				{
					inTextObject = true;
				}
			}
		}
		private string CleanupContent(string text)
		{
			string[] patterns = { @"\\\(", @"\\\)", @"\\226", @"\\222", @"\\223", @"\\224", @"\\340", @"\\342", @"\\344", @"\\300", @"\\302", @"\\304", @"\\351", @"\\350", @"\\352", @"\\353", @"\\311", @"\\310", @"\\312", @"\\313", @"\\362", @"\\364", @"\\366", @"\\322", @"\\324", @"\\326", @"\\354", @"\\356", @"\\357", @"\\314", @"\\316", @"\\317", @"\\347", @"\\307", @"\\371", @"\\373", @"\\374", @"\\331", @"\\333", @"\\334", @"\\256", @"\\231", @"\\253", @"\\273", @"\\251", @"\\221" };
			string[] replace = { "(", ")", "-", "'", "\"", "\"", "à", "â", "ä", "À", "Â", "Ä", "é", "è", "ê", "ë", "É", "È", "Ê", "Ë", "ò", "ô", "ö", "Ò", "Ô", "Ö", "ì", "î", "ï", "Ì", "Î", "Ï", "ç", "Ç", "ù", "û", "ü", "Ù", "Û", "Ü", "®", "™", "«", "»", "©", "'" };

			for (int i = 0; i < patterns.Length; i++)
			{
				string regExPattern = patterns[i];
				Regex regex = new Regex(regExPattern, RegexOptions.IgnoreCase);
				text = regex.Replace(text, replace[i]);
			}

			return text;
		}
		/// <summary>
		/// Check if a certain 2 character token just came along (e.g. BT)
		/// </summary>
		/// <param name="tokens">the searched tokens</param>
		/// <param name="recent">the recent character array</param>
		/// <returns>true if one of the tokens was just read with whitespace on both sides</returns>
		private bool CheckToken(string[] tokens, char[] recent)
		{
			foreach (string token in tokens)
			{
				if (token.Length > 1)
				{
					if ((recent[_numberOfCharsToKeep - 3] == token[0]) &&
						(recent[_numberOfCharsToKeep - 2] == token[1]) &&
						((recent[_numberOfCharsToKeep - 1] == ' ') ||
						(recent[_numberOfCharsToKeep - 1] == 0x0d) ||
						(recent[_numberOfCharsToKeep - 1] == 0x0a)) &&
						((recent[_numberOfCharsToKeep - 4] == ' ') ||
						(recent[_numberOfCharsToKeep - 4] == 0x0d) ||
						(recent[_numberOfCharsToKeep - 4] == 0x0a))
						)
					{
						return true;
					}
				}
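				else
				{
					// Single-character tokens are not matched here. Move on to the next token
					// instead of returning, so that later tokens such as "T*" still get checked.
					continue;
				}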
			}
			return false;
		}
	}
}

Then we need to wire this up to the index crawler so that the index uses this class to populate the search index with our PDF content.

We need to implement a Sitecore IComputedIndexField class to accomplish this.

	public class IndexPdfContent : IComputedIndexField
	{
		public object ComputeFieldValue(IIndexable indexable)
		{
			try
			{
				var sitecoreIndexable = indexable as SitecoreIndexableItem;

				if (sitecoreIndexable == null) return null;

				var pdfContent = new SitecorePdfParser(new MediaItem(sitecoreIndexable)).ExtractText().ToList();

				if (pdfContent.Count == 0) return null;

				return string.Join(" ", pdfContent);
			}
			catch (Exception e)
			{
				Log.Error("Unable to assemble PDF content for the search index ", e, this);
				return null;
			}
		}
	}

And finally, wire it up to the indexer:

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
	<sitecore>
		<contentSearch>
			<indexConfigurations>
				<defaultLuceneIndexConfiguration>
					<documentOptions>
						<fields hint="raw:AddComputedIndexField">
							<!-- indexes PDF contents into the _pdfcontent field to allow PDF search -->
							<field fieldName="_pdfcontent" type="[NAMESPACE].IndexPdfContent, [DLL NAME]" />
						</fields>
					</documentOptions>
				</defaultLuceneIndexConfiguration>
				<defaultSolrIndexConfiguration>
					<documentOptions>
						<fields hint="raw:AddComputedIndexField">
							<!-- indexes PDF contents into the _pdfcontent field to allow PDF search -->
							<field fieldName="_pdfcontent" type="[NAMESPACE].IndexPdfContent, [DLL NAME]" />
						</fields>
					</documentOptions>
				</defaultSolrIndexConfiguration>
				<defaultCloudIndexConfiguration>
					<documentOptions>
						<fields hint="raw:AddComputedIndexField">
							<!-- indexes PDF contents into the pdf_content field to allow PDF search -->
							<field fieldName="pdf_content" cloudFieldName="pdf_content" type="[NAMESPACE].IndexPdfContent, [DLL NAME]" />
						</fields>
					</documentOptions>
				</defaultCloudIndexConfiguration>
			</indexConfigurations>
		</contentSearch>
	</sitecore>
</configuration>

Ending Results

Now we have our search index populated with PDF contents. So if someone wants to find a PDF with a text search, it’s as simple as querying the index on the field assigned in the XML with the user’s search text.
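
As a rough sketch of what that query might look like through Sitecore's ContentSearch LINQ layer: the result class, the default index name, and the page size of 20 are assumptions for illustration. The field name must match whatever you configured (_pdfcontent for Lucene and Solr in the config above, pdf_content for the cloud index).

```csharp
using System.Collections.Generic;
using System.Linq;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.SearchTypes;

// Hypothetical result type that maps the computed field onto a property.
public class PdfSearchResultItem : SearchResultItem
{
    [IndexField("_pdfcontent")]
    public string PdfContent { get; set; }
}

public static class PdfSearchSketch
{
    // Returns up to 20 indexed items whose extracted PDF text contains the user's search text.
    public static List<PdfSearchResultItem> Search(string userText, string indexName = "sitecore_web_index")
    {
        using (IProviderSearchContext context = ContentSearchManager.GetIndex(indexName).CreateSearchContext())
        {
            return context.GetQueryable<PdfSearchResultItem>()
                .Where(r => r.PdfContent.Contains(userText))
                .Take(20)
                .ToList();
        }
    }
}
```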

Disclaimer

While this solution is quite good, it’s not perfect. If you have text inside images in a PDF, it won’t find that. Additionally, I’ve noticed that in rare cases words might be broken up when they’re being extracted. Presumably this is due to PDF formatting. If you happen to figure out how to resolve this completely, let me know and I’d love to update this code.

Sitecore RTE Button Postprocessing

There may be times that you want to modify the way stock Sitecore RTE buttons work without actually modifying stock Sitecore files.  An easy way to accomplish this is to override the Telerik editor commands manually using a custom JavaScript file.

Some common uses of this technique could include

  1. Adding classes to injected elements.
  2. Wrapping injected elements in a wrapping element.
  3. Adding a sibling html element for an icon perhaps.
  4. Modifying the markup for SEO needs.
  5. Modifying the markup to build a responsive website.

Find the operation to patch

The first thing you need to do is find the RTE command for the button you’d like to add post processing to.  The easiest way to do this is to use your browser’s inspect feature on the button you’d like to enhance.

[Image: finding the command]

The class of the span element that makes up the button is the name of the command you’re interested in.  At this point you can start writing your JavaScript.

The Javascript

var RadEditorCommandList = Telerik.Web.UI.Editor.CommandList;

var table = RadEditorCommandList["InsertTable"];
RadEditorCommandList["InsertTable"] = function (commandName, editor, args) {
	table(commandName, editor, args);
	var p = editor.getSelectedElement().parentNode.parentNode.parentNode;
	p.classList.add("editor-table");
};

This code will modify the insert table button to add a class of editor-table to the table after it’s injected.

So what are we doing here? Let’s analyze it.

  1. Get the Telerik editor command list object.  This object stores the JavaScript that drives each of the buttons in the editor.
  2. Save the original command into a custom variable called table.
  3. Replace the method attached to the Telerik editor command list with our own function.
  4. Use the Telerik editor object to get the selected element after the table is inserted and traverse up to the table node.
  5. Add a class of editor-table to the table root.

You’ll likely need to utilize the debugger to drop breakpoints down in your code and use the console to identify the correct element you’ll need to manipulate.

Having Sitecore add your javascript to the editor

There’s a simple config patch to add your javascript to the RTE editor.

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
	<sitecore>
		<clientscripts>
			<htmleditor>
				<script key="customsrc" src="/relative/path/to/customSitecore.js" language="JavaScript"/>
			</htmleditor>
		</clientscripts>
	</sitecore>
</configuration>

This method is not limited to postprocessing; you could essentially take stock methods and do whatever you please with them. Sky’s the limit, so go have fun with it.

Installing Azure Search in Sitecore

Azure Search became a new option as of Sitecore 8.2 Update 1.  It’s a great search provider with all kinds of features out of the box.  Most notably:

  • Supports Lucene query syntax for queries
  • Autocomplete and type-ahead
  • Hit highlighting (show the context in which the keywords appeared)
  • Geospatial awareness (show geographical search results to a user)

While Azure Search can be used by any install of Sitecore, it is most useful in an Azure cloud implementation.  That has also changed in 8.2 Update 1, and my friend and fellow MVP Bas Lijten wrote an amazing blog post about it.  Important note: this install can be done automatically if you’re using the Azure ARM scripts that Sitecore provides.  In other words, if your Sitecore is on Azure, make it easy on yourself and use the ARM scripts.

I’m also extremely happy to say that it’s quite easy to set up Azure Search in Sitecore.  Those of you who went through the toil of setting up Solr should be particularly pleased about that.

Get it installed

Step 1

Install Azure Search on an Azure subscription.

  • Add new
  • Web + Mobile
  • Select Azure Search
  • Click Create

[Screenshot: installing Azure Search]

Step 2

Get your access key

  • Find your new Azure Search in the resources list
  • Click Settings
  • Click Keys
  • Copy your primary key

[Screenshot: configuring Azure Search, showing the service URL and keys]

Step 3

Add connection string

  • serviceUrl – The url which Azure has given your search service, see above screenshot for where to find it (it’s blurred out, you can’t use mine!)
  • apiVersion – The version of the REST API to use for Azure Search.  You’re unlikely to need to change this right now
  • apiKey – The key you retrieved in step 2

<connectionStrings>
	<!-- Your other connection strings -->
	<add name="cloud.search" connectionString="serviceUrl=https://********.search.windows.net;apiVersion=2015-02-28;apiKey=***********************************" />
</connectionStrings>

Step 4

Configure Sitecore

  • Remove or disable
    • App_Config\Include\Sitecore.Speak.ContentSearch.Lucene.config
    • App_Config\Include\ContentTesting\Sitecore.ContentTesting.Lucene.IndexConfiguration.config
    • App_Config\Include\FXM\Sitecore.FXM.Lucene.DomainsSearch.DefaultIndexConfiguration.config
    • App_Config\Include\FXM\Sitecore.FXM.Lucene.DomainsSearch.Index.Master.config
    • App_Config\Include\FXM\Sitecore.FXM.Lucene.DomainsSearch.Index.Web.config
    • App_Config\Include\ListManagement\Sitecore.ListManagement.Lucene.Index.List.config
    • App_Config\Include\ListManagement\Sitecore.ListManagement.Lucene.IndexConfiguration.config
    • App_Config\Include\Social\Sitecore.Social.Lucene.Index.Analytics.Facebook.config
    • App_Config\Include\Social\Sitecore.Social.Lucene.Index.Master.config
    • App_Config\Include\Social\Sitecore.Social.Lucene.Index.Web.config
    • App_Config\Include\Social\Sitecore.Social.Lucene.IndexConfiguration.config
    • App_Config\Include\Sitecore.ContentSearch.Lucene.DefaultIndexConfiguration.config
    • App_Config\Include\Sitecore.ContentSearch.Lucene.DefaultIndexConfiguration.Xdb.config
    • App_Config\Include\Sitecore.ContentSearch.Lucene.Index.Analytics.config
    • App_Config\Include\Sitecore.ContentSearch.Lucene.Index.Core.config
    • App_Config\Include\Sitecore.ContentSearch.Lucene.Index.Master.config
    • App_Config\Include\Sitecore.ContentSearch.Lucene.Index.Web.config
    • App_Config\Include\Sitecore.Marketing.Definitions.MarketingAssets.Repositories.Lucene.Index.Master.config
    • App_Config\Include\Sitecore.Marketing.Definitions.MarketingAssets.Repositories.Lucene.Index.Web.config
    • App_Config\Include\Sitecore.Marketing.Definitions.MarketingAssets.Repositories.Lucene.IndexConfiguration.config
    • App_Config\Include\Sitecore.Marketing.Lucene.Index.Master.config
    • App_Config\Include\Sitecore.Marketing.Lucene.Index.Web.config
    • App_Config\Include\Sitecore.Marketing.Lucene.IndexConfiguration.config
  • Enable
    • App_Config\Include\Sitecore.ContentSearch.Azure.DefaultIndexConfiguration.config.disabled
    • App_Config\Include\Sitecore.ContentSearch.Azure.Index.Analytics.config.disabled
    • App_Config\Include\Sitecore.ContentSearch.Azure.Index.Core.config.disabled
    • App_Config\Include\Sitecore.ContentSearch.Azure.Index.Master.config.disabled
    • App_Config\Include\Sitecore.ContentSearch.Azure.Index.Web.config.disabled
    • App_Config\Include\Sitecore.Marketing.Azure.Index.Master.config.disabled
    • App_Config\Include\Sitecore.Marketing.Azure.Index.Web.config.disabled
    • App_Config\Include\Sitecore.Marketing.Azure.IndexConfiguration.config.disabled
    • App_Config\Include\Sitecore.Marketing.Definitions.MarketingAssets.Repositories.Azure.Index.Master.config.disabled
    • App_Config\Include\Sitecore.Marketing.Definitions.MarketingAssets.Repositories.Azure.Index.Web.config.disabled
    • App_Config\Include\Sitecore.Marketing.Definitions.MarketingAssets.Repositories.Azure.IndexConfiguration.config.disabled
    • App_Config\Include\ContentTesting\Sitecore.ContentTesting.Azure.IndexConfiguration.config.disabled
    • App_Config\Include\FXM\Sitecore.FXM.Azure.DomainsSearch.DefaultIndexConfiguration.config.disabled
    • App_Config\Include\FXM\Sitecore.FXM.Azure.DomainsSearch.Index.Master.config.disabled
    • App_Config\Include\FXM\Sitecore.FXM.Azure.DomainsSearch.Index.Web.config.disabled
    • App_Config\Include\ListManagement\Sitecore.ListManagement.Azure.Index.List.config.disabled
    • App_Config\Include\ListManagement\Sitecore.ListManagement.Azure.IndexConfiguration.config.disabled
    • App_Config\Include\Social\Sitecore.Social.Azure.Index.Master.config.disabled
    • App_Config\Include\Social\Sitecore.Social.Azure.Index.Web.config.disabled
    • App_Config\Include\Social\Sitecore.Social.Azure.IndexConfiguration.config.disabled

Step 5

Rebuild your index

  • From the desktop: Sitecore button -> Control Panel.  From the launchpad: click the Control Panel button
  • Indexing
  • Indexing Manager
  • Select all indexes
  • Execute

Now you’re good to go!

Sitecore doesn’t ship with wrappers for the advanced search features that Azure Search provides, so you’ll likely need to create your own search service to use these features until Sitecore has a plan to wrap them in its LINQ to search feature.

Watch out!

If you have a custom search field with a preceding underscore, you need to make sure you give it an additional attribute of cloudFieldName like you can see on Sitecore’s _templatename field.  This is because Azure Search actually doesn’t allow a field to start with an underscore.

              <field fieldName="_templatename"        cloudFieldName="templatename_1"      boost="1f" type="System.String"   settingType="Sitecore.ContentSearch.Azure.CloudSearchFieldConfiguration, Sitecore.ContentSearch.Azure" />