Azure Search replication

If you’re trying to set up a geo-replicated disaster recovery site and you’re using Azure Search, you likely ran into the same issue that I did: Azure Search simply does not have the geo-replication tools or abilities that SQL does. This is all the more frustrating because it’s just about the only PaaS element in the Sitecore ecosystem that lacks this functionality. If you don’t have the luxury of being able to re-index your data rapidly, you’re stuck waiting for the data to index, and in the context of Sitecore this can take several hours on particularly large sites.

This is also a concern for Blue/Green deployments, since customer-facing content can (and should) be included in your search index. That problem can be solved in a similar fashion: combined with a zero-downtime deployment method, the approach below gives a more complete and safer deployment.

Using an Azure Search index as a source

Any data processing you needed to do to populate your primary index can be skipped if you simply use the main Azure Search index as the source for the second one. I have this brokered through an Azure Function.
[Diagram: Azure Function replication flow]

The code for the function is as follows:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.Search;
using Microsoft.Azure.Search.Models;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Microsoft.Extensions.Logging;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

namespace BendingSitecore.Function
{
    public static class AzureSearchReplicate
    {
        [FunctionName("AzureSearchReplicate")]
        public static async Task<IActionResult> Run(
            [HttpTrigger(AuthorizationLevel.Function, "get", "post", Route = null)] HttpRequest req,
            ILogger log)
        {
            string requestBody = await new StreamReader(req.Body).ReadToEndAsync();
            dynamic data = JsonConvert.DeserializeObject(requestBody);

            // An optional "indexes" array limits the run to specific indexes;
            // when omitted, every index on the source service is replicated.
            IEnumerable<string> indexes = Enumerable.Empty<string>();
            if (data.indexes != null)
            {
                indexes = ((JArray)data.indexes).Select(x => (string)x);
            }

            try
            {
                // wait: false, so the HTTP response returns while replication
                // continues in the background.
                Start(new SearchServiceClient(data.source.ToString(), new SearchCredentials(data.sourceKey.ToString())),
                    new SearchServiceClient(data.destination.ToString(), new SearchCredentials(data.destinationKey.ToString())),
                    false, log, indexes);
            }
            catch (Exception e)
            {
                log.LogError(e, "An error occurred");
                return new BadRequestObjectResult("Require a json object with source, destination and keys.");
            }

            return new OkObjectResult("Azure Search replication is running, should be finished in about 10 minutes.");
        }

        public static void Start(SearchServiceClient source, SearchServiceClient destination, bool wait,
            ILogger log, IEnumerable<string> indexes)
        {
            List<Task> tasks = new List<Task>();
            ClearAllIndexes(destination, indexes);
            foreach (var index in source.Indexes.List().Indexes.Where(x => !indexes.Any() || indexes.Any(i => i.StartsWith(x.Name))))
            {
                tasks.Add(Task.Run(async () =>
                {
                    try
                    {
                        destination.Indexes.Get(index.Name);
                    }
                    catch (Exception)
                    {
                        // The index does not exist on the destination yet:
                        // create it with the source schema and give the service
                        // a moment before documents start flowing in.
                        log.LogInformation($"Creating index {index.Name}");
                        destination.Indexes.Create(index);
                        await Task.Delay(5000);
                    }
                    await MigrateData(source.Indexes.GetClient(index.Name),
                        destination.Indexes.GetClient(index.Name), log);
                }));
            }
            if (wait)
            {
                foreach (var task in tasks)
                {
                    task.Wait();
                }
            }
        }

        public static void ClearAllIndexes(SearchServiceClient client, IEnumerable<string> indexes)
        {
            // Delete the destination indexes up front so no stale documents
            // survive the refresh.
            foreach (var index in client.Indexes.List().Indexes.Where(x => !indexes.Any() || indexes.Any(i => i.StartsWith(x.Name))))
            {
                client.Indexes.Delete(index.Name);
            }
        }

        public static async Task MigrateData(ISearchIndexClient source, ISearchIndexClient destination,
            ILogger log)
        {
            log.LogInformation($"Starting migration of data for {source.IndexName}");
            SearchContinuationToken token = null;
            var searchParameters = new SearchParameters { Top = int.MaxValue };
            int retryCount = 0;
            while (true)
            {
                // Page through the source index; the continuation token picks
                // up wherever the previous page of results left off.
                DocumentSearchResult results;
                if (token == null)
                {
                    results = await source.Documents.SearchAsync("*", searchParameters);
                }
                else
                {
                    results = await source.Documents.ContinueSearchAsync(token);
                }
                try
                {
                    await destination.Documents.IndexAsync(IndexBatch.New(GetAction(destination, results)));
                }
                catch (Exception e)
                {
                    log.LogError(e, "Error occurred writing to destination");
                    log.LogInformation("Retrying...");
                    retryCount++;
                    if (retryCount > 10)
                    {
                        log.LogError("Giving up...");
                        break;
                    }
                    continue;
                }
                if (results.ContinuationToken != null)
                {
                    token = results.ContinuationToken;
                    continue;
                }

                break;
            }
            log.LogInformation($"Finished migrating data for {source.IndexName}");
        }

        public static IEnumerable<IndexAction> GetAction(ISearchIndexClient client, DocumentSearchResult documents)
        {
            // MergeOrUpload keeps the operation idempotent if a batch is retried.
            return documents.Results.Select(doc => IndexAction.MergeOrUpload(doc.Document));
        }
    }
}

Additionally, make sure your Azure Function app has these configuration settings:


        AzureWebJobsDashboard                    = "DefaultEndpointsProtocol=https;AccountName=$storageName;AccountKey=$accountKey"
        AzureWebJobsStorage                      = "DefaultEndpointsProtocol=https;AccountName=$storageName;AccountKey=$accountKey"
        FUNCTIONS_EXTENSION_VERSION              = "~2"
        FUNCTIONS_WORKER_RUNTIME                 = "dotnet"
        WEBSITE_NODE_DEFAULT_VERSION             = "8.11.1"
        WEBSITE_RUN_FROM_PACKAGE                 = "1"
        WEBSITE_CONTENTAZUREFILECONNECTIONSTRING = "DefaultEndpointsProtocol=https;AccountName=$storageName;AccountKey=$accountKey"
        WEBSITE_CONTENTSHARE                     = "$storageName"
        AzureWebJobsSecretStorageType            = "Files"
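
If you script your deployments, something like the following can apply those settings. This is a minimal sketch, assuming the Az.Functions PowerShell module is installed and you are logged in via Connect-AzAccount; $appName, $resourceGroup, $storageName, and $accountKey are placeholders for your own values.

# Minimal sketch: apply the settings above to an existing function app.
# $appName/$resourceGroup/$storageName/$accountKey are placeholders.
$connectionString = "DefaultEndpointsProtocol=https;AccountName=$storageName;AccountKey=$accountKey"

# Update-AzFunctionAppSetting merges these into the app's existing settings.
Update-AzFunctionAppSetting -Name $appName -ResourceGroupName $resourceGroup -AppSetting @{
    AzureWebJobsDashboard                    = $connectionString
    AzureWebJobsStorage                      = $connectionString
    FUNCTIONS_EXTENSION_VERSION              = "~2"
    FUNCTIONS_WORKER_RUNTIME                 = "dotnet"
    WEBSITE_NODE_DEFAULT_VERSION             = "8.11.1"
    WEBSITE_RUN_FROM_PACKAGE                 = "1"
    WEBSITE_CONTENTAZUREFILECONNECTIONSTRING = $connectionString
    WEBSITE_CONTENTSHARE                     = $storageName
    AzureWebJobsSecretStorageType            = "Files"
}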

Running your function

Execute the function using a raw JSON request body like so:

{
    "destination":  "[standby azure search name]",
    "destinationKey":  "[standby azure search key]",
    "source":  "[primary azure search name]",    
    "sourceKey":  "[primary azure search key]",
    "indexes":  null
}

Note: If you want to manage only particular indexes, you may pass a JSON array of index names; the function will clear and refresh only the indexes specified. If indexes is null, it clears and refreshes all indexes on the service.
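
For example, calling the function from PowerShell might look like this. It's a sketch only; the bracketed values and the function app name and key are placeholders you would swap for your own.

# Build the request body from the fields shown above.
$body = @{
    destination    = "[standby azure search name]"
    destinationKey = "[standby azure search key]"
    source         = "[primary azure search name]"
    sourceKey      = "[primary azure search key]"
    indexes        = $null
} | ConvertTo-Json

# The code query string parameter carries the function key, since the
# function uses AuthorizationLevel.Function.
Invoke-RestMethod -Method Post `
    -Uri "https://[function app name].azurewebsites.net/api/AzureSearchReplicate?code=[function key]" `
    -ContentType "application/json" `
    -Body $body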

Automating

Through PowerShell there are ways to create and execute the Azure Function given a valid Azure context, some desired names, and a resource group. Expect a full blog post on that soon; in the meantime, the sketch below shows the general shape of the provisioning side.
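
This is a minimal sketch assuming the Az PowerShell modules and an existing Connect-AzAccount login; every name below is a placeholder and error handling is omitted.

# Placeholder names throughout; adjust to your own naming conventions.
New-AzResourceGroup -Name "search-replication-rg" -Location "eastus"

# The function app needs a storage account for its runtime state.
New-AzStorageAccount -ResourceGroupName "search-replication-rg" `
    -Name "searchreplstorage" -Location "eastus" -SkuName "Standard_LRS"

# Create the function app on the .NET runtime, Functions v2 to match
# the configuration settings shown earlier.
New-AzFunctionApp -Name "search-replication-func" `
    -ResourceGroupName "search-replication-rg" `
    -StorageAccountName "searchreplstorage" `
    -Location "eastus" -Runtime "DotNet" -FunctionsVersion "2"

From there the function code can be deployed (for example, as a run-from-package zip) and invoked with the Invoke-RestMethod call shown above.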