Performance crime: concurrent collections misuse

Concurrent collections are expected to be slower than non-concurrent counterparts due to an extra cost of synchronization across threads. Even though collections implement IEnumerable interface, it is not a usual enumeration but a moment-in-time snapshot with a few pitfalls. Let’s look at ConcurrentBag enumerable implementation:

  1. Freeze the collection by locking a top-level lock and all low-granularity locks
  2. Traverse full collection content (linked list stored in different memory locations = poor data locality) and copy all the elements into new List<T>()
  3. Unfreeze the collection so that other threads can make a copy

Not only one thread at a time can make a snapshot of the collection, but every enumeration attempt makes an allocation to produce a snapshot/copy. Should the enumerator be used often (f.e. parsing every field in SOLR search results) it would bubble in top 5 dead types in production:

Over 285K arrays are to be cleaned up by GC

The default Sitecore.ContentSearch.SolrProvider.SolrFieldMap class uses ConcurrentBag to store SolrFieldConfiguration – every GetFieldConfiguration API call ends with allocations and system-wide locking:

Concurrent bag attempts to make a snapshot, but cannot as already locked by a different thread

Leading to a bottleneck in multi-threaded environment:

Lock contention during parsing SOLR response

Despite SOLR can reply to concurrent requests in a fast manner, the result parsing on Sitecore side could slow us down.

Benchmark: Measuring stock operation performance

        public SolrFieldMapTests()
        {
            confg = new XmlDocument();
            confg.Load(@"E:\fieldMap_demo.config");

            var factory = new TestFactory(new ComparerFactoryEx(), new ServiceProviderEx());
            _fieldMap = factory.CreateObject(confg.DocumentElement, assert: true) as SolrFieldMap;
        }

        public const int N = 10 * 1000;        

        [Benchmark]
        public void Stock_GetFieldConfiguration()
        {
            for (int i = 0; i < N; i++)
            {
                _fieldMap.GetFieldConfiguration(type);
            }
        }

Almost 9MB spent to locate 10K fields:

That is only for 10K elements

Not only a snapshot is produced, but also stock logic would execute sorting on each execution (instead of once during load). Can it be done better? Yes.

Solution 1: Use IConstructable interface

Since fields are defined in fieldMap section of the Sitecore Solr configuration, it seems adds are called only during object construction. IConstructable interface could have been implemented for the FieldMap to transform data from ConcurrentBag into array.

That would allow multiple threads to be executed simultaneously and save memory allocations since no snapshots are needed.

Solution 2: Use lock-free synchronization

Field configuration is added via AddTypeMatch method defined by configuration:

      <fieldMap type="Sitecore.ContentSearch.SolrProvider.SolrFieldMap, Sitecore.ContentSearch.SolrProvider">
          <!--  This element must be first  -->
          <typeMatches hint="raw:AddTypeMatch">
            <typeMatch type="System.Collections.Generic.List`1[System.Guid]" typeName="guidCollection" fieldNameFormat="{0}_sm" multiValued="true" settingType="Sitecore.ContentSearch.SolrProvider.SolrSearchFieldConfiguration, Sitecore.ContentSearch.SolrProvider" />

We could bake lock-free compare & swap solution:

private volatile SolrSearchFieldConfiguration[] availableTypes = Array.Empty<SolrSearchFieldConfiguration>();

        public void AddTypeMatch(string typeName, Type settingType, IDictionary<string, string> attributes, XmlNode configNode)
        {
            Assert.ArgumentNotNullOrEmpty(typeName, "typeName");
            Assert.ArgumentNotNull(settingType, "settingType");
            var solrSearchFieldConfiguration = (SolrSearchFieldConfiguration)ReflectionUtility.CreateInstance(settingType, typeName, attributes, configNode);
            Assert.IsNotNull(solrSearchFieldConfiguration, $"Unable to create : {settingType}");
            typeMap[typeName] = solrSearchFieldConfiguration;

            SolrSearchFieldConfiguration[] snapshot;
            SolrSearchFieldConfiguration[] updated;
            do
            {
                snapshot = availableTypes; // store original pointer
                updated = new SolrSearchFieldConfiguration[snapshot.Length + 1];
                Array.Copy(snapshot, 0, updated, 0, snapshot.Length);
                updated[snapshot.Length] = solrSearchFieldConfiguration;

                updated = updated.OrderByDescending(e => e.FieldNameFormat).ToArray();
            }
            while (Interlocked.CompareExchange(ref availableTypes, updated, snapshot) != snapshot);
        }

public IReadOnlyCollection<SolrSearchFieldConfiguration> GetAvailableTypes() => availableTypes;

It copies the existing array content into a new one placing it next to an additional value. We’ll also do the sorting here once instead of per-call.

Since availableTypes is treated as immutable collection, it is enough only to verify array pointer value.

Benchmark: Array vs ConcurrentBag

Since updated version neither causes memory allocations, nor has sorting, nor jumps between pointers (good locality), it gets over hundred times faster with 30 times less memory allocated:

Conclusion

Concurrent collection usage in a wrong manner could slow down code over 100 times.

A misuse is quite hard to detect on a development machine as nothing obvious is slow. It gets even trickier to detect in case code is sitting next to out-proc resource that is always blamed for slow performance.

2 thoughts on “Performance crime: concurrent collections misuse

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: