Saving data (even the HTTP referrer) without validation can contaminate the system as well:
SELECT TOP (10)
[ContactId]
,[LastModified]
,[FacetData]
,JSON_QUERY(FacetData,'$.Referrers') as [Referrers]
, DATALENGTH(JSON_QUERY(FacetData,'$.Referrers')) as [ReferrerSize]
FROM
[xdb_collection].[ContactFacets]
WHERE
[FacetKey]='InteractionsCache'
AND CHARINDEX('"Referrers":["', FacetData) > 0
ORDER BY [ReferrerSize] DESC
The results show an astonishing 28 KB spent on storing a single value:
Next time you see Analytics shards worth 600 GB – recall this post.
Amazingly, there is no out-of-the-box way to see the running values!
Showconfig outputs only the sitecore node, while role:define and search:define live inside web.config. Moreover, web.config shows an outdated value when the setting was picked up from an environment variable.
environment:
SITECORE_APPSETTINGS_ROLE:DEFINE: Standalone
SITECORE_APPSETTINGS_SEARCH:DEFINE: Solr
SITECORE_APPSETTINGS_MYPROJECT.ENVIRONMENT:DEFINE: Development
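For orientation, the naming convention above implies a simple transform from environment variable names to appSettings keys. The sketch below is purely illustrative (it is not the actual Sitecore bootstrap code); the SITECORE_APPSETTINGS_ prefix is the only assumption taken from the compose file above:
using System;
using System.Collections;
using System.Collections.Generic;

static class AppSettingsFromEnvironment
{
    // Illustrative only: SITECORE_APPSETTINGS_ROLE:DEFINE -> role:define, and so on.
    private const string Prefix = "SITECORE_APPSETTINGS_";

    public static IDictionary<string, string> Read()
    {
        var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (DictionaryEntry entry in Environment.GetEnvironmentVariables())
        {
            var name = (string)entry.Key;
            if (!name.StartsWith(Prefix, StringComparison.OrdinalIgnoreCase))
            {
                continue;
            }
            // The remainder of the variable name is treated as the appSettings key.
            var key = name.Substring(Prefix.Length).ToLowerInvariant();
            result[key] = entry.Value as string;
        }
        return result;
    }
}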
Gotcha: Not all settings were applied
myproject.environment is not applied; the environment-specific config is always ON:
The container variables are there, though:
Are the environment variables visible to the Sitecore process?
Response.Write("<h3>Environment variables</h3>");
var environmentVariables = Environment.GetEnvironmentVariables();
foreach (string variable in environmentVariables.Keys)
{
    var values = new[] { environmentVariables[variable] as string };
    // Write is a small page-level helper (defined elsewhere on the page) that renders a name/value pair.
    Write(variable, values);
}
Yes, the missing setting is exposed as an environment variable:
How come the value is not picked up by Sitecore?
Short answer – that is life. Adding a dummy key into web.config makes the variable get picked up:
Adding a key with a dummy value
Using file explorer to upload the modified web.config
Page output shows custom:define now
The current implementation requires the app setting key to be present in the config file in order to be replaced at runtime. And yes, there is no way to check whether it was picked up 😉
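As a troubleshooting workaround, a small diagnostic like the sketch below (dropped into an admin page, similar to the environment-variable dump above) could cross-check what the running AppDomain actually resolves for a given key. The helper name and markup are assumptions; the ConfigurationManager and Environment calls are standard .NET:
using System;
using System.Configuration;
using System.Web;

public static class AppSettingDiagnostics
{
    // Compares the value the application resolves via ConfigurationManager
    // with the raw container environment variable for the same key.
    public static void Write(HttpResponse response, string key)
    {
        var resolved = ConfigurationManager.AppSettings[key];
        var fromEnvironment = Environment.GetEnvironmentVariable("SITECORE_APPSETTINGS_" + key.ToUpperInvariant());

        response.Write(
            $"<p>{HttpUtility.HtmlEncode(key)}: resolved='{HttpUtility.HtmlEncode(resolved)}', environment='{HttpUtility.HtmlEncode(fromEnvironment)}'</p>");
    }
}
Calling AppSettingDiagnostics.Write(Response, "role:define") from an .aspx page shows both values side by side.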
Summary
A lack of configuration-assembly traceability leads to a huge investigation effort when things do not work well. File-based configuration can no longer be trusted in containerized deployments; at the same time there is no source of truth to verify its correctness.
Can “site visit frequency from a specific place (or better, a certain company office)” be just a query away? The needed analytics data is already collected by Sitecore, hence the data mining could roughly be:
Figure out the area postal code (or reverse it from the IP using any reverse IP lookup)
Find all contacts whose details contain the same postal code (or match other field criteria)
Locate the visits made by those contacts
Aggregate the number of pages in each visit to understand the length of their journey
The IP address in our demo belongs to Dnipro with the 49000 postal code. It is recorded by Sitecore Analytics in the following manner:
The GeoIP data is a part of the InteractionsCache facet that belongs to a contact; we can find all the contacts from a postal code/city/(any condition from the picture above) with this query:
DECLARE @location NVARCHAR(20) = '49000';
DECLARE @LocationUtcShift INT = 2;
DECLARE @MinInteractionsThreshold INT = 6;
DECLARE @ShardsCount INT = 2;
WITH [ContactsInTheArea] AS(
SELECT
DISTINCT(cf.ContactId) AS [ContactId]
FROM [xdb_collection].[ContactFacets] cf
CROSS APPLY OPENJSON([FacetData], '$.PostalCodes')
WITH ([City] NVARCHAR(100) '$')
WHERE
FacetKey='InteractionsCache'
AND ISJSON(FacetData) = 1
AND [City] = @location)
SELECT COUNT(1) AS [Unique browser sessions] FROM [ContactsInTheArea]
The next step is to locate all the interactions recorded in the system (this CTE chains onto the previous one, hence the leading comma):
,[InteractionsFromTheArea] AS(
SELECT
i.InteractionId,
DATEADD (HOUR, @LocationUtcShift, i.StartDateTime) AS [StartDateTime],
DATEADD (HOUR, @LocationUtcShift, i.EndDateTime) AS [EndDateTime],
i.[Events],
Pages = (
SELECT COUNT(1)
FROM OPENJSON([Events])
WITH ([Event] NVARCHAR(100) '$."@odata.type"')
WHERE [Event] = '#Sitecore.XConnect.Collection.Model.PageViewEvent')
FROM [xdb_collection].Interactions i
INNER JOIN [ContactsInTheArea] d
ON d.[ContactId] = i.ContactId)
SELECT * FROM [InteractionsFromTheArea]
We have found all the recorded interactions performed from the location we originally set. The last step is to aggregate the statistics per day:
SELECT
CAST (i.StartDateTime AS DATE) AS [Session Time],
COUNT(1) AS [Test Sessions],
CAST(ROUND(AVG(CAST(Pages AS FLOAT)), 2) AS NUMERIC(36,2)) AS [Avg. pages viewed]
FROM [InteractionsFromTheArea] i
GROUP BY CAST (i.StartDateTime AS DATE)
HAVING COUNT(1) > (@MinInteractionsThreshold / @ShardsCount)
ORDER BY [Session Time] DESC
The last query answers how often our site was visited from the area belonging to the postal code/(the company owning the IP address):
Summary
Daily statistics of interactions (and their quality) originating from an area are a query away – impressive? Since we operated on one shard out of N, the results have to be multiplied by N to get the complete picture.
The report is built by burning CPU to parse raw JSON on every run (more data = more CPU spent). The lack of data normalization is the price paid for flexibility (the ability to track/store custom info): it introduces the need to reduce/extract/aggregate data (constantly adjusting the report data to reflect data changes) and to store it in a query-friendly format.
Analytics reports show suspicious statistics, with lower conversion rates compared to other systems. Can we find out why?
It seems that healthy data is diluted with junk/empty interactions that carry no value. We assume robot/crawler activity gets recorded. Is there any out-of-the-box protection in Sitecore?
Filter out robots by user agents
Sitecore blacklists robots via a list of user agents defined in the config:
Theoretically, zero interactions with these user agents should be recorded, right? Well, I do not blog about straightforward tasks. We could check the actual number of robot interactions via a huge SQL query composed by replacing the line-break character with ',':
Let's leave this question unanswered (for now) and focus on what a robot actually is.
How to identify robot by behavior?
A page crawler requests pages one by one without persisting any cookies, so it will never have an interaction with more than one page. We could try to find user agents that never have more than one page recorded:
WITH PagesInInteractions AS(
SELECT Pages = (
SELECT COUNT(1)
FROM OPENJSON([Events])
WITH ([Event] NVARCHAR(100) '$."@odata.type"')
WHERE [Event] = '#Sitecore.XConnect.Collection.Model.PageViewEvent'),
Created,
LastModified,
UserAgent,
InteractionId,
ContactId
FROM [xdb_collection].[Interactions])
SELECT
COUNT(1) AS Hits,
[UserAgent],
DATEDIFF(DAY, MIN(Created), MAX(LastModified)) AS [DaysBetween],
MIN(Created) AS [Earliest],
MAX(LastModified) AS [Latest]
FROM PagesInInteractions
GROUP BY [UserAgent]
HAVING
MAX(Pages) <=1
AND COUNT(1) > 500
ORDER BY COUNT(1) DESC
This query finds unique user agents that have single-page visits only and over 500 interactions:
20% of the total interactions recorded system-wide belong to user agents that never exceed 1 page per visit across 500+ visits. These user agents are prime candidates for the blacklist so that they are no longer tracked.
Could contacts without interactions exist?
Although that should not happen in theory… you got it:
SELECT COUNT(DISTINCT(c.ContactId)) FROM [xdb_collection].Contacts c
LEFT JOIN [xdb_collection].Interactions i
ON i.ContactID = c.ContactId
WHERE i.ContactId IS NULL
Our case study has 7.5% of contacts without interactions, which was caused by a bug.
Summary
The query we developed to locate suspicious user agents allows us to identify robots with better accuracy in the future. Unfortunately, the previously collected robot sessions still remain in the system and pollute analytics reports. Needless to say, you pay your hosting vendor for storing this useless data.
In the next articles we'll try to remove the useless data from the system to recover the reports.
public class Dummy
{
private readonly BaseItemManager _baseItemManager;
public Dummy(BaseItemManager itemManager)
{
_baseItemManager = itemManager;
}
public string Foo(ID id, ID fieldID)
{
// Legacy approach with static manager
// var item = Sitecore.Data.Managers.ItemManager.GetItem(id);
var item = _baseItemManager.GetItem(id);
return item[fieldID];
}
}
However, a straightforward unit test would have a long arrange section for the Sitecore entities:
public class DummyTests
{
[Theory, AutoData]
public void Foo_Gets_ItemField(ID itemId, ID fieldId, string fieldValue)
{
var itemManager = Substitute.For<BaseItemManager>();
var database = Substitute.For<Database>();
var item = Substitute.For<Item>(itemId, ItemData.Empty, database);
item[fieldId].Returns(fieldValue);
itemManager.GetItem(itemId).Returns(item);
var sut = new Dummy(itemManager);
var actual = sut.Foo(itemId, fieldId);
actual.Should().Be(fieldValue);
}
}
8 lines of code (>550 characters) to verify a single scenario is too much.
How to simplify unit testing?
A big pile of solution code is typically built around:
Locating data by identifier (GetItem API)
Processing hierarchies (Children, Parent, Axes)
Filtering based on template
Locating specific bits (accessing fields)
The dream test would contain only the meaningful logic, without the arrange hassle:
[Theory, AutoNSubstitute]
public void Foo_Gets_ItemField(FakeItem fake, [Frozen] BaseItemManager itemManager, Dummy sut, ID fieldId, string fieldValue)
{
Item item = fake.WithField(fieldId, fieldValue);
itemManager.GetItem(item.ID).Returns(item);
var actual = sut.Foo(item.ID, fieldId);
actual.Should().Be(fieldValue);
}
Better? Let's take a closer look at what has changed so that the test is only 4 lines now.
var bond = new FakeItem()
.WithName("Bond, James Bond")
.WithLanguage("EN")
.WithField(FieldIDs.Code, "007")
.WithTemplate(IDs.LicenseToKill)
.WithChild(new FakeItem())
.WithParent(_M)
.WithItemAccess()
.WithItemAxes()
.ToSitecoreItem();
public class AutoNSubstituteDataAttribute : AutoDataAttribute
{
    public AutoNSubstituteDataAttribute()
        : base(() => new Fixture().Customize(
            new CompositeCustomization(
                new DatabaseCustomization(),
                new ItemCustomization() /* ...other customizations elided... */)))
    {
    }
}
public class ItemCustomization : ICustomization
{
public void Customize(IFixture fixture)
{
fixture.Register<ID, Database, FakeItem>((id, database) => new FakeItem(id, database));
fixture.Register<FakeItem, Item>(fake => fake.ToSitecoreItem());
}
}
public class DatabaseCustomization : ICustomization
{
public void Customize(IFixture fixture)
{
fixture.Register<string, Database>(FakeUtil.FakeDatabase);
}
}
Implicit dependency warning: Sitecore.Context
Test isolation is threatened by an implicit dependency on Sitecore.Context (which is static on the surface). There are multiple solutions on the table.
A) Clean up Sitecore.Context inner storage in each test
Context properties are backed by the Sitecore.Context.Items dictionary (stored either in HttpContext.Items or in a thread-static field), which could be cleaned before/after each test execution so that a context property change does not outlive the test:
public class DummyTests: IDisposable
{
public DummyTests()
{
Sitecore.Context.Items.Clear();
}
public void Foo()
{
Sitecore.Context.Item = item;
....
}
void IDisposable.Dispose() => Sitecore.Context.Items.Clear();
}
The approach leads to a hidden-dependency burden, which is a code smell.
B) Facade Sitecore.Context behind ISitecoreContext
All the custom code could use an ISitecoreContext interface instead of Sitecore.Context so that all the needed dependencies become transparent:
interface ISitecoreContext
{
Item Item { get;set; }
Database Database { get;set; }
...
}
public class SitecoreContext: ISitecoreContext
{
public Item Item
{
get => Sitecore.Context.Item;
set => Sitecore.Context.Item = value;
}
public Database Database
{
get => Sitecore.Context.Database;
set => Sitecore.Context.Database = value;
}
...
}
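With the facade in place, the context becomes an explicit, substitutable dependency. A minimal sketch of the payoff, assuming the FakeItem/AutoNSubstitute helpers from above (DummyWithContext is a hypothetical consumer, not Sitecore code):
public class DummyWithContext
{
    private readonly ISitecoreContext _context;

    public DummyWithContext(ISitecoreContext context)
    {
        _context = context;
    }

    // Reads a field from the current context item instead of touching Sitecore.Context directly.
    public string Foo(ID fieldId) => _context.Item?[fieldId];
}

public class DummyWithContextTests
{
    [Theory, AutoNSubstitute]
    public void Foo_Reads_ContextItemField(FakeItem fake, ID fieldId, string fieldValue)
    {
        Item item = fake.WithField(fieldId, fieldValue);
        var context = Substitute.For<ISitecoreContext>();
        context.Item.Returns(item);

        var sut = new DummyWithContext(context);

        sut.Foo(fieldId).Should().Be(fieldValue);
    }
}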
Items: holds the item ID, name, parent ID and the template ID the item is based on
SharedFields: holds the item ID, field ID, and the value itself
UnversionedFields: holds the language of the value, item ID, field ID, value
VersionedFields: holds the version number, language, item ID, field ID, value
The item data is read by a query that unions all the tables and filters by an ItemID condition:
A caching layer ensures SQL is executed only when the data was not found in the cache. There are 3 main scenarios for loading item data (illustrated in the sketch below):
By item id: database.GetItem(ID) is called
Children: GetChildren is called
By template: during application start, initial items prefetch
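For orientation, these three read paths surface through the item API roughly as follows (an illustrative snippet; the "web" database name is an assumption):
// 1. By item id – one union query loads the item definition together with all of its fields.
var db = Sitecore.Configuration.Factory.GetDatabase("web");
var root = db.GetItem(Sitecore.ItemIDs.RootID);

// 2. Children – GetChildren triggers the same kind of load for every child item.
var children = root.Children;

// 3. By template – on application start, the configured prefetch warms the cache for whole templates.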
Key points
Individual fields are not selected by field ID: all fields of an item are selected at once
Items are commonly requested by ID (the dominant workload)
The query unions 4 tables using an ItemID condition
The query performs the sort on the database side
None of the tables has a primary key defined
How does the SQL Server execute query?
The default query execution plan highlights the many steps taken to read one item:
The stock query execution plan has many nodes
Unfortunately, the item-related tables do not have a primary key defined, so every request performs a RID lookup. Since the volume of reads is far greater than the number of modifications in the web and core databases, a read-workload optimization could be applied:
Defining a (non-unique) clustered index on ItemID for the field tables so that fields belonging to the same item are stored next to each other
Simplifying the ItemID condition – moving away from WHERE ID IN (SELECT …)
Reducing the volume of SQL requests
Measuring the impact
A schema-change decision must be driven by data/statistics analysis, hence we'll measure the outcome via SQL Server Profiler for the default vs optimized versions:
Duplicate the tables with suggested improvements
Ensure SQL Indexes are healthy
Restart SQL Server
Request N items from database
Clustered VS Non-Clustered
Over 3 times faster thanks to clustered indexes:
Avoid ORDER BY
The SQL Server sort can be moved into the application logic to get a ~50% speedup:
Not only is MemoryGrantInfo 0, but the Estimated Subtree Cost is also ~47% lower:
Creating SQL view
Although the view does not give any boost, it hides the implementation detail of how the item data is built:
CREATE VIEW [dbo].[ItemDataView]
AS
SELECT ItemId, [Order], Version, Language, Name, Value, TemplateID, MasterID, ParentID, Created
FROM (SELECT ID AS ItemId, 0 AS [Order], 0 AS Version, '' AS Language, Name, '' AS Value, TemplateID, MasterID, ParentID, Created
FROM dbo.Items
UNION ALL
SELECT ParentID AS ItemId, 1 AS [Order], 0 AS Version, '' AS Language, NULL AS Name, '' AS Expr1, NULL AS Expr2, NULL AS Expr3, ID, NULL
FROM dbo.Items AS Items_Parent
UNION ALL
SELECT ItemId, 2 AS [Order], 0 AS Version, '' AS Language, NULL AS Name, Value, FieldId, NULL AS Expr1, NULL AS Expr2, NULL
FROM dbo.SharedFields
UNION ALL
SELECT ItemId, 2 AS [Order], 0 AS Version, Language, NULL AS Name, Value, FieldId, NULL AS Expr1, NULL AS Expr2, NULL
FROM dbo.UnversionedFields
UNION ALL
SELECT ItemId, 2 AS [Order], Version, Language, NULL AS Name, Value, FieldId, NULL AS Expr1, NULL AS Expr2, NULL
FROM dbo.VersionedFields) AS derivedtbl_1
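Circling back to the "Avoid ORDER BY" step: once the view exists, the sort can live in application code instead of in SQL. The sketch below is illustrative (plain ADO.NET, a subset of the view columns); the point is that ORDER BY disappears from the query and a cheap in-memory sort takes its place:
using System;
using System.Collections.Generic;
using System.Data.SqlClient;
using System.Linq;

public static class ItemDataReader
{
    public static IReadOnlyList<(Guid ItemId, int Order, int Version, string Language)> Read(string connectionString, Guid itemId)
    {
        var rows = new List<(Guid, int, int, string)>();
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand("SELECT ItemId, [Order], Version, Language FROM ItemDataView WHERE ItemId = @ID", connection))
        {
            command.Parameters.AddWithValue("@ID", itemId);
            connection.Open();
            using (var reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    rows.Add((reader.GetGuid(0), reader.GetInt32(1), reader.GetInt32(2), reader.GetString(3)));
                }
            }
        }

        // The sort that used to be ORDER BY in SQL now runs over a handful of in-memory rows.
        return rows.OrderBy(r => r.Item2).ToList();
    }
}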
Simplifying the condition to select items
The stock query returns item fields only if the item definition exists:
The query can be optimized for the mainstream scenario (the item data exists) and directly stream the content of the field tables. The application can filter out rows without definitions later on:
Theoretical: Stock vs Optimized
The optimized query is 7.3 times faster than the stock one:
Reduce the volume of SQL Queries
The final query streams data from the tables in the fastest possible way, turning request overhead (such as network latency) into the top wall-clock time consumer. The volume of requests could be reduced by loading not only the item by ID, but also its children:
SELECT * FROM
[ItemDataView] d
JOIN
[Items] cond
ON [d].ItemId = [cond].ID
WHERE (cond.ID = @ID OR cond.ParentID=@ID)
Practice: Testing variations
We will load all the items from the Sitecore database:
var item = db.GetItem(Sitecore.ItemIDs.RootID);
System.GC.Collect(System.GC.MaxGeneration,System.GCCollectionMode.Forced, true, true);
Sitecore.Caching.CacheManager.ClearAllCaches();
var ticksBefore = Sitecore.Diagnostics.HighResTimer.GetTick();
var items = item.Axes.GetDescendants();
var msTaken = Sitecore.Diagnostics.HighResTimer.GetMillisecondsSince(ticksBefore);
The results are measured by SQL Server Profiler and aggregated to get AVG values:
View top metrics
Test combinations
Stock query as a base line
Clustered index only
NS: Clustered index without sort
+KIDS: Clustered index without sort + loading children
InMemory tables for all item-related tables
Symbiosis: InMemory for items + clustered for fields table
Results: Over 30% speedup
The results highlight that clustered indexes without sort (NS) are only 10% faster
Loading children together with the item itself is the winner:
30% faster on a local machine; an even greater win in a distributed environment
18% fewer SQL queries
25% less CPU spent
35% fewer reads
The item fetch was improved thanks to understanding how the system operates with its data, so SQL Server can handle a bigger load at no additional cost.
Traverse the full collection content (a linked list stored in scattered memory locations = poor data locality) and copy all the elements into a new List<T>()
Unfreeze the collection so that other threads can make a copy
Not only can just one thread at a time make a snapshot of the collection, but every enumeration attempt allocates to produce that snapshot/copy. Should the enumerator be used often (e.g. parsing every field in SOLR search results), it bubbles up into the top 5 dead types in production:
Over 285K arrays are to be cleaned up by the GC
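The allocation cost is easy to reproduce outside Sitecore – enumerating a ConcurrentBag is documented to take a moment-in-time snapshot of its content. A minimal stand-alone repro (not Sitecore code):
using System.Collections.Concurrent;
using System.Linq;

var bag = new ConcurrentBag<int>(Enumerable.Range(0, 1000));

for (var i = 0; i < 10000; i++)
{
    // Every enumeration (foreach/LINQ) first copies the bag into a fresh snapshot,
    // so 10K passes produce 10K short-lived allocations for the GC to collect.
    var max = bag.Max();
}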
The default Sitecore.ContentSearch.SolrProvider.SolrFieldMap class uses a ConcurrentBag to store SolrFieldConfiguration entries – every GetFieldConfiguration API call ends with allocations and system-wide locking:
A concurrent bag attempts to make a snapshot, but cannot as it is already locked by a different thread
This leads to a bottleneck in a multi-threaded environment:
Lock contention while parsing the SOLR response
Although SOLR can reply to concurrent requests quickly, the result parsing on the Sitecore side could slow us down.
Benchmark: Measuring stock operation performance
private readonly XmlDocument config;
private readonly SolrFieldMap _fieldMap;

public SolrFieldMapTests()
{
    config = new XmlDocument();
    config.Load(@"E:\fieldMap_demo.config");

    // TestFactory, ComparerFactoryEx and ServiceProviderEx are test doubles defined elsewhere in the benchmark project.
    var factory = new TestFactory(new ComparerFactoryEx(), new ServiceProviderEx());
    _fieldMap = factory.CreateObject(config.DocumentElement, assert: true) as SolrFieldMap;
}

public const int N = 10 * 1000;

[Benchmark]
public void Stock_GetFieldConfiguration()
{
    // 'type' is the field type being resolved; it is defined elsewhere in the benchmark class.
    for (int i = 0; i < N; i++)
    {
        _fieldMap.GetFieldConfiguration(type);
    }
}
Almost 9 MB spent to locate 10K fields:
That is only for 10K elements
Not only is a snapshot produced, but the stock logic also executes sorting on each call (instead of once during load). Can it be done better? Yes.
Solution 1: Use IConstructable interface
Since fields are defined in the fieldMap section of the Sitecore Solr configuration, adds appear to be called only during object construction. The IConstructable interface could have been implemented on the field map to transform the data from the ConcurrentBag into an array.
That would allow multiple threads to execute simultaneously and would save memory allocations since no snapshots are needed.
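A rough sketch of that idea follows. The IConstructable member is simplified/assumed here; the essence is freezing the ConcurrentBag into a plain array once the configuration section has been fully read:
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using Sitecore.ContentSearch.SolrProvider;

// Sketch only: collect configurations during config parsing, then publish them as an immutable array.
public class FrozenSolrFieldMap /* : SolrFieldMap, IConstructable – members simplified */
{
    private readonly ConcurrentBag<SolrSearchFieldConfiguration> _loading = new ConcurrentBag<SolrSearchFieldConfiguration>();
    private SolrSearchFieldConfiguration[] _frozen = Array.Empty<SolrSearchFieldConfiguration>();

    public void AddTypeMatch(SolrSearchFieldConfiguration configuration) => _loading.Add(configuration);

    // Assumed to run once, after all typeMatch entries have been registered from config.
    public void Construct()
    {
        _frozen = _loading.OrderByDescending(e => e.FieldNameFormat).ToArray();
    }

    // Readers get the frozen array: no locks, no snapshots, no per-call sorting.
    public IReadOnlyCollection<SolrSearchFieldConfiguration> GetAvailableTypes() => _frozen;
}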
Solution 2: Use lock-free synchronization
A field configuration is added via the AddTypeMatch method, driven by configuration:
<fieldMap type="Sitecore.ContentSearch.SolrProvider.SolrFieldMap, Sitecore.ContentSearch.SolrProvider">
<!-- This element must be first -->
<typeMatches hint="raw:AddTypeMatch">
<typeMatch type="System.Collections.Generic.List`1[System.Guid]" typeName="guidCollection" fieldNameFormat="{0}_sm" multiValued="true" settingType="Sitecore.ContentSearch.SolrProvider.SolrSearchFieldConfiguration, Sitecore.ContentSearch.SolrProvider" />
private volatile SolrSearchFieldConfiguration[] availableTypes = Array.Empty<SolrSearchFieldConfiguration>();
public void AddTypeMatch(string typeName, Type settingType, IDictionary<string, string> attributes, XmlNode configNode)
{
Assert.ArgumentNotNullOrEmpty(typeName, "typeName");
Assert.ArgumentNotNull(settingType, "settingType");
var solrSearchFieldConfiguration = (SolrSearchFieldConfiguration)ReflectionUtility.CreateInstance(settingType, typeName, attributes, configNode);
Assert.IsNotNull(solrSearchFieldConfiguration, $"Unable to create : {settingType}");
typeMap[typeName] = solrSearchFieldConfiguration;
SolrSearchFieldConfiguration[] snapshot;
SolrSearchFieldConfiguration[] updated;
do
{
snapshot = availableTypes; // store original pointer
updated = new SolrSearchFieldConfiguration[snapshot.Length + 1];
Array.Copy(snapshot, 0, updated, 0, snapshot.Length);
updated[snapshot.Length] = solrSearchFieldConfiguration;
updated = updated.OrderByDescending(e => e.FieldNameFormat).ToArray();
}
while (Interlocked.CompareExchange(ref availableTypes, updated, snapshot) != snapshot);
}
public IReadOnlyCollection<SolrSearchFieldConfiguration> GetAvailableTypes() => availableTypes;
It copies the existing array content into a new one, placing the additional value next to it. We also do the sorting here, once, instead of per call.
Since availableTypes is treated as an immutable collection, it is enough to verify only the array pointer value.
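For completeness, the read side that benefits can be reduced to a plain loop over the published array (the matching predicate is left abstract here, since the real rules live inside GetFieldConfiguration):
public SolrSearchFieldConfiguration FindFirst(Func<SolrSearchFieldConfiguration, bool> predicate)
{
    var snapshot = availableTypes; // a single volatile read: a stable, immutable view for this call
    foreach (var candidate in snapshot)
    {
        if (predicate(candidate))
        {
            return candidate;
        }
    }
    return null;
}
Concurrent AddTypeMatch calls simply publish a new array; in-flight readers keep iterating the old one.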
Benchmark: Array vs ConcurrentBag
Since the updated version neither causes memory allocations, nor sorts, nor jumps between pointers (good locality), it gets over a hundred times faster with 30 times less memory allocated:
Conclusion
Using a concurrent collection in the wrong manner can slow down code over 100 times.
The misuse is quite hard to detect on a development machine, as nothing is obviously slow. It gets even trickier to detect when the code sits next to an out-of-process resource that is always blamed for slow performance.
Would you, as a developer, allow a setting that can make the system 15,550 times slower?
I've received a few memory dumps with high CPU; each one is scavenging the AccessResultCache:
How big is the cache if every snapshot contains this operation?
Detecting cache size from the snapshot
A ClrMD code snippet locates objects in the Sitecore.Caching.Generics.Cache namespace with cache-specific fields and shows only the filled caches:
using (DataTarget dataTarget = DataTarget.LoadCrashDump(snapshot))
{
ClrInfo runtimeInfo = dataTarget.ClrVersions[0];
ClrRuntime runtime = runtimeInfo.CreateRuntime();
var heap = runtime.Heap;
var stats = from o in heap.EnumerateObjects()
let t = heap.GetObjectType(o)
where t != null && t.Name.StartsWith("Sitecore.Caching.Generics.Cache")
let box = t.GetFieldByName("box")
where box != null
let name = o.GetStringField("name")
let maxSize = o.GetField<long>("maxSize")
let actualBox = o.GetObjectField("box")
let currentSize = actualBox.GetField<long>("currentSize")
where maxSize > 0
where currentSize > 0
let ratio = Math.Round(100 * ((double)currentSize / maxSize), 2)
where ratio > 40
orderby ratio descending, currentSize descending
select new
{
name,
address = o.Address.ToString("X"),
currentSize = MainUtil.FormatSize(currentSize, false),
maxSize = MainUtil.FormatSize(maxSize, false),
ratio
};
foreach (var stat in stats)
{
Console.WriteLine(stat);
}
}
There are 5 caches that are running out of space, and AccessResultCache is one of them, with a 282 MB running size vs 300 MB allowed:
The Caching.CacheKeyIndexingEnabled.AccessResultCache setting controls how cache is scavenged:
Using indexed storage for cache keys can in certain scenarios significantly reduce the time it takes to perform partial cache clearing of the AccessResultCache. This setting is useful on large solutions where the size of this cache is very large and where partial cache clearing causes a measurable overhead.
Sitecore.Caching.AccessResultCache.IndexedCacheKeyContainer is plugged in when cache key indexing is enabled. The index is updated whenever an element is added so that all elements belonging to an item can be located easily – a slightly higher price for adding an element in exchange for a faster scavenge.
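Conceptually (this is not the actual Sitecore implementation), the index trades slightly more bookkeeping on every add for a targeted lookup of all keys that belong to an item during scavenging:
using System;
using System.Collections.Concurrent;
using System.Linq;

// Conceptual sketch: a secondary index from entity (item) id to the cache keys that reference it,
// so a partial clear does not have to scan every key in the cache.
public class EntityKeyIndex<TKey>
{
    private readonly ConcurrentDictionary<string, ConcurrentDictionary<TKey, byte>> _byEntity =
        new ConcurrentDictionary<string, ConcurrentDictionary<TKey, byte>>(StringComparer.OrdinalIgnoreCase);

    // Called on every cache add: a slightly more expensive write path...
    public void Add(string entityId, TKey key) =>
        _byEntity.GetOrAdd(entityId, _ => new ConcurrentDictionary<TKey, byte>()).TryAdd(key, 0);

    // ...in exchange for a cheap, targeted scavenge by item instead of a full key scan.
    public TKey[] GetKeys(string entityId) =>
        _byEntity.TryGetValue(entityId, out var keys) ? keys.Keys.ToArray() : Array.Empty<TKey>();

    public void Remove(string entityId) => _byEntity.TryRemove(entityId, out _);
}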
What is the performance with different setting values?
We’ll do a series of Benchmark.NET runs to cover the scenario:
Mimic AccessResultCache inner store & load keys into it
Trigger logic to remove element with & without index
Measure how fast elements are added with & without index
Measure speed for different sizes
Load keys into AccessResultCache inner storage
The default storage is a ConcurrentDictionary; the cleanup runs a predicate against every cache key:
private readonly ConcurrentDictionary<FasterAccessResultCacheKey, string> fullCache = new ConcurrentDictionary<FasterAccessResultCacheKey, string>();
private readonly IndexedCacheKeyContainer fullIndex = new IndexedCacheKeyContainer();

public AccessResultCacheCleanup()
{
    // 'keys' is the pre-generated set of cache keys loaded for the benchmark.
    foreach (var key in keys)
    {
        fullCache.TryAdd(key, key.EntityId);
        fullIndex.UpdateIndexes(key);
    }
}

private void StockRemove(ConcurrentDictionary<FasterAccessResultCacheKey, string> cache)
{
    // Stock behavior: scan every key and collect the ones that belong to the entity being scavenged ('keyToRemove').
    var keys = cache.Keys;
    var toRemove = new List<FasterAccessResultCacheKey>();
    foreach (var key in keys)
    {
        if (key.EntityId == keyToRemove)
        {
            toRemove.Add(key);
        }
    }
    foreach (var key in toRemove)
    {
        cache.TryRemove(key, out _);
    }
}

public void RemoveViaIndex(ConcurrentDictionary<FasterAccessResultCacheKey, string> cache, IndexedCacheKeyContainer index)
{
    // Indexed behavior: ask the index for all keys of the entity and remove only those.
    var key = new FasterAccessResultCacheKey(null, null, null, keyToRemove, null, true, AccountType.Unknown, PropagationType.Unknown);
    var keys = index.GetKeysByPartialKey(key);
    foreach (var toRemove in keys)
    {
        cache.TryRemove(toRemove, out _);
    }
    index.RemoveKeysByPartialKey(key);
}
Measuring add performance
Index maintenance needs additional effort, hence the add speed should also be tested:
[Benchmark]
public void CostOfAdd_IndexOn()
{
    var cache = new ConcurrentDictionary<FasterAccessResultCacheKey, string>();
    var index = new IndexedCacheKeyContainer();
    long size = 0;
    foreach (var key in Keys)
    {
        // With indexing enabled, every add pays for both the dictionary insert and the index update.
        cache.TryAdd(key, key.EntityId);
        index.UpdateIndexes(key);
        size += key.GetDataLength();
    }
}

[Benchmark]
public void CostOfAdd_WithoutIndex()
{
    var cache = new ConcurrentDictionary<FasterAccessResultCacheKey, string>();
    long size = 0;
    foreach (var key in Keys)
    {
        cache.TryAdd(key, key.EntityId);
        size += key.GetDataLength();
    }
}
Taking into account different cache sizes
The configured 300 MB is 7.5 times larger than the default cache value (40 MB in 9.3), so it makes sense to measure timings for different key counts as well (58,190 keys = 282 MB):
The stock configuration fits somewhere near ~8.4K entries
Understanding the results
Removing an element without the index takes 15,550 times longer
An attempt to remove an element costs ~400 KB of memory pressure
A single removal takes 3.8 ms on an idle system with a 4.8 GHz CPU
A production solution in the cloud (constant cache hits) would take ~4 times longer
Up to 8.4K entries can squeeze into the OOB AccessResultCache size
OOB Sitecore has ~6K items in the master database
~25.4K items live in the OOB core database
Each user has their own access entries
Adding an element into the cache with the index costs 15 times more
Conclusions
The AccessResultCache aims to avoid repeating CPU-intensive operations. Unfortunately, the default cache size is too small, so only a limited number of entries can be stored at once (fewer than the items in the OOB master & web databases). The insufficient cache size shows up even on a development machine:
However, defining a production-ready size leads to a ~15,540 times higher performance penalty during cache scavenging with the OOB configuration = potential for a random lag.
A single configuration change (enabling cache key indexing) changes the situation drastically and brings up a few rhetorical questions:
Is there any reason for the AccessResultCache to be scavenged when the security field was not modified? To me – no.
Is there any use case for disabling cache key indexing in a production system with a large cache?
What is the purpose of a switch that slows the system down 15.5K times?
Should the system pick a different strategy based on the configured size & server role?
Summary
The stock Caching.AccessResultCacheSize value is too small for production; increase it at least 5 times (so that scavenge messages are no longer seen in the logs)
Enable Caching.CacheKeyIndexingEnabled.AccessResultCache to avoid useless performance penalties during scavenging
I'll ask you to add ~30K useless hashtable lookups to each request in your application. Even with 40 requests running concurrently (30K * 40 = 1.2M lookups), the performance price would not be visible to the naked eye on modern servers.
Would that argument convince you to waste power you pay for? I hope not.
Why could that happen in real life?
The reason we look at today is a lack of respect for the code's mainstream execution path.
A pure function with a single argument is called with the same value almost all the time. It looks like an obvious candidate for caching the result. To make the story a bit more intriguing – a cache is already in place.
Sitecore.Security.AccessControl.AccessRight ships a set of well-known access rights (e.g. ItemRead, ItemWrite). A right is built from its name via a set of 'proxy' classes:
AccessControl.AccessRightManager – the legacy static manager, called first
Abstractions.BaseAccessRightManager – the call is redirected to the abstraction
AccessRightProvider – locates the access right by name
ConfigAccessRightProvider is the default implementation of AccessRightProvider, with a Hashtable (name -> AccessRight) storing all the known access rights mentioned in Sitecore.config:
<accessRights defaultProvider="config">
<providers>
<clear />
<add name="config" type="Sitecore.Security.AccessControl.ConfigAccessRightProvider, Sitecore.Kernel" configRoot="accessRights" />
</providers>
<rights defaultType="Sitecore.Security.AccessControl.AccessRight, Sitecore.Kernel">
<add name="field:read" comment="Read right for fields." title="Field Read" />
<add name="field:write" comment="Write right for fields." title="Field Write" modifiesData="true" />
<add name="item:read" comment="Read right for items." title="Read" />
Since CD servers never modify items on their own, rights that modify data are rarely requested, so the major pile of hashtable lookups inside AccessRightProvider most likely targets *:read rights.
Assumption: CD servers have a dominant read workload
The assumption can be verified by building statistics of the requested accessRightName values:
public class ConfigAccessRightProviderEx : ConfigAccessRightProvider
{
private readonly ConcurrentDictionary<string, int> _byName = new ConcurrentDictionary<string, int>();
private int hits;
public override AccessRight GetAccessRight(string accessRightName)
{
_byName.AddOrUpdate(accessRightName, s => 1, (s, i) => ++i);
Interlocked.Increment(ref hits);
return base.GetAccessRight(accessRightName);
}
}
90% of the calls on the Content Delivery role target item:read, as predicted:
item:read gets ~80K calls during startup + ~30K on each page request in a local sandbox.
Optimizing for straightforward scenario
Since 9 out of 10 calls request item:read, we can return that value straight away without doing a hashtable lookup:
public class ConfigAccessRightProviderEx : ConfigAccessRightProvider
{
public new virtual void RegisterAccessRight(string accessRightName, AccessRight accessRight)
{
base.RegisterAccessRight(accessRightName, accessRight);
}
}
public class SingleEntryCacheAccessRightProvider : ConfigAccessRightProviderEx
{
private AccessRight _read;
public override void RegisterAccessRight(string accessRightName, AccessRight accessRight)
{
base.RegisterAccessRight(accessRightName, accessRight);
if (accessRight.Name == "item:read")
{
_read = accessRight;
}
}
public override AccessRight GetAccessRight(string accessRightName)
{
if (string.Equals(_read.Name, accessRightName, System.StringComparison.Ordinal))
{
return _read;
}
return base.GetAccessRight(accessRightName);
}
}
All the AccessRights known to the system could be copied from the Sitecore config; an alternative is to fetch them from a memory snapshot:
private static void SaveAccessRights()
{
using (DataTarget dataTarget = DataTarget.LoadCrashDump(snapshot))
{
ClrInfo runtimeInfo = dataTarget.ClrVersions[0];
ClrRuntime runtime = runtimeInfo.CreateRuntime();
var accessRightType = runtime.Heap.GetTypeByName(typeof(Sitecore.Security.AccessControl.AccessRight).FullName);
var accessRights = from o in runtime.Heap.EnumerateObjects()
where o.Type?.MetadataToken == accessRightType.MetadataToken
let name = o.GetStringField("_name")
where !string.IsNullOrEmpty(name)
let accessRight = new AccessRight(name)
select accessRight;
var allKeys = accessRights.ToArray();
var content = JsonConvert.SerializeObject(allKeys);
File.WriteAllText(storeTo, content);
}
}
public static AccessRight[] ReadAccessRights()
{
var content = File.ReadAllText(storeTo);
return JsonConvert.DeserializeObject<AccessRight[]>(content);
}
The test code should simulate a workload similar to real life (90% of hits for item:read and 10% for the others):
public class AccessRightLocating
{
private const int N = (70 * 1000) + (40 * 10 * 1000);
private readonly ConfigAccessRightProviderEx stock = new ConfigAccessRightProviderEx();
private readonly ConfigAccessRightProviderEx improved = new SingleEntryCacheAccessRightProvider();
private readonly string[] accessPattern;
public AccessRightLocating()
{
var accessRights = Program.ReadAccessRights();
string otherAccessRightName = null;
string readAccessRightName = null;
foreach (var accessRight in accessRights)
{
stock.RegisterAccessRight(accessRight.Name, accessRight);
improved.RegisterAccessRight(accessRight.Name, accessRight);
if (readAccessRightName is null && accessRight.Name == "item:read")
{
readAccessRightName = accessRight.Name;
}
else if (otherAccessRightName is null && accessRight.Name == "item:write")
{
otherAccessRightName = accessRight.Name;
}
}
accessPattern = Enumerable
.Repeat(readAccessRightName, count: 6)
.Concat(new[] { otherAccessRightName })
.Concat(Enumerable.Repeat(readAccessRightName, count: 3))
.ToArray();
}
[Benchmark(Baseline = true)]
public void Stock()
{
for (int i = 0; i < N; i++)
{
var toRead = accessPattern[i % accessPattern.Length];
var restored = stock.GetAccessRight(toRead);
}
}
[Benchmark]
public void Improved()
{
for (int i = 0; i < N; i++)
{
var toRead = accessPattern[i % accessPattern.Length];
var restored = improved.GetAccessRight(toRead);
}
}
}
The Benchmark.NET test proves the assumption with astonishing results – over a 3x speedup:
Conclusion
The performance was improved over 3.4 times by bringing respect for the mainstream scenario – the item:read operation. While a minor win at the single-operation scale, it becomes noticeable as the number of invocations grows.