Case study: polluted reports


Analytics reports have suspicious statistics with lower conversion rates compared to other systems. Can we find out why?

It seem that healthy data is diluted with junk/empty interactions with no value. We assume robot/crawlers activity gets recorded. Is there any OOB protection in Sitecore?

Filter out robots by user agents

Sitecore blacklists robots via a list of user agents defined in the config:


User agents that should be excluded rain or shine

Theoretically, zero interactions with these user agents should be recorded, right? Well, I do not blog about straightforward tasks. We could check the actual number of robots via huge SQL composed by replacing line break character with ',':

But still, 20% of all contacts (that have interactions) report to have robot user agents. Could it be due to the fact Sitecore uses case-sensitive match for robots, while SQL column has case-insensitive SQL_Latin1_General_CP1_CI_AS collation? While that can easily happen in your case, a cause is different in our case study:

Let’s leave this question unanswered (for now) and focus on what robot is?

How to identify robot by behavior?

Page crawler requests pages one by one without persisting any cookies – it will not have interactions with more than one page. We could try to find user agents that do not have more than one page recorded:

WITH PagesInInteractions AS(
	SELECT Pages = (
			FROM OPENJSON([Events])
				WITH ([Event] NVARCHAR(100) '$."@odata.type"') 
			WHERE [Event] = '#Sitecore.XConnect.Collection.Model.PageViewEvent'),
FROM [xdb_collection].[Interactions])
	COUNT(1) AS Hits,
	DATEDIFF(DAY, MIN(Created), MAX(LastModified)) AS [DaysBetween],  
	MIN(Created) AS [Earliest],
	MAX(LastModified) AS [Latest]
FROM PagesInInteractions 
GROUP BY [UserAgent] 
	MAX(Pages) <=1 
	AND COUNT(1) > 500

This query finds unique user agents that have viewed single-page only and have over 500 interactions:

20% of total interactions recorded system-wide belong to user agents that do not have over 1 page in visit across 500 visits. These user agents are most likely to be added into the blacklist to stop them from being tracked.

Could contacts without interactions exist?

Although that should not happen in theory… you got it:

SELECT COUNT(DISTINCT(c.ContactId)) FROM [xdb_collection].Contacts c
LEFT JOIN  [xdb_collection].Interactions i
ON i.ContactID = c.ContactId

Our case study has 7.5 % of contacts without interactions that was caused by bug.


The query we developed to locate suspicious user agents allows us to identify robots with better accuracy in future. Unfortunately, the previously collected robot sessions would still remain in system and pollute analytics reports. Needless to say that you pay for storing useless data to your hosting vendor.

In next articles we’ll try to remove the useless data from system to recover the reports.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: