Why an upper limit should exist for every saved bit

"Case study: polluted reports" shows how a system can be polluted with dummy data.

Saving data (even the HTTP referer) without validation can contaminate the system as well:

SELECT TOP (10)
    [ContactId]
    ,[LastModified]
    ,[FacetData]
    ,JSON_QUERY(FacetData, '$.Referrers') AS [Referrers]
    ,DATALENGTH(JSON_QUERY(FacetData, '$.Referrers')) AS [ReferrerSize]
FROM
    [xdb_collection].[ContactFacets]
WHERE
    [FacetKey] = 'InteractionsCache'
    AND CHARINDEX('"Referrers":["', FacetData) > 0
ORDER BY [ReferrerSize] DESC

The results show an astonishing 28 KB used to store a single value.
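
One way to enforce such an upper limit is to validate and cap the referrer before it is persisted. The following is only a minimal Python sketch of the idea, not how the platform itself handles referrers: the function name `sanitize_referrer`, the 2048-byte limit, and the choice to drop the query string and fragment are all assumptions made for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical upper limit in bytes; pick a value that fits your own data.
MAX_REFERRER_BYTES = 2048


def sanitize_referrer(raw, max_bytes=MAX_REFERRER_BYTES):
    """Return a referrer that is safe to store, or None if it should be dropped."""
    if not raw:
        return None
    try:
        parts = urlsplit(raw.strip())
    except ValueError:
        # urlsplit raises ValueError for some malformed URLs (e.g. a broken IPv6 host).
        return None
    # Keep only http(s) URLs that have a host; anything else is treated as junk.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return None
    # Assumption: the query string and fragment are not needed for reporting,
    # so they are dropped before the size check.
    trimmed = f"{parts.scheme}://{parts.netloc}{parts.path}"
    # Enforce the limit on the encoded size, since storage is counted in bytes.
    if len(trimmed.encode("utf-8")) > max_bytes:
        return None
    return trimmed
```

With this sketch, a referrer such as `https://example.com/page?q=` followed by 30,000 characters is stored as `https://example.com/page`, while a non-HTTP value or a path that alone exceeds the limit is rejected. Whether to reject or truncate oversized values is a design choice; rejecting avoids storing a half-valid URL.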

The next time you see Analytics shards worth 600 GB, recall this post.
