Can “site visit frequency from a specific place (or better, from a certain company office)” be just a query away? The required analytics data is already collected by Sitecore, so the data mining could roughly be:
- Figure out the area's postal code (or reverse it from the IP address using any reverse IP lookup service)
- Find all contacts whose details carry that postal code (or match any other field criteria)
- Locate the visits made by those contacts
- Aggregate the number of pages in each visit to understand the length of their journey
The IP address in our demo belongs to Dnipro, postal code 49000. It is recorded by Sitecore Analytics in the following manner:
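The relevant fragment of the facet JSON roughly looks like this (the shape is inferred from the `OPENJSON` paths used in the queries below; the real facet carries more GeoIP fields):

```json
{
  "PostalCodes": [
    "49000"
  ]
}
```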
The GeoIP data is part of the InteractionsCache facet that belongs to a contact; we can find all the contacts matching a postal code/city/(any condition from the picture above) with the following query:
```sql
DECLARE @location NVARCHAR(20) = '49000';
DECLARE @LocationUtcShift INT = 2;
DECLARE @MinInteractionsThreshold INT = 6;
DECLARE @ShardsCount INT = 2;

WITH [ContactsInTheArea] AS (
    SELECT DISTINCT(cf.ContactId) AS [ContactId]
    FROM [xdb_collection].[ContactFacets] cf
    CROSS APPLY OPENJSON([FacetData], '$.PostalCodes')
        WITH ([City] NVARCHAR(100) '$')
    WHERE FacetKey = 'InteractionsCache'
      AND ISJSON(FacetData) = 1
      AND [City] = @location
)
SELECT COUNT(1) AS [Unique browser sessions]
FROM [ContactsInTheArea]
```
The next step is to locate all the interactions recorded in the system for those contacts:
```sql
-- appended (with a comma) to the WITH chain of the previous query,
-- replacing its final SELECT
,[InteractionsFromTheArea] AS (
    SELECT i.InteractionId,
        DATEADD(HOUR, @LocationUtcShift, i.StartDateTime) AS [StartDateTime],
        DATEADD(HOUR, @LocationUtcShift, i.EndDateTime) AS [EndDateTime],
        i.[Events],
        Pages = (
            SELECT COUNT(1)
            FROM OPENJSON([Events])
                WITH ([Event] NVARCHAR(100) '$."@odata.type"')
            WHERE [Event] = '#Sitecore.XConnect.Collection.Model.PageViewEvent')
    FROM [xdb_collection].Interactions i
    INNER JOIN [ContactsInTheArea] d ON d.[ContactId] = i.ContactId
)
SELECT * FROM [InteractionsFromTheArea]
```
We have found all the recorded interactions performed from the location we originally targeted. The last step is to aggregate the statistics per day:
```sql
SELECT CAST(i.StartDateTime AS DATE) AS [Session Time],
    COUNT(1) AS [Test Sessions],
    CAST(ROUND(AVG(CAST(Pages AS FLOAT)), 2) AS NUMERIC(36,2)) AS [Avg. pages viewed]
FROM [InteractionsFromTheArea] i
GROUP BY CAST(i.StartDateTime AS DATE)
HAVING COUNT(1) > (@MinInteractionsThreshold / @ShardsCount)
ORDER BY [Session Time] DESC
```
The last query answers how often our site was visited from the area that belongs to the postal code (or by the company owning the IP address):
Daily statistics of interactions (and their quality) originating from an area are just a query away. Impressive, isn't it? Since we operated on one shard out of N, the results should be multiplied by N to approximate the complete picture.
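Alternatively, instead of extrapolating by the shard count, the per-shard results can be combined exactly by querying every shard database. A minimal sketch for a two-shard setup; the `Sc_Xdb.Collection.Shard0/1` database names are placeholders following the default xConnect naming, so substitute your installation's prefix:

```sql
-- Sketch: count matching contacts across both shards instead of
-- multiplying a single shard's result by @ShardsCount.
SELECT COUNT(1) AS [Contacts in the area (all shards)]
FROM (
    SELECT ContactId
    FROM [Sc_Xdb.Collection.Shard0].[xdb_collection].[ContactFacets]
    WHERE FacetKey = 'InteractionsCache'
    UNION ALL
    SELECT ContactId
    FROM [Sc_Xdb.Collection.Shard1].[xdb_collection].[ContactFacets]
    WHERE FacetKey = 'InteractionsCache'
) AS [AllShards];
```

The same `UNION ALL` pattern applies to the interaction queries above; it trades the approximation for cross-database query overhead.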
The report is built by burning CPU to parse raw JSON on every run (the more data, the more CPU spent). The lack of normalization is the price paid for flexibility (the ability to track and store custom info): it creates the need to reduce/extract/aggregate the data (constantly adjusting report data to reflect data changes) and to store it in a query-friendly format.
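One way to mitigate the repeated JSON parsing is SQL Server's documented pattern of indexing a computed column over `JSON_VALUE`. A sketch, assuming `PostalCodes` is an array of strings as the queries above imply; the `FirstPostalCode` column name is made up, and altering the xDB collection schema is generally unsupported by Sitecore, so treat this purely as an illustration of the technique:

```sql
-- Illustration only: materialize the first postal code from the JSON facet
-- into a computed column, then index it so postal-code lookups no longer
-- re-parse FacetData on every run. The CAST keeps the index key small.
ALTER TABLE [xdb_collection].[ContactFacets]
    ADD [FirstPostalCode] AS
        CAST(JSON_VALUE([FacetData], '$.PostalCodes[0]') AS NVARCHAR(20));

CREATE INDEX [IX_ContactFacets_FirstPostalCode]
    ON [xdb_collection].[ContactFacets] ([FirstPostalCode]);
```

This only covers the first element of the array; indexing every postal code would require a separate normalized table populated via `OPENJSON`, which is exactly the "store into query-friendly format" step described above.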