First Party Data And Server-Side Analytics For Everyone — The Hybrid Model

Published in Level Up Coding, Feb 18, 2021

In this post, I will illustrate how you can move a significant chunk of an analytics implementation server-side and therefore considerably improve data accuracy and data control:

With your existing client-side tagging setup
Without using server-side Google Tag Manager and without necessarily getting any Google servers involved
With an easy implementation

One could call it a Hybrid Model, combining the benefits of both worlds:
→ Client-side data collection
→ Server-side data preparation and dispatch

First party data collection for everyone — with minimal implementation effort. The goal here is neither to create your own data pipeline and processing, nor to purely rely on server-side log data. The objective is to:
→ assemble data client-side (use any client-side analytics JS library)
→ collect, prepare, and clean it on your own server (I am using a PHP file)
→ forward it to Google Analytics/Adobe Analytics/AT Internet or a vendor of choice

Server-side implementations and tag management have a reputation for being difficult to deploy. Server-side processing certainly adds an extra step, but the work on the server can be stripped down to the bare essentials. GTM server-side is a great product that simplifies configuration at a large scale, but I want to show that first party server-side analytics can also be implemented in a very comprehensible manner. This should give a much better understanding of how it actually works, illustrate the pros and cons, and take away the fear of the unknown for those who have not had many touchpoints with server-side implementations.

In Short
What You Gain
→ Data accuracy, first party server-side cookies, full data control
What You Lose
→ Reliable geolocation, vendor-side bot detection, cookie ID on the first event
What You Need
→ Access to a web hosting server, basic server-side programming
What You Must Consider
→ Privacy compliance, consent management
What It Looks Like in Practice
→ https://measure.hinternesch.com/

What You Gain

Data Accuracy

Tracking requests won’t be blocked by ad blockers, tracking blockers, or browser privacy tools. For background, there are several reasons why blocking tools and privacy-focused browsers may prevent an analytics request:

Call to a CDN that hosts a tracking library like Google’s analytics.js or AT Internet’s smarttag.js
→ Solution: Don’t use a CDN. Instead, host the library as a local JS file on your server. You don’t even have to use a vendor library. You can also write your own script to capture and transmit selected client-side data.
Call to a client-side tag management system
→ Solution: Don’t use a TMS. Write the implementation code in a local JS file.
Call to a known analytics vendor collection domain
→ Solution: Don’t make a direct request to the vendor endpoint. Instead, request your own file from https://myDomain.com/measure
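To illustrate the last two points, here is a minimal sketch of a self-written collector, assuming no vendor library at all. It only turns a few client-side fields into a request URL on your own domain; the parameter names (`event`, `page`, `title`) and the `https://myDomain.com/measure` endpoint are placeholders, not any vendor's protocol.

```typescript
// Minimal self-written collector: no vendor library, no TMS.
// The parameter names and the endpoint are placeholders (assumptions).

interface HitData {
  event: string;                  // e.g. "page_view" or "button_click"
  page: string;                   // path of the current page
  title?: string;                 // optional document title
  extra?: Record<string, string>; // any further custom fields
}

// Serialise the collected fields into a request URL on your own domain.
function buildHitUrl(endpoint: string, hit: HitData): string {
  const url = new URL(endpoint);
  url.searchParams.set("event", hit.event);
  url.searchParams.set("page", hit.page);
  if (hit.title !== undefined) {
    url.searchParams.set("title", hit.title);
  }
  const extra = hit.extra ?? {};
  for (const key of Object.keys(extra)) {
    url.searchParams.set(key, extra[key]);
  }
  return url.toString();
}
```

In the browser, the resulting URL could then be requested with `fetch(url, { keepalive: true })`, where `keepalive` lets the request complete even if the page is being unloaded.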

Of course, you must still obtain consent (see the section What You Must Consider). It cannot be stressed enough that a configuration like this is not intended to illegally bypass ad blockers and violate user privacy. However, I do believe that a lot of legitimate analytics data is being lost arbitrarily. Responsible, consented measurement of website usage should not suffer simply because users understandably want to block privacy-invading third party ad tracking. Legislation, public understanding, and privacy tools like ITP, cookie consent popups, and ad blockers lack nuance. This is where a setup like this can help improve data accuracy by regaining data that is currently being lost for the wrong reasons.

Server-side First Party Cookies From Your Own Domain

No third party cookies. No client-side JavaScript cookies (except for consent management). Just a single secure first party measure cookie, set server-side from your own domain, with an anonymized ID that you can set up, manage, and control independently.
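As a sketch of what such a cookie could look like on the wire, the function below builds a `Set-Cookie` header value. The cookie name, the lifetime, and the attribute choices are assumptions; `HttpOnly` fits here because the ID is only ever read by the server, never by client-side JavaScript.

```typescript
// Build the Set-Cookie header value for a first party measure cookie.
// Cookie name, lifetime and attributes are assumptions, not a standard.

interface MeasureCookieOptions {
  cookieName: string; // hypothetical, e.g. "measure_id"
  maxAgeDays: number; // how long the ID should persist
  domain?: string;    // optional, to share the cookie across subdomains
}

function buildMeasureCookie(id: string, opts: MeasureCookieOptions): string {
  // Restrict IDs to characters that need no escaping in a cookie value.
  if (!/^[A-Za-z0-9-]+$/.test(id)) {
    throw new Error("cookie ID must contain only letters, digits and hyphens");
  }
  const attributes = [
    `${opts.cookieName}=${id}`,
    `Max-Age=${Math.round(opts.maxAgeDays * 24 * 60 * 60)}`,
    "Path=/",
    "Secure",       // only sent over HTTPS
    "HttpOnly",     // not readable from client-side JavaScript
    "SameSite=Lax", // not sent with cross-site subrequests
  ];
  if (opts.domain !== undefined) {
    attributes.push(`Domain=${opts.domain}`);
  }
  return attributes.join("; ");
}
```

A fresh random ID could come from `crypto.randomUUID()`, which Node.js and modern browsers provide, e.g. `buildMeasureCookie(crypto.randomUUID(), { cookieName: "measure_id", maxAgeDays: 365 })`.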

Full Data Control

Clean and prepare data server-side before it is sent to a vendor. This includes managing IP addresses, ID hashes, and user agents as well as preventing fingerprinting if needed. Always be transparent towards users with regard to which information is sent where and for what purpose it is used.
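A minimal sketch of two such cleaning steps, assuming the hit arrives as a query string: truncating the visitor IP (here the last IPv4 octet, or everything after the first 48 bits of an IPv6 address) and dropping every query parameter that is not on an allowlist. The allowlist contents are hypothetical, and IPv4-mapped IPv6 addresses and zone IDs are not handled.

```typescript
// Server-side cleaning before dispatch: truncate the IP and keep only
// allowlisted parameters. Allowlist and truncation rules are assumptions.

// Expand IPv6 "::" shorthand into eight groups.
function expandIpv6(ip: string): string[] {
  const halves = ip.split("::");
  if (halves.length > 2) throw new Error(`invalid IPv6 address: ${ip}`);
  const head = halves[0] ? halves[0].split(":") : [];
  if (halves.length === 1) return head;
  const tail = halves[1] ? halves[1].split(":") : [];
  const missing = 8 - head.length - tail.length;
  if (missing < 1) throw new Error(`invalid IPv6 address: ${ip}`);
  return [...head, ...new Array<string>(missing).fill("0"), ...tail];
}

// Zero the last IPv4 octet, or everything after the first 48 bits of IPv6.
// IPv4-mapped IPv6 addresses and zone IDs are not handled.
function anonymizeIp(ip: string): string {
  if (ip.includes(":")) {
    const groups = expandIpv6(ip);
    if (groups.length !== 8) throw new Error(`invalid IPv6 address: ${ip}`);
    return [...groups.slice(0, 3), "0", "0", "0", "0", "0"].join(":");
  }
  const octets = ip.split(".");
  if (octets.length !== 4) throw new Error(`invalid IPv4 address: ${ip}`);
  return [...octets.slice(0, 3), "0"].join(".");
}

// Hypothetical allowlist: anything else (e.g. an e-mail in a URL) is dropped.
const ALLOWED_PARAMS: ReadonlySet<string> = new Set(["event", "page", "title"]);

function filterParams(query: URLSearchParams, allowed: ReadonlySet<string>): URLSearchParams {
  const clean = new URLSearchParams();
  query.forEach((value, key) => {
    if (allowed.has(key)) clean.append(key, value);
  });
  return clean;
}
```

An allowlist rather than a blocklist means a newly added tracking parameter is dropped until you decide it may be forwarded.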

What You Lose

Reliable Geolocation And IPs

The final requests to the vendor will come from your server’s IP. In the example below, I have implemented a workaround to show how the visitor’s IP can still be forwarded in many cases, but there are quite a few caveats to consider here.
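Forwarding the IP first requires knowing which address belongs to the visitor. The sketch below is not the workaround mentioned above, just one common way to resolve it, assuming Node-style lower-cased request headers. Behind a proxy or CDN, the visitor’s address arrives in `X-Forwarded-For`, which clients can also forge, so only hops added by proxies you trust are skipped. How the result is passed on is vendor-specific; Universal Analytics’ Measurement Protocol, for instance, documented `uip` and `ua` override parameters.

```typescript
// Resolve the visitor IP to forward to the vendor. Assumes Node-style,
// lower-cased request headers and a known socket address.

type RequestHeaders = Record<string, string | string[] | undefined>;

function headerValue(headers: RequestHeaders, key: string): string | undefined {
  const value = headers[key.toLowerCase()];
  // Repeated headers are equivalent to one comma-separated header.
  return Array.isArray(value) ? value.join(",") : value;
}

function resolveClientIp(
  headers: RequestHeaders,
  remoteAddress: string,
  trustedProxies: ReadonlySet<string>,
): string {
  // A direct visitor could forge X-Forwarded-For, so the header is only
  // consulted when the connecting peer is one of your own proxies.
  if (!trustedProxies.has(remoteAddress)) return remoteAddress;
  const forwarded = headerValue(headers, "x-forwarded-for");
  if (forwarded === undefined) return remoteAddress;
  const hops = forwarded
    .split(",")
    .map((hop) => hop.trim())
    .filter((hop) => hop.length > 0);
  // Walk from the right: the right-most address that is not one of your
  // proxies is the last one you can actually vouch for.
  for (let i = hops.length - 1; i >= 0; i--) {
    if (!trustedProxies.has(hops[i])) return hops[i];
  }
  return hops.length > 0 ? hops[0] : remoteAddress;
}
```

Taking the left-most entry instead would be simpler, but it is exactly the value a client can set to anything it likes.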

Vendor-Side Bot Detection

When the server sends the final request, it will by default carry the server’s user agent and IP, and these are usually the main identifiers vendors use to detect bots. There are a few alternative solutions:

Client-side approach: Google reCAPTCHA v3 (Simo Ahava has written a great rundown) or other client-side libraries should be able to filter out a significant share of bot traffic without requiring user input.
Server-side approach: You can create your own logic with a custom list of potentially suspicious user agents (very difficult to compile and maintain), use services like Cloudflare Bot Fight Mode, or even use the official IAB list for bot classification (expensive). Note that if user agent and IP are successfully forwarded (see the example below), there are cases where you can still rely on vendor-side bot detection, which in most cases comes down to a comparison with the IAB list anyway.
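For the home-grown option, a user-agent check could be sketched like this. The patterns are a tiny, hypothetical sample that only shows the mechanism; keeping such a list complete and free of false positives is exactly the hard part.

```typescript
// Home-grown user-agent check. The patterns are a tiny hypothetical sample,
// not a substitute for a maintained list such as the IAB one.
const SUSPICIOUS_UA_PATTERNS: readonly RegExp[] = [
  /bot\b/i,
  /crawl/i,
  /spider/i,
  /headless/i,
  /^curl\//i,
  /^wget\//i,
  /python-requests/i,
];

function looksLikeBot(userAgent: string | undefined): boolean {
  // Browsers normally send a user agent, so an empty one is suspicious.
  if (userAgent === undefined || userAgent.trim() === "") return true;
  return SUSPICIOUS_UA_PATTERNS.some((pattern) => pattern.test(userAgent));
}
```

A hit flagged this way could simply be answered without being forwarded to the vendor.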

No Cookie-ID On The First Event

This is mainly due to the PHP cookie logic and the order of requests. The server can only read the measure cookie once the browser sends it back, so the first page view is needed to request the PHP script that sets the cookie if it does not exist yet; that first hit therefore arrives without an ID. In most cases, however, this effect is largely offset by the fact that you should not drop or use a cookie value pre-consent on the first event anyway.
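The mechanism can be sketched as follows, assuming a script that only reads the cookie from the incoming request (the cookie name and the ID generator are placeholders). On the first request there is nothing to read, so the response issues a new ID via `Set-Cookie` while the current hit goes out without one; only subsequent requests carry it.

```typescript
// Cookie logic behind "no cookie ID on the first event". The cookie name
// and ID generator are placeholders; the script only reads incoming cookies.

function readCookie(cookieHeader: string | undefined, cookieName: string): string | undefined {
  if (cookieHeader === undefined) return undefined;
  for (const pair of cookieHeader.split(";")) {
    const eq = pair.indexOf("=");
    if (eq === -1) continue;
    if (pair.slice(0, eq).trim() === cookieName) return pair.slice(eq + 1).trim();
  }
  return undefined;
}

interface CookieDecision {
  idForThisHit: string | undefined; // ID the current hit may carry
  newCookieId: string | undefined;  // ID to send back via Set-Cookie
}

function resolveMeasureId(
  cookieHeader: string | undefined,
  cookieName: string,
  generateId: () => string,
): CookieDecision {
  const existing = readCookie(cookieHeader, cookieName);
  if (existing !== undefined && existing !== "") {
    return { idForThisHit: existing, newCookieId: undefined };
  }
  // First request: the browser has no cookie to send yet, so issue one for
  // later requests and let this hit go out without an ID.
  return { idForThisHit: undefined, newCookieId: generateId() };
}
```

A script could also attach the freshly issued ID to the first hit straight away; the sketch deliberately does not, in line with the pre-consent reasoning above.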

What You Need

Access to your content hosting server and domain folder (the script containing the tracking logic needs to be placed there) and some basic knowledge of server-side scripting and cookie management.

What You Must Consider

Compliance And Consent Management

The dispatch of tracking information now happens behind closed doors, i.e. on your server, where it is not visible to the user. It is therefore even more important to obtain consent for how the data you collect will eventually be used.

For the example site, I wrote my own consent management script. The content hosting server issues and reads the measure cookie with an anonymized ID. However, I only forward this ID to the vendor as a means of visitor identification if the user consents to it. Based on the user’s selection, I place a consent cookie:

If consent = yes → Use the measure cookie ID in my server-side script.
If consent = no → Don’t use the measure cookie ID. AT Internet has a no consent mode, which excludes the entire tracking event and all associated information when no consent has been given. Google’s Privacy Sandbox will have similar options. This option is also the default for all my requests and is only overridden on the server when consent has been obtained.
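Sketched in code, with no consent as the default and the ID only attached when the consent cookie says yes. The cookie names (`consent`, `measure_id`), the `yes` value, and the outgoing parameters `vid` and `no_consent` are hypothetical placeholders for whatever your vendor’s no-consent mode expects.

```typescript
// Consent switch on the server: "no consent" is the default, and the
// visitor ID is only attached when the consent cookie says "yes".
// Cookie names and outgoing parameter names are hypothetical.

function parseCookies(cookieHeader: string | undefined): Map<string, string> {
  const jar = new Map<string, string>();
  for (const pair of (cookieHeader ?? "").split(";")) {
    const eq = pair.indexOf("=");
    if (eq > 0) jar.set(pair.slice(0, eq).trim(), pair.slice(eq + 1).trim());
  }
  return jar;
}

function applyConsent(outgoing: URLSearchParams, cookieHeader: string | undefined): URLSearchParams {
  const jar = parseCookies(cookieHeader);
  const result = new URLSearchParams(outgoing.toString()); // leave the input untouched
  const visitorId = jar.get("measure_id");
  if (jar.get("consent") === "yes" && visitorId) {
    result.set("vid", visitorId);
    result.delete("no_consent");
  } else {
    result.delete("vid");          // never pass an ID on without consent
    result.set("no_consent", "1"); // default: vendor-side no-consent handling
  }
  return result;
}
```

Because the else branch also removes any `vid` sent by the client, a manipulated request cannot smuggle an ID past the consent check.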

What It Looks Like In Practice

My example site has three pages and a button to be tracked: https://measure.hinternesch.com

Client-Side Logic

(Figure: Server-side analytics, client-side process)

1) You can use any kind of custom JavaScript or vendor library to pick up client-side info and create a tracking request. This is what distinguishes the setup from fully server-side implementations: data is still collected client-side. The big upside is that we can still capture client information and user interactivity in the browser, such as a button click.
2) Change the default collection domain and path of the request URL.
GA default: https://www.google-analytics.com/g/collect
ATI default: https://logs.xiti.com/hit.xiti
I replaced it with the address of a file on my own server:
https://measure.hinternesch.com/measure.php
Instead of requesting a pixel from the vendor collection server, I am requesting a PHP file from my own server. Note that all the information of the tracking request is still transmitted:
-) Client-side info in the query string, as assembled by my tagging library
-) User agent and cookies in the request headers, plus the IP address of the connection
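Conceptually, the change only swaps the scheme, host, and path, while the query string travels unchanged. A sketch of that rewrite using the example site’s endpoint; in practice you would usually configure the collection endpoint in your tagging library rather than rewrite URLs, and the hit parameters in the test are illustrative only.

```typescript
// Point a vendor-formatted hit at your own endpoint while keeping every
// query parameter exactly as the tagging library assembled it.
function toFirstPartyHit(vendorHitUrl: string, firstPartyEndpoint: string): string {
  const vendor = new URL(vendorHitUrl);
  const target = new URL(firstPartyEndpoint);
  target.search = vendor.search; // query string travels unchanged
  return target.toString();
}
```

Since the payload is untouched, the server can later forward it to the original vendor endpoint without having to understand every parameter.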

Server-Side Logic

The PHP file on my server is never rendered client-side and has two purposes:
1) It is the vehicle for tracking information — just like a pixel would be
2) It handles data preparation and the dispatch to vendor collection endpoints
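As a rough sketch of those two purposes (in TypeScript rather than PHP, and not the PHP script referred to below), the handler takes the raw query string and user agent of an incoming hit, drops unwanted parameters, and addresses the result to a vendor endpoint. The vendor keys (`ga`, `ati`) and the choice to forward the user agent as a request header are assumptions; whether a vendor honours a forwarded `User-Agent` is vendor-specific.

```typescript
// Receive a hit, clean it, and address it to a vendor endpoint.
// Vendor keys and the header forwarding are assumptions.

interface IncomingHit {
  query: string;                 // raw query string of the incoming request
  userAgent: string | undefined; // visitor's User-Agent header
}

interface OutgoingHit {
  url: string;                     // full vendor collection URL
  headers: Record<string, string>; // headers for the server-to-server request
}

const VENDOR_ENDPOINTS: Record<string, string> = {
  ga: "https://www.google-analytics.com/g/collect",
  ati: "https://logs.xiti.com/hit.xiti",
};

function prepareDispatch(hit: IncomingHit, vendor: string, dropParams: readonly string[]): OutgoingHit {
  const endpoint = VENDOR_ENDPOINTS[vendor];
  if (!endpoint) throw new Error(`unknown vendor: ${vendor}`);
  const params = new URLSearchParams(hit.query); // a leading "?" is ignored
  for (const key of dropParams) params.delete(key); // data cleaning step
  const url = new URL(endpoint);
  url.search = params.toString();
  const headers: Record<string, string> = {};
  if (hit.userAgent) headers["user-agent"] = hit.userAgent; // pass the visitor UA along
  return { url: url.toString(), headers };
}
```

On Node.js 18 or later, the prepared hit could then be sent with `await fetch(out.url, { headers: out.headers })`, after which the endpoint answers the browser with an empty `204` response.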
Note that this PHP file and what it does are by no means applicable to every implementation. Its content strongly depends on how you aim to prepare your data, how you manage compliance, and which data you want to forward to which endpoint. For your reference, here is an annotated example of what my bare-bones server-side PHP script looks like:

Et voilà: the tracking requests and the visitor identification cookie are never blocked, neither by ad blockers nor by browser tracking prevention such as Brave’s privacy shield. They are not even picked up by amazing tools like Omnibug.

In a nutshell, we have just moved a large portion of our measuring activity server-side and thereby made significant improvements in data accuracy, compliance, and our options for data preparation, simply by doing two things: changing the collection address of the hit and adding a single file to our server.

— If you have questions about the process, need help with implementation, found a bug in the code, or if you’re simply up for talking analytics: Feel free to reach out. —