I thought Google Tag Manager (GTM) would have a built-in method to hash Personally Identifiable Information (PII) – but it doesn’t, and I couldn’t for the life of me find a simple guide on how to do it.
There are plenty of guides on how to obfuscate PII in URLs and query strings, and even a couple of guides on encrypting, but nothing on hashing.
Aren’t encrypting and hashing the same?
In short: no. Encrypting implies that there is a method to decrypt the data again, whereas hashing is a one-way operation. Keep in mind that with GTM, any data manipulation happens client-side using JavaScript. Because it’s happening client-side, encrypting there would mean handing over the encryption mechanism and key to the end user. Not exactly fantastic for security…
Why would I want to do this?
Any time you want to send PII data to a 3rd party vendor, it should be hashed using an agreed-upon hash algorithm. As an example: Facebook needs a hashed version of a user’s email address to do advanced matching. The Facebook Pixel tag will hash the email address for you – but if you’re using the pixel image instead, you need to hash that yourself.
Note: Google Analytics does not allow the use of PII data, hashed or otherwise. This is probably not a rule you want to break.
How do I do this?
There are three steps involved:
- Declare the incoming data as a Data Layer Variable
- Load a library for the hash algorithm you want to use
- Create a Custom JavaScript Variable to hash the data
Declare a Data Layer Variable
Your CMS will need to surface the PII data (such as an email address) into the dataLayer object somehow. How that happens is outside the scope of this article, but the Google Tag Manager for WordPress plugin happily adds the email of the logged-in user in a data point called visitorEmail.
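For illustration, this is roughly what a CMS or plugin has to do to surface the value. It is a minimal sketch, not what any particular plugin actually runs: the visitorEmail key matches the name used in this article, the email value is made up, and the root fallback exists only so the snippet also runs outside a browser.

```javascript
// Use window in the browser; fall back to globalThis so the sketch also runs in Node.
var root = typeof window !== 'undefined' ? window : globalThis;

// Create the dataLayer array if the GTM snippet has not created it yet.
root.dataLayer = root.dataLayer || [];

// Push the logged-in user's email under the key GTM will read.
// 'Jane.Doe@Example.com' is a made-up value for illustration.
root.dataLayer.push({
  visitorEmail: 'Jane.Doe@Example.com'
});
```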
Once the PII data is in the Data Layer, it needs to be declared in GTM:
- Go to the container for your site and select Variables on the left
- Create a new User-Defined Variable
- Select Data Layer Variable from the list of variable types
- For Data Layer Variable Name, enter the name of the variable exactly as it appears in the dataLayer object
For PII with alpha characters (such as email addresses), you also need to lower-case the string. While you’re in the variable configuration screen:
- Open the Format Value section
- Check the box for Change Case to…
- Select Lowercase in the dropdown
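The Format Value setting takes care of lower-casing inside GTM. If you would rather normalize in code, the equivalent is short. This is a sketch only: normalizeEmail is a hypothetical helper name, and trimming whitespace is an extra step assumed here, not something the GTM setting above does, so check what your vendor's spec asks for.

```javascript
// Hypothetical helper: lower-case and trim a string before hashing.
// Returns undefined for anything that is not a string.
function normalizeEmail(raw) {
  if (typeof raw !== 'string') {
    return undefined;
  }
  return raw.trim().toLowerCase();
}

console.log(normalizeEmail('  Jane.Doe@Example.COM '));
```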
Load a JS Library to Hash Data
Note: Do not write your own library unless you are a cryptographer with years of experience. Do validate that the library you choose does what you expect.
I have minimally tested the js-sha256 library and have confirmed it correctly hashes data. It doesn’t appear to do anything nefarious, but I am not a programmer – so you should do your own testing and code review!
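One quick way to do that kind of testing is to compare a hash function's output against well-known SHA-256 test vectors. The sketch below is written under a few assumptions: checkSha256 is a hypothetical helper, and Node's built-in crypto module stands in for the library so the snippet runs on its own. In the browser you would pass in the sha256 function that the library exposes instead.

```javascript
var crypto = require('crypto');

// Well-known SHA-256 test vectors (input -> expected hex digest).
var VECTORS = [
  { input: '', expected: 'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855' },
  { input: 'abc', expected: 'ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad' }
];

// Returns true only if hashFn reproduces every vector above.
function checkSha256(hashFn) {
  for (var i = 0; i < VECTORS.length; i++) {
    if (hashFn(VECTORS[i].input) !== VECTORS[i].expected) {
      return false;
    }
  }
  return true;
}

// Stand-in for the library under test (assumption: Node's crypto, not js-sha256).
function nodeSha256(message) {
  return crypto.createHash('sha256').update(message, 'utf8').digest('hex');
}

console.log(checkSha256(nodeSha256));
```

A check like this only shows the function hashes correctly for those inputs. It says nothing about whether the file does anything else, so it complements a code review rather than replacing one.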
Next, whatever you’re using needs to be hosted somewhere – hosting it yourself is an option, but I prefer using an established CDN. The js-sha256 library is available on Cloudflare’s cdnjs CDN:
https://cdnjs.cloudflare.com/ajax/libs/js-sha256/0.9.0/sha256.min.js
If you’re using a CDN-hosted library, pin a specific version of the code and include a Subresource Integrity (SRI) hash in your tag configuration, so the browser refuses to run the file if someone tampers with it.
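For reference, an SRI value is the algorithm name, a dash, and the base64-encoded digest of the exact file bytes. If you want to double-check the value an online tool gives you, it can be computed locally. A minimal Node sketch, where sriHash is a hypothetical helper and the sample input is a placeholder rather than the real library file:

```javascript
var crypto = require('crypto');

// Hypothetical helper: build a sha384 SRI string for the given file contents.
function sriHash(content) {
  var digest = crypto.createHash('sha384').update(content).digest('base64');
  return 'sha384-' + digest;
}

// Placeholder input; for a real check, pass the downloaded library's contents.
var example = sriHash('console.log("hello");');
console.log(example);
```

To check a real library, read the downloaded file (for example with fs.readFileSync) and compare the result with the integrity value in your tag.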
Here’s how to load a library in GTM:
- Go to the container for your site and select Tags on the left
- Create a new Custom HTML tag
- Go to the SRI Hash tool linked above and create an SRI Hash for your library
- Edit the output for use in a Custom HTML tag (see second code block below)
For the js-sha256 library hosted by cdnjs, the SRI-hashed version looks something like:
<script src="https://cdnjs.cloudflare.com/ajax/libs/js-sha256/0.9.0/sha256.min.js" integrity="sha384-2epjwyVj8M4n8AweIsY7SKPSJmqBBBkmksXvkmtYORfxPS1I4NZE/+Ttk/9gCELG" crossorigin="anonymous"></script>
To use this in a Custom HTML tag, you have to use JavaScript to construct the script tag (for some reason, GTM thinks the above isn’t valid HTML).
Here’s what the result looks like – note how the src, integrity and crossorigin attributes are set:
<script>
(function() {
  // Build the script element in JavaScript so GTM accepts the tag
  var script = document.createElement('script');
  script.type = 'text/javascript';
  script.src = 'https://cdnjs.cloudflare.com/ajax/libs/js-sha256/0.9.0/sha256.min.js';
  // SRI: the browser refuses to run the file if its hash does not match
  script.setAttribute('integrity','sha384-2epjwyVj8M4n8AweIsY7SKPSJmqBBBkmksXvkmtYORfxPS1I4NZE/+Ttk/9gCELG');
  script.setAttribute('crossorigin','anonymous');
  document.getElementsByTagName('head')[0].appendChild(script);
})();
</script>
You can optionally set this tag to load once per page – so long as the function is available, it should work fine for subsequent events.
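If the tag does fire more than once, one way to keep that harmless is to check whether the function already exists before injecting the script again. The following is a sketch under assumptions: ensureSha256 is a hypothetical helper, and it takes the window and document objects as parameters only so it can be exercised outside a browser. It relies on the library exposing a global sha256 function, which is how the Custom JavaScript variable in the next section calls it.

```javascript
// Injects the library only if a global sha256 function is not already defined.
// Returns true if a script element was appended, false otherwise.
function ensureSha256(win, doc) {
  if (typeof win.sha256 === 'function') {
    return false; // already loaded by an earlier firing of the tag
  }
  var script = doc.createElement('script');
  script.src = 'https://cdnjs.cloudflare.com/ajax/libs/js-sha256/0.9.0/sha256.min.js';
  script.setAttribute('integrity', 'sha384-2epjwyVj8M4n8AweIsY7SKPSJmqBBBkmksXvkmtYORfxPS1I4NZE/+Ttk/9gCELG');
  script.setAttribute('crossorigin', 'anonymous');
  doc.getElementsByTagName('head')[0].appendChild(script);
  return true;
}

// In a browser (for example inside the Custom HTML tag), call it with the real objects.
if (typeof window !== 'undefined' && typeof document !== 'undefined') {
  ensureSha256(window, document);
}
```

The check only sees a library that has finished loading, so two firings in very quick succession could still append the script twice.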
Create a Custom JavaScript Variable
Last but not least – hash the data. When hashing data, it’s critically important to make sure that the source data is what you expect before hashing. After all, there’s no way to check after it’s been hashed!
In my example of hashing an email address in the visitorEmail variable, I want to run some very basic tests on the string to make sure it looks like an email address. In the code block below, you can see the test called out – you should do something similar for other types of data.
For the final piece – set up a Custom JavaScript variable:
- Go to the container for your site and select Variables on the left
- Create a new User-Defined Variable
- Select Custom JavaScript from the list of variable types
- In the Custom JavaScript box, enter the following:
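function() {
  // The Data Layer Variable declared earlier
  var email = {{visitorEmail}};

  // Test email address first
  function emailIsValid(value) {
    return /\S+@\S+\.\S+/.test(value);
  }

  // Bail out if the hashing library has not loaded yet
  if (typeof sha256 !== 'function') {
    return undefined;
  }

  // If email address is valid, hash it
  if (emailIsValid(email)) {
    return sha256(email);
  }
  return undefined;
}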
Next steps
Test things! Use the Preview function to make sure the variable you created actually contains a hash of the PII data you want to send to the vendor. Test with known bad data – does the code in the Custom JavaScript variable catch it?
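A few quick checks of the validation function on its own can complement Preview mode. This sketch reuses the emailIsValid test from the variable above; the sample inputs are made up. Note that the pattern is not anchored, so it only confirms that something email-shaped appears somewhere in the string, as the last sample shows.

```javascript
// Same basic test as in the Custom JavaScript variable.
function emailIsValid(email) {
  return /\S+@\S+\.\S+/.test(email);
}

// Made-up inputs: one good address, several bad ones, and one sentence
// that merely contains an address.
var samples = [
  'jane@example.com',
  'not-an-email',
  'jane@localhost',
  '@example.com',
  '',
  undefined,
  'call me at jane@example.com please'
];

var results = samples.map(function (sample) {
  return emailIsValid(sample);
});

console.log(results);
// → [ true, false, false, false, false, false, true ]
```

If the last case matters for your data, a stricter pattern (for example one anchored with ^ and $) would be a reasonable next step.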
Once you’re satisfied that the code is actually working properly, map it to the vendor in your tag configuration – safe in the knowledge you’re not sending PII data in the clear! 🙂