Unifying the User for Marketing Attribution
How to unify marketing insights from silos of data so you have a complete picture of the customer journey.
Unifying the user for marketing attribution requires a unique identifier that spans the discrete silos of data so they can be merged. Sometimes a lot of your data can be merged, but rarely 100% of it. The good news is that most analytics and data collection tools have a unique identifier. The bad news is that few tools share the same identifier or store the same data in those fields: emails, cookie IDs, phone numbers, addresses, identification numbers, or semi-randomly derived values. This sounds complex, and you might be asking yourself, “How do I work with this in my situation?”
(Tip: You will need a field in your data to account for every system or tool’s unique identifier).
First, it helps to decide where your central source of truth will live. Where do you want to go for the data? What system do you have that is equipped to contain all of the data you need? Data collection is the first step toward unifying the user, which is why most marketers lean on their CRM or marketing automation tools; those systems tend to be robust enough to handle it. My recommendation is a CRM (customer relationship management) tool.
Second, you need to take inventory of all the data you currently have access to. Do not focus on all the data you could have or might have… start with what you have today. You will need to inventory all of the data that touches the customer either directly or indirectly. You might want to prioritize the data that most directly interacts with the customer (where you can see the customer in your data) and sources that are easy for you to access.
Third, you need to model your data inventory to draw out ideas for how it will all merge together. A whiteboard, mind-mapping tool, or sketching software will help here. To model the data, think of everything in terms of tables (think of Excel). Rows will always correspond to your unique identifiers (typically over time). Data that is too large is impossible to work with in software like Excel because it will crash on you; this is why you can’t just rip all of the data from everywhere, and why you need to begin with a model. You don’t need all the rows of data, but you do need all of the columns (the metrics and dimensions) that add context to your data. With this model you can see which unique identifiers cross silos and which do not.
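One lightweight way to capture such a model is a simple mapping from each system to its unique identifier and the columns worth keeping. This is a hypothetical sketch in Python; the system names and fields are made up for illustration, and yours will differ:

```python
from collections import Counter

# Hypothetical data model: each system, its unique identifier field,
# and the columns (metrics/dimensions) that add context.
data_model = {
    "crm":       {"id_field": "email",     "columns": ["name", "phone", "deal_stage"]},
    "email":     {"id_field": "email",     "columns": ["opens", "clicks"]},
    "analytics": {"id_field": "client_id", "columns": ["sessions", "source_medium"]},
}

# Identifiers shared by more than one system are the merge points;
# identifiers used by only one system cannot cross silos on their own.
id_counts = Counter(system["id_field"] for system in data_model.values())
shared_ids = [field for field, count in id_counts.items() if count > 1]
```

In this sketch the email address is the only identifier that crosses silos, which mirrors what the model is supposed to reveal before any merging work starts.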
Fourth, you will need to write a recipe of sorts that details what needs to be merged with what, when, and in what order. When two identifiers match up between two tables of modeled data, you end up with a single table that includes all of the information combined. For example, say you have a table of names and emails, and another table of phone numbers and emails; when you combine them you will have one table of names, emails, and phone numbers. The emails column was the unique identifier that connected the two data silos, and where it matched you now have phone numbers alongside the names and emails. Data you feel you don’t need does not have to come along, and this is why we model: so you can see ahead of time what work needs to be done.
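The name/email/phone merge described above can be sketched in plain Python, treating each table as a list of rows and the email as the shared key. The names and numbers here are made up for illustration:

```python
# Hypothetical tables: one with names and emails, one with phones and emails.
names = [
    {"email": "ada@example.com", "name": "Ada"},
    {"email": "bob@example.com", "name": "Bob"},
]
phones = [
    {"email": "ada@example.com", "phone": "555-0100"},
]

# Build a lookup from the second table keyed on the shared identifier.
phone_by_email = {row["email"]: row["phone"] for row in phones}

# Merge: every row keeps its name and email, and gains a phone number
# when the identifier matches (None when it does not).
merged = [
    {**row, "phone": phone_by_email.get(row["email"])}
    for row in names
]
```

Note that Bob ends up with an empty phone field: a merge only fills in data where the identifiers actually line up, which is exactly the gap the model helps you spot in advance.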
How does this whole process look?
Let’s go through a hypothetical example using some free tools and how you could merge that data into a business intelligence tool (in this case we will use Google Data Studio). For our example let’s assume you have the following tools with data in them:
- Google Analytics (Website data)
- Email (Could be any email platform)
- Digital Advertising (Could be any advertising platform)
- CRM (Could be any customer relationship platform)
Step 1 – Decide what your ultimate source of truth will be.
Among these tools we have three possible sources of truth: Google Analytics, the email platform, or the CRM. I recommend against Google Analytics because you are not allowed to store personally identifiable information there unless it is hashed or stored in a private way. Depending on how robust your email platform is, you could use it as the source of truth, but these systems are typically not built for that primary function. Feel free to use anything that works for you, but for this example I will use the CRM.
Step 2 – Take inventory of your data sources to get your metrics and dimensions.
Google Drive is free if you have a Gmail account, so for this example I’m creating a Gmail account and jumping into Google Sheets. I will use Sheets (similar to Excel) to inventory the data and do all of the merging with the INDEX MATCH function (see this article if you want to know how I do this).
For Google Analytics you can use the free Google Analytics add-on for Sheets to pull your data. I’m going to save you some time and let you know that if you use the standard setup of GA, you aren’t using the User-ID view and you are not hashing or storing personal data in custom dimensions/metrics. Let’s deal with the basic setup here, in which case the unique identifier is the cookie ID or a UTM parameter. The trouble is, if you aren’t capturing this into your email platform or CRM with a form field that picks it up, you will not be able to use this data for merging. You can of course see data in aggregate, like X visitors came from a specific email (assuming you used UTMs), but you cannot see this for each unique customer unless your systems are set up to account for unique user identification. By default they are not set up this way.
So what now? A standard Google Analytics setup is valuable for website analytics but does not come out of the box with a unique user identification system; you have to do some work to set it up that way. If you have set up your analytics to match the user to a specific ID that is also captured and stored in your CRM/email provider, then power to you! In that case you can use the add-on to pull any metrics and dimensions (alongside the identification field) to add website analytics to your model.
For email, you have the easiest source of data yet, because you at least have the email address of each unique record in your system. Beyond that, any data you have adds context to that email: first names, last names, company information, demographic information, any activities/interactions that took place, and possibly commerce-related fields. Most email platforms have a way to export data, but if you can only download aggregate data (leaving out the individual emails) then you will not be able to tie its data to other systems.
Email-to-landing-page insights might also exist in your email platform, but they will likely not exist in Google Analytics (tied to a user) by default, because Google Analytics removes personally identifiable information like this on purpose.
Digital advertising data collection for the unified user is similar to email: whatever platform you are using for forms will have the user identified by email. If, however, you are not using an email platform like this and instead have your CRM attached to the forms, then you will look there. Either way, the only way to attribute digital advertising activities to users is from the capture point, such as a form, a chat, an email, or a phone call. The point where a user becomes known is often described by platforms as a “conversion.” By itself, calling something a conversion in marketing is not very descriptive.
The last, and hopefully the easiest, place to get customer data is the customer relationship management platform (aka the CRM). In order to model the data you need to work with it in its tabular format. So export all of the user-related data from your Google Analytics (if the unique identifier exists), email, digital advertising, and CRM platforms with at least the relevant dates and the unique identifiers. All the other metrics, dimensions, and context of your data are not necessary for modeling.
Step 3 – Model the data you’ve exported and align the unique identifiers.
If your data is not too large, you can hold the data from one table in the first Excel tab and the data from another table in the second tab. Assuming this is the case, we are going to use INDEX MATCH to merge the second table into the first (or vice versa). The two data tables need to have at least one unique identifier in common. I would start with the table that has the most data potential (like the CRM) as the first tab and merge all other data into it.
(Tip: If your data is too large for Excel or Google Sheets, then you need a database; SQLite can run on a desktop machine with a single file provisioned for this purpose. Is it worth it? If you are working with that much data, you should seriously consider what value improved data insights could provide.)
As you work through the data modeling you need to remember (or better yet, write down) all the steps you took to merge the data. The index data (the first part of the formula) contains the data you want to add from the second tab into the first. The match data (the second part of the formula) contains the data you are comparing to see if there is an exact match of the unique identifier. For example, the index data could be the names, and the match data could compare emails to ensure they match.
(Tip: This is where you might realize that the data in one table matches the other but there are spaces or formatting issues in one of them. You might need to clean the formatting in the Excel document, or change the way data is collected so it is scrubbed beforehand.)
The cleanliness of the data you are merging is a major factor. Rinse and repeat the INDEX MATCH formula for each column of data in the second tab that you want merged into the first. At the end you will have a much larger first table than you started with.
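Those formatting issues usually come down to stray whitespace or case differences in the identifier. A minimal normalization step before matching, sketched in Python with hypothetical values, looks like:

```python
def normalize_email(value):
    """Scrub an identifier so ' Ada@Example.com ' matches 'ada@example.com'."""
    return value.strip().lower()

# Hypothetical raw export with inconsistent spacing and casing.
raw = [" Ada@Example.com ", "bob@example.com"]
cleaned = [normalize_email(v) for v in raw]
```

Running the same scrub over every identifier column in every table before matching saves you from chasing “missing” matches that are really just formatting mismatches.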
(Tip: After you are done merging the tables, copy the merged cells and paste them as values only, so that no formulas remain in the cells.)
Now that you’ve done this for two tables, open a third tab (if possible), or delete the old second tab and start a fresh, empty one. Time to merge the next table into the first, so make sure the two tables have at least one unique identifier in common. Continue this for all the data tables you wish to merge until you end with a final table of glory… hah.
Step 4 – Write down your process and make sure you can account for everything.
Writing down the process in this way prepares you to work with Structured Query Language (SQL), the basic language for processing data in most if not all systems. You are working in Excel because that is a great place to start, but you are learning a skill that can be applied to large volumes of data on a SQL-based server. In SQL, merging is done using joins, just as it is done using INDEX MATCH in Excel.
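To make the SQL connection concrete, here is the same name/email/phone merge expressed as a LEFT JOIN, using Python’s built-in sqlite3 module so it runs anywhere. The table and column names are hypothetical, made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
    CREATE TABLE names (email TEXT, name TEXT);
    CREATE TABLE phones (email TEXT, phone TEXT);
    INSERT INTO names VALUES ('ada@example.com', 'Ada'),
                             ('bob@example.com', 'Bob');
    INSERT INTO phones VALUES ('ada@example.com', '555-0100');
""")

# LEFT JOIN plays the role of INDEX MATCH: keep every row from the
# first table and pull in the phone where the email matches.
rows = conn.execute("""
    SELECT n.name, n.email, p.phone
    FROM names AS n
    LEFT JOIN phones AS p ON p.email = n.email
    ORDER BY n.name;
""").fetchall()
```

Just as with INDEX MATCH, the unmatched row (Bob) survives the join with an empty phone field rather than disappearing, which is usually what you want when the first table is your source of truth.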
Assuming you got this far, now it’s time to have some fun in Google Data Studio. You do need a Google account, but it’s easy to create one (with a strong password) and then jump in:
Get started by saving your final data table as a .csv file and then uploading it as a data source in Google Data Studio.
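If your final table ended up in Python rather than a spreadsheet, writing the .csv for upload takes a few lines with the built-in csv module. The rows and filename here are hypothetical:

```python
import csv

# Hypothetical final merged table.
merged = [
    {"email": "ada@example.com", "name": "Ada", "phone": "555-0100"},
    {"email": "bob@example.com", "name": "Bob", "phone": ""},
]

# Write a header row plus one line per record; newline="" lets the
# csv module control line endings itself.
with open("final_table.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["email", "name", "phone"])
    writer.writeheader()
    writer.writerows(merged)
```

The resulting file has one header row and one row per unique identifier, which is the shape Data Studio expects from an uploaded data source.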
After your data is in there, create a report. You can start from a template designed for what you are trying to measure, but usually you can open a blank template and then add charts and data visualizations to play around.
I recommend that at the end of the day you watch some videos (take a training) on how to get the most out of Google Data Studio. At first some of it is daunting but it’s a really great tool and one that has amazing features. Most of what you learn you can even take with you into paid platforms.
I wish I could expand more into different data types, but this can be so subjective for different businesses. At some point I will tackle the User-ID feature in Google Analytics, though it is well covered in other blogs. My experience with user data in Google Analytics is that while it is nice to have, it also means some elements of Google Analytics function differently.