12 Questions Every Brand Should Ask an Identity Graph Provider
Why a True Identity Graph is Essential to Marketers
The looming deprecation of the third-party tracking cookie has many CMOs and CDOs scrambling for solutions that will let them reach and engage their customers going forward. Consumer ID graphs are widely seen as a strong solution, but they raise two challenges.
First, because consumer ID graphs receive so much attention these days, many providers now offer them. But are they true consumer ID graphs? Scratch the surface, and many turn out to be marketing databases sporting a new name. Marketing databases are fine if that’s what you’re after, but they’re not the same as consumer ID graphs, nor do they serve the same use cases.
A marketing database answers questions such as, “Can you help us reach New Jersey residents who are avid gardeners via their email addresses and home phone numbers?”
This contrasts with a true consumer ID graph, which is an instrument deployed to uncover an identity and to confirm, using multiple data points, that the identity is correct. Its role is to identify your consumer and to tell you on which channels and devices you can reach them.
Let’s say you want to know more about the users who visit your website. A true ID graph will allow you to associate a visitor’s IP address or authenticated email address with an individual consumer. From there you can associate that IP address with a household and glean additional insights into that user, such as geolocation, demographics, presence of children in the household, and so on.
The other challenge: many graphs are built at the household level, not the individual level, leading to numerous inaccuracies that can skew targeting and insights. These graphs are built using probabilistic data and more or less “guess” who the individuals within that household are. Household-level or cluster-based graphs are problematic because you’ll end up targeting a lot of people who aren’t your audience.
In the face of third-party cookie deprecation, all marketers need identity graphs that begin at the individual level, and ladder up from there. You also need graphs that are based on validated deterministic data, not probabilistic best guesses.
How do you distinguish a marketing database from a sophisticated identity graph? Below are 12 questions to ask any provider under consideration.
#1: What’s your approach to targeting? Do you deploy individual or household / cluster targeting?
Accuracy is key to understanding your audience and to executing successful performance campaigns. If the graph is built on household clusters, you will inevitably make the wrong assumptions about the people who live there and reach a great number of people who aren’t your intended audience. It’s a shotgun approach that simply isn’t necessary in today’s age of advanced data science. In contrast, a graph built at the individual level will add efficiency to your marketing spend and drive performance.
#2: When building your graph, do you use deterministic or probabilistic data?
Don’t believe the claims that there isn’t enough deterministic data available to build highly accurate consumer identity graphs. There is, and it is much more accurate than probabilistic data.
In addition to the inaccuracies listed in the previous question, probabilistic data can lead to brand suitability issues. For instance, if you’re a healthcare company, you don’t want to target consumers with ads for the wrong ailments, or target devices used by children with ads for spirits.
#3: Do you see accuracy and scalability as mutually exclusive goals, or are they compatible with your approach to user ID graph building?
It’s generally assumed in marketing circles that scale comes at the cost of accuracy. That’s true when ID graphs are built on the household level, but not when the starting point is an individual user based on deterministic data points.
The Internet is flooded with deterministic data, from old email addresses used to log into various sites, to IP addresses for devices no longer used. With the right data science approaches, those data points can be combined to validate audience members at scale.
But that validation isn’t a static process, which is why we at Throtle continuously maintain and update our graphs with billions of data points. This process ensures that our activations and matches are highly accurate. It also settles the point that scale doesn’t need to come at the cost of accuracy.
#4: At a high level, can you explain how you construct your graph, and what processes you emphasize to ensure accuracy?
We’ve created — and actively maintain — the most comprehensive omni-channel ID Graph in the U.S., with over 250MM individuals and their associated identifiers. It’s purpose-built at the individual level with deterministic matching. This means that we corroborate each piece of information from multiple independent sources for precise accuracy.
How do we do that? Each year we continuously curate and verify information from hundreds of data sources, of which approximately 10% pass our rigorous standards and are incorporated into our ID graph. But that’s just the start. We also refresh and refine our graph with data that is updated daily in order to give our clients a current view of their customers.
We’ve found that an always-refreshed source of truth is essential, as individuals change and use many different identifiers over time. In fact, over 75% of U.S. adults have two or more emails, with nearly 20% having four or more. We deterministically connect all these emails, social handles, addresses, phone numbers, and other persistent identifiers to a single individual. Throtle provides a true omni-channel individual truth set, not just a slice of information about consumers.
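To make the idea of connecting many identifiers to one person more concrete, here is a minimal Python sketch that uses a union-find (disjoint-set) structure, a standard technique chosen here for illustration; it is not a description of Throtle’s actual pipeline. It assumes every link passed in has already been verified deterministically, and the prefixed identifier strings (email:, phone:, addr:, maid:) are hypothetical.

```python
from collections import defaultdict


class IdentityLinker:
    """Union-find over identifiers: each connected component is one individual.

    Illustrative sketch only; assumes every link passed to link() is already
    a verified, deterministic match.
    """

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def _find(self, ident: str) -> str:
        # Unseen identifiers start out as their own one-member component.
        self.parent.setdefault(ident, ident)
        # Path halving: point nodes at their grandparent while walking up.
        while self.parent[ident] != ident:
            self.parent[ident] = self.parent[self.parent[ident]]
            ident = self.parent[ident]
        return ident

    def link(self, a: str, b: str) -> None:
        """Record that identifiers a and b belong to the same individual."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def individuals(self) -> list[set[str]]:
        """Group every known identifier under its individual."""
        groups: dict[str, set[str]] = defaultdict(set)
        for ident in list(self.parent):
            groups[self._find(ident)].add(ident)
        return list(groups.values())


# Hypothetical links: Jane's old and new emails, her phone, and her home address.
linker = IdentityLinker()
linker.link("email:jane@old.example", "email:jane@new.example")
linker.link("email:jane@new.example", "phone:+15555550100")
linker.link("phone:+15555550100", "addr:12 elm st, springfield")
linker.link("email:bob@example.com", "maid:abc-123")

for person in linker.individuals():
    print(sorted(person))
```

Because links are only ever recorded between verified pairs, Jane’s old email ends up connected to her current address through her new email and phone, which is the kind of chaining that lets past identifiers still resolve to the right person.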
#5: Do you use all signals, including expired ones, or do you cull data, such as past email or IP addresses?
We believe that consumer ID graphs must use as many signals as possible, past and present, for each individual in order to ensure accuracy. We’re happy to compile every former and current email address we can find for an individual. Ditto for phone numbers, home addresses, IP addresses, MAIDs, and so on.
There is a lot of old data on the open web. People still sign in to forms, sites, and apps with old identifiers that keep logging them in. That old data, when combined with current information, helps to confirm identity.
#6: How often do you rebuild or refresh your graph with new data?
Ideally, ID graphs should be updated daily and completely rebuilt monthly. While that sounds like a lot of work, it is absolutely essential. Why? Across the entire U.S. population, people move, change their phone numbers, and start new jobs with new email addresses every day. If identities aren’t updated continuously, accuracy degrades.
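One way to picture daily updates that keep past signals rather than culling them (as argued in #5) is to record when each identifier was first and last observed. The Python sketch below is illustrative only: the Individual and Sighting types, the prefixed identifier strings, and the sample dates are hypothetical. A monthly full rebuild would recompute these records from all source data rather than patching them day by day.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Sighting:
    first_seen: date
    last_seen: date


@dataclass
class Individual:
    # Hypothetical layout: identifier -> when it was first and last observed.
    identifiers: dict[str, Sighting] = field(default_factory=dict)

    def observe(self, identifier: str, seen_on: date) -> None:
        """Apply one observation from a daily feed; nothing is ever deleted."""
        sighting = self.identifiers.get(identifier)
        if sighting is None:
            self.identifiers[identifier] = Sighting(seen_on, seen_on)
        else:
            sighting.first_seen = min(sighting.first_seen, seen_on)
            sighting.last_seen = max(sighting.last_seen, seen_on)

    def current(self, kind: str) -> Optional[str]:
        """Most recently observed identifier of one kind, e.g. "addr"."""
        candidates = [
            (sighting.last_seen, ident)
            for ident, sighting in self.identifiers.items()
            if ident.startswith(kind + ":")
        ]
        return max(candidates)[1] if candidates else None


# Hypothetical daily feed: Jane moves in June but keeps using an old email.
jane = Individual()
daily_feed = [
    (date(2023, 1, 3), "addr:12 elm st, springfield"),
    (date(2023, 1, 3), "email:jane@old.example"),
    (date(2023, 6, 1), "addr:9 oak ave, shelbyville"),
    (date(2023, 6, 2), "email:jane@old.example"),
]
for day, ident in daily_feed:
    jane.observe(ident, day)

print(jane.current("addr"))
```

The old address is kept as history rather than discarded, so it can still help corroborate Jane’s identity, while current("addr") reflects the move.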
#7: Is your ID graph a core asset to your business, or is it ancillary to your core business?
This is a critically important question that every marketer must ask. As mentioned in the introduction, many companies are seeking to get in on the identity game and relabeling their marketing database as a consumer identity graph. But they’re two different technologies and serve two different use cases. They also require two different approaches to data science.
ID graphs must be built from the ground up for the purpose of resolving identity on an individual level. There’s just no getting around that requirement.
#8: Where do you get your data? How many data providers do you partner with?
In the identity graph sector, overlap is good. Overlap means plenty of opportunities to either validate or toss out suspect data. A good identity graph provider will partner with dozens of data providers, ranging from postal, email, and MAID providers to CTV providers.
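Here is one simple way overlap can be used to validate or toss out suspect data: a link between two identifiers is accepted only when at least two distinct providers report it. This Python sketch is a hypothetical illustration; the threshold, provider names, and identifiers are assumptions, and a production system might also weigh source quality rather than simply counting sources.

```python
from collections import defaultdict

# Hypothetical threshold: how many distinct providers must report a link.
MIN_SOURCES = 2


def corroborate(
    reports: list[tuple[str, str, str]], min_sources: int = MIN_SOURCES
) -> tuple[set[frozenset[str]], set[frozenset[str]]]:
    """Split (provider, identifier_a, identifier_b) reports into accepted and suspect links."""
    providers_by_link: dict[frozenset[str], set[str]] = defaultdict(set)
    for provider, a, b in reports:
        # frozenset makes the link undirected, so (a, b) and (b, a) match.
        providers_by_link[frozenset((a, b))].add(provider)
    # Repeated reports from the same provider count once: we need overlap.
    accepted = {
        link
        for link, providers in providers_by_link.items()
        if len(providers) >= min_sources
    }
    suspect = set(providers_by_link) - accepted
    return accepted, suspect


reports = [
    ("postal_vendor", "addr:12 elm st, springfield", "email:jane@new.example"),
    ("email_vendor", "email:jane@new.example", "addr:12 elm st, springfield"),
    ("maid_vendor", "maid:abc-123", "email:jane@new.example"),
    ("maid_vendor", "email:jane@new.example", "maid:abc-123"),
]
accepted, suspect = corroborate(reports)
print(len(accepted), "accepted,", len(suspect), "suspect")
```

The MAID link stays suspect even though it appears twice, because both reports come from the same provider; only independent confirmation counts.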
#9: How transparent are you? Are you willing to license your graph? To share the actual data with clients?
Be wary of any ID graph provider that won’t share its actual data with you, as it’s a sign the provider is hiding something (typically, such companies don’t want the market to know that their graphs are built at the household level, not the individual level).
A company that stands by its accuracy claims is one that isn’t afraid of transparency across its methodology, reporting and pricing.
#10: How do you verify that the matches you have are correct? What’s the science you use to verify?
Accuracy demands deterministic data, which in turn demands significant external validation and deep data science to answer key questions, such as:
How do you know that it’s the right IP address associated with that individual?
Do you know that it is the right email address for that individual?
Do you have other variables around those data points that confirm the data is accurate and corresponds to the correct individual?
Do you have that individual anchored at a physical home address that’s current?
Do you have their date of birth correct?
Are their phone numbers correct?
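The questions above translate naturally into field-by-field checks. This Python sketch compares a proposed match with an independently sourced reference record and, as an assumption for illustration, treats the match as verified only when the physical address agrees and at least four of the five checks pass. The Record type, the threshold, and the sample values (documentation-range IP addresses, fictional phone numbers) are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    # Hypothetical fields mirroring the verification questions.
    email: str
    ip: str
    address: str
    dob: date
    phones: frozenset[str]


def verify(match: Record, reference: Record, min_passed: int = 4) -> tuple[bool, list[str]]:
    """Check a proposed match against a reference record; return (verified, passed checks)."""
    checks = {
        "email": match.email.strip().lower() == reference.email.strip().lower(),
        "ip": match.ip == reference.ip,
        "address": match.address == reference.address,
        "dob": match.dob == reference.dob,
        # Any shared phone number counts as agreement.
        "phone": bool(match.phones & reference.phones),
    }
    passed = [name for name, ok in checks.items() if ok]
    # Assumed rule: the physical-address anchor is mandatory.
    verified = checks["address"] and len(passed) >= min_passed
    return verified, passed


reference = Record(
    "jane@new.example", "203.0.113.7", "12 elm st, springfield",
    date(1985, 4, 2), frozenset({"+15555550100"}),
)
proposed = Record(
    " Jane@New.example", "198.51.100.4", "12 elm st, springfield",
    date(1985, 4, 2), frozenset({"+15555550100", "+15555550199"}),
)
print(verify(proposed, reference))
```

Making the address check mandatory mirrors the question “Do you have that individual anchored at a physical home address that’s current?”: a record that agrees on everything else but lists an old address would not verify.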
#11: Do you offer match and identity tests?
At the end of the day, partnering with an ID graph provider means making an investment, and you want to know upfront whether that investment will deliver the dividends you need. You need a data test that measures overlap and match rates and confirms accuracy.
Make sure your provider is willing to structure the test with your specific business goals in mind (e.g., shedding light on website visitors or matching MAIDs to a household).
When conducting tests, realize that you will need to spend some time cleaning up the data, getting rid of duplicates and filling in the gaps. This is normal.
Once the data hygiene process is completed, your provider should run its match process and report the results back to you for evaluation. Or, if you want to test a specific subset of your customer data, the provider should be willing to send you the data so that you can conduct your own evaluation.
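A match test boils down to cleaning the client file and then measuring how much of it the provider’s graph recognizes. The Python sketch below shows that flow for email addresses only; the normalization rules, the function names, and the sample data are hypothetical. Lowercasing the entire address is a common simplification, although the local part of an email address can technically be case-sensitive.

```python
from collections.abc import Iterable
from typing import Optional


def normalize_email(raw: str) -> Optional[str]:
    """Basic hygiene: trim, lowercase, and drop values that aren't email-shaped."""
    email = raw.strip().lower()
    local, sep, domain = email.partition("@")
    if not sep or not local or "." not in domain:
        return None
    return email


def match_test(client_emails: Iterable[str], graph_emails: set[str]) -> dict[str, float]:
    """Clean and deduplicate a client file, then report overlap with the graph."""
    # Building a set removes duplicates that only differed in case or spacing.
    cleaned = {e for e in map(normalize_email, client_emails) if e is not None}
    matched = cleaned & graph_emails
    return {
        "unique_records": len(cleaned),
        "matched": len(matched),
        "match_rate": len(matched) / len(cleaned) if cleaned else 0.0,
    }


client_file = [
    "Jane@New.example ",
    "jane@new.example",  # duplicate after cleanup
    "bob@example.com",
    "not-an-email",      # dropped during hygiene
    "sam@example.org",
]
graph_emails = {"jane@new.example", "bob@example.com"}
print(match_test(client_file, graph_emails))
```

Reporting unique records alongside the match rate makes it clear how much of the file survived hygiene before matching.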
#12: Is your ID Graph anchored at the physical address?
It’s difficult to achieve accuracy without anchoring identity at the individual’s physical address. People may or may not have a smart TV or a tablet, but they need to live somewhere. Physical addresses are extremely useful in associating other data points to an individual.