Navigating the Wild West of Visitor Identification with Larry Kim of Customers.ai
Episode 31 • 14th March 2025 • The Conversion Show • Erik Christiansen, CEO & Co-Founder of Justuno
Duration: 00:39:21


Shownotes

In this conversation, Erik Christiansen and Larry Kim delve into the complexities of the website visitor identification industry, discussing its current state, the accuracy of data, and the implications of using such data in marketing strategies. They explore the differences between first-party and third-party data, the challenges of ensuring data accuracy, and the potential costs associated with using inaccurate data. The discussion emphasizes the need for transparency and understanding in the industry, highlighting the importance of accurate data for effective marketing.

Chapters

Understanding Website Visitor Identification Technology

Transparency and Accuracy in Visitor Identification

Challenges of Data Accuracy and Its Implications

Establishing a Baseline for Testing Accuracy

First-Party vs. Third-Party Data

The Importance of Accurate Data in Marketing

The Cost of Poor Data Quality

Match Rate vs. Accuracy in Visitor Identification

The Dangers of Relying on Inaccurate Data

Industry Reactions and Future Directions

Concluding Thoughts and Lessons Learned

Transcripts

Erik Christiansen (:

The popcorn emojis were out yesterday for anyone who was paying attention on LinkedIn. Larry Kim posted a somewhat controversial post about the website visitor identification industry and his report. I had a chance to sit down with him yesterday and we dug into this further. What I uncovered, and what you'll hear, is the methodology of the test, and we gained some clarity. And you know, if you don't know much about the website visitor identification industry, it is still somewhat, as Larry calls it, the wild west.

So let's hear what I learned, what the industry learned yesterday, and I hope you enjoy the show.

Erik Christiansen (:

All right. Thanks for joining us today, Larry. Today is Thursday, March 13th. And earlier this week, you launched the State of the Website Visitor Identification Industry Report, which is being talked about a lot right now. So I thought we would take today to maybe reflect back what's being talked about, also address and talk about what website visitor identification is, and kind of go from there. How does that sound?

Larry Kim (:

Awesome, thanks Erik, great to be here. Maybe we'll start at the beginning. Website visitor ID tech has been around for five, six years; it's kind of a niche.

And basically, the promise of website visitor ID tech is that it can ingest various signals to determine the identity of the person on your website, even without them having to fill out a form. So it sounds quite magical. And...

You know, it's also quite opaque and mysterious. And so, you know, after a couple of years working in this business, we felt it was, you know, the right time to try to help shine a light on some of the issues and shortcomings of the industry in order to elevate transparency and discussion around, you know,

how this data format does have issues and what to do about it.

Erik Christiansen (:

So you mentioned transparency, and I know that one of your competitors had an immediate response to the report.

What surprised you about the response you've received initially from the report?

Larry Kim (:

So we can talk a little bit about the report first. The quick summary was that website visitor ID, if you look across the different vendors, is not completely wrong.

But there's a good amount of error. And it varies by provider, and it varies by website, by traffic composition, and things like that. But it's mostly wrong across the board, with the exception of Customers.ai.

Typically what we see is 5% to 30% accuracy. There's a range. But certainly mostly wrong. And so the question is, how does this become a multi-hundred-million-dollar business category if there are issues with this?

And I think the answer is that, while it's mostly wrong, there are some nuggets in there. It's not 100% wrong. And so typically what vendors do is they showcase the bounty, the goals, where they nailed it. And they'll compare the number of emails provided. There will be a big list of website visitors that is provided by these different providers.

They can then go into Klaviyo or some kind of Shopify store and match up the emails to see which of these emails converted. And generally it's not zero. It's actually profitable. These things aren't that expensive; it's single-digit thousands of dollars a month or something like that, for a brand to generate maybe 10k or 20k, some multiple of what they're spending on the software. So it can be...

profitable. But I think the issue that comes up here is that this isn't really vouching for the accuracy of the underlying data. It's only highlighting that a few of the many thousands of emails being provided actually purchased. And there's a more pernicious damage that occurs when you are ingesting too much of this kind of toxic data feed. Let's just say, for example, it's 15% correct and 85% wrong. That might not be a problem for deliverability or reputation on day one, but it can fester, and it can result in all sorts of problems down the road around deliverability, privacy concerns, and compliance issues.

And so we are actually victims of visitor ID's low accuracy. When I first heard of this many, many years ago, we licensed data from various providers in this space and were super excited to jump on this bandwagon. Unfortunately, as time went on, it became clear that we were sold a bill of goods, that the promises weren't actually true, that there was a good amount of error. And over the last many quarters, we've been really wrestling, from a fundamental first-principles perspective, with how to deal with these issues. And the answer we came up with was: this has to be fundamentally rebuilt from the ground up. It's like comparing the Google search engine to the Yahoo search engine; they're completely different types of products in terms of how they're built and architected.

And that's kind of what we're showcasing here today. There is a ton of error in here. We're going to make it super easy for brands or agencies to accurately quantify the error and just run data tests in a very open and transparent way. I'll stop there. Any questions on that?

Erik Christiansen (:

Well, speaking of open, transparent tests, that's something everyone's trying to understand about your industry, about website visitor identification specifically, and how your companies report success. How would you go about an open, transparent test against other platforms? Given that it's been stated that what you're reporting on isn't digging deep enough to compare the data sets, i.e. first-party reclaim versus third-party grow. Just looking at all the different comments and the different threads, what would that look like? How would you do a transparent side-by-side comparison?

Larry Kim (:

Sure.

So I think it's a challenge on one hand to know whether or not a provided email really is the person who visited your website or not. And there is a methodology, like a protocol for testing this out, and it's fairly straightforward if you think about it. It's as follows. You first have to establish a baseline of truth.

So typically, an e-commerce brand doesn't know who everyone is on their website, but they do know who their customers are. The moment Erik purchases something, right at that instant in time, you actually have a pretty good handle on who that person is. Or when someone fills out a form, there's a very high probability that it's really that person, okay?

And so that's going to be a subset of your traffic. It won't be a hundred percent. Maybe it's five, six, seven, eight, nine, ten percent, depending on your conversion rate. It's some component of your website traffic. And what you can do, from a statistical perspective, is just rewind the tape and look to see what vendor A, vendor B, vendor C, and vendor D had to say about this person before

the grand reveal, the purchase. And this is kind of an abstract protocol. It's not a product. It's just a testing framework that can be implemented by any brand or any agency if they so desire. They could spend a few minutes and set that up. And all we're doing is we're saying, like...

you know, if you want help setting this up, we can help you set that up. And it's not something that other vendors lead with. They would much prefer to showcase the correct guesses, if that makes sense. But what we're saying is: that matters, but what also matters is how many guesses are being made that are wrong in order to get to that

subset of purchasers. And maybe you're perfectly okay with a 90% error rate or whatever, but it's still good information to have. There's no downside in knowing what the accuracy is, if you hear what I'm saying.

Erik Christiansen (:

So as the audience tries to understand how your platforms function, we're talking about pre-purchase matching and post-purchase matching. Are you suggesting that

the test is: let's start tracking visitors now, you're building those profiles, and then when they actually purchase with an email address, we compare notes against what the system had?

Larry Kim (:

You got

it. Step one: establish a baseline of truth. Or actually, the first step is to get one or more data providers. And you can even do it with just Customers.ai. You don't have to benchmark this against some other product. If you want to know what the accuracy of Customers.ai is on a site, we'll happily set that up for you.

And we're the only ones who will offer this, okay? But yes, in general, you set up one or more ID providers, you remember the guesses, and then a subset of them you'll be able to validate: the ones who end up purchasing.

And then you just compute: vendor one made this many guesses against the baseline of truth, the subset of known visitors, and they were right this many times and wrong this many times. And typically what we're seeing is that these vendors are mostly wrong. It does vary, but it's not...

God, there are recordings out there saying, like, 98% correct and blah blah blah. It's not quite that. There's certainly a lot of hype out there. We're not even saying we're correct all the time; we're typically, you know, 60, 70, 80 percent-ish. We're not saying we're 100%, but whatever that number is, let it be known, is all we're saying. Any questions on the protocol?

Erik Christiansen (:

So my thoughts are around trying to understand the difference between matching post-purchase and pre-purchase, and what is happening pre-purchase with identification at that point, because you're identifying people and then directly bringing them into your CRM and sending messages to them. Can you talk about that aspect of the business?

Larry Kim (:

So once someone's purchased, that's first-party data. So that's pretty safe. They've opted in, basically double, triple opted in, because they're a customer. But the promise of this technology is the idea that you could get more leverage by

also identifying a meaningful fraction of the people who haven't yet purchased. And so those are third-party guesses. And that's more tenuous from an accuracy perspective. The accuracy for someone who purchases something is not 100%, but it's probably very damn close. Though

you might use a throwaway email or something like that. But I guess the issue here is that the vendors in this space, as far as I can tell, are being less than fully transparent about what the accuracy rate of those guesses is. I think it can get problematic if, say, it's 5%, 10%, 15%, 20% accuracy,

and you're pushing those guesses directly into a CRM or ESP, an email service provider. That's going to have certain implications from a compliance perspective as well as a deliverability perspective. And then we want people to fully understand what they're signing up for in terms of match rate

and accuracy in order to maximize the chances of success with this interesting, valuable data set.

Erik Christiansen (:

And as I read through the report, on page 21 is a question that arose for me and maybe some of the listeners is if someone's purchasing, they're going to use their accurate information, be it their address, their email address. They might have multiple emails, but it's one they're using that they want to get correspondence from. So at that point, it's first party data, zero party data, depending on how much information you're getting from them. They're self-identifying.

Larry Kim (:

Yes. Yes.

Erik Christiansen (:

This is from an outsider kind of looking at it. So obviously that data is going to be 90%-plus accurate.

Erik Christiansen (:

Can you dig in a little bit more on your test methodologies as it relates to this?

Larry Kim (:

The goal is not only to match to known purchasers; rather, the known purchasers are the only subset of website visitors for which you have conviction about who those people really are. And so the data test is basically an extrapolation. Let's give an example.

You have 100 purchasers, okay? And for that set of 100 purchasers, a vendor is not going to make guesses on all 100. Sometimes they say, I don't know. And sometimes they'll say, I think I know who it is. So you have 100 purchasers, and maybe vendor A guesses 20 times. And if vendor A gets two of those 20 guesses correct, and you can validate that because you know who it is later, okay,

then that two out of 20 guesses corresponds to a 10% accuracy. Are you following me so far, Erik? It therefore follows that, because the accuracy within the subset of website traffic that was known,

these anonymous guesses, was 10%, all the other guesses, on people visiting the website who didn't purchase, would likely be at a similar accuracy percentage. Do you understand that?
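The protocol described here can be sketched in a few lines of code. This is a hedged illustration of the testing framework, not any vendor's actual implementation; all names and data are hypothetical, chosen to mirror the 2-out-of-20 example:

```python
# Hedged sketch of the accuracy test: record vendor guesses per session,
# then "rewind the tape" once purchasers reveal their real emails.

def vendor_accuracy(guesses, purchasers):
    """guesses: {session_id: guessed_email}, logged before any purchase.
    purchasers: {session_id: real_email}, the baseline of truth from checkout."""
    # Only guesses on sessions that later purchased can be scored.
    scorable = {s: g for s, g in guesses.items() if s in purchasers}
    correct = sum(1 for s, g in scorable.items() if g == purchasers[s])
    return correct, len(scorable)

# 100 purchasers; vendor A guesses on 20 of them; 2 guesses are the right person.
purchasers = {f"s{i}": f"buyer{i}@example.com" for i in range(100)}
guesses = {f"s{i}": f"buyer{i}@example.com" for i in range(2)}               # 2 correct
guesses.update({f"s{i}": f"stranger{i}@example.com" for i in range(2, 20)})  # 18 wrong

correct, scored = vendor_accuracy(guesses, purchasers)
print(f"{correct}/{scored} = {correct / scored:.0%} accuracy")  # 2/20 = 10% accuracy
```

The same loop runs per vendor, which is what makes side-by-side comparisons possible once the guesses are logged.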

Erik Christiansen (:

Okay, yes, and that is the anonymous visitor identification.

Larry Kim (:

That's correct. That's these tools' raison d'être. They have one job, okay? And that job is not to ID people that you already know. It's actually to ID people you don't know. But the only way you can test this is by checking the guesses against the known people on your site,

from which you can then extrapolate an understanding of what the accuracy rate is likely to be for the unknown visitors. And typically, what I'm telling you is, it's lower than you might imagine. When these commercials just say, we're gonna tell you the people on your website, they do not say that they're gonna have to make 10 or 20 or

a high number of guesses in order to get to one that's correct. Do you follow me here, Erik?

Erik Christiansen (:

Yeah. If you were to balance that out: I think the industry understands that with anonymous visitor identification you're never going to get a hundred percent; whatever you can get is a win. And the numbers being thrown out for the anonymous visitor piece are, you know, what is it, if you're in the teens, you should be happy.

What are you suggesting you should be happy with?

Larry Kim (:

I think we're at an important concept. You're talking about match rate, not accuracy. So there's the conventional definition of match rate, which is: if 100 visitors come to your website, for what percentage of visitors

will the identity provider be able to hazard a guess? And typically, like you're saying, it's 15%, 20%, or 10%; there's a range. Some of these tools even claim 30% or 40%. But that's not the accuracy rate. That's just the guess rate. Are you following me, Erik?

Erik Christiansen (:

Okay, guess rate versus match rate.

Larry Kim (:

It's saying: we will make a guess 25% of the time, or whatever. So match rate, you should be thinking of as guess rate. And so what we're trying to figure out is, okay, you've made guesses on 25% of the traffic, but what percentage of those guesses are actually the right person? And what I'm telling you is,

it's quite low. Most of them are wrong, generally. And the way you validate this is you establish a source of truth, which is the people who have purchased your products. Those are sessions on your site where you actually know who it was. And then, for any of the sessions that are known, you check:

did any ID vendor bother to make a guess on who this was? And if they did, you check to see if it's correct. So there's a way to compute accuracy by checking whether or not the guesses were correct.
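The distinction between the two metrics reduces to simple arithmetic. A hypothetical sketch, with counts in the ranges mentioned in the conversation rather than any vendor's real figures:

```python
# Hedged illustration: match rate (guess rate) and accuracy are different
# metrics, and the effective identification rate is their product.

total_visitors = 400
guesses_made = 100        # sessions where the vendor offered a guess at all
validated_guesses = 20    # guesses on sessions where the visitor later purchased
correct_guesses = 2       # validated guesses that named the right person

match_rate = guesses_made / total_visitors      # 25%: how often a guess is made
accuracy = correct_guesses / validated_guesses  # 10%: how often a guess is right
effective_rate = match_rate * accuracy          # ~2.5% of visitors correctly IDed

print(f"match rate {match_rate:.0%}, accuracy {accuracy:.0%}, effective {effective_rate:.1%}")
```

A headline "25% match rate" can therefore coexist with only a few percent of visitors being correctly identified, which is the gap being described here.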

Erik Christiansen (:

That gets into something I have always cared a lot about: wasted money and costs. Oftentimes digital marketers have budgets that they're not personally responsible for, and being siloed, they might just have one main KPI they're tracked on. Meaning, if you're getting match rates and you're able to send emails and

make money, digital marketers are going to be happy. But there's a cost to this, which is that you're filling your CRM with that data, and you're having to pay per contact. Let's talk about the ill effects, what I like to call steroids. There are steroids in everything. With lead capture, for me: you can do a full-screen ad or full-screen pop-over,

increase lead capture, get double digits, but you're affecting the conversion rate or the customer experience. So what are we talking about here? You have a whole section, your third chapter, on the true cost of the great data deception. Can we talk about that for a bit?

Larry Kim (:

Sure. I think the reason this data is so valuable is not just for analytical purposes, just looking at a list of website visitors. I think marketers want to use this data to try to squeeze more sales out of their existing traffic. And it's a great idea. So typically what people do is they take this data,

and there are a couple of different modalities. Typically, you act on the data in one of a few ways. You either email those people, or you send that data through to an ad platform to re-enable certain ad remarketing use cases. Because...

you might be wondering, well, why do I need to send data from my website to power ad remarketing use cases? And the reason is that remarketing is completely broken. It was relying on a Google Ads or Facebook pixel that was sending visitor ID data from your website back to the ad servers. But all these browsers are

clobbering all these cookies and everything. So that data pipe between your website and the ad server, which was sending ID data, is broken. So typically those are the two main modalities. There are other things you can do, like mail postcards or whatever, but the main reason you want this data is to either send them an email or send them an ad, okay?

When you have bad data, and it's not just 5% wrong but more like only 5% right, okay? Well, the spam-complaint threshold in Google's bulk sender guidelines for Gmail is very low. It's 0.3%.
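As a back-of-the-envelope illustration of how those numbers interact: the complaint propensity below is an invented assumption for illustration, not a measured figure, but it shows how a high error rate could collide with that 0.3% ceiling:

```python
# Hedged estimate: expected spam-complaint rate when most "identified"
# emails belong to the wrong person. Both parameters are assumptions.

error_rate = 0.85            # 85% of emailed "identified visitors" are the wrong person
complaint_propensity = 0.01  # assume 1% of wrongly-emailed people hit "report spam"

expected_complaints = error_rate * complaint_propensity  # 0.0085 = 0.85%
gmail_threshold = 0.003                                  # Gmail's 0.3% ceiling

print(expected_complaints > gmail_threshold)  # True: nearly 3x over the limit
```

Under these assumed numbers, even a modest tendency of misidentified recipients to complain would put a sender well past the threshold.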

You can see how a super, super high error rate might result in negative externalities, if you will: domain sender reputation issues, email visibility issues. Additionally, if the error rate is very, very high

and you're sending this data to power your ad campaigns... Well, all these ad campaigns now run on Advantage+ audiences and all this stuff, all this AI targeting. So AI is

kind of garbage in, garbage out. If you're sending these phony profiles of random people who never really visited your website as targets, you're confusing the ad targeting systems of the ad platform.

They're trying to find the signal in the noise to hone in on your ICP and get your ads in front of the right person. And you're sending this data to the ad platform saying, no, no, I want this 90-year-old granny or something, assuming you're not selling to 90-year-old grannies. So it's undermining the AI targeting of these ad platforms,

and jeopardizing the health of your email sending reputation. And so if you're going to pump stuff into those critical assets, it is imperative that marketers understand what the accuracy of these things is. Now, there's no expectation from marketers that anything is 100%, but I think people might have a problem with...

Erik Christiansen (:

So this is a steroid. It works.

Larry Kim (:

It works a small fraction of the time, which is usually several times fewer than... For every guess that's correct, there could be somewhere between two and 20 wrong guesses. Do you see what I'm saying? And people are celebrating the correct guesses without fully internalizing and understanding

the magnitude of the wrong guesses and what that could do to your marketing channels.

Erik Christiansen (:

Have we seen,

do we have any examples of the side effects of this from any ESPs that you're working with, or any specific clients?

Larry Kim (:

We won't name any vendors here, but one thing that happened in the industry last year was that a major ESP started cracking down on the usage of one of these platforms. Customers were being threatened; the future of their

email accounts was being threatened by CSMs from that ESP. Like, if you use this, you can't use us; we'll kick you off. So that actually happened. In fact, people in the know know this is one of the worst-kept secrets

in the website visitor ID industry. People who understand this space understand what I'm talking about here. But I'm afraid there's a much, much larger constituency at stake that doesn't understand what I'm talking about. And that is why we're trying to make this an issue. And we are

trying to raise awareness and transparency around the challenges, and also to promote different strategies and methodologies for getting a handle on data accuracy, and then devising strategies to make imperfect data still work, but in a modality where you have a full understanding of

what kind of error we're dealing with here. Is it

10% error or is it 90% error? It matters in terms of how you intend to use the data.

Erik Christiansen (:

When we hear Retention use the terms first-party reclaim and then third-party grow, they're openly saying, yes, their grow is 13% accuracy.

Larry Kim (:

Yeah.

Erik Christiansen (:

Is that accurate? I'm just trying to understand, comparing apples to apples,

that they're not saying they're 90%. They're saying the grow part of the tool is only 13%. And those are the numbers I've heard from others in the industry looking at identifying anonymous visitors, et cetera.

Larry Kim (:

I think what they're saying is...

Again, I don't think it's necessary to talk about specific people or products, because what we're talking about is an industry-wide problem. And I think the explanations being given by the leaders of these companies are quite confusing, quite ambiguous. Not because I don't understand them, but because I think they

could be more clear. One leader put out a statement pretty much saying that their accuracy rate is quite low, okay? But what he's saying is that that product, the guessing product, is only, I think he said, 14% of the value or something like that.

But do you see what he said? He said 14% of the value. That's not accuracy. He's trying to conflate a bunch of different topics here. These products still make thousands and thousands of guesses, okay? And what matters at the end of the day is: are these guesses right or are these guesses wrong? Because they

directly inform the downstream campaign metrics. The downstream campaign metrics would be things like purchases, complaints, click-through rates. I would categorize it as a bit of a deflection: trying to move the target away from accuracy, away from acknowledging that

his product has significant accuracy problems, by pointing to another product they also have, which is actually less used. They're describing it more as a first-party data product, which I think is a different problem set. There are other companies

solving first-party data challenges, like, I don't know, Blotout or Elevar. Do you see what I'm saying? It's just a conflation of a couple of different concepts: first- versus third-party data, where enough people think

their definitions of first- and third-party data are a little wonky, and value versus accuracy.

Erik Christiansen (:

Why don't we leave it at that.

Larry Kim (:

This has been a great discussion. These are challenging topics. The core message I wish to conclude with is: it's buyer beware. It very much is the wild, wild west in this website visitor ID market. I think there is an expectation of what these tools

can do, and then there's the reality of what is being delivered. And all we're trying to do, I mean, just to make it very simple, all we're trying to do is shine a light on what's in this box. Is it 10% correct? Is it 30% correct? Is it 70% correct? Customers deserve to know what the accuracy of their website visitor identification data is, and we want...

Erik Christiansen (:

Can we maybe finish up? I'll ask you, you know, Larry, you've been in this industry a long time. I know you back from, back from, um, why am I blanking? The... stream days. Well respected in the industry, a lot of people look to you as a mentor, learning from you. What would you say, as you and I continue to build our new companies, different companies...

Larry Kim (:

WordStream, WordStream.

Erik Christiansen (:

In the last couple of days, have you personally learned anything from this experience putting out this report?

Larry Kim (:

You know, if you uncover certain excesses or certain behaviors or certain truths about an industry, there are going to be people who want to maintain that status quo. This could be the affiliate marketers or influencers who were

pumping these products. It could be the agency partners that recommend this kind of software to their clients. I guess we knew that people would be upset about it. But I think what I've learned is just...

We need to elevate the discussion, not make it personal. I welcome these types of discussions. The industry deserves better, is my message and is my learning.

It's hard enough to be a brand, you know, with ad prices going up, and there are just so many challenges. Channels are getting harder and harder. So at the end of the day, we're just trying to increase transparency and share best practices to make the most of this unique data set and space.

Erik Christiansen (:

Thanks for your time, Larry. I really appreciate you sitting down. I know it's been a busy week for you. So I really appreciate it.

Larry Kim (:

Thanks.

Thanks, Erik.

Erik Christiansen (:

And there you have it. I hope you enjoyed listening to The Conversion Show. Join me next week, when I'll have Adam Tuttle from ActiveCampaign visiting. I've known Adam for 12-plus years. He's been at ActiveCampaign from the very beginning, one of the first 10 employees. We talk about his time in Australia; he spent three years there building up that market. He's filled every role, and now he's really focused on operations there. So be sure to like and subscribe, and we'll see you next week. Thanks.
