Listen Now
Are you really measuring your ads the right way? In this episode we dive deep into the world of incrementality testing—what it is, why it’s so important, and why most brands struggle to operationalize it effectively.
We break down the biggest mistakes marketers make when it comes to ad measurement and how to avoid them. Learn the secrets to building a consistent testing system, understanding your ad performance, and driving better results for your business.
Key topics covered:
- What is incrementality testing?
- Why it’s so hard to apply test results consistently
- How to turn data into actionable insights
- The tools and frameworks to operationalize your findings
If you want to stop guessing and start making data-driven decisions, this episode is a must-watch.
Show Notes:
- Check out Motion’s Creative Trends 2025
- Get our Prophit System: prophitsystem.com
- The Ecommerce Playbook mailbag is open — email us at podcast@commonthreadco.co
Watch on YouTube
[00:00:00] Richard Gaffin: Hey folks, welcome to the Ecommerce Playbook Podcast. I'm your host, Richard Gaffin, Director of Digital Product Strategy here at Common Thread Collective. And I'm joined yet again, as I always am, by Mr. Taylor Holiday, CEO here at CTC. Taylor, what's going on today?
[00:00:13] Taylor Holiday: A lot is going on, but one of the big things, and this is what I'm excited to chat about today, is that there are a bunch of ideas coming to the forefront of our industry, and what always lags behind the idea itself is how to operationalize that ideology effectively. I think that is actually the much grander challenge we face as it relates to ideas in our industry.
[00:00:42] Richard Gaffin: Right. Yeah. So on this podcast, we've had extended discussions with Luke and with Tony about the ins and outs of what incrementality is, and what the advantages would be if it were operationalized correctly. But I think the thing we want to discuss today is (a) why it's so difficult to operationalize, and (b) maybe some initial steps toward thinking through operationalizing it for your business.
So let's just get into it. Maybe I'll backtrack to my conversation with Luke, where the way to operationalize incrementality, or at least to initially discover it, is through a geo holdout test. That's often framed as something simple: you just run it and, bam, you know your incrementality, and then you can apply your benchmarks to your different platforms. But what exactly about it makes it difficult to operationalize?
[00:01:34] Taylor Holiday: The challenge with the application of test results is that it is difficult to identify and hold constant all of the variables from the test. Let me give you one example of a way I see this manifest all the time. Business A runs an account-wide geo holdout study on Meta and gets back an incrementality read relative to the platform-reported result. They apply that factor and move forward. But along the way they switch from 50 percent BAU, 50 percent ASC when they ran the test to 100 percent ASC because a Meta rep suggested it, and they move from seven-day click, one-day view to exclusively seven-day click. All of a sudden the inputs are no longer consistent between your application of the information and the testing environment itself. Understanding which variables impact these results, and then forcing yourself into periods of constant action relative to test results, not introducing new variables until you have the capacity to measure them, is a rigor and discipline that I think is actually very, very hard to fulfill.
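To make that concrete, here is a minimal sketch (invented numbers, hypothetical field names, not CTC's actual tooling) of how an account-wide holdout read typically gets turned into a factor against the platform-reported result, and why that factor is only trustworthy under the settings that were in place during the test:

```python
# Illustrative sketch: turning a geo holdout read into an incrementality factor
# and applying it to platform-reported results. All values are hypothetical.

TEST_SETTINGS = {"campaign_mix": "50% BAU / 50% ASC", "attribution": "7-day click / 1-day view"}

def incrementality_factor(holdout_incremental_conversions: float,
                          platform_reported_conversions: float) -> float:
    """Incremental conversions measured by the holdout, divided by what the platform claimed."""
    return holdout_incremental_conversions / platform_reported_conversions

FACTOR = incrementality_factor(800, 1000)  # 0.8: the platform over-reported by 25% during the test window

def incremental_roas(platform_roas: float, current_settings: dict) -> float:
    # The factor is only trustworthy while the account still looks like it did during the test.
    if current_settings != TEST_SETTINGS:
        raise ValueError("Optimization settings changed since the test; retest before applying the factor.")
    return platform_roas * FACTOR

print(incremental_roas(3.0, dict(TEST_SETTINGS)))  # 2.4
```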
[00:03:21] Richard Gaffin: Okay, so on the face of it, it sounds like the simple answer is just to keep the real-world circumstances as close to the testing circumstances as possible. Specifically in the example of switching from BAU to all ASC, that seems like an easily avoidable problem: you run the test under one set of settings within the platform, and then you run it the exact same way when you actually bring it into the real world.
But maybe why is it not as simple as that?
[00:03:44] Taylor Holiday: Because the dynamics change over time, relative to when you ran the test, for all sorts of reasons.
There are new partners that come in, there are new reps at the platform providers with new ideas, there are new products that get introduced that are exciting and that you want to test. Our industry is so dynamic, these ad platforms are so dynamic relative to the application of the tools, and there are usually so many different people involved in deciding what to apply. And I'm just talking about one component variable, the optimization setting, which I think is a huge, huge lever in being able to create an incrementality factor for the platform. But there are tons of others, right? You could think about the mix between new customer acquisition and retention as one of the other variables. You could think about the dynamics of the creative itself.
You could think about the season in which you run the test. If you ran your test during Black Friday/Cyber Monday, are those results replicable and applicable to January? What you end up with is the challenge that incrementality is static. It's a moment-in-time result with a set of inputs and circumstances that led to that test outcome. And if you think of this as scientific methodology, the idea with scientific testing is that someone could pick up the test settings, recreate them, and get a similar result. But we can't actually ever do that. We can never recreate the moment that test was run.
It's gone. The season, the ad, the time, the surrounding cultural impact, they're too dynamic. You can't actually ever recreate them. So what this means is, holy cow, do I see test results from long-ago periods being applied in all sorts of weird ways, with completely different circumstances in the ad account than when the test was run, such that it really renders the results useless and, in some cases, actively unhelpful.
So there's this tension between the idea that we want to be really scientific and disciplined in our ad spend and the desire to be nimble and fast, to try things a lot and just gauge day-to-day results. I watch people be in tension with these ideas all the time. We are not a patient industry, and the process of measuring incrementality is an exercise in patience. I just don't see that being an attribute we're excited to embrace. So what do we do in light of that? How do we sequence test design? How do we apply the results? What should we use in the absence of them? Things like that are really the point of this conversation. Are you willing to wait weeks and weeks while something occurs to get a read, and then to apply it with a consistent set of standards of execution? Meanwhile, Meta just told you they have a brand new thing they want you to try, or they just launched value optimization.
Are you going to reject that under this premise, or what place do those things have? So I think there's a lot around the application process that becomes very messy following the realization of a historical test.
[00:06:57] Richard Gaffin: Yeah. Okay. So this brings to mind a conversation we had for the first time probably six years ago, back in 2017, 2018, when a similar conversation was being had about creative testing. One thing you brought to us then was this question: what does a creative test prove, right?
An A/B test run at a certain time proves only that at that time, in that place, with that creative, A beat B. It doesn't necessarily mean that A will beat B again, because time has shifted, it's a different part of the year, people have moved on, whatever the case may be. And so this seems like it's pointing at a larger question, which is: what is the purpose, or even the use, of this sort of scientific-method-style testing in such a dynamic environment, where what you're studying is not physics, it's the behavior of real people?
So this gets to the conversation I think we want to have about this specific thing. Talk to me: what do you think the use of testing is in this type of environment?
[00:08:01] Taylor Holiday: Yeah, it's a really good question, because the creative testing thing is a whole separate conundrum. Part of our role in this is that the most knowledge is gained by running lots and lots of tests and beginning to understand the dynamics that alter the outcomes, the ranges of possibilities, and the levers that move them, such that you can guide someone through a process of making the most impactful decisions for their business, and then decide how to turn that into a system of day-to-day execution, all for the sake of achieving their business objective.

This is where I go back to the hierarchy of metrics that I've done a drawing video on before: the idea that there are things that are definitively, objectively true, and then things become less true as you move down the hierarchy; they become more like indicators. The reliability of information, and therefore how much value we assign to it, is wildly variable. This is the hardest thing about being a service provider: inside every organization, the weight people place on different inputs varies enormously. For some organizations, post-purchase surveys are the most right thing. In another organization, last-click GA is the most right thing. All of those, I think, should be treated with way less rightness than the sequence that moves from my bank account, to contribution margin, to revenue, to my AMER, down the chain of objectivity.

In light of that belief, that truth moves in this hierarchical manner from money in your bank account, to contribution margin, to revenue, to new customer revenue over ad spend, which is what we call AMER, the efficiency metric, what I'm attempting to do is determine a causal relationship between my advertising efforts and those outcomes. This is really important: I'm building a causal bridge, not a correlational bridge. I'm not trying to determine that these things happened at the same time; I'm trying to determine that one caused the other. And what a geo holdout study does is allow me to determine, again for a historical period of time, that there was a causal relationship between those efforts and the outcome. It is the strongest indication, in my opinion, that the advertising effect was real. Now, the amount that occurred in that moment, will it occur in the same way in the future? No, I don't think so. And this is where what we're representing in the research Steve is putting together is the bound of outcomes. How variable are the outcomes in this channel across all the data sets? That's really important to determine: if I'm here now, how much might this vary in the future? What is true when the platform really under-reports or over-reports, and could I avoid those conditions so that the platform's day-to-day indicators become more valuable to me? That's the point of the test: to make your day-to-day indicators more effective for you on a day-to-day basis, and to refine them all the time.
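As a rough illustration of that hierarchy, here is a hypothetical month of figures showing where AMER and contribution margin sit relative to platform-reported indicators; every value is invented for the example:

```python
# Hypothetical month of numbers to illustrate the hierarchy: objective dollars at the
# top, platform-reported indicators further down. Every value here is invented.
new_customer_revenue = 500_000   # revenue from first-time customers, all channels
total_ad_spend       = 200_000
variable_costs       = 240_000   # COGS, shipping, fees on that revenue

amer = new_customer_revenue / total_ad_spend                                   # 2.5
contribution_margin = new_customer_revenue - variable_costs - total_ad_spend   # 60,000

# Platform dashboards (ROAS, in-platform conversions) sit lower in the hierarchy;
# the geo holdout is what lets you treat them as causally connected to these figures.
print(f"AMER: {amer:.2f}, contribution margin: ${contribution_margin:,.0f}")
```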
[00:11:34] Richard Gaffin: Yeah. So this goes back to the point you were making at the beginning about the industry's general distaste, or rather impatience, I guess. It sounds like what you're speaking to is that you do the geo holdout test, and then you do it again, and then you do it again, and over the course of a year, maybe two years, you finally have something that really feels reliable.
[00:11:53] Taylor Holiday: That's right. Yes. I'm a fan of always-on holdouts, such that it's constantly a tuning exercise: there's this range of outcomes, and the more I can hone in on the levers that change them, the better. The other thing that's really important is documentation of the inputs during the test period. What optimization setting was I running? What was my mix of acquisition versus retention? BAU versus ASC? What was the creative mix? That way you can go back and look at the variables that were in play during that testing period, and that will help you understand: okay, if I rerun the test, what variables changed, and what was the outcome?
Okay, cool. Now I can think about how to apply those variables going forward, and that becomes an additional part of making this information helpful to you. But what I see is that many people aren't running holdout studies at all. That's problem one. Problem two is that when they do run them, there's no actual documentation of what was true during them. So what I notice people doing is using that fact to undermine the result all the time. Oh, well, now we have more click versus view-through. Oh, now there's more of this country versus that country. It becomes a mechanism for undermining the previous result. Instead, what we need to say is: okay, in light of that result, if we behave this way, we need to hold these variables constant.
And if we change them, we should account for the fact that there may be a marginal change in outcome relative to the previous test result. Then we should test again and see what happens. The best-case scenario is that you've run an always-on geo holdout study that perpetually gives you the same incrementality read over and over and over.
And so your confidence grows over time. It becomes a sturdier foundation to stand on that directionally signals back and allows you to move AMER, move contribution margin, and make the clearest decisions. But that's rarely the case, right? We're constantly getting alternative signals that force us to refine for a period of time, behave consistent with that information for a period of time, and test again.
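One lightweight way to enforce the documentation discipline described above is a test log that stores each read alongside the conditions it was measured under. A minimal sketch, with hypothetical fields and values:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class HoldoutTestRecord:
    """One row of a test log: the read plus the conditions it was measured under."""
    channel: str
    start: date
    end: date
    incrementality_factor: float
    optimization_setting: str      # e.g. "7-day click" vs. "7-day click / 1-day view"
    campaign_mix: str              # e.g. "50% BAU / 50% ASC"
    acquisition_vs_retention: str  # spend split during the window
    creative_notes: str = ""
    seasonality: str = ""          # e.g. "BFCM", "January baseline"

test_log = [
    HoldoutTestRecord("meta", date(2024, 9, 1), date(2024, 9, 28), 0.82,
                      "7-day click", "50% BAU / 50% ASC", "80/20 acquisition",
                      seasonality="fall baseline"),
]
# When a new read comes in, diff its record against the previous one to see which
# variables actually changed before concluding the old factor was "wrong".
```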
[00:13:58] Richard Gaffin: So let's talk about the operationalization piece again. You were mentioning something earlier about Steve, who is of course our head of data science here at CTC. He's researching, essentially, the sequence by which you want to test these variables, if I'm understanding this correctly. Is that right?
Maybe unpack a little bit how he's thinking about that, or what he's...
[00:14:22] Taylor Holiday: Yeah. What Steve is doing, and I think it's a brilliant approach, is this: when people think about the sequence of testing, they usually just go, where's most of my spend? And then they test that, then that, then that, or they go, new channel, test that. What Steve did instead is start gathering up all the test results we have across all of our brands and asking, where is the widest range of outcomes? Where do the test results show us that what we thought was true was most wrong, by the widest gap? In other words, that to me represents the biggest potential business risk.
Now there's a formula there: if you multiply the spend by the potential range, you get your potential over- or under-reporting. And in most cases, what I would say is that for some channels, that's business-transformative information.
If it's at the upper bound of the range, you're substantially underspending.
If it's at the lower bound of the range, you're substantially overspending, and knowing that is critical. So when we look at that, we go through every channel: Facebook acquisition tests, Facebook retention tests, Google non-brand tests, TikTok, et cetera. What you see is that the widest band we see right now is on Facebook acquisition. We see it run from roughly 70 percent incremental to roughly 140 percent incremental, and that also tends to be people's largest spend channel. Now, that's largely isolated to seven-day click; when we get into seven-day click, one-day view, there's a whole different set of bounds and parameters, and a lot of accounts have mixed versions. So there's complexity in this, but Steve is just beginning the process of gathering all this up and building this idea, because we want to be able to propose to customers: given the unique dynamics of your systems, here's the measurement roadmap you should go after, where the biggest potential risk or gain is, based on the ranges of results, your media optimization settings, and your current spend plan.
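A rough way to express that prioritization, using the spend-times-range formula described above and invented numbers:

```python
# Invented numbers: size each channel's measurement risk as monthly spend times the
# width of its observed incrementality range, then test the biggest number first.
channels = {
    #                   (monthly_spend, low_incrementality, high_incrementality)
    "facebook_acquisition": (300_000, 0.70, 1.40),
    "google_nonbrand":      (120_000, 0.85, 1.10),
}

def reporting_risk(spend: float, low: float, high: float) -> float:
    """Spend-weighted proxy for potential over- or under-reporting, per the formula above."""
    return spend * (high - low)

for name, (spend, low, high) in sorted(channels.items(),
                                       key=lambda kv: reporting_risk(*kv[1]),
                                       reverse=True):
    print(f"{name}: ${reporting_risk(spend, low, high):,.0f} at risk")
# facebook_acquisition: $210,000 at risk   <- widest band on the biggest spend
# google_nonbrand: $30,000 at risk
```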
[00:16:19] Richard Gaffin: So, okay, that's interesting. My guess was going to be that the widest range, maybe not necessarily the most business-transformative, would be in something like branded search, where I feel like the numbers...
[00:16:28] Taylor Holiday: Yeah, I haven't seen that; he hasn't gotten to the aggregate on that number yet, so we're still working through it right now. The largest in his results so far is Meta. But I agree, I think that's possible. I've seen cases where brands are incremental above 100 percent, and I've seen it at basically zero.
So I wouldn't be surprised if you're right there.
[00:16:46] Richard Gaffin: Yeah. Fascinating. Okay. So one place I want to take this is: given some of the issues, you've laid out kind of two cases. One is how difficult it is to really get a clear, actionable answer out of this type of test, because of the way the dynamics of the environment change.
We compare that to the fact that in some cases this is business-transformative, particularly if you're overspending on a certain platform. I guess where I'm trying to get with this is: in light of all this, how useful or important is incrementality testing as a tactic?
By which I mean, the results are by definition marginal...
[00:17:25] Taylor Holiday: Yeah. Yeah.
[00:17:26] Richard Gaffin: So for whom is this important? Yeah.
[00:17:33] Taylor Holiday: Okay, let me take this back. A smaller business is going to struggle to apply this, because the variables are too hard to hold constant, and the volumes of spend dictate slower learning. You just learn too slowly, and your system is too dynamic to really take advantage of it. I think it's really important for anybody spending a significant amount of money in channels. It's really important for people expanding into channel diversification. And it's hyper-critical for anybody who has broadened distribution into Amazon alongside .com. That's an area where, if you were using your holdout study to assess .com and then you add Amazon distribution, all of your previous test results are useless to you. They're null and void. So that's another example of where operationalizing this becomes complex, because the variables change with distribution, both in large-scale retail and on Amazon. If you take your existing merchandise and duplicate it onto Amazon, your previous test result does not matter. It is now a completely different environment for demand capture that you've created for your business. So you have to be conscious of the triggers that lead to new outcomes, and then you have to reassess. Here's an example of why that one thing is hard. Brands build systems for measuring their advertising that become ingrained in an organization. There are spreadsheets and data tools and all sorts of things, and a lot of that starts with just measuring all the effects of your advertising on .com.

Then all of a sudden they expand to Amazon. So now, over here, there's this whole revenue stream of Amazon revenue that usually exists separate from all of the reporting and systems related to .com. Now you go run a test and you get back a read that actually does hold out both Amazon and Shopify. Let's say you get both reads, and you understand that there's incremental revenue creation on Amazon. But all of your ad reporting, AMER, et cetera, still only shows .com revenue and new customer revenue on .com. Meanwhile, your media buying team is now operationalizing an iROAS target that considers the impact on Amazon. If you don't also update your daily reporting so that your AMER now considers all new customer revenue across every point of distribution over ad spend, you will create internal conflict, because your AMER on your .com will look too low and it will look like you're losing money. I see this happen all the time, and you'll pull back scale because the system didn't allow the Amazon revenue to show up in the view of impact. Even though I'm saying my iROAS is good, they're saying, my AMER sucks, what are you talking about? And I've lived this reality over and over and over. So as you introduce the measurement system and the test results, you have to correspondingly update the reporting. Let's say you get to an iROAS measure. I promise you, every day somebody is producing a report somewhere that shows ad performance using other metrics. You have to get rid of those. You have to say: no, all of these reports that exist across the organization showing channel-level performance are now going to use this metric. Because if you start to undermine that metric with, oh, but look what's happening on the platform, or Northbeam says this, or Triple Whale says this, or the post-purchase surveys say this, you're going to end up in what I see as the quagmire of ambiguity, where everything is wrong and right at the same time, depending on how you look at it.
And that is the end result of this journey for most people. It's a pit in which you cannot function.
You can't actually make any decisions. Everybody's doing something different. One person will be like, things are good. And then someone will be like, things are terrible.
And you're just like, I don't understand. What are you looking at? What am I looking at? This happens over and over and over again. It's a problem of organizational system design. It's a problem of operationalizing data. It's a problem of belief: what do people actually believe, and can you create shared belief across an organization? It's all of these things that are really hard to create.
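To put numbers on the .com-versus-blended reporting conflict described above, here is a small hypothetical:

```python
# Hypothetical illustration of the reporting conflict: the media team targets an
# incrementality-adjusted ROAS that credits Amazon lift, but the daily dashboard
# still computes AMER from .com revenue only. All figures are invented.
ad_spend           = 100_000
dotcom_new_revenue = 220_000
amazon_new_revenue = 90_000   # lift the holdout attributed to the same ad spend

dotcom_only_amer = dotcom_new_revenue / ad_spend                         # 2.2 -> "pull back, we're losing money"
blended_amer     = (dotcom_new_revenue + amazon_new_revenue) / ad_spend  # 3.1 -> consistent with the iROAS target

# If the target was set on the blended view, every daily report has to use the
# blended view too, or the two numbers will argue with each other.
print(dotcom_only_amer, blended_amer)
```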
[00:21:44] Richard Gaffin: Yeah. Well, the way you've laid it out there, the stakes of doing this right seem incredibly high, because it sounds like if you do it wrong, you go from a bad situation into potentially a worse one, where you've sort of driven yourself insane. So maybe give us a counter-example of what it would look like.
What does it look like when this is done right? And I don't mean in an ideal circumstance; within the chaos, what would it look like to have this executed correctly?
[00:22:18] Taylor Holiday: It starts with deciding what we are going to assign truth to. Okay. So again, I would suggest this hierarchy of metrics, where you move from contribution margin as the goal to a revenue definition. I've talked about how hard it is just to create agreement on which revenue definition we're looking at, on total ad spend, on which ad spend is going to be counted against which revenue.
Are we going to include Amazon revenue or not? Are we going to include Amazon ad spend or not? Are we including brand spend in this calculation? Every one of these metrics needs clarity and definition, and you need an organizational data dictionary. One of the best sheets I've ever seen from an organization about their data had a tooltip with the formula calculation for every single term in the spreadsheet, because they recognized that these words don't mean the same thing to everybody, and if we want to create shared understanding, we have to start with shared meaning. That is really important: down the entire hierarchy of metrics, we get to organizational understanding. Then, with media spend, we have to decide what is going to be the governing day-to-day optimization metric. I would suggest that you take the geo holdout study you run against your Meta results, hold as many variables constant as you can, turn it into a percentage factor against the platform-reported result, and commit to it. You do not deviate from it for a period of time while you run another test and confirm or adjust the results. And the reason you have to weight this against the platform-reported number is, again, because we need to tie the string from platform optimization, what Meta is ultimately going to use to bid in the auction, to your measurement solution, to the causal impact on the business. That chain cannot be broken, from contribution margin, to the causal effect, to the optimization and bidding setting the platform uses. There has to be continuity there. And if you do that, now a media buyer can look in the platform and make decisions.
Hopefully that decision is simply setting the cost control and not doing anything else. But if you are a person who looks at results and turns off campaigns or raises budgets, at least we've created an iROAS column in your dashboard that uses that factor to report on the number, and we report and execute against that for a period of time. Then you retest and refine, and that's the sequence you go through, all the way through the organization. Every day you report on this thing. Another thing I see happening, and this is a problem, is that, like in Statlas, we've had to update our tools to allow those factors to affect our reports, because when I send you a Meta report, I don't want to show you the platform ROAS and confuse you. I want to show you iROAS. And if you want to go a step further, there's a big project I'm working on at CTC around the idea of IMR, incremental marginal return. Some people call it POAS, but it's more like iPOAS. It's the idea of actually showing and normalizing every ad channel so that zero is a neutral return, meaning you made $0 on this order, and 10 percent would be a 10 percent return on investment on the first order, and negative means you lost money. It's the marginal return factor.
So I think there's a measurement system that could go beyond revenue to actual margin, but weighted by the incremental factor against the platform-reported number.
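A minimal sketch of that normalization idea; the cost rate, incrementality factor, and spend below are placeholders, not a prescribed formula:

```python
# Rough sketch of the "incremental marginal return" idea: normalize each channel to
# profit on the first order after applying its incrementality factor, so 0.0 means
# the order broke even and 0.10 means a 10% return on spend. Inputs are invented.

def incremental_marginal_return(platform_revenue: float,
                                incrementality_factor: float,
                                variable_cost_rate: float,
                                ad_spend: float) -> float:
    incremental_revenue = platform_revenue * incrementality_factor
    margin_before_ads = incremental_revenue * (1 - variable_cost_rate)  # contribution before ad spend
    return (margin_before_ads - ad_spend) / ad_spend

imr = incremental_marginal_return(platform_revenue=50_000,
                                  incrementality_factor=0.8,
                                  variable_cost_rate=0.55,
                                  ad_spend=15_000)
print(round(imr, 2))  # 0.2 -> a 20% first-order return on this channel's spend
```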
[00:25:47] Richard Gaffin: Well, it sounds like maybe the solution here is: hey, hire an agency, and maybe hire CTC. I do think that whenever creating organizational agreement is one of the solutions, that's where bringing in an outside consultant can be super, super helpful, because getting there on your own, you know, it can...
[00:26:09] Taylor Holiday: Well, here's another thing I'll say out in the world: I don't believe these incrementality solutions will ever be self-serve platforms. It'll never work that way. They want to, they're framing themselves that way, that's their ambition, but I don't believe it'll ever happen, for this reason: the actual value proposition isn't the math.
The math is a commodity. The value proposition is what I'm describing: the process and sequence of operationalizing the information for business impact. And it's actually why I think what we as an agency, or other agencies that take this on, possess to sell and to offer is that process.
It's the operationalizing of information for your benefit. It is very hard to do on your own; I don't think you have access to enough information to do it well. So what ends up happening is you turn Haus or Measured or Workmagic into the service provider, and you ask them to create it for you based on their knowledge set. But they're also resistant to that. They don't want to do that, right? Because a service business is a different labor profile, a different cost structure, and there's downward pressure on the price of the commodity of running a test. But what the customer actually wants is high-touch service to help them design and operationalize the information. I watch it all the time: people get a test result and ask, okay, what do I do? What does this mean? Nobody can just receive it like a report output and instantaneously make it efficacious inside their business. It can't happen. It's not self-serve software in that way.
So I think this is a huge piece of the problem.
[00:27:44] Richard Gaffin: That's right. All right. Well, if you want to talk to an agency that is building this system for you, you know where to find us: commonthreadco.com. Hit that "hire us" button. We would love to chat. Taylor, anything else you want to hit on this subject?
[00:27:56] Taylor Holiday: No, but I'm going to keep hammering this publicly, because I think that, in a good way, incrementality has become accepted as a practice. But having a test result, just being able to go, oh, I did a thing in a period of time, just like the creative A/B testing thing, is not an application or a set of fundamental organizational processes that actually make that information make your business better. There's a long gap between having the sheet of paper and being able to execute it. I think about it like this: right now I'm a Little League coach, and my friends at Driveline do all this amazing research about bat speed and launch angle and the attack angle of your bat, and they have proven that increasing bat speed and certain attack angles are better for hitters. Kyle Boddy there uses a phrase I like a lot: it's a solved problem. We are right about this. And I have that information. I can hold it and look at it. But that does not mean that all of a sudden all the athletes on my team can improve their bat speed just because I possess that information. There's a giant bridge between me having it and turning it into a practice modality, with a measurement system, refinement, checking in on the emotional status of all these kids, getting them to focus and repeat the activities consistently to produce the thing I know to be true. And it feels that way with incrementality: holding the test result is just like holding the data that says bat speed is better. I now have to turn that into action that actually allows my organization to be more effective and generate the better result. And that is the real problem.
[00:29:39] Richard Gaffin: I mean, it seems like a good opportunity for a favorite David Ogilvy quote: most advertisers use data the way a drunk uses a lamppost, for support rather than for illumination. Which I think is definitely the case here.
[00:29:52] Taylor Holiday: There you go.
All right, well, go forth and operationalize this information.
[00:29:56] Richard Gaffin: that's right. Okay, folks, we will we'll speak to you next week.
Take care. Goodbye.