
After years of relying on expensive, limited solutions, we decided to build our own. In this episode of the Ecommerce Playbook Podcast, CTC’s VP of Ecommerce Strategy Luke Austin is joined by VP of Technology Kwa and Director of Data Steve Rekuc to unveil our brand-new incrementality testing platform, now available to all CTC partners.

We’ll break down:

  • Why incrementality testing is the gold standard of marketing measurement
  • How our tool makes geo holdout testing more affordable and accessible
  • Real world results from brands already using it
  • How this changes the way we plan, forecast, and allocate ad budgets

If you’re tired of guessing what your ads are really doing, this one’s for you.

Show Notes:
  • Common Thread listeners get $250 by depositing $5,000 or spending $5,000 using the Mercury IO credit card within your first 90 days (or do both for $500) at mercury.com/ctc.
  • What Drove March?
  • Explore the Prophit System: prophitsystem.com
  • The Ecommerce Playbook mailbag is open — email us at podcast@commonthreadco.com to ask us any questions you might have about the world of ecomm

*Mercury is a financial technology company, not an FDIC-insured bank. Checking and savings accounts are provided through our bank partners Choice Financial Group, Column, N.A., and Evolve Bank & Trust; Members FDIC. The IO Card is issued by Patriot Bank, Member FDIC, pursuant to a license from Mastercard. Learn more about cashback. Working Capital loans provided by Mercury Lending, LLC NMLS ID: 2606284.

Watch on YouTube

[00:00:00] Luke Austin: Why did we do this? Why did we spend the energy and resources on our side to build an incrementality platform here at CTC to be able to offer? So first and primarily: incrementality testing through causal geo holdout experiments is the gold standard of measurement.

That's what we believe. I think there's consensus across the space that the gold standard in marketing measurement is incrementality testing.

Hey folks, welcome to the Ecommerce Playbook Podcast. I am Luke Austin, the VP of Ecommerce Strategy at CTC, and I will actually be your host on today's episode, joined by two very special guests, one that you may recognize and one complete newcomer. And I'm going to set up the topic of this conversation from the start and then introduce these two gentlemen.

So here it is. We have built our own incrementality tool into Stats, and it is now available to anyone who works with us at CTC, with access to unlimited geo holdout tests. This is something that we've been working on for a very long time, and we are very excited for this launch and what it will allow us to partner with you all on as it relates to marketing measurement

at CTC. So we'll get more into why we did this, what the tool entails, what it allows us to do, and actually walk through a couple of incrementality geo holdout test results to show you the process. But before I do, I wanna introduce a couple of the key gentlemen behind this rollout and what we've been working on for a long time now.

So I'll kick it over first to Kwa and then to Steve Rekuc. So Kwa, over to you.

[00:01:43] Kwa: So I'm the VP of Technology at CTC, and my background actually is in finance, so I was brought up in a lot of financial modeling. Time series analysis for me is very natural, Box-Jenkins and so on. And so when we started looking at geo studies, and we do a lot of forecasting in Stats as well, I felt like, oh, this is a domain that I was really interested in. So I jumped in and looked at the research. The research is really interesting, and I think for people that are really numbers-happy and wanna get into marketing, it's something that everyone should read. And then when I saw the other studies, it just felt like there's something that we could make and contribute to the marketplace.

[00:02:32] Luke Austin: Awesome, Steve.

[00:02:33] Steve Rekuc: Yeah, and I'm Steve Rekuc, Director of Data at Common Thread Collective. I help build some of the models that our brands use, which are utilized in Stats. I also look at a lot of the macroeconomic data and some of the survey data for the DTCCI, the direct-to-consumer confidence index that is, and also contribute to the D2C index that we publish weekly and monthly versions of. And I have a background more in mechanical engineering; I took statistical testing in grad school and then used it for quality control purposes as well as in engineering specs. But it makes sense, very much, to utilize it in this manner in geo testing. So it was always a strong interest, and I felt like we could do a really good job with it. So, glad to see that we've taken this on.

[00:03:25] Luke Austin: Yeah, definitely. So I think that leads naturally into the question of why. Why did we do this? Why did we spend the energy and resources on our side to build an incrementality platform here at CTC to be able to offer? So first and primarily: incrementality testing through causal geo holdout experiments is the gold standard of measurement.

That's what we believe. I think there's consensus across the space that the gold standard in marketing measurement is incrementality testing. There are, sort of, three different tiers as it relates to marketing measurement and how people can think about doing that. First is surveys and third-party data to sort of understand

what relationships might exist in certain channels contributing to the business outcome. The second tier would be more related to MTAs and attribution models attempting to apply some fractional credit on top of historical data. But incrementality testing is the gold standard because it is setting up an experiment at a specific point in time that has a high-confidence output, where we can say: this test was run, and this is our confidence in this result being the case.

And then making the budget allocation decisions accordingly from that point. So we built the tool for that reason. Incrementality is the gold standard of measurement. That said, the solutions that currently exist in the market, through the platforms that many of us know and are probably well familiar with, cost $10,000 a month, plus longer-term agreements in most cases.

And that's cost-prohibitive for a number of brands needing to participate in the gold standard of measurement. So, we believe that everyone needs to be using geo holdout testing, and so we created a solution that allows for that, with high-confidence output, that is not cost-prohibitive at that same level.

The reason we are able to do this, which I think could be a question that naturally follows, is that, one, we have our own proprietary business planning and insights tool called Stats. So Kwa, as VP of Technology, oversees Stats and all the functions there and the data integrations to all the platforms.

We already have the data, and it exists within this platform. Then Steve's role as the Director of Data is looking at all that data across our data set and the broader market trends, as well as doing the modeling on behalf of individual brands. We're already looking at that data and doing our forecasting and planning against it.

In addition to that, we have access to hundreds of incrementality tests and geo holdout tests from other brands and other platforms. Over the course of recent years we've been able to see those insights, see the trends, and also see what works well and what doesn't, to identify things to improve upon and build our tool off of.

And then the third and final thing is that our growth strategists at CTC are responsible for the FP&A budget allocation and target setting across channels, and then tracking against that, pacing the forecast, and opportunity identification. So, marketing measurement is already naturally a part of the responsibility that we're taking on as growth strategists at CTC in partnering with the brands that we do.

And naturally, incrementality has become a big part of those conversations over the past couple of years. So we haven't needed to build out additional teams or units to take this on, because we're already involved in those conversations. We already have our own tool, we already have all the data, and we already have the people engaging in those conversations.

So naturally, being able to offer a tool that gives high-confidence results and is included as a part of working with CTC was the thing that we were always after in this conversation. So what we want to do from here is walk through our tool in the context of a recent test output and speak to some of the areas.

But before we jump into that, Kwa, Steve, from either of you, is there anything else that you'd want to add as it relates to the why or the how, in terms of why we built this tool and how we got here, some of the impetus, to frame it up before getting into the tool and the test results specifically?

[00:07:25] Steve Rekuc: I think us building our own tool also allows us to get under the hood in the process, because if we're in charge of a brand and the growth strategy for that brand, we're responsible for acting on the knowledge that we have. So if a test winds up with an erroneous result and then we apply it, that doesn't look really good for us, and that's bad for our brands. So we'd really like to understand what's going on and what could negatively impact test results. If we can't look into those details and see what's going on in the test, we might be acting on erroneous test results in a bad way, and we'd like to avoid that. So building our own and knowing what's going on gives us a little more direct control over the process.

[00:08:21] Luke Austin: Yeah, that's a great point. And connected to that, what we've built into Stats is the ability to see the efficiency, or the iROAS, from each of the platforms based on the results of these incrementality studies. So, loading in incrementality factors for each of the channels, and then viewing channel performance based on the results of incrementality tests.

So we have that within Stats. That's primarily how we view and assess the performance of each of the marketing channels. Steve, to your point, we're loading in those incrementality factors and looking at the channel performance against those.

And so we wanna be very confident in those results and be very involved in the process of arriving at those incrementality factors and each of the results connected to them. And I think the other thing we saw is that there's a relationship between a geo holdout test's impact and each of the different revenue definitions, which we'll walk through here soon.

So not just your total D2C revenue, but the new customer revenue from D2C, the returning customer revenue from D2C, and then the halo effect on Amazon. Each of those has a relationship to the media spend that's important to define, measure against, and then take into consideration when setting targets as well.

And then the other thing: we were talking briefly before we started the episode, and Kwa, I'm curious if you could jump in with a bit more of the insight that you shared earlier. You've done a lot of digging, as we all have, into this space over recent years as it relates to incrementality testing.

You've seen some of the other platforms and the results that exist on them. What were some of the things that you identified over that course of time that really led you to say: yeah, these are things we need to improve upon, and that we could improve upon in building our own tool?

[00:10:13] Kwa: Yeah, so my journey into the incrementality product is a little bit different than yours and Steve's, I think. I started by doing my own research, and we were studying everything for a couple months; I had a couple of other statisticians with me. So we looked at it from a very statistical standpoint: okay, how do we properly run the test? And from my view, if you can control the test, if you can control the model and a lot of the inputs, then you can run a really good test. But if someone just gives you data and you have to make something out of it, then it's very hard to manage, and that's when you really need more advanced methods.

And so the viewpoint I arrived at is that we're just going to try to run really clean tests. Then we're gonna have a really high confidence level in those tests, versus someone running a test in a slapdash way and us having to go in and fix it, and I don't know if we even have the tools to do that. But what we can do is control the test. We can say: okay, you should do this and do that, and set up your channels this way. And so I think that is our advantage, and I didn't really think about it until we started working together more. I was like, hey, actually, when you approach this from a marketing standpoint, there's a lot of inputs that are really valuable. And another thing: as a statistician, you just kind of give people what the results are, and it's up to them to do whatever they want with it. From CTC's standpoint, it's, okay, well if these recommendations don't work out, we're responsible for it.

Right.

[00:11:50] Luke Austin: Yep.

[00:11:51] Kwa: That change of responsibilities is a really big deal compared to how other firms run things.

[00:11:58] Luke Austin: Yeah.

[00:11:59] Kwa: Yeah. There's just a ton of value in taking on the full responsibility from study to action. And I don't think there's anything else like that.

[00:12:09] Luke Austin: Yeah. I think it's a great point. Most of you listening are probably familiar with this: every month we build out a forecast and we say, this is the target that we're gonna execute against. We track against it every day, and our goal is to be plus or minus 10% within that forecast, and we're putting ourselves on the hook for that outcome for you all.

And so, yeah, it's a great point, Kwa: with incrementality being the core of how we're setting budget allocations and targets and doing the marketing measurement, it then connects to the forecast that we're putting ourselves on the line for and saying, this is what we're gonna go after and get.

And so the output of the test isn't a data point that just sort of lives somewhere. It gets directly integrated into the business forecast and business plan that gets actioned against when we get a high-confidence test result. So yeah, I think that's a great point.

Alright.

[00:12:57] Steve Rekuc: Well, too, Luke, the advantage that we have is that a lot of the data sits in Stats. Kwa's built an excellent tool where we already have the data for our brands ready to use for this.

It makes it an easier step for us to go in and do this, rather than some third party that requires more integrations.

[00:13:18] Luke Austin: yes. Yeah, and I,

[00:13:19] Kwa: Not only that, it's cleaned already. We pull in channels that you might not have thought of, we align the attribution definitions, and we might even exclude certain campaigns that would've thrown off the data. There are all these things that we pre-clean that really make this a lot easier. If you were coming in fresh and were just given data, you don't really know: hey, is this point valid? Was there something that I should throw out? Just having that is such a big help from a statistics standpoint.

[00:13:54] Luke Austin: Yep, for sure. And it's worth noting too, we've built in the functionality to import past incrementality tests into our tool. So for brands that have run incrementality tests with another platform or another tool prior to working with us, or even if they don't work with us, we're able to upload those results so that we can have all the incrementality results live within Stats.

All of them connect to the media channels in the forecast. And to the point earlier that we were discussing, we're able to assess the confidence of past tests that were maybe not even run with our platform, to assess whether it's a high-confidence output or whether we should rerun that channel or tactic again, and have everything in one place and really be able to say: okay, do we have enough confidence to put our sign-off on this and bring it into the business forecast?

Or is there something that should be retested to make sure we're fully confident in being able to sign off on it? So let's jump into the tool without further ado. What we'll have up here, for those of you viewing this on YouTube or through some visual channel, is a walkthrough of the studies tool, the geo holdout portion of the studies tool within Stats.

We'll be walking through each section of it, in the context of one of the results for one of our brands that we have run recently. For those of you just listening, we'll try to describe it with as many adjectives as possible to give you an idea of what we're walking through.

The first screen here that we have up is a project progress overview. It shows five steps in the geo holdout test design. The first step is getting recommendations on the market selection for the test. The next is selecting the test conditions. The third is setting the study start date.

The fourth step is running the study, and then the fifth step is finalizing and ending the test. So we're gonna walk through each of those five steps sequentially, how they exist in our tool, and then share context on the methodology and approach that leads to each of those. So we're gonna start first here with

the recommendations, or the market selection phase of the process. So Steve, Kwa, we'll kind of keep it open here, whoever wants to jump in: give us some context on the recommendations and market selection stage, step one here, and what is going on in the tool.

[00:16:24] Kwa: So the method of incrementality that we use is called synthetic control. And what this is doing is trying to set up a group of states that are going to be good for synthetic controls. If you just pick four random states and you run a study, later, when you need to find another group of states to compare against, to have a control group, it might be difficult if you selected the wrong states, right? So what this does is it goes through and models out: oh, if I select these four states, what are the potential things that can happen?

What are the best fits that could occur given the data that I fed into it? And then it gives you a confidence level, and it ranks it. So the first two here are the top recommendations and rankings depending on effect size. And the effect size determines your budget.

So if it determines, okay, you're gonna have to do a 10% lift with these ten states, which is a 10% effect size, you can look at, okay, how much did I spend in these states? And then, if I were to do a holdout, what would be the cost? That gives you the budget. It also recommends how long to run the study for. And then as you scroll down, the rankings should get worse and worse. But sometimes you might want to pick the market ranked number three; the difference might not be that much, so you don't always have to pick the first-ranked market.

[00:18:01] Luke Austin: Great.

[00:18:01] Kwa: There are some other things to take into consideration. And the other thing the software isn't really able to do is look at charts. That's why we have the charts in here: a market might be ranked number one, but you'll see it doesn't really make sense when you look at the chart.

It's not stable, or the data should be thrown out because of whatever information I know about the market. So I think it's always important to look at the charts.

[00:18:27] Steve Rekuc: And what we're doing with the synthetic control is trying to find states that represent your overall performance, your nationwide performance, by selecting a subset and weighting the different states in that subset so that they're very representative of what's going on in the overall market.
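
To make the weighting idea concrete, here is a minimal sketch of fitting synthetic-control weights, assuming simple daily revenue arrays; it illustrates the general technique, not the implementation inside Stats.

```python
# Minimal synthetic-control sketch (illustrative, not CTC's code): find
# non-negative weights over candidate states, summing to 1, so the weighted
# combination tracks the target series over the pre-test period.
import numpy as np
from scipy.optimize import minimize

def fit_synthetic_weights(control, target):
    """control: (days, n_states) pre-period revenue; target: (days,) series."""
    n = control.shape[1]
    loss = lambda w: np.sum((control @ w - target) ** 2)  # squared tracking error
    res = minimize(loss, np.full(n, 1.0 / n),
                   bounds=[(0.0, 1.0)] * n,
                   constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}])
    return res.x

# Toy usage: 90 pre-period days, 4 candidate states with known true weights.
rng = np.random.default_rng(0)
control = rng.gamma(5.0, 1000.0, size=(90, 4))     # fake daily revenue by state
target = control @ np.array([0.5, 0.3, 0.2, 0.0]) + rng.normal(0, 50, 90)
print(fit_synthetic_weights(control, target).round(2))  # ≈ [0.5, 0.3, 0.2, 0.0]
```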

[00:18:46] Luke Austin: Yes. And I think we all probably have different levels of understanding as it relates to what incrementality and geo holdout testing are, but there are primarily two types of incrementality tests that brands will run. One is a straight holdout test, where you're gonna run media only in a group of, let's call it five states, that are identified based on the market selection and synthetic control.

That's typically good for lower-spending or new ad channels, where you're launching a media buy on, say, TV. And so you're gonna do it in five to seven states first, see the impact with the lower budget based on the market selection, and then you can grow from there to a national buy.

The other test type is an inverse holdout test. This is the most common, what most brands want to do: the market selection process occurs, identifying, based on the synthetic control, a group of, let's call it five states, that are representative of the total business, and then those states are excluded from the media.

So you run media in the 45 states, you don't run it in the five states, and you see the negative impact on revenue from withholding media from those states. So: holdout testing and inverse holdout testing. But it's all based around this process of identifying a group of markets that's gonna be representative of how the overall revenue for your business moves, and then being able to affect the media spend within those markets to see what the impact is, either increased revenue or a drop in revenue

if you're pulling back spend. So the market selection process here is identifying the group of states that will be best, based on representation of your business as well as budget constraints and what you're able to do, to get a high-confidence read in structuring the test.

I think another point worth making here is what's noted at the top of the test, which is that this is looking at discounted sales. So we can look at different revenue definitions. It doesn't make a huge impact, but specifically this is looking at first-time orders.

So, first-time revenue. Steve or Kwa, whoever wants to jump in: can you give more insight on how we look at different revenue definitions, first-time revenue, total revenue, returning revenue, and why we may pick to design a test around first-time customer revenue rather than total customer revenue?

[00:21:08] Steve Rekuc: In this particular case, and in many brands' cases, you're actually using ad spend to drive new customer sales. You're not doing it so much, hopefully not too much, on the returning customer side. And this particular brand has a good amount of LTV, so if we were to look at total sales, the effect gets a little more lost in the noise. You're gonna see a much stronger signal of what that ad spend is doing against new customers, first-time orders. And that's why, in this particular case, we're focusing on this, and we'll do that for a lot of brands where you're running the campaign primarily to new customers. We're going to look at the impact on new customers, and that's gonna be what we base our selection on and what we're gonna be trying to hit in terms of finding statistical significance in measuring that effect.

[00:21:58] Luke Austin: Yeah, and it's worth noting for this test: this was a Facebook acquisition test, so we were measuring the incremental impact of the acquisition, or new-customer-oriented, campaigns within Meta. So the test was designed around looking at first-time customer revenue from Shopify in each of the markets, to be able to design that test on Meta acquisition.

But we might look at different revenue definitions when structuring an incrementality test to determine the effect of Meta retention campaigns, for example, and there would be different things considered there. So, market selection: this is the recommendation stage. The tool that we've built allows us to see different market selections, but what we're doing in partnering

together with the brands that we're working on incrementality with is that Steve's, Kwa's, and the data team's insight comes through in terms of: here's the recommended market and test design that we should move forward with. We're making that recommendation based on the different things that we're seeing in the tool and the confidence level,

and then partnering and standing up the test together. Moving on to the next step, which is the recommendation outputs. We don't have to spend too much time here, but Kwa, is there anything worth pulling up in terms of the recommendation outputs and sort of visualizing what to expect from one of these tests?

[00:23:18] Kwa: Yeah, so the thing I look at here is the power curve. The geo lift power curve gives you a model of how the effect size will change given investment. So I would expect to see an effect starting around 40K, and these numbers aren't exact, because we know that there are assumptions that might not come true, which we correct for later.

But, you know, at some point, 40K to 80K, you go from a 10% to a 20% effect size. And so this is valuable depending on the type of test you're running. If you're running a holdout versus a reverse holdout, you wanna look at this to see: okay, if I were to run a reverse holdout, am I gonna be able to get the desired effect I want? So I usually just take a quick look at this to make sure it doesn't show anything weird.
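
As a rough illustration of what sits behind a power curve, here is a hedged sketch that translates a candidate budget into an expected lift via an assumed iROAS and estimates detection power by simulation. All the numbers and the simple t-test are assumptions for illustration, not the model used in Stats.

```python
# Illustrative power-curve sketch (assumed inputs, not the Stats model):
# for each budget, convert spend into an expected daily lift via an assumed
# iROAS, then estimate how often a simple test would detect it against noise.
import numpy as np
from scipy import stats

def detection_power(budget, baseline_daily_rev, days, assumed_iroas,
                    daily_noise_sd, alpha=0.05, sims=2000, seed=1):
    rng = np.random.default_rng(seed)
    lift = budget * assumed_iroas / days          # incremental revenue per day
    hits = 0
    for _ in range(sims):
        control = rng.normal(baseline_daily_rev, daily_noise_sd, days)
        treated = rng.normal(baseline_daily_rev + lift, daily_noise_sd, days)
        _, p = stats.ttest_ind(treated, control, alternative="greater")
        hits += p < alpha
    return hits / sims

for budget in (20_000, 40_000, 80_000):
    power = detection_power(budget, baseline_daily_rev=50_000, days=21,
                            assumed_iroas=1.2, daily_noise_sd=6_000)
    print(f"${budget:>6,}: detection power ≈ {power:.2f}")
```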

[00:24:08] Luke Austin: Yeah, that's great. So we can see this for each of the different market selection recommendations and see what the power curve looks like, but again, to be able to have confidence that we're going to be able to spend enough within that channel and tactic over that time period to get a high-confidence read on the test output.

'Cause we're trying to get an understanding, in most cases, of what the incrementality of this channel is as it currently stands, right? So we don't wanna set up a test and then have to double the spend within that channel for the duration of the test to determine the incrementality, knowing that we're gonna pull back the spend after, right?

We need to be representative of the budget that we're planning to continue at, or at least within that range. And so this allows us to make sure that we have enough budget to get the read at the current levels. Step three, location selection. By this point we've identified what the recommended market outputs are based on market selection, and the recommendation from the team, and made sure the budget consideration is gonna be strong enough based on the power curve. And then there's the location selection step here.

What else are we looking for and determining within this step?

[00:25:16] Kwa: So this is where you would make your selection, and that's probably already decided by looking at the previous steps. The thing here is to figure out your budget. So what we do differently compared to, I think, other platforms, or other people that normally model it, is we don't make an assumption going into the model on what the expected ROAS is; that is decided in this step. And that's because it's a really big determinant in your budget allocation. If you think the ROAS for, you know, Facebook acquisition is two, that drastically changes your expected budget, right? Versus if you think it's 0.5. So by making this a step, I think you get a much more accurate budget allocation. And that's

the main thing, deciding whether

[00:26:04] Luke Austin: Yes. And yeah.

[00:26:05] Kwa: to do a holdout or reverse holdout and what budget you're expecting to spend.

[00:26:09] Luke Austin: Yes. And I think this is a really important point, actually, in designing these tests. The tests are designed based on an assumption around the revenue impact that needs to be seen. So what we're looking at is how much revenue impact we need to see to have a high-confidence read on this test output.

Then what we need to do is back into: okay, how much budget do I need within this channel and tactic to be able to realize that level of revenue impact? That progression, from revenue impact to budget, is really important, because we have to assume in there some level of actual efficiency from the channel and tactic, right?

Kwa, you said it: if we need $80,000 of revenue impact and we think the channel performs at a 3x, we don't have to spend that much money. But if the channel actually performs at a 0.8x, we're gonna have to spend a lot more to be able to see that revenue impact. And so over the course of some time now, we've been aggregating the results from incrementality tests from various platforms and various brands to create incrementality starting points, or benchmarks, that we use in this process.
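
The back-calculation Luke describes is simple arithmetic; here is a hedged sketch of it (the $80,000 figure is from the episode, the helper function is hypothetical):

```python
# Back-of-envelope budget back-calculation described above:
# required test budget = required revenue impact / assumed incremental ROAS.
def required_budget(revenue_impact: float, assumed_iroas: float) -> float:
    return revenue_impact / assumed_iroas

# $80,000 of required revenue impact under different efficiency assumptions;
# 1.2 corresponds to the 120% incrementality starting point discussed below.
for iroas in (3.0, 1.2, 0.8):
    print(f"assumed iROAS {iroas}: ${required_budget(80_000, iroas):,.0f} of spend")
# 3.0 -> $26,667   1.2 -> $66,667   0.8 -> $100,000
```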

And so we're able to say, based on hundreds of tests that we've seen from other brands and the results of those: a starting point for Facebook, Meta acquisition, on a seven-day click basis of 120% incrementality is a starting point that we're confident in. We know that Meta seven-day click acquisition is not gonna be a 3x incremental ROAS in most cases.

Right. And there are always brands on all sides of that data set. But we also know it's likely not gonna be a 0.5. And so we use a starting-point incrementality benchmark based on a larger data set, in this case Meta acquisition, seven-day click, at a 120% incrementality factor, which is saying Meta underreports

its true incremental impact. And then we're using that to do the equation of how much revenue impact we need to see, backing into the budget. So we use the incrementality benchmarks, or starting-point factors, from the data set of brands that we have aggregated to get to the necessary budget allocation. That just gives us higher confidence that the test we're designing is going to produce a result that works out in the end, rather than getting into the test and saying: all right, we think we need to spend $25,000 on this incrementality test in these states,

we get to the end of three weeks, four weeks, whatever it is, and, well, no, we need to keep spending more, and actually you need to spend $50,000 for that test, not $25,000. We want to mitigate any of that and have really high confidence in the necessary budget allocation.

Okay. And then validation. Now we have everything set up: market selected, budget allocated, the timeline for the test. The step here is that the exclusions for those markets need to be placed within the platform. We've added this validation step within the tool to help us see if the exclusions have been applied to the necessary campaigns or if there are any gaps.

So we've done the market selection, and we're going to see if the regions have been excluded or not from the specific campaigns that they need to be. In this case, this test is completed, and so what it's showing is there are no regions excluded from these campaigns. If the test was running, it would say: these are the regions that were excluded, we're good to go.

This test is concluded and we've added back in those DMAs, so we'd expect nothing to be excluded. This just allows us to see and be clear that, yes, we have excluded the right regions, there are no straggler campaigns failing to exclude the right regions that are gonna muddy up the test results, and it gives us that validation in the process.
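
A minimal sketch of the kind of check this validation step implies, with hypothetical data shapes (the campaign objects and region sets are invented for illustration; this is not the Stats integration):

```python
# Hypothetical exclusion check: confirm every live campaign in the test
# excludes all holdout regions, and surface "straggler" campaigns that don't.
from dataclasses import dataclass, field

@dataclass
class Campaign:
    name: str
    excluded_regions: set = field(default_factory=set)

def find_stragglers(campaigns, holdout):
    """Return {campaign name: holdout regions it fails to exclude}."""
    return {c.name: holdout - c.excluded_regions
            for c in campaigns if holdout - c.excluded_regions}

holdout = {"TX", "FL", "OH", "WA", "CO"}
campaigns = [
    Campaign("meta-acq-prospecting", {"TX", "FL", "OH", "WA", "CO"}),
    Campaign("meta-acq-broad", {"TX", "FL"}),   # missing three exclusions
]
print(find_stragglers(campaigns, holdout))      # flags "meta-acq-broad"
```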

Then, landing the plane here with the end output, which is the result of the test. We've set up the test, and we've run the test for a number of weeks. So at this point, when can we end the test, and what is the data that we're looking at in determining that the test is completed and that it was a strong output?

[00:30:18] Kwa: So you're looking for stability. If you're able to go in and look at a test every day, then you'll notice: hey, look, it's stabilizing, the p-values have reached a certain point. If you're not looking at tests every day, then you would just go with the recommendation.

So if the recommendation was 21 days, you would look at 21 days: okay, it looks pretty good. Probably what we should add as a feature is a way to see how the results have changed over time, so we can see if it has stabilized. Just talking through it, that seems like a no-brainer, but that's what you would do.

You wanna know that it's stabilized, you wanna know that you've reached your effect threshold, and hopefully you have some decent p-values. And then you look through the charts to see if it makes sense.
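
A hedged sketch of the stability idea Kwa describes, assuming you log a running lift estimate each day of the test (the window and tolerance values are invented for illustration):

```python
# Illustrative stability check: call the test "stable" when the trailing
# week of running lift estimates stays inside a narrow band.
import numpy as np

def is_stable(running_estimates, window=7, tolerance=0.02):
    """Stable if the last `window` estimates span less than `tolerance`."""
    tail = np.asarray(running_estimates)[-window:]
    return len(tail) == window and tail.max() - tail.min() < tolerance

# Daily running lift estimates settling near ~10% as the test matures.
estimates = [0.04, 0.07, 0.09, 0.10, 0.105, 0.10, 0.102,
             0.101, 0.103, 0.102, 0.101]
print(is_stable(estimates))  # True
```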

[00:31:09] Luke Austin: Great. So let's talk through some of the specifics of this test result for the brand, which we will leave unnamed, but we can walk through some of the specific data. This test has concluded, and it was a result that we have high confidence in. Let's talk through some of the specific data points from the test.

What are we looking at that gives us high confidence in the test output, and what was the result of the test?

[00:31:35] Steve Rekuc: Well, first, we were running this and selecting based upon new customers, and it's a Meta acquisition campaign, so we're first looking at the new customer effect. And the first thing that we see is a p-value of 0.01, meaning we have 99% confidence that the account we're testing does have a positive effect, that this is for sure doing something good.

It's definitely an incremental channel. Next, we're scrolling down and looking at the iROAS that we measure, and this is measured by the amount of revenue lost. In this particular case, we were doing that inverse geo lift that you were talking about, Luke; we're turning off channels in these states, so we should see a loss in revenue in those states, and based upon that, we're estimating the effect. We also estimate the incremental spend that would've occurred in those states had we continued spending. So we're able to come up with an estimated iROAS for what we would've had in revenue and in spend for those states had we left this media on.
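
In arithmetic terms, the inverse-holdout read Steve describes can be sketched like this, with toy numbers standing in for the synthetic-control counterfactual (assumed values, not the brand's data):

```python
# Toy inverse-holdout iROAS arithmetic: lost revenue is the synthetic
# counterfactual minus observed revenue in the holdout states; iROAS is
# that loss divided by the spend that was withheld there.
import numpy as np

def inverse_holdout_iroas(observed, counterfactual, withheld_spend):
    lost_revenue = (np.asarray(counterfactual) - np.asarray(observed)).sum()
    return lost_revenue / withheld_spend

observed = [42_000, 41_500, 40_800]        # holdout-state revenue, by day
counterfactual = [50_000, 49_200, 48_600]  # synthetic-control prediction
print(round(inverse_holdout_iroas(observed, counterfactual, 18_000), 2))
# 1.31 -> each withheld dollar was worth about $1.31 of revenue
```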

[00:32:47] Luke Austin: Yeah, that's great. So that's the impact on the new customer revenue as measured in Shopify. At the conclusion of this test, we have really high confidence. What we're also then looking at is what the effect of withholding that spend was on returning customer revenue and then on total revenue,

so combined new and returning for D2C, and then also looking at the Amazon effect, the halo effect on Amazon revenue, and looking at each of those things to see if there's a relationship to each one of them. In the case of this test, the effect on returning customer revenue is a p-value of 0.34.

So that is not high-confidence enough for us to be able to say that withholding Meta acquisition spend had a significant impact on the returning customer revenue. That is what we would expect from a Meta acquisition-oriented campaign if the exclusions are set up right and the test is dialed in, so that's actually a good sign in this case, right?

We don't have a read on returning revenue, so these campaigns are focused on new customer revenue. And it's why looking at the new customer revenue in addition to the total revenue is so important. We do have a strong enough read on total revenue, it's a p-value of 0.08, but the new customer revenue is a 0.01.

It's much stronger when we pull the returning customer revenue effect out of it. That said, we're looking at the total fully loaded iROAS impact of this channel or tactic on D2C plus Amazon, to get to a confident answer to: for a dollar into this specific channel and tactic, what can I expect the iROAS to be on the business as a whole?

So that's what we have visualized in this chart: a fully loaded total-effect iROAS. Platform ROAS is on the far left here, and then we see the additional impact from the new customer read from Shopify, the returning customer revenue impact, and then the Amazon halo effect, which together make up the total effect of the channel on the full business.

And so we can slice it different ways, to be able to say, okay, here's the incremental ROAS on new customer revenue for D2C compared to the platform ROAS. Or we can look at the fully loaded number that takes into account total revenue as well as Amazon, to give the clearest picture of: if I put a dollar in this channel, based on the result of this incrementality test, what can I expect the result of that media spend to be, relative to what the platform is claiming the impact was and what the efficiency of the test was.
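
A small sketch of how such a fully loaded number composes, with hypothetical component values chosen only so the total lands near the episode's roughly 1.21 platform vs 1.34 fully loaded comparison:

```python
# Hypothetical fully loaded iROAS composition: incremental revenue per dollar
# of spend, stacked across D2C new, D2C returning, and the Amazon halo.
def fully_loaded_iroas(spend, d2c_new_rev, d2c_returning_rev, amazon_halo_rev):
    parts = {
        "d2c_new": d2c_new_rev / spend,
        "d2c_returning": d2c_returning_rev / spend,
        "amazon_halo": amazon_halo_rev / spend,
    }
    parts["total"] = sum(parts.values())
    return parts

print(fully_loaded_iroas(spend=25_000, d2c_new_rev=27_000,
                         d2c_returning_rev=3_500, amazon_halo_rev=3_000))
# d2c_new 1.08 + d2c_returning 0.14 + amazon_halo 0.12 -> total ≈ 1.34
```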

And then scrolling down here, we have some other visuals and data points as it relates to the results of the test. I don't know if you'd want to jump in and highlight anything in some of these charts below the test results that's worth calling out, and that we look for at the conclusion of a test.

[00:35:39] Kwa: Yeah, you want to validate that the spend is what you expected. So you see there, it just didn't spend in those periods, which is what you want. Then in the next graph, you would hope to see, I mean, it's really hard to see, but you would hope to see that there's some effect on the customer revenue within the states that you withheld spend from. Like I said, it's really hard to eyeball all that, but in some cases you're gonna see it. The other thing you're looking for is, like in this case here, where there was a spike. So these are non-control states, and there's a spike in spend. So that might be something like: okay, why did we, maybe there's a sale or something. And you would take that into account as well when you look through the charts. And there's no spend there in the test states. And you see, in this one, the top line, the spikes in revenue kind of correlate to the spikes in spend, where there's nothing in the test states. So there is gonna be a pretty strong signal here, just looking at the chart.

[00:36:40] Luke Austin: Great.

[00:36:40] Kwa: This is just something I wanna see: visualize what the states were, and see if there's anything that might be too close to each other. That's what I would look at here.

And then if you scroll over, you can look at the different states, for example Texas. So for Texas, can you scroll up? For Texas, the effect size is minus 10%. And then if you look at Florida, go over Florida, the effect size here was like 16%. And so I just look through: okay, where are the different effect sizes?

Was there one state that really dominated? And, you know, there's nothing that we could really do about it, but it would affect your post-analysis: okay, we saw a really strong signal, but all of it came from this one really big state. And so that would put a bit of a damper on the study. But these are the kind of checklist things that you would look at.

[00:37:44] Luke Austin: Great.

[00:37:44] Steve Rekuc: And even as the test is progressing, those charts are awesome. When you're able to look at the holdout states, the states where we were supposed to turn off spend, you should be able to see them continue to have no spend, that something weird didn't happen, that all of a sudden they got turned back on by somebody. So it's great, as you're keeping track of this test and checking back in, that you see that those lines make sense throughout the time period, not just at the end of the test.

[00:38:14] Luke Austin: Yeah, it makes total sense. So that's the process, that's the thought work that has gone into building the tool and standing up the test and getting the result. The so-what of all of this is that at the end of this test, what we're able to see is that the platform, Meta acquisition, was claiming a 1.21 seven-day click ROAS. After adding in the incrementality results for new customer revenue, total revenue, and then the Amazon halo effect,

the total fully loaded iROAS result based on the test was close to a 1.33, 1.34. So the Meta platform, on a seven-day click basis for this tactic, was underreporting its impact on the broader business when taking into account the revenue as well as the Amazon halo effect. Which would then, as an action, potentially lead to a decision to invest more in this channel to drive more overall revenue.

If we were looking at the 1.21 and thinking, okay, this is what the efficiency is, actually there's additional halo impact from this channel on the broader business, and we're willing to invest more. And the test results that we have seen go both ways, and at very different levels. And so it's very important:

we love using incrementality starting points and benchmarks, but getting the result for your brand specifically is paramount, because the results that we're seeing from these tests are very disparate in terms of what the true incremental impact of each of these channels is versus what the platform is reporting. And the budget allocation decisions,

the target setting, and the media planning decisions that follow are really consequential, which is why we've built this tool at CTC to offer on behalf of our clients, so that anyone who works with CTC has access to unlimited geo holdout tests starting now.

So I think that covers everything we were planning to walk through today. Kwa, Steve, anything else to add before we wrap things up?

[00:40:07] Kwa: Just from looking through these tests for almost a year now, and working with a lot of other statisticians: I think anyone that tells you they have a hundred percent confidence in a study after running it one time is probably lying to you. There's just so much leeway in terms of how you calculate things, how you model things, how you set up the data. I just don't know how you could have confidence if you don't have total control over the data. And so if we were to give the test to someone else to run, we'd just never have as much confidence as we do running it ourselves.

[00:40:49] Steve Rekuc: Yeah. And I think the results are really dependent upon the time and the spend level that you run the test at, and the other things that are going on with your brand. So it's very much about how your brand fits into using that channel at this particular time.

[00:41:08] Luke Austin: All right, we'll leave it at that. To anyone who made it to the end of this episode, more power to you. Thanks for hanging with us. Talk soon.