CTC Core Methodology Series: Marketing Measurement

by Taylor Holiday

Feb. 25, 2026

How CTC builds progressive truth in marketing measurement, closing the gap between what your data says and what's actually happening.

01 — The Measurement Gap

Every brand longs to understand the causal relationship between the advertising dollars they spend and the revenue they realize. This is the central question of modern marketing: does this spend actually work?

A great measurement system closes the gap between reality and fiction when it comes to interpreting that effect. The wider the gap, the worse your capital allocation decisions. The narrower the gap, the more confidently you can invest.

Your measurement system exists in this gap between Reality (the true incremental impact of your spend) and Fiction (what your platform dashboards report). The goal is not perfection. The goal is to move closer to reality over time, with increasing confidence.

02 — Understanding the Dynamic Reality

Before building any measurement system, there are foundational truths that must be accepted. These aren't opinions. They're constraints that govern any honest approach to marketing measurement.

1. Media efficacy is in constant flux

The relationship between your ad spend and its revenue impact is not a constant. It changes day to day, week to week, driven by forces both within and outside your control. Any system that treats this relationship as fixed is lying to you.

2. You are always building an approximation

At all times, your measurement system is an attempt to build the closest approximation to reality that you can. There is no perfect measurement. There is only less wrong measurement.

3. Progressive truth is the mechanism

CTC's approach is to build a progressive truth. We want to move closer and closer to reality over time. Every test, every data point, every experiment reduces the error rate of the system. Truth is not discovered in a single moment. It is accumulated.

The Forces at Play

The efficacy of your media is shaped by two categories of forces, and understanding this distinction is critical to interpreting any measurement result.

Within Your Control
  • Quality of your creative
  • Campaign optimization
  • Placement strategy
  • Audience targeting
  • Budget allocation
  • Landing page experience
Outside Your Control
  • General market demand
  • Macro economy
  • Cost of ad inventory
  • Platform algorithm changes
  • Competitive intensity
  • Seasonal demand shifts

03 — Incrementality Testing & Geo Holdout

The best mechanism for building progressive truth is through incrementality studies, specifically geo holdout tests. These experiment designs are the gold standard for isolating the causal effect of advertising spend on revenue.

Geo Holdout Test Design

Isolate geographic regions to measure the true incremental lift of ad spend:

  • Test Regions: Comparable geographic regions receiving marketing for the channel being measured
  • Control Regions: Comparable geographic regions not receiving marketing for the channel being measured
  • What We Measure: Revenue difference between test & control regions = Incremental Lift
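The arithmetic behind that last bullet can be sketched in a few lines. This is an illustrative sketch only, not CTC's actual implementation; the revenue and spend figures are hypothetical, and in practice the counterfactual comes from a synthetic control built on the control regions.

```python
# Sketch of the core geo holdout arithmetic. All figures are hypothetical.

def incremental_lift(test_revenue: float, counterfactual_revenue: float) -> float:
    """Incremental lift = test-region revenue minus the counterfactual
    (what the control regions predict the test regions would have
    earned with no spend on the channel being measured)."""
    return test_revenue - counterfactual_revenue

def iroas(lift: float, spend: float) -> float:
    """Incremental return on ad spend for the test period."""
    return lift / spend

# Hypothetical test: test regions earned $500k; the control-based
# counterfactual says they would have earned $420k without the channel.
lift = incremental_lift(500_000, 420_000)   # $80k of incremental revenue
print(iroas(lift, spend=65_000))            # ~1.23 iROAS for the period
```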

Measuring Across Every Point of Distribution

The incremental lift should ideally be measured through every point of distribution, not just your .com. Advertising impact bleeds across channels: DTC (.com), Amazon, and Retail (where possible).

An incrementality test gives you a snapshot of the causal effect of that spend on revenue, with some degree of confidence, for that period of time. It is critical to recognize that this is a single data point that may never be replicated.

This is precisely where the concept of progressive truth becomes essential. A single test result is valuable but insufficient. The power comes from accumulation.

Inside a GeoLift Study

CTC designs and deploys these tests through the Statlas Data Science Platform, which provides end-to-end management of GeoLift studies, from recommendation outputs to test specification, validation, and results.

Statlas Data Science Platform: GeoLift Studies, MMM, New Customer Acquisition Models, and more

GeoLift Recommendation Output: Treatment vs Control time series with confidence bands, and Power Curve showing statistical reliability at different spend levels

GeoLift Results: Incremental lift by channel, test/control geographic regions, spend and revenue time series, and detailed statistical output

04 — Building Progressive Truth

We want to build the largest database of test results, both in aggregate across all brands and for each individual brand, to continuously reduce the error rate of any presently applied measurement system.

Stage 1: Aggregate Benchmark

Start with the aggregate benchmark factor across all tests ever run, applied to the brand's platform-reported revenue.

Stage 2: First Test Result

Weight the individual result relative to its confidence level. Move toward it, but not all the way.

Stage 3: Accumulation

As more results come in, the median of the set represents the measurement with the lowest error.
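The three stages above can be sketched as a single function. This is a minimal illustration, not CTC's production logic: the benchmark value, confidence weights, and test results below are all hypothetical.

```python
import statistics

# Hypothetical cross-brand prior for the channel (Stage 1 starting point).
AGGREGATE_BENCHMARK = 1.15

def blended_factor(brand_tests: list[tuple[float, float]]) -> float:
    """brand_tests: (iROAS result, confidence weight in [0, 1]).

    Stage 1: no tests -> fall back to the aggregate benchmark.
    Stage 2: one test -> shift toward it in proportion to confidence.
    Stage 3: several tests -> use the median of the brand's own results.
    """
    if not brand_tests:
        return AGGREGATE_BENCHMARK
    if len(brand_tests) == 1:
        result, confidence = brand_tests[0]
        return (1 - confidence) * AGGREGATE_BENCHMARK + confidence * result
    return statistics.median(r for r, _ in brand_tests)

print(blended_factor([]))                        # benchmark: 1.15
print(blended_factor([(0.9, 0.6)]))              # partial shift toward 0.9
print(blended_factor([(0.9, 0.6), (1.3, 0.8), (1.1, 0.9)]))  # median: 1.1
```

Each new test narrows the gap between the brand's factor and reality, which is the accumulation the next section describes.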

The Progressive Truth in Practice

Imagine a scatter plot of incrementality test results over time. Each dot represents a single test. Early on, the dots are sparse and widely scattered. The confidence in any single result is low. But as tests accumulate, patterns emerge. The median converges. Error shrinks.

Incrementality Test Results Over Time: Each point represents a single geo holdout test. The median converges as data accumulates.

The key insight: the starting point for any brand should be the aggregate benchmark. The second an individual test result arrives, we weight it relative to its confidence and shift toward it. With each subsequent test, the system gets less wrong.

05 — The Database of Truth

CTC maintains one of the largest proprietary databases of incrementality test results in eCommerce. This is not theoretical. These are real geo holdout tests, run across real brands, with real dollars.

CTC Test Database — Current State
  • 146 Total Tests — Across 80 stores
  • 115 Completed — 31 currently running
  • 59 Finalized — Avg iROAS: 1.24

Statistical Confidence of Finalized Results:

  • 28 — Strong Stat Sig (p < 0.05)
  • 11 — Stat Sig (p < 0.10)
  • 12 — Directional (p < 0.20)

Statlas Test Status Summary: MMM Recommendation Ranges showing platform performance metrics

Channel-Level Benchmarks

With this database, we can see the incrementality profile of every major channel. These benchmarks serve as the starting point for any brand before they have their own test results.

Platform / Ad Type          Stores   Factor Range   Factor Median   % Median
Facebook / Acquisition      45       0.5 – 2.4      1.14            63%
AppLovin / All              2        0.3 – 1.5      1.47            10%
YouTube / All               24       0.3 – 12.1     1.10            3%
GoogleAds / Non-Brand       44       0.4 – 140.0    0.67            16%
Facebook / Non-Acquisition  20       0.3 – 1.1      0.60            6%
TikTok / All                15       0.1 – 7.0      0.50            4%
Snapchat / All              3        0.2 – 1.7      0.30            9%
Pinterest / All             9        0.2 – 1.0      0.30            5%
GoogleAds / Brand           39       0.1 – 36.0     0.27            8%
Affiliate / All             1        0.1 – 0.1      0.13            0%

The data tells a clear story. Facebook Acquisition (median iROAS 1.14, 45 stores tested) is the most reliably incremental channel. Google Branded Search (median iROAS 0.27, 39 stores) confirms what the theory predicts: platforms dramatically overreport on last-click channels. And the ranges show why single test results are insufficient. Facebook Acquisition ranges from 0.5× to 2.4× across different brands and periods.

These factors are not permanent truths. They are the current best approximation based on the aggregate of all available evidence. As new test results come in, every number in this table will be refined. That is the system working as designed. — The Progressive Truth Principle

06 — From Snapshot to Distribution

A single test result is a snapshot. Valuable, but incomplete. By adopting a principle of always-on testing, we transform that snapshot into a distribution of potential incremental outcomes for each individual business.

As individual test results accumulate for a brand, the bounds of that distribution represent the total possible error in either direction. The median represents the point at which we are, at any given time, most likely to predict the future outcome.

Individual Brand Test Result Distribution: Always-on testing builds a distribution of outcomes. The median converges toward predictive accuracy.

Beyond the Simple Median

As the database of test results grows, the system becomes increasingly sophisticated. Seasonal effects, sale moments, and other variables can be incorporated. Rather than applying a simple median across all results, the factor can be adjusted dynamically based on the conditions that most closely match the present moment.

Dynamic Factor Adjustment

With enough data points across seasons, promotional periods, and varying spend levels, we can weight test results that most closely resemble current conditions more heavily than those from dissimilar periods. The measurement system becomes context-aware.
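One simple way to express that context-awareness is a similarity-weighted average in place of the plain median. The sketch below is illustrative, not CTC's actual model; the similarity scores and iROAS values are hypothetical.

```python
# Condition-aware weighting: tests from periods resembling "now" count
# more than dissimilar ones. All inputs are hypothetical.

def weighted_factor(tests: list[tuple[float, float]]) -> float:
    """tests: (iROAS result, similarity-to-current-conditions in (0, 1])."""
    total_weight = sum(w for _, w in tests)
    return sum(r * w for r, w in tests) / total_weight

# When planning a sale moment, a prior Black Friday result (similarity
# 0.9) dominates an off-season result (similarity 0.2).
print(weighted_factor([(1.6, 0.9), (0.8, 0.2)]))  # ≈ 1.45
```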

07 — Apples-to-Apples Across Channels

One of the most powerful benefits of this measurement system is that it enables true like-for-like comparison between channels and ad products. Without normalization, comparing Meta acquisition spend to Google branded search is meaningless. The platforms report on different attribution windows with fundamentally different relationships to incrementality.

The Google Branded Search Problem

Google branded search is a well-understood example of this distortion. Within Google Ads, which by default reports on a 28-day click, 1-day view attribution setting, branded search will dramatically overreport its return on ad spend.

Why? Because branded search is a final-step action for many shoppers on their path to purchase. The customer was already going to buy. They typed your brand name into Google, clicked your ad, and completed the purchase. The ad gets credit, but the purchase was still likely to have occurred had the ad not appeared.

Normalizing iROAS Across Channels
                 Meta Acquisition                Google Branded Search
Attribution      7-day click                     28-day click, 1-day view
Platform ROAS    3.2×                            12.5×
iROAS            3.7× (×1.15)                    3.1× (×0.25)
Reality          Platform underreports by ~15%   Platform overreports by ~4×

The Comparison: With normalized iROAS, capital allocation decisions can be made on a true like-for-like basis across every channel.

By running incrementality studies on every channel and computing the incrementality factor (incremental revenue divided by ad spend, compared to platform-reported revenue), we create a normalized iROAS that allows a Profit Engineer or brand to compare the performance of their media channels on a like-for-like basis.

This is what enables real capital allocation. Without normalization, a brand looking at 12.5× ROAS on branded search and 3.2× on Meta acquisition would rationally shift budget toward search. With normalized iROAS, the picture inverts entirely.
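The normalization itself is a single multiplication. The sketch below mirrors the illustrative figures in the table above; real factors come from a brand's own tests or the aggregate benchmarks.

```python
# Normalized iROAS = platform-reported ROAS x incrementality factor,
# where the factor = incremental revenue / platform-reported revenue.
# The ROAS and factor values here are the illustrative ones from the table.

def normalized_iroas(platform_roas: float, incrementality_factor: float) -> float:
    return platform_roas * incrementality_factor

meta = normalized_iroas(3.2, 1.15)       # ~3.7: platform underreports
branded = normalized_iroas(12.5, 0.25)   # ~3.1: platform overreports ~4x
print(meta > branded)  # the 12.5x channel is actually the weaker one
```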

Operationalized in Statlas

These incrementality factors are not theoretical. They are operationalized in CTC's Statlas platform through the MMM Roadmap, which uses test-derived iROAS to prioritize channel allocation for each brand.

Statlas MMM Roadmap: Channel prioritization and budget allocation driven by incrementality factors

Channel Performance with iROAS Applied

When incrementality factors are applied to each channel, spend, revenue, CPA, and ROAS are all normalized to their incremental values, enabling true apples-to-apples comparison across every media dollar.

Channel Performance: Google, YouTube, and Facebook with incrementality factors applied to normalize revenue and ROAS

08 — The Degradation Curve

There is one more critical dimension to measurement: the relationship between spend volume and incrementality. It should never be assumed that a test result at $X of spend would replicate at 2× that spend.

As spend scales in any channel, incremental returns typically degrade. The first dollar of Meta spend is more incremental than the millionth. Understanding where you sit on this curve is essential to optimizing allocation.

Incremental Return Degradation by Spend Level: As spend increases, incremental ROAS degrades. Scale-in tests reveal the shape of this curve.

The Scale-In Test

To map the degradation curve, CTC uses a scale-in test design. For a defined period, spend is increased in a channel, and the results are compared against both the baseline spend level and regions where spend is excluded entirely. This reveals how incrementality changes as investment grows.

Baseline Spend Level

The current spend level where incrementality has been measured. This is the known data point.

Scale-In Period

A controlled period of increased spend, measuring the incremental revenue generated at the new, higher level against both the baseline and zero spend.

Degradation Profile

The resulting data paints a curve showing how incremental returns change at different spend levels. This informs the optimal allocation point for each channel.
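One common way to express such a profile is a concave response curve fitted to the scale-in results. The log-form curve and the parameter values below are hypothetical, chosen only to show the shape the section describes, not CTC's actual model.

```python
import math

def incremental_revenue(spend: float, a: float, b: float) -> float:
    """Diminishing-returns response curve: revenue grows with spend,
    but each additional dollar returns less than the one before it."""
    return a * math.log1p(spend / b)

def marginal_iroas(spend: float, a: float, b: float) -> float:
    """Derivative of the curve: the iROAS of the *next* dollar spent."""
    return a / (b + spend)

# With hypothetical fit a=120_000, b=50_000, the first dollars are far
# more incremental than dollars added at a $200k run rate.
print(marginal_iroas(0, 120_000, 50_000))        # 2.4 at zero spend
print(marginal_iroas(200_000, 120_000, 50_000))  # 0.48 at $200k
```

The optimal allocation point is wherever the marginal iROAS crosses the brand's contribution-margin threshold.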

09 — iROAS Is Subordinate to Reality

Even a well-calibrated incremental return on ad spend should always be subordinate to the realities of revenue and contribution margin in a given period of time. We should always assume some amount of error in the system.

When iROAS and actual business outcomes are incongruent, meaning one is moving up while the other is flat or moving in the opposite direction, that is the signal to examine the underlying measurement system and recalibrate.

The Congruence Test
  • ✓ Congruent: iROAS improving → Revenue growing → CM expanding. System is likely well-calibrated. Continue testing to refine.
  • ✗ Incongruent: iROAS improving → Revenue flat or declining. Measurement system is drifting from reality. Recalibrate immediately.
The measurement system exists to serve capital allocation decisions, not the other way around. When the map disagrees with the terrain, trust the terrain.
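The congruence test above reduces to a simple rule. The sketch below is illustrative; the trend inputs are hypothetical period-over-period percentage changes, and a real implementation would account for lag and noise.

```python
# Minimal sketch of the congruence check. Inputs are hypothetical
# period-over-period trends (e.g. 0.12 = +12%).

def congruence_check(iroas_trend: float, revenue_trend: float,
                     tolerance: float = 0.0) -> str:
    """If measured iROAS is improving while realized revenue is flat
    or falling, flag the measurement system for recalibration."""
    if iroas_trend > tolerance and revenue_trend <= tolerance:
        return "incongruent: recalibrate the measurement system"
    return "congruent: continue testing to refine"

print(congruence_check(iroas_trend=0.12, revenue_trend=0.08))   # congruent
print(congruence_check(iroas_trend=0.15, revenue_trend=-0.02))  # incongruent
```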

10 — Measure the Totality of Your Distribution

Your measurement system must, when possible, measure the effect of the totality of your distribution. If you want to measure the total effects of Meta on your business, you must include not just .com revenue but also Amazon and wholesale.

One of the biggest mistakes brands make is suffocating their ad spend by measuring only its effect on .com revenue, diminishing its perceived value. This systematically understates the true return on media investment.

Direct Response Media

Click-based optimization (Meta conversion, Google Shopping). Linear path: ad → click → purchase. Most revenue captured on .com. Amazon and Retail capture smaller shares.

Upper Funnel Media

Non-click optimization (YouTube, TV, Audio, Brand). Non-linear path: ad → awareness → purchase wherever convenient. Amazon captures a significant share; Retail captures a meaningful share.

The less direct the path from ad to purchase, the more broadly the revenue effect distributes across your channels. A YouTube campaign doesn't drive clicks to your site the way a Meta conversion campaign does. It drives awareness that realizes itself wherever the consumer chooses to buy, whether that's .com, Amazon, or a retail store.

Your marketing measurement must match your distribution. If you sell on .com, Amazon, and retail, your measurement system must capture the effects across all three. Anything less is systematically biased against upper-funnel investment and will lead to chronic underinvestment in the media most likely to grow the total business.
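The .com-only bias is easy to see in numbers. The channel names and lift figures below are hypothetical, used only to illustrate the distortion the section describes.

```python
# Sketch of measuring lift across the totality of distribution.
# All lift and spend figures are hypothetical.

def total_incremental_lift(lift_by_channel: dict[str, float]) -> float:
    """Total effect = sum of incremental lift at every point of
    distribution, not just .com."""
    return sum(lift_by_channel.values())

lift = {"dotcom": 60_000, "amazon": 25_000, "retail": 10_000}
spend = 50_000

dotcom_only_iroas = lift["dotcom"] / spend           # 1.2: looks marginal
true_iroas = total_incremental_lift(lift) / spend    # 1.9: the full picture
print(dotcom_only_iroas, true_iroas)
```

A .com-only view here would understate the channel's return by more than a third, exactly the bias against upper-funnel media described above.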

11 — How Long to Measure

A common point of discussion is the horizon of tests: how long should we measure the effects of a given amount of spend in a channel over time?

When possible, the post-treatment window should remain open to examination under whatever horizon a stakeholder wants to understand the effects over. However, there is a critical principle here:

No long-term effects should be assumed. They need to be validated in order to be included in the consideration of the measurement. There is a large temptation to want to constantly find evidence for the long-term effects of media to justify spending decisions. Resist this temptation.

If a brand wants to understand the 60-day or 90-day revenue impact of a two-week test, the post-treatment data can be examined. But the burden of proof lies with the data, not with the hypothesis. Long-term effects are real for some channels, negligible for others, and the only way to know is to measure them directly.

✓ Validated long-term effects

Post-treatment windows with measurable lift above baseline, confirmed through the geo holdout control group. These can be incorporated into the measurement system with the appropriate confidence weighting.

✗ Assumed long-term effects

Theoretical multipliers applied to justify spend without supporting test data. These introduce fiction into the system and widen the gap between measurement and reality.

12 — End-to-End Accountability

One of the unique things that CTC brings to the table is the ability to design, deploy, and report on geo holdout tests as a part of our core service offering. But the real differentiator goes further: we have the obligation to operationalize those results in your ad account and to bring those effects to life in our decision-making and reporting.

Most Measurement Platforms
  • Design and run tests
  • Deliver a report with results
  • Suggest actions
  • Stop here ←

No accountability for whether results are acted upon or whether they actually improve outcomes.

CTC Profit Engine
  • Design and run tests
  • Deliver results with interpretation
  • Operationalize in your ad accounts
  • Calibrate cost controls and bid targets
  • Report on realized business outcomes
  • Be accountable to the result ←

The reality is that most testing processes don't yield single definitive data points that are easy to turn into action. Interpreting ambiguous data for the sake of making better day-to-day decisions is the great challenge. That is where CTC seeks to be accountable: to take the information, design tests whose results can actually be applied, and make better decisions that ultimately affect your business.

Statlas Data Science Platform: GeoLift studies, MMM, acquisition and retention modeling

The Cost Control Connection

Because CTC deploys a system of media buying built on cost controls and constraints (the signals that tell the platforms your desired outcome), appropriately calibrating those targets is critical. Without a clear view of the actual incremental impact of your media and how it relates to the data that Meta or Google is optimizing for, you are likely designing a system that moves you off course.

Incrementality factors directly inform cost control targets. The measurement system and the execution system are not separate. They are one loop.

13 — A Better Starting Point for Every Brand

Because CTC maintains a system where test results across all customers are constantly tracked and aggregated, we have a dynamic set of benchmarks that evolves in real time. This allows for a closer approximation of reality prior to any individual testing for a new business.

If a brand does not yet have any of their own test results, we can still offer them a better starting point for channels like Meta, Google, TikTok, AppLovin, or any other platform where we have accumulated test data. This helps to reduce the amount of waste that has to occur prior to gaining the knowledge that CTC has spent years gathering.

CTC Aggregate Benchmark Database

Starting points derived from cross-brand test results, updated continuously:

  • Meta — 1.15× (Largest sample)
  • Google — Varies (Brand vs Non-Brand)
  • TikTok — Growing (Expanding dataset)
  • AppLovin — Emerging (Early tests)
  • Others — Adding (Pinterest, Snap, etc.)

14 — The Measurement Philosophy

Our measurement philosophy involves building an ongoing roadmap toward a progressively better truth. One that allows us to get closer and closer to the approximation of reality and the causal effects of our media at any given point in time.

The best measurement methodology will include:

  1. A constant pursuit of new data points. Always-on testing, across channels, at varying spend levels, to continuously inform and refine the existing measurement system. The database of truth never stops growing.
  2. A period of application that builds, not undermines. Once data informs the system, it should be applied. But that application should never seek to undermine the present set of information. Instead, it should hypothesize, design a new test, and add to the dataset. Each cycle of apply → hypothesize → test → refine moves us closer and closer to what is true.
  3. Subordination to business reality. No measurement system operates above the realities of revenue and contribution margin. When the model and the P&L disagree, the P&L wins. Recalibrate the model.
  4. End-to-end accountability. Testing without operationalization is an academic exercise. The value is in turning ambiguous data into better capital allocation decisions, and being accountable to the outcome.
  5. Dynamic benchmarks from aggregate intelligence. Every brand benefits from the collective knowledge of all tests ever run. No brand starts from zero. The system compounds across the entire portfolio.
  6. Honest treatment of uncertainty. Long-term effects are validated, not assumed. Error is acknowledged, not hidden. The goal is never false precision. It is less wrong, faster.
The measurement system is only as good as its ability to lead to a better allocation of capital across the available media channels.

That is the standard. That is what we measure ourselves against. Not the elegance of the model, but the quality of the decisions it produces.

Taylor Holiday is the CEO of Common Thread Collective. A former professional baseball player who lucked into entrepreneurship over a decade ago, Taylor lives in Southern California with his amazing wife and three kids — “who are my world.” He’d love to connect with you on Twitter or LinkedIn.