Overview
What are Geo-Experiments?
Geo-experiments are controlled tests designed to measure the true incremental impact of marketing activities by comparing test and control groups across different geographic areas.
The methodology works by:
- Dividing geographic areas into two groups:
  - Test Group: Areas where the marketing activity will be modified
  - Control Group: Areas where marketing continues as usual
- Selecting these groups algorithmically so they are as similar as possible in terms of:
  - Historical revenue patterns
  - Seasonality
  - Market size
  - Consumer behavior
- Relying on this similarity so that external factors (seasonality, market conditions, other marketing activities) affect both groups in roughly the same way, allowing us to isolate the specific impact of our test variable.
For example: If revenue rises 10% in the test group and 5% in the control group during a high season, we can attribute the 5-percentage-point difference to our marketing activity, since both groups were similarly affected by seasonality.
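The arithmetic in this example can be sketched as a simple comparison of growth rates between the two groups. This is a minimal illustration only: the function names and revenue figures are hypothetical, and the platform's actual estimator is not described here and may be considerably more sophisticated.

```python
def relative_change(before: float, after: float) -> float:
    """Percentage change from the pre-test period to the test period."""
    return (after - before) / before * 100.0

def estimated_lift_pp(test_before: float, test_after: float,
                      control_before: float, control_after: float) -> float:
    """Test-group growth minus control-group growth, in percentage points.

    Shared external effects (e.g. seasonality) appear in both groups and cancel out.
    """
    return relative_change(test_before, test_after) - relative_change(control_before, control_after)

# Hypothetical revenue totals reproducing the 10% vs 5% example
lift = estimated_lift_pp(test_before=100_000, test_after=110_000,
                         control_before=200_000, control_after=210_000)
print(f"Estimated lift: {lift:.1f} percentage points")  # Estimated lift: 5.0 percentage points
```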
Why Are They Important?
- Validate model assumptions about channel performance
- Measure true incremental impact of marketing activities
- Calibrate the marketing mix model (MMM) with real experimental data
- Inform budget allocation decisions with empirical evidence
- Challenge or confirm platform-reported metrics
Data Requirements
Required Data Points
In addition to a geography identifier (e.g., state or region) for each row, the experiment requires only two essential data points:
- Date
- Total output variable (e.g., Total Revenue, Total Orders)
Important: These should be total figures, NOT channel-attributed numbers.
Example: Use total daily revenue for each geography, not “revenue attributed to Meta ads”.
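As an illustration of "total, not attributed", the sketch below rolls hypothetical order-level records up to one row per date and geography while deliberately ignoring any channel field. The record layout, field names and values are assumptions for illustration; the platform's actual upload format is not specified here.

```python
from collections import defaultdict
from datetime import date

# Hypothetical order-level records: (order_date, geo, attributed_channel, revenue)
orders = [
    (date(2024, 3, 1), "CA", "meta", 120.0),
    (date(2024, 3, 1), "CA", "organic", 80.0),
    (date(2024, 3, 1), "TX", "search", 50.0),
    (date(2024, 3, 2), "CA", "email", 30.0),
]

def build_geo_series(records):
    """Sum ALL revenue per (date, geo); the channel field is intentionally discarded."""
    totals = defaultdict(float)
    for order_date, geo, _channel, revenue in records:
        totals[(order_date, geo)] += revenue
    # One row per date x geography, sorted by date then geo
    return sorted((d, g, rev) for (d, g), rev in totals.items())

for row in build_geo_series(orders):
    print(row)
```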
Historical Data Requirements
- Minimum historical data should be 4-5 times the intended test duration
  - For a 1-month test: 4-5 months of historical data
  - For a 2-week test: 8-10 weeks of historical data
- Optimal scenario: 1 year of historical data when possible
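The history rule above reduces to a simple multiplication. A minimal sketch, assuming durations are counted in days; the constant and function names are hypothetical.

```python
from typing import Tuple

MIN_MULTIPLIER, MAX_MULTIPLIER = 4, 5  # history should be 4-5x the test duration
OPTIMAL_HISTORY_DAYS = 365             # "optimal scenario: 1 year"

def recommended_history_days(test_days: int) -> Tuple[int, int]:
    """Recommended (minimum, upper) history length in days for a given test length."""
    return MIN_MULTIPLIER * test_days, MAX_MULTIPLIER * test_days

def history_status(history_days: int, test_days: int) -> str:
    """Classify available history against the 4x minimum and the 1-year optimum."""
    if history_days < MIN_MULTIPLIER * test_days:
        return "insufficient"
    if history_days >= OPTIMAL_HISTORY_DAYS:
        return "optimal"
    return "sufficient"

print(recommended_history_days(14))  # (56, 70): 8-10 weeks for a 2-week test
print(history_status(70, 14))        # sufficient
print(history_status(90, 30))        # insufficient
```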
Geographic Granularity
The platform has specific limitations on data volume to ensure reliable processing:
- Maximum of ~40,000 total rows in the dataset
  - This limit applies to the combination of locations and time periods (one row per location per day)
  - Examples of viable combinations:
    - 100 locations × 365 days (36,500 rows)
    - 260 locations × 150 days (39,000 rows)
- Best practices:
  - Keep total locations to about 260 or fewer to retain the ability to use 4-5 months of historical data
  - Optimal scenario: ≤100 locations to allow for a full year of historical data
  - Avoid ZIP-code-level granularity in large markets, as it often exceeds processing limits
  - Use state/region level where possible
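The row budget is just locations × days, so the longest usable history for a given number of locations follows directly. A minimal sketch, assuming one row per location per day and treating the approximate 40,000-row limit as a hard cap; the 1,000-location case is a hypothetical ZIP-code-level example.

```python
# Approximate platform limit quoted above (~40,000 rows), treated here as a hard cap
MAX_ROWS = 40_000

def row_count(n_locations: int, n_days: int) -> int:
    """Assumes one row per location per day."""
    return n_locations * n_days

def fits_row_limit(n_locations: int, n_days: int, max_rows: int = MAX_ROWS) -> bool:
    return row_count(n_locations, n_days) <= max_rows

def max_history_days(n_locations: int, max_rows: int = MAX_ROWS) -> int:
    """Longest daily history that stays within the row limit."""
    return max_rows // n_locations

for locations in (100, 260, 1_000):
    print(f"{locations} locations -> up to {max_history_days(locations)} days of history")
# 100 locations -> up to 400 days of history
# 260 locations -> up to 153 days of history
# 1000 locations -> up to 40 days of history
```

The last line shows why fine-grained geographies are discouraged: with 1,000 locations, even a 2-week test could not be paired with the recommended 8-10 weeks of history.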
Experiment Design
Types of Geo-Experiments
- Hold-out Tests
  - Completely stop spending in test regions
  - Measures baseline contribution of channel
  - Best for validating channel incrementality
- Scale-up Tests
  - Increase spending in test regions
  - Measures potential for growth
  - Best for testing saturation points
- New Channel Tests
  - Test new channel in specific regions
  - Measures incremental impact of new activity
  - Best for validating expansion plans
Design Parameters
- Duration: Typically a minimum of 14-21 days
- Budget: Determined by the expected lift and ROI
- Geography Selection: Algorithm selects regions to create comparable test/control groups
- Expected Lift: Minimum detectable effect needed for statistical significance
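The selection algorithm itself is not described here. Purely as an illustration of the idea of building comparable groups, the sketch below uses a generic random search: it tries many random test/control splits and keeps the one whose historical revenue series, each scaled by its own mean, track each other most closely. The region names, the mismatch score and `select_groups` are all hypothetical, and the platform's actual method may differ substantially (for example, by weighing seasonality and market size explicitly).

```python
import random
from statistics import mean
from typing import Dict, List, Sequence, Tuple

History = Dict[str, List[float]]  # geo -> daily total revenue, same length for every geo

def aggregate(geos: Sequence[str], history: History) -> List[float]:
    """Sum daily revenue across a group of geographies."""
    n_days = len(next(iter(history.values())))
    return [sum(history[g][t] for g in geos) for t in range(n_days)]

def normalize(series: List[float]) -> List[float]:
    """Divide by the mean so groups of different total size are compared by shape."""
    avg = mean(series)  # assumes positive revenue
    return [x / avg for x in series]

def mismatch(test: Sequence[str], control: Sequence[str], history: History) -> float:
    """Mean absolute gap between normalized test and control series (0 = same shape)."""
    a = normalize(aggregate(test, history))
    b = normalize(aggregate(control, history))
    return mean(abs(x - y) for x, y in zip(a, b))

def select_groups(history: History, n_test: int, n_trials: int = 2000,
                  seed: int = 0) -> Tuple[float, List[str], List[str]]:
    """Random search over splits; returns (score, test_geos, control_geos)."""
    if not 0 < n_test < len(history):
        raise ValueError("n_test must leave at least one geo in each group")
    rng = random.Random(seed)
    geos = sorted(history)
    best = None
    for _ in range(n_trials):
        rng.shuffle(geos)
        test, control = sorted(geos[:n_test]), sorted(geos[n_test:])
        score = mismatch(test, control, history)
        if best is None or score < best[0]:
            best = (score, test, control)
    return best

# Hypothetical daily revenue for four regions
history = {
    "A": [100, 120, 90, 110],
    "B": [200, 240, 180, 220],
    "C": [50, 40, 60, 45],
    "D": [100, 80, 120, 90],
}
score, test, control = select_groups(history, n_test=2)
print(test, control, round(score, 4))
```

Normalizing by the mean lets a smaller test group be matched to a larger control group, since only the pattern over time has to agree.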
Channel-Specific Considerations
When testing upper-funnel activities (e.g., Meta Awareness, YouTube), consider the delayed effect:
Example Scenario:
- Test Duration: 3 weeks
- Channel’s Known Lag Effect: 2 weeks
- Analysis Approach:
  - First Analysis: At the end of the 3-week test period
  - Final Analysis: At 5 weeks (3 weeks test + 2 weeks lag)
This ensures we capture the full impact, including delayed conversions.
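The two analysis points in this scenario can be computed from the test start date. A minimal sketch; the start date is hypothetical, and "first analysis" is taken to mean the day after the last test day.

```python
from datetime import date, timedelta

def analysis_dates(test_start: date, test_weeks: int, lag_weeks: int):
    """Return (first_analysis, final_analysis).

    first_analysis is the day after the last test day; final_analysis adds the
    channel's known lag on top of that.
    """
    first_analysis = test_start + timedelta(weeks=test_weeks)
    final_analysis = first_analysis + timedelta(weeks=lag_weeks)
    return first_analysis, final_analysis

first, final = analysis_dates(date(2024, 6, 3), test_weeks=3, lag_weeks=2)
print(first, final)  # 2024-06-24 2024-07-08
```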
This is particularly important for:
- Brand awareness campaigns
- Video advertising
- Content marketing
- Other upper-funnel activities with known lag effects
Understanding Expected Lift
The expected lift shown in the experiment design represents:
- The minimum change needed to validate the input ROI assumption
- NOT a prediction of actual results
- A threshold for statistical significance
- Calculated based on:
  - Input ROI/ROAS
  - Historical performance
  - Geographic variance
  - Test duration
Example:
Input ROAS = 10
Expected Lift = -5%
This means: To validate a ROAS of 10, we need to see at least a 5% reduction in revenue in the test regions when spend is cut. If we see a smaller reduction, it suggests the actual ROAS is lower than 10.
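The link between ROAS and expected lift can be illustrated with a simple linear assumption: revenue change = ROAS × spend change. This is a hedged sketch, not the platform's formula, which also accounts for historical performance, geographic variance and test duration, and the dollar figures are hypothetical values chosen to reproduce the -5% example. It also ignores lag and saturation effects.

```python
def expected_lift(roas: float, spend_change: float, baseline_revenue: float) -> float:
    """Expected relative revenue change in test regions if the assumed ROAS holds.

    spend_change is negative for a hold-out (spend removed), positive for a scale-up.
    """
    return roas * spend_change / baseline_revenue

def implied_roas(observed_lift: float, spend_change: float, baseline_revenue: float) -> float:
    """Back out the ROAS consistent with an observed relative revenue change."""
    return observed_lift * baseline_revenue / spend_change

# Hypothetical: 400,000 baseline revenue in test regions, 2,000 of spend removed
baseline, spend_cut = 400_000.0, -2_000.0
print(f"{expected_lift(10, spend_cut, baseline):.1%}")  # -5.0%
print(implied_roas(-0.02, spend_cut, baseline))         # 4.0
```

Observing only a -2% change instead of -5% would imply a ROAS of about 4, consistent with the statement that a smaller reduction points to a ROAS below the input value.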
Running the Experiment
Implementation Steps
- Select test regions based on Cassandra’s recommendations
- Adjust campaign targeting/budgets accordingly
- Monitor spend and delivery throughout test period
- Maintain consistent tracking and measurement
- Allow for additional time to measure delayed effects
Best Practices
- Avoid major changes to other channels during test period
- Consider seasonal effects and promotional calendar
- Document any external factors that might impact results
- Maintain test conditions for full duration
- Monitor for any technical issues or tracking problems
Results Interpretation
Key Metrics
- Measured lift/impact
- Statistical significance
- Incremental ROAS
- Total attributed revenue
- Confidence intervals
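Several of these metrics derive from comparing the test regions' actual revenue with a counterfactual: what they would have earned without the change, estimated from the control group. How the platform builds that counterfactual is not described here; the sketch below assumes it is already given, and all names and numbers are hypothetical.

```python
from typing import NamedTuple

class IncrementalResult(NamedTuple):
    incremental_revenue: float
    lift_pct: float
    iroas: float

def incremental_metrics(actual_revenue: float, counterfactual_revenue: float,
                        spend_change: float) -> IncrementalResult:
    """Incremental revenue, measured lift (%) and incremental ROAS for the test regions."""
    incremental_revenue = actual_revenue - counterfactual_revenue
    lift_pct = incremental_revenue / counterfactual_revenue * 100.0
    iroas = incremental_revenue / spend_change  # revenue gained (or lost) per unit of spend added (or removed)
    return IncrementalResult(incremental_revenue, lift_pct, iroas)

# Hypothetical hold-out: 2,000 of spend removed, revenue 8,000 below the counterfactual
result = incremental_metrics(actual_revenue=392_000.0, counterfactual_revenue=400_000.0,
                             spend_change=-2_000.0)
print(result)  # IncrementalResult(incremental_revenue=-8000.0, lift_pct=-2.0, iroas=4.0)
```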
Understanding Results
- Results show the measured incremental impact relative to the control group
- Statistical significance indicates how reliable the results are (aim for 95% confidence, i.e., a 5% significance level)
- Confidence intervals show the plausible range of the true effect
- Compare actual lift to expected lift to validate ROI assumptions
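The checks in this list can be combined into a small helper. A minimal sketch, assuming lifts are expressed as fractions (e.g. -0.02 for -2%) and that the interval passed in is a 95% confidence interval; `interpret` is a hypothetical name. It compares the point estimate to the expected lift; a stricter rule would compare the interval bound instead.

```python
def interpret(measured_lift: float, ci_low: float, ci_high: float,
              expected_lift: float) -> dict:
    """Summarize a result against significance and the expected lift threshold."""
    significant = not (ci_low <= 0.0 <= ci_high)  # zero lies outside the interval
    same_direction = measured_lift * expected_lift > 0
    meets_expected = same_direction and abs(measured_lift) >= abs(expected_lift)
    return {"significant": significant, "meets_expected_lift": meets_expected}

# Hypothetical hold-out result: -2% lift, 95% CI (-3.5%, -0.5%), expected -5%
print(interpret(-0.02, -0.035, -0.005, expected_lift=-0.05))
# {'significant': True, 'meets_expected_lift': False}
```

Here the effect is real but smaller than the threshold, which suggests the true ROAS is below the input assumption.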
Using Results
- Model Calibration
  - Input results into model calibration
  - Update channel ROI assumptions
  - Refine budget allocation recommendations
- Strategic Planning
  - Inform budget allocation decisions
  - Guide channel strategy
  - Validate or challenge existing assumptions