
Understanding and Handling Multicollinearity

Multicollinearity is a common challenge in Marketing Mix Modeling (MMM) that can distort attribution accuracy and reduce confidence in model insights. This guide explains what multicollinearity is, how to detect it, and best practices for resolving it.

1. What is Multicollinearity and Why Does It Matter?

Definition

Multicollinearity occurs when two or more independent variables (in MMM, typically marketing channels) are highly correlated. This makes it difficult to determine the true impact of each one.

Why It Matters in MMM

  • Unclear Attribution: Strongly correlated variables compete for attribution, leading to unreliable estimates.
  • Inflated Variance: Multicollinearity inflates the variance, and therefore the standard errors, of coefficient estimates. This makes the estimates unstable and hard to interpret.
  • Misleading Budget Allocation: Overestimated or underestimated channel contributions can result in inefficient spend recommendations.
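A small simulation shows the inflated-variance effect. The sketch below uses synthetic data only. It repeatedly generates two channels with a chosen correlation `rho`, fits ordinary least squares, and reports how much the estimated coefficients vary across datasets. The function name `coefficient_spread`, the true effects (2.0 and 1.0), and the noise levels are illustrative assumptions, not values from any real model.

```python
import numpy as np

def coefficient_spread(rho, n=104, n_draws=200, seed=0):
    """Std. dev. of OLS coefficient estimates across simulated datasets.

    Two unit-variance channels are drawn with correlation `rho`; the synthetic
    outcome uses assumed true effects of 2.0 and 1.0 plus Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(n_draws):
        ch1 = rng.normal(0.0, 1.0, n)
        # Mix in independent noise so that corr(ch1, ch2) is approximately rho.
        ch2 = rho * ch1 + np.sqrt(1.0 - rho**2) * rng.normal(0.0, 1.0, n)
        y = 2.0 * ch1 + 1.0 * ch2 + rng.normal(0.0, 0.5, n)
        X = np.column_stack([ch1, ch2])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        estimates.append(beta)
    # One standard deviation per channel, taken across all simulated fits.
    return np.std(estimates, axis=0)

print("uncorrelated channels:   ", coefficient_spread(0.0).round(3))
print("highly correlated (0.99):", coefficient_spread(0.99).round(3))
```

The true effects are identical in both cases, yet the spread for the correlated pair is several times larger. As a rough guide, the standard error grows with the square root of the variance inflation factor described in the next section.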

2. How to Detect Multicollinearity

Key Indicators

  • High Correlation Between Channels – If two marketing channels have similar spend patterns over time, they are likely correlated.
  • Unstable Model Coefficients – If small changes in data drastically alter estimated contributions, multicollinearity may be present.
  • Variance Inflation Factor (VIF) – A statistical measure of how much the variance of a channel's estimated coefficient is inflated by that channel's correlation with the other predictors. Common rule-of-thumb thresholds:
    • VIF > 5: Moderate multicollinearity (requires attention).
    • VIF > 10: Severe multicollinearity (urgent action required).
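VIF can be computed directly. For channel j it equals 1 / (1 - R_j^2), where R_j^2 is the R² from regressing channel j on all other predictors (with an intercept). Equivalently, it is the j-th diagonal entry of the inverse of the predictors' correlation matrix. The sketch below uses that second form with NumPy. The function name, channel names, and synthetic spend series are illustrative assumptions. In practice the matrix would be built from your own model inputs.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF for each column of X (rows = time periods, columns = channels).

    Uses VIF_j = [inv(R)]_jj, where R is the predictors' correlation matrix;
    this matches 1 / (1 - R_j^2) from regressing column j on the others.
    """
    corr = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    return np.diag(np.linalg.inv(corr))

# Synthetic weekly spend: search closely tracks TV, social is independent.
rng = np.random.default_rng(42)
weeks = 104
tv = rng.uniform(50, 150, weeks)
search = 0.8 * tv + rng.normal(0, 5, weeks)
social = rng.uniform(20, 80, weeks)

vifs = variance_inflation_factors(np.column_stack([tv, search, social]))
for name, v in zip(["tv", "search", "social"], vifs):
    flag = "severe" if v > 10 else "moderate" if v > 5 else "ok"
    print(f"{name:>7}: VIF={v:6.1f} ({flag})")
```

If two channels are perfectly collinear, the correlation matrix is singular. `np.linalg.inv` then raises an error or returns unusable values, which is itself a signal to aggregate or drop one of the channels.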

Detection Tools

  • Cassandra’s Diagnostic Reports – Built-in checks flag potential multicollinearity issues.
  • Correlation Matrices – Review pairwise correlations between marketing channels.
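  • Principal Component Analysis (PCA) – Reveals redundancy across channels. If a few components explain nearly all of the variance in the channel data, some channels carry largely overlapping information.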
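A correlation-matrix review and a PCA check can both be run from the same correlation matrix. In the sketch below, `correlated_pairs` flags channel pairs whose absolute correlation exceeds a threshold. The 0.8 threshold is an illustrative choice, not a Cassandra setting. `pca_explained_variance` returns the share of variance carried by each principal component of the standardized channels. Both function names and the synthetic data are assumptions for illustration.

```python
import numpy as np

def correlated_pairs(X, names, threshold=0.8):
    """Channel pairs whose absolute Pearson correlation exceeds `threshold`."""
    corr = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) > threshold:
                pairs.append((names[i], names[j], round(float(corr[i, j]), 3)))
    return pairs

def pca_explained_variance(X):
    """Share of variance per principal component of the standardized channels.

    PCA on standardized data is an eigendecomposition of the correlation matrix.
    """
    corr = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]  # eigvalsh sorts ascending
    return eigenvalues / eigenvalues.sum()

# Synthetic example: "search" is almost a scaled copy of "tv".
rng = np.random.default_rng(7)
weeks = 104
tv = rng.uniform(50, 150, weeks)
X = np.column_stack([tv, 0.8 * tv + rng.normal(0, 5, weeks), rng.uniform(20, 80, weeks)])
names = ["tv", "search", "social"]

print(correlated_pairs(X, names))
print(pca_explained_variance(X).round(3))  # a near-zero last share signals redundancy
```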

3. How to Resolve Multicollinearity

A. Aggregating Similar Campaigns

  • Combine campaigns with similar targeting, objectives, or platforms to reduce redundancy.
  • Example: Instead of modeling Facebook Video Ads and Facebook Carousel Ads separately, aggregate them into Facebook Ads.
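Aggregation is usually a simple per-period sum of spend (or impressions). The sketch below assumes campaign-level spend is stored as equal-length weekly lists keyed by campaign. The campaign keys, the `CHANNEL_MAP` mapping, the `aggregate_campaigns` helper, and the numbers are all hypothetical.

```python
# Hypothetical mapping from campaign-level columns to the channel used in the model.
CHANNEL_MAP = {
    "facebook_video_ads": "facebook_ads",
    "facebook_carousel_ads": "facebook_ads",
}

def aggregate_campaigns(spend_by_campaign, channel_map):
    """Sum per-period spend of campaigns that map to the same modeled channel."""
    aggregated = {}
    for campaign, series in spend_by_campaign.items():
        channel = channel_map.get(campaign, campaign)  # unmapped campaigns keep their name
        if channel not in aggregated:
            aggregated[channel] = [0.0] * len(series)
        if len(series) != len(aggregated[channel]):
            raise ValueError(f"{campaign}: series length does not match {channel}")
        aggregated[channel] = [a + s for a, s in zip(aggregated[channel], series)]
    return aggregated

weekly_spend = {
    "facebook_video_ads": [120.0, 80.0, 150.0],
    "facebook_carousel_ads": [60.0, 40.0, 75.0],
    "google_search_ads": [200.0, 210.0, 190.0],
}
print(aggregate_campaigns(weekly_spend, CHANNEL_MAP))
# {'facebook_ads': [180.0, 120.0, 225.0], 'google_search_ads': [200.0, 210.0, 190.0]}
```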

B. Adding Constraints in Model Training

  • Apply regularization techniques such as Ridge Regression (L2) to reduce variance and stabilize coefficients.
  • Introduce priors in Bayesian models to guide attribution toward realistic values.
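One closed-form estimator can illustrate both bullets. Ridge regression minimizes ||y - Xb||^2 + lam * ||b||^2. Replacing ||b||^2 with ||b - b0||^2 gives the maximum a posteriori estimate under a Gaussian prior centred on b0, a simple stand-in for the richer priors a Bayesian MMM would use. The sketch below is plain NumPy, assumes centred data with no intercept, and uses synthetic channels. `ridge_with_prior` and the prior values are hypothetical, not part of any particular MMM library.

```python
import numpy as np

def ridge_with_prior(X, y, lam, beta_prior=None):
    """Solve min ||y - X b||^2 + lam * ||b - beta_prior||^2 in closed form.

    beta_prior = 0 gives ordinary ridge (L2); a non-zero beta_prior is the
    MAP estimate under a Gaussian prior centred on beta_prior.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    k = X.shape[1]
    prior = np.zeros(k) if beta_prior is None else np.asarray(beta_prior, dtype=float)
    # Normal equations: (X'X + lam*I) b = X'y + lam*prior
    return np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y + lam * prior)

# Synthetic data: "search" is nearly a copy of "tv".
rng = np.random.default_rng(0)
n = 104
tv = rng.normal(0.0, 1.0, n)
search = tv + rng.normal(0.0, 0.05, n)
X = np.column_stack([tv, search])
y = 2.0 * tv + 1.0 * search + rng.normal(0.0, 0.5, n)  # assumed true effects 2.0 and 1.0

print("OLS (lam=0):   ", ridge_with_prior(X, y, 0.0).round(2))
print("ridge (lam=10):", ridge_with_prior(X, y, 10.0).round(2))
print("with prior:    ", ridge_with_prior(X, y, 10.0, beta_prior=[2.0, 1.0]).round(2))
```

With `lam=0.0` the result is ordinary least squares, which can split the effect between the two near-identical channels erratically. A positive `lam` shrinks the split toward equal shares, and a prior moves it toward the prior values. The prior is only as reliable as the evidence behind it.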

C. Removing Redundant Variables

  • If multiple channels track similar activity (e.g., Google Search Ads and Branded Search Ads), consider removing the less critical one.
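  • Consider removing variables that add little explanatory power (for example, low statistical significance). Be cautious, though: multicollinearity itself inflates standard errors, so a genuinely useful channel can look insignificant simply because it is correlated with another one.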
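One common way to screen for redundant variables is backward elimination by VIF. Drop the predictor with the highest VIF, recompute, and repeat until every VIF is at or below a threshold. The sketch below assumes the threshold of 10 from the VIF guideline earlier in this guide. The helper name `drop_high_vif` and the synthetic data are hypothetical. When two channels are near-duplicates, their VIFs are almost equal, so which one gets dropped is close to arbitrary. The choice of the "less critical" channel should still rest on business judgment.

```python
import numpy as np

def drop_high_vif(X, names, threshold=10.0):
    """Iteratively drop the highest-VIF column until all VIFs <= threshold.

    Returns (kept names, dropped names in drop order).
    """
    X = np.asarray(X, dtype=float)
    keep = list(range(X.shape[1]))
    dropped = []
    while len(keep) > 1:
        corr = np.corrcoef(X[:, keep], rowvar=False)
        vifs = np.diag(np.linalg.inv(corr))  # VIF_j = [inv(R)]_jj
        worst = int(np.argmax(vifs))
        if vifs[worst] <= threshold:
            break
        dropped.append(names[keep[worst]])
        del keep[worst]
    return [names[i] for i in keep], dropped

rng = np.random.default_rng(1)
weeks = 104
tv = rng.uniform(50, 150, weeks)
X = np.column_stack([tv, 0.8 * tv + rng.normal(0, 5, weeks), rng.uniform(20, 80, weeks)])
kept, dropped = drop_high_vif(X, ["tv", "search", "social"])
print("kept:", kept, "dropped:", dropped)
```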

D. Running Incrementality Tests

  • Use Geo-Experiments or Conversion Lift Studies to isolate individual channel effects.
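  • Use the findings to calibrate the model (for example, as priors on channel effects), so that the split between correlated channels is anchored to measured lift rather than left to the observational data alone.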
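A geo-experiment result can be turned into an incremental return on ad spend (iROAS) that the model is calibrated against. The sketch below uses a simple ratio-adjusted comparison: it assumes the control regions' pre-to-post change describes what the test regions would have done without the extra spend. This is a simplification of a full geo-experiment analysis. The function name `geo_lift` and all numbers are hypothetical.

```python
def geo_lift(test_pre, test_post, control_pre, control_post):
    """Incremental outcome in test regions versus a control-based counterfactual.

    Assumes test and control regions would have changed by the same ratio
    between the pre and post periods absent the intervention.
    """
    counterfactual = test_pre * control_post / control_pre
    return test_post - counterfactual

# Hypothetical revenue totals for the pre- and post-periods of a geo test.
incremental_revenue = geo_lift(test_pre=10_000.0, test_post=13_000.0,
                               control_pre=20_000.0, control_post=22_000.0)
incremental_spend = 1_000.0  # hypothetical extra spend in the test regions
iroas = incremental_revenue / incremental_spend
print(f"incremental revenue: {incremental_revenue:.0f}, iROAS: {iroas:.2f}")
# incremental revenue: 2000, iROAS: 2.00
```

A measured iROAS like this could, for example, serve as the centre of a prior on that channel's ROI. The model then no longer has to separate the channel from its correlated neighbours using observational data alone.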

E. Adjusting Model Granularity

  • If channels look correlated only because weekly aggregation hides day-to-day differences in when they run, consider modeling at a daily level.
  • Segmenting by campaign type (e.g., Brand vs. Non-Brand) can also improve separation of effects.
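You can check directly how the aggregation level affects measured correlation. In this idealized sketch, two hypothetical channels follow the same weekly budget but run on different days. Their weekly totals are therefore perfectly correlated, while their daily series are not. Real daily data is noisier, so any gain in separation has to be weighed against that noise. The data and the helper name `weekly_totals` are assumptions for illustration.

```python
import numpy as np

def weekly_totals(daily):
    """Sum a daily series into complete 7-day weeks (a trailing partial week is dropped)."""
    daily = np.asarray(daily, dtype=float)
    n_weeks = len(daily) // 7
    return daily[: n_weeks * 7].reshape(n_weeks, 7).sum(axis=1)

weekly_budget = np.array([10.0, 20.0, 30.0, 40.0, 25.0, 15.0])
channel_a = np.zeros(len(weekly_budget) * 7)
channel_b = np.zeros(len(weekly_budget) * 7)
channel_a[0::7] = weekly_budget  # hypothetical: channel A spends on day 1 of each week
channel_b[1::7] = weekly_budget  # hypothetical: channel B spends on day 2 of each week

daily_corr = np.corrcoef(channel_a, channel_b)[0, 1]
weekly_corr = np.corrcoef(weekly_totals(channel_a), weekly_totals(channel_b))[0, 1]
print(f"daily r = {daily_corr:.2f}, weekly r = {weekly_corr:.2f}")
```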

4. Summary & Next Steps

  • Monitor multicollinearity indicators like VIF and correlation matrices.
  • Aggregate similar campaigns to prevent redundant attribution.
  • Use constraints and regularization techniques to stabilize model estimates.
  • Leverage experiments to validate individual channel impact and refine modeling decisions.