Math, Applied
Multicollinearity: When Two Drivers Share the Same Story
The idea
Regression wants to split credit across inputs. When ad spend and branded search move together, the model cannot tell which lever caused the lift. Coefficients become unstable: a small change in the data can flip their signs or swap their magnitudes.
Checking for multicollinearity answers one question: are my inputs redundant enough that driver rankings cannot be trusted?
Example: overlapping features in one regression
Ad spend and branded search move together. Coefficients flip sign when both are in the model.
Ad spend coef: 0.50 · Branded search coef: 1.26
Ad spend and branded search have a correlation of about 0.78. Their coefficients can swap magnitude or change sign with small data changes. Drop one feature or combine them.
The math
Collinear features
Columns of the design matrix point in nearly the same direction in feature space.
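The geometric picture can be written down for the two-feature case. This is a sketch under the usual OLS assumptions (independent errors with constant variance σ²); the symbols x_1, x_2 (centered feature columns), θ (the angle between them), and r_12 (their correlation) are introduced here for illustration and do not appear in the original.

```latex
% Correlation of two centered columns is the cosine of the angle between them.
r_{12} = \cos\theta = \frac{x_1^\top x_2}{\lVert x_1 \rVert \, \lVert x_2 \rVert}

% Sampling variance of each OLS slope in the two-feature model.
% The second factor is the variance inflation factor (VIF).
\operatorname{Var}(\hat\beta_j)
  = \frac{\sigma^2}{\lVert x_j \rVert^2} \cdot \frac{1}{1 - r_{12}^2},
  \qquad j = 1, 2

% With the correlation from the example above:
% r_{12} = 0.78 \;\Rightarrow\; \frac{1}{1 - 0.78^2} = \frac{1}{0.3916} \approx 2.55
```

As the columns align, θ shrinks, r_12 approaches 1, and the second factor grows without bound. At a correlation of 0.78 each slope's variance is roughly 2.5 times what it would be with uncorrelated inputs.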
Unstable β
Drop one row or add a week, and β1 and β2 can swing even when predictions stay similar.
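The drop-one-row effect can be checked on synthetic data. The sketch below is illustrative only: the data are simulated, the true effects (1.0 for each driver), the 52-week history, the seed, and all function names are assumptions rather than anything from the original analysis. It refits a two-feature regression with each week removed in turn and reports how far the coefficients and one reference prediction move.

```python
import random

def ols_two(x1, x2, y):
    """OLS fit of y ~ b0 + b1*x1 + b2*x2 using centered sums (two features only)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2  # approaches 0 as the two columns align
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return my - b1 * m1 - b2 * m2, b1, b2

def make_weeks(r, n=52, seed=7):
    """Synthetic weekly data: two drivers with correlation near r, equal true effects."""
    rng = random.Random(seed)
    x1, x2, y = [], [], []
    for _ in range(n):
        a = rng.gauss(0, 1)
        b = r * a + (1 - r ** 2) ** 0.5 * rng.gauss(0, 1)
        x1.append(a)
        x2.append(b)
        y.append(1.0 * a + 1.0 * b + rng.gauss(0, 1))
    return x1, x2, y

def drop_one_week_ranges(x1, x2, y):
    """Refit with each week removed in turn; return max-min of b1, of b2, and of
    the prediction for a week where both drivers equal 1."""
    b1s, b2s, preds = [], [], []
    for i in range(len(y)):
        b0, b1, b2 = ols_two(x1[:i] + x1[i + 1:], x2[:i] + x2[i + 1:], y[:i] + y[i + 1:])
        b1s.append(b1)
        b2s.append(b2)
        preds.append(b0 + b1 + b2)
    return max(b1s) - min(b1s), max(b2s) - min(b2s), max(preds) - min(preds)

low = drop_one_week_ranges(*make_weeks(r=0.0))
high = drop_one_week_ranges(*make_weeks(r=0.95))
print("r = 0.00  swing in b1, b2, prediction:", [round(v, 3) for v in low])
print("r = 0.95  swing in b1, b2, prediction:", [round(v, 3) for v in high])
```

With the same underlying noise, the coefficient swing at r = 0.95 should come out noticeably wider than at r = 0. The prediction swing is printed alongside so the two can be compared directly.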
What to do
Keep one of the twins, build a composite index, or use a ridge penalty. Do not interpret each coefficient as a clean causal lever.
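Two of these remedies, the composite index and the ridge penalty, can be sketched in a few lines. Everything here is an assumption made for illustration: the data are simulated with a correlation near 0.9, the composite is a plain average of z-scored channels (other weightings are possible), and the penalty `lam=10.0` is an arbitrary value that would normally be chosen by cross-validation.

```python
import random

def zscore(values):
    """Center and scale to unit variance so the two channels are comparable."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / sd for v in values]

def ridge_two(z1, z2, y, lam):
    """Ridge slopes for y on two z-scored features; lam = 0 gives plain OLS."""
    my = sum(y) / len(y)
    s11 = sum(a * a for a in z1) + lam  # penalty is added to the diagonal only
    s22 = sum(b * b for b in z2) + lam
    s12 = sum(a * b for a, b in zip(z1, z2))
    s1y = sum(a * (c - my) for a, c in zip(z1, y))
    s2y = sum(b * (c - my) for b, c in zip(z2, y))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

def composite_slope(z1, z2, y):
    """Single slope for y on the average of the two z-scored channels."""
    index = [(a + b) / 2 for a, b in zip(z1, z2)]
    my = sum(y) / len(y)
    return sum(i * (c - my) for i, c in zip(index, y)) / sum(i * i for i in index)

# Simulated weekly data (hypothetical): two channels that rise together.
rng = random.Random(3)
ad, search, sales = [], [], []
for _ in range(52):
    a = rng.gauss(0, 1)
    s = 0.9 * a + (1 - 0.9 ** 2) ** 0.5 * rng.gauss(0, 1)
    ad.append(a)
    search.append(s)
    sales.append(a + s + rng.gauss(0, 1))

z_ad, z_search = zscore(ad), zscore(search)
ols = ridge_two(z_ad, z_search, sales, lam=0.0)
ridge = ridge_two(z_ad, z_search, sales, lam=10.0)
combined = composite_slope(z_ad, z_search, sales)

print("OLS slopes (ad, search):  ", [round(b, 3) for b in ols])
print("Ridge slopes (ad, search):", [round(b, 3) for b in ridge])
print("Composite index slope:    ", round(combined, 3))
```

On z-scored inputs, ridge shrinks the gap between the two slopes, which is the direction the data cannot pin down. The composite sidesteps the split entirely by reporting one slope for the combined channel. Neither turns the coefficients into causal estimates.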
A simple application: marketing attribution
Two channels rise in the same campaign weeks. The regression still forecasts, but coefficient signs are not a budget allocation guide. Check correlation before you present driver slides to leadership.
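The correlation check can be sketched as a small pre-flight step. The weekly figures below are made up for illustration, `email_sends` is a hypothetical third driver added only to show a pair that passes, and the function names are assumptions. The 0.8 cutoff follows the r > 0.8 rule of thumb used later in this piece.

```python
from itertools import combinations

def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def flag_collinear(drivers, threshold=0.8):
    """Return (name_a, name_b, r) for every driver pair with |r| above threshold."""
    flagged = []
    for (name_a, col_a), (name_b, col_b) in combinations(drivers.items(), 2):
        r = pearson_r(col_a, col_b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, round(r, 2)))
    return flagged

# Hypothetical weekly driver values, not taken from any real campaign.
drivers = {
    "ad_spend": [10, 12, 15, 11, 18, 20, 14, 16],
    "branded_search": [100, 115, 140, 118, 170, 185, 130, 160],
    "email_sends": [5, 9, 4, 8, 6, 3, 9, 5],
}

flagged = flag_collinear(drivers)
for name_a, name_b, r in flagged:
    print(f"{name_a} and {name_b} move together (r = {r}); do not rank them separately")
```

Any pair this returns is a candidate for dropping one channel or merging the two before coefficients go on a slide.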
Marketing mix: correlated drivers in one regression
Coefficients become unstable when ad spend and branded search rise together.
Regression coefficients: Ad spend 0.50 · Branded search 1.26
Correlation: r ≈ 0.78
Stability: Unstable
Overlap: 85%
Optimize (move here)
- Plot feature correlation before driver regression
- Combine collinear channels into one index
Hold (do not over-react)
- Splitting budget by raw coefficients when r > 0.8
Escalate if
- Correlation between drivers exceeds 0.85
- Coefficient sign flips across weekly refits
Driver credit is unreliable. Drop one channel or build a composite before you build budget slides.
The habit: plot feature correlation before trusting a multi-driver regression. Pair this with an overfitting check when you have many correlated KPIs and a short history.