Ridge Regularization: Shrink Unstable Coefficients Without Dropping Features
The idea
Ordinary least squares can assign huge, opposite-signed coefficients when inputs are strongly correlated. Ridge adds a penalty on squared coefficient size. A higher λ (the regularization strength) pulls the coefficients toward zero, which stabilizes predictions on collinear features without deleting columns from the model.
Ridge answers: Can I keep all drivers in the model while making coefficient swings less violent?
Example: ridge penalty shrinks unstable coefficients
Ridge shrinks the unstable ad and search coefficients when both channels correlate. At λ = 35%:
Ad spend: ridge 0.20 (OLS 0.41)
Branded search: ridge 0.67 (OLS 1.38)
The coefficient swing is smaller than under OLS. That makes ridge better for prediction than for causal attribution.
The math
Ridge objective
Fit residuals plus a penalty on squared coefficient size. λ controls shrinkage strength.
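Written out, the objective described above takes the following standard form. The notation is an assumption introduced here for illustration, since the original does not spell it out: y is the response vector, X the feature matrix, β the coefficient vector, and the intercept is taken to be unpenalized or removed by centering.

```latex
\hat{\beta}_{\text{ridge}}
  = \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_2^2,
  \qquad \lambda \ge 0
```

Setting λ = 0 recovers ordinary least squares.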
Closed form (ridge)
Adding λ to the diagonal of XᵀX dampens ill-conditioned directions from collinearity.
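Assuming y is the response vector, X the feature matrix and I the identity matrix (notation introduced here for illustration), the standard closed-form ridge solution is:

```latex
\hat{\beta}_{\text{ridge}} = \left( X^\top X + \lambda I \right)^{-1} X^\top y
```

Every eigenvalue of XᵀX is shifted up by λ, so for λ > 0 the matrix is invertible even when collinearity makes XᵀX nearly singular.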
Shrinkage
As λ grows, coefficients move toward zero. Predictions often stay stable, but the coefficients do not become any more trustworthy as causal credit.
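The shrinkage can be checked numerically. This is a minimal sketch on synthetic data, not the article's dataset: the two simulated channels, their made-up true effects, and the helper name `ridge_fit` are all assumptions. Here λ is a raw penalty value, not the percentage scale used in the example above, and no intercept is fitted because the simulated data has mean near zero.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge: solve (X^T X + lam * I) beta = X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(0)
n = 200

# Two hypothetical channels built to have a correlation near 0.84.
ad = rng.normal(size=n)
search = 0.84 * ad + np.sqrt(1 - 0.84**2) * rng.normal(size=n)
X = np.column_stack([ad, search])

# Made-up true effects, used only to generate a response.
y = 0.3 * ad + 0.6 * search + rng.normal(scale=0.5, size=n)

lambdas = [0.0, 1.0, 10.0, 100.0, 1000.0]
norms = []
for lam in lambdas:
    beta = ridge_fit(X, y, lam)
    norms.append(float(np.linalg.norm(beta)))  # overall coefficient size
    print(f"lambda={lam:>7.1f}  ad={beta[0]: .3f}  search={beta[1]: .3f}")
```

The length of the coefficient vector falls as λ rises. Individual coefficients need not shrink at the same rate, which is one reason they remain poor evidence of per-channel credit.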
A simple application: collinear marketing channels
When ad spend and branded search rise together, OLS coefficients can flip from one weekly refit to the next. Ridge keeps the forecast usable for planning but does not turn correlated inputs into clean attribution. Read this alongside the multicollinearity post before presenting driver slides.
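One way to see the refit instability is to refit on resampled weeks and compare how much the coefficients move. This is a rough sketch under stated assumptions: the data is synthetic, bootstrap resampling stands in for real weekly refits, and the penalty value 20.0 and the names `ridge_fit` and `refit_spread` are illustrative choices rather than anything from the original.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge; lam = 0.0 gives ordinary least squares."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

rng = np.random.default_rng(1)
n_weeks = 60

# Hypothetical ad spend and branded search with correlation near 0.84.
ad = rng.normal(size=n_weeks)
search = 0.84 * ad + np.sqrt(1 - 0.84**2) * rng.normal(size=n_weeks)
X = np.column_stack([ad, search])
y = 0.3 * ad + 0.6 * search + rng.normal(scale=1.0, size=n_weeks)

def refit_spread(lam, n_refits=500):
    """Standard deviation of each coefficient across bootstrap refits."""
    betas = []
    for _ in range(n_refits):
        idx = rng.integers(0, n_weeks, size=n_weeks)  # resample weeks with replacement
        betas.append(ridge_fit(X[idx], y[idx], lam))
    return np.std(np.array(betas), axis=0)

ols_spread = refit_spread(0.0)
ridge_spread = refit_spread(20.0)
print("OLS   coefficient spread (ad, search):", ols_spread)
print("Ridge coefficient spread (ad, search):", ridge_spread)
```

A smaller spread means steadier coefficients across refits. It says nothing about whether the shrunken values reflect each channel's true contribution.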
Marketing mix: ridge vs raw OLS on collinear channels
Ridge coefficients at λ = 35%, with channel correlation r ≈ 0.84:
Ad spend: 0.23
Branded search: 0.64
Stability: Improved
Raising λ shrinks the ad and search coefficients when both channels correlate. It stabilizes forecasts without pretending attribution is clean.
Optimize (move here)
- Use ridge when you must keep correlated features for prediction
- Cross-validate λ on holdout weeks
Hold (do not over-react)
- Hold off on budget splits based on ridge coefficients when r > 0.8
Escalate if
- λ is above 50% and coefficients still flip sign on refit
- Leadership asks for causal driver credit on collinear channels
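The advice to tune λ on holdout weeks can be sketched as follows. This uses a single time-ordered holdout, the simplest variant; a rolling-origin scheme would repeat the same step over several cut points. The synthetic data, the candidate grid, and the helper names `ridge_fit` and `pick_lambda` are assumptions made for illustration, and λ is a raw penalty value rather than a percentage.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge: solve (X^T X + lam * I) beta = X^T y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def pick_lambda(X, y, lambdas, n_holdout):
    """Fit on the earlier weeks, score each lambda on the last n_holdout weeks."""
    X_train, y_train = X[:-n_holdout], y[:-n_holdout]
    X_hold, y_hold = X[-n_holdout:], y[-n_holdout:]
    errors = {}
    for lam in lambdas:
        beta = ridge_fit(X_train, y_train, lam)
        errors[lam] = float(np.mean((y_hold - X_hold @ beta) ** 2))  # holdout MSE
    best = min(errors, key=errors.get)
    return best, errors

rng = np.random.default_rng(2)
n_weeks = 104  # two years of hypothetical weekly data, rows in time order

ad = rng.normal(size=n_weeks)
search = 0.84 * ad + np.sqrt(1 - 0.84**2) * rng.normal(size=n_weeks)
X = np.column_stack([ad, search])
y = 0.3 * ad + 0.6 * search + rng.normal(scale=1.0, size=n_weeks)

lambdas = [0.01, 0.1, 1.0, 10.0, 100.0]
best_lam, holdout_mse = pick_lambda(X, y, lambdas, n_holdout=26)
print("holdout MSE by lambda:", holdout_mse)
print("selected lambda:", best_lam)
```

Keeping the holdout at the end of the series avoids scoring the model on weeks that precede its training data, which a shuffled split would allow.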
The habit: try ridge when you must keep correlated features and care about prediction stability. Still plot correlation and avoid causal language on individual coefficients.