Math, Applied

The Model Looked Perfect on Past Data: Overfitting in Real Decisions

The idea

A forecast can look excellent on the weeks you used to build it and fall apart on weeks you held back. When that happens, the model has memorized noise, seasonality quirks, and one-off promos instead of learning a pattern that repeats.

Remember it in one line: if it only works on history you already saw, it is not ready for next month's budget.

Overfitting is not a data science buzzword. It is the gap between training performance and holdout performance. Finance sees it when inventory plans miss. Marketing sees it when ROI models break after a channel mix change. Product sees it when churn scores misfire on new accounts.

Checking for overfitting answers one question: did we fit the past, or something that will hold in the future?

Example: train error falls while holdout error rises

[Interactive chart: a model complexity slider running from simple (few parameters) to flexible (many parameters), with train and holdout error plotted over the training weeks.]

A flexible curve hugs training history. Holdout weeks tell you if the forecast will survive new data. The scenario: Finance trusts a wiggly forecast that memorized last quarter.

At model complexity 7 / 10, train error is 11.2% and holdout error is 30.8%. The model is memorizing history, not forecasting new weeks.

The math

The warning sign

generalization gap = holdout error − train error

Train error keeps falling as you add variables and flexibility. Holdout error often bottoms out, then rises. The widening gap is overfitting.
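As a small worked example, the gap can be computed for the train and holdout readings quoted in this post. The function name `generalization_gap` and the `readings` dictionary are illustrative names, not part of any tool the post describes.

```python
def generalization_gap(train_error_pct, holdout_error_pct):
    """Gap in percentage points: holdout error minus train error."""
    return holdout_error_pct - train_error_pct


# (train %, holdout %) pairs quoted elsewhere in this post.
readings = {
    "complexity 7 / 10": (11.2, 30.8),
    "12 terms": (11.0, 14.0),
    "forecast review, first pass": (4.0, 19.0),
}

# Round to one decimal so floating-point noise does not clutter the report.
gaps = {
    name: round(generalization_gap(train, holdout), 1)
    for name, (train, holdout) in readings.items()
}

for name, gap in gaps.items():
    print(f"{name}: gap {gap} pp")
```

A gap is only a warning sign in context: the same number of percentage points matters more when the holdout error itself is too large to plan against.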

Why it happens

more parameters + short history → easier to memorize

Twelve weeks of data cannot support twenty interaction terms. Each extra dial lets the curve bend to fit random wiggles that will not repeat.
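The sketch below shows the extreme case of that memorizing, under stated assumptions. The weekly order numbers are made up for illustration, and a polynomial stands in for the interaction terms: with one dial per data point it passes through every training week exactly. The error measure is assumed to be mean absolute percentage error, and the helper names `mape` and `interpolating_forecast` are hypothetical.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


def interpolating_forecast(weeks, values, week):
    """Lagrange polynomial through every training point: one dial per data point."""
    total = 0.0
    for i, (wi, yi) in enumerate(zip(weeks, values)):
        weight = 1.0
        for j, wj in enumerate(weeks):
            if j != i:
                weight *= (week - wj) / (wi - wj)
        total += yi * weight
    return total


# Made-up data: a flat level near 100 orders plus wiggles that will not repeat.
train_weeks = list(range(12))
train_orders = [102, 97, 101, 104, 98, 100, 103, 96, 101, 99, 105, 98]
holdout_weeks = [12, 13]
holdout_orders = [101, 99]

# Simple model: a single parameter, the average of the training weeks.
flat = sum(train_orders) / len(train_orders)
simple_train = mape(train_orders, [flat] * len(train_orders))
simple_holdout = mape(holdout_orders, [flat] * len(holdout_orders))

# Flexible model: twelve parameters for twelve weeks.
flexible_train = mape(
    train_orders,
    [interpolating_forecast(train_weeks, train_orders, w) for w in train_weeks],
)
flexible_holdout = mape(
    holdout_orders,
    [interpolating_forecast(train_weeks, train_orders, w) for w in holdout_weeks],
)

print(f"simple:   train {simple_train:.1f}%, holdout {simple_holdout:.1f}%")
print(f"flexible: train {flexible_train:.1f}%, holdout {flexible_holdout:.1f}%")
```

The flexible curve wins on the training weeks by construction and loses badly on the two weeks it never saw, which is the pattern the post describes.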

What to do

trust holdout weeks you did not use to fit the model

Split time: fit on early weeks, score on later weeks. Prefer simpler models when the holdout gap opens. Regression posts cover the mechanics; this post covers the decision to ship or wait.
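A minimal sketch of the time split, assuming the series is already in week order and that the last few weeks serve as holdout. The function name `time_split`, the four-week holdout, and the order numbers are illustrative assumptions; the averaging model is a placeholder for whatever forecast is under review.

```python
def time_split(series, n_holdout):
    """Split a time-ordered series: early weeks to fit, the last n_holdout weeks to score."""
    if not 0 < n_holdout < len(series):
        raise ValueError("n_holdout must leave at least one week on each side")
    return series[:-n_holdout], series[-n_holdout:]


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


# Made-up weekly orders, oldest week first. No shuffling: order is the point.
weekly_orders = [102, 97, 101, 104, 98, 100, 103, 96, 101, 99, 105, 98]

fit_weeks, holdout = time_split(weekly_orders, 4)

# Placeholder model: forecast every week as the average of the fit weeks only.
level = sum(fit_weeks) / len(fit_weeks)
train_error = mape(fit_weeks, [level] * len(fit_weeks))
holdout_error = mape(holdout, [level] * len(holdout))

print(f"train {train_error:.1f}%, holdout {holdout_error:.1f}%, "
      f"gap {holdout_error - train_error:.1f} pp")
```

Splitting by position rather than at random keeps later weeks from leaking into the fit, so the holdout score resembles the situation the forecast will face next month.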

A simple application: the forecast review

Ops presents a weekly order forecast with 4% train error. Finance asks for the holdout weeks, and error jumps to 19%. The team drops three channel interaction terms: holdout error falls to 9% while train error rises slightly. That is a model you can plan inventory against.
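The review above can be sketched as a comparison that looks only at holdout error. The `Candidate` class and `pick_for_planning` function are hypothetical names for illustration. The post gives no figure for train error after the terms are dropped, so that field is left empty rather than guessed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    name: str
    holdout_error_pct: float
    train_error_pct: Optional[float] = None  # None when no figure is given


def pick_for_planning(candidates):
    """Choose the candidate with the lowest holdout error; train error is not consulted."""
    return min(candidates, key=lambda c: c.holdout_error_pct)


review = [
    Candidate("with channel interaction terms", holdout_error_pct=19.0, train_error_pct=4.0),
    # Train error "rises slightly" in the post, with no number, so it stays None.
    Candidate("interaction terms dropped", holdout_error_pct=9.0),
]

chosen = pick_for_planning(review)
print(f"plan inventory against: {chosen.name} ({chosen.holdout_error_pct}% holdout error)")
```

Ranking on holdout error alone is a deliberate simplification here: it makes the first model's 4% train error irrelevant to the decision, which is the habit the review is meant to build.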

Forecast review: train vs holdout

[Interactive chart: error (%) against model complexity in terms, with train and holdout shown together.]

Add model complexity and train error falls while holdout error rises: the model is memorizing history.

At 12 terms, train error is 11% and holdout error is 14%, a gap of 3 pp. Across the complexity range, holdout error runs from 8% at low complexity to 22% at high complexity.

Optimize (move here)

• Always show train and holdout together
• Cap complexity when history is short

Hold (do not over-react)

• Hold off on shipping a forecast judged on train error alone

Escalate if

• Holdout error worsens after adding drivers

With train error at 11% and holdout error at 14%, the generalization gap is acceptable for this complexity.

The habit: always show train and holdout together. Cap complexity when history is short. Pair with sample size and confidence intervals before you lock spend.