
Least Squares: Why Squared Error Picks One Best Line

Figure: squared residuals, minimized by the best-fit line.

The idea

Many lines can pass near your points. Least squares picks the one that minimizes the sum of squared residuals. Squaring penalizes big misses more than small ones, and it keeps the objective smooth, so setting its derivatives to zero gives a closed-form solution.

Least squares answers: Which line makes the total squared prediction error smallest?

Example: total squared error as you move the line

Each point contributes (actual − predicted)². Least squares minimizes the sum.

Intercept: 2.29; slope: 0.90; your SSE: 0.08; minimum SSE: 0.08

Total squared error = 0.08. This is the minimum for these points. Least squares picks the line where squared residuals sum to the smallest value.
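To make the "move the line" idea concrete, here is a minimal sketch that computes SSE for a few candidate slopes. The data points are hypothetical (they are not the points behind the numbers above), and the helper name `sse` is an assumption made for illustration.

```python
def sse(xs, ys, intercept, slope):
    """Sum of squared residuals for the line y = intercept + slope * x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical points, chosen only for illustration.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 2.0, 4.0]

# Hold the intercept fixed and try several slopes.
# For these points the least squares line is y = 1.3 + 0.8 * x,
# so the SSE should bottom out at slope 0.8.
for slope in (0.4, 0.6, 0.8, 1.0, 1.2):
    print(f"slope={slope:.1f}  SSE={sse(xs, ys, 1.3, slope):.2f}")
```

Tilting the line away from slope 0.8 in either direction raises the total, which is the behavior the example above shows.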

The math

Sum of squared errors

SSE = Σᵢ (yᵢ − ŷᵢ)²

Each point contributes its vertical distance to the line, squared.

Objective

minimize SSE over intercept and slope

The best-fit line from the linear models explorer is the SSE minimizer for those points.
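For a single predictor, the minimizer has a standard closed form: slope = Σ (xᵢ − x̄)(yᵢ − ȳ) / Σ (xᵢ − x̄)² and intercept = ȳ − slope · x̄. The sketch below implements that formula in plain Python. The function name `fit_line` and the data are hypothetical, and this is not necessarily how the explorer computes its line.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the SSE-minimizing line for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and how x and y vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical points, chosen only for illustration.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 2.0, 4.0]

intercept, slope = fit_line(xs, ys)
print(f"intercept={intercept:.2f}, slope={slope:.2f}")  # intercept=1.30, slope=0.80
```

No search over candidate lines is needed: because the objective is a smooth quadratic in the intercept and slope, the two formulas land on the minimum directly.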

Sensitivity

large outliers pull the line (squared penalty)

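Because the penalty is squared, a miss three times larger costs nine times as much, so one bad week can move the line more than several modest misses combined. When outliers dominate, robust methods swap in a loss that grows more slowly, such as absolute error.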
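The pull of a single outlier can be sketched with made-up numbers. In the example below, four points sit exactly on y = x and the fifth is either on the line or 10 units above it. All names (`fit_line`, `sse`, `sae`) and all data are hypothetical.

```python
def fit_line(xs, ys):
    """Closed-form least squares fit; returns (intercept, slope)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def sse(xs, ys, intercept, slope):
    """Sum of squared residuals."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

def sae(xs, ys, intercept, slope):
    """Sum of absolute residuals."""
    return sum(abs(y - (intercept + slope * x)) for x, y in zip(xs, ys))

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys_clean = [0.0, 1.0, 2.0, 3.0, 4.0]     # every point on y = x
ys_outlier = [0.0, 1.0, 2.0, 3.0, 14.0]  # last point misses by 10

clean_fit = fit_line(xs, ys_clean)      # (0.0, 1.0)
pulled_fit = fit_line(xs, ys_outlier)   # (-2.0, 3.0)
print("clean fit: ", clean_fit)
print("pulled fit:", pulled_fit)

# Score both lines on the data that contains the outlier.
print("SSE  y = x:", sse(xs, ys_outlier, *clean_fit),
      " pulled line:", sse(xs, ys_outlier, *pulled_fit))   # 100.0 vs 40.0
print("SAE  y = x:", sae(xs, ys_outlier, *clean_fit),
      " pulled line:", sae(xs, ys_outlier, *pulled_fit))   # 10.0 vs 12.0
```

One point triples the fitted slope. Under squared error the pulled line scores better (40 versus 100), while under absolute error the original line y = x scores better (10 versus 12), which is why absolute-error methods are less swayed by a single extreme value.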

A simple application

When you report a trend line, you are implicitly reporting the SSE-minimizing line unless you chose another loss on purpose. Pair this with the overfitting and holdout posts before you trust SSE on training data alone.
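As a rough sketch of that check, the snippet below fits the line on one set of points and then scores it on points it never saw. The split, the data, and the helper names are all hypothetical. Because the two sets differ in size, it compares the mean squared error per point rather than the raw sums.

```python
def fit_line(xs, ys):
    """Closed-form least squares fit; returns (intercept, slope)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def sse(xs, ys, intercept, slope):
    """Sum of squared residuals."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical split: fit on the first four points, hold out the last two.
train_x, train_y = [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 2.0, 4.0]
hold_x, hold_y = [4.0, 5.0], [4.0, 6.0]

intercept, slope = fit_line(train_x, train_y)
train_sse = sse(train_x, train_y, intercept, slope)
hold_sse = sse(hold_x, hold_y, intercept, slope)

print(f"training MSE: {train_sse / len(train_x):.3f}")
print(f"holdout MSE:  {hold_sse / len(hold_x):.3f}")
```

A holdout error far above the training error is a sign that the line fits the training points better than it describes the underlying trend.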