Math foundations
Logistic Regression: Linear Boundary, Probability Score
The idea
Logistic regression extends the line fit you already know into classification. A linear combination of features passes through a sigmoid to produce a probability between 0 and 1. Fraud models, churn scores, and hiring rankers often start here.
Logistic regression answers: What is the probability this row is positive, and where is the linear decision boundary?
Example: linear boundary and sigmoid probability
Adjust weights to rotate the boundary. The cutoff turns probability into a class label.
Logistic regression draws a linear boundary and outputs a probability, not just yes/no.
[Interactive figure with two panels: feature space (velocity vs. amount) and the sigmoid curve. The orange line marks the classification cutoff. Training accuracy at the settings shown: 44%.]
At the settings shown, a boundary at the 50% probability cutoff misclassifies many points. Logistic regression assumes a linear separator; nonlinear patterns need trees or additional engineered features. The sample score at the center of the plot is 92%.
The math
Logistic regression keeps the linear structure of regression but outputs a probability. The sigmoid squashes the score; cross-entropy (see the loss post) teaches the weights; gradient descent fits them.
Linear score (logit input)
w₀ is the intercept: the baseline log-odds when every feature is zero. Each wⱼ is how much feature j pushes the score up or down, holding the other columns fixed. The form is the same as y = a + bx, but z is not the final prediction; it still has to pass through the sigmoid.
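Written out, the score described above takes the usual form below. The symbols xⱼ (value of feature j) and n (number of features) are assumed notation; only w₀, wⱼ, and z appear in the text.

```latex
z = w_0 + w_1 x_1 + w_2 x_2 + \dots + w_n x_n
  = w_0 + \sum_{j=1}^{n} w_j x_j
```

With one feature this reduces to z = w₀ + w₁x₁, which is y = a + bx with a = w₀ and b = w₁.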
Sigmoid
Maps any real z to (0, 1). z = 0 gives P = 0.5. Large positive z approaches 1; large negative z approaches 0. This is the number fraud and churn dashboards show before policy applies a cutoff.
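A minimal sketch of the sigmoid, P = 1 / (1 + e^(-z)), using only the standard library. The function name and the branch on the sign of z are illustrative choices; the branch is there so that very negative scores do not overflow exp().

```python
import math


def sigmoid(z: float) -> float:
    """Map a real-valued score z to a probability between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z use the equivalent form e^z / (1 + e^z),
    # so exp() is only ever called on a non-positive number.
    ez = math.exp(z)
    return ez / (1.0 + ez)


# z = 0 sits exactly at P = 0.5; large |z| saturates toward 0 or 1.
for z in (-6.0, 0.0, 6.0):
    print(f"z = {z:+.1f} -> P = {sigmoid(z):.4f}")
```

The symmetry sigmoid(-z) = 1 - sigmoid(z) is a handy sanity check when wiring this into a scoring pipeline.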
Log-odds (logit)
The linear part models log-odds, not probability directly. Adding one unit to feature j, with the other features held fixed, multiplies the odds by e^wⱼ. Coefficients are therefore interpretable on the log-odds scale.
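The odds claim follows from inverting the sigmoid. A short derivation, assuming the notation P = σ(z) for the predicted probability and xⱼ for the value of feature j:

```latex
\begin{align*}
P &= \sigma(z) = \frac{1}{1 + e^{-z}} \\
\frac{P}{1 - P} &= e^{z} \\
\log\frac{P}{1 - P} &= z = w_0 + \sum_{j=1}^{n} w_j x_j \\
\frac{\mathrm{odds}(x_j + 1)}{\mathrm{odds}(x_j)} &= \frac{e^{z + w_j}}{e^{z}} = e^{w_j}
\end{align*}
```

As an illustration with a made-up coefficient: wⱼ = 0.7 gives e^0.7 ≈ 2.01, so each extra unit of that feature roughly doubles the odds. It does not double the probability.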
Decision boundary
In two features, z = 0 is a line. In n features, it is a hyperplane. Points on one side get a probability above 0.5; points on the other side get one below. Nonlinear boundaries need feature engineering, trees, or other models.
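For two features the boundary can be solved explicitly: setting z = 0 gives x₂ = -(w₀ + w₁x₁) / w₂. The sketch below uses velocity and amount as the two features, with hand-picked weights that are purely hypothetical, not fitted to any data.

```python
def score(w0: float, w1: float, w2: float, velocity: float, amount: float) -> float:
    """Linear score z for a two-feature model."""
    return w0 + w1 * velocity + w2 * amount


def boundary_amount(w0: float, w1: float, w2: float, velocity: float) -> float:
    """Amount at which z = 0 for a given velocity (requires w2 != 0)."""
    if w2 == 0:
        raise ValueError("w2 is zero: the boundary is vertical in this plot")
    return -(w0 + w1 * velocity) / w2


def predict_label(z: float) -> int:
    """z > 0 is the same test as P > 0.5, because sigmoid(0) = 0.5."""
    return 1 if z > 0 else 0


# Hypothetical weights, chosen only to make the arithmetic easy to follow.
W0, W1, W2 = -3.0, 1.0, 2.0

on_line = boundary_amount(W0, W1, W2, velocity=1.0)
print("boundary passes through (1.0, %.1f)" % on_line)
print("label at (2, 2):", predict_label(score(W0, W1, W2, 2.0, 2.0)))
print("label at (0, 0):", predict_label(score(W0, W1, W2, 0.0, 0.0)))
```

Changing W1 and W2 rotates the line; changing W0 shifts it without rotating.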
Training objective
Fit the weights by minimizing a classification loss (usually cross-entropy) on labeled rows. An optional L2 penalty with strength λ shrinks the coefficients, as in ridge regression, which helps when features are correlated.
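A minimal sketch of the fitting loop, assuming batch gradient descent on mean cross-entropy plus an L2 penalty of (λ/2)·Σwⱼ² that leaves the intercept unpenalized. The function names, learning rate, epoch count, and toy data are all illustrative assumptions; production code would normally use a library solver instead.

```python
import math


def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def cross_entropy(X, y, w0, w, eps=1e-12):
    """Mean cross-entropy of the model (w0, w) on rows X with 0/1 labels y."""
    total = 0.0
    for row, label in zip(X, y):
        p = sigmoid(w0 + sum(wj * xj for wj, xj in zip(w, row)))
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total -= label * math.log(p) + (1 - label) * math.log(1.0 - p)
    return total / len(X)


def fit_logistic(X, y, lam=0.0, lr=0.1, epochs=2000):
    """Batch gradient descent; lam is the L2 strength (intercept not penalized)."""
    n_rows, n_features = len(X), len(X[0])
    w0, w = 0.0, [0.0] * n_features
    for _ in range(epochs):
        grad0, grad = 0.0, [0.0] * n_features
        for row, label in zip(X, y):
            p = sigmoid(w0 + sum(wj * xj for wj, xj in zip(w, row)))
            err = p - label  # derivative of cross-entropy with respect to z
            grad0 += err
            for j, xj in enumerate(row):
                grad[j] += err * xj
        w0 -= lr * grad0 / n_rows
        w = [wj - lr * (gj / n_rows + lam * wj) for wj, gj in zip(w, grad)]
    return w0, w


# Made-up one-feature data with some class overlap.
X = [[0.5], [1.0], [1.5], [2.0], [2.5], [3.0]]
y = [0, 0, 1, 0, 1, 1]

w0_plain, w_plain = fit_logistic(X, y, lam=0.0)
w0_ridge, w_ridge = fit_logistic(X, y, lam=1.0)
print("no penalty:", round(w0_plain, 3), [round(v, 3) for v in w_plain])
print("lam = 1.0: ", round(w0_ridge, 3), [round(v, 3) for v in w_ridge])
```

The penalized slope comes out smaller in magnitude than the unpenalized one, which is the shrinkage the penalty is meant to produce.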
Multiclass extension
More than two classes: one score per class, softmax turns scores into probabilities that sum to 1. Same linear core, wider output layer.
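A small sketch of the multiclass case: one linear score per class, then softmax. The names class_scores, W, and b and the numbers in the example are assumptions for illustration. Subtracting the maximum score before exponentiating is a standard stability trick and does not change the result.

```python
import math


def class_scores(W, b, x):
    """One linear score per class: z_k = b_k + sum_j W[k][j] * x[j]."""
    return [bk + sum(wkj * xj for wkj, xj in zip(wk, x)) for wk, bk in zip(W, b)]


def softmax(scores):
    """Turn a list of real scores into probabilities that sum to 1."""
    m = max(scores)  # shift so the largest exponent is 0
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical 3-class model over 2 features.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b = [0.0, 0.5, 0.0]
x = [2.0, 1.0]

probs = softmax(class_scores(W, b, x))
print([round(p, 3) for p in probs], "sum =", round(sum(probs), 6))
```

With two classes and scores [z, 0], softmax returns the same probability as the sigmoid of z, which is why the binary model is a special case of this one.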
Where teams get stuck
Treating 0.87 as calibrated. A score is a ranking aid until calibration checks pass. See the model calibration post.
Reading coefficients as causal. Correlated inputs share credit. wⱼ is a lever only when features are independent enough to interpret.
Linear boundary on nonlinear data. If the explorer boundary misclassifies obvious clusters, add interaction terms, polynomial features, or switch family (trees, k-NN).
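To illustrate the last point, here is a sketch of how one interaction term can rescue a pattern that no straight line separates. The data is a made-up XOR-style layout (positive when both features share a sign), and the weights are hand-picked rather than fitted; add_interaction is a hypothetical helper.

```python
import math


def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def add_interaction(x1, x2):
    """Expand (x1, x2) to (x1, x2, x1 * x2)."""
    return [x1, x2, x1 * x2]


# (x1, x2, label): positive when the two features have the same sign.
rows = [(-1.0, -1.0, 1), (-1.0, 1.0, 0), (1.0, -1.0, 0), (1.0, 1.0, 1)]

# Hand-picked weights: all of the signal sits on the interaction column.
w0, w = 0.0, [0.0, 0.0, 4.0]

predictions = []
for x1, x2, label in rows:
    feats = add_interaction(x1, x2)
    p = sigmoid(w0 + sum(wj * fj for wj, fj in zip(w, feats)))
    predictions.append(1 if p >= 0.5 else 0)
    print(f"x = ({x1:+.0f}, {x2:+.0f})  P = {p:.3f}  label = {label}")
```

The model is still linear in its inputs; the boundary only looks curved when drawn in the original two features.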
A simple application
Fraud ops ships a logistic score built from velocity and amount. Policy auto-blocks at a score of 0.85. This math post stops at the boundary; the classifier metrics post asks whether precision and recall at the 0.85 cutoff are good enough, and the threshold tradeoffs post adds queue capacity and dollar cost.
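As a sketch of the hand-off from score to policy, the snippet below applies a cutoff to a handful of scores and counts the outcomes. The scores and labels are invented for illustration, and the function names are assumptions; only the 0.85 cutoff comes from the example above.

```python
def confusion_at_cutoff(scores, labels, cutoff):
    """Count (tp, fp, fn, tn) when rows with score >= cutoff are blocked."""
    tp = fp = fn = tn = 0
    for s, y in zip(scores, labels):
        blocked = s >= cutoff
        if blocked and y == 1:
            tp += 1
        elif blocked and y == 0:
            fp += 1
        elif not blocked and y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall(tp, fp, fn):
    """Return (precision, recall); None where the denominator is zero."""
    precision = tp / (tp + fp) if (tp + fp) else None
    recall = tp / (tp + fn) if (tp + fn) else None
    return precision, recall


# Invented scores and fraud labels (1 = fraud), not real data.
scores = [0.95, 0.90, 0.88, 0.70, 0.60, 0.40, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

for cutoff in (0.85, 0.50):
    tp, fp, fn, tn = confusion_at_cutoff(scores, labels, cutoff)
    precision, recall = precision_recall(tp, fp, fn)
    print(f"cutoff {cutoff:.2f}: tp={tp} fp={fp} fn={fn} tn={tn} "
          f"precision={precision:.2f} recall={recall:.2f}")
```

Lowering the cutoff on this toy data catches more fraud and blocks more legitimate rows, which is the tradeoff the follow-up posts weigh.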