Math, Applied
Decision Trees: Readable Rules for Classification
The idea
A decision tree asks a sequence of if-then questions, each about a single feature. Each internal node is a split; each leaf is a class label. Because every prediction follows an explicit path of rules, ops and compliance teams can read the policy. The tradeoff is depth: shallow trees tend to generalize; deep trees can memorize training noise.
Decision trees answer: Can we classify with explicit rules instead of a weighted score?
Example: axis splits carve readable regions
Figure (interactive explorer): amount on the horizontal axis, velocity on the vertical axis; purple lines mark splits. At depth 2 the tree makes 2 axis-aligned splits and reaches 72% training accuracy.
Increasing tree depth adds vertical and horizontal rules. Each split is an if-then rule ops can read and audit, such as: amount high AND velocity high. Depth 2 is readable for Approve vs Review; deeper trees memorize noise.
The math
A tree recursively partitions feature space into boxes. Each split picks one feature and one threshold to make child nodes purer. Leaves output the majority class in that box.
Axis-aligned split
Each node tests only one feature j against a single threshold. In 2D, splits are therefore vertical or horizontal lines. Rules read as if-then chains: if amount > 500 and velocity > 3, then review.
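As a minimal sketch, the rule chain above can be written as nested single-feature tests. The thresholds 500 and 3 come from the example rule; the function name `route_transaction` and the "approve" fallback label are assumptions for illustration.

```python
def route_transaction(amount: float, velocity: float) -> str:
    """Two axis-aligned splits chained into one if-then rule."""
    # Split 1 looks only at amount (a vertical line in the amount/velocity plane).
    if amount > 500:
        # Split 2 looks only at velocity (a horizontal line).
        if velocity > 3:
            return "review"
    # Every other region falls through to the assumed default label.
    return "approve"


print(route_transaction(800, 5))
print(route_transaction(800, 2))
```

Each `if` compares one feature with one threshold, which is what keeps the path readable as a policy line.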
Gini impurity
Gini = 1 − Σₖ pₖ², where pₖ is the fraction of class k in a node. A pure node (one class only) has Gini = 0. Split candidates are scored by how much the size-weighted Gini of the children drops below the parent's Gini.
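A short sketch of that scoring, assuming the standard definition of Gini impurity as 1 minus the sum of squared class fractions. The helper names `gini` and `gini_drop` and the toy labels are illustrative, not taken from any particular library.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum_k p_k**2, with p_k the fraction of class k."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_drop(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted


# Toy node: a 50/50 mix that one split separates perfectly.
parent = ["approve"] * 4 + ["review"] * 4
left, right = ["approve"] * 4, ["review"] * 4
print(gini(parent))                    # 0.5
print(gini_drop(parent, left, right))  # 0.5
```

A split that leaves both children as mixed as the parent scores a drop of zero, so it would never be chosen over a split that separates the classes.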
Entropy
Entropy, −Σₖ pₖ log pₖ, is an alternative impurity measure. It is also zero when a node is pure. Algorithms often default to Gini or entropy; results are usually similar on business tabular data.
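For comparison, here is a sketch of entropy as an impurity score. It assumes Shannon entropy with a base-2 logarithm (so the unit is bits); the base is a convention and does not change which split ranks best. The function name is illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy in bits: sum_k p_k * log2(1 / p_k)."""
    n = len(labels)
    if n == 0:
        return 0.0
    probs = [count / n for count in Counter(labels).values()]
    # Only classes that occur are counted, so p is never zero here.
    return sum(p * math.log2(1.0 / p) for p in probs)


print(entropy(["approve"] * 5))                   # 0.0 for a pure node
print(entropy(["approve"] * 4 + ["review"] * 4))  # 1.0 for a 50/50 node
```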
Leaf prediction
Majority vote among training rows that reach the leaf. Regression trees average numeric targets instead.
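The two leaf rules fit in a few lines. This is a sketch with made-up inputs; the function names are illustrative, and breaking ties by first-seen label is an assumption, since the text does not say how ties are resolved.

```python
from collections import Counter
from statistics import mean


def classify_leaf(labels):
    """Majority class among the training rows that reached this leaf.

    Ties go to the label that appears first (an assumed tie-break).
    """
    return Counter(labels).most_common(1)[0][0]


def regress_leaf(targets):
    """Regression leaf: the average of the numeric targets in the leaf."""
    return mean(targets)


print(classify_leaf(["review", "approve", "review"]))
print(regress_leaf([10.0, 20.0, 30.0]))
```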
Depth and capacity
Deeper trees fit more jagged regions. Shallow trees are smoother policies. The explorer shows how each depth adds splits and lifts training accuracy.
Stopping rules
Stop growth when a node has too few rows, the impurity gain is tiny, or depth hits a cap. Stopping early is the main defense against memorizing noise.
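To make the three stopping rules concrete, here is a minimal pure-Python sketch of a greedy tree grower. It is not how any particular library implements trees. The parameter names `max_depth`, `min_rows`, and `min_gain`, the dictionary node layout, and the eight toy (amount, velocity) rows are all assumptions made up for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Best axis-aligned split as (gain, feature, threshold), or None."""
    n, parent, best = len(rows), gini(labels), None
    for j in range(len(rows[0])):
        for t in sorted({row[j] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[j] <= t]
            right = [y for row, y in zip(rows, labels) if row[j] > t]
            if not left or not right:
                continue  # a split must send rows to both children
            weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
            gain = parent - weighted
            if best is None or gain > best[0]:
                best = (gain, j, t)
    return best


def grow(rows, labels, depth=0, max_depth=2, min_rows=2, min_gain=0.01):
    """Recursively partition rows; return a nested dict of splits and leaves."""
    majority = Counter(labels).most_common(1)[0][0]
    # Stopping rules: depth cap, too few rows, or an already pure node.
    if depth >= max_depth or len(rows) < min_rows or gini(labels) == 0.0:
        return {"leaf": majority}
    split = best_split(rows, labels)
    # Stopping rule: the best available gain is too small to justify a split.
    if split is None or split[0] < min_gain:
        return {"leaf": majority}
    _, j, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[j] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[j] > t]
    limits = (max_depth, min_rows, min_gain)
    return {
        "feature": j,
        "threshold": t,
        "left": grow([r for r, _ in left], [y for _, y in left], depth + 1, *limits),
        "right": grow([r for r, _ in right], [y for _, y in right], depth + 1, *limits),
    }


def predict(node, row):
    """Follow the if-then path from the root down to a leaf label."""
    while "leaf" not in node:
        branch = "left" if row[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["leaf"]


def accuracy(tree, rows, labels):
    """Fraction of rows whose predicted label matches the given label."""
    return sum(predict(tree, r) == y for r, y in zip(rows, labels)) / len(rows)


# Made-up rows of (amount, velocity); "review" only when both are high.
rows = [(100, 1), (200, 5), (300, 2), (400, 6), (600, 5), (700, 1), (800, 6), (900, 2)]
labels = ["approve"] * 4 + ["review", "approve", "review", "approve"]

for cap in (0, 1, 2):
    tree = grow(rows, labels, max_depth=cap)
    print(cap, accuracy(tree, rows, labels))
```

On this toy data, training accuracy is 0.75 with the depth cap at 0 or 1 and reaches 1.0 at depth 2, where the tree splits first on amount and then on velocity. Raising `min_rows` or `min_gain` collapses the tree back to a single majority leaf, which is the same lever the stopping rules describe.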
Where teams get stuck
Training accuracy 99%, holdout 70%. The tree memorized quirks of the training data. Cap depth, require a minimum leaf size, or use a random forest to reduce variance.
Unstable rules week to week. Small changes in the data can flip early splits, and every rule below a flipped split changes with it. Ensemble many trees, or prefer logistic scores when stability matters more than readability.
A simple application
Fraud and hiring teams often prototype with trees before shipping logistic scores. Compliance can read the if-then path. Pair with the overfitting post when depth grows: training accuracy rises while holdout stalls. Pair with logistic regression when you need a smooth score and calibrated probability.