Math, Applied

Decision Trees: Readable Rules for Classification

Figure: decision tree axis splits on a scatter plot.

The idea

A decision tree asks a sequence of if-then questions, each on a single feature. Each internal node is a split; each leaf is a class label. Because every prediction follows an explicit path of rules, ops and compliance teams can read the policy. The tradeoff is depth: shallow trees tend to generalize, while deep trees can memorize training noise.

Decision trees answer: Can we classify with explicit rules instead of a weighted score?

Example: axis splits carve readable regions

Increase tree depth to add vertical and horizontal rules. Each split is an if-then rule ops can read and audit, for example: amount high AND velocity high.

Explorer: Amount on the horizontal axis, Velocity on the vertical axis, splits drawn in purple. At tree depth 2 the tree uses 2 axis-aligned splits and reaches 72% training accuracy. That is still readable for Approve vs Review; deeper trees memorize noise.

The math

A tree recursively partitions feature space into boxes. Each split picks one feature and one threshold to make child nodes purer. Leaves output the majority class in that box.

Axis-aligned split

xⱼ ≤ t → left child · xⱼ > t → right child

Only one feature j is tested at each node. Splits are vertical or horizontal lines in 2D. Rules read as if-then chains: if amount > 500 and velocity > 3, then review.
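As a minimal sketch of the rule above, the snippet below routes rows through one axis-aligned split and writes the example rule chain as nested single-feature tests. The names `split_rows` and `policy` and the three sample transactions are hypothetical, invented for illustration; the thresholds 500 and 3 come from the example rule in the text.

```python
def split_rows(rows, feature, threshold):
    """Send rows with row[feature] <= threshold left, the rest right."""
    left = [r for r in rows if r[feature] <= threshold]
    right = [r for r in rows if r[feature] > threshold]
    return left, right


def policy(amount, velocity):
    """The rule chain from the text as nested single-feature tests."""
    if amount > 500:        # first node tests amount only
        if velocity > 3:    # second node tests velocity only
            return "review"
    return "approve"


# Hypothetical transactions, invented for illustration.
rows = [
    {"amount": 120, "velocity": 1},
    {"amount": 650, "velocity": 2},
    {"amount": 900, "velocity": 5},
]

left, right = split_rows(rows, "amount", 500)
decisions = [policy(r["amount"], r["velocity"]) for r in rows]
print(len(left), len(right), decisions)
```

Each `if` touches exactly one feature, which is why the path to any leaf can be read aloud as a chain of conditions.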

Gini impurity

Gini = 1 − Σₖ pₖ²

pₖ is the fraction of class k in a node. A pure node (one class only) has Gini = 0; an even two-class mix has Gini = 0.5. Split candidates are scored by how far the size-weighted average Gini of the two children drops below the parent's Gini.
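The scoring step can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not any particular library's code: `gini` and `gini_gain` are hypothetical helper names, and the label lists are made-up data.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class fractions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_gain(parent, left, right):
    """Drop in impurity: parent Gini minus size-weighted child Gini."""
    n = len(parent)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(parent) - weighted


# Made-up node: 6 approve, 2 review.
parent = ["approve"] * 6 + ["review"] * 2
left = ["approve"] * 5                     # pure child
right = ["approve"] + ["review"] * 2       # mixed child

print(gini(parent))                    # 1 - (0.75**2 + 0.25**2) = 0.375
print(gini_gain(parent, left, right))  # roughly 0.208
```

A tree builder would evaluate `gini_gain` for every candidate feature and threshold and keep the largest value.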

Entropy

H = −Σₖ pₖ log pₖ

An alternative impurity measure, using the convention 0 · log 0 = 0. It is also zero when a node is pure. Libraries typically offer both Gini and entropy as split criteria; results are usually similar on business tabular data.
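For comparison, here is a small sketch of the entropy formula. The text does not fix a log base; base 2 (bits) is an assumption made here so that an even two-class mix scores exactly 1. The function name `entropy` and the sample labels are illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy in bits (log base 2 is an assumption, see lead-in)."""
    n = len(labels)
    # Counter only holds classes that are present, so p is never 0 here.
    total = sum((c / n) * math.log2(c / n) for c in Counter(labels).values())
    return 0.0 - total


pure = ["approve"] * 4
even = ["approve", "approve", "review", "review"]
skewed = ["approve"] * 3 + ["review"]

print(entropy(pure), entropy(even), entropy(skewed))
```

Like Gini, the value is 0 for a pure node and largest for an even mix, which is one reason the two criteria tend to rank candidate splits similarly.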

Leaf prediction

leaf label = argmaxₖ countₖ

Majority vote among training rows that reach the leaf. Regression trees average numeric targets instead.
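Both leaf rules fit in a few lines. The helper names `leaf_label` and `leaf_value` and the sample data are hypothetical; how ties are broken is not specified in the text, so the behavior noted in the comment is just what this sketch happens to do.

```python
from collections import Counter
from statistics import mean


def leaf_label(labels):
    """Classification leaf: the most common class among rows in the leaf.

    On a tie, Counter.most_common returns the class seen first.
    """
    return Counter(labels).most_common(1)[0][0]


def leaf_value(targets):
    """Regression leaf: the average of the numeric targets in the leaf."""
    return mean(targets)


print(leaf_label(["review", "approve", "review"]))  # review
print(leaf_value([10.0, 20.0, 60.0]))               # 30.0
```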

Depth and capacity

depth d allows up to 2ᵈ leaves (full tree)

Deeper trees fit more jagged regions. Shallow trees are smoother policies. The explorer shows how each depth adds splits and lifts training accuracy.
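To make the growth rate concrete, the sketch below tabulates the upper bounds for a full binary tree. `max_leaves` and `max_splits` are illustrative names; real trees usually have fewer leaves than the bound because branches stop early.

```python
def max_leaves(depth):
    """Upper bound on leaves for a binary tree of the given depth."""
    return 2 ** depth


def max_splits(depth):
    """Internal nodes in a full binary tree: one fewer than its leaves."""
    return 2 ** depth - 1


for d in range(1, 6):
    print(f"depth {d}: up to {max_leaves(d)} leaves, {max_splits(d)} splits")
```

Capacity doubles with each extra level, so a depth cap is a blunt but effective control on how many distinct regions the tree can carve.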

Stopping rules

stop when min samples · max depth · min gain

Stop growing a branch when a node has too few rows, the best available gain is tiny, or depth hits a cap. Stopping early is a main defense against memorizing noise.
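The pieces above combine into a small recursive builder. What follows is a teaching sketch under stated assumptions, not a production algorithm: the function names, the parameter names `max_depth`, `min_samples`, and `min_gain`, their default values, and the seven toy (amount, velocity) rows are all invented for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class fractions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Return (gain, feature, threshold) with the largest Gini drop, or None."""
    n, parent, best = len(y), gini(y), None
    for j in range(len(X[0])):
        # Skip the largest value: x <= max would leave the right child empty.
        for t in sorted({row[j] for row in X})[:-1]:
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
            gain = parent - weighted
            if best is None or gain > best[0]:
                best = (gain, j, t)
    return best


def grow(X, y, depth=0, max_depth=2, min_samples=2, min_gain=1e-3):
    """Grow a tree; a leaf is a label, an internal node is a dict."""
    majority = Counter(y).most_common(1)[0][0]
    # Stopping rules: depth cap, too few rows, or already pure.
    if depth >= max_depth or len(y) < min_samples or gini(y) == 0.0:
        return majority
    split = best_split(X, y)
    # Stopping rule: no usable split, or the gain is too small to bother.
    if split is None or split[0] < min_gain:
        return majority
    _, j, t = split
    left = [i for i, row in enumerate(X) if row[j] <= t]
    right = [i for i, row in enumerate(X) if row[j] > t]
    limits = dict(max_depth=max_depth, min_samples=min_samples, min_gain=min_gain)
    return {
        "feature": j,
        "threshold": t,
        "left": grow([X[i] for i in left], [y[i] for i in left], depth + 1, **limits),
        "right": grow([X[i] for i in right], [y[i] for i in right], depth + 1, **limits),
    }


def predict(tree, row):
    """Follow the if-then path until a leaf label is reached."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree


# Toy rows: (amount, velocity). Invented for illustration.
X = [(100, 1), (200, 2), (300, 5), (600, 1), (800, 2), (700, 4), (900, 5)]
y = ["approve"] * 5 + ["review"] * 2

tree = grow(X, y, max_depth=2)
print(predict(tree, (750, 5)))  # review
print(predict(tree, (750, 1)))  # approve
print(predict(tree, (100, 5)))  # approve
```

Setting `max_depth=0` collapses the tree to a single majority-class leaf, and raising `min_gain` or `min_samples` prunes branches that only a handful of rows support. Real libraries add refinements this sketch skips, such as midpoint thresholds and faster split search.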

Where teams get stuck

Training accuracy 99%, holdout 70%. The tree memorized quirks of the training data. Cap depth, require a minimum leaf size, or use random forests for variance reduction.

Unstable rules week to week. Small data changes flip early splits. Ensemble many trees or prefer logistic scores when stability matters more than readability.

A simple application

Fraud and hiring teams often prototype with trees before shipping logistic scores. Compliance can read the if-then path. Pair with the overfitting post when depth grows: training accuracy rises while holdout stalls. Pair with logistic regression when you need a smooth score and calibrated probability.