
Sensitivity, Specificity, and the 2×2 Screening Table

Figure: 2×2 screening table showing TP, FP, FN, and TN.

The idea

Every classifier, screen, or alert splits the world into four boxes: true positive, false positive, false negative, and true negative. Sensitivity and specificity describe how the test behaves on known positives and negatives. They are not what reviewers feel day to day.

Positive predictive value is the conditional probability that a case is truly positive given a positive flag. That number depends on prevalence. A strong test on a rare outcome still floods the queue with false alarms.
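The prevalence dependence follows from Bayes' rule. Here is a minimal sketch, assuming a test with 90% sensitivity and 92% specificity; the function name `ppv_from_rates` and the three prevalence values are illustrative choices, not part of any library.

```python
def ppv_from_rates(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value via Bayes' rule: P(actually + | test +)."""
    # Share of the population that is truly positive AND flagged.
    true_flag_share = sensitivity * prevalence
    # Share of the population that is truly negative AND flagged anyway.
    false_flag_share = (1.0 - specificity) * (1.0 - prevalence)
    flagged_share = true_flag_share + false_flag_share
    if flagged_share == 0:
        raise ValueError("nothing is flagged, so PPV is undefined")
    return true_flag_share / flagged_share


# Same test, three hypothetical populations (prevalence values are assumptions).
for prevalence in (0.001, 0.02, 0.20):
    ppv = ppv_from_rates(sensitivity=0.90, specificity=0.92, prevalence=prevalence)
    print(f"prevalence {prevalence:.1%}: PPV {ppv:.1%}")
```

With these inputs the same test yields a PPV of roughly 1% at 0.1% prevalence, about 19% at 2% prevalence, and about 74% at 20% prevalence. The test did not change; the population did.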

The 2×2 table answers: How does this test split reality, and what does a positive flag mean in our population?

Example: the 2×2 screening table

Sensitivity and specificity describe the test. Positive predictive value describes what a positive flag means in your population.

90% sensitivity sounds strong until you read the false-positive column.

Population: 10,000 screened · Prevalence: 2.0% · Sensitivity: 90% · Specificity: 92%

              Test +      Test -
Actually +    180 TP      20 FN
Actually -    784 FP      9,016 TN

PPV (precision): 19% · NPV: 99.8%

A sensitivity of 90% catches most cases, but only 19% of flagged rows (180 of 964) are true positives. Positive predictive value is what reviewers feel in the queue.
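The four counts in the table can be rebuilt from the three rates. This is a sketch under one assumption: a screened population of 10,000, which is implied by the counts (they sum to 10,000) rather than stated. The helper name `screening_table` is hypothetical.

```python
def screening_table(population: int, prevalence: float,
                    sensitivity: float, specificity: float) -> dict[str, int]:
    """Expected 2x2 counts for a screened population (rounded to whole cases)."""
    actual_pos = round(population * prevalence)
    actual_neg = population - actual_pos
    tp = round(actual_pos * sensitivity)   # positives the test catches
    fn = actual_pos - tp                   # positives the test misses
    tn = round(actual_neg * specificity)   # negatives the test clears
    fp = actual_neg - tn                   # negatives the test flags anyway
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


# Population of 10,000 is an assumption inferred from the table's totals.
table = screening_table(10_000, prevalence=0.02, sensitivity=0.90, specificity=0.92)
print(table)  # {'TP': 180, 'FP': 784, 'FN': 20, 'TN': 9016}
```

The false-positive cell is large because it is 8% of 9,800 negatives, while the true-positive cell is 90% of only 200 positives.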

The math

Sensitivity (recall on positives)

sensitivity = TP ÷ (TP + FN)

Of all actual positives, how many did the test catch? This is the same quantity as recall, computed across the "Actually +" row.

Specificity

specificity = TN ÷ (TN + FP)

Of all true negatives, how many cleared the test? High specificity means fewer false flags.

Positive predictive value

PPV = TP ÷ (TP + FP)

Of everything flagged positive, how many are real? This is precision, computed over the flagged pile. It is the quantity that discussions of base rates and thresholds build on.
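The three formulas can be checked against the example counts. A minimal sketch follows; the `Confusion` class and its property names are illustrative, and the `npv` property uses the standard definition TN ÷ (TN + FN), which the text above reports but does not spell out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """The four cells of a 2x2 screening table."""
    tp: int
    fp: int
    fn: int
    tn: int

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)   # caught / all actual positives

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)   # cleared / all actual negatives

    @property
    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)   # real / everything flagged

    @property
    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)   # real negatives / everything cleared


# Counts taken from the example table.
example = Confusion(tp=180, fp=784, fn=20, tn=9016)
print(f"sensitivity {example.sensitivity:.1%}, specificity {example.specificity:.1%}")
print(f"PPV {example.ppv:.1%}, NPV {example.npv:.1%}")
```

On the example counts, PPV comes out near 18.7% and NPV near 99.8%, which rounds to 100% only at whole-percent precision: 20 actual positives still sit in the cleared pile.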

A simple application

Model cards often lead with sensitivity. Ops leads should ask for the full table computed at their own prevalence. Staffing, auto-block policy, and reviewer training depend on PPV and queue volume, not sensitivity alone.

The habit: draw the four counts before you debate thresholds. If false positives dominate the flagged column, improve specificity (for example, by raising the cutoff) before you add headcount.
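To see what higher specificity buys in queue terms, here is a rough sketch. It assumes 10,000 screened cases at 2% prevalence and, as a simplification, holds sensitivity fixed at 90%; in practice raising a cutoff usually costs some sensitivity. The function name `flagged_queue` and the specificity values compared are hypothetical.

```python
def flagged_queue(population: int, prevalence: float,
                  sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (expected number of flagged cases, PPV of the flagged pile)."""
    actual_pos = population * prevalence
    actual_neg = population - actual_pos
    tp = actual_pos * sensitivity
    fp = actual_neg * (1.0 - specificity)
    flagged = tp + fp
    return flagged, tp / flagged


# Assumption: sensitivity stays at 90% while specificity improves.
for specificity in (0.92, 0.97, 0.99):
    flagged, ppv = flagged_queue(10_000, 0.02, 0.90, specificity)
    print(f"specificity {specificity:.0%}: {flagged:.0f} flagged, PPV {ppv:.1%}")
```

Under these assumptions, moving specificity from 92% to 99% shrinks the queue from about 964 flags to about 278 and lifts PPV from roughly 19% to roughly 65%. That is a smaller review load without a single new reviewer.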