Math, Applied
Stacking Rare Risks: Alert Fatigue From Many Small False Positives
The idea
Fraud teams add rules. SRE teams monitor many services. Trust and safety stack classifiers on the same content. Each layer might false-alarm on only half a percent of clean cases. Stacked together, clean traffic starts looking suspicious.
This is the same complement math as at-least-one failure, applied to parallel checks on one unit instead of repeated trials over time. The system-level false alert rate is what reviewers and customers experience.
Stacking rare risks answers: If each check is rarely wrong alone, how often does any check fire on a clean case?
Example: many rare checks on one case
Each check has a small false-positive rate. Stack enough checks and clean cases start triggering alerts.
Twenty rules each fire on 0.5% of clean orders. The false alerts add up, and so does alert fatigue.
System false alert rate: 9.5%
Expected false alerts: 477 on 5,000 clean cases
At 20 checks, the system-level false alert rate is 9.5%, even though each check alone fires on only 0.5% of clean cases.
The math
k independent checks
System false alert rate = 1 - (1 - f)^k
f is the false-positive rate per check. k is how many checks run on the same case. Twenty rules at 0.5% each give 1 - 0.995^20 ≈ 9.5% system false alert rate on clean orders.
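The complement calculation, 1 - (1 - f)^k, can be sketched in a few lines of Python. The function name `system_false_alert_rate` is illustrative, not taken from any library, and the sketch assumes the checks are independent and share the same per-check rate.

```python
def system_false_alert_rate(f: float, k: int) -> float:
    """Chance that at least one of k independent checks fires on a clean case.

    f: false-positive rate of each check (assumed identical across checks)
    k: number of checks run on the same case
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be a probability between 0 and 1")
    if k < 0:
        raise ValueError("k must be non-negative")
    # P(no check fires) = (1 - f)^k, so P(any check fires) is its complement.
    return 1.0 - (1.0 - f) ** k


rate = system_false_alert_rate(0.005, 20)
print(f"{rate:.2%}")  # 9.54%
```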
Queue load
Expected false alerts = clean cases × system false alert rate. Convert the system rate to expected daily false alerts before you add another rule or model to the stack.
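As a rough sketch of the queue-load conversion, the snippet below multiplies a volume of clean cases by the system rate 1 - (1 - f)^k. The name `expected_false_alerts` is hypothetical, and the volumes are the ones quoted in this article rather than real traffic figures.

```python
def expected_false_alerts(f: float, k: int, clean_cases: int) -> float:
    """Expected number of clean cases that trigger at least one of k checks.

    Assumes independent checks with the same false-positive rate f.
    """
    system_rate = 1.0 - (1.0 - f) ** k
    return clean_cases * system_rate


# Twenty rules at 0.5% each, at the two volumes used in this article.
print(round(expected_false_alerts(0.005, 20, 5_000)))   # 477
print(round(expected_false_alerts(0.005, 20, 10_000)))  # 954
```

Swapping in your own daily volume of clean cases gives the number of extra reviews the queue should expect per day.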
Caution
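Rules that fire together break the independence assumption. When checks overlap, the true system rate is lower than the formula gives, so treat the formula as a conservative planning estimate. It is closest to exact when checks are mostly separate signals.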
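One way to see how much the independence assumption matters is to bracket the system rate between its extremes. This sketch assumes every check has the same rate f; the function name `system_rate_bounds` is illustrative. If all checks fire on exactly the same clean cases, the system rate is just f. If no two checks ever fire on the same case, it is k × f, capped at 1. The independent-checks figure sits between the two.

```python
def system_rate_bounds(f: float, k: int) -> tuple[float, float, float]:
    """Return (fully overlapping, independent, never overlapping) system rates.

    Assumes k checks that each fire on a fraction f of clean cases.
    """
    if k <= 0:
        return 0.0, 0.0, 0.0
    fully_overlapping = f                # every check flags the same cases
    independent = 1.0 - (1.0 - f) ** k   # complement formula
    never_overlapping = min(1.0, k * f)  # union bound: no shared false alarms
    return fully_overlapping, independent, never_overlapping


low, mid, high = system_rate_bounds(0.005, 20)
print(f"{low:.2%} <= {mid:.2%} <= {high:.2%}")  # 0.50% <= 9.54% <= 10.00%
```

At small per-check rates the independent estimate is already close to the k × f ceiling, so the main planning risk is overestimating the rate when rules are heavily correlated.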
A simple application: rule sprawl
Before shipping rule twenty-one, estimate how many clean cases already trigger any prior rule. Consolidate, raise per-rule specificity, or route through one calibrated scorer instead of stacking noisy gates.
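A minimal sketch of that pre-ship estimate, assuming independent rules and a hypothetical helper named `impact_of_new_rule`: it compares the system rate before and after one more rule and converts the difference into extra false alerts on a given volume of clean cases.

```python
def impact_of_new_rule(
    f_existing: float, k_existing: int, f_new: float, clean_cases: int
) -> tuple[float, float, float]:
    """Return (rate before, rate after, extra false alerts) for one added rule.

    Assumes the existing rules share the rate f_existing and that all rules,
    including the new one, fire independently on clean cases.
    """
    pass_existing = (1.0 - f_existing) ** k_existing  # P(no existing rule fires)
    before = 1.0 - pass_existing
    after = 1.0 - pass_existing * (1.0 - f_new)
    return before, after, (after - before) * clean_cases


before, after, extra = impact_of_new_rule(0.005, 20, 0.005, 10_000)
print(f"{before:.2%} -> {after:.2%}, about {round(extra)} more per 10,000")
# 9.54% -> 9.99%, about 45 more per 10,000
```

Note that the new rule adds fewer alerts than its standalone 0.5% suggests, because some of the cases it flags were already flagged by an earlier rule.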
Fraud rules: system false alert rate
[Interactive chart: system false alert rate (%) as rule count and per-rule false-positive rate change.]
At 20 rules stacked and 0.50% per rule, 9.54% of clean orders hit any rule: 954 false alerts per 10,000 clean orders.
Optimize (move here)
- List system FP rate when proposing a new rule
- Route through one calibrated scorer
Hold (do not over-react)
- Adding rules without estimating queue impact
Escalate if
- System false alert rate exceeds 15% on clean traffic
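To turn the 15% escalation threshold into a rule budget, the complement formula can be solved for k. The sketch below assumes independent checks with a shared per-check rate, and `max_checks_under_threshold` is a hypothetical name.

```python
import math


def max_checks_under_threshold(f: float, threshold: float) -> int:
    """Largest k for which 1 - (1 - f)^k does not exceed the threshold.

    Solving 1 - (1 - f)^k <= threshold gives
    k <= ln(1 - threshold) / ln(1 - f).
    """
    if not 0.0 < f < 1.0 or not 0.0 < threshold < 1.0:
        raise ValueError("f and threshold must be strictly between 0 and 1")
    return math.floor(math.log(1.0 - threshold) / math.log(1.0 - f))


print(max_checks_under_threshold(0.005, 0.15))  # 32
```

At 0.5% per rule, the thirty-third rule is the one that pushes the system rate past 15% on clean traffic.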
Even rare per-rule false positives add up. Estimate system rate before adding another gate.
The habit: every new check lists its standalone false-positive rate and the projected system rate after stacking. Alert fatigue is a probability problem, not only a tooling problem.