Sandhya Indurkar

Math, Applied

Stacking Rare Risks: Alert Fatigue From Many Small False Positives

The idea

Fraud teams add rules. SRE teams monitor many services. Trust and safety teams stack classifiers on the same content. Each layer might raise a false alarm on only half a percent of clean cases. Stacked together, a noticeable share of clean traffic starts looking suspicious.

This is the same complement math as at-least-one failure, applied to parallel checks on one unit instead of repeated trials over time. The system-level false alert rate is what reviewers and customers experience.

Stacking rare risks answers: If each check is rarely wrong alone, how often does any check fire on a clean case?

Example: many rare checks on one case

Each check has a small false-positive rate. Stack enough checks and clean cases start triggering alerts.

Twenty rules each have a 0.5% false-positive rate on clean orders. Across the stack, those small rates add up to alert fatigue.

System false alert rate: 9.5%

Expected false alerts: 477 on 5,000 clean cases

At 20 checks, system-level false alert risk is 9.5% even though each check is only 0.5%.

The math

k independent checks

P(any false alert) = 1 − (1 − f)^k

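f is the false-positive rate per check, assumed equal across checks. k is how many checks run on the same case. A clean case passes one check quietly with probability 1 − f and passes all k checks quietly with probability (1 − f)^k, so the complement is the chance that at least one check fires. Twenty rules at 0.5% each give 1 − 0.995^20 ≈ 9.5% system false alert rate on clean orders.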
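A minimal Python sketch of the formula, assuming every check shares the same rate f and fires independently. The function name system_false_alert_rate is illustrative, not from any library.

```python
def system_false_alert_rate(f: float, k: int) -> float:
    """P(at least one of k independent checks fires on a clean case)."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be a probability in [0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    # A clean case stays quiet only if every one of the k checks stays quiet.
    p_all_quiet = (1.0 - f) ** k
    return 1.0 - p_all_quiet


rate = system_false_alert_rate(f=0.005, k=20)
print(f"{rate:.2%}")  # 9.54%
```

With k = 1 the function returns f itself, which is a quick sanity check that the stack of one behaves like the single rule.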

Queue load

expected false alerts = volume × P(any false alert)

Convert system rate to expected daily false alerts before you add another rule or model to the stack.
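The conversion can be sketched in a few lines of Python. This assumes the volume counts clean cases only and that checks are independent with a shared rate; expected_false_alerts is a hypothetical helper name.

```python
def expected_false_alerts(volume: int, f: float, k: int) -> float:
    """Expected number of clean cases that trigger at least one of k checks."""
    p_any = 1.0 - (1.0 - f) ** k  # system false alert rate
    return volume * p_any


for volume in (5_000, 10_000):
    alerts = expected_false_alerts(volume, f=0.005, k=20)
    print(f"{volume:>6} clean cases -> about {alerts:.0f} false alerts")
# 5,000 clean cases give about 477; 10,000 give about 954.
```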

Caution

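correlated checks → formula overstates the system rate

Rules that tend to fire together break the independence assumption. Their false alerts overlap on the same cases, so fewer distinct clean cases are flagged than the formula predicts. The formula is exact when checks are independent and typically serves as a planning upper bound when checks are positively correlated.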
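To see how overlap pulls the system rate down, here is a sketch under a toy shared-factor model. The model is an assumption made for illustration, not something the formula above implies: each check copies one shared coin with probability rho and otherwise flips its own coin, and every coin fires with probability f, so each check keeps a marginal rate of f.

```python
def any_alert_rate_shared_factor(f: float, k: int, rho: float) -> float:
    """Exact P(any check fires) under a hypothetical shared-factor model.

    rho = 0 reproduces the independent formula; rho = 1 makes all checks
    identical, so the stack behaves like a single rule.
    """
    # Probability one check is quiet via its own coin (it did not copy).
    own_quiet = (1.0 - rho) * (1.0 - f)
    # Shared coin fired: a check is quiet only if it ignored the shared coin
    # and its own coin stayed quiet.
    all_quiet_if_shared_fires = own_quiet ** k
    # Shared coin quiet: a check is quiet if it copied, or its own coin is quiet.
    all_quiet_if_shared_quiet = (rho + own_quiet) ** k
    p_all_quiet = (f * all_quiet_if_shared_fires
                   + (1.0 - f) * all_quiet_if_shared_quiet)
    return 1.0 - p_all_quiet


for rho in (0.0, 0.5, 1.0):
    rate = any_alert_rate_shared_factor(f=0.005, k=20, rho=rho)
    print(f"rho={rho:.1f}: system rate {rate:.2%}")
```

In this toy model the system rate falls from the independent value toward the single-rule rate as rho rises, which is why the independent formula is the conservative number for queue planning.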

A simple application: rule sprawl

Before shipping rule twenty-one, estimate how many clean cases already trigger any prior rule. Consolidate, raise per-rule specificity, or route through one calibrated scorer instead of stacking noisy gates.
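A rough way to size that decision, assuming independent rules with a shared rate. Both helper names are hypothetical. The first estimates the extra false alerts from one more rule. The second inverts the formula to find the per-rule rate that keeps the stack at a target system rate.

```python
def system_false_alert_rate(f: float, k: int) -> float:
    return 1.0 - (1.0 - f) ** k


def added_false_alerts(volume: int, f: float, k_current: int) -> float:
    """Extra expected false alerts on `volume` clean cases from one more rule."""
    before = system_false_alert_rate(f, k_current)
    after = system_false_alert_rate(f, k_current + 1)
    return volume * (after - before)


def per_rule_rate_for_target(target: float, k: int) -> float:
    """Per-rule rate f such that k independent rules hit `target` system rate."""
    return 1.0 - (1.0 - target) ** (1.0 / k)


extra = added_false_alerts(volume=10_000, f=0.005, k_current=20)
print(f"rule 21 adds about {extra:.0f} false alerts per 10k clean orders")

current = system_false_alert_rate(0.005, 20)
needed = per_rule_rate_for_target(current, k=21)
print(f"to hold the system rate with 21 rules, each needs f <= {needed:.3%}")
```

The second number shows the trade: each added rule has to be paid for with higher specificity across the stack if the system rate is to stay flat.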

Fraud rules: system false alert rate

As rule count or per-rule false-positive rate goes up, clean orders trigger at least one rule far more often than they trigger any single rule. The figures below are for 20 rules at 0.5% each.

9.5% of clean orders hit any rule (954 / 10,000)

Per rule: 0.50% · Any rule: 9.54%

Rules stacked: 20

System FP rate: 9.5%

False alerts / 10k: 954

Optimize (move here)

  • List system FP rate when proposing a new rule
  • Route through one calibrated scorer

Hold (do not over-react)

  • Hold off on adding rules until queue impact is estimated

Escalate if

  • System false alert rate exceeds 15% on clean traffic
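To see how much headroom a stack has before that escalation line, the formula can be solved for k. A sketch assuming independent rules at a shared rate; max_checks_within_threshold is an illustrative name.

```python
import math


def max_checks_within_threshold(f: float, threshold: float) -> int:
    """Largest k with 1 - (1 - f)^k <= threshold, for independent checks."""
    if not 0.0 < f < 1.0:
        raise ValueError("f must be strictly between 0 and 1")
    if not 0.0 <= threshold < 1.0:
        raise ValueError("threshold must be in [0, 1)")
    # (1 - f)^k >= 1 - threshold  =>  k <= log(1 - threshold) / log(1 - f)
    return math.floor(math.log(1.0 - threshold) / math.log(1.0 - f))


k_max = max_checks_within_threshold(f=0.005, threshold=0.15)
print(k_max)  # 32
```

At 0.5% per rule, a stack of 20 is a dozen rules away from the 15% line under these assumptions.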

Even rare per-rule false positives add up. Estimate system rate before adding another gate.

The habit: every new check lists its standalone false-positive rate and the projected system rate after stacking. Alert fatigue is a probability problem, not only a tooling problem.
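One way to make the habit concrete when rules have different rates: take the product of each rule's quiet probability rather than a single power. This sketch still assumes independence, and proposal_summary is a hypothetical helper, not an existing tool.

```python
import math
from typing import Dict, Sequence


def system_rate(rates: Sequence[float]) -> float:
    """P(any check fires) for independent checks with individual rates."""
    return 1.0 - math.prod(1.0 - f for f in rates)


def proposal_summary(existing: Sequence[float], new_rate: float) -> Dict[str, float]:
    """Standalone rate of a proposed check plus system rate before and after."""
    return {
        "standalone": new_rate,
        "system_before": system_rate(existing),
        "system_after": system_rate(list(existing) + [new_rate]),
    }


summary = proposal_summary(existing=[0.005] * 20, new_rate=0.005)
for name, value in summary.items():
    print(f"{name}: {value:.2%}")
```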