Math, Applied
At Least One Failure: When Small Risks Compound
The idea
Each deploy might have a 2% rollback chance. Each day a region might miss SLA with 1% probability. Each check might fail on a clean build 5% of the time. None of those sound alarming alone.
Repeat the trial many times and the question becomes whether at least one failure happens. That probability grows faster than linear intuition suggests.
At-least-one probability answers: If we repeat this risky trial n times, what are the odds something goes wrong at least once?
Example: P(at least one) = 1 - (1 - p)^n
Each trial has the same small risk p. Repeating n independent trials raises the chance that at least one event happens.
Each deploy has a 2% rollback risk. Ten deploys are not 2% system risk.
At least one
18.3%
None happen
81.7%
P(At least one rollback) = 18.3%. Even a small per-trial risk compounds when you repeat the trial many times.
The math
All clean
Independent trials all succeed with probability 1 minus p, raised to the n power.
Complement trick
Easier to count the chance that nothing bad happens, then subtract from 100%. Ten deploys at p = 2% give about 18% chance of at least one rollback, not 2%.
Compounding
Many low-probability trials push system risk up. Reliability planning and on-call staffing should use the system-level read, not the per-trial rate alone.
A simple application: release risk
Engineering leads often quote per-deploy failure rates while product plans ten releases per sprint. Convert to at-least-one risk before you promise stability. Same math applies to SLA misses across regions and flaky checks in CI.
Release planning: at-least-one rollback risk
Move per-deploy rollback rate and release count. See system risk rise faster than the per-deploy number.
18% chance of at least one rollback in 10 deploys
At-least-one risk vs deploy count
Per-deploy risk
2.0%
Sprint risk
18%
Deploys
10
Optimize (move here)
- • Report sprint-level risk alongside per-deploy rate
- • Batch risky changes with rollback drills
Hold (do not over-react)
- • Promising zero incidents from a 2% per-deploy rate over many releases
Escalate if
- • At-least-one risk exceeds 20% for the release train
Per-deploy risk looks small, but sprint-level risk is material. Plan rollbacks and comms.
The habit: when someone says the per-event risk is small, ask how many independent events you run in the planning window. Report both numbers.