Sandhya Indurkar

Math, Applied

At Least One Failure: When Small Risks Compound

The probability of at least one failure across many trials

The idea

Each deploy might have a 2% rollback chance. Each day a region might miss its SLA with 1% probability. Each check might fail on a clean build 5% of the time. None of these sounds alarming on its own.

Repeat the trial many times and the question becomes whether at least one failure happens. That probability climbs much faster than the per-trial number suggests.

The at-least-one probability answers one question: if we repeat this risky trial n times, what are the odds that something goes wrong at least once?

Formula: P(at least one) = 1 − (1 − p)^n

Each trial has the same small risk p. Repeating n independent trials raises the chance that at least one event happens.
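A minimal Python sketch of the formula, assuming the n trials are independent and share the same per-trial probability p. The function name `p_at_least_one` is our own, not from any library.

```python
def p_at_least_one(p: float, n: int) -> float:
    """Probability that at least one of n independent trials fails,
    when each trial fails with the same probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if n < 0:
        raise ValueError("n must be non-negative")
    p_none = (1.0 - p) ** n  # every trial succeeds
    return 1.0 - p_none      # complement: at least one failure


# Ten deploys, each with a 2% rollback chance.
print(f"{p_at_least_one(0.02, 10):.1%}")  # 18.3%
```

Computing the "nothing fails" case first and subtracting keeps the function to one line of real logic, whatever the value of n.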

If each deploy has a 2% rollback risk, ten deploys do not carry a 2% system risk.

Ten deploys at p = 2%: P(at least one rollback) = 18.3%, P(no rollbacks) = 81.7%. Even a small per-trial risk compounds when you repeat the trial many times.

The math

All clean

P(none) = (1 − p)^n

Each trial succeeds with probability 1 − p. If the n trials are independent, the chance that all of them succeed is 1 − p raised to the n-th power.
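One way to sanity-check P(none) = (1 − p)^n is a quick simulation. This sketch assumes independent trials drawn from Python's `random` module; the function name `simulate_p_none`, the seed, and the run count are illustrative choices, not part of the original.

```python
import random


def simulate_p_none(p: float, n: int, runs: int = 50_000, seed: int = 7) -> float:
    """Estimate the probability that all n independent trials succeed."""
    rng = random.Random(seed)  # seeded so repeated runs give the same estimate
    clean_runs = 0
    for _ in range(runs):
        # A run is clean only if every one of its n trials avoids failure.
        if all(rng.random() >= p for _ in range(n)):
            clean_runs += 1
    return clean_runs / runs


p, n = 0.02, 10
print(f"simulated P(none): {simulate_p_none(p, n):.3f}")
print(f"formula   P(none): {(1 - p) ** n:.3f}")  # 0.817
```

The simulated value should land close to the formula. If it does not, the trials in your real system are probably not independent.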

Complement trick

P(at least one) = 1 − (1 − p)^n

It is easier to compute the chance that nothing goes wrong and then subtract it from 100%. Ten deploys at p = 2% give about an 18% chance of at least one rollback, not 2%.
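Here is the ten-deploy arithmetic worked out step by step, with the linear estimate n·p alongside for comparison. By Bernoulli's inequality, 1 − (1 − p)^n ≤ np for every p in [0, 1]. Adding up the risks therefore slightly overstates the answer, while quoting p alone badly understates it.

```latex
\begin{aligned}
P(\text{none}) &= (1 - 0.02)^{10} = 0.98^{10} \approx 0.8171 \\
P(\text{at least one}) &= 1 - 0.8171 \approx 0.1829 \approx 18.3\% \\
n\,p &= 10 \times 0.02 = 0.20 \quad (\text{upper bound, since } 1 - (1-p)^n \le np)
\end{aligned}
```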

Compounding

Any p > 0, n large → risk approaches 1

Many low-probability trials push system risk up. Reliability planning and on-call staffing should use the system-level figure, not the per-trial rate alone.
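To see how quickly risk approaches 1, invert the formula. The smallest n with 1 − (1 − p)^n ≥ target is n = ⌈ln(1 − target) / ln(1 − p)⌉. This is a minimal sketch that assumes independent trials with a constant p. `trials_to_reach` is a hypothetical helper name.

```python
import math


def trials_to_reach(p: float, target: float) -> int:
    """Smallest number of independent trials at which the chance of
    at least one failure reaches `target` (0 < p < 1, 0 < target < 1)."""
    if not (0.0 < p < 1.0 and 0.0 < target < 1.0):
        raise ValueError("p and target must lie strictly between 0 and 1")
    # Solve (1 - p)^n <= 1 - target for n; log1p keeps precision for small p.
    return math.ceil(math.log1p(-target) / math.log1p(-p))


for target in (0.5, 0.9, 0.99):
    print(f"2% per deploy reaches {target:.0%} risk after {trials_to_reach(0.02, target)} deploys")
# Prints 35, 114 and 228 deploys for the three targets.
```

At 2% per deploy, a coin-flip chance of at least one rollback arrives after only 35 deploys.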

A simple application: release risk

Engineering leads often quote per-deploy failure rates while product teams plan ten releases per sprint. Convert to at-least-one risk before you promise stability. The same math applies to SLA misses across regions and to flaky checks in CI.
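The same calculation covers the other two examples from the opening, using their per-trial rates: a 1% SLA miss per region-day and a 5% flaky failure per check. The counts below (5 regions over a 30-day month, 10 checks per CI run) are hypothetical. The sketch also assumes misses and flakes are independent, which correlated outages would violate.

```python
def p_at_least_one(p: float, n: int) -> float:
    """Chance of at least one event in n independent trials of probability p."""
    return 1.0 - (1.0 - p) ** n


# Hypothetical planning windows; swap in your own counts.
scenarios = {
    "SLA miss, 5 regions x 30 days": (0.01, 5 * 30),
    "Flaky failure, 10 checks per CI run": (0.05, 10),
}

for name, (p, n) in scenarios.items():
    print(f"{name}: per-trial {p:.0%}, at least one {p_at_least_one(p, n):.1%}")
```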

Release planning: at-least-one rollback risk

With a per-deploy rollback risk of 2.0% and 10 deploys, sprint risk is 18%: an 18% chance of at least one rollback. System risk rises much faster than the per-deploy number.

[Chart: at-least-one risk vs deploy count]
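The chart of at-least-one risk against deploy count can be regenerated as a text table. This sketch uses the 2% per-deploy rate from the example above and assumes independent deploys. The 1-to-20 range and the `sprint_risk` name are arbitrary choices.

```python
def sprint_risk(per_deploy: float, deploys: int) -> float:
    """At-least-one rollback risk across a sprint of independent deploys."""
    return 1.0 - (1.0 - per_deploy) ** deploys


per_deploy = 0.02
curve = [(n, sprint_risk(per_deploy, n)) for n in range(1, 21)]

for n, risk in curve:
    bar = "#" * round(risk * 100)  # one '#' per percentage point of risk
    print(f"{n:>2} deploys  {risk:5.1%}  {bar}")
```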

Optimize (act on these)

  • Report sprint-level risk alongside per-deploy rate
  • Batch risky changes with rollback drills

Hold (do not over-react)

  • Avoid promising zero incidents from a 2% per-deploy rate over many releases

Escalate if

  • At-least-one risk exceeds 20% for the release train
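The escalation rule can be written as a check that reports both numbers and flags a release train whose at-least-one risk exceeds 20%. This is only a sketch. The threshold comes from the text, but `check_release_train`, `max_deploys_under`, and the dictionary format are hypothetical. The sketch also finds the largest deploy count that stays at or under the threshold.

```python
def at_least_one(p: float, n: int) -> float:
    """Chance of at least one failure in n independent trials of probability p."""
    return 1.0 - (1.0 - p) ** n


def check_release_train(per_deploy: float, deploys: int, escalate_above: float = 0.20) -> dict:
    """Report both numbers and flag escalation when train risk exceeds the threshold."""
    train_risk = at_least_one(per_deploy, deploys)
    return {
        "per_deploy": per_deploy,
        "deploys": deploys,
        "train_risk": train_risk,
        "escalate": train_risk > escalate_above,
    }


def max_deploys_under(per_deploy: float, limit: float = 0.20) -> int:
    """Largest deploy count whose at-least-one risk stays at or under `limit`."""
    if not (0.0 < per_deploy <= 1.0 and 0.0 <= limit < 1.0):
        raise ValueError("need 0 < per_deploy <= 1 and 0 <= limit < 1")
    n = 0
    # Add deploys one at a time until the next one would cross the limit.
    while at_least_one(per_deploy, n + 1) <= limit:
        n += 1
    return n


for deploys in (10, 12):
    report = check_release_train(0.02, deploys)
    print(f"{deploys} deploys: {report['train_risk']:.1%} train risk, escalate={report['escalate']}")
# 10 deploys: 18.3% train risk, escalate=False
# 12 deploys: 21.5% train risk, escalate=True
print(f"max deploys at or under 20%: {max_deploys_under(0.02)}")  # 11
```

At a 2% per-deploy rate, the 20% escalation line falls between the 11th and 12th deploy.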

Per-deploy risk looks small, but sprint-level risk is material. Plan rollbacks and comms.

The habit: when someone says the per-event risk is small, ask how many independent events you run in the planning window. Report both numbers.