Math, Applied
A/B Test Readouts: Significance Without Jargon
The idea
Experiment readouts often hide behind words like significant or not significant. You do not need that vocabulary to decide. You need three things: observed lift, how many users were in each arm, and whether the uncertainty bands overlap.
If the variant beats control by 0.7 percentage points but the 95% intervals still overlap, the result may still be compatible with no real difference. If the bands separate and the lift clears your minimum bar, you have a stronger case to ship.
A good readout answers: Is the lift big enough for us, and is the sample large enough that we are not fooling ourselves?
Example: read an A/B test without p-value jargon
Compare observed lift to your minimum bar and check whether the 95% bands overlap. Overlap means the result is still compatible with no real difference.
Control: 3.2% (95% band: 2.3% to 4.4%)
Variant: 3.9% (95% band: 2.9% to 5.2%)
Lift: +0.7 pts (minimum to ship: 0.3 pts)
95% bands overlap: yes
Verdict: directional only
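Variant leads by +0.7 pts on paper, but the intervals still overlap. More users per arm would narrow the bands. With bands this wide, settling a 0.7-point gap likely takes thousands more users per arm, not a few hundred.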
The math
Observed lift
A 3.9% variant against a 3.2% control is +0.7 percentage points (about 22% in relative terms). That is the headline move. Decision quality depends on whether that move is real or within sampling noise.
Uncertainty per arm
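Each rate gets its own band. Overlap means two stories remain plausible: the variant is truly better, or the arms are tied and the variant is ahead by luck. No overlap means the arms are separated at your chosen confidence level. Overlap is a conservative check: per-arm bands can overlap slightly while an interval computed on the lift itself still excludes zero, so compute the lift band when the call is close.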
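A minimal sketch of the per-arm band and the overlap check. The post does not say which interval method it uses, so the Wilson score interval here is an assumption, and the counts (32 and 39 conversions out of 1,000 users per arm) are hypothetical values picked only to reproduce the 3.2% and 3.9% rates.

```python
import math


def wilson_interval(conversions, users, z=1.96):
    """Wilson score interval for a conversion rate (z=1.96 is roughly 95%)."""
    if users <= 0:
        raise ValueError("users must be positive")
    p = conversions / users
    denom = 1 + z ** 2 / users
    center = (p + z ** 2 / (2 * users)) / denom
    half = z * math.sqrt(p * (1 - p) / users + z ** 2 / (4 * users ** 2)) / denom
    return center - half, center + half


def bands_overlap(band_a, band_b):
    """True when two (low, high) bands share at least one point."""
    return band_a[0] <= band_b[1] and band_b[0] <= band_a[1]


# Hypothetical counts: 1,000 users per arm (not stated in the post).
control_band = wilson_interval(32, 1000)
variant_band = wilson_interval(39, 1000)

print("control band:", [round(100 * x, 1) for x in control_band])
print("variant band:", [round(100 * x, 1) for x in variant_band])
print("overlap:", bands_overlap(control_band, variant_band))
```

With these assumed counts the bands come out close to the ones in the example, and they overlap, which is what pushes the readout to "directional only".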
When to ship (practical rule)
Set a minimum lift that covers engineering cost, risk, or revenue goal. Then check separation. A tiny win with huge samples might be statistically separated but not worth the rollout tax.
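The rule above can be sketched as a small function. The function name, argument names, and the wording of the three outcomes are illustrative assumptions, not a standard API.

```python
def readout_decision(lift_pp, min_lift_pp, bands_separated):
    """Turn a readout into a next step.

    lift_pp: observed lift in percentage points (variant minus control)
    min_lift_pp: minimum lift worth shipping, agreed before launch
    bands_separated: True when the per-arm bands do not overlap
    """
    if lift_pp < min_lift_pp:
        # Even a well-separated win is not worth the rollout cost.
        return "do not ship: lift is below the minimum bar"
    if not bands_separated:
        # Lift clears the bar, but it could still be sampling noise.
        return "directional only: extend the test or slice"
    return "ship"


# The example readout: +0.7 pts against a 0.3 pt bar, bands overlapping.
print(readout_decision(0.7, 0.3, bands_separated=False))
```

Checking the minimum bar first keeps the "tiny win with huge samples" case from shipping just because the bands separated.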
A larger sample narrows the bands, roughly in proportion to the square root of n, so halving a band takes about four times the users. Set your minimum detectable lift before you launch so you know when to stop. If the bands overlap, extend the test or accept a directional read only. This connects directly to the sample size and confidence interval posts: same machinery, decision-first framing.
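As a rough planning aid, the sketch below estimates users per arm for a chosen minimum detectable lift. It assumes the common normal-approximation formula for comparing two proportions, a two-sided 5% error rate, and 80% power. None of those settings come from the post, and `users_per_arm` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def users_per_arm(baseline_rate, min_lift, alpha=0.05, power=0.8):
    """Approximate users needed per arm to detect an absolute lift.

    baseline_rate and min_lift are fractions (0.032 means 3.2%,
    0.007 means 0.7 percentage points).
    """
    if min_lift <= 0:
        raise ValueError("min_lift must be positive")
    p1 = baseline_rate
    p2 = baseline_rate + min_lift
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / min_lift ** 2)


# Baseline 3.2%, aiming to detect a 0.7 point lift (assumed settings).
print(users_per_arm(0.032, 0.007))
# Halving the detectable lift needs roughly four times the users.
print(users_per_arm(0.032, 0.0035))
```

Under these assumptions a 0.7-point lift on a 3.2% baseline needs on the order of ten thousand users per arm, which is why the minimum detectable lift is worth fixing before launch.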
A simple application: experiment readouts
Product and growth teams paste lift, n per arm, and interval overlap into readout docs instead of a lone p-value. Ops sets reversible rollouts when separation is thin. Leadership asks for the minimum lift bar up front so debates happen before data arrives, not after.
Experiment readout: overlap and next step
Interactive example (one snapshot shown): adjust lift and sample size to see when intervals overlap enough to wait rather than ship.
Conversion: control 8.0%, variant 9.4%
Users per arm: 4,000
Lift: +1.4 pp (lift band: +0.2 to +2.6 pp)
Per-arm intervals overlap: yes
Next step: wait or slice
Optimize (move here)
- Paste lift, n, and interval overlap into readout docs
- Set minimum lift bar before data arrives
Hold (do not over-react)
- Shipping on point estimate alone
Escalate if
- Aggregate wins but every segment loses
Report overlap explicitly. Ops can run reversible rollout while traffic accumulates.
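The lift band in the readout above can be reproduced with a short sketch. It assumes a normal approximation for the difference of two rates, which may not be the method the original readout used. `lift_band` is a hypothetical helper name.

```python
import math


def lift_band(p_control, p_variant, n_control, n_variant, z=1.96):
    """Return (low, lift, high) for variant minus control, as fractions."""
    lift = p_variant - p_control
    se = math.sqrt(
        p_control * (1 - p_control) / n_control
        + p_variant * (1 - p_variant) / n_variant
    )
    return lift - z * se, lift, lift + z * se


# Numbers from the readout: 8.0% vs 9.4%, 4,000 users per arm.
low, lift, high = lift_band(0.080, 0.094, 4000, 4000)
print(f"low {low * 100:+.1f} pp, lift {lift * 100:+.1f} pp, high {high * 100:+.1f} pp")
```

This prints a band of about +0.2 to +2.6 pp around a +1.4 pp lift, matching the readout. Under this approximation the lift band sits just above zero even though the per-arm bands overlap, so overlap is the stricter of the two checks.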
When you report overlap clearly, the next step is obvious: ship, wait for more traffic, or slice the result before trusting the aggregate. That last step matters when segment mix can flip the story.
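A small sketch of the slicing check, using made-up segment counts chosen so that the aggregate favors the variant while every segment favors control. The segment names, counts, and helper names are all hypothetical. A traffic mix this lopsided would also be a reason to check how users were assigned to arms.

```python
# (conversions, users) per arm and segment. Illustrative numbers only.
segments = {
    "desktop": {"control": (100, 1000), "variant": (270, 3000)},
    "mobile": {"control": (60, 3000), "variant": (15, 1000)},
}


def rate(conversions, users):
    return conversions / users


def aggregate_rate(data, arm):
    """Pooled conversion rate for one arm across all segments."""
    conversions = sum(seg[arm][0] for seg in data.values())
    users = sum(seg[arm][1] for seg in data.values())
    return rate(conversions, users)


def losing_segments(data):
    """Segments where the variant converts worse than control."""
    return [
        name
        for name, seg in data.items()
        if rate(*seg["variant"]) < rate(*seg["control"])
    ]


def needs_escalation(data):
    """Aggregate wins but every segment loses."""
    aggregate_wins = aggregate_rate(data, "variant") > aggregate_rate(data, "control")
    return aggregate_wins and len(losing_segments(data)) == len(data)


print("aggregate control:", round(100 * aggregate_rate(segments, "control"), 2))
print("aggregate variant:", round(100 * aggregate_rate(segments, "variant"), 2))
print("losing segments:", losing_segments(segments))
print("escalate:", needs_escalation(segments))
```

Here the variant looks ahead overall only because it received far more of the high-converting desktop traffic, which is the kind of flip that slicing is meant to catch.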