Interpret this A/B test result.
Inputs:
- Hypothesis: {{paste — what you predicted and why}}
- Variants: {{paste — control, treatment(s), what differs}}
- Sample sizes per variant: {{paste}}
- Primary metric + result per variant: {{paste — with confidence intervals if available}}
- Secondary metrics: {{paste — guardrails, downstream effects}}
- Test duration + traffic source: {{paste}}
Output:
## What the test actually says
2–3 sentences. Be precise about effect size, confidence, and the population.
## Statistical significance — and what it means here
- p-value or confidence interval
- Whether the test was powered to detect the effect we care about
- If null: was it a true null or were we underpowered
## Practical significance
Even if statistically significant, is the effect big enough to act on? Cite the absolute lift, not the %.
## Guardrail metrics
Did anything else move? Treat any unexpected secondary movement as a finding, not noise.
## Confounders to rule out
- Novelty effect (early-test boost that fades)
- Day-of-week or seasonal contamination
- Selection bias in who saw which variant
- SRM (sample ratio mismatch — the split wasn't 50/50)
- Cohort drift (the population changed mid-test)
## Decision
One of: ship, kill, iterate, extend. With the specific reason.
## What I learned beyond ship/kill
Even when the call is clear, what did this test teach about the user / the system / our priors? This is where most A/B testing leaves money on the table.
Hard rules:
- Null is a result, not a failure. Don't reinterpret a null as "trending positive."
- If we're going to ship a 0.5% lift that isn't sig, say so. Speed-of-iteration is a real reason; "the test said so" isn't.
- Distinguish "we'd ship this if it were sig" from "we wouldn't ship this even if it were sig" before reading results.ab-testingstatisticsproduct-analytics