Marketer pack
Claude Skill
A/B Test Setup Coach
Designs an A/B test that can produce a real conclusion — not "the test was inconclusive" 4 weeks in.
What it does
Helps you design a test before launching: hypothesis, success metric, sample size, duration, what could go wrong. Plus the analysis plan so you know upfront how to read the result.
When to use
- ✓Before any A/B test, especially landing page / CTA / pricing changes
- ✓When your last few tests were "inconclusive" and you want to know why
- ✓Pre-quarter when planning what to test
When not to use
- ✗You have < 1000 conversions/week — sample size will betray you
- ✗The change is small and clearly not worth a test — just ship it
Install
Download the .zip, then unzip into your Claude skills folder.
mkdir -p ~/.claude/skills
unzip ~/Downloads/ab-test-setup-coach.zip -d ~/.claude/skills/
# Restart Claude Code session.
# Skill is now available — Claude will use it when relevant.SKILL.md
SKILL.md
---
name: ab-test-setup-coach
description: Use when designing an A/B test or experiment. Triggers on "A/B test", "split test", "experiment setup", "test hypothesis".
---
# A/B Test Setup Coach
A/B tests fail in predictable ways: underpowered, ambiguous metric, leaky variants, novelty effect. Designing for these upfront beats discovering them at week 4.
## Required inputs
1. **What you're testing** — the specific change between control and variant
2. **Hypothesis** — what you predict and why (not "it'll improve things")
3. **Success metric** — the ONE primary metric you'll judge by
4. **Guardrail metrics** — things that shouldn't get worse
5. **Current baseline** — conversion rate, sample volume per week
## Output
### Sample size calculation
- Detect a Y% relative lift in metric X
- At Z% statistical power
- With confidence interval of W%
- → need N conversions per variant
If N exceeds what you can get in 4-6 weeks, you can't run this test as designed. Options: pick a bigger effect to detect, lower the power requirement, or pick a different test.
### Test design
- **Sample allocation** — 50/50 typically; 90/10 if risk of broken variant is high
- **Duration** — minimum 2 weekly cycles, prefer 4 weeks for B2B
- **Randomization unit** — user (most common) vs. session vs. account
- **Stopping criteria** — fixed duration (no peeking) or sequential testing
### Guardrails
Variables that shouldn't get worse even if primary improves:
- Revenue per user
- Day-7 retention
- Support ticket volume
- Specific segment performance
### Analysis plan (write BEFORE running)
- Primary metric pass/fail criteria
- How you'll handle small-but-significant results
- How you'll handle null results
- What you'll do if guardrails are breached
- How long after test ends before you make the call (to let lagging metrics surface)
### Common failure modes to pre-empt
- **Novelty effect** — variant wins in week 1, fades by week 4
- **Selection bias** — variants weren't randomly assigned
- **SRM** (sample ratio mismatch) — split came out 55/45 instead of 50/50, suggests data plumbing issue
- **Cherry-picking segments** — "it worked for enterprise" post-hoc
## Anti-patterns
- Multiple primary metrics (now everything is "directionally positive")
- "Inconclusive" framing for null results (null IS a result)
- Peeking and stopping early when one variant is ahead
- Testing 5 things at once and not knowing which moved the metric
- Shipping the winner without first checking if it has subgroup harm
## When to skip A/B testing
- Effect would have to be huge to matter; just ship the better option
- Sample size will never be enough; do qualitative research instead
- You'd ship the variant regardless of result (then save the cycles)
Example prompts
Once installed, try these prompts in Claude:
- I want to test a new pricing page layout. Current weekly visitors: 8K. Current conversion: 2.3%. Help me set up the test.
- My last 5 tests were "inconclusive." Audit my test design and tell me what I'm doing wrong.