How many A/B tests you can run per month at your traffic level.
Inputs
Smallest uplift you want to detect
Tests running at once
Results
Tests / month (max)
Sessions / variant
Test duration
Tests / year
Sensitivity
If your lift target is... | Sample / variant | Test days | Tests / mo
About this calculator
A/B testing programs run into the same wall: not enough traffic to detect meaningful uplifts. Operators set up an A/B test, wait two weeks, see no significance, and conclude "the change doesn't work" when in fact their traffic was insufficient to detect the change's real effect. This calculator surfaces the constraint.
The math: detecting a 30% lift on a 2% baseline conversion rate (2.0% → 2.6%) requires roughly 10,000 sessions per variant at 95% confidence and 80% power; a 15% lift needs closer to 37,000. At 80,000 monthly visitors split 50/50 across 2 concurrent tests, each variant gets ~10,000 sessions over a two-week test period. So you can run roughly 2-4 tests per month at this traffic level, not 10.
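For readers who want to check the arithmetic, below is a minimal sketch of the standard two-proportion z-test sample-size approximation. The function name and example values are illustrative assumptions, not the calculator's internal implementation.

```python
from math import ceil
from statistics import NormalDist

def sessions_per_variant(baseline_cr: float, relative_lift: float,
                         alpha: float = 0.05, power: float = 0.80) -> int:
    """Sessions needed per variant to detect a relative lift on a baseline
    conversion rate (two-sided, two-proportion z-test approximation)."""
    p1 = baseline_cr
    p2 = baseline_cr * (1 + relative_lift)
    p_bar = (p1 + p2) / 2                         # pooled rate under H0
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    n = z ** 2 * 2 * p_bar * (1 - p_bar) / (p2 - p1) ** 2
    return ceil(n)

print(sessions_per_variant(0.02, 0.30))  # ~9,800: a 30% lift on a 2% baseline
print(sessions_per_variant(0.02, 0.15))  # ~36,700: why 15% lifts need big traffic
```

Plugging in your own baseline CR and target lift shows quickly why small lifts demand traffic most stores don't have.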
Smaller-traffic stores are not stuck. Two adaptations: (1) test bigger changes that produce 30-50% lifts (homepage redesigns, drastic value-prop shifts) rather than button colors that produce 5%; (2) extend test windows to 4-6 weeks for incremental changes. Both trade speed for the ability to detect signal.
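To make the speed-versus-sensitivity trade-off concrete, here is a rough sketch of the velocity arithmetic, assuming traffic splits evenly across concurrent two-variant tests; the function name and the even-split assumption are ours, not necessarily what this calculator does internally.

```python
def tests_per_month(monthly_visitors: int, concurrent_tests: int,
                    sessions_per_variant: int) -> float:
    """Rough test velocity: traffic splits evenly across concurrent tests,
    and each test runs two variants at 50/50."""
    variant_sessions_per_month = monthly_visitors / concurrent_tests / 2
    test_duration_months = sessions_per_variant / variant_sessions_per_month
    return concurrent_tests / test_duration_months

# 80K visitors, 2 concurrent tests, ~10K sessions per variant -> 4.0 tests/month
print(tests_per_month(80_000, 2, 10_000))
```

Halving the required sample (by targeting bigger lifts) doubles velocity; halving traffic halves it. That is the whole trade.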
Pair with the A/B Test Sample Size Calculator for specific test sizing and the Conversion Rate Impact Calculator to value the lift you're trying to detect. Most operators discover their test program is over-ambitious for their traffic level.
Frequently asked questions
How long should an A/B test run?
Minimum: long enough to collect the sample your target lift requires; at a ~2% baseline CR that is roughly 10,000 sessions per variant for a 30% lift. Maximum: about 4 weeks in most cases (low-traffic stores chasing incremental changes may stretch to 6). Longer tests accumulate noise from external variability (weather, news, holidays) that drowns out the signal.
Can I run multiple tests at once?
Yes, with caveats. Multiple tests on the same page can interact (test A's winning variation may behave differently when test B's winning variation is also live). Best practice: run concurrent tests on different pages, OR run them sequentially on the same page, OR use multivariate testing tools that account for interactions.
What CR uplift can I detect at my traffic level?
At 10K monthly visitors, a 4-week test on a 2% baseline can only reliably detect large uplifts of roughly 40%+ (e.g., 2.0% → 2.8% CR); anything smaller is noise. At 100K monthly visitors, the threshold drops to roughly 13-15%. Detecting a 5% uplift takes over 300,000 sessions per variant. Most operators waste tests trying to find sub-threshold differences.
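Inverting the sample-size formula gives the minimum detectable lift for a given traffic level. The sketch below uses the same two-proportion approximation as above; min_detectable_lift and its 4-week default window are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist

def min_detectable_lift(monthly_visitors: int, baseline_cr: float = 0.02,
                        test_weeks: float = 4, alpha: float = 0.05,
                        power: float = 0.80) -> float:
    """Smallest relative lift a 50/50 two-variant test can reliably detect
    at this traffic level (two-proportion z-test approximation)."""
    weeks_per_month = 4.33                      # average weeks in a month
    n = monthly_visitors * (test_weeks / weeks_per_month) / 2  # per variant
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    delta = z * sqrt(2 * baseline_cr * (1 - baseline_cr) / n)
    return delta / baseline_cr                  # expressed as a relative lift

print(f"{min_detectable_lift(10_000):.0%}")    # ~41% at 10K visitors/month
print(f"{min_detectable_lift(100_000):.0%}")   # ~13% at 100K visitors/month
```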
What if I don't hit significance?
Three options. (1) Stop the test as inconclusive — neither variant is meaningfully better. (2) Pick the directionally better variant by gut feel. (3) Re-design the test with a more meaningful change (small UX tweaks rarely produce significance even at scale).
Should I cap tests per month?
Yes. Past a point, dev/design overhead outpaces incremental learning. Most teams cap at 1-2 concurrent tests for sub-50K monthly visitors, 3-5 for 50K-500K, 8-15 for 500K+. Beyond 15 concurrent tests you need a dedicated experimentation platform and team.