Statistical Significance

Statistical significance in PPC is the discipline of distinguishing signal from noise — refusing to make bid or budget decisions on samples too small to be meaningful. It is the structural defence against the Whack-a-Mole Effect.

Why it matters

A keyword with 8 clicks and 0 sales has an ACOS of ∞ — but the result is consistent with a 5% true CVR (probability of 0 conversions in 8 trials at 5% CVR is ~66%). The "high ACOS" is essentially random; bidding the keyword down throws away a probably-decent keyword.

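A keyword with 200 clicks and 0 sales has the same nominal ACOS, but the result is inconsistent with any healthy CVR: at a 5% true CVR the probability of 0 conversions in 200 trials is about 0.004%, and even a 1% true CVR would produce it only about 13% of the time. This keyword genuinely doesn't convert at a useful rate.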

The two situations look identical in a CSV. They are not the same situation.
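
The arithmetic behind the two cases can be sketched in a few lines of Python. The function name `prob_zero_orders` and the 5% reference CVR are illustrative assumptions, and the model treats every click as an independent trial with the same conversion probability.

```python
def prob_zero_orders(clicks: int, true_cvr: float) -> float:
    """Probability of seeing 0 orders in `clicks` independent clicks
    when each click converts with probability `true_cvr`."""
    if clicks < 0 or not 0.0 <= true_cvr <= 1.0:
        raise ValueError("clicks must be >= 0 and true_cvr must be in [0, 1]")
    return (1.0 - true_cvr) ** clicks


# Illustrative reference rate (an assumption, not a benchmark).
REFERENCE_CVR = 0.05

for clicks in (8, 200):
    p0 = prob_zero_orders(clicks, REFERENCE_CVR)
    print(f"{clicks} clicks, 0 orders: P = {p0:.4%} at a {REFERENCE_CVR:.0%} true CVR")
```

Under that assumption, zero orders is the single most likely outcome at 8 clicks and a rare event at 200 clicks, which is why the same CSV row calls for different decisions.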

Practical thresholds

Most accounts don't need formal hypothesis testing — they need defensible heuristics:

Decision	Minimum sample
Promote search term to exact-match (harvest)	≥3 orders AND ≥15 clicks
Add search term as negative	≥15 clicks AND 0 orders
Bid down a keyword	≥30 clicks in the analysis window
Bid up a keyword	≥3 orders AND ACOS demonstrably below target
Pause a campaign	≥100 clicks across the campaign with 0 orders
Declare a creative test winner	≥1000 impressions per variant AND ≥30 conversions

These are conservative defaults. Higher-velocity SKUs reach them in shorter windows; lower-velocity SKUs need longer ones.
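
One way to make these heuristics hard to skip is to encode them as a gate that runs before any change is applied. The sketch below is a minimal illustration, not a reference implementation: the names `Stats`, `MIN_SAMPLE`, and `has_enough_data` are hypothetical, and only the rules that depend on clicks and orders alone are transcribed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stats:
    clicks: int
    orders: int


# Minimum-sample rules transcribed from the table above.
# Rules that also depend on ACOS or impressions are left out of this sketch.
MIN_SAMPLE = {
    "harvest_exact": lambda s: s.orders >= 3 and s.clicks >= 15,
    "add_negative": lambda s: s.clicks >= 15 and s.orders == 0,
    "bid_down": lambda s: s.clicks >= 30,
    "pause_campaign": lambda s: s.clicks >= 100 and s.orders == 0,
}


def has_enough_data(decision: str, stats: Stats) -> bool:
    """Return True if the sample meets the minimum for `decision`."""
    try:
        rule = MIN_SAMPLE[decision]
    except KeyError:
        raise ValueError(f"unknown decision: {decision!r}") from None
    return rule(stats)


print(has_enough_data("bid_down", Stats(clicks=8, orders=0)))
print(has_enough_data("add_negative", Stats(clicks=15, orders=0)))
```

Keeping the thresholds in one mapping means they can be tuned per SKU velocity without touching the code that applies them.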

CVR confidence intervals

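For a more rigorous take, the half-width of the 95% confidence interval on observed CVR is approximately 1.96 × √(p(1-p)/n), where p is the observed CVR and n is the number of clicks, so it shrinks in proportion to 1/√n. This is a normal approximation and is least reliable at small n. Practical implications: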

At 30 clicks with 3 conversions (10% observed CVR), the 95% CI is roughly 2%–27%. Wide enough that bid changes based on the point estimate are mostly noise-chasing.
At 300 clicks with 30 conversions (10% observed CVR), the CI tightens to roughly 7%–14%. Now actionable.
At 3,000 clicks with 300 conversions, CI is roughly 9%–11%. Precise enough to bid math against.
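
The intervals can be reproduced approximately with the standard library. The sketch below computes the normal-approximation (Wald) interval from the formula above alongside a Wilson score interval, which is a swapped-in alternative that behaves better at small n. The ranges quoted above may have been derived with a different method, so small differences are expected; the function names are illustrative.

```python
import math
from statistics import NormalDist


def _z(confidence: float) -> float:
    """Two-sided critical value, about 1.96 for confidence=0.95."""
    return NormalDist().inv_cdf(0.5 + confidence / 2.0)


def wald_interval(conversions: int, clicks: int, confidence: float = 0.95):
    """Normal-approximation interval: p +/- z * sqrt(p(1-p)/n)."""
    if clicks <= 0:
        raise ValueError("clicks must be positive")
    p = conversions / clicks
    half = _z(confidence) * math.sqrt(p * (1.0 - p) / clicks)
    return p - half, p + half


def wilson_interval(conversions: int, clicks: int, confidence: float = 0.95):
    """Wilson score interval; its bounds always stay inside [0, 1]."""
    if clicks <= 0:
        raise ValueError("clicks must be positive")
    z = _z(confidence)
    p = conversions / clicks
    denom = 1.0 + z * z / clicks
    centre = (p + z * z / (2.0 * clicks)) / denom
    half = z * math.sqrt(p * (1.0 - p) / clicks + z * z / (4.0 * clicks ** 2)) / denom
    return centre - half, centre + half


for conversions, clicks in ((3, 30), (30, 300), (300, 3000)):
    w_lo, w_hi = wald_interval(conversions, clicks)
    s_lo, s_hi = wilson_interval(conversions, clicks)
    print(f"n={clicks}: Wald {w_lo:.1%} to {w_hi:.1%}, Wilson {s_lo:.1%} to {s_hi:.1%}")
```

At 3 conversions in 30 clicks the Wald lower bound falls below zero, one sign that the approximation has broken down, while the Wilson interval comes out at roughly 3% to 26%. By 3,000 clicks the two methods agree closely.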

Time-windowed vs. event-windowed

Significance is about events, not time. A keyword with 100 clicks in a week and a keyword with 100 clicks in a quarter carry the same statistical weight (modulo external shifts in CVR over the period). Don't measure significance by "the last 7 days" — measure by accumulated events since the last bid change.
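
Counting events since the last bid change, rather than over a fixed calendar window, might look like the sketch below. The `DailyRow` shape, the sample rows, and the choice to include the day of the bid change itself are all assumptions made for illustration; a real report export will have its own columns.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DailyRow:
    day: date
    clicks: int
    orders: int


def events_since(rows, last_bid_change: date):
    """Sum clicks and orders on days on or after `last_bid_change`."""
    recent = [r for r in rows if r.day >= last_bid_change]
    return sum(r.clicks for r in recent), sum(r.orders for r in recent)


# Made-up daily data for one keyword.
rows = [
    DailyRow(date(2024, 1, 1), clicks=10, orders=1),
    DailyRow(date(2024, 1, 5), clicks=12, orders=0),
    DailyRow(date(2024, 1, 9), clicks=9, orders=1),
    DailyRow(date(2024, 1, 20), clicks=11, orders=0),
]

clicks, orders = events_since(rows, last_bid_change=date(2024, 1, 5))
print(f"{clicks} clicks and {orders} orders since the last bid change")
```

The accumulated counts, not the number of days elapsed, are what get compared against the minimum-sample thresholds.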

Common mistakes

Reading 7-day or daily ACOS on low-volume keywords. At that volume the volatility is almost entirely sampling noise.
Declaring an A/B test winner on the first day. Almost never significant.
Combining heterogeneous data to "get more clicks." Aggregating across SKUs or match types to reach significance often pools data with structurally different CVRs and produces a meaningless average.
Ignoring significance for "obvious" decisions. Even obvious decisions can fail at small samples; discipline applies uniformly.
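
The pooling mistake is easy to see with numbers. The sketch below uses made-up figures for two hypothetical segments of the same search term; the segment names and counts are assumptions chosen only to show how a pooled CVR can describe neither segment.

```python
def cvr(orders: int, clicks: int) -> float:
    """Conversion rate, defined as 0.0 when there are no clicks."""
    return orders / clicks if clicks else 0.0


# Made-up numbers: two segments with structurally different CVRs.
segments = {
    "exact": {"clicks": 50, "orders": 6},
    "broad": {"clicks": 50, "orders": 1},
}

for name, s in segments.items():
    print(f"{name}: {cvr(s['orders'], s['clicks']):.0%} CVR on {s['clicks']} clicks")

pooled_clicks = sum(s["clicks"] for s in segments.values())
pooled_orders = sum(s["orders"] for s in segments.values())
pooled_cvr = cvr(pooled_orders, pooled_clicks)
print(f"pooled: {pooled_cvr:.0%} CVR on {pooled_clicks} clicks")
```

The pooled sample clears a 100-click threshold, but a bid computed from its CVR would overpay on one segment and underbid on the other.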
