The Wald Test

Let $θ$ be a scalar parameter, let $\hat{θ}$ be an estimate of $θ$, and let $\hat{SE}$ be the estimated standard error of $\hat{θ}$.

In the general hypothesis-testing framework, a test is built by choosing a test statistic $T$ and a critical value $c$, then rejecting $H_{0}$ when $T$ is too extreme to be plausible under $H_{0}$. The Wald test is one of the most common ways to do this when an estimator is approximately Normal.

Definition (The Wald Test)

Consider testing
$H_{0} : θ = θ_{0} \quad \text{versus} \quad H_{1} : θ \neq θ_{0} .$
Assume that $\hat{θ}$ is asymptotically Normal:
$\frac{\hat{θ} - θ}{\hat{SE}} ⇝ N (0, 1) .$
The size $α$ Wald test is: reject $H_{0}$ when $|W| > z_{α /2}$, where $z_{α /2}$ is the upper $α /2$ quantile of $N (0, 1)$ and
$W = \frac{\hat{θ} - θ_{0}}{\hat{SE}} .$

Theorem

Asymptotically, the Wald test has size $α$, that is,
$P_{θ_{0}} (|W| > z_{α /2}) \to α$
as $n \to \infty$.
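
The limiting statement can be checked numerically. The sketch below is only an illustration under assumptions that are not in the text: the data are drawn as Normal with mean $θ_{0}$ and unit variance, $\hat{θ}$ is the sample mean, and $\hat{SE}$ is the sample standard deviation divided by $\sqrt{n}$. The function name, sample size, number of replications, and seed are arbitrary choices.

```python
import random
from math import sqrt
from statistics import NormalDist


def simulated_wald_size(n=200, reps=2000, theta0=0.0, alpha=0.05, seed=0):
    """Fraction of samples drawn under H0 in which the Wald test rejects."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    rejections = 0
    for _ in range(reps):
        # Assumed data model: X_i ~ N(theta0, 1), so H0 holds by construction.
        sample = [rng.gauss(theta0, 1.0) for _ in range(n)]
        theta_hat = sum(sample) / n
        sample_var = sum((x - theta_hat) ** 2 for x in sample) / (n - 1)
        se_hat = sqrt(sample_var / n)
        w = (theta_hat - theta0) / se_hat
        if abs(w) > z_crit:
            rejections += 1
    return rejections / reps


rate = simulated_wald_size()
print(f"empirical rejection rate under H0: {rate:.3f}")
```

With these settings the printed rate should land near the nominal $α = 0.05$, though the exact figure depends on the seed and the number of replications.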

Intuition

The Wald statistic

$W = \frac{\hat{θ} - θ_{0}}{\hat{SE}}$

measures how many estimated standard errors the observed estimate $\hat{θ}$ is away from the null value $θ_{0}$.

If $H_{0}$ is true, then $θ = θ_{0}$, so we expect

$W \approx N (0, 1) .$

That means values of $W$ near $0$ are typical under the null, while very large positive or negative values are unlikely under the null. So the Wald test says:

Intuition

Pretend the null value $θ_{0}$ is the truth. Then ask: would an estimate this far from $θ_{0}$, relative to its usual noise level, be surprising?

If yes, reject $H_{0}$. If no, keep $H_{0}$.

This is exactly the general hypothesis testing template with

$T (x) = \left| \frac{\hat{θ} (x) - θ_{0}}{\hat{SE} (x)} \right| \quad \text{and} \quad c = z_{α /2} .$

So the rejection region is

$R = \left\{ x : \left| \frac{\hat{θ} (x) - θ_{0}}{\hat{SE} (x)} \right| > z_{α /2} \right\} .$
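
The decision rule "reject when $|W| > z_{α /2}$" can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `wald_test`, the returned dictionary, and the numbers in the usage lines are all hypothetical, and $z_{α /2}$ is taken to be the upper $α /2$ quantile of the standard Normal.

```python
from statistics import NormalDist


def wald_test(theta_hat, se_hat, theta0, alpha=0.05):
    """Two-sided Wald test of H0: theta = theta0 at asymptotic size alpha."""
    if se_hat <= 0:
        raise ValueError("se_hat must be positive")
    w = (theta_hat - theta0) / se_hat             # Wald statistic W
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # critical value z_{alpha/2}
    return {"W": w, "critical_value": z_crit, "reject": abs(w) > z_crit}


# Hypothetical inputs: estimate 2.3, estimated standard error 0.5, null value 1.0.
result = wald_test(theta_hat=2.3, se_hat=0.5, theta0=1.0)
print(result)
```

Here the estimate sits about 2.6 estimated standard errors from the null value, which exceeds the critical value of roughly 1.96, so the sketch reports a rejection.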

Why Standardize?

The raw difference $\hat{θ} - θ_{0}$ by itself is not enough, because the same absolute difference can be very meaningful or not meaningful depending on the estimator’s variability.

If $\hat{SE}$ is small, even a modest difference from $θ_{0}$ is strong evidence against $H_{0}$.
If $\hat{SE}$ is large, the same difference may just be sampling noise.

Dividing by $\hat{SE}$ puts the discrepancy on a common scale: “how many noise levels away from the null are we?”
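
A small numerical illustration of this point, using made-up numbers: the same raw difference of 0.5 is paired with two hypothetical standard errors, and only the standardized value decides the outcome at $α = 0.05$.

```python
from statistics import NormalDist

z_crit = NormalDist().inv_cdf(0.975)  # z_{alpha/2} for alpha = 0.05
raw_difference = 0.5                  # hypothetical value of theta_hat - theta0

# Same raw difference, two hypothetical noise levels.
decisions = {}
for se_hat in (0.1, 1.0):
    w = raw_difference / se_hat
    decisions[se_hat] = abs(w) > z_crit
    print(f"SE = {se_hat}: W = {w:.1f}, reject H0: {decisions[se_hat]}")
```

With the small standard error the difference is about five noise levels from the null and is rejected; with the large one it is half a noise level away and is not.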

Connection to Confidence Intervals

The Wald test is the testing version of a normal-based confidence interval. The usual approximate $1 - α$ interval is

$\hat{θ} \pm z_{α /2} \hat{SE} .$

The null hypothesis $H_{0} : θ = θ_{0}$ is rejected exactly when $θ_{0}$ is outside this interval, since

$\left| \frac{\hat{θ} - θ_{0}}{\hat{SE}} \right| > z_{α /2}$

is equivalent to saying that $θ_{0}$ lies more than $z_{α /2}$ estimated standard errors away from $\hat{θ}$.

This is often the cleanest way to remember the Wald test:

Note

A Wald test rejects a null value precisely when that null value is not plausible according to the corresponding normal-based confidence interval.
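
The duality can be checked directly. The sketch below uses hypothetical helper names (`wald_interval`, `wald_rejects`) and made-up values for $\hat{θ}$ and $\hat{SE}$; it compares the test decision with interval membership over a handful of candidate null values.

```python
from statistics import NormalDist


def wald_interval(theta_hat, se_hat, alpha=0.05):
    """Normal-based interval theta_hat +/- z_{alpha/2} * se_hat."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return theta_hat - z * se_hat, theta_hat + z * se_hat


def wald_rejects(theta_hat, se_hat, theta0, alpha=0.05):
    """True when the two-sided Wald test rejects H0: theta = theta0."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return abs((theta_hat - theta0) / se_hat) > z


theta_hat, se_hat = 2.3, 0.5  # hypothetical estimate and standard error
lower, upper = wald_interval(theta_hat, se_hat)

# For each candidate null value, the test rejects exactly when the value
# falls outside the interval.
candidates = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
agreement = [
    wald_rejects(theta_hat, se_hat, t0) == (not lower <= t0 <= upper)
    for t0 in candidates
]
print(f"interval: ({lower:.2f}, {upper:.2f}), all decisions agree: {all(agreement)}")
```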

Example

Suppose $X_{1}, \dots, X_{n} \sim \text{Bernoulli} (p)$ are independent and we want to test

$H_{0} : p = p_{0} \quad \text{versus} \quad H_{1} : p \neq p_{0} .$

Let $\hat{p} = n^{- 1} \sum_{i = 1}^{n} X_{i}$. Since

$\hat{p} \approx N \left( p, \frac{p (1 - p)}{n} \right),$

the standard error of $\hat{p}$ is $\sqrt{p (1 - p) / n}$, which we estimate by plugging in $\hat{p}$:

$\hat{SE} = \sqrt{\frac{\hat{p} (1 - \hat{p})}{n}} .$

The Wald statistic is then

$W = \frac{\hat{p} - p_{0}}{\sqrt{\hat{p} (1 - \hat{p}) / n}} .$

If $|W| > z_{α /2}$, we reject $H_{0}$.

Informally: if the observed sample proportion is several estimated standard errors away from the hypothesized proportion $p_{0}$, then $p_{0}$ is not a convincing explanation for the data.
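
As a worked instance with made-up data, suppose 62 successes are observed in $n = 100$ trials and the null value is $p_{0} = 0.5$. The sketch below takes the estimated standard error to be $\sqrt{\hat{p} (1 - \hat{p}) / n}$; the function name `bernoulli_wald_test` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def bernoulli_wald_test(successes, n, p0, alpha=0.05):
    """Wald test of H0: p = p0 from a count of successes in n Bernoulli trials."""
    p_hat = successes / n
    se_hat = sqrt(p_hat * (1 - p_hat) / n)  # plug-in standard error
    if se_hat == 0:
        # p_hat is 0 or 1, so W is undefined; the caller must handle this case.
        raise ValueError("estimated standard error is zero")
    w = (p_hat - p0) / se_hat
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return w, abs(w) > z_crit


# Hypothetical data: 62 successes out of 100 trials, testing p0 = 0.5.
w, reject = bernoulli_wald_test(successes=62, n=100, p0=0.5)
print(f"W = {w:.2f}, reject H0: {reject}")
```

Here $\hat{p} = 0.62$ and $\hat{SE} \approx 0.0485$, so $W \approx 2.47$, which exceeds $z_{0.025} \approx 1.96$, and the test rejects at $α = 0.05$.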

Caveat

The Wald test is simple and widely used, but it can be inaccurate in small samples, for skewed estimators, or when the parameter is near the boundary of its parameter space. In those settings, score tests, likelihood-ratio tests, or bootstrap-based methods can behave better.
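
For the Bernoulli example, the small-sample and boundary problem can be made concrete by computing the exact null rejection probability from the binomial distribution instead of relying on the Normal approximation. The sketch below is an illustration under one stated assumption: when $\hat{p}$ is 0 or 1 the estimated standard error is zero, and the code treats that as a rejection whenever $\hat{p} \neq p_{0}$. Other conventions are possible and would change the numbers. The function name `exact_wald_size` is hypothetical.

```python
from math import comb, sqrt
from statistics import NormalDist


def exact_wald_size(n, p0, alpha=0.05):
    """Exact probability, under H0: p = p0, that the Bernoulli Wald test rejects."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    size = 0.0
    for k in range(n + 1):  # k = number of successes
        p_hat = k / n
        se_hat = sqrt(p_hat * (1 - p_hat) / n)
        if se_hat == 0:
            # Assumed convention: zero standard error counts as a rejection
            # unless p_hat coincides with p0.
            reject = p_hat != p0
        else:
            reject = abs(p_hat - p0) / se_hat > z_crit
        if reject:
            size += comb(n, k) * p0**k * (1 - p0) ** (n - k)  # binomial pmf
    return size


small = exact_wald_size(n=10, p0=0.1)
large = exact_wald_size(n=1000, p0=0.5)
print(f"n = 10,   p0 = 0.1: exact size = {small:.3f}")
print(f"n = 1000, p0 = 0.5: exact size = {large:.3f}")
```

Under this convention, the all-zeros sample alone has null probability $0.9^{10} \approx 0.349$ when $n = 10$ and $p_{0} = 0.1$, so the exact size in that case is far above the nominal 0.05, while the large-sample case away from the boundary stays close to it.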

Jake Tuero
