The Central Limit Theorem

Theorem (The Central Limit Theorem (CLT))

Let $X_1, \ldots, X_n$ be IID with mean $\mu$ and variance $\sigma^2$. Let $\bar{X}_n = n^{-1} \sum_{i=1}^{n} X_i$. Then

$$Z_n \equiv \frac{\bar{X}_n - \mu}{\sqrt{\mathbb{V}(\bar{X}_n)}} = \frac{\sqrt{n}\,(\bar{X}_n - \mu)}{\sigma} \rightsquigarrow Z,$$

where $Z \sim N(0, 1)$. In other words,

$$\lim_{n \to \infty} \mathbb{P}(Z_n \le z) = \Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\, dx.$$

The Law of Large Numbers and the CLT both describe what happens to the sample mean as the sample size grows, but they emphasize different aspects of that behavior.

  • The LLN is a statement about stability: averaging many observations washes out randomness and pulls the sample mean toward the population mean $\mu$.
  • The CLT is a statement about the shape of the remaining randomness: after centering at $\mu$ and scaling by $\sqrt{n}/\sigma$, the error in the sample mean is approximately normal.

In that sense, the LLN explains why the average becomes reliable, while the CLT explains what the leftover uncertainty looks like for a large but finite sample. This is why the LLN is most useful when the main point is that an empirical average should be close to the truth, whereas the CLT is most useful when approximating probabilities or building confidence intervals. Informally, the LLN says averaging makes noise die out, while the CLT says the leftover noise is approximately Gaussian and has size about $\sigma/\sqrt{n}$.
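
Both claims are easy to see in a short simulation. The sketch below (plain Python; Exponential(1) draws, so $\mu = \sigma = 1$, an arbitrary choice) checks that the sample means settle near $\mu$ while the standardized errors $\sqrt{n}\,(\bar{X}_n - \mu)/\sigma$ remain spread out roughly like a standard normal:

```python
import random
import statistics

random.seed(0)
n, reps = 2_000, 500
mu = sigma = 1.0  # Exponential(1): mean 1, standard deviation 1

means, z = [], []
for _ in range(reps):
    xbar = sum(random.expovariate(1.0) for _ in range(n)) / n
    means.append(xbar)
    z.append(n ** 0.5 * (xbar - mu) / sigma)

# LLN: every sample mean is close to mu (errors of order 1/sqrt(n)).
print(max(abs(m - mu) for m in means))

# CLT: the standardized errors look like N(0, 1).
print(statistics.mean(z), statistics.stdev(z))
```

With these sizes the worst-case error of the raw means is a few hundredths, while the standardized errors keep mean near 0 and standard deviation near 1, reflecting the two theorems' different scalings.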

The interpretation is that probability statements about $\bar{X}_n$ can be approximated using a Normal distribution. It's the probability statements that we are approximating, not the random variable itself.

In addition to $Z_n \rightsquigarrow N(0, 1)$, there are several forms of notation to denote that the distribution of $\bar{X}_n$ is converging to a Normal. They all mean the same thing:

$$Z_n \approx N(0, 1), \quad \bar{X}_n \approx N\!\left(\mu, \frac{\sigma^2}{n}\right), \quad \bar{X}_n - \mu \approx N\!\left(0, \frac{\sigma^2}{n}\right), \quad \sqrt{n}\,(\bar{X}_n - \mu) \approx N(0, \sigma^2), \quad \frac{\sqrt{n}\,(\bar{X}_n - \mu)}{\sigma} \approx N(0, 1).$$

Example

Suppose that the number of errors per computer program has a Poisson distribution with mean 5. We get 125 programs. Let $X_1, \ldots, X_{125}$ be the number of errors in the programs. We want to approximate $\mathbb{P}(\bar{X}_n < 5.5)$. Let $\mu = \mathbb{E}(X_1) = 5$ and $\sigma^2 = \mathbb{V}(X_1) = 5$. Then,

$$\mathbb{P}(\bar{X}_n < 5.5) = \mathbb{P}\!\left(\frac{\sqrt{n}\,(\bar{X}_n - \mu)}{\sigma} < \frac{\sqrt{n}\,(5.5 - \mu)}{\sigma}\right) \approx \mathbb{P}(Z < 2.5) = 0.9938.$$
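
The arithmetic can be verified with a few lines of standard-library Python (a sketch; $\Phi$ is evaluated via the error function). The standardized cutoff is $\sqrt{125}\,(5.5 - 5)/\sqrt{5} = 2.5$:

```python
from math import erf, sqrt

n, mu, sigma2 = 125, 5.0, 5.0   # Poisson(5): the variance equals the mean

# Standardized cutoff: sqrt(n) * (5.5 - mu) / sigma.
cutoff = sqrt(n) * (5.5 - mu) / sqrt(sigma2)

def phi(z):
    """Standard normal CDF, Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

print(round(cutoff, 6))        # 2.5
print(round(phi(cutoff), 4))   # 0.9938
```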

One important thing to note is that we rarely know $\sigma$. However, we can estimate $\sigma^2$ from $X_1, \ldots, X_n$ by

$$S_n^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (X_i - \bar{X}_n)^2.$$

Interestingly, if $\sigma$ is replaced with $S_n$, the central limit theorem still holds:

Theorem

Assume the same conditions as the CLT. Then,

$$\frac{\sqrt{n}\,(\bar{X}_n - \mu)}{S_n} \rightsquigarrow N(0, 1).$$
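
A small simulation (a sketch; Exponential(1) draws again, chosen arbitrarily) suggests that studentizing with $S_n$ instead of $\sigma$ does leave the statistic approximately $N(0, 1)$:

```python
import random
import statistics

random.seed(1)
n, reps = 500, 1_000
mu = 1.0  # Exponential(1) has mean 1; sigma is deliberately never used

t = []
for _ in range(reps):
    x = [random.expovariate(1.0) for _ in range(n)]
    xbar = statistics.fmean(x)
    s = statistics.stdev(x)            # S_n, the sample standard deviation
    t.append(n ** 0.5 * (xbar - mu) / s)

# Approximately standard normal: mean near 0, sd near 1, and about
# 95% of the studentized statistics fall inside (-1.96, 1.96).
print(statistics.fmean(t), statistics.stdev(t))
print(sum(-1.96 < v < 1.96 for v in t) / reps)
```

This is exactly the substitution behind the usual large-sample confidence interval $\bar{X}_n \pm z_{\alpha/2}\, S_n / \sqrt{n}$.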