Bootstrapping

Definition

The bootstrap is a method for estimating standard errors and computing confidence intervals.

Let $T_n = g(X_1, \ldots, X_n)$ be a statistic, that is, $T_n$ is any function of the data. Suppose we want to know $\mathbb{V}_F(T_n)$, the variance of $T_n$. Note that the subscript $F$ emphasizes that the variance usually depends on the unknown distribution $F$. For example, if $T_n = \overline{X}_n$, then $\mathbb{V}_F(T_n) = \sigma^2/n$, where $\sigma^2 = \int (x - \mu)^2 \, dF(x)$ and $\mu = \int x \, dF(x)$. Thus, the variance of $T_n$ is a function of $F$.

Intuition

If we only use the observed sample directly, we can certainly compute the statistic itself, such as the sample mean, median, or correlation. But that gives us only one realized value of the statistic. It does not tell us how much that statistic would change if we drew a different sample from the population.

That is the real goal of the bootstrap: not to recompute the statistic for its own sake, but to approximate the sampling distribution of the statistic. Once we have an approximation to that sampling distribution, we can estimate standard errors, variances, and confidence intervals.

Since the true population distribution $F$ is unknown, we replace it by the empirical distribution $\widehat{F}_n$, which puts mass $1/n$ on each observed data point. Resampling with replacement from the observed sample is exactly the same as drawing an IID sample from $\widehat{F}_n$. Repeating this many times lets us see how $T_n$ would vary if $\widehat{F}_n$ were the population.
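As a minimal sketch in Python with NumPy (the normal sample and the choice of the median as the statistic are illustrative, not part of the method), resampling with replacement from the observed data is literally an IID draw from the empirical distribution:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative observed sample of size n = 50 (any data would do).
x = rng.normal(loc=5.0, scale=2.0, size=50)

# One bootstrap sample: n draws with replacement from the observed data.
# This is exactly an IID sample of size n from the empirical distribution.
boot_sample = rng.choice(x, size=x.size, replace=True)

# Each bootstrap sample yields one realization of the statistic,
# here the median as an illustrative choice.
boot_median = np.median(boot_sample)
```

Repeating the last two lines many times produces many realizations of the statistic, which is the raw material for everything that follows.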

The bootstrap has two steps:

  1. Estimate $\mathbb{V}_F(T_n)$ with $\mathbb{V}_{\widehat{F}_n}(T_n)$.
  2. Approximate $\mathbb{V}_{\widehat{F}_n}(T_n)$ using simulation.

For $T_n = \overline{X}_n$, we have for Step 1 that $\mathbb{V}_{\widehat{F}_n}(T_n) = \widehat{\sigma}^2 / n$, where $\widehat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \overline{X}_n)^2$. In this case, Step 1 is enough. However, in more complicated cases, we cannot write down a simple formula for $\mathbb{V}_{\widehat{F}_n}(T_n)$, which is why we need Step 2.
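For the sample mean both steps can be carried out and compared. A minimal sketch (Python with NumPy; the exponential sample and $B = 20{,}000$ replications are illustrative choices): the simulated bootstrap variance should agree closely with the plug-in formula from Step 1.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=3.0, size=100)  # illustrative sample
n = x.size

# Step 1 for T_n = sample mean: plug-in variance sigma_hat^2 / n,
# with sigma_hat^2 = (1/n) * sum (X_i - Xbar)^2.
sigma2_hat = np.mean((x - x.mean()) ** 2)
plugin_var = sigma2_hat / n

# Step 2: approximate the same quantity by simulation.  Draw B bootstrap
# samples from the empirical distribution and take the variance of the
# resulting sample means.
B = 20_000
boot_means = np.array([rng.choice(x, size=n, replace=True).mean()
                       for _ in range(B)])
boot_var = boot_means.var()

# For the sample mean, both routes estimate the same quantity, so
# boot_var is close to plugin_var once B is large.
```

For a statistic like the median or a correlation, the plug-in formula in Step 1 is not available in closed form, but the simulation in Step 2 works unchanged.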

Simulation

Suppose we draw an IID sample $Y_1, \ldots, Y_B$ from a distribution $G$. By the Law of Large Numbers,

$$\overline{Y}_B = \frac{1}{B} \sum_{j=1}^{B} Y_j \xrightarrow{P} \int y \, dG(y) = \mathbb{E}(Y)$$

as $B \to \infty$. So if we draw a large sample from $G$, we can use the sample mean $\overline{Y}_B$ to approximate $\mathbb{E}(Y)$. In a simulation, we can make $B$ as large as we like, in which case the difference between $\overline{Y}_B$ and $\mathbb{E}(Y)$ is negligible.

More generally, if $h$ is any function with finite mean, then

$$\frac{1}{B} \sum_{j=1}^{B} h(Y_j) \xrightarrow{P} \int h(y) \, dG(y)$$

as $B \to \infty$. In particular,

$$\frac{1}{B} \sum_{j=1}^{B} \left( Y_j - \overline{Y}_B \right)^2 \xrightarrow{P} \int y^2 \, dG(y) - \left( \int y \, dG(y) \right)^2 = \mathbb{V}(Y).$$

Hence, we can use the sample variance of the simulated values to approximate $\mathbb{V}(Y)$.
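These limits are easy to check numerically. A minimal sketch, taking $G$ to be Uniform$(0, 1)$ as an illustrative choice (its mean is $1/2$ and its variance is $1/12$):

```python
import numpy as np

rng = np.random.default_rng(2)

# Draw a large IID sample from G = Uniform(0, 1).
B = 200_000
y = rng.uniform(0.0, 1.0, size=B)

mc_mean = y.mean()  # approximates E(Y) = 1/2
mc_var = y.var()    # approximates V(Y) = 1/12
```

In the bootstrap, the same idea is applied with $G = \widehat{F}_n$ and $Y_j$ replaced by the statistic computed on the $j$-th bootstrap sample.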

Limitations

The bootstrap is powerful, but it is not automatic or universally reliable.

  • It only uses the observed sample. If the sample is biased or unrepresentative, the bootstrap inherits that problem.
  • With small samples, the empirical distribution may be a poor approximation to the true distribution , so the bootstrap approximation can be inaccurate.
  • It cannot recover behavior that is missing from the data, such as rare events or poorly observed tail behavior.
  • The ordinary bootstrap assumes the data are IID. For dependent data, such as time series or clustered observations, the usual bootstrap should not be used without modification.
  • Some statistics are not well approximated by the bootstrap, especially irregular or non-smooth ones such as maxima, minima, or post-model-selection estimators.
  • Bootstrap confidence intervals can have poor coverage in small samples or highly skewed settings unless more careful variants are used.
  • It can be computationally expensive when the statistic is costly to recompute many times.
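The failure for non-smooth statistics can be seen directly for the sample maximum: a large fraction of bootstrap replications reproduce the observed maximum exactly, so the bootstrap distribution has a point mass that the true, continuous sampling distribution lacks. A minimal sketch, with Uniform$(0, 1)$ data as an illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(3)
n, B = 100, 5_000
x = rng.uniform(0.0, 1.0, size=n)  # illustrative Uniform(0, 1) data
x_max = x.max()

# Fraction of bootstrap replications whose maximum equals the observed
# maximum.  Theory: this tends to 1 - (1 - 1/n)^n, roughly 0.63, for
# large n, since a bootstrap sample misses the largest point with
# probability (1 - 1/n)^n.
hits = sum(rng.choice(x, size=n, replace=True).max() == x_max
           for _ in range(B))
point_mass = hits / B

# The maximum of a sample from a continuous distribution has no point
# mass, so this large atom shows the bootstrap approximation breaking
# down for this statistic.
```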

So the bootstrap is most useful when the sample is reasonably representative, the IID assumption is plausible, and the statistic has a complicated sampling distribution but behaves regularly.