Mean Squared Error
Estimator
Definition
The quality of a point estimate is sometimes assessed by the mean squared error, or MSE, defined by

$$\mathrm{MSE} = \mathbb{E}_\theta\big[(\hat\theta_n - \theta)^2\big],$$

where $\hat\theta_n$ is a point estimator of the parameter $\theta$ based on a sample of size $n$.
Keep in mind that $\mathbb{E}_\theta$ refers to the expectation with respect to the distribution $f(x_1,\dots,x_n;\theta)$ that generated the data. It does not mean we are averaging over a distribution for $\theta$. In other words, $\theta$ is treated as fixed but unknown, while $\hat\theta_n$ is random because it depends on the random sample $X_1,\dots,X_n$. Thus, the MSE is the average squared error we would obtain over repeated samples drawn from the population when the true parameter value is $\theta$.
More generally, even when the true parameter value is unknown, we can still analyze an estimator theoretically by deriving its bias, variance, or MSE as functions of the unknown parameter and the sample size. Then, once we observe a sample and compute the estimator, those theoretical properties help us interpret how reliable the estimate is under repeated sampling.
Theorem
The MSE can be written as

$$\mathrm{MSE} = \mathrm{bias}^2(\hat\theta_n) + \mathbb{V}_\theta(\hat\theta_n),$$

where $\mathrm{bias}(\hat\theta_n) = \mathbb{E}_\theta(\hat\theta_n) - \theta$ is the bias and $\mathbb{V}_\theta(\hat\theta_n)$ is the variance of the estimator.
Proof
Let $\bar\theta = \mathbb{E}_\theta(\hat\theta_n)$. Then,

$$
\begin{aligned}
\mathbb{E}_\theta(\hat\theta_n - \theta)^2
&= \mathbb{E}_\theta(\hat\theta_n - \bar\theta + \bar\theta - \theta)^2 \\
&\overset{(1)}{=} \mathbb{E}_\theta(\hat\theta_n - \bar\theta)^2 + 2(\bar\theta - \theta)\,\mathbb{E}_\theta(\hat\theta_n - \bar\theta) + (\bar\theta - \theta)^2 \\
&\overset{(2)}{=} \mathbb{E}_\theta(\hat\theta_n - \bar\theta)^2 + (\bar\theta - \theta)^2 \\
&= \mathbb{V}_\theta(\hat\theta_n) + \mathrm{bias}^2(\hat\theta_n),
\end{aligned}
$$

where (1) is due to $\bar\theta - \theta$ being a constant (so it factors out of the expectation), and (2) to $\bar\theta$ being a constant, which gives $\mathbb{E}_\theta(\hat\theta_n - \bar\theta) = \mathbb{E}_\theta(\hat\theta_n) - \bar\theta = 0$.
A shorter proof can be achieved using that for a random variable $X$, $\mathbb{E}(X^2) = \mathbb{V}(X) + (\mathbb{E} X)^2$:
Proof
Apply the identity to $X = \hat\theta_n - \theta$:

$$\mathbb{E}_\theta(\hat\theta_n - \theta)^2 = \mathbb{V}_\theta(\hat\theta_n - \theta) + \big(\mathbb{E}_\theta(\hat\theta_n - \theta)\big)^2 = \mathbb{V}_\theta(\hat\theta_n) + \mathrm{bias}^2(\hat\theta_n),$$

since subtracting the constant $\theta$ does not change the variance.
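The decomposition can also be checked numerically. The sketch below is an illustrative setup of my own choosing (a Normal population and the deliberately biased maximum-likelihood variance estimator): it simulates repeated samples and compares the empirical MSE against empirical bias squared plus empirical variance.

```python
import random

# Monte Carlo check of MSE = bias^2 + variance for a deliberately biased
# estimator: the MLE of the variance, (1/n) * sum((x - xbar)^2).
# The population and all parameter choices here are illustrative assumptions.

random.seed(0)

TRUE_VAR = 4.0      # population is Normal(0, sd=2), so sigma^2 = 4
N = 10              # sample size per replication
REPS = 200_000      # Monte Carlo replications

estimates = []
for _ in range(REPS):
    sample = [random.gauss(0.0, 2.0) for _ in range(N)]
    xbar = sample_mean = sum(sample) / N
    mle_var = sum((x - xbar) ** 2 for x in sample) / N  # biased estimator
    estimates.append(mle_var)

mean_est = sum(estimates) / REPS
bias = mean_est - TRUE_VAR                                   # ~ -sigma^2 / n
variance = sum((e - mean_est) ** 2 for e in estimates) / REPS
mse = sum((e - TRUE_VAR) ** 2 for e in estimates) / REPS

print(f"bias^2 + variance = {bias**2 + variance:.4f}")
print(f"MSE               = {mse:.4f}")
```

The two printed numbers agree up to floating-point error: over the simulated replications the decomposition is an algebraic identity, not just an approximation.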
Theorem
If $\mathrm{bias}(\hat\theta_n) \to 0$ and the standard error $\mathrm{se}(\hat\theta_n) = \sqrt{\mathbb{V}_\theta(\hat\theta_n)} \to 0$ as $n \to \infty$, then $\hat\theta_n$ is consistent, that is, $\hat\theta_n \xrightarrow{P} \theta$.
Proof
If $\mathrm{bias}(\hat\theta_n) \to 0$ and $\mathrm{se}(\hat\theta_n) \to 0$, then by the above theorem, $\mathrm{MSE}(\hat\theta_n) = \mathrm{bias}^2(\hat\theta_n) + \mathrm{se}^2(\hat\theta_n) \to 0$. It follows from the definition of convergence in quadratic mean that $\hat\theta_n \xrightarrow{qm} \theta$. The result then follows from the convergence properties, namely that $\hat\theta_n \xrightarrow{qm} \theta$ implies $\hat\theta_n \xrightarrow{P} \theta$.
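Consistency can be illustrated with a small simulation, using assumptions of my own (a Normal population and the sample mean as the estimator): as $n$ grows, the fraction of samples where the estimate misses the true mean by more than a fixed $\varepsilon$ shrinks toward zero.

```python
import random

# Illustrative simulation of consistency (assumed Normal(mu, sigma) population,
# sample mean as the estimator): P(|xbar - mu| > eps) -> 0 as n grows.
random.seed(1)

MU, SIGMA, EPS = 3.0, 2.0, 0.5
REPS = 5_000

def miss_probability(n):
    """Fraction of replications where the sample mean misses MU by more than EPS."""
    misses = 0
    for _ in range(REPS):
        xbar = sum(random.gauss(MU, SIGMA) for _ in range(n)) / n
        if abs(xbar - MU) > EPS:
            misses += 1
    return misses / REPS

sizes = (5, 50, 500)
probs = [miss_probability(n) for n in sizes]
for n, p in zip(sizes, probs):
    print(f"n = {n:3d}  P(|xbar - mu| > {EPS}) ~= {p:.3f}")
```

Since both the bias (zero here) and the standard error $\sigma/\sqrt{n}$ of the sample mean vanish as $n \to \infty$, the theorem predicts exactly this behavior.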
Predictor
If a vector of $n$ predictions is generated from a sample of $n$ data points on all random variables, and $Y$ is the vector of observed values of the variable being predicted, with $\hat{Y}$ being the predicted values, then the within-sample MSE of the predictor is computed as

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\big(Y_i - \hat{Y}_i\big)^2.$$

In matrix notation,

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} e_i^2 = \frac{1}{n}\,\mathbf{e}^{\mathsf{T}}\mathbf{e},$$

where $e_i = Y_i - \hat{Y}_i$, and $\mathbf{e}$ is an $n \times 1$ column vector.
This version of MSE is used when the goal is prediction rather than estimation of a population parameter. In that setting, we evaluate how close the predicted responses are to the observed outcomes, whereas the estimator version of MSE evaluates how close an estimator tends to be to a fixed but unknown parameter value.
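As a small sketch of the predictor version, the snippet below fits a simple least-squares line to made-up data (both the data and the closed-form fit are illustrative assumptions, not from the text) and computes the within-sample MSE both as an average of squared errors and as $\frac{1}{n}\mathbf{e}^{\mathsf{T}}\mathbf{e}$.

```python
# Within-sample MSE of a predictor computed two equivalent ways:
# as the average squared residual and as (1/n) * e^T e.
# The data and the simple least-squares fit are made-up illustrations.

def simple_ols(x, y):
    """Closed-form simple linear regression: returns (intercept, slope)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
            sum((xi - xbar) ** 2 for xi in x)
    return ybar - slope * xbar, slope

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 1.9, 3.2, 3.8, 5.1]          # observed values of the predicted variable

a, b = simple_ols(x, y)
y_hat = [a + b * xi for xi in x]        # predicted values
e = [yi - yhi for yi, yhi in zip(y, y_hat)]  # residual vector e_i = Y_i - Yhat_i

n = len(y)
mse_sum = sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat)) / n
mse_dot = sum(ei * ei for ei in e) / n  # (1/n) * e^T e as a dot product

print(mse_sum, mse_dot)                 # identical up to floating-point error
```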
Examples
Mean
Example
Suppose we have a random sample of size $n$ from a population, $X_1,\dots,X_n$. The usual estimator for the population mean is the sample mean

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i,$$

which has an expected value equal to the true mean $\mu$ (so it is unbiased), and a mean squared error of

$$\mathrm{MSE}(\bar{X}) = \mathbb{E}\big[(\bar{X}-\mu)^2\big] = \left(\frac{\sigma}{\sqrt{n}}\right)^2 = \frac{\sigma^2}{n},$$

where $\sigma^2$ is the population variance.
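A quick Monte Carlo check of this example (with illustrative parameter values of my own choosing): the average of the sample mean over many replications lands near $\mu$, and its empirical MSE lands near $\sigma^2/n$.

```python
import random

# Monte Carlo check (illustrative Normal(10, sd=3) population) that the
# sample mean is unbiased and that its MSE matches sigma^2 / n.
random.seed(2)

MU, SIGMA, N, REPS = 10.0, 3.0, 25, 100_000

means = []
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    means.append(sum(sample) / N)

avg = sum(means) / REPS                       # should be close to MU
mse = sum((m - MU) ** 2 for m in means) / REPS  # should be close to SIGMA^2 / N

print(f"average of xbar = {avg:.4f} (true mean {MU})")
print(f"empirical MSE   = {mse:.5f}, sigma^2/n = {SIGMA**2 / N:.5f}")
```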
Variance
Example
The usual estimator for the variance is the corrected sample variance

$$S^2_{n-1} = \frac{1}{n-1}\sum_{i=1}^{n}\big(X_i - \bar{X}\big)^2.$$

This is unbiased (its expected value is $\sigma^2$), and its MSE is

$$\mathrm{MSE}\big(S^2_{n-1}\big) = \frac{1}{n}\left(\mu_4 - \frac{n-3}{n-1}\,\sigma^4\right),$$

where $\mu_4$ is the fourth central moment of the distribution or population.
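This formula can also be checked by simulation. The sketch below assumes a Uniform(0, 1) population (my choice, to exercise a non-Normal $\mu_4$), for which $\sigma^2 = 1/12$ and $\mu_4 = 1/80$, and compares the empirical MSE of $S^2_{n-1}$ against the closed form above.

```python
import random

# Monte Carlo check (illustrative Uniform(0, 1) population) of
# MSE(S^2) = (1/n) * (mu4 - (n-3)/(n-1) * sigma^4)
# for the corrected sample variance S^2.
random.seed(3)

N, REPS = 10, 200_000
SIGMA2 = 1.0 / 12.0   # variance of Uniform(0, 1)
MU4 = 1.0 / 80.0      # fourth central moment of Uniform(0, 1)

total_sq_err = 0.0
for _ in range(REPS):
    sample = [random.random() for _ in range(N)]
    xbar = sum(sample) / N
    s2 = sum((x - xbar) ** 2 for x in sample) / (N - 1)  # corrected sample variance
    total_sq_err += (s2 - SIGMA2) ** 2

empirical_mse = total_sq_err / REPS
theoretical_mse = (MU4 - (N - 3) / (N - 1) * SIGMA2 ** 2) / N

print(f"empirical MSE   = {empirical_mse:.6e}")
print(f"theoretical MSE = {theoretical_mse:.6e}")
```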