Variance and Covariance

Variance

The variance measures the spread of a distribution.

Definition

Let $X$ be a random variable with mean $\mu$. The variance of $X$, denoted by $\sigma^2$ or $\sigma_X^2$ or $\operatorname{Var}(X)$, is defined by

$$\operatorname{Var}(X) = \mathbb{E}\big[(X - \mu)^2\big],$$

assuming this expectation exists. The standard deviation is $\sqrt{\operatorname{Var}(X)}$, and is denoted by $\sigma$ and $\sigma_X$.

We can’t use $\mathbb{E}(X - \mu)$ as a measure of spread, since $\mathbb{E}(X - \mu) = 0$. We can sometimes use $\mathbb{E}|X - \mu|$ as a measure of spread, but often we use the variance.

Theorem

Assuming the variance is well defined, it has the following properties:

  1. $\operatorname{Var}(X) = \mathbb{E}(X^2) - \mu^2$.
  2. If $a$ and $b$ are constants, then $\operatorname{Var}(aX + b) = a^2 \operatorname{Var}(X)$.
  3. If $X_1, \dots, X_n$ are independent and $a_1, \dots, a_n$ are constants, then
     $$\operatorname{Var}\left(\sum_{i=1}^n a_i X_i\right) = \sum_{i=1}^n a_i^2 \operatorname{Var}(X_i).$$
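
The properties above can be checked numerically. Below is a minimal sketch that computes expectations exactly over a small discrete distribution; the particular values, probabilities, and constants $a$, $b$ are made-up for illustration.

```python
# Check Var(X) = E(X^2) - mu^2 and Var(aX + b) = a^2 Var(X)
# exactly, on an arbitrary small discrete distribution.

values = [0, 1, 2, 3]
probs = [0.1, 0.4, 0.3, 0.2]

def expectation(f):
    """E[f(X)] for the discrete distribution above."""
    return sum(p * f(v) for v, p in zip(values, probs))

mu = expectation(lambda x: x)                    # E(X)
var_x = expectation(lambda x: (x - mu) ** 2)     # Var(X) from the definition

# Property 1: the shortcut formula gives the same number.
var_shortcut = expectation(lambda x: x * x) - mu ** 2

# Property 2: a linear transform scales the variance by a^2 (b drops out).
a, b = 3.0, -7.0
mu_lin = expectation(lambda x: a * x + b)
var_lin = expectation(lambda x: (a * x + b - mu_lin) ** 2)

print(var_x, var_shortcut)        # equal
print(var_lin, a ** 2 * var_x)    # equal
```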

Sample Mean and Sample Variance

Definition

If $X_1, \dots, X_n$ are random variables, then we define the sample mean to be

$$\bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i$$

and the sample variance to be

$$S_n^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X}_n)^2.$$

This is the unbiased (or corrected) sample variance. If we divided by $n$ instead, then on average we would underestimate the true variance. A shorter way to see this is to use the identity

$$\mathbb{E}\left[\sum_{i=1}^n (X_i - \bar{X}_n)^2\right] = (n - 1)\sigma^2.$$

To derive it, write

$$X_i - \bar{X}_n = (X_i - \mu) - (\bar{X}_n - \mu),$$

so that, expanding the square and using $\sum_{i=1}^n (X_i - \mu) = n(\bar{X}_n - \mu)$,

$$\sum_{i=1}^n (X_i - \bar{X}_n)^2 = \sum_{i=1}^n (X_i - \mu)^2 - n(\bar{X}_n - \mu)^2.$$

Then

$$\mathbb{E}\left[\sum_{i=1}^n (X_i - \bar{X}_n)^2\right] = n\sigma^2 - n\operatorname{Var}(\bar{X}_n) = n\sigma^2 - \sigma^2 = (n - 1)\sigma^2.$$
The reason is that the data are being measured relative to the sample mean $\bar{X}_n$, not the true mean $\mu$. Since $\bar{X}_n$ is computed from the same sample, it is the value that makes the sample look as centered as possible, so the deviations $X_i - \bar{X}_n$ are typically a bit smaller than the true deviations $X_i - \mu$.

A concrete way to see the loss of one degree of freedom is that the centered values must satisfy

$$\sum_{i=1}^n (X_i - \bar{X}_n) = 0.$$

Once you know any $n - 1$ of these deviations, the last one is forced to be whatever makes the total equal to $0$. So there are really only $n - 1$ independent pieces of variation left after estimating the mean. Dividing by $n - 1$ compensates for this built-in shrinkage, which is why $\mathbb{E}(S_n^2) = \sigma^2$.
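
The underestimation is easy to observe by simulation. The sketch below draws many small samples from Uniform(0, 1), whose true variance is $1/12$, and averages both versions of the sample variance; the sample size, repetition count, and seed are arbitrary choices.

```python
import random

# Averaging the n-divided ("biased") and (n-1)-divided sample variances
# over many repeated samples from Uniform(0, 1), where sigma^2 = 1/12.

random.seed(0)
n, reps = 5, 20000
true_var = 1 / 12

biased_avg = 0.0
unbiased_avg = 0.0
for _ in range(reps):
    xs = [random.random() for _ in range(n)]
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)   # sum of squared deviations
    biased_avg += ss / n
    unbiased_avg += ss / (n - 1)
biased_avg /= reps
unbiased_avg /= reps

print(f"true variance   : {true_var:.4f}")
print(f"divide by n     : {biased_avg:.4f}")    # systematically too small
print(f"divide by n - 1 : {unbiased_avg:.4f}")  # close to 1/12
```

With $n = 5$, dividing by $n$ lands near $(n-1)/n \cdot \sigma^2$, i.e. about 20% too small, exactly the shrinkage the $n - 1$ divisor corrects.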

Theorem

Let $X_1, \dots, X_n$ be IID and let $\mu = \mathbb{E}(X_i)$, $\sigma^2 = \operatorname{Var}(X_i)$. Then,

$$\mathbb{E}(\bar{X}_n) = \mu, \qquad \operatorname{Var}(\bar{X}_n) = \frac{\sigma^2}{n}, \qquad \mathbb{E}(S_n^2) = \sigma^2,$$

i.e., the sample mean and sample variance are unbiased estimators of $\mu$ and $\sigma^2$ respectively.
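
The $\operatorname{Var}(\bar{X}_n) = \sigma^2 / n$ part of the theorem can also be seen by simulation. A minimal sketch, again using Uniform(0, 1) with arbitrary $n$, repetition count, and seed:

```python
import random

# The sample mean of n Uniform(0, 1) draws has mean 1/2 and
# variance (1/12) / n; estimate both from many simulated means.

random.seed(1)
n, reps = 10, 50000
sigma2 = 1 / 12

means = []
for _ in range(reps):
    xs = [random.random() for _ in range(n)]
    means.append(sum(xs) / n)

grand_mean = sum(means) / reps
var_of_mean = sum((m - grand_mean) ** 2 for m in means) / (reps - 1)

print(grand_mean)    # close to mu = 1/2
print(var_of_mean)   # close to sigma^2 / n = 1/120
```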

Covariance and Correlation

If $X$ and $Y$ are random variables, then the covariance and correlation between $X$ and $Y$ measure how strong the linear relationship is between $X$ and $Y$.

Definition

Let $X$ and $Y$ be random variables with means $\mu_X$ and $\mu_Y$, and standard deviations $\sigma_X$ and $\sigma_Y$. Define the covariance between $X$ and $Y$ by

$$\operatorname{Cov}(X, Y) = \mathbb{E}\big[(X - \mu_X)(Y - \mu_Y)\big],$$

and the correlation by

$$\rho = \rho(X, Y) = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y}.$$

Theorem

The covariance satisfies

$$\operatorname{Cov}(X, Y) = \mathbb{E}(XY) - \mathbb{E}(X)\,\mathbb{E}(Y).$$

The correlation satisfies

$$-1 \le \rho(X, Y) \le 1.$$

If $Y = aX + b$ for some constants $a$ and $b$, then $\rho(X, Y) = 1$ if $a > 0$, and $\rho(X, Y) = -1$ if $a < 0$. If $X$ and $Y$ are independent, then $\operatorname{Cov}(X, Y) = \rho(X, Y) = 0$. The converse is not true in general.
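
A standard counterexample to the converse, computed exactly below: take $X$ uniform on $\{-1, 0, 1\}$ and $Y = X^2$. Then $Y$ is a deterministic function of $X$, so the two are clearly dependent, yet their covariance is zero.

```python
# Dependent but uncorrelated: X uniform on {-1, 0, 1}, Y = X^2.

values = [-1, 0, 1]   # each with probability 1/3
p = 1 / 3

e_x = sum(p * x for x in values)              # E(X)   = 0
e_y = sum(p * x * x for x in values)          # E(Y)   = E(X^2) = 2/3
e_xy = sum(p * x * (x * x) for x in values)   # E(XY)  = E(X^3) = 0

cov = e_xy - e_x * e_y   # Cov(X, Y) = E(XY) - E(X)E(Y)
print(cov)               # zero, even though Y is determined by X
```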

Theorem

$\operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y) + 2\operatorname{Cov}(X, Y)$, and $\operatorname{Var}(X - Y) = \operatorname{Var}(X) + \operatorname{Var}(Y) - 2\operatorname{Cov}(X, Y)$. More generally, for random variables $X_1, \dots, X_n$ and constants $a_1, \dots, a_n$,

$$\operatorname{Var}\left(\sum_{i=1}^n a_i X_i\right) = \sum_{i=1}^n a_i^2 \operatorname{Var}(X_i) + 2 \sum_{i < j} a_i a_j \operatorname{Cov}(X_i, X_j).$$
Definition

For a random vector $X = (X_1, \dots, X_n)^T$, the covariance matrix (or variance-covariance matrix) $\Sigma$ is a square matrix giving the covariance between each pair of random variables of the vector $X$. It is defined by

$$\Sigma_{ij} = \operatorname{Cov}(X_i, X_j) = \mathbb{E}\big[(X_i - \mathbb{E}[X_i])(X_j - \mathbb{E}[X_j])\big],$$

or in matrix form, $\Sigma = \mathbb{E}\big[(X - \mathbb{E}[X])(X - \mathbb{E}[X])^T\big]$.

Intuitively, the covariance matrix generalizes the notion of variance to multiple dimensions.
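
As a sketch of the definition, the code below builds $\Sigma$ entry by entry for a tiny dataset, treating each row as an equally likely outcome of the random vector $(X_1, X_2, X_3)$; the data values are arbitrary illustration.

```python
# Covariance matrix Sigma_ij = Cov(X_i, X_j), with each row of `data`
# an equally likely outcome of the random vector (X1, X2, X3).

data = [
    [1.0, 2.0, 0.5],
    [2.0, 1.0, 1.5],
    [3.0, 4.0, 2.5],
    [4.0, 3.0, 3.5],
]
n_obs = len(data)
dim = len(data[0])

# Component means mu_i = E(X_i).
means = [sum(row[j] for row in data) / n_obs for j in range(dim)]

# Sigma_ij = E[(X_i - mu_i)(X_j - mu_j)] over the equally likely outcomes.
sigma = [
    [
        sum((row[i] - means[i]) * (row[j] - means[j]) for row in data) / n_obs
        for j in range(dim)
    ]
    for i in range(dim)
]

for r in sigma:
    print(r)
# The diagonal holds Var(X_i); the matrix is symmetric by construction.
```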