Probability Inequalities

Markov’s Inequality

https://en.wikipedia.org/wiki/Markov%27s_inequality

Markov’s inequality provides a bound on the probability that the realization of a non-negative random variable exceeds a given threshold. It relates probabilities to expectations.

Theorem (Markov's Inequality)

Let $X$ be a non-negative random variable, and suppose that $\mathbb{E}[X]$ exists. For any $t > 0$,

$$\mathbb{P}(X \geq t) \leq \frac{\mathbb{E}[X]}{t}.$$

Proof

Since $X \geq 0$ (writing the argument for a density $f$; the discrete case is analogous),

$$\mathbb{E}[X] = \int_0^\infty x f(x)\,dx \geq \int_t^\infty x f(x)\,dx \geq t \int_t^\infty f(x)\,dx = t\,\mathbb{P}(X \geq t).$$

Suppose $\mathbb{E}[X] = 10$. By Markov’s Inequality, $\mathbb{P}(X \geq 20) \leq 10/20 = 1/2$. The intuition here is that you cannot have more than half of the distribution to the right of 20 and still have enough distribution left to result in an expected value of 10. The further a value is from the expected value, the less density it can carry while keeping the expected value the same. Note that this bound may not always be tight. Markov’s Inequality is also used to prove other inequalities, such as Chebyshev’s Inequality below.
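A quick simulation makes this concrete. The exponential distribution with mean 10 below is an arbitrary choice of non-negative distribution, not something fixed by the example:

```python
import random

random.seed(0)

# Sample a non-negative random variable with E[X] = 10
# (exponential here, but any non-negative distribution works).
n = 100_000
samples = [random.expovariate(1 / 10) for _ in range(n)]

t = 20.0
tail_prob = sum(x >= t for x in samples) / n      # estimate of P(X >= t)
markov_bound = (sum(samples) / n) / t             # E[X] / t, estimated

print(f"P(X >= {t}) ~ {tail_prob:.3f} <= Markov bound {markov_bound:.3f}")
```

For this distribution the true tail probability is $e^{-2} \approx 0.135$, well below the Markov bound of $1/2$: the bound is valid but far from tight.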

Chebyshev’s Inequality

https://en.wikipedia.org/wiki/Chebyshev%27s_inequality

Theorem (Chebyshev's Inequality)

Let $\mu = \mathbb{E}[X]$ and $\sigma^2 = \mathbb{V}(X)$. Then,

$$\mathbb{P}(|X - \mu| \geq t) \leq \frac{\sigma^2}{t^2} \quad \text{and} \quad \mathbb{P}(|Z| \geq k) \leq \frac{1}{k^2},$$

where $Z = (X - \mu)/\sigma$. In particular, $\mathbb{P}(|Z| \geq 2) \leq 1/4$ and $\mathbb{P}(|Z| \geq 3) \leq 1/9$.

One way to think about Chebyshev’s inequality is that the variance is a budget of squared distance from the mean, and a tail event spends that budget very quickly. On the event $\{|X - \mu| \geq t\}$, the squared deviation $(X - \mu)^2$ is at least $t^2$. So if that event happens with probability $p$, it contributes at least $p t^2$ to the average squared deviation:

$$\sigma^2 = \mathbb{E}\left[(X - \mu)^2\right] \geq t^2\, \mathbb{P}(|X - \mu| \geq t).$$

Rearranging gives the result.
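The budget argument can be checked numerically; the standard normal below is just an illustrative choice of distribution:

```python
import random

random.seed(0)

n = 100_000
xs = [random.gauss(0, 1) for _ in range(n)]

mu = sum(xs) / n
var = sum((x - mu) ** 2 for x in xs) / n

t = 2.0
tail_prob = sum(abs(x - mu) >= t for x in xs) / n
# Variance contributed by the tail event alone:
tail_budget = sum((x - mu) ** 2 for x in xs if abs(x - mu) >= t) / n

# The tail spends at least t^2 * P(|X - mu| >= t) of the variance budget,
# and the total budget is var; rearranging gives Chebyshev's bound.
print(f"{t**2 * tail_prob:.3f} <= {tail_budget:.3f} <= {var:.3f}")
```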

Example

Suppose we test a prediction method like a neural network on a set of $n$ new test cases. Let $X_i = 1$ if the prediction is wrong, and $X_i = 0$ if the predictor is correct. Then $\bar{X}_n = n^{-1} \sum_{i=1}^{n} X_i$ is the observed error rate. Each $X_i$ may be regarded as a Bernoulli with unknown mean $p$. We would like to know the true, but unknown, error rate $p$. Intuitively, we expect that $\bar{X}_n$ should be close to $p$. How likely is $\bar{X}_n$ to not be within $\epsilon$ of $p$? We have $\mathbb{V}(\bar{X}_n) = \mathbb{V}(X_1)/n = p(1 - p)/n$, and

$$\mathbb{P}(|\bar{X}_n - p| \geq \epsilon) \leq \frac{\mathbb{V}(\bar{X}_n)}{\epsilon^2} = \frac{p(1 - p)}{n \epsilon^2} \leq \frac{1}{4 n \epsilon^2},$$

since $p(1 - p) \leq \frac{1}{4}$ for all $p \in [0, 1]$. For $\epsilon = 0.2$ and $n = 100$, the bound is $0.0625$.
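Plugging in the numbers from the example (the function name is mine):

```python
def chebyshev_error_bound(n: int, eps: float) -> float:
    """Chebyshev bound on P(|observed error rate - p| >= eps),
    using p(1 - p) <= 1/4 for every Bernoulli parameter p."""
    return 1 / (4 * n * eps ** 2)

print(round(chebyshev_error_bound(n=100, eps=0.2), 4))  # 0.0625
```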

Hoeffding’s Lemma

https://en.wikipedia.org/wiki/Hoeffding%27s_lemma

Hoeffding's Lemma

Let $X$ be any real-valued random variable such that $a \leq X \leq b$ almost surely. Then, for all $\lambda \in \mathbb{R}$,

$$\mathbb{E}\left[e^{\lambda (X - \mathbb{E}[X])}\right] \leq \exp\left(\frac{\lambda^2 (b - a)^2}{8}\right),$$

or equivalently,

$$\mathbb{E}\left[e^{\lambda X}\right] \leq \exp\left(\lambda\, \mathbb{E}[X] + \frac{\lambda^2 (b - a)^2}{8}\right).$$

If $X$ is bounded in $[a, b]$, then the centered variable $X - \mathbb{E}[X]$ has a moment generating function bounded like that of a Gaussian with variance $(b - a)^2 / 4$.

When coming up with bounds, a common trick is to use the Chernoff / exponential Markov template:

  1. Pick what you want to control, usually a sum or mean, e.g. $S_n = \sum_{i=1}^{n} X_i$.
  2. Exponentiate to make it non-negative so that Markov’s Inequality applies. For any $\lambda > 0$,
     $$\mathbb{P}(S_n \geq t) = \mathbb{P}\left(e^{\lambda S_n} \geq e^{\lambda t}\right) \leq e^{-\lambda t}\, \mathbb{E}\left[e^{\lambda S_n}\right].$$
  3. Bound the moment generating function $\mathbb{E}\left[e^{\lambda S_n}\right]$: if the $X_i$ are independent, the expectation of the exponentiated sum factors into a product, $\mathbb{E}\left[e^{\lambda S_n}\right] = \prod_{i=1}^{n} \mathbb{E}\left[e^{\lambda X_i}\right]$. Then we bound each factor using something like Hoeffding’s Lemma.
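The template can be sketched end-to-end for a toy case, a sum of independent Bernoulli(1/2) variables, where the MGF of each factor is available exactly (so no Hoeffding step is needed). The function name and parameters are mine:

```python
import math

def chernoff_bound(n: int, t: float, lam: float) -> float:
    """e^{-lam*t} * E[e^{lam*S_n}] for S_n a sum of n Bernoulli(1/2)."""
    mgf_one = 0.5 * (1 + math.exp(lam))       # E[e^{lam*X}] for one Bernoulli(1/2)
    return math.exp(-lam * t) * mgf_one ** n  # step 2 (Markov) + step 3 (product)

# Step 1: control S_100, asking for P(S_100 >= 70).
# The bound holds for every lam > 0, so take the best over a grid.
best = min(chernoff_bound(100, 70, k / 100) for k in range(1, 300))
print(f"Chernoff bound on P(S_100 >= 70): {best:.2e}")
```

Optimizing over $\lambda$ is the final part of the recipe; the same grid-minimization step reappears analytically in the proof of Hoeffding’s Inequality below.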

Hoeffding’s Inequality

https://en.wikipedia.org/wiki/Hoeffding%27s_inequality

Theorem (Hoeffding's Inequality)

Let $X_1, \ldots, X_n$ be independent observations such that $a_i \leq X_i \leq b_i$ almost surely, and let $\bar{X}_n = n^{-1} \sum_{i=1}^{n} X_i$ with $\mu = \mathbb{E}[\bar{X}_n]$. Then, for any $t > 0$,

$$\mathbb{P}(\bar{X}_n - \mu \geq t) \leq \exp\left(-\frac{2 n^2 t^2}{\sum_{i=1}^{n} (b_i - a_i)^2}\right).$$

Hoeffding’s inequality can be used when we have independent, bounded observations and we want high-probability finite-sample bounds. If the data are unbounded or heavy-tailed, Hoeffding’s inequality is either inapplicable or overly conservative.

Proof

For any $\lambda > 0$, using that $x \mapsto e^{\lambda x}$ is increasing:

$$\mathbb{P}(\bar{X}_n - \mu \geq t) = \mathbb{P}\left(e^{\lambda (\bar{X}_n - \mu)} \geq e^{\lambda t}\right).$$

Applying Markov’s Inequality to the non-negative random variable $e^{\lambda (\bar{X}_n - \mu)}$,

$$\mathbb{P}(\bar{X}_n - \mu \geq t) \leq e^{-\lambda t}\, \mathbb{E}\left[e^{\lambda (\bar{X}_n - \mu)}\right],$$

where $\bar{X}_n - \mu = n^{-1} \sum_{i=1}^{n} Z_i$, with $Z_i = X_i - \mathbb{E}[X_i]$ and $\mathbb{E}[Z_i] = 0$. If the $X_i$ are independent, so are the $Z_i$, and

$$\mathbb{E}\left[e^{(\lambda / n) \sum_{i=1}^{n} Z_i}\right] = \prod_{i=1}^{n} \mathbb{E}\left[e^{(\lambda / n) Z_i}\right].$$

This gives us

$$\mathbb{P}(\bar{X}_n - \mu \geq t) \leq e^{-\lambda t} \prod_{i=1}^{n} \mathbb{E}\left[e^{(\lambda / n) Z_i}\right].$$

We can then apply Hoeffding’s Lemma to $Z_i$. Its range is $[a_i - \mathbb{E}[X_i],\; b_i - \mathbb{E}[X_i]]$, which has the same width $b_i - a_i$, so

$$\mathbb{E}\left[e^{(\lambda / n) Z_i}\right] \leq \exp\left(\frac{\lambda^2 (b_i - a_i)^2}{8 n^2}\right).$$

Plugging this back in gives

$$\mathbb{P}(\bar{X}_n - \mu \geq t) \leq e^{-\lambda t} \prod_{i=1}^{n} \exp\left(\frac{\lambda^2 (b_i - a_i)^2}{8 n^2}\right).$$

Combining the product into an exponential sum:

$$\mathbb{P}(\bar{X}_n - \mu \geq t) \leq \exp\left(-\lambda t + \frac{\lambda^2}{8 n^2} \sum_{i=1}^{n} (b_i - a_i)^2\right).$$

We now choose $\lambda$ to make the bound as small as possible. Let $c = \sum_{i=1}^{n} (b_i - a_i)^2$. We want to minimize the exponent $g(\lambda) = -\lambda t + \frac{\lambda^2 c}{8 n^2}$. Differentiating and setting to zero:

$$g'(\lambda) = -t + \frac{\lambda c}{4 n^2} = 0 \quad \Longrightarrow \quad \lambda^* = \frac{4 n^2 t}{c}.$$

Plugging back in,

$$g(\lambda^*) = -\frac{4 n^2 t^2}{c} + \frac{(4 n^2 t / c)^2\, c}{8 n^2} = -\frac{4 n^2 t^2}{c} + \frac{2 n^2 t^2}{c} = -\frac{2 n^2 t^2}{c}.$$

Therefore,

$$\mathbb{P}(\bar{X}_n - \mu \geq t) \leq \exp\left(-\frac{2 n^2 t^2}{\sum_{i=1}^{n} (b_i - a_i)^2}\right).$$

The two-sided Hoeffding bound can be recovered by applying the same inequality to $Y_i = -X_i$, where $-b_i \leq Y_i \leq -a_i$, so we also get

$$\mathbb{P}(\bar{X}_n - \mu \leq -t) \leq \exp\left(-\frac{2 n^2 t^2}{\sum_{i=1}^{n} (b_i - a_i)^2}\right),$$

and using the union bound:

$$\mathbb{P}(|\bar{X}_n - \mu| \geq t) \leq 2 \exp\left(-\frac{2 n^2 t^2}{\sum_{i=1}^{n} (b_i - a_i)^2}\right).$$

Theorem

Let $X_1, \ldots, X_n \sim \text{Bernoulli}(p)$. Then for any $\epsilon > 0$,

$$\mathbb{P}(|\bar{X}_n - p| > \epsilon) \leq 2 e^{-2 n \epsilon^2},$$

where $\bar{X}_n = n^{-1} \sum_{i=1}^{n} X_i$.
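A small simulation (the parameters are arbitrary) confirms that the empirical two-sided tail sits below $2 e^{-2 n \epsilon^2}$:

```python
import math
import random

random.seed(0)

n, p, eps, trials = 50, 0.5, 0.1, 20_000
hoeffding = 2 * math.exp(-2 * n * eps ** 2)

exceed = 0
for _ in range(trials):
    xbar = sum(random.random() < p for _ in range(n)) / n  # Bernoulli(p) mean
    exceed += abs(xbar - p) > eps
empirical = exceed / trials

print(f"empirical {empirical:.3f} <= Hoeffding bound {hoeffding:.3f}")
```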

From Hoeffding to Confidence Intervals

Hoeffding’s inequality also gives a simple way of creating a confidence interval for a binomial parameter $p$. Fix $\alpha > 0$ and let

$$\epsilon_n = \sqrt{\frac{1}{2n} \log\left(\frac{2}{\alpha}\right)}.$$

By Hoeffding’s inequality,

$$\mathbb{P}(|\bar{X}_n - p| > \epsilon_n) \leq 2 e^{-2 n \epsilon_n^2} = \alpha.$$

Let $C = (\bar{X}_n - \epsilon_n,\, \bar{X}_n + \epsilon_n)$. Then $\mathbb{P}(p \notin C) = \mathbb{P}(|\bar{X}_n - p| > \epsilon_n) \leq \alpha$. Hence, $\mathbb{P}(p \in C) \geq 1 - \alpha$, and we call $C$ a $1 - \alpha$ confidence interval.
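As a sketch, the interval can be computed directly (the function name and example inputs are mine):

```python
import math

def hoeffding_interval(xbar: float, n: int, alpha: float = 0.05):
    """1 - alpha confidence interval for a Bernoulli mean via Hoeffding."""
    eps = math.sqrt(math.log(2 / alpha) / (2 * n))
    return xbar - eps, xbar + eps

# E.g. an observed error rate of 0.30 on 1000 test cases:
lo, hi = hoeffding_interval(xbar=0.30, n=1000)
print(f"95% CI: ({lo:.3f}, {hi:.3f})")
```

Note that the interval width depends only on $n$ and $\alpha$, not on $\bar{X}_n$, which is what makes the guarantee distribution-free but somewhat conservative.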

Mill’s Inequality

The following inequality is useful for bounding probability statements about Normal random variables.

Theorem (Mill's Inequality)

Let $Z \sim N(0, 1)$. Then,

$$\mathbb{P}(|Z| > t) \leq \sqrt{\frac{2}{\pi}}\, \frac{e^{-t^2 / 2}}{t}.$$
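Taking Mill’s bound in the form $\sqrt{2/\pi}\, e^{-t^2/2} / t$ for $\mathbb{P}(|Z| > t)$, we can compare it against the exact Gaussian tail computed with the complementary error function:

```python
import math

def mills_bound(t: float) -> float:
    """Mill's inequality bound on P(|Z| > t) for Z ~ N(0, 1)."""
    return math.sqrt(2 / math.pi) * math.exp(-t ** 2 / 2) / t

def exact_two_sided_tail(t: float) -> float:
    """Exact P(|Z| > t) = erfc(t / sqrt(2))."""
    return math.erfc(t / math.sqrt(2))

for t in (1.0, 2.0, 3.0):
    print(f"t={t}: exact {exact_two_sided_tail(t):.5f}"
          f" <= bound {mills_bound(t):.5f}")
```

Unlike Chebyshev, the bound decays like $e^{-t^2/2}$, which is the right order for Gaussian tails; it becomes quite sharp as $t$ grows.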