Probability Inequalities
Markov’s Inequality
https://en.wikipedia.org/wiki/Markov%27s_inequality
Markov’s inequality provides a bound on the probability that the realization of a non-negative random variable exceeds a given threshold. It relates probabilities to expectations.
Theorem (Markov's Inequality)
Let $X$ be a non-negative random variable, and suppose that $\mathbb{E}(X)$ exists. For any $t > 0$,
$$\mathbb{P}(X \ge t) \le \frac{\mathbb{E}(X)}{t}.$$
Proof
Since $X \ge 0$,
$$\mathbb{E}(X) = \int_0^\infty x f(x)\,dx \ge \int_t^\infty x f(x)\,dx \ge t \int_t^\infty f(x)\,dx = t\,\mathbb{P}(X \ge t).$$
Suppose $\mathbb{E}(X) = 10$. By Markov’s Inequality, $\mathbb{P}(X \ge 20) \le 10/20 = 1/2$. The intuition here is that you cannot have more than half of the distribution to the right of 20 and still have enough distribution left to result in an expected value of 10. The further a value is from the expected value, the less density it can carry while keeping the expected value the same. Note that this bound may not always be tight. Markov’s Inequality is also used to prove other inequalities, such as Chebyshev’s Inequality below.
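As a quick numerical sanity check (a minimal sketch; the exponential distribution and sample size are arbitrary illustrative choices), we can compare the empirical tail probability to the Markov bound:

```python
import numpy as np

rng = np.random.default_rng(0)

# Non-negative variable with E[X] = 10; the exponential distribution is an
# arbitrary choice for illustration.
x = rng.exponential(scale=10.0, size=1_000_000)

t = 20.0
empirical = (x >= t).mean()  # fraction of samples at or above the threshold
print(f"P(X >= {t}): empirical ~ {empirical:.4f}, Markov bound = {10.0 / t:.2f}")
```

For this distribution the true tail probability is $e^{-2} \approx 0.135$, well under the bound of $0.5$, illustrating that Markov’s bound is valid but often loose.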
Chebyshev’s Inequality
https://en.wikipedia.org/wiki/Chebyshev%27s_inequality
Theorem (Chebyshev's Inequality)
Let $\mu = \mathbb{E}(X)$ and $\sigma^2 = \mathbb{V}(X)$. Then,
$$\mathbb{P}(|X - \mu| \ge t) \le \frac{\sigma^2}{t^2} \quad \text{and} \quad \mathbb{P}(|Z| \ge k) \le \frac{1}{k^2},$$
where $Z = (X - \mu)/\sigma$. In particular, $\mathbb{P}(|Z| > 2) \le 1/4$ and $\mathbb{P}(|Z| > 3) \le 1/9$.
One way to think about Chebyshev’s inequality is that the variance is a budget of squared distance from the mean, and a tail event spends that budget very quickly. On the event $\{|X - \mu| \ge t\}$, the squared deviation $(X - \mu)^2$ is at least $t^2$. So if that event happens with probability $p$, it contributes at least $p t^2$ to the average squared deviation:
$$\sigma^2 = \mathbb{E}\left[(X - \mu)^2\right] \ge t^2\,\mathbb{P}(|X - \mu| \ge t).$$
Rearranging gives the result.
Example
Suppose we test a prediction method, like a neural network, on a set of $n$ new test cases. Let $X_i = 1$ if the prediction is wrong, and $X_i = 0$ if the prediction is correct. Then $\bar{X}_n = n^{-1} \sum_{i=1}^n X_i$ is the observed error rate. Each $X_i$ may be regarded as a Bernoulli with unknown mean $p$. We would like to know the true, but unknown, error rate $p$. Intuitively, we expect that $\bar{X}_n$ should be close to $p$. How likely is $\bar{X}_n$ to not be within $\epsilon$ of $p$? We have $\mathbb{V}(\bar{X}_n) = \mathbb{V}(X_1)/n = p(1 - p)/n$, and
$$\mathbb{P}\left(|\bar{X}_n - p| \ge \epsilon\right) \le \frac{\mathbb{V}(\bar{X}_n)}{\epsilon^2} = \frac{p(1 - p)}{n \epsilon^2} \le \frac{1}{4 n \epsilon^2},$$
since $p(1 - p) \le \frac{1}{4}$ for all $p$. For $\epsilon = 0.2$ and $n = 100$, the bound is $0.0625$.
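A small simulation of this example (a sketch; $p = 0.5$ is chosen because it is the worst case for $p(1-p)$):

```python
import numpy as np

rng = np.random.default_rng(0)
n, eps, p = 100, 0.2, 0.5  # p = 0.5 maximizes p * (1 - p)

# 100,000 replications of the observed error rate on n test cases.
x_bar = rng.binomial(1, p, size=(100_000, n)).mean(axis=1)
empirical = (np.abs(x_bar - p) >= eps).mean()
print(f"empirical ~ {empirical:.5f}, Chebyshev bound = {1 / (4 * n * eps**2):.4f}")
```

The empirical tail probability comes out orders of magnitude below $0.0625$, which foreshadows the much sharper exponential bound from Hoeffding’s inequality below.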
Hoeffding’s Lemma
https://en.wikipedia.org/wiki/Hoeffding%27s_lemma
Lemma (Hoeffding's Lemma)
Let $X$ be any real-valued random variable such that $a \le X \le b$ almost surely. Then, for all $\lambda \in \mathbb{R}$,
$$\mathbb{E}\left[e^{\lambda (X - \mathbb{E}[X])}\right] \le \exp\left(\frac{\lambda^2 (b - a)^2}{8}\right),$$
or equivalently,
$$\mathbb{E}\left[e^{\lambda X}\right] \le e^{\lambda \mathbb{E}[X]} \exp\left(\frac{\lambda^2 (b - a)^2}{8}\right).$$
If $X$ is bounded in $[a, b]$, then the centered variable $X - \mathbb{E}[X]$ has an MGF bounded like that of a Gaussian with variance $(b - a)^2/4$.
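A tiny numeric check of the lemma for a Bernoulli variable, where $[a, b] = [0, 1]$ and hence $(b-a)^2/8 = 1/8$ (a sketch; the particular $p$ and grid of $\lambda$ values are arbitrary):

```python
import numpy as np

p = 0.3  # Bernoulli mean; range [0, 1], so the lemma's bound is exp(lam^2 / 8)
for lam in [-2.0, -0.5, 0.5, 2.0]:
    # E[exp(lam * (X - p))] computed exactly over the two outcomes 0 and 1.
    mgf = (1 - p) * np.exp(lam * (0 - p)) + p * np.exp(lam * (1 - p))
    bound = np.exp(lam**2 / 8)
    print(f"lam={lam:+.1f}: MGF = {mgf:.4f} <= bound = {bound:.4f}")
```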
When coming up with bounds, a common trick is to use the Chernoff / exponential Markov template:
- Pick the quantity you want to control, usually a centered sum or mean, e.g. $S_n = \sum_{i=1}^n (X_i - \mathbb{E}[X_i])$.
- Exponentiate to make it non-negative, so that Markov's Inequality applies. For any $\lambda > 0$, $\mathbb{P}(S_n \ge t) = \mathbb{P}\left(e^{\lambda S_n} \ge e^{\lambda t}\right) \le e^{-\lambda t}\,\mathbb{E}\left[e^{\lambda S_n}\right]$.
- Bound the moment generating function $\mathbb{E}\left[e^{\lambda S_n}\right]$: if the $X_i$ are independent, the exponentiated sum factors into a product, $\mathbb{E}\left[e^{\lambda S_n}\right] = \prod_{i=1}^n \mathbb{E}\left[e^{\lambda (X_i - \mathbb{E}[X_i])}\right]$. Then we bound each factor using something like Hoeffding’s Lemma and optimize over $\lambda$, as in the sketch after this list.
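Putting the three steps together numerically (a sketch; the Bernoulli setup, threshold, and grid search over $\lambda$ are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, t = 100, 0.5, 10.0  # bound P(S_n >= t) for S_n = sum of centered Bernoullis

# Step 3 via Hoeffding's Lemma (range width 1): E[exp(lam * S_n)] <= exp(n * lam^2 / 8).
# Steps 1-2 give P(S_n >= t) <= exp(-lam * t + n * lam^2 / 8); minimize over lam > 0.
lams = np.linspace(0.01, 2.0, 200)
bound = np.min(np.exp(-lams * t + n * lams**2 / 8))

# Compare with a direct simulation of the centered sum.
s = (rng.binomial(1, p, size=(100_000, n)) - p).sum(axis=1)
print(f"empirical P(S_n >= {t}) ~ {(s >= t).mean():.5f}, Chernoff bound = {bound:.5f}")
```

The grid minimum lands near the analytic optimum $\lambda^* = 4t/n$, giving the familiar $e^{-2t^2/n}$ bound.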
Hoeffding’s Inequality
https://en.wikipedia.org/wiki/Hoeffding%27s_inequality
Theorem (Hoeffding's Inequality)
Let $X_1, \ldots, X_n$ be independent observations such that $a \le X_i \le b$ and $\mathbb{E}(X_i) = \mu$. Let $\bar{X}_n = n^{-1} \sum_{i=1}^n X_i$. Then, for any $\epsilon > 0$,
$$\mathbb{P}\left(\bar{X}_n - \mu \ge \epsilon\right) \le \exp\left(-\frac{2 n \epsilon^2}{(b - a)^2}\right).$$
Hoeffding’s inequality can be used when we have independent, bounded observations and we want high-probability finite-sample bounds. If the data is unbounded or heavy-tailed, Hoeffding’s inequality can be invalid or overly conservative.
Proof
For any $s > 0$, using that $x \mapsto e^{sx}$ is increasing:
$$\mathbb{P}\left(\bar{X}_n - \mu \ge \epsilon\right) = \mathbb{P}\left(e^{s(\bar{X}_n - \mu)} \ge e^{s\epsilon}\right).$$
Applying Markov’s Inequality to the non-negative random variable $e^{s(\bar{X}_n - \mu)}$,
$$\mathbb{P}\left(e^{s(\bar{X}_n - \mu)} \ge e^{s\epsilon}\right) \le e^{-s\epsilon}\,\mathbb{E}\left[e^{s(\bar{X}_n - \mu)}\right],$$
where $\bar{X}_n - \mu = \frac{1}{n} \sum_{i=1}^n Z_i$, with $Z_i = X_i - \mu$ and $\mathbb{E}[Z_i] = 0$. If the $X_i$ are independent, so are the $Z_i$, and
$$\mathbb{E}\left[e^{s(\bar{X}_n - \mu)}\right] = \mathbb{E}\left[\prod_{i=1}^n e^{(s/n) Z_i}\right] = \prod_{i=1}^n \mathbb{E}\left[e^{(s/n) Z_i}\right].$$
This gives us
$$\mathbb{P}\left(\bar{X}_n - \mu \ge \epsilon\right) \le e^{-s\epsilon} \prod_{i=1}^n \mathbb{E}\left[e^{(s/n) Z_i}\right].$$
We can then apply Hoeffding’s Lemma to each $Z_i$. Its range is $[a - \mu, b - \mu]$, which has width $b - a$, so
$$\mathbb{E}\left[e^{(s/n) Z_i}\right] \le \exp\left(\frac{(s/n)^2 (b - a)^2}{8}\right).$$
Plugging this back in gives
$$\mathbb{P}\left(\bar{X}_n - \mu \ge \epsilon\right) \le e^{-s\epsilon} \prod_{i=1}^n \exp\left(\frac{s^2 (b - a)^2}{8 n^2}\right).$$
Combining the product into an exponential sum:
$$\mathbb{P}\left(\bar{X}_n - \mu \ge \epsilon\right) \le \exp\left(-s\epsilon + \frac{s^2 (b - a)^2}{8 n}\right).$$
We now choose $s$ to make the bound as small as possible. Let $g(s) = -s\epsilon + \frac{s^2 (b - a)^2}{8 n}$; we want to minimize the exponent $g(s)$. Differentiating and setting to zero:
$$g'(s) = -\epsilon + \frac{s (b - a)^2}{4 n} = 0 \quad \Longrightarrow \quad s^* = \frac{4 n \epsilon}{(b - a)^2}.$$
Plugging back in,
$$g(s^*) = -\frac{4 n \epsilon^2}{(b - a)^2} + \frac{2 n \epsilon^2}{(b - a)^2} = -\frac{2 n \epsilon^2}{(b - a)^2}.$$
Therefore,
$$\mathbb{P}\left(\bar{X}_n - \mu \ge \epsilon\right) \le \exp\left(-\frac{2 n \epsilon^2}{(b - a)^2}\right).$$
The two-sided Hoeffding bound can be easily recovered by applying the same inequality to $Y_i = -X_i$, where $\mathbb{E}(Y_i) = -\mu$, so we also get
$$\mathbb{P}\left(\bar{X}_n - \mu \le -\epsilon\right) \le \exp\left(-\frac{2 n \epsilon^2}{(b - a)^2}\right),$$
and using the union bound:
$$\mathbb{P}\left(|\bar{X}_n - \mu| \ge \epsilon\right) \le 2 \exp\left(-\frac{2 n \epsilon^2}{(b - a)^2}\right).$$
Theorem
Let $X_1, \ldots, X_n \sim \text{Bernoulli}(p)$. Then, for any $\epsilon > 0$,
$$\mathbb{P}\left(|\bar{X}_n - p| > \epsilon\right) \le 2 e^{-2 n \epsilon^2},$$
where $\bar{X}_n = n^{-1} \sum_{i=1}^n X_i$.
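Compared with the Chebyshev bound from the earlier error-rate example (same $\epsilon = 0.2$ and $n = 100$), this exponential bound is dramatically sharper; a quick evaluation:

```python
import math

n, eps = 100, 0.2
chebyshev = 1 / (4 * n * eps**2)           # uses p * (1 - p) <= 1/4
hoeffding = 2 * math.exp(-2 * n * eps**2)  # Bernoulli Hoeffding bound
print(f"Chebyshev bound: {chebyshev:.4f}")   # 0.0625
print(f"Hoeffding bound: {hoeffding:.6f}")   # ~0.000671
```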
From Hoeffding to Confidence Intervals
Hoeffding’s inequality also gives a simple way of creating a confidence interval for a binomial parameter $p$. Fix $\alpha > 0$ and let
$$\epsilon_n = \sqrt{\frac{1}{2n} \log\left(\frac{2}{\alpha}\right)}.$$
By Hoeffding’s inequality,
$$\mathbb{P}\left(|\bar{X}_n - p| > \epsilon_n\right) \le 2 e^{-2 n \epsilon_n^2} = \alpha.$$
Let $C = (\bar{X}_n - \epsilon_n, \bar{X}_n + \epsilon_n)$. Then $\mathbb{P}(p \notin C) = \mathbb{P}(|\bar{X}_n - p| > \epsilon_n) \le \alpha$. Hence, $\mathbb{P}(p \in C) \ge 1 - \alpha$, and we call $C$ a $1 - \alpha$ confidence interval.
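A minimal sketch of this interval in code (the function name and the simulated Bernoulli data are illustrative, not part of any library):

```python
import numpy as np

def hoeffding_ci(x_bar: float, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """1 - alpha confidence interval for a Bernoulli mean via Hoeffding."""
    eps = np.sqrt(np.log(2 / alpha) / (2 * n))
    return x_bar - eps, x_bar + eps

rng = np.random.default_rng(0)
p_true, n = 0.3, 500               # hypothetical true error rate and sample size
x_bar = rng.binomial(1, p_true, size=n).mean()
lo, hi = hoeffding_ci(x_bar, n)
print(f"x_bar = {x_bar:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

Note that this interval is distribution-free over $[0, 1]$-valued data, so it is typically wider than intervals built from a Normal approximation.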
Mill’s Inequality
The following inequality is useful for bounding probability statements about Normal random variables.
Theorem (Mill's Inequality)
Let $Z \sim N(0, 1)$. Then,
$$\mathbb{P}(|Z| > t) \le \sqrt{\frac{2}{\pi}}\,\frac{e^{-t^2/2}}{t}.$$
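As a quick check of how tight this bound is (a sketch, assuming SciPy is available for the exact Normal tail):

```python
import numpy as np
from scipy.stats import norm

for t in [1.0, 2.0, 3.0]:
    exact = 2 * (1 - norm.cdf(t))                        # P(|Z| > t)
    mills = np.sqrt(2 / np.pi) * np.exp(-t**2 / 2) / t   # Mill's bound
    print(f"t = {t}: exact = {exact:.5f}, Mill's bound = {mills:.5f}")
```

Unlike Markov or Chebyshev, the bound decays like the true Gaussian tail, so it becomes quite tight for large $t$.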