Entropy

The core idea of information theory is that the informational value of a communicated message depends on the degree to which the content of the message is surprising. If a highly likely event occurs, the message carries little information. On the other hand, if a highly unlikely event occurs, the message is much more informative.

The information content of an event, also called its surprise or self-information, is a function of the event's probability that increases as that probability decreases.

Definition (Self-Information)

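The information, or surprise, of an event $E$ with probability $p(E)$ is defined by

$$I(E) = -\log p(E)$$

or equivalently,

$$I(E) = \log \frac{1}{p(E)}.$$

Here $\log$ is the base-2 logarithm, so information is measured in bits. Any other base only rescales the result (base $e$ gives nats).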

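The logarithm gives zero surprise when the probability of the event is 1. In fact, up to the choice of base, $-\log p$ is the only function of the probability $p$ that is monotonically decreasing, equals zero at $p = 1$, and is additive over independent events, $I(p_1 p_2) = I(p_1) + I(p_2)$.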
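A minimal Python sketch of self-information, assuming base-2 logarithms so the result is in bits; the function name `self_information` is our own, not a library function. The checks confirm that a certain event has zero surprise, that rarer events are more surprising, and that surprise adds up over independent events.

```python
import math

def self_information(p: float) -> float:
    """Surprise of an event with probability p, in bits: log2(1/p) = -log2(p).

    Hypothetical helper for illustration; assumes base-2 logarithms.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("probability must lie in (0, 1]")
    return math.log2(1.0 / p)

# A certain event carries no surprise.
assert self_information(1.0) == 0.0
# Rarer events are more surprising.
assert self_information(0.01) > self_information(0.5)
# Surprise adds up over independent events: I(p1 * p2) = I(p1) + I(p2).
p1, p2 = 0.5, 0.125
assert math.isclose(self_information(p1 * p2), self_information(p1) + self_information(p2))

print(self_information(0.5), self_information(0.125))  # 1.0 3.0
```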

Definition (Entropy)

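The entropy of a random variable $X$ with distribution $p$, denoted $H(X)$, is a measure of its uncertainty. The entropy of a discrete random variable $X$, which takes values in the set $\mathcal{X}$ and is distributed according to $p: \mathcal{X} \to [0, 1]$ such that $p(x) = \Pr[X = x]$, is the expected value of its information content:

$$H(X) = \mathbb{E}[I(X)] = \mathbb{E}[-\log p(X)].$$

Note that $I(X)$ is itself a random variable. The entropy can be explicitly written as

$$H(X) = -\sum_{x \in \mathcal{X}} p(x) \log p(x),$$

with the convention $0 \log 0 = 0$.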
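A minimal sketch of discrete entropy in Python, assuming the distribution is given as a list of probabilities and measuring in bits; `entropy` is a hypothetical helper name, not a library function. It uses the equivalent form $p \log(1/p)$ and skips zero-probability outcomes, following the $0 \log 0 = 0$ convention.

```python
import math
from typing import Sequence

def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in bits: sum_x p(x) * log2(1 / p(x)) = -sum_x p(x) log2 p(x).

    Zero-probability outcomes are skipped (the convention 0 * log 0 = 0).
    """
    if any(p < 0 for p in probs) or not math.isclose(sum(probs), 1.0):
        raise ValueError("probs must be non-negative and sum to 1")
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))          # 1.0  (fair coin)
print(entropy([1.0, 0.0]))          # 0.0  (certain outcome)
print(entropy([0.5, 0.25, 0.25]))   # 1.5
print(entropy([0.25] * 4))          # 2.0  (uniform over four outcomes)
print(entropy([0.9, 0.1]) < 1.0)    # True (a biased coin is less uncertain)
```

A fair coin has one bit of entropy and a certain outcome has none; a biased coin falls in between.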

For a continuous random variable $X$ with probability density function $f$ supported on a set $\mathcal{X}$, the differential entropy (or continuous entropy) is given by

$$h(X) = -\int_{\mathcal{X}} f(x) \log f(x) \, dx.$$
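A quick numerical check helps make the integral concrete. The sketch below, with hypothetical helper names, approximates $h(X)$ in bits with the midpoint rule (a numerical method chosen here for simplicity) and compares a normal distribution $N(0, \sigma^2)$ against the standard closed form $\tfrac{1}{2}\log_2(2\pi e \sigma^2)$. The integration range, step count, and $\sigma = 2$ are illustrative choices.

```python
import math

def gaussian_logpdf(x: float, mu: float, sigma: float) -> float:
    """Natural log of the N(mu, sigma^2) density."""
    return -((x - mu) ** 2) / (2 * sigma ** 2) - math.log(sigma * math.sqrt(2 * math.pi))

def differential_entropy(log_pdf, lo: float, hi: float, n: int = 100_000) -> float:
    """Midpoint-rule estimate of h = -∫ f(x) log2 f(x) dx over [lo, hi], in bits.

    log_pdf returns the natural log of the density f; working in log space
    avoids underflow in the tails. Hypothetical helper for illustration.
    """
    dx = (hi - lo) / n
    total = 0.0
    for i in range(n):
        log_f = log_pdf(lo + (i + 0.5) * dx)
        total -= math.exp(log_f) * log_f * dx
    return total / math.log(2)  # convert nats to bits

# Normal distribution: compare with the closed form 0.5 * log2(2 * pi * e * sigma^2).
sigma = 2.0
numeric = differential_entropy(lambda x: gaussian_logpdf(x, 0.0, sigma), -12 * sigma, 12 * sigma)
closed_form = 0.5 * math.log2(2 * math.pi * math.e * sigma ** 2)

# Uniform density on [0, 0.5]: f(x) = 2, so h = -log2(2) = -1 bit.
uniform_h = differential_entropy(lambda x: math.log(2.0), 0.0, 0.5)

print(numeric, closed_form)
print(uniform_h)
```

The uniform example shows that, unlike discrete entropy, differential entropy can be negative: a density confined to an interval shorter than 1 must exceed 1 somewhere, making $\log f(x)$ positive there.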

Cross-Entropy

The cross-entropy between two probability distributions $p$ and $q$ over the same underlying set of events measures the average number of bits needed to identify an event drawn from the set when the coding scheme used for the set is optimized for an estimated probability distribution $q$ rather than the true distribution $p$.
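The coding interpretation can be checked by hand with distributions whose probabilities are powers of 1/2, so that the ideal codeword lengths $-\log_2 q(x)$ are whole numbers. In this illustrative sketch (the distributions and codes are made up for the example), a prefix-free code built for $q$ is used on events drawn from $p$. Its average length matches $-\sum_x p(x) \log_2 q(x)$ and exceeds the average length of a code built for $p$ itself.

```python
import math

# Illustrative true distribution p and a mismatched estimate q over the same events.
p = {"a": 0.5, "b": 0.25, "c": 0.25}
q = {"a": 0.25, "b": 0.25, "c": 0.5}

# Prefix-free codes whose codeword lengths equal -log2 of the probabilities
# they were designed for: code_for_p is optimal for p, code_for_q for q.
code_for_p = {"a": "0", "b": "10", "c": "11"}
code_for_q = {"a": "00", "b": "01", "c": "1"}

def expected_length(code, dist):
    """Average number of bits per event when events are drawn from dist."""
    return sum(dist[x] * len(code[x]) for x in dist)

# Check that the codeword lengths really are -log2 of the design probabilities.
assert all(len(code_for_p[x]) == -math.log2(p[x]) for x in p)
assert all(len(code_for_q[x]) == -math.log2(q[x]) for x in q)

# Cross-entropy of q relative to p, computed directly from the probabilities.
cross_entropy_pq = -sum(p[x] * math.log2(q[x]) for x in p)

print(expected_length(code_for_p, p))  # 1.5  (the entropy of p)
print(expected_length(code_for_q, p))  # 1.75 (code optimized for q, events from p)
print(cross_entropy_pq)                # 1.75
```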

Definition (Cross-Entropy)

The cross-entropy of the distribution $q$ relative to a distribution $p$ over a given set is defined as

$$H(p, q) = -\mathbb{E}_p[\log q].$$

For discrete distributions $p$ and $q$ with the same support $\mathcal{X}$,

$$H(p, q) = -\sum_{x \in \mathcal{X}} p(x) \log q(x).$$
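A direct Python transcription of the discrete formula, as a sketch: probabilities are passed as equal-length lists indexed by the same outcomes, and results are in bits. `cross_entropy` is a hypothetical name. Terms with $p(x) = 0$ are skipped, and the result is infinite when $q$ assigns zero probability to an outcome that $p$ can produce.

```python
import math
from typing import Sequence

def cross_entropy(p: Sequence[float], q: Sequence[float]) -> float:
    """Discrete cross-entropy H(p, q) = -sum_x p(x) log2 q(x), in bits.

    p and q are probability vectors over the same outcomes, in the same order.
    """
    if len(p) != len(q):
        raise ValueError("p and q must be defined over the same outcomes")
    total = 0.0
    for px, qx in zip(p, q):
        if px == 0:
            continue          # outcome never occurs under p: contributes nothing
        if qx == 0:
            return math.inf   # q cannot encode an outcome that p produces
        total += px * math.log2(1.0 / qx)
    return total

p = [0.5, 0.25, 0.25]
q = [0.25, 0.25, 0.5]
print(cross_entropy(p, p))  # 1.5  (H(p, p) is the entropy of p)
print(cross_entropy(p, q))  # 1.75

# Cross-entropy is not symmetric in its arguments.
certain, fair = [1.0, 0.0], [0.5, 0.5]
print(cross_entropy(certain, fair))  # 1.0
print(cross_entropy(fair, certain))  # inf
```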

The situation for continuous distributions is analogous:

$$H(p, q) = -\int_{\mathcal{X}} P(x) \log Q(x) \, dx,$$

where $P$ and $Q$ are the probability density functions of $p$ and $q$, respectively.
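The continuous formula can likewise be approximated numerically. This sketch uses hypothetical helper names, the midpoint rule, and illustrative parameters $p = N(0, 1)$ and $q = N(1, 1.5^2)$. It compares the estimate with the closed form for two normal densities, $\tfrac{1}{2}\ln(2\pi\sigma_q^2) + \frac{\sigma_p^2 + (\mu_p - \mu_q)^2}{2\sigma_q^2}$ nats, converted to bits by dividing by $\ln 2$.

```python
import math

def gaussian_logpdf(x, mu, sigma):
    """Natural log of the N(mu, sigma^2) density; log space avoids tail underflow."""
    return -((x - mu) ** 2) / (2 * sigma ** 2) - math.log(sigma * math.sqrt(2 * math.pi))

def continuous_cross_entropy(log_P, log_Q, lo, hi, n=100_000):
    """Midpoint-rule estimate of H(p, q) = -∫ P(x) log2 Q(x) dx over [lo, hi], in bits.

    log_P and log_Q return natural logs of the densities P and Q.
    Hypothetical helper for illustration.
    """
    dx = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * dx
        total -= math.exp(log_P(x)) * log_Q(x) * dx
    return total / math.log(2)  # convert nats to bits

mu_p, sigma_p = 0.0, 1.0   # true distribution p = N(0, 1)
mu_q, sigma_q = 1.0, 1.5   # estimated distribution q = N(1, 1.5^2)

numeric = continuous_cross_entropy(
    lambda x: gaussian_logpdf(x, mu_p, sigma_p),
    lambda x: gaussian_logpdf(x, mu_q, sigma_q),
    lo=mu_p - 12 * sigma_p,   # p has negligible mass beyond 12 standard deviations
    hi=mu_p + 12 * sigma_p,
)

# Closed form for two normal densities, in nats, then converted to bits.
closed_form_nats = 0.5 * math.log(2 * math.pi * sigma_q ** 2) + (
    sigma_p ** 2 + (mu_p - mu_q) ** 2
) / (2 * sigma_q ** 2)
closed_form = closed_form_nats / math.log(2)

print(numeric, closed_form)
```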
