Sanov's theorem

In mathematics and information theory, Sanov's theorem gives a bound on the probability of observing an atypical sequence of samples from a given probability distribution. In the language of large deviations theory, Sanov's theorem identifies the rate function for large deviations of the empirical measure of a sequence of i.i.d. random variables.

Let A be a set of probability distributions over an alphabet X, and let q be an arbitrary distribution over X (where q may or may not be in A). Suppose we draw n i.i.d. samples from q, represented by the vector . Then, we have the following bound on the probability that the empirical measure of the samples falls within the set A:

,

where

is the joint probability distribution on , and
is the information projection of q onto A.
, the KL divergence, is given by:

In words, the probability of drawing an atypical distribution is bounded by a function of the KL divergence from the true distribution to the atypical one; in the case that we consider a set of possible atypical distributions, there is a dominant atypical distribution, given by the information projection.

Furthermore, if A is a closed set, then

Technical statement

Define:

is a finite set with size . Understood as Ã¢ÂÂalphabetÃ¢ÂÂ.
is the simplex spanned by the alphabet. It is a subset of .
is a random variable taking values in . Take samples from the distribution , then is the frequency probability vector for the sample.
is the space of values that can take. In other words, it is

Then, Sanov's theorem states:

For every measurable subset ,
For every open subset ,

Here, means the interior, and means the closure.

Sanov's theorem

Technical statement

References

Further reading