my-server
← Wiki

Variation of information

In probability theory and information theory, the variation of information or shared information distance is a measure of the distance between two clusterings (partitions of elements). It is closely related to mutual information; indeed, it is a simple linear expression involving the mutual information. Unlike the mutual information, however, the variation of information is a true metric, in that it obeys the triangle inequality.

Definition

Suppose we have two partitions and of a set , namely and .

Let:

and

Then the variation of information between the two partitions is:

.

This is equivalent to the shared information distance between the random variables i and j with respect to the uniform probability measure on defined by for .

Explicit information content

We can rewrite this definition in terms that explicitly highlight the information content of this metric.

The set of all partitions of a set form a compact lattice where the partial order induces two operations, the meet and the join , where the maximum is the partition with only one block, i.e., all elements grouped together, and the minimum is , the partition consisting of all elements as singletons. The meet of two partitions and is easy to understand as that partition formed by all pair intersections of one block of, , of and one, , of . It then follows that and .

Let's define the entropy of a partition as <div class="center">

,

</div> where . Clearly, and . The entropy of a partition is a monotonous function on the lattice of partitions in the sense that .

Then the VI distance between and is given by

<div class="center">

.

</div>

The difference is a pseudo-metric as doesn't necessarily imply that . From the definition of , it is .

If in the Hasse diagram we draw an edge from every partition to the maximum and assign it a weight equal to the VI distance between the given partition and , we can interpret the VI distance as basically an average of differences of edge weights to the maximum

<div class="center">

.

</div>

For as defined above, it holds that the joint information of two partitions coincides with the entropy of the meet

<div class="center">

</div>

and we also have that coincides with the conditional entropy of the meet (intersection) relative to .

Identities

The variation of information satisfies

,

where is the entropy of , and is mutual information between and with respect to the uniform probability measure on . This can be rewritten as

,

where is the joint entropy of and , or

,

where and are the respective conditional entropies.

The variation of information can also be bounded, either in terms of the number of elements:

,

Or with respect to a maximum number of clusters, :

Triangle inequality

To verify the triangle inequality , expand using the identity . It suffices to prove The right side has a lower bound

which is no less than the left side.

References

Further reading

External links