In probability theory and information theory, the variation of information or shared information distance is a measure of the distance between two clusterings (partitions of elements). It is closely related to mutual information; indeed, it is a simple linear expression involving the mutual information. Unlike the mutual information, however, the variation of information is a true metric, in that it obeys the triangle inequality.
Suppose we have two partitions and of a set , namely and .
Let:
Then the variation of information between the two partitions is:
This is equivalent to the shared information distance between the random variables i and j with respect to the uniform probability measure on defined by for .
We can rewrite this definition in terms that explicitly highlight the information content of this metric.
The set of all partitions of a set form a compact lattice where the partial order induces two operations, the meet and the join , where the maximum is the partition with only one block, i.e., all elements grouped together, and the minimum is , the partition consisting of all elements as singletons. The meet of two partitions and is easy to understand as that partition formed by all pair intersections of one block of, , of and one, , of . It then follows that and .
Let's define the entropy of a partition as <div class="center">
</div> where . Clearly, and . The entropy of a partition is a monotonous function on the lattice of partitions in the sense that .
Then the VI distance between and is given by
<div class="center">
</div>
The difference is a pseudo-metric as doesn't necessarily imply that . From the definition of , it is .
If in the Hasse diagram we draw an edge from every partition to the maximum and assign it a weight equal to the VI distance between the given partition and , we can interpret the VI distance as basically an average of differences of edge weights to the maximum
<div class="center">
</div>
For as defined above, it holds that the joint information of two partitions coincides with the entropy of the meet
<div class="center">
</div>
and we also have that coincides with the conditional entropy of the meet (intersection) relative to .
The variation of information satisfies
where is the entropy of , and is mutual information between and with respect to the uniform probability measure on . This can be rewritten as
where is the joint entropy of and , or
where and are the respective conditional entropies.
The variation of information can also be bounded, either in terms of the number of elements:
Or with respect to a maximum number of clusters, :
To verify the triangle inequality , expand using the identity . It suffices to prove The right side has a lower bound
which is no less than the left side.