Sinkhorn's theorem states that every square matrix with positive entries can be written in a certain standard form.
If A is an n × n matrix with strictly positive elements, then there exist diagonal matrices D<sub>1</sub> and D<sub>2</sub> with strictly positive diagonal elements such that D<sub>1</sub>AD<sub>2</sub> is doubly stochastic. The matrices D<sub>1</sub> and D<sub>2</sub> are unique up to multiplying the first matrix by a positive number and dividing the second one by the same number.
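The uniqueness statement can be checked in one line. For any positive number c (a symbol introduced here only for illustration), scalar multiples of the identity commute with A, so rescaling the two diagonal matrices in opposite directions leaves the product unchanged:

$$(c\,D_1)\,A\,(c^{-1}D_2) = c\,c^{-1}\,D_1 A D_2 = D_1 A D_2.$$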
A simple iterative method to approach the doubly stochastic matrix is to alternately rescale all rows and all columns of A to sum to 1. Sinkhorn and Knopp presented this algorithm and analyzed its convergence. This is essentially the same as the iterative proportional fitting algorithm, well known in survey statistics.
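The alternating rescaling can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: the function name `sinkhorn_knopp`, the tolerance `tol`, and the iteration cap `max_iter` are assumptions made for this example. It tracks the diagonals of D<sub>1</sub> and D<sub>2</sub> as the lists `r` and `c`.

```python
def sinkhorn_knopp(a, tol=1e-10, max_iter=10_000):
    """Alternately rescale rows and columns of a positive square matrix.

    Returns (r, c, p) with p[i][j] = r[i] * a[i][j] * c[j], that is,
    p = D1 A D2 where D1 = diag(r) and D2 = diag(c).
    The names and defaults are illustrative assumptions.
    """
    n = len(a)
    if any(len(row) != n for row in a) or any(x <= 0 for row in a for x in row):
        raise ValueError("expected a square matrix with strictly positive entries")
    r = [1.0] * n
    c = [1.0] * n
    for _ in range(max_iter):
        # Row step: pick r so that every row of diag(r) A diag(c) sums to 1.
        r = [1.0 / sum(a[i][j] * c[j] for j in range(n)) for i in range(n)]
        # Column step: pick c so that every column sums to 1.
        c = [1.0 / sum(r[i] * a[i][j] for i in range(n)) for j in range(n)]
        # Columns are now exact; stop once the rows are also within tol of 1.
        row_err = max(
            abs(r[i] * sum(a[i][j] * c[j] for j in range(n)) - 1.0)
            for i in range(n)
        )
        if row_err < tol:
            break
    p = [[r[i] * a[i][j] * c[j] for j in range(n)] for i in range(n)]
    return r, c, p


A = [[1.0, 2.0], [3.0, 4.0]]
r, c, P = sinkhorn_knopp(A)
row_sums = [sum(row) for row in P]
col_sums = [sum(P[i][j] for i in range(2)) for j in range(2)]
```

After the call, `row_sums` and `col_sums` should all be close to 1, and `r` and `c` hold one valid choice of the positive diagonals; any pair obtained by multiplying `r` and dividing `c` by the same positive number works equally well.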
The following analogue for unitary matrices is also true: for every unitary matrix U there exist two diagonal unitary matrices L and R such that LUR has each of its columns and rows summing to 1.
The following extension to maps between matrices is also true (see Theorem 5 and also Theorem 4.7): given a Kraus operator that represents the quantum operation Φ mapping a density matrix into another,
$$S \mapsto \Phi(S) = \sum_i B_i S B_i^*,$$
that is trace preserving,
$$\sum_i B_i^* B_i = I,$$
and, in addition, whose range is in the interior of the positive definite cone (strict positivity), there exist scalings x<sub>j</sub>, for j in {0,1}, that are positive definite so that the rescaled Kraus operator
$$S \mapsto x_1 \Phi(x_0^{-1} S x_0^{-1}) x_1 = \sum_i (x_1 B_i x_0^{-1}) S (x_1 B_i x_0^{-1})^*$$
is doubly stochastic. In other words, it is such that both
$$x_1 \Phi(x_0^{-1} I x_0^{-1}) x_1 = I,$$
as well as for the adjoint,
$$x_0^{-1} \Phi^*(x_1 I x_1) x_0^{-1} = I,$$
where I denotes the identity operator.
In the 2010s Sinkhorn's theorem came to be used to find solutions of entropy-regularised optimal transport problems. This has been of interest in machine learning because such "Sinkhorn distances" can be used to evaluate the difference between data distributions and permutations. This can improve the training of machine learning algorithms in situations where maximum likelihood training may not be the best method.
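The connection to matrix scaling can be illustrated with a small sketch. Under the usual formulation (assumed here, not spelled out above), the regularised transport plan between two probability vectors has the form diag(u) K diag(v), where K = exp(−C/ε) is a strictly positive kernel built from a cost matrix C, and u, v are found by the same alternating rescaling, now matching prescribed marginals instead of 1. The function name `sinkhorn_transport`, the parameters `eps` and `n_iter`, and the toy data are all hypothetical choices for this example.

```python
import math


def sinkhorn_transport(mu, nu, cost, eps, n_iter=500):
    """Entropy-regularised transport plan via alternating scaling (a sketch).

    mu, nu: probability vectors (positive entries, each summing to 1).
    cost:   len(mu) x len(nu) cost matrix.
    eps:    regularisation strength; very small values can underflow
            in this naive (non log-domain) version.
    """
    n, m = len(mu), len(nu)
    # Gibbs kernel: strictly positive, so Sinkhorn-style scaling applies.
    k = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # Rescale rows so the plan's row sums match mu.
        u = [mu[i] / sum(k[i][j] * v[j] for j in range(m)) for i in range(n)]
        # Rescale columns so the plan's column sums match nu.
        v = [nu[j] / sum(k[i][j] * u[i] for i in range(n)) for j in range(m)]
    plan = [[u[i] * k[i][j] * v[j] for j in range(m)] for i in range(n)]
    # Transport cost of the regularised plan (without the entropy term).
    total = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return plan, total


mu = [0.5, 0.5]
nu = [0.25, 0.75]
cost = [[0.0, 1.0], [1.0, 0.0]]
plan, total_cost = sinkhorn_transport(mu, nu, cost, eps=0.5)
```

Here `plan` should have row sums close to `mu` and column sums close to `nu`. A fixed iteration count keeps the sketch short; a practical implementation would more likely monitor the marginal error and work in the log domain for small `eps`.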