Sobol' sequences (also called LP<sub>ÃÂ</sub> sequences or (t, s) sequences in base 2) are a type of quasi-random low-discrepancy sequence. They were first introduced by the Russian mathematician Ilya M. Sobolâ (ÃÂûÃÂàÃÂõõÃÂþòøàáþñþûÃÂ) in 1967.
These sequences use a base of two to form successively finer uniform partitions of the unit interval and then reorder the coordinates in each dimension.
Let be the -dimensional unit hypercube, and a real integrable function over . The original motivation of Sobolâ was to construct a sequence in so that
and the convergence be as fast as possible.
It is more or less clear that for the sum to converge towards the integral, the points should fill minimizing the holes. Another good property would be that the projections of on a lower-dimensional face of I<sup>s</sup> leave very few holes as well. Hence the homogeneous filling of I<sup>s</sup> does not qualify because in lower dimensions many points will be at the same place, therefore useless for the integral estimation.
These good distributions are called (t,m,s)-nets and (t,s)-sequences in base b. To introduce them, define first an elementary s-interval in base b a subset of I<sup>s</sup> of the form
where a<sub>j</sub> and d<sub>j</sub> are non-negative integers, and for all j in {1, ...,s}.
Given 2 integers , a (t,m,s)-net in base b is a sequence x<sub>n</sub> of b<sup>m</sup> points of I<sup>s</sup> such that for all elementary interval P in base b of hypervolume û(P) = b<sup>tâÂÂm</sup>.
Given a non-negative integer t, a (t,s)-sequence in base b is an infinite sequence of points x<sub>n</sub> such that for all integers , the sequence is a (t,m,s)-net in base b.
In his article, Sobolâ described à<sub>ÃÂ</sub>-meshes and LP<sub>ÃÂ</sub> sequences, which are (t,m,s)-nets and (t,s)-sequences in base 2 respectively. The terms (t,m,s)-nets and (t,s)-sequences in base b (also called Niederreiter sequences) were coined in 1988 by Harald Niederreiter. The term Sobolâ sequences was introduced in late English-speaking papers in comparison with Halton, Faure and other low-discrepancy sequences.
A more efficient Gray code implementation was proposed by Antonov and Saleev.
As for the generation of Sobolâ numbers, they are clearly aided by the use of Gray code instead of n for constructing the n-th point draw.
Suppose we have already generated all the Sobolâ sequence draws up to n − 1 and kept in memory the values x<sub>n−1,j</sub> for all the required dimensions. Since the Gray code G(n) differs from that of the preceding one G(n − 1) by just a single, say the k-th, bit (which is a rightmost zero bit of n − 1), all that needs to be done is a single XOR operation for each dimension in order to propagate all of the x<sub>n−1</sub> to x<sub>n</sub>, i.e.
Sobolâ introduced additional uniformity conditions known as property A and AâÂÂ.
There are mathematical conditions that guarantee properties A and A'.
Tests for properties A and Aâ are independent. Thus it is possible to construct the Sobolâ sequence that satisfies both properties A and Aâ or only one of them.
To construct a Sobolâ sequence, a set of direction numbers v<sub>i,j</sub> needs to be selected. There is some freedom in the selection of initial direction numbers. Therefore, it is possible to receive different realisations of the Sobolâ sequence for selected dimensions. A bad selection of initial numbers can considerably reduce the efficiency of Sobolâ sequences when used for computation.
Arguably the easiest choice for the initialisation numbers is just to have the l-th leftmost bit set, and all other bits to be zero, i.e. m<sub>k,j</sub> = 1 for all k and j. This initialisation is usually called unit initialisation. However, such a sequence fails the test for Property A and Aâ even for low dimensions and hence this initialisation is bad.
Good initialisation numbers for different numbers of dimensions are provided by several authors. For example, Sobolâ provides initialisation numbers for dimensions up to 51. The same set of initialisation numbers is used by Bratley and Fox.
Initialisation numbers for high dimensions are available on Joe and Kuo. Peter Jäckel provides initialisation numbers up to dimension 32 in his book "Monte Carlo methods in finance".
Other implementations are available as C, Fortran 77, or Fortran 90 routines in the Numerical Recipes collection of software. A free/open-source implementation in up to 1111 dimensions, based on the Joe and Kuo initialisation numbers, is available in C, and up to 21201 dimensions in Python and Julia. A different free/open-source implementation in up to 1111 dimensions is available for C++, Fortran 90, Matlab, and Python.
Commercial Sobolâ sequence generators are available within, for example, the NAG Library. BRODA Ltd. provides Sobol' and scrambled Sobol' sequences generators with additional unifomity properties A and A' up to a maximum dimension 131072. These generators were co-developed with Prof. I. Sobol'. MATLAB contains Sobol' sequences generators up to dimension 1111 as part of its Statistics Toolbox.