
V-statistic

V-statistics are a class of statistics named for Richard von Mises, who developed their asymptotic distribution theory in a fundamental paper in 1947. V-statistics are closely related to U-statistics (U for "unbiased"), introduced by Wassily Hoeffding in 1948. A V-statistic is a statistical function (of a sample) defined by a particular statistical functional of a probability distribution.

Statistical functions

Statistics that can be represented as functionals T(F<sub>n</sub>) of the empirical distribution function F<sub>n</sub> are called statistical functionals. Differentiability of the functional T plays a key role in the von Mises approach; thus von Mises considers differentiable statistical functionals.

Examples of statistical functions

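<ol> <li> The k-th central moment is the functional $T(F) = \int (x - \mu)^k \, dF(x)$, where $\mu = E[X]$ is the expected value of X. The associated statistical function is the sample k-th central moment, $T_n = m_k = T(F_n) = \frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^k$.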

</li>

<li> The chi-squared goodness-of-fit statistic is a statistical function T(F<sub>n</sub>), corresponding to the statistical functional

$$T(F) = \sum_{i=1}^k \frac{\left(\int_{A_i} dF - p_i\right)^2}{p_i},$$

where A<sub>i</sub> are the k cells and p<sub>i</sub> are the specified probabilities of the cells under the null hypothesis. </li>

<li> The Cramér–von Mises and Anderson–Darling goodness-of-fit statistics are based on the functional

$$T(F) = \int \left(F(x) - F_0(x)\right)^2 \, w(x; F_0) \, dF_0(x),$$

where w(x;&nbsp;F<sub>0</sub>) is a specified weight function and F<sub>0</sub> is a specified null distribution. If w is identically 1, then T(F<sub>n</sub>) is the well-known Cramér–von Mises goodness-of-fit statistic; if $w(x; F_0) = [F_0(x)(1 - F_0(x))]^{-1}$, then T(F<sub>n</sub>) is the Anderson–Darling statistic. </li> </ol>
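The three functionals above can be evaluated at the empirical distribution F<sub>n</sub> directly. The following is a minimal sketch, assuming NumPy is available; the function names, the `edges` argument describing the cells A<sub>i</sub>, and the vectorized null CDF `cdf0` are illustrative choices, not part of the theory. The Cramér–von Mises case uses weight w = 1 and the usual closed form in terms of the ordered values F<sub>0</sub>(x<sub>(i)</sub>).

```python
import numpy as np


def central_moment(x, k):
    """Plug-in estimate T(F_n) of the k-th central moment."""
    x = np.asarray(x, dtype=float)
    return float(np.mean((x - x.mean()) ** k))


def chi_squared_functional(x, edges, p):
    """T(F_n) = sum_i (F_n(A_i) - p_i)^2 / p_i.

    Cells A_i are the bins delimited by `edges` (np.histogram convention:
    half-open bins, except the last, which includes its right edge).
    Observations outside the outer edges are not counted in any cell.
    """
    x = np.asarray(x, dtype=float)
    p = np.asarray(p, dtype=float)
    counts, _ = np.histogram(x, bins=edges)
    freq = counts / x.size  # empirical cell probabilities F_n(A_i)
    return float(np.sum((freq - p) ** 2 / p))


def cramer_von_mises_functional(x, cdf0):
    """T(F_n) = integral of (F_n - F_0)^2 dF_0, i.e. weight w = 1.

    Closed form: 1/(12 n^2) + (1/n) * sum_i (u_(i) - (2i - 1)/(2n))^2,
    where u_(i) are the sorted values of F_0 at the observations.
    """
    u = np.sort(cdf0(np.asarray(x, dtype=float)))
    n = u.size
    i = np.arange(1, n + 1)
    return float(1.0 / (12 * n**2) + np.mean((u - (2 * i - 1) / (2 * n)) ** 2))
```

Note that the familiar Pearson chi-squared test statistic is n times the value returned by `chi_squared_functional`, and the usual Cramér–von Mises test statistic is n times the value returned by `cramer_von_mises_functional`.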

Representation as a V-statistic

Suppose x<sub>1</sub>, ..., x<sub>n</sub> is a sample. In typical applications the statistical function has a representation as the V-statistic

$$V_{mn} = \frac{1}{n^m} \sum_{i_1=1}^n \cdots \sum_{i_m=1}^n h(x_{i_1}, x_{i_2}, \dots, x_{i_m}),$$

where h is a symmetric kernel function. Serfling discusses how to find the kernel in practice. V<sub>mn</sub> is called a V-statistic of degree&nbsp;m.

A symmetric kernel of degree 2 is a function h(x,&nbsp;y) such that h(x, y) = h(y, x) for all x and y in the domain of h. For samples x<sub>1</sub>, ..., x<sub>n</sub>, the corresponding V-statistic is defined as

$$V_{2,n} = \frac{1}{n^2} \sum_{i=1}^n \sum_{j=1}^n h(x_i, x_j).$$
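A V-statistic of degree m averages the kernel over all n<sup>m</sup> index tuples, with repeated indices allowed. The brute-force sketch below illustrates this definition; the helper name `v_statistic` is hypothetical, and the O(n<sup>m</sup>) loop is meant for small samples only.

```python
from itertools import product


def v_statistic(sample, kernel, m):
    """V_{mn}: average of kernel(x_{i_1}, ..., x_{i_m}) over all n**m
    index tuples (i_1, ..., i_m), repeated indices included."""
    x = list(sample)
    n = len(x)
    total = sum(kernel(*args) for args in product(x, repeat=m))
    return total / n**m


# Degree-2 example with the symmetric kernel h(x, y) = x * y:
# the double sum factorizes, so V_{2,n} equals the squared sample mean.
value = v_statistic([1, 2, 3], lambda x, y: x * y, 2)
```

For the sample 1, 2, 3 the sample mean is 2, so `value` is 4.0.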

Example of a V-statistic

<ol start="4"> <li> An example of a degree-2 V-statistic is the second central moment m<sub>2</sub>.

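If h(x, y) = (x &minus; y)<sup>2</sup>/2, the corresponding V-statistic is

$$V_{2,n} = \frac{1}{n^2} \sum_{i=1}^n \sum_{j=1}^n \frac{1}{2}(x_i - x_j)^2 = \frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^2,$$

which is the maximum likelihood estimator of variance (under a normal model). With the same kernel, the corresponding U-statistic is the (unbiased) sample variance:

$$s^2 = \binom{n}{2}^{-1} \sum_{i<j} \frac{1}{2}(x_i - x_j)^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2.$$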

</li> </ol>
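The difference between the V-statistic and the U-statistic for the kernel h(x, y) = (x &minus; y)<sup>2</sup>/2 can be checked numerically. This is a small sketch using only the Python standard library; the function names are illustrative. The V-statistic sums over all ordered pairs (including i = j), while the U-statistic averages over the distinct pairs i &lt; j.

```python
from itertools import combinations, product
from statistics import pvariance, variance


def h(x, y):
    """Symmetric degree-2 kernel for the variance."""
    return (x - y) ** 2 / 2


def v_stat2(xs):
    """Average of h over all n**2 ordered pairs, diagonal included."""
    n = len(xs)
    return sum(h(a, b) for a, b in product(xs, repeat=2)) / n**2


def u_stat2(xs):
    """Average of h over the n*(n-1)/2 pairs with i < j."""
    pairs = list(combinations(xs, 2))
    return sum(h(a, b) for a, b in pairs) / len(pairs)


xs = [1, 2, 3, 4]
v = v_stat2(xs)  # matches pvariance(xs), the divide-by-n variance
u = u_stat2(xs)  # matches variance(xs), the divide-by-(n-1) variance
```

Because h(x, x) = 0, the two statistics differ only by the factor (n &minus; 1)/n, which is why the V-statistic is biased downward for the variance.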

Asymptotic distribution

In examples 1–3, the asymptotic distribution of the statistic is different: in (1) it is normal, in (2) it is chi-squared, and in (3) it is a weighted sum of chi-squared variables.

Von Mises' approach is a unifying theory that covers all of the cases above. Informally, the type of asymptotic distribution of a statistical function depends on the order of "degeneracy," which is determined by which term is the first non-vanishing term in the Taylor expansion of the functional&nbsp;T. In case it is the linear term, the limit distribution is normal; otherwise higher order types of distributions arise (under suitable conditions such that a central limit theorem holds).

There is a hierarchy of cases parallel to the asymptotic theory of U-statistics. Let A(m) be the property defined by:

A(m):

<ol style="list-style-type:lower-roman"> <li> Var(h(X<sub>1</sub>, ..., X<sub>k</sub>)) = 0 for k < m, and Var(h(X<sub>1</sub>, ..., X<sub>k</sub>)) > 0 for k = m; </li> <li> n<sup>m/2</sup>R<sub>mn</sub> tends to zero (in probability). (R<sub>mn</sub> is the remainder term in the Taylor series for T.)</li> </ol>

Case m = 1 (Non-degenerate kernel):

If A(1) is true, the statistic is a sample mean and the Central Limit Theorem implies that T(F<sub>n</sub>) is asymptotically normal.

In the variance example (4), m<sub>2</sub> is asymptotically normal with mean $\sigma^2$ and variance $(\mu_4 - \sigma^4)/n$, where $\mu_4 = E(X - E(X))^4$.
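The asymptotic mean and variance of m<sub>2</sub> can be illustrated by simulation. The sketch below assumes standard normal data, for which the variance is 1 and the fourth central moment is 3, so n times the variance of m<sub>2</sub> should be close to 3 &minus; 1 = 2. The sample size, replication count, and seed are arbitrary illustrative values.

```python
import numpy as np


def simulate_m2(n=200, reps=4000, seed=0):
    """Monte Carlo check of the normal limit for the second central moment.

    Returns (mean of m2 over replications, n * variance of m2).
    """
    rng = np.random.default_rng(seed)
    samples = rng.standard_normal((reps, n))
    m2 = samples.var(axis=1)  # ddof=0: the V-statistic from example 4
    return float(m2.mean()), float(n * m2.var())


mean_m2, scaled_var = simulate_m2()
# For standard normal data, expect mean_m2 near 1 and scaled_var near 2.
```

The agreement is only approximate, both because of Monte Carlo error and because the stated mean and variance are asymptotic.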

Case m = 2 (Degenerate kernel):

Suppose A(2) is true, and $E[h^2(X_1, X_2)] < \infty$, $E|h(X_1, X_1)| < \infty$, and $E[h(x, X_1)] \equiv 0$. Then nV<sub>2,n</sub> converges in distribution to a weighted sum of independent chi-squared variables:

$$n V_{2,n} \xrightarrow{d} \sum_{k=1}^\infty \lambda_k Z_k^2,$$

where $Z_k$ are independent standard normal variables and $\lambda_k$ are constants that depend on the distribution F and the functional T. In this case the asymptotic distribution is called a quadratic form of centered Gaussian random variables. The statistic V<sub>2,n</sub> is called a degenerate kernel V-statistic. The V-statistic associated with the Cramér–von Mises functional (Example 3) is an example of a degenerate kernel V-statistic.
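A simple degenerate case can be simulated to illustrate the chi-squared limit. The sketch below uses the kernel h(x, y) = xy with standard normal data; this kernel is an illustrative assumption, chosen because E[h(x, X<sub>1</sub>)] = x·E[X<sub>1</sub>] = 0 and because the weighted sum then has a single term with weight 1, so nV<sub>2,n</sub> should behave like a chi-squared variable with one degree of freedom. The sample size, replication count, and seed are arbitrary.

```python
import numpy as np


def simulate_degenerate(n=100, reps=5000, seed=0):
    """Simulate n * V_{2,n} for the degenerate kernel h(x, y) = x * y.

    The double sum factorizes: sum_i sum_j x_i x_j = (sum_i x_i)**2,
    so n * V_{2,n} = (sum_i x_i)**2 / n.
    """
    rng = np.random.default_rng(seed)
    samples = rng.standard_normal((reps, n))
    return samples.sum(axis=1) ** 2 / n


nv = simulate_degenerate()
mean_nv = float(nv.mean())          # chi-squared(1) has mean 1
tail = float((nv > 3.841).mean())   # 3.841 is roughly its 95% quantile
```

Here `mean_nv` should be close to 1 and `tail` close to 0.05, up to Monte Carlo error. For kernels such as the one behind the Cramér–von Mises functional, the limit involves infinitely many weights rather than one.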
