
Multivariate tests of uniformity

  • Regular Article
Statistical Papers

Abstract

We present tests of multivariate uniformity using data depth, the normal quantiles and the interpoint distances between the observations. We investigate the properties of the interpoint distances among uniform random vectors. We compare the performance of the proposed tests with two existing statistics under the hypothesis of uniformity and obtain their empirical power under various alternatives in a Monte Carlo study.




Acknowledgments

The authors would like to thank two anonymous referees and the Editor for comments that led to an improved manuscript.

Author information

Correspondence to Reza Modarres.

Appendix

Proof of Theorem 1

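Let \(\mathbf {u}_1,\ldots ,\mathbf {u}_n\) be independent random vectors uniformly distributed on \([0,1]^d\), and let \(e^2_{i, j}=\Vert \mathbf {u}_i - \mathbf {u}_j\Vert ^2\). Consider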

$$\begin{aligned}&\displaystyle \bar{\delta }=\frac{1}{\left( {\begin{array}{c}n\\ 2\end{array}}\right) }\sum _{i<j}^n e^2_{i, j}&\nonumber \\&\displaystyle \bar{S}=\frac{1}{\left( {\begin{array}{c}n\\ 2\end{array}}\right) }\sum _{i<j}\big (e^2_{i,j}-\frac{d}{6}\big )^2&\end{aligned}$$
(8)

and

$$\begin{aligned}&\text {Var}(\bar{\delta })=\frac{1}{\left( {\begin{array}{c}n\\ 2\end{array}}\right) }\left[ \text {Var}\left( e^2_{i,j}\right) +2(n-2)\text {Cov}\left( e^2_{i,j},e^2_{i,k}\right) \right] \nonumber \\&\text {Var}(\bar{S})=\frac{1}{\left( {\begin{array}{c}n\\ 2\end{array}}\right) }\left[ \text {Var}\left( \left( e^2_{i,j}-\frac{d}{6}\right) ^2\right) +2(n-2)\text {Cov}\left( \left( e^2_{i,j}-\frac{d}{6}\right) ^2,\left( e^2_{i,k}-\frac{d}{6}\right) ^2\right) \right] . \end{aligned}$$
(9)
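Both statistics in Eq. 8 are averages over the \(\binom{n}{2}\) pairs \(i<j\) of a function of the squared interpoint distance. A minimal NumPy sketch, assuming an \((n, d)\) array of observations on \([0,1]^d\); the helper name `interpoint_stats` is hypothetical and this is an illustration, not the authors' code:

```python
import numpy as np


def interpoint_stats(u: np.ndarray) -> tuple:
    """Return (delta_bar, S_bar) of Eq. 8 for an (n, d) sample u.

    Hypothetical helper: both statistics average over the C(n, 2)
    pairs i < j of the squared interpoint distances e^2_{i,j}.
    """
    n, d = u.shape
    diff = u[:, None, :] - u[None, :, :]   # (n, n, d) coordinate differences
    e2 = np.sum(diff ** 2, axis=-1)        # (n, n) squared distances
    i, j = np.triu_indices(n, k=1)         # index pairs with i < j
    pairs = e2[i, j]
    delta_bar = float(pairs.mean())
    s_bar = float(np.mean((pairs - d / 6.0) ** 2))
    return delta_bar, s_bar


rng = np.random.default_rng(0)
sample = rng.random((200, 3))              # uniform sample on [0, 1]^3
delta_bar, s_bar = interpoint_stats(sample)
print(delta_bar, s_bar)
```

Under uniformity \(E[e^2_{i,j}]=d/6\), so \(\bar{\delta }\) should be near 0.5 here, and \(\bar{S}\) should be near \(E[(e^2_{i,j}-d/6)^2]=E[e^4_{i,j}-\frac{d}{3}e^2_{i,j}]+\frac{d^2}{36}=\frac{7d}{180}\approx 0.117\) by Eq. 12.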

From Eq. 9, we have

$$\begin{aligned} \text {Var}\left( \left( e^2_{i,j}-\frac{d}{6}\right) ^2\right) =\text {Var}\left( e^4_{i,j}-\frac{d}{3}e^2_{i,j}\right) =E\left[ \left( e^4_{i,j}-\frac{d}{3}e^2_{i,j}\right) ^2\right] -\left( E\left[ e^4_{i,j}-\frac{d}{3}e^2_{i,j}\right] \right) ^2. \end{aligned}$$
(10)

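Let \(\epsilon _{ijkh}=u_{i,j}-u_{k,h}\), where \(u_{i,j}\) denotes the \(j\)th coordinate of \(\mathbf {u}_i\). Because the coordinates are independent \(U(0,1)\) variables, \(e^2_{i,j}\) is a sum of \(d\) independent terms, with \(\text {Var}(\epsilon _{1121}^2)=\frac{1}{15}-\frac{1}{36}=\frac{7}{180}\) and \(\text {Cov}(\epsilon _{1121}^2,\epsilon _{1131}^2)=\frac{1}{30}-\frac{1}{36}=\frac{1}{180}\). One observes,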

$$\begin{aligned} \text {Var}(\bar{\delta })=\frac{1}{\left( {\begin{array}{c}n\\ 2\end{array}}\right) }\left[ d\text {Var}\big (\epsilon _{1121}^2\big )+2d(n-2)\text {Cov}\left( \epsilon _{1121}^2,\epsilon _{1131}^2\right) \right] =\frac{d(2n+3)}{90n(n-1)}. \end{aligned}$$
(11)
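Eq. 11 only needs the variance of one squared coordinate difference and the covariance of two such differences that share a point. A sketch with exact rational arithmetic, assuming the per-coordinate moments \(E[\epsilon ^2]=1/6\) and \(E[\epsilon ^4]=1/15\) used in Eq. 12 and obtaining the shared-point covariance by conditioning on \(u_{1,1}\); the function names are hypothetical:

```python
from fractions import Fraction as F


def poly_mul(p, q):
    """Multiply polynomials stored as coefficient lists (lowest power first)."""
    out = [F(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def integrate01(p):
    """Exact integral of a coefficient-list polynomial over [0, 1]."""
    return sum(c / (k + 1) for k, c in enumerate(p))


m2 = F(1, 6)                       # E[eps^2] for eps = U1 - U2
m4 = F(1, 15)                      # E[eps^4]
var_eps2 = m4 - m2 ** 2            # Var(eps_{1121}^2)
# Given u_{1,1} = x, E[(x - U)^2] = 1/3 - x + x^2, and the two squared
# differences sharing u_{1,1} are conditionally independent.
h = [F(1, 3), F(-1), F(1)]
cov_eps2 = integrate01(poly_mul(h, h)) - m2 ** 2   # Cov(eps_{1121}^2, eps_{1131}^2)


def var_delta_bar(n: int, d: int) -> F:
    """Eq. 9 applied to delta_bar; variances and covariances add over d coordinates."""
    n_pairs = F(n * (n - 1), 2)
    return (d * var_eps2 + 2 * d * (n - 2) * cov_eps2) / n_pairs


print(var_eps2, cov_eps2, var_delta_bar(10, 3))   # 7/180 1/180 23/2700
```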

Furthermore,

$$\begin{aligned} E[e^4_{i,j}-\frac{d}{3}e^2_{i,j}]&=E\left[ \left( \sum _{i=1}^d\epsilon _{1i2i}^2\right) ^2\right] -\frac{d^2}{18} =E\left[ \sum _{i=1}^d\epsilon _{1i2i}^4+2\sum _{i<j}^d \epsilon _{1i2i}^2\epsilon _{1j2j}^2\right] -\frac{d^2}{18} \nonumber \\&=dE\left[ \epsilon _{1i2i}^4\right] +d(d-1)E\left[ \epsilon _{1i2i}^2\right] ^2-\frac{d^2}{18}=\frac{d}{15}+\frac{d(d-1)}{36}-\frac{d^2}{18}\nonumber \\&=\frac{7d}{180}-\frac{d^2}{36}\quad \end{aligned}$$
(12)

and

$$\begin{aligned} E\left[ \left( e^4_{i,j}-\frac{d}{3}e^2_{i,j}\right) ^2\right] =E\left[ e^8_{i,j}-\frac{2d}{3}e^6_{i,j}\right] +\frac{d^2}{9}\left( \frac{d}{15}+\frac{d(d-1)}{36}\right) . \end{aligned}$$
(13)
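The arithmetic in Eq. 12 can be checked exactly. The sketch below assumes only the triangular density \(1-|x|\) on \([-1,1]\) of a coordinate difference, which gives \(E[\epsilon ^{2k}]=2/((2k+1)(2k+2))\); the names `eps_even_moment` and `mean_e4_minus` are hypothetical:

```python
from fractions import Fraction as F


def eps_even_moment(k: int) -> F:
    """E[eps^(2k)] for eps = U1 - U2, whose density is 1 - |x| on [-1, 1]."""
    return F(2, (2 * k + 1) * (2 * k + 2))


def mean_e4_minus(d: int) -> F:
    """Eq. 12: E[e^4 - (d/3) e^2] with e^2 a sum of d independent eps^2 terms."""
    m2, m4 = eps_even_moment(1), eps_even_moment(2)   # 1/6 and 1/15
    mean_e2 = d * m2
    mean_e4 = d * m4 + d * (d - 1) * m2 ** 2          # diagonal plus cross terms
    return mean_e4 - F(d, 3) * mean_e2


for d in (2, 3, 10):
    print(d, mean_e4_minus(d), F(7 * d, 180) - F(d * d, 36))
```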

One can verify that

$$\begin{aligned} E\left[ e^8_{i,j}\right] ={\left\{ \begin{array}{ll}\frac{1}{1296}d^4+\frac{7}{1080}d^3+\frac{2789}{226800}d^2+\frac{101}{37800}d, \quad &{}d\ge 4,\\ \frac{2}{15}+\frac{1}{7}+\frac{2}{25}, \quad &{}d=3, \\ \frac{2}{45}+\frac{2}{75}+\frac{1}{21}, \quad &{}d=2.\end{array}\right. } \end{aligned}$$
(14)

Similarly,

$$\begin{aligned} E\left[ e^6_{i,j}\right] ={\left\{ \begin{array}{ll}\frac{1}{216}d^3+\frac{7}{360}d^2+\frac{11}{945}d, \quad &{}d\ge 3,\\ \frac{1}{14}+\frac{1}{15}, \quad &{}d=2.\end{array}\right. } \end{aligned}$$
(15)
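Eqs. 14 and 15 are the fourth and third raw moments of \(e^2_{1,2}=\sum _{l=1}^d\epsilon _{1l2l}^2\), a sum of \(d\) i.i.d. terms. A sketch that computes them exactly by adding one coordinate at a time with the binomial expansion and compares them with the stated polynomials; it assumes \(E[\epsilon ^{2k}]=2/((2k+1)(2k+2))\) from the triangular density, and the function names are hypothetical:

```python
from fractions import Fraction as F
from math import comb


def eps2_moments(kmax: int) -> list:
    """E[X^k], k = 0..kmax, for X = (U1 - U2)^2 in one coordinate."""
    return [F(2, (2 * k + 1) * (2 * k + 2)) for k in range(kmax + 1)]


def sum_moments(d: int, kmax: int = 4) -> list:
    """E[(X_1 + ... + X_d)^k] for k = 0..kmax with X_l i.i.d.

    Adds one coordinate at a time: E[(S + X)^k] = sum_j C(k, j) E[S^j] E[X^(k-j)].
    """
    mx = eps2_moments(kmax)
    ms = [F(1)] + [F(0)] * kmax          # moments of the empty sum S = 0
    for _ in range(d):
        ms = [sum(comb(k, j) * ms[j] * mx[k - j] for j in range(k + 1))
              for k in range(kmax + 1)]
    return ms


def e8_closed_form(d: int) -> F:
    """Eq. 14, d >= 4 branch."""
    return (F(1, 1296) * d**4 + F(7, 1080) * d**3
            + F(2789, 226800) * d**2 + F(101, 37800) * d)


def e6_closed_form(d: int) -> F:
    """Eq. 15, d >= 3 branch."""
    return F(1, 216) * d**3 + F(7, 360) * d**2 + F(11, 945) * d


for d in (2, 3, 4, 7):
    m = sum_moments(d)
    print(d, m[4], e8_closed_form(d), m[3], e6_closed_form(d))
```

Running these comparisons also shows that the \(d\ge 4\) polynomial of Eq. 14 reproduces the listed \(d=2\) and \(d=3\) values, and the \(d\ge 3\) polynomial of Eq. 15 reproduces the \(d=2\) value, because the extra multinomial terms vanish for small \(d\).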

It follows from Eqs. 10 to 15 that

$$\begin{aligned} \text {Var}\left( \left( e^2_{i,j}-\frac{d}{6}\right) ^2\right) ={\left\{ \begin{array}{ll}\frac{989}{56700}, \quad &{}d=2,\\ \frac{37}{1050}, \quad &{}d=3, \\ \frac{49}{16200}d^2+\frac{101}{37800}d, \quad &{}d\ge 4.\end{array}\right. } \end{aligned}$$
(16)
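Eq. 16 is the fourth central moment of \(e^2_{i,j}\) minus the square of its variance. A sketch computing it exactly from raw moments, with a hypothetical `sum_moments` recursion that adds one independent coordinate at a time (included so the block runs on its own) and assuming \(E[\epsilon ^{2k}]=2/((2k+1)(2k+2))\):

```python
from fractions import Fraction as F
from math import comb


def sum_moments(d: int, kmax: int = 4) -> list:
    """E[(e^2)^k], k = 0..kmax, with e^2 = X_1 + ... + X_d and X_l = eps_l^2 i.i.d."""
    mx = [F(2, (2 * k + 1) * (2 * k + 2)) for k in range(kmax + 1)]  # E[X^k]
    ms = [F(1)] + [F(0)] * kmax
    for _ in range(d):
        ms = [sum(comb(k, j) * ms[j] * mx[k - j] for j in range(k + 1))
              for k in range(kmax + 1)]
    return ms


def var_centered_square(d: int) -> F:
    """Var((e^2 - mu)^2) = E[(e^2 - mu)^4] - (E[(e^2 - mu)^2])^2 with mu = d/6."""
    m = sum_moments(d)
    mu = m[1]
    c2 = m[2] - mu ** 2
    c4 = m[4] - 4 * mu * m[3] + 6 * mu ** 2 * m[2] - 3 * mu ** 4
    return c4 - c2 ** 2


for d in (2, 3, 4, 5):
    print(d, var_centered_square(d))
```

A shorter route to the same result: since \(e^2_{i,j}\) is a sum of \(d\) i.i.d. terms, \(\text {Var}((e^2_{i,j}-d/6)^2)=d\kappa _4+2d^2\sigma ^4\), where \(\sigma ^2=7/180\) and \(\kappa _4=101/37800\) are the variance and fourth cumulant of one \(\epsilon ^2\). This gives the coefficients \(49/16200=2(7/180)^2\) and \(101/37800\) for every \(d\), including \(d=2\) and \(d=3\).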

The covariance term in Eq. 9 can be expanded as

$$\begin{aligned} \text {Cov}\left( \left( e^2_{i,j}-\frac{d}{6}\right) ^2,\left( e^2_{i,k}-\frac{d}{6}\right) ^2\right) =\text {Cov}\left( e^4_{i,j},e^4_{i,k}\right) -\frac{2d}{3}\text {Cov}\left( e^4_{i,j},e^2_{i,k}\right) +\frac{d^3}{1620}. \end{aligned}$$
(17)

One can verify

$$\begin{aligned} E\left[ e^4_{i,j}e^2_{i,k}\right]&=E\left[ \left( \sum _{i=1}^d\epsilon _{1i2i}^2\right) ^2\sum _{j=1}^d\epsilon _{1j3j}^2\right] \nonumber \\&=dE\left[ \epsilon _{1131}^2\sum _{i=1}^d\epsilon _{1i2i}^4\right] +2dE\left[ \epsilon _{1131}^2\sum _{i<k}^d \epsilon _{1i2i}^2\epsilon _{1k2k}^2\right] . \end{aligned}$$
(18)

One observes,

$$\begin{aligned} dE\left[ \epsilon _{1131}^2\sum _{i=1}^d\epsilon _{1i2i}^4\right] =d\left( E\left[ \epsilon _{1131}^2\epsilon _{1121}^4\right] +(d-1)E\left[ \epsilon _{1131}^2\right] E\left[ \epsilon _{1222}^4\right] \right) , \end{aligned}$$
(19)

where \(E[\epsilon _{1131}^2\epsilon _{1121}^4]=\frac{19}{1260}\). Similarly,

$$\begin{aligned}&2dE\left[ \epsilon _{1131}^2\sum _{i<k}^d \epsilon _{1i2i}^2\epsilon _{1k2k}^2\right] =2d\left( (d-1)E[\epsilon _{1131}^2\epsilon _{1121}^2]E[\epsilon _{1121}^2]+\left( {\begin{array}{c}d-1\\ 2\end{array}}\right) E[\epsilon _{1121}^2]^3\right) ,\nonumber \\ \end{aligned}$$
(20)

where \(E[\epsilon _{1131}^2\epsilon _{1121}^2]=\frac{1}{30}.\) It follows from Eqs. 18 to 20 that

$$\begin{aligned} \text {Cov}\left( e^4_{i,j},e^2_{i,k}\right) =E\left[ e^4_{i,j}e^2_{i,k}\right] -E\left[ e^4_{i,j}\right] E\left[ e^2_{i,k}\right] =\frac{d^2}{540}+\frac{2d}{945}. \end{aligned}$$
(21)
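Eqs. 18 to 21 reduce everything to mixed moments \(E[X^aW^b]\) of \(X=\epsilon _{1121}^2\) and \(W=\epsilon _{1131}^2\), which share \(u_{1,1}\). One way to get them exactly, sketched below: condition on \(u_{1,1}=x\), use \(E[(x-U)^{2a}]=(x^{2a+1}+(1-x)^{2a+1})/(2a+1)\) for \(U\sim U(0,1)\), integrate the product over \([0,1]\), then add coordinates one at a time with the binomial theorem. All function names are hypothetical; this checks the stated values rather than reproducing the authors' derivation:

```python
from fractions import Fraction as F
from math import comb


def poly_mul(p, q):
    """Multiply coefficient-list polynomials (lowest power first)."""
    out = [F(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def integrate01(p):
    """Exact integral of a coefficient-list polynomial over [0, 1]."""
    return sum(c / (k + 1) for k, c in enumerate(p))


def q_poly(a):
    """Coefficients of E[(x - U)^(2a)] = (x^(2a+1) + (1 - x)^(2a+1)) / (2a + 1)."""
    m = 2 * a + 1
    coeffs = [F(comb(m, k) * (-1) ** k, m) for k in range(m + 1)]  # (1 - x)^m / m
    coeffs[m] += F(1, m)                                            # plus x^m / m
    return coeffs


def joint_coord(a, b):
    """E[X^a W^b] for X = eps_{1121}^2, W = eps_{1131}^2 (shared point u_{1,1})."""
    return integrate01(poly_mul(q_poly(a), q_poly(b)))


def joint_sum_moments(d, amax, bmax):
    """E[(e^2_{i,j})^a (e^2_{i,k})^b], adding one independent coordinate at a time."""
    keys = [(a, b) for a in range(amax + 1) for b in range(bmax + 1)]
    px = {key: joint_coord(*key) for key in keys}
    ms = {key: F(int(key == (0, 0))) for key in keys}   # moments of empty sums
    for _ in range(d):
        ms = {(a, b): sum(comb(a, i) * comb(b, j) * ms[i, j] * px[a - i, b - j]
                          for i in range(a + 1) for j in range(b + 1))
              for (a, b) in keys}
    return ms


def cov_e4_e2(d):
    """Cov(e^4_{i,j}, e^2_{i,k}) from the joint moments (Eq. 21)."""
    m = joint_sum_moments(d, 2, 1)
    return m[2, 1] - m[2, 0] * m[0, 1]


print(joint_coord(2, 1), joint_coord(1, 1))   # 19/1260 1/30
print(cov_e4_e2(4), F(16, 540) + F(8, 945))
```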

One can also verify

$$\begin{aligned} E\left[ e^4_{i,j}e^4_{i,k}\right]= & {} E\left[ \sum _{i=1}^d\epsilon _{1i2i}^4\sum _{i=1}^d\epsilon _{1i3i}^4+2\sum _{i=1}^d\epsilon _{1i3i}^4\sum _{i<j}^d \epsilon _{1i2i}^2\epsilon _{1j2j}^2 \right. \nonumber \\&\left. +2\sum _{i=1}^d\epsilon _{1i2i}^4\sum _{i<j}^d \epsilon _{1i3i}^2\epsilon _{1j3j}^2+4\sum _{i<j}^d \epsilon _{1i2i}^2\epsilon _{1j2j}^2\sum _{i<j}^d \epsilon _{1i3i}^2\epsilon _{1j3j}^2\right] .\qquad \end{aligned}$$
(22)

Consider the first term in Eq. 22. One observes

$$\begin{aligned} E\left[ \sum _{i=1}^d\epsilon _{1i2i}^4\sum _{i=1}^d\epsilon _{1i3i}^4\right] =dE\left[ \epsilon _{1121}^4\epsilon _{1131}^4\right] +d(d-1)E\left[ \epsilon _{1121}^4\right] ^2=\frac{23d}{3150}+\frac{d(d-1)}{225}. \end{aligned}$$
(23)
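Eq. 23 splits the double sum over coordinates into the \(d\) diagonal pairs \(l=m\), which share \(u_{1,l}\), and the \(d(d-1)\) off-diagonal pairs, which factor into independent expectations. A sketch that simply loops over all \((l, m)\), with small hypothetical helpers (included so the block runs on its own) that compute \(E[X^aW^b]\) in one coordinate by conditioning on the shared point:

```python
from fractions import Fraction as F
from math import comb


def poly_mul(p, q):
    """Multiply coefficient-list polynomials (lowest power first)."""
    out = [F(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def integrate01(p):
    """Exact integral of a coefficient-list polynomial over [0, 1]."""
    return sum(c / (k + 1) for k, c in enumerate(p))


def q_poly(a):
    """Coefficients of E[(x - U)^(2a)] = (x^(2a+1) + (1 - x)^(2a+1)) / (2a + 1)."""
    m = 2 * a + 1
    coeffs = [F(comb(m, k) * (-1) ** k, m) for k in range(m + 1)]
    coeffs[m] += F(1, m)
    return coeffs


def joint_coord(a, b):
    """E[X^a W^b] for X = eps_{1121}^2, W = eps_{1131}^2 (shared point u_{1,1})."""
    return integrate01(poly_mul(q_poly(a), q_poly(b)))


def first_term_eq22(d):
    """E[sum_l eps_{1l2l}^4 * sum_m eps_{1m3m}^4], summed term by term over (l, m)."""
    same = joint_coord(2, 2)                        # l == m: both use u_{1,l}
    apart = joint_coord(2, 0) * joint_coord(0, 2)   # l != m: independent factors
    return sum(same if l == m else apart for l in range(d) for m in range(d))


print(joint_coord(2, 2), first_term_eq22(3))   # 23/3150 17/350
```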

Similarly,

$$\begin{aligned} E\left[ \sum _{i=1}^d\epsilon _{1i3i}^4\sum _{i<j}^d \epsilon _{1i2i}^2\epsilon _{1j2j}^2\right]&=E\left[ \sum _{i=1}^d\epsilon _{1i2i}^4\sum _{i<j}^d \epsilon _{1i3i}^2\epsilon _{1j3j}^2\right] \nonumber \\&=\frac{19d(d-1)}{7560}+\frac{d(d-1)(d-2)}{1080} \end{aligned}$$
(24)

by symmetry. Consider the last term in Eq. 22. One observes

$$\begin{aligned} E\left[ \sum _{i<j}^d \epsilon _{1i2i}^2\epsilon _{1j2j}^2\sum _{i<j}^d \epsilon _{1i3i}^2\epsilon _{1j3j}^2\right]&=\left( {\begin{array}{c}d\\ 2\end{array}}\right) E\left[ \epsilon _{1121}^2\epsilon _{1222}^2\sum _{i<j}^d \epsilon _{1i3i}^2\epsilon _{1j3j}^2\right] \nonumber \\&=\left( {\begin{array}{c}d\\ 2\end{array}}\right) E\left[ \epsilon _{1121}^2\epsilon _{1222}^2\left\{ \epsilon _{1131}^2\epsilon _{1232}^2+\epsilon _{1131}^2\sum _{2<j}\epsilon _{1j3j}^2\right. \right. \nonumber \\&\quad \left. \left. +\epsilon _{1232}^2\sum _{2<j}\epsilon _{1j3j}^2+\sum _{2<i<j}\epsilon _{1i3i}^2\epsilon _{1j3j}^2\right\} \right] . \end{aligned}$$
(25)
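The middle terms (Eq. 24) and the last term (Eq. 25) of Eq. 22 are sums of expectations of monomials in \(X_c=\epsilon _{1c2c}^2\) and \(W_c=\epsilon _{1c3c}^2\); each such expectation factorises over the independent coordinates \(c\). A brute-force sketch under that observation, with hypothetical function names and exact `Fraction` arithmetic; it is practical only for small \(d\):

```python
from fractions import Fraction as F
from itertools import combinations
from math import comb


def poly_mul(p, q):
    """Multiply coefficient-list polynomials (lowest power first)."""
    out = [F(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def integrate01(p):
    """Exact integral of a coefficient-list polynomial over [0, 1]."""
    return sum(c / (k + 1) for k, c in enumerate(p))


def q_poly(a):
    """Coefficients of E[(x - U)^(2a)] = (x^(2a+1) + (1 - x)^(2a+1)) / (2a + 1)."""
    m = 2 * a + 1
    coeffs = [F(comb(m, k) * (-1) ** k, m) for k in range(m + 1)]
    coeffs[m] += F(1, m)
    return coeffs


def joint_coord(a, b):
    """E[X_c^a W_c^b] in one coordinate (X_c and W_c share u_{1,c})."""
    return integrate01(poly_mul(q_poly(a), q_poly(b)))


def monomial_expectation(x_pows, w_pows):
    """E[prod_c X_c^x_c W_c^w_c]; it factorises over independent coordinates."""
    out = F(1)
    for a, b in zip(x_pows, w_pows):
        out *= joint_coord(a, b)
    return out


def middle_term_eq22(d, swap=False):
    """Eq. 24: E[sum_l W_l^2 * sum_{i<j} X_i X_j]; swap=True exchanges X and W."""
    total = F(0)
    for l in range(d):
        w = [2 * int(c == l) for c in range(d)]
        for i, j in combinations(range(d), 2):
            x = [int(c in (i, j)) for c in range(d)]
            total += monomial_expectation(w, x) if swap else monomial_expectation(x, w)
    return total


def last_term_eq22(d):
    """Eq. 25: E[sum_{i<j} X_i X_j * sum_{k<l} W_k W_l], term by term."""
    total = F(0)
    for i, j in combinations(range(d), 2):
        x = [int(c in (i, j)) for c in range(d)]
        for k, l in combinations(range(d), 2):
            w = [int(c in (k, l)) for c in range(d)]
            total += monomial_expectation(x, w)
    return total


for d in (2, 3, 4):
    print(d, middle_term_eq22(d), last_term_eq22(d))
```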

From Eqs. 22 to 25, we have

$$\begin{aligned} \text {Cov}\left( e^4_{i,j},e^4_{i,k}\right)&=E\left[ e^4_{i,j}e^4_{i,k}\right] -E\left[ e^4_{i,j}\right] E\left[ e^4_{i,k}\right] \nonumber \\&={\left\{ \begin{array}{ll}\frac{701}{56700}, \quad &{}d=2,\\ \frac{29}{900}, \quad &{}d=3, \\ \frac{d^3}{1620}+\frac{167d^2}{113400}+\frac{29d}{37800}, \quad &{}d\ge 4,\end{array}\right. } \end{aligned}$$
(26)
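Eq. 26 can also be checked without the term-by-term expansion of Eq. 22, by computing the joint moments of \(e^2_{i,j}\) and \(e^2_{i,k}\) directly: condition on the shared point in each coordinate, then add the \(d\) independent coordinates with the binomial theorem. A sketch with hypothetical helper names and exact `Fraction` arithmetic:

```python
from fractions import Fraction as F
from math import comb


def poly_mul(p, q):
    """Multiply coefficient-list polynomials (lowest power first)."""
    out = [F(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def integrate01(p):
    """Exact integral of a coefficient-list polynomial over [0, 1]."""
    return sum(c / (k + 1) for k, c in enumerate(p))


def q_poly(a):
    """Coefficients of E[(x - U)^(2a)] = (x^(2a+1) + (1 - x)^(2a+1)) / (2a + 1)."""
    m = 2 * a + 1
    coeffs = [F(comb(m, k) * (-1) ** k, m) for k in range(m + 1)]
    coeffs[m] += F(1, m)
    return coeffs


def joint_coord(a, b):
    """E[X^a W^b] for X = eps_{1121}^2, W = eps_{1131}^2 (shared point u_{1,1})."""
    return integrate01(poly_mul(q_poly(a), q_poly(b)))


def joint_sum_moments(d, amax, bmax):
    """E[(e^2_{i,j})^a (e^2_{i,k})^b], adding one independent coordinate at a time."""
    keys = [(a, b) for a in range(amax + 1) for b in range(bmax + 1)]
    px = {key: joint_coord(*key) for key in keys}
    ms = {key: F(int(key == (0, 0))) for key in keys}
    for _ in range(d):
        ms = {(a, b): sum(comb(a, i) * comb(b, j) * ms[i, j] * px[a - i, b - j]
                          for i in range(a + 1) for j in range(b + 1))
              for (a, b) in keys}
    return ms


def cov_e4_e4(d):
    """Cov(e^4_{i,j}, e^4_{i,k}) from the joint moments."""
    m = joint_sum_moments(d, 2, 2)
    return m[2, 2] - m[2, 0] * m[0, 2]


def eq26_closed_form(d):
    """Eq. 26, d >= 4 branch."""
    return F(d**3, 1620) + F(167 * d**2, 113400) + F(29 * d, 37800)


for d in (2, 3, 4, 6):
    print(d, cov_e4_e4(d), eq26_closed_form(d))
```

In this sketch the \(d\ge 4\) expression also evaluates to \(701/56700\) at \(d=2\) and \(29/900\) at \(d=3\), so the three cases of Eq. 26 agree with a single polynomial.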

Equation 17 becomes

$$\begin{aligned} \text {Cov}\left( \left( e^2_{i,j}-\frac{d}{6}\right) ^2,\left( e^2_{i,k}-\frac{d}{6}\right) ^2\right) ={\left\{ \begin{array}{ll}\frac{101}{56700}, \quad &{}d=2,\\ \frac{1}{350}, \quad &{}d=3, \\ \frac{d^2}{16200}+\frac{29d}{37800}, \quad &{}d\ge 4.\end{array}\right. } \end{aligned}$$
(27)

and

$$\begin{aligned} \text {Var}(\bar{S})={\left\{ \begin{array}{ll}\frac{1}{\left( {\begin{array}{c}n\\ 2\end{array}}\right) }\left( \frac{1}{56700}\left[ 989+202(n-2)\right] \right) , \quad &{}d=2,\\ \frac{1}{\left( {\begin{array}{c}n\\ 2\end{array}}\right) }\left( \frac{1}{1050}\left[ 37+6(n-2)\right] \right) , \quad &{}d=3, \\ \frac{1}{\left( {\begin{array}{c}n\\ 2\end{array}}\right) }\left( \frac{49d^2}{16200}+\frac{101d}{37800}+2(n-2)\left[ \frac{d^2}{16200}+\frac{29d}{37800}\right] \right) , \quad&d\ge 4.\end{array}\right. } \end{aligned}$$
(28)
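Eq. 28 can be checked by simulation. The sketch below, assuming NumPy and arbitrary choices of \(n\), \(d\), replication count and seed, evaluates Eq. 28 from the per-pair variance (Eq. 16) and covariance (Eq. 27) and compares it with the sample variance of simulated \(\bar{S}\) values under uniformity on \([0,1]^d\); `var_s_bar` and `simulate_s_bar` are hypothetical names:

```python
import numpy as np
from math import comb


def var_s_bar(n: int, d: int) -> float:
    """Eq. 28: null variance of S_bar from Eqs. 16 and 27."""
    if d == 2:
        var_h, cov_h = 989 / 56700, 101 / 56700
    elif d == 3:
        var_h, cov_h = 37 / 1050, 1 / 350
    else:
        var_h = 49 * d**2 / 16200 + 101 * d / 37800
        cov_h = d**2 / 16200 + 29 * d / 37800
    return (var_h + 2 * (n - 2) * cov_h) / comb(n, 2)


def simulate_s_bar(n: int, d: int, reps: int, seed: int = 0) -> np.ndarray:
    """Monte Carlo draws of S_bar for uniform samples on [0, 1]^d."""
    rng = np.random.default_rng(seed)
    u = rng.random((reps, n, d))
    diff = u[:, :, None, :] - u[:, None, :, :]   # (reps, n, n, d)
    e2 = np.sum(diff ** 2, axis=-1)              # (reps, n, n)
    i, j = np.triu_indices(n, k=1)
    pairs = e2[:, i, j]                          # (reps, C(n, 2))
    return np.mean((pairs - d / 6.0) ** 2, axis=1)


draws = simulate_s_bar(n=20, d=2, reps=4000)
print(draws.var(ddof=1), var_s_bar(20, 2))
```

The simulated and exact variances should agree to within Monte Carlo error, which is a few percent at this replication count; the simulated mean of \(\bar{S}\) should likewise be close to \(7d/180\).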

About this article

Cite this article

Yang, M., Modarres, R. Multivariate tests of uniformity. Stat Papers 58, 627–639 (2017). https://doi.org/10.1007/s00362-015-0715-x

  • DOI: https://doi.org/10.1007/s00362-015-0715-x
