Multivariate power series interpoint distances

Abstract

We (a) establish the probability mass function of the interpoint distance (IPD) between random vectors drawn from the multivariate power series family of distributions (MPSD); (b) obtain the distribution of the IPD within one sample and across two samples from this family; (c) determine the distribution of the MPSD Euclidean norm and of the distance from fixed points in \({\mathbb {Z}}^d\); and (d) provide the distribution of the IPDs of vectors drawn from a mixture of MPSDs. We also present a method for testing the homogeneity of MPSD mixtures using the sample IPDs.
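
As a rough illustration of what an IPD distribution looks like, the sketch below estimates it by simulation rather than by the closed-form results summarized above. It assumes the multinomial distribution as the MPSD member and the squared Euclidean distance as the IPD; the function names, sample size, and parameter values are hypothetical choices made for this example and are not taken from the article.

```python
import random
from collections import Counter
from itertools import combinations


def sample_multinomial(n_trials, probs, rng):
    """Draw one multinomial count vector by tallying n_trials categorical draws."""
    counts = [0] * len(probs)
    for k in rng.choices(range(len(probs)), weights=probs, k=n_trials):
        counts[k] += 1
    return tuple(counts)


def squared_ipd(x, y):
    """Squared Euclidean distance between two integer vectors (assumed IPD)."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def empirical_ipd_pmf(sample):
    """Relative frequency of each squared IPD over all pairs within one sample."""
    dists = Counter(squared_ipd(x, y) for x, y in combinations(sample, 2))
    total = sum(dists.values())
    return {d: c / total for d, c in sorted(dists.items())}


# Illustrative parameters only: 200 vectors, 5 trials each, 3 categories.
rng = random.Random(0)
sample = [sample_multinomial(5, [0.2, 0.3, 0.5], rng) for _ in range(200)]
pmf = empirical_ipd_pmf(sample)

for distance, freq in pmf.items():
    print(distance, round(freq, 4))
```

Because every multinomial vector here has the same total, the coordinate differences of any pair sum to zero, so only even squared distances can occur. Pairs that share a vector are not independent, so the frequencies are an estimate of the IPD probability mass function, not an exact evaluation of it.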

Author information

Corresponding author

Correspondence to Reza Modarres.

About this article

Cite this article

Modarres, R., Song, Y. Multivariate power series interpoint distances. Stat Methods Appl 29, 955–982 (2020). https://doi.org/10.1007/s10260-020-00508-8

Keywords

  • MPSD family
  • Interpoint distance
  • Norm and mixtures

Mathematics Subject Classification

  • 62H10
  • 62E15
  • 62H15