Testing equality of a large number of densities under mixing conditions

Abstract

In certain settings, such as microarray data, the sampling information is formed by a large number of possibly dependent small data sets. In some applications, for example when clustering, the researcher wishes to verify whether all data sets share a common distribution. We therefore propose a formal test of the null hypothesis that all data sets come from a single distribution. In the asymptotic setting considered, the number of small data sets goes to infinity while the sample size of each remains fixed. The asymptotic null distribution of the proposed test statistic is derived under mixing conditions on the sequence of small data sets, and the power properties of the test under two reasonable fixed alternatives are investigated. A simulation study shows that the test respects the nominal level and that its power tends to 1 as the number of data sets tends to infinity. An illustration involving microarray data is provided.
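To make the sampling scheme concrete, the following is a minimal sketch of how data of this shape could be simulated: k small data sets of fixed size n, dependent along the data-set index. It does not implement the proposed test. The Gaussian AR(1) dependence, the location-shift alternative, and all names (`simulate_small_samples`, `rho`, `shift_fraction`, `shift`) are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def simulate_small_samples(k, n, rho=0.5, shift_fraction=0.0, shift=1.0, seed=0):
    """Return a (k, n) array: k small data sets (rows), each of fixed size n.

    Rows follow a stationary Gaussian AR(1) recursion across the data-set
    index, so neighbouring data sets are dependent while every row has
    N(0, 1) marginals (the null hypothesis of a common distribution).
    The AR(1) model and the location-shift alternative are assumptions
    made for illustration only.
    """
    rng = np.random.default_rng(seed)
    x = np.empty((k, n))
    x[0] = rng.standard_normal(n)
    scale = np.sqrt(1.0 - rho**2)  # keeps the marginal variance equal to 1
    for i in range(1, k):
        x[i] = rho * x[i - 1] + scale * rng.standard_normal(n)

    # Optional alternative: shift the location of a random fraction of rows.
    n_shifted = int(round(shift_fraction * k))
    if n_shifted > 0:
        shifted_rows = rng.choice(k, size=n_shifted, replace=False)
        x[shifted_rows] += shift
    return x


# Many data sets (k large), few observations in each (n small).
null_data = simulate_small_samples(k=2000, n=4, rho=0.5)
alt_data = simulate_small_samples(k=2000, n=4, rho=0.5, shift_fraction=0.2)
```

In a simulation study of the kind described, one would apply a test to many replicates of `null_data` to estimate the rejection rate under the null, and to replicates of `alt_data` for increasing k to examine power.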


Acknowledgements

This work has received financial support from the Call 2015 Grants for Ph.D. contracts for training of doctors of the Ministry of Economy and Competitiveness, cofinanced by the European Social Fund (Ref. BES-2015-074958). We acknowledge support from the MTM2014-55966-P project, Ministry of Economy and Competitiveness, and the MTM2017-89422-P project, Ministry of Economy, Industry and Competitiveness, State Research Agency, and Regional Development Fund, EU. We also acknowledge the financial support provided by the SiDOR research group through the grant Competitive Reference Group, 2016–2019 (ED431C 2016/040), funded by the “Consellería de Cultura, Educación e Ordenación Universitaria. Xunta de Galicia.” Finally, the first author would like to thank the University of Vigo and its Escola Internacional de Doutoramento (EIDO) for the financial support provided through mobility doctorate grants. The authors also thank Professors Raymond J. Carroll and Robert Chapkin for allowing use of their data.

Author information

Corresponding author

Correspondence to Marta Cousido-Rocha.

Electronic supplementary material

Supplementary Material includes formal definitions of mixing dependence, stationarity and regularity conditions needed for the technical results, a remark about Theorem 5, the proof of Theorem 6, an additional real data analysis, and additional simulation results. (pdf 394KB)

About this article

Cite this article

Cousido-Rocha, M., de Uña-Álvarez, J. & Hart, J.D. Testing equality of a large number of densities under mixing conditions. TEST 28, 1203–1228 (2019). https://doi.org/10.1007/s11749-018-00625-3

Keywords

  • Dependent data
  • Kernel density estimation
  • k-Sample problem
  • Smooth tests
  • U-statistics

Mathematics Subject Classification

  • 62G10