Hypothesis Testing for High-Dimensional Data

  • Wei Biao Wu
  • Zhipeng Lou
  • Yuefeng Han
Chapter
Part of the Springer Handbooks of Computational Statistics book series (SHCS)

Abstract

We present a systematic theory for testing the means of high-dimensional data. Our testing procedure is based on an invariance principle that approximates the distributions of functionals of non-Gaussian vectors by those of Gaussian vectors. Unlike the widely used Bonferroni approach, our procedure is dependence-adjusted and has asymptotically correct size and power. To obtain cutoff values for the test, we propose a half-sampling method that avoids estimating the underlying covariance matrix of the random vectors. Extensive simulations show that the half-sampling method performs excellently.

Keywords

Gaussian approximation; Goodness-of-fit test; Half-sampling; High-dimensional data; Hypothesis testing; Large p small n; Rademacher weighted differencing

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Statistics, University of Chicago, Chicago, USA
