Science China Mathematics, Volume 62, Issue 5, pp 961–978

A combined p-value test for the mean difference of high-dimensional data

  • Wei Yu
  • Wangli Xu
  • Lixing Zhu


This paper proposes a novel method for testing the equality of high-dimensional means using a multiple hypothesis test. The test statistic is the maximum of the standardized partial sums of logarithmic p-values. Numerical studies show that the method performs well for both normal and non-normal data and has good power under both dense and sparse alternatives. A real data analysis illustrates the method.
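The abstract does not spell out how the statistic is computed, so the following Python sketch is only one plausible reading of "the maximum of standardized partial sums of logarithmic p-values", not the authors' procedure. It assumes per-coordinate p-values from a Welch-type z statistic with a normal reference distribution, standardizes each partial sum by its exact mean and variance under independent uniform p-values, and calibrates the maximum by permuting group labels. All function names are hypothetical.

```python
import math
import random


def coordinate_p_values(x, y):
    """Two-sided p-value per coordinate from a Welch-type z statistic.

    Assumption: a standard normal reference distribution is used for brevity.
    """
    n1, n2 = len(x), len(y)
    pvals = []
    for j in range(len(x[0])):
        a = [row[j] for row in x]
        b = [row[j] for row in y]
        m1, m2 = sum(a) / n1, sum(b) / n2
        v1 = sum((u - m1) ** 2 for u in a) / (n1 - 1)
        v2 = sum((u - m2) ** 2 for u in b) / (n2 - 1)
        z = (m1 - m2) / math.sqrt(v1 / n1 + v2 / n2)
        # Clamp away from zero so that log(p) stays finite.
        pvals.append(max(math.erfc(abs(z) / math.sqrt(2.0)), 1e-300))
    return pvals


def null_moments(d, k):
    """Mean and variance of the sum of the k largest of d i.i.d. Exp(1) draws.

    Under independent uniform p-values, -log(p) is Exp(1); the formulas follow
    from the Renyi representation of exponential order statistics.
    """
    mean = k + k * sum(1.0 / j for j in range(k + 1, d + 1))
    var = k + k * k * sum(1.0 / j ** 2 for j in range(k + 1, d + 1))
    return mean, var


def max_standardized_partial_sum(pvals):
    """Maximum over k of the standardized sum of the k largest -log(p) values."""
    d = len(pvals)
    scores = sorted((-math.log(p) for p in pvals), reverse=True)
    best, partial = -math.inf, 0.0
    for k in range(1, d + 1):
        partial += scores[k - 1]
        mean, var = null_moments(d, k)
        best = max(best, (partial - mean) / math.sqrt(var))
    return best


def permutation_test(x, y, n_perm=199, seed=0):
    """Calibrate the statistic by permuting group labels (an assumption here)."""
    rng = random.Random(seed)
    observed = max_standardized_partial_sum(coordinate_p_values(x, y))
    pooled, n1 = x + y, len(x)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        stat = max_standardized_partial_sum(
            coordinate_p_values(pooled[:n1], pooled[n1:]))
        exceed += stat >= observed
    return observed, (1 + exceed) / (n_perm + 1)
```

Here `x` and `y` are lists of equal-length observation rows, one list per sample. Taking the maximum over k is what lets a single statistic respond to both a few very small p-values (sparse alternatives, small k) and many moderately small ones (dense alternatives, large k). The permutation step is a generic way to obtain a reference distribution when coordinates are dependent; the paper may calibrate its test differently.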


high-dimensional data; equality of means; multiple hypothesis testing; sparse alternatives


47N30 65C05 





This work was supported by a grant from the University Grants Council of Hong Kong, National Natural Science Foundation of China (Grant No. 11471335), the Ministry of Education project of Key Research Institute of Humanities and Social Sciences at Universities (Grant No. 16JJD910002), and Fund for Building World-Class Universities (Disciplines) of Renmin University of China. The authors thank two referees for their constructive comments that led to an improvement of an early version of the article.



Copyright information

© Science China Press and Springer-Verlag GmbH Germany 2018

Authors and Affiliations

  1. Center for Applied Statistics, School of Statistics, Renmin University of China, Beijing, China
  2. School of Statistics, Beijing Normal University, Beijing, China
  3. Department of Mathematics, Hong Kong Baptist University, Hong Kong, China
