Science China Mathematics, Volume 62, Issue 5, pp 961–978

# A combined p-value test for the mean difference of high-dimensional data

• Wei Yu
• Wangli Xu
• Lixing Zhu

## Abstract

This paper proposes a novel method for testing the equality of high-dimensional means using a multiple hypothesis test. The test statistic is the maximum of the standardized partial sums of logarithmic p-values. Numerical studies show that the method performs well for both normal and non-normal data and has good power under both dense and sparse alternative hypotheses. A real data analysis illustrates the method.
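To make the idea of a "maximum of standardized partial sums of logarithmic p-values" concrete, the following is a minimal sketch, not the authors' procedure. The abstract does not give the exact statistic, so several choices here are assumptions: coordinate-wise p-values come from Welch z-scores with a normal approximation, the partial sums run over the sorted p-values, and both the standardization and the final p-value are obtained by permuting group labels. The function names `coordinate_pvalues`, `partial_sums`, and `combined_test` are hypothetical.

```python
import math
import random
from statistics import mean, pstdev, variance


def coordinate_pvalues(x, y):
    """Two-sided p-value per coordinate (assumed: Welch z-score, normal approximation)."""
    n1, n2 = len(x), len(y)
    pvals = []
    for a, b in zip(zip(*x), zip(*y)):  # iterate over matching columns of x and y
        se = math.sqrt(variance(a) / n1 + variance(b) / n2)
        z = (mean(a) - mean(b)) / se
        # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)); clamp to avoid log(0)
        pvals.append(max(math.erfc(abs(z) / math.sqrt(2)), 1e-300))
    return pvals


def partial_sums(pvals):
    """Cumulative sums of -log p over the p-values sorted in ascending order."""
    sums, total = [], 0.0
    for p in sorted(pvals):
        total -= math.log(p)
        sums.append(total)
    return sums


def combined_test(x, y, n_perm=200, seed=0):
    """Return (statistic, permutation p-value) for H0: equal mean vectors.

    Assumption: each partial sum is standardized by its mean and standard
    deviation over the pool of observed and label-permuted data sets.
    """
    rng = random.Random(seed)
    pooled, n1 = list(x) + list(y), len(x)
    curves = [partial_sums(coordinate_pvalues(x, y))]  # index 0 is the observed data
    for _ in range(n_perm):
        rng.shuffle(pooled)
        curves.append(partial_sums(coordinate_pvalues(pooled[:n1], pooled[n1:])))
    centers = [mean(col) for col in zip(*curves)]
    scales = [pstdev(col) or 1.0 for col in zip(*curves)]
    stats = [
        max((s - m) / sd for s, m, sd in zip(curve, centers, scales))
        for curve in curves
    ]
    # Share of data sets (observed included) whose maximum is at least the observed one
    p_value = sum(t >= stats[0] for t in stats) / len(stats)
    return stats[0], p_value
```

Taking the maximum over all partial sums is what lets a statistic of this kind respond to both sparse signals (a few very small p-values dominate the early sums) and dense signals (many moderately small p-values accumulate in the later sums). The sketch assumes continuous data, so that no coordinate has zero variance in both samples.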

## Keywords

high-dimensional data, equality of means, multiple hypothesis testing, sparse alternatives

47N30 65C05

## Notes

### Acknowledgments

This work was supported by a grant from the University Grants Council of Hong Kong, National Natural Science Foundation of China (Grant No. 11471335), the Ministry of Education project of Key Research Institute of Humanities and Social Sciences at Universities (Grant No. 16JJD910002), and Fund for Building World-Class Universities (Disciplines) of Renmin University of China. The authors thank two referees for their constructive comments that led to an improvement of an early version of the article.
