Testing and support recovery of multiple high-dimensional covariance matrices with false discovery rate control

Xia, Yin

doi:10.1007/s11749-017-0533-7

Original Paper
Published: 27 March 2017

TEST, Volume 26, pages 782–801 (2017)

Yin Xia (ORCID: orcid.org/0000-0001-9784-8742)

Abstract

Motivated by applications in genomics, we study in this paper four interrelated high-dimensional hypothesis testing problems on dependence structures among multiple populations. A new test statistic is constructed for testing the global hypothesis that multiple covariance matrices are equal, and its limiting null distribution is established. Correction methods are introduced to improve the accuracy of the test for finite samples. It is shown that the proposed tests are powerful against sparse alternatives and enjoy certain optimality properties. We then propose a multiple testing procedure for simultaneously testing the equality of the entries of the covariance matrices across multiple populations. The proposed method is shown to control the false discovery rate. A simulation study demonstrates that the proposed tests maintain the desired error rates under the null and have good power under the alternative. The methods are also applied to a Novartis multi-tissue analysis. In addition, testing and support recovery of submatrices of multiple covariance matrices are studied.
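As a rough illustration of what entrywise multiple testing with false discovery rate control involves, the sketch below applies the classical Benjamini-Hochberg step-up rule to a symmetric matrix of entrywise p-values and returns the estimated support. It is not the procedure proposed in the paper, whose test statistics and thresholding rule are not given in this abstract. The function names `benjamini_hochberg` and `support_from_pvalues`, and the assumption that one p-value per matrix entry is already available, are hypothetical and introduced only for this example.

```python
import numpy as np


def benjamini_hochberg(pvalues, alpha=0.05):
    """Classical BH step-up rule: return a boolean mask of rejections.

    Illustrative stand-in only; not the procedure from the paper.
    """
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)                      # indices of p-values, ascending
    sorted_p = p[order]
    thresholds = alpha * np.arange(1, m + 1) / m
    below = sorted_p <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        # Step-up: reject everything up to the LARGEST index meeting its threshold.
        k = np.max(np.nonzero(below)[0])
        reject[order[: k + 1]] = True
    return reject


def support_from_pvalues(pmat, alpha=0.05):
    """Hypothetical helper: apply BH to the upper triangle (diagonal included)
    of a symmetric matrix of entrywise p-values and return a symmetric
    boolean matrix marking the entries declared unequal across populations.
    """
    pmat = np.asarray(pmat, dtype=float)
    dim = pmat.shape[0]
    iu = np.triu_indices(dim)                  # each unordered pair tested once
    mask = benjamini_hochberg(pmat[iu], alpha)
    support = np.zeros((dim, dim), dtype=bool)
    support[iu] = mask
    return support | support.T                 # mirror to the lower triangle


if __name__ == "__main__":
    # Toy p-value matrix: only entry (0, 1) shows strong evidence of a difference.
    pmat = np.full((3, 3), 0.9)
    pmat[0, 1] = pmat[1, 0] = 1e-4
    print(support_from_pvalues(pmat, alpha=0.05))
```

Testing only the upper triangle avoids counting each symmetric entry twice, which would otherwise distort the number of hypotheses used in the thresholds.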

Acknowledgements

Funding was provided by “The Recruitment Program of Global Experts” Youth Project, the startup fund from Fudan University, and the National Science Foundation of China (Grant No. 11690013).

Author information

Authors and Affiliations

Department of Statistics, School of Management, Fudan University, Shanghai, China
Yin Xia

Corresponding author

Correspondence to Yin Xia.

Electronic supplementary material

Supplementary material 1 (pdf 632 KB)

About this article

Cite this article

Xia, Y. Testing and support recovery of multiple high-dimensional covariance matrices with false discovery rate control. TEST 26, 782–801 (2017). https://doi.org/10.1007/s11749-017-0533-7

Received: 22 August 2016
Accepted: 20 March 2017
Published: 27 March 2017
Issue Date: December 2017
DOI: https://doi.org/10.1007/s11749-017-0533-7

Mathematics Subject Classification

62H15
