Testing and support recovery of multiple high-dimensional covariance matrices with false discovery rate control

  • Original Paper
  • Journal: TEST

Abstract

Motivated by applications in genomics, we study in this paper four interrelated high-dimensional hypothesis testing problems on dependence structures among multiple populations. A new test statistic is constructed for testing the global hypothesis that multiple covariance matrices are equal, and its limiting null distribution is established. Correction methods are introduced to improve the accuracy of the test for finite samples. It is shown that the proposed tests are powerful against sparse alternatives and enjoy certain optimality properties. We then propose a multiple testing procedure for simultaneously testing the equality of the entries of the covariance matrices across multiple populations. The proposed method is shown to control the false discovery rate. A simulation study demonstrates that the proposed tests maintain the desired error rates under the null and have good power under the alternative. The methods are also applied to a Novartis multi-tissue analysis. In addition, testing and support recovery of submatrices of multiple covariance matrices are studied.
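To make the idea of entrywise multiple testing with false discovery rate control concrete, here is a minimal sketch in Python. It uses the standard Benjamini-Hochberg step-up rule as a stand-in; it is not the procedure proposed in the paper, whose details are not given in the abstract. The function name `benjamini_hochberg`, the `Entry` type, and the example p-values are all hypothetical, and the sketch assumes that a p-value for each entry's null hypothesis (equality of that entry across all populations) has already been computed by some other means.

```python
from typing import Dict, List, Tuple

Entry = Tuple[int, int]  # (row, column) index of a covariance entry, row <= column


def benjamini_hochberg(p_values: Dict[Entry, float], alpha: float = 0.05) -> List[Entry]:
    """Return the entries rejected by the Benjamini-Hochberg step-up rule.

    Assumption: p_values[(i, j)] is a p-value for the null hypothesis that
    entry (i, j) is equal across all populations. How those p-values are
    obtained is outside the scope of this sketch.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Sort entries by increasing p-value.
    ranked = sorted(p_values.items(), key=lambda item: item[1])
    # Find the largest rank k (1-based) with p_(k) <= k * alpha / m.
    cutoff = 0
    for rank, (_, p) in enumerate(ranked, start=1):
        if p <= rank * alpha / m:
            cutoff = rank
    # Reject every hypothesis ranked at or below the cutoff.
    return sorted(entry for entry, _ in ranked[:cutoff])


# Hypothetical p-values for the upper triangle of a 3 x 3 covariance matrix.
example_p = {
    (0, 0): 0.001, (0, 1): 0.012, (0, 2): 0.041,
    (1, 1): 0.30, (1, 2): 0.020, (2, 2): 0.75,
}
rejected = benjamini_hochberg(example_p, alpha=0.05)
print(rejected)  # [(0, 0), (0, 1), (1, 2)]
```

The rejected entries form an estimate of the support of the differences among the covariance matrices. The Benjamini-Hochberg rule is known to control the false discovery rate when the p-values are independent or positively dependent; entrywise covariance statistics are generally dependent, which is one reason a tailored procedure with its own theoretical guarantee is needed in this setting.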

Acknowledgements

Funding was provided by “The Recruitment Program of Global Experts” Youth Project, the startup fund from Fudan University, and the National Science Foundation of China (Grant No. 11690013).

Author information

Corresponding author

Correspondence to Yin Xia.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 632 KB)

About this article

Cite this article

Xia, Y. Testing and support recovery of multiple high-dimensional covariance matrices with false discovery rate control. TEST 26, 782–801 (2017). https://doi.org/10.1007/s11749-017-0533-7

  • DOI: https://doi.org/10.1007/s11749-017-0533-7
