A two-step method for estimating high-dimensional Gaussian graphical models

Yang, Yuehan; Zhu, Ji

doi:10.1007/s11425-017-9438-5

Published: 14 May 2020

Volume 63, pages 1203–1218 (2020)

Science China Mathematics


Abstract

The problem of estimating high-dimensional Gaussian graphical models has attracted considerable attention in recent years. Most existing methods can be viewed as one-step approaches that are either regression-based or likelihood-based. In this paper, we propose a two-step method for estimating the high-dimensional Gaussian graphical model. The first step is a screening step, in which many entries of the concentration matrix are identified as zeros and removed from further consideration. In the second step, we focus on the remaining entries and perform selection and estimation of the nonzero entries of the concentration matrix. Because the screening step effectively reduces the dimension of the parameter space, the accuracy of the estimated concentration matrix can potentially be improved. We show that the proposed method enjoys desirable asymptotic properties. Numerical comparisons with several existing methods indicate that the proposed method performs well. We also apply the proposed method to a breast cancer microarray data set and obtain some biologically meaningful results.
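The screen-then-estimate idea can be illustrated with a minimal sketch. The abstract does not state the screening rule or the second-step estimator, so both are stand-ins chosen here purely for illustration and are not the authors' algorithm. The hypothetical `screen_entries` keeps a pair (j, k) only when the absolute sample correlation exceeds a threshold `tau`. The hypothetical `refit_neighborhoods` then regresses each variable on its retained candidates by least squares, and the hypothetical `select_edges` keeps an edge when both regression coefficients exceed `eps` in absolute value. The link to the concentration matrix Ω = (ω_jk) is the standard Gaussian identity: regressing X_j on all other variables gives coefficients β_jk = −ω_jk/ω_jj, so ω_jk = 0 exactly when β_jk = 0 (the restricted regression recovers these coefficients when screening keeps every true neighbor).

```python
import numpy as np


def screen_entries(X, tau):
    """Hypothetical screening rule: keep (j, k) only if |corr(X_j, X_k)| > tau."""
    R = np.corrcoef(X, rowvar=False)
    keep = np.abs(R) > tau
    np.fill_diagonal(keep, False)  # a variable is never its own neighbor
    return keep


def refit_neighborhoods(X, keep):
    """Least-squares regression of each column on its screened candidates only."""
    Xc = X - X.mean(axis=0)  # center the data so no intercept is needed
    p = X.shape[1]
    B = np.zeros((p, p))
    for j in range(p):
        idx = np.flatnonzero(keep[j])
        if idx.size == 0:
            continue  # no candidate neighbors survived screening
        coef, *_ = np.linalg.lstsq(Xc[:, idx], Xc[:, j], rcond=None)
        B[j, idx] = coef  # screened-out entries stay exactly zero
    return B


def select_edges(B, eps):
    """AND rule: edge (j, k) if both |B[j, k]| and |B[k, j]| exceed eps."""
    big = np.abs(B) > eps
    return big & big.T


# Toy example: two independent blocks {0, 1} and {2, 3}, within-block correlation 0.6.
rng = np.random.default_rng(0)
Sigma = np.array([[1.0, 0.6, 0.0, 0.0],
                  [0.6, 1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 0.6],
                  [0.0, 0.0, 0.6, 1.0]])
X = rng.multivariate_normal(np.zeros(4), Sigma, size=2000)

keep = screen_entries(X, tau=0.3)   # cross-block pairs should be screened out
B = refit_neighborhoods(X, keep)    # B[0, 1] should be close to 0.6
edges = select_edges(B, eps=0.1)
print(edges.astype(int))
```

Marginal-correlation screening is only a heuristic: a pair can have a small marginal correlation yet a nonzero partial correlation, in which case this screen would wrongly discard it. Any practical screening rule must avoid removing true nonzero entries, since the second step cannot recover them.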


Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant No. 11671059).

Author information

Authors and Affiliations

School of Statistics and Mathematics, Central University of Finance and Economics, Beijing, 100081, China
Yuehan Yang
Department of Statistics, University of Michigan, Ann Arbor, MI, 48109, USA
Ji Zhu


Corresponding author

Correspondence to Yuehan Yang.

About this article

Cite this article

Yang, Y., Zhu, J. A two-step method for estimating high-dimensional Gaussian graphical models. Sci. China Math. 63, 1203–1218 (2020). https://doi.org/10.1007/s11425-017-9438-5


Received: 20 December 2017
Accepted: 29 July 2018
Issue Date: June 2020
