A two-step method for estimating high-dimensional Gaussian graphical models

Science China Mathematics

Abstract

Estimating high-dimensional Gaussian graphical models has attracted much attention in recent years. Most existing methods are one-step approaches, either regression-based or likelihood-based. In this paper, we propose a two-step method for estimating a high-dimensional Gaussian graphical model. The first step is a screening step, in which many entries of the concentration matrix are identified as zeros and removed from further consideration. The second step focuses on the remaining entries and performs selection and estimation of the nonzero entries of the concentration matrix. Because the screening step effectively reduces the dimension of the parameter space, the accuracy of the estimated concentration matrix can potentially be improved. We show that the proposed method enjoys desirable asymptotic properties. Numerical comparisons with several existing methods indicate that the proposed method works well. We also apply the proposed method to a breast cancer microarray data set and obtain some biologically meaningful results.
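
The screen-then-estimate idea can be illustrated with a minimal sketch. The abstract does not state how either step is actually carried out, so both rules below are assumptions chosen for simplicity: screening keeps a pair of variables only when their absolute sample correlation exceeds a threshold, and the second step regresses each variable on its surviving candidates by least squares, converts the coefficients to concentration-matrix entries, and hard-thresholds small values. The function name `two_step_precision`, both threshold parameters, and the simulated chain graph are hypothetical and not taken from the paper.

```python
import numpy as np


def two_step_precision(X, screen_threshold=0.1, select_threshold=0.1):
    """Illustrative two-step estimate of a sparse concentration matrix.

    Both steps use assumed rules; this is not the paper's procedure.
    """
    p = X.shape[1]
    Xc = X - X.mean(axis=0)

    # Step 1 (assumed screening rule): keep the pair (j, k) only if the
    # absolute sample correlation exceeds screen_threshold.
    corr = np.corrcoef(Xc, rowvar=False)
    candidates = np.abs(corr) > screen_threshold
    np.fill_diagonal(candidates, False)

    # Step 2 (assumed rule): least-squares regression of each variable on
    # its surviving candidates. With residual variance s2 and coefficients
    # beta, omega_jj = 1 / s2 and omega_jk = -beta_k / s2.
    omega = np.zeros((p, p))
    for j in range(p):
        nbrs = np.flatnonzero(candidates[j])
        resid = Xc[:, j]
        if nbrs.size > 0:
            beta, *_ = np.linalg.lstsq(Xc[:, nbrs], Xc[:, j], rcond=None)
            resid = Xc[:, j] - Xc[:, nbrs] @ beta
            omega[j, nbrs] = -beta / resid.var()
        omega[j, j] = 1.0 / resid.var()

    # Symmetrize, then set small off-diagonal entries to zero (selection).
    omega = (omega + omega.T) / 2.0
    small = np.abs(omega) < select_threshold
    np.fill_diagonal(small, False)
    omega[small] = 0.0
    return omega, candidates


# Hypothetical demo: a 5-variable chain graph with a tridiagonal
# concentration matrix (1 on the diagonal, 0.4 between neighbors).
true_omega = np.eye(5) + 0.4 * (np.eye(5, k=1) + np.eye(5, k=-1))
rng = np.random.default_rng(0)
X = rng.multivariate_normal(np.zeros(5), np.linalg.inv(true_omega), size=5000)
omega_hat, candidates = two_step_precision(X)
print(np.round(omega_hat, 2))
```

In this sketch the screening step shrinks each regression to a handful of candidate neighbors, which is the dimension reduction the abstract describes. Correlation screening can miss an edge whose marginal correlation is near zero, so it stands in for the paper's screening step only as an illustration.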

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant No. 11671059).

Author information

Corresponding author

Correspondence to Yuehan Yang.

Cite this article

Yang, Y., Zhu, J. A two-step method for estimating high-dimensional Gaussian graphical models. Sci. China Math. 63, 1203–1218 (2020). https://doi.org/10.1007/s11425-017-9438-5

  • DOI: https://doi.org/10.1007/s11425-017-9438-5
