Abstract
The problem of estimating high-dimensional Gaussian graphical models has gained much attention in recent years. Most existing methods can be regarded as one-step approaches, either regression-based or likelihood-based. In this paper, we propose a two-step method for estimating a high-dimensional Gaussian graphical model. The first step is a screening step, in which many entries of the concentration matrix are identified as zeros and removed from further consideration. In the second step, we focus on the remaining entries and perform selection and estimation of the nonzero entries of the concentration matrix. Because the screening step effectively reduces the dimension of the parameter space, it can potentially improve the accuracy of the estimated concentration matrix. We show that the proposed method enjoys desirable asymptotic properties. Numerical comparisons with several existing methods indicate that the proposed method performs well. We also apply the proposed method to a breast cancer microarray data set and obtain some biologically meaningful results.
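The abstract does not say which screening statistic or which second-step estimator the authors use, so the Python sketch below only illustrates the general screen-then-estimate idea under stated assumptions; it is not the paper's procedure. It assumes, hypothetically, that step 1 keeps the variable pairs with the largest absolute sample correlations and that step 2 runs a lasso neighbourhood regression for each variable (in the style of Meinshausen and Bühlmann) over the surviving candidate pairs only. The helper names `screen_pairs`, `lasso_cd` and `two_step_ggm` and the tuning parameters `keep_frac` and `lam` are hypothetical. Regression coefficients are turned into concentration-matrix entries using the Gaussian identities beta_jk = -omega_jk / omega_jj and Var(residual_j) = 1 / omega_jj.

```python
import numpy as np


def screen_pairs(X, keep_frac=0.3):
    """Step 1 (hypothetical rule): keep the node pairs whose absolute sample
    correlation is among the largest `keep_frac` fraction; concentration-matrix
    entries of all other pairs are treated as zero from here on."""
    p = X.shape[1]
    R = np.corrcoef(X, rowvar=False)
    iu = np.triu_indices(p, k=1)
    scores = np.abs(R[iu])
    n_keep = max(1, int(np.ceil(keep_frac * scores.size)))
    cutoff = np.sort(scores)[::-1][n_keep - 1]  # n_keep-th largest score
    keep = np.zeros((p, p), dtype=bool)
    keep[iu] = scores >= cutoff
    return keep | keep.T  # symmetric mask, False on the diagonal


def lasso_cd(Z, y, lam, n_iter=200):
    """Coordinate descent for (1/(2n)) * ||y - Z b||^2 + lam * ||b||_1."""
    n, d = Z.shape
    b = np.zeros(d)
    col_sq = (Z ** 2).sum(axis=0) / n
    r = y.astype(float).copy()  # residual y - Z b (b starts at zero)
    for _ in range(n_iter):
        for k in range(d):
            if col_sq[k] == 0:
                continue
            rho = Z[:, k] @ r / n + col_sq[k] * b[k]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[k]
            r -= Z[:, k] * (new - b[k])  # keep the residual in sync with b
            b[k] = new
    return b


def two_step_ggm(X, keep_frac=0.3, lam=0.1):
    """Step 2 (hypothetical estimator): for each node j, lasso-regress X_j only on
    the nodes that survived screening, then map the coefficients to
    concentration-matrix entries via beta_jk = -omega_jk / omega_jj and
    Var(residual_j) = 1 / omega_jj."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    keep = screen_pairs(Xc, keep_frac)
    B = np.zeros((p, p))
    resid_var = np.zeros(p)
    for j in range(p):
        S = np.flatnonzero(keep[j])  # candidate neighbours of node j
        b = lasso_cd(Xc[:, S], Xc[:, j], lam)
        B[j, S] = b
        resid = Xc[:, j] - Xc[:, S] @ b
        resid_var[j] = resid @ resid / n
    omega_diag = 1.0 / resid_var
    Omega = -B * omega_diag[:, None]  # row j scaled by omega_jj
    Omega = 0.5 * (Omega + Omega.T)  # average the two estimates of each entry
    np.fill_diagonal(Omega, omega_diag)
    return keep, Omega
```

For data drawn from a chain graph (a tridiagonal concentration matrix), `keep` should retain the neighbouring pairs and `Omega` should be close to zero away from the band. Two caveats of this particular sketch: marginal-correlation screening can discard a pair whose marginal correlation is near zero even though its concentration-matrix entry is not, and the averaged estimate is not guaranteed to be positive definite.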
Acknowledgements
This work was supported by National Natural Science Foundation of China (Grant No. 11671059).
Cite this article
Yang, Y., Zhu, J. A two-step method for estimating high-dimensional Gaussian graphical models. Sci. China Math. 63, 1203–1218 (2020). https://doi.org/10.1007/s11425-017-9438-5
DOI: https://doi.org/10.1007/s11425-017-9438-5