Abstract
In this article, we consider sparse high-dimensional multivariate regression models. These models differ from ordinary multivariate regression models in two respects: (1) the dimension of the response vector and the number of covariates diverge to infinity; (2) the coefficient matrix and the precision matrix are sparse, i.e., each has only a small number of nonzero entries. We develop a two-stage sequential conditional selection (TSCS) approach to identifying and estimating the nonzero entries of the coefficient matrix and the precision matrix. We establish that TSCS is selection consistent for the nonzero entries of both matrices. Simulation studies comparing TSCS with existing state-of-the-art methods demonstrate that the TSCS approach outperforms them. As an illustration, the TSCS approach is also applied to a real dataset.
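The generic two-stage structure described above can be illustrated with a minimal numpy sketch: stage 1 selects the nonzero entries of the coefficient matrix by a separate Lasso for each response (followed by a least-squares refit on the selected covariates), and stage 2 recovers the support of the precision matrix by nodewise regression on the stage-1 residuals. This is only a hedged sketch of the two-stage idea, not the authors' TSCS procedure; the helper `lasso_cd`, the penalty level `lam`, and the simulated design are illustrative assumptions.

```python
import numpy as np

def lasso_cd(X, y, lam, n_sweeps=100):
    """Lasso via coordinate descent: minimize (1/2n)||y - Xb||^2 + lam*||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_norm2 = (X ** 2).sum(axis=0)
    for _ in range(n_sweeps):
        for j in range(p):
            # partial residual with coordinate j removed, then soft-threshold
            r = y - X @ beta + X[:, j] * beta[j]
            z = X[:, j] @ r
            beta[j] = np.sign(z) * max(abs(z) - n * lam, 0.0) / col_norm2[j]
    return beta

rng = np.random.default_rng(0)
n, p, q = 200, 30, 5                      # samples, covariates, responses
B = np.zeros((p, q))
B[:3, :] = 1.5                            # sparse coefficient matrix: rows 0-2 active
X = rng.standard_normal((n, p))
Y = X @ B + rng.standard_normal((n, q))   # i.i.d. errors, so true precision = I

# Stage 1: recover the support of B, one Lasso per response column.
B_hat = np.column_stack([lasso_cd(X, Y[:, k], lam=0.25) for k in range(q)])
B_support = np.abs(B_hat) > 1e-8

# Refit by least squares on the selected covariates to remove shrinkage bias,
# so the residuals passed to stage 2 are close to the true errors.
B_tilde = np.zeros_like(B_hat)
for k in range(q):
    S = np.flatnonzero(B_support[:, k])
    B_tilde[S, k] = np.linalg.lstsq(X[:, S], Y[:, k], rcond=None)[0]
R = Y - X @ B_tilde

# Stage 2: recover the precision-matrix support by nodewise (neighborhood)
# regression of each residual column on the others.
Omega_support = np.eye(q, dtype=bool)
for k in range(q):
    others = [j for j in range(q) if j != k]
    gamma = lasso_cd(R[:, others], R[:, k], lam=0.25)
    Omega_support[k, others] = np.abs(gamma) > 1e-8
```

With the diagonal true precision matrix used here, stage 2 should select (almost) no off-diagonal entries; with correlated errors, the nodewise regressions would instead flag the conditionally dependent response pairs.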
Cite this article
Chen, Z., Jiang, Y. A two-stage sequential conditional selection approach to sparse high-dimensional multivariate regression models. Ann Inst Stat Math 72, 65–90 (2020). https://doi.org/10.1007/s10463-018-0686-5