
A two-stage sequential conditional selection approach to sparse high-dimensional multivariate regression models

Published in: Annals of the Institute of Statistical Mathematics

Abstract

In this article, we consider sparse high-dimensional multivariate regression models. These models differ from ordinary multivariate regression models in two respects: (1) the dimension of the response vector and the number of covariates diverge to infinity; (2) the coefficient matrix and the precision matrix are sparse, with only a few nonzero entries. We develop a two-stage sequential conditional selection (TSCS) approach to identifying and estimating the nonzero entries of the coefficient matrix and the precision matrix. We establish that TSCS is selection consistent for the nonzero entries of both matrices. Simulation studies comparing TSCS with existing state-of-the-art methods demonstrate that TSCS outperforms them. As an illustration, the TSCS approach is also applied to a real dataset.
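The two-stage idea described above can be illustrated with a small sketch. The code below is not the TSCS algorithm itself: it substitutes plain greedy forward selection with an EBIC-style stopping rule (the function `sequential_select`, the simulated design, and all tuning constants are illustrative assumptions) to mimic stage 1, which recovers the support of the coefficient matrix one response at a time, and stage 2, which regresses each column of the stage-1 residuals on the others to locate nonzero entries of the error precision matrix.

```python
import numpy as np

def sequential_select(X, y, max_steps=10, ebic_gamma=1.0):
    """Greedy forward selection of covariates for a single response.

    At each step, add the covariate most correlated with the current
    residual; stop when an EBIC-style criterion no longer decreases.
    An illustrative stand-in for a sequential conditional selection
    step, not the authors' exact procedure.
    """
    n, p = X.shape
    active, resid = [], y.copy()
    best_crit = n * np.log(resid @ resid / n)
    for _ in range(max_steps):
        scores = np.abs(X.T @ resid)
        if active:
            scores[active] = -np.inf  # exclude already-selected covariates
        trial = active + [int(np.argmax(scores))]
        beta, *_ = np.linalg.lstsq(X[:, trial], y, rcond=None)
        r = y - X[:, trial] @ beta
        k = len(trial)
        # EBIC-style criterion: n*log(RSS/n) + k*log(n) + 2*gamma*k*log(p)
        crit = n * np.log(r @ r / n) + k * np.log(n) + 2 * ebic_gamma * k * np.log(p)
        if crit >= best_crit:
            break
        best_crit, active, resid = crit, trial, r
    return sorted(active)

rng = np.random.default_rng(0)
n, p, q = 200, 50, 3                      # n samples, p covariates, q responses
X = rng.standard_normal((n, p))
B = np.zeros((p, q))                      # sparse coefficient matrix
B[[0, 1], 0] = B[[2, 3], 1] = B[4, 2] = 2.0
Y = X @ B + 0.5 * rng.standard_normal((n, q))

# Stage 1: recover the support of the coefficient matrix, response by response.
supports = [sequential_select(X, Y[:, k]) for k in range(q)]
print(supports)

# Stage 2 (sketch): regress each column of the stage-1 residuals on the
# others (neighbourhood-selection style) to locate nonzero entries of the
# error precision matrix; with independent errors, as simulated here, the
# selected neighbourhoods should be empty or nearly so.
E = np.column_stack([Y[:, k] - X[:, s] @ np.linalg.lstsq(X[:, s], Y[:, k], rcond=None)[0]
                     for k, s in enumerate(supports)])
neighbours = {k: sequential_select(np.delete(E, k, axis=1), E[:, k]) for k in range(q)}
print(neighbours)
```

The "conditional" flavour enters through the residuals: each new covariate is judged given the variables already selected, and stage 2 operates on residuals conditional on the stage-1 fit.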



Author information


Correspondence to Zehua Chen.

Electronic supplementary material

Supplementary material 1 (PDF, 263 KB)

About this article


Cite this article

Chen, Z., Jiang, Y. A two-stage sequential conditional selection approach to sparse high-dimensional multivariate regression models. Ann Inst Stat Math 72, 65–90 (2020). https://doi.org/10.1007/s10463-018-0686-5
