A two-stage sequential conditional selection approach to sparse high-dimensional multivariate regression models

Abstract

In this article, we consider sparse high-dimensional multivariate regression models. These models differ from ordinary multivariate regression models in two respects: (1) the dimension of the response vector and the number of covariates diverge to infinity; (2) the coefficient matrix and the precision matrix are sparse, i.e., most of their entries are zero. We develop a two-stage sequential conditional selection (TSCS) approach to identifying and estimating the nonzero entries of the coefficient matrix and the precision matrix. We establish that TSCS is selection consistent for the identification of the nonzero entries of both matrices. Simulation studies comparing TSCS with existing state-of-the-art methods demonstrate that TSCS outperforms them. As an illustration, the TSCS approach is also applied to a real dataset.
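The model class described in the abstract can be sketched numerically. The following is an illustrative simulation only, with toy dimensions and an arbitrary sparsity pattern chosen for exposition; it generates data from a sparse multivariate regression model Y = XB + E, with sparse coefficient matrix B and sparse error precision matrix Omega, and is not the TSCS procedure itself:

```python
# Toy simulation of the model class: Y = X B + E, where B (p x q) and the
# error precision matrix Omega (q x q) are both sparse. All sizes and the
# sparsity pattern here are illustrative assumptions, not from the paper.
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 100, 50, 10  # samples, covariates, response dimension (toy sizes)

# Sparse coefficient matrix B: 15 nonzero entries placed at random.
B = np.zeros((p, q))
nz = rng.choice(p * q, size=15, replace=False)
B.flat[nz] = rng.normal(0.0, 2.0, size=15)

# Sparse precision matrix Omega: tridiagonal (hence sparse) and positive
# definite by diagonal dominance; its inverse is the error covariance.
Omega = np.eye(q) + 0.4 * (np.eye(q, k=1) + np.eye(q, k=-1))
Sigma = np.linalg.inv(Omega)

X = rng.normal(size=(n, p))
E = rng.multivariate_normal(np.zeros(q), Sigma, size=n)
Y = X @ B + E  # observed high-dimensional responses

print("nonzeros in B:", np.count_nonzero(B))
print("off-diagonal nonzeros in Omega:",
      np.count_nonzero(Omega - np.diag(np.diag(Omega))))
```

A selection-consistent procedure such as TSCS aims to recover exactly which entries of B and Omega are nonzero as n, p, and q grow.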



Author information

Correspondence to Zehua Chen.

Electronic supplementary material

Supplementary material 1 (pdf 263 KB)

About this article

Cite this article

Chen, Z., Jiang, Y. (2020). A two-stage sequential conditional selection approach to sparse high-dimensional multivariate regression models. Annals of the Institute of Statistical Mathematics, 72, 65–90. https://doi.org/10.1007/s10463-018-0686-5

Keywords

  • Conditional models
  • Multivariate regression
  • Precision matrix
  • Selection consistency
  • Sequential procedure
  • Sparse high-dimensional model