Variable Selection Approaches in High-Dimensional Space

Modern Statistical Methods for Health Research

Part of the book series: Emerging Topics in Statistics and Biostatistics (ETSB)

Abstract

Technological advances in molecular, imaging, and other laboratory tests have given rise to high-dimensional statistical problems. Variable selection in high-dimensional space is a critical step in identifying a parsimonious model and improving the estimation accuracy of predictive models. Over the past few decades, the penalized likelihood approach has been used extensively to perform variable selection and parameter estimation simultaneously. In this chapter, we present a brief review of penalized likelihood approaches, with emphasis on their statistical properties and their implementation for different outcomes with high-dimensional covariates. We also introduce independence screening procedures for ultra-high-dimensional variable selection. We then apply these selection methods to a high-dimensional setting involving patients with a time-to-event outcome. We end the chapter with a brief review of high-dimensional inference.
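To make "simultaneous variable selection and parameter estimation" concrete, the following is a minimal sketch of one common penalized approach, the Lasso (an L1 penalty) for a linear model, fitted by cyclic coordinate descent. It is an illustration under stated assumptions, not the chapter's own implementation: the function names `soft_threshold` and `lasso_coordinate_descent`, the toy data, and the penalty level `lam=0.5` are all hypothetical choices made for this example.

```python
def soft_threshold(z, lam):
    """Soft-thresholding operator: shrink z toward zero by lam, clipping at 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_sweeps=100):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    X is a list of n rows (each a list of p numbers); y is a list of n numbers.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    residual = list(y)  # y - X b, with b = 0 initially
    col_scale = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_scale[j] == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of column j with the partial residual (b[j] excluded)
            rho = sum(X[i][j] * (residual[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_scale[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                b[j] = new_bj
    return b


# Toy orthogonal design with +/-1 entries (hypothetical data, for illustration only).
X = [[1, 1, 1], [1, -1, 1], [1, 1, -1], [1, -1, -1]]
y = [3.2, -2.8, 2.8, -3.2]  # equals 3 * (second column) + 0.2 * (third column)
coef = lasso_coordinate_descent(X, y, lam=0.5)
print(coef)
```

In this toy design the weak effect (0.2) falls below the penalty level and is set exactly to zero, while the strong effect is retained and shrunk from 3 to about 2.5. That is the sense in which a single penalized fit both selects variables and estimates their coefficients. Real high-dimensional analyses would use many more covariates than this, and typically an established package rather than hand-written code.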

Acknowledgements

This research was partially supported by the United States Army Medical Research, Grant/Award Numbers: W81XWH-15-1-0467 and W81XWH-18-1-0278, National Institutes of Health R01 CA256157-01, and the Prostate Cancer Foundation.

Author information

Corresponding author

Correspondence to Susan Halabi.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG

Cite this chapter

Luo, B., Yang, Q., Halabi, S. (2021). Variable Selection Approaches in High-Dimensional Space. In: Zhao, Y., Chen, D.-G. (eds) Modern Statistical Methods for Health Research. Emerging Topics in Statistics and Biostatistics. Springer, Cham. https://doi.org/10.1007/978-3-030-72437-5_14
