Bootstrap-Based LASSO-Type Selection to Build Generalized Additive Partially Linear Models for High-Dimensional Data

Monte-Carlo Simulation-Based Statistical Modeling

Part of the book series: ICSA Book Series in Statistics (ICSABSS)

Abstract

The generalized additive partially linear model (GAPLM) is a flexible option for modeling the effects of covariates on a response: it allows nonlinear effects for some covariates and linear effects for the others. To address the practical need to apply GAPLMs to high-dimensional data, we propose a procedure that selects variables, and thereby builds a GAPLM, by combining the bootstrap with penalized regression. We demonstrate the procedure by applying it to data from a breast cancer study and an HIV study; the two examples show that it is useful in practice. A simulation study also shows that the proposed procedure performs better at variable selection than penalized regression.
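
The abstract does not spell out the procedure, so the following is only a minimal sketch of the general idea of pairing the bootstrap with a LASSO-type penalty, not the chapter's algorithm. It assumes a plain linear model with no intercept rather than a GAPLM, and every name in it (`lasso_cd`, `bootstrap_selection_freq`, `demo`, the penalty `lam=0.3`, the 0.9 cut-off) is hypothetical. It refits the penalized regression on resampled data and records how often each covariate receives a nonzero coefficient.

```python
import random


def soft_threshold(z, lam):
    """Soft-thresholding operator used in the LASSO coordinate update."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=100, tol=1e-8):
    """Coordinate descent for (1/(2n)) * ||y - X b||^2 + lam * ||b||_1.

    Assumption: no intercept and roughly centered data (true for the toy data).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X b, starting from b = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the partial residual (b_j removed).
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


def bootstrap_selection_freq(X, y, lam, n_boot=30, seed=0):
    """Fraction of bootstrap resamples in which each covariate is selected."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
        beta = lasso_cd([X[i] for i in idx], [y[i] for i in idx], lam)
        for j in range(p):
            if beta[j] != 0.0:
                counts[j] += 1
    return [c / n_boot for c in counts]


def demo(n=60, p=6, seed=1):
    """Toy data (an assumption): only covariates 0 and 1 affect the response."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [2.0 * row[0] - 1.5 * row[1] + rng.gauss(0.0, 0.5) for row in X]
    return bootstrap_selection_freq(X, y, lam=0.3)


if __name__ == "__main__":
    freq = demo()
    selected = [j for j, f in enumerate(freq) if f >= 0.9]  # hypothetical cut-off
    print("selection frequencies:", freq)
    print("kept at threshold 0.9:", selected)
```

Keeping only covariates with a high selection frequency is one common way to stabilize LASSO selection. A GAPLM version would additionally need a basis expansion for the nonlinear components and a penalty suited to grouped coefficients, which this sketch leaves out.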

Acknowledgements

Material has been reviewed by the Walter Reed Army Institute of Research. There is no objection to its presentation and/or publication. The opinions or assertions contained herein are the private views of the author, and are not to be construed as official, or as reflecting true views of the Department of the Army, the Agency for Healthcare Research and Quality, the Department of Defense or the Department of Health and Human Services. Liang’s research was partially supported by NSF grants DMS-1418042 and DMS-1620898, and by Award Number 11529101, made by the National Natural Science Foundation of China.

Corresponding authors

Correspondence to Xiang Liu or Hua Liang.

Copyright information

© 2017 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Liu, X., Chen, T., Li, Y., Liang, H. (2017). Bootstrap-Based LASSO-Type Selection to Build Generalized Additive Partially Linear Models for High-Dimensional Data. In: Chen, DG., Chen, J. (eds) Monte-Carlo Simulation-Based Statistical Modeling. ICSA Book Series in Statistics. Springer, Singapore. https://doi.org/10.1007/978-981-10-3307-0_18
