Abstract
The generalized additive partially linear model (GAPLM) is a flexible way to model the effects of covariates on a response: it allows some covariates to have nonlinear effects while the others enter linearly. To meet the practical need of applying GAPLMs to high-dimensional data, we propose a procedure that selects variables, and thereby builds a GAPLM, by combining the bootstrap with penalized regression. We demonstrate the procedure on data from a breast cancer study and an HIV study; both examples show that it is useful in practice. A simulation study also shows that the proposed procedure performs better at variable selection than penalized regression.
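The abstract does not spell out the steps of the procedure, so the following is only a minimal sketch of the general idea of pairing the bootstrap with a Lasso-type penalty: refit the penalized regression on resampled data and record how often each variable receives a nonzero coefficient. It uses a plain linear model rather than a GAPLM, and the function names (`lasso_cd`, `bootstrap_selection_frequency`, `make_toy_data`), the penalty level `lam`, and the 0.9 frequency cut-off are all illustrative assumptions, not the chapter's method.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used in Lasso coordinate updates."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=50):
    """Lasso by cyclic coordinate descent (no intercept).

    Minimizes (1/(2n)) * ||y - X b||^2 + lam * ||b||_1.
    Assumption: columns of X and y are roughly centered.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # current residual y - X beta
    col_ss = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_ss[j] == 0.0:
                continue
            # Correlation of column j with the partial residual
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_ss[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def bootstrap_selection_frequency(X, y, lam, n_boot=20, seed=0):
    """Fraction of bootstrap resamples in which each variable is selected."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
        beta = lasso_cd([X[i] for i in idx], [y[i] for i in idx], lam)
        for j in range(p):
            if beta[j] != 0.0:
                counts[j] += 1
    return [c / n_boot for c in counts]


def make_toy_data(n, p, seed=0):
    """Hypothetical data: only the first two covariates affect the response."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [2.0 * row[0] - 1.5 * row[1] + rng.gauss(0.0, 0.5) for row in X]
    return X, y


X, y = make_toy_data(100, 6, seed=1)
freq = bootstrap_selection_frequency(X, y, lam=0.3, n_boot=20, seed=2)
selected = [j for j, f in enumerate(freq) if f >= 0.9]  # assumed cut-off
print("selection frequencies:", freq)
print("selected variables:", selected)
```

Selecting on frequency across resamples, rather than on a single penalized fit, is meant to filter out variables that enter the model only by chance in one sample. A GAPLM version would additionally need a basis expansion for the nonlinear components and a link function, which this sketch leaves out.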
Acknowledgements
Material has been reviewed by the Walter Reed Army Institute of Research. There is no objection to its presentation and/or publication. The opinions or assertions contained herein are the private views of the author, and are not to be construed as official, or as reflecting true views of the Department of the Army, the Agency for Healthcare Research and Quality, the Department of Defense or the Department of Health and Human Services. Liang’s research was partially supported by NSF grants DMS-1418042 and DMS-1620898, and by Award Number 11529101, made by the National Natural Science Foundation of China.
Copyright information
© 2017 Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Liu, X., Chen, T., Li, Y., Liang, H. (2017). Bootstrap-Based LASSO-Type Selection to Build Generalized Additive Partially Linear Models for High-Dimensional Data. In: Chen, DG., Chen, J. (eds) Monte-Carlo Simulation-Based Statistical Modeling. ICSA Book Series in Statistics. Springer, Singapore. https://doi.org/10.1007/978-981-10-3307-0_18
DOI: https://doi.org/10.1007/978-981-10-3307-0_18
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-3306-3
Online ISBN: 978-981-10-3307-0
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)