Variable Selection in Partially Linear Proportional Hazards Model with Grouped Covariates and a Diverging Number of Parameters

  • Arfan Raheen Afzal
  • Xuewen Lu
Part of the Emerging Topics in Statistics and Biostatistics book series (ETSB)


In regression models with a grouping structure among the explanatory variables, variable selection at both the group level and the within-group individual-variable level is desirable to improve model accuracy and interpretability. In this article, we propose a hierarchical bi-level variable selection approach for censored survival data, applied to the linear part of a partially linear proportional hazards model in which the covariates are naturally grouped. The proposed method conducts group selection and individual variable selection within the selected groups simultaneously. Computational algorithms are developed. The rate of convergence, selection consistency, and asymptotic normality of the proposed estimators are established. Simulation studies indicate that the hierarchical regularized method outperforms several existing variable selection methods, including LASSO, adaptive LASSO, and SCAD. Application of the proposed method is illustrated with the primary biliary cirrhosis (PBC) data.


Bi-level selection; B-spline; Group variable selection; Partially linear proportional hazards model; Selection consistency



The authors gratefully acknowledge the support for this research provided by Discovery Grants from the Natural Sciences and Engineering Research Council (NSERC) of Canada. The authors would also like to thank the editor and the anonymous referee for their insightful and constructive comments, which have helped substantially improve the manuscript.



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Department of Mathematics and Statistics, University of Calgary, Calgary, Canada
  2. Tom Baker Cancer Centre, Alberta Health Services, Calgary, Canada
