Lifetime Data Analysis, Volume 24, Issue 3, pp 443–463

A regularized variable selection procedure in additive hazards model with stratified case-cohort design

  • Ai Ni
  • Jianwen Cai


Case-cohort designs are commonly used in large epidemiological studies to reduce the cost associated with covariate measurement. In many such studies the number of covariates is very large. An efficient variable selection method is needed for case-cohort studies where the covariates are only observed in a subset of the sample. Current literature on this topic has focused on the proportional hazards model. However, in many studies the additive hazards model is preferred over the proportional hazards model, either because the proportional hazards assumption is violated or because the additive hazards model provides more relevant information to the research question. Motivated by one such study, the Atherosclerosis Risk in Communities (ARIC) study, we investigate the properties of a regularized variable selection procedure in a stratified case-cohort design under an additive hazards model with a diverging number of parameters. We establish the consistency and asymptotic normality of the penalized estimator and prove its oracle property. Simulation studies are conducted to assess the finite sample performance of the proposed method with a modified cross-validation tuning parameter selection method. We apply the variable selection procedure to the ARIC study to demonstrate its practical use.
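
The keywords for this article name SCAD as the penalty behind the regularized procedure. As a rough illustration only, the sketch below evaluates the standard SCAD penalty of Fan and Li, and its derivative, for a single coefficient. The function names `scad_penalty` and `scad_derivative` and the default `a = 3.7` are assumptions made for this example; they are not taken from the article, which may define or tune the penalty differently.

```python
def scad_penalty(theta: float, lam: float, a: float = 3.7) -> float:
    """Standard SCAD penalty for one coefficient (illustrative sketch).

    theta: coefficient value
    lam:   tuning parameter (lambda >= 0)
    a:     shape parameter (> 2); 3.7 is the commonly used default
    """
    if lam < 0 or a <= 2:
        raise ValueError("require lam >= 0 and a > 2")
    t = abs(theta)
    if t <= lam:
        # Lasso-like linear region near zero
        return lam * t
    if t <= a * lam:
        # Quadratic transition region
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    # Constant region: large coefficients receive no extra penalty
    return lam * lam * (a + 1) / 2


def scad_derivative(theta: float, lam: float, a: float = 3.7) -> float:
    """Derivative of the SCAD penalty with respect to |theta|."""
    if lam < 0 or a <= 2:
        raise ValueError("require lam >= 0 and a > 2")
    t = abs(theta)
    if t <= lam:
        return lam
    return max(a * lam - t, 0.0) / (a - 1)
```

The flat region for large coefficients is what distinguishes SCAD from the lasso: because the derivative falls to zero beyond `a * lam`, large effects are left essentially unshrunk, which is the usual intuition behind oracle-type results for this penalty.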


Additive hazards model; Diverging number of parameters; SCAD; Stratified case-cohort design; Survival analysis; Variable selection



This work was partially supported by National Institutes of Health Grants (P01 CA 142538, R01 ES 021900). The authors thank the staff and participants of the ARIC study for their important contributions. The ARIC Study is carried out as a collaborative study supported by National Heart, Lung, and Blood Institute contracts (N01-HC-55015, N01-HC-55016, N01-HC-55018, N01-HC-55019, N01-HC-55020, N01-HC-55021, N01-HC-55022).

Supplementary material

10985_2017_9402_MOESM1_ESM.pdf (134 kb)
Supplementary material 1 (pdf 133 KB)



Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

  1. Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, USA
  2. Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, USA
