Abstract
Case-cohort designs are commonly used in large epidemiological studies to reduce the cost associated with covariate measurement. In many such studies the number of covariates is very large. An efficient variable selection method is needed for case-cohort studies where the covariates are only observed in a subset of the sample. The current literature on this topic has focused on the proportional hazards model. However, in many studies the additive hazards model is preferred over the proportional hazards model, either because the proportional hazards assumption is violated or because the additive hazards model provides more relevant information to the research question. Motivated by one such study, the Atherosclerosis Risk in Communities (ARIC) study, we investigate the properties of a regularized variable selection procedure in a stratified case-cohort design under an additive hazards model with a diverging number of parameters. We establish the consistency and asymptotic normality of the penalized estimator and prove its oracle property. Simulation studies are conducted to assess the finite sample performance of the proposed method with a modified cross-validation method for tuning parameter selection. We apply the variable selection procedure to the ARIC study to demonstrate its practical use.
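For orientation, the two hazard models contrasted above are conventionally written as shown below, followed by a generic form of a penalized estimator. This is only a sketch in assumed notation: the symbols \lambda_0, \beta, Z, the loss L_n, and the penalty p_\gamma are illustrative and are not taken from the article, which may use a different loss, weighting scheme, and penalty.

```latex
% Proportional hazards model: covariates act multiplicatively on the baseline hazard.
\lambda(t \mid Z) = \lambda_0(t)\,\exp\{\beta^{\top} Z(t)\}

% Additive hazards model: covariates shift the baseline hazard additively,
% so each coefficient is read as a risk difference rather than a log hazard ratio.
\lambda(t \mid Z) = \lambda_0(t) + \beta^{\top} Z(t)

% Generic penalized estimator (assumed form): L_n is a loss built from the
% sampled data, p_\gamma is a sparsity-inducing penalty with tuning parameter
% \gamma, and d_n is the number of parameters, allowed to grow with n.
\hat{\beta} = \arg\min_{\beta} \Big\{ L_n(\beta) + \sum_{j=1}^{d_n} p_{\gamma}(|\beta_j|) \Big\}
```

In this notation, the oracle property means roughly that the estimator sets the truly zero coefficients to zero with probability tending to one, and estimates the nonzero coefficients as efficiently as if the true sparse model were known in advance.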
Acknowledgements
This work was partially supported by National Institutes of Health Grants (P01 CA 142538, R01 ES 021900). The authors thank the staff and participants of the ARIC study for their important contributions. The ARIC Study is carried out as a collaborative study supported by National Heart, Lung, and Blood Institute contracts (N01-HC-55015, N01-HC-55016, N01-HC-55018, N01-HC-55019, N01-HC-55020, N01-HC-55021, N01-HC-55022).
Cite this article
Ni, A., Cai, J. A regularized variable selection procedure in additive hazards model with stratified case-cohort design. Lifetime Data Anal 24, 443–463 (2018). https://doi.org/10.1007/s10985-017-9402-7
DOI: https://doi.org/10.1007/s10985-017-9402-7