Penalized full likelihood approach to variable selection for Cox’s regression model under nested case–control sampling

Abstract

Assuming Cox’s regression model, we consider a penalized full likelihood approach to variable selection under nested case–control (NCC) sampling. Penalized non-parametric maximum likelihood estimates (PNPMLEs) are characterized by self-consistency equations derived from the score functions. We use a cross-validation method based on the profile likelihood to choose the tuning parameter within a family of penalty functions. Simulation studies indicate that, under NCC sampling, the (P)NPMLE (i.e., both the NPMLE and the PNPMLE) performs better numerically than the weighted partial likelihood in estimating the log-relative risk and in identifying the relevant covariates and the underlying model. LASSO performs best when the cohort size is small; SCAD performs best when the cohort size is large and may eventually perform as well as the oracle estimator. Using the SCAD penalty, we establish the consistency, asymptotic normality, and oracle properties of the PNPMLE, as well as the sparsity property induced by the penalty. We also propose a consistent estimator of the asymptotic variance based on the observed profile likelihood. We illustrate the method by analyzing liver cancer diagnoses among patients with type 2 diabetes mellitus in Taiwan who were treated with thiazolidinediones.
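LASSO and SCAD differ only in the penalty term attached to the likelihood. The minimal Python sketch below is an illustration, not the authors' implementation: it evaluates the standard SCAD penalty of Fan and Li (2001) and its derivative alongside the LASSO penalty. The shape parameter `a = 3.7` is the value Fan and Li recommend; whether this paper uses the same value is an assumption, and the function names are hypothetical.

```python
import numpy as np

# Hypothetical helper names; a sketch of the standard penalty formulas,
# not code from the paper.

def lasso_penalty(beta, lam):
    """LASSO penalty lam * |beta|, applied elementwise."""
    return lam * np.abs(beta)

def scad_penalty(beta, lam, a=3.7):
    """SCAD penalty of Fan and Li (2001), applied elementwise to |beta|.

    a = 3.7 is Fan and Li's recommended value (assumed here).
    """
    t = np.abs(np.asarray(beta, dtype=float))
    linear = lam * t                                         # |beta| <= lam: same as LASSO
    quadratic = (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))  # lam < |beta| <= a*lam
    constant = np.full_like(t, lam**2 * (a + 1) / 2)         # |beta| > a*lam: flat
    return np.where(t <= lam, linear, np.where(t <= a * lam, quadratic, constant))

def scad_derivative(beta, lam, a=3.7):
    """Derivative of the SCAD penalty with respect to |beta| (lam > 0 assumed)."""
    t = np.abs(np.asarray(beta, dtype=float))
    tail = np.maximum(a * lam - t, 0.0) / ((a - 1) * lam)    # decays to 0 at a*lam
    return lam * np.where(t <= lam, 1.0, tail)

if __name__ == "__main__":
    beta = np.array([0.0, 0.5, 1.0, 2.0, 5.0])
    print("LASSO:     ", lasso_penalty(beta, 1.0))
    print("SCAD:      ", scad_penalty(beta, 1.0))
    print("SCAD deriv:", scad_derivative(beta, 1.0))
```

The derivative makes the design choice visible: for |β| ≤ λ, SCAD penalizes at the LASSO rate λ, but its derivative falls to zero once |β| > aλ, so large coefficients are left essentially unshrunk. LASSO, by contrast, shrinks every nonzero coefficient by the same amount λ, which is one reason SCAD can approach oracle performance while LASSO cannot.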

Fig. 1


Acknowledgements

Funding was provided by the Ministry of Science and Technology, Taiwan (Grant No. MOST-105-2319-B400-002). We are very grateful to the associate editor and the referees, whose valuable comments and suggestions led to significant improvements to this paper.

Author information

Corresponding author

Correspondence to I-Shou Chang.

Electronic supplementary material

Supplementary material 1 (PDF, 111 KB) is available with the online version of this article.

About this article

Cite this article

Wang, J., Pan, C., Chang, I. et al. Penalized full likelihood approach to variable selection for Cox’s regression model under nested case–control sampling. Lifetime Data Anal 26, 292–314 (2020). https://doi.org/10.1007/s10985-019-09475-z

Keywords

  • Nested case–control sampling
  • Oracle property
  • PNPMLE
  • SCAD