Abstract
The existing literature comparing the statistical properties of nested case–control and case–cohort methods has become insufficient for present-day epidemiologists. It has not reconciled conflicting conclusions about the standard methods, and a comparison that includes newly developed methods, such as inverse probability weighting methods, is needed. Two analytical methods for nested case–control studies and six methods for case–cohort studies, all based on the proportional hazards regression model, were summarized and their statistical properties were compared. The answer to which design and method is more powerful was more nuanced than previously reported. For both nested case–control and case–cohort designs, inverse probability weighting methods were more powerful than the standard methods. However, the difference became negligible when the proportion of failure events in the full cohort was very low (<1 %). The comparison between the two designs depended on the censoring type and the incidence proportion. With random censoring, nested case–control designs coupled with the inverse probability weighting method yielded the highest statistical power among all methods for both designs. With fixed censoring times, there was little difference in efficiency between the two designs when inverse probability weighting methods were used; however, the standard case–cohort methods were more powerful than the conditional logistic method for nested case–control designs. As the proportion of failure events in the full cohort became smaller (<10 %), nested case–control methods outperformed all case–cohort methods and the choice of analytic method within each design became less important. When the predictor of interest was binary, the standard case–cohort methods were often more powerful than the conditional logistic method for nested case–control designs.
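As an illustration of what inverse probability weighting means for a nested case–control sample, the following minimal sketch computes each subject's probability of ever being included, in the form commonly attributed to Samuelsen (1997): a case has probability 1, and a non-case has probability one minus the product, over the failure times at which it was at risk, of (1 - m/(n(t) - 1)), where n(t) is the number at risk and m the number of controls sampled per case. The function name and the toy data are hypothetical, and the sketch assumes no tied failure times and that all subjects enter at time zero; it is not the code used in the paper.

```python
def samuelsen_inclusion_probs(times, events, m):
    """Inclusion probabilities for a nested case-control sample (sketch).

    times  : follow-up time for each cohort member (all enter at time 0)
    events : 1 if the subject failed at its time, 0 if censored
    m      : number of controls sampled per case (assumed, without replacement)
    """
    case_times = sorted(t for t, e in zip(times, events) if e == 1)
    probs = []
    for t_i, e_i in zip(times, events):
        if e_i == 1:
            probs.append(1.0)  # cases are always included
            continue
        not_sampled = 1.0
        for t in case_times:
            if t <= t_i:  # subject is still at risk at this failure time
                at_risk = sum(1 for u in times if u >= t)
                eligible = at_risk - 1  # everyone at risk except the case
                not_sampled *= 1.0 - min(1.0, m / eligible)
        probs.append(1.0 - not_sampled)
    return probs


# Toy cohort (hypothetical data): failures at times 1 and 3.
probs = samuelsen_inclusion_probs([1, 2, 3, 4], [1, 0, 1, 0], m=1)
weights = [1.0 / p for p in probs]  # weights for a weighted Cox partial likelihood
```

The reciprocals of these probabilities are the weights applied to the sampled subjects in a weighted partial likelihood, which is what lets sampled controls contribute to every risk set in which they were at risk rather than only the matched one.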
References
Thomas D. Addendum to ‘Methods of cohort analysis: appraisal by application to asbestos mining’ by Liddell FDK, McDonald JC, Thomas DC. J R Stat Soc. 1977;A140:469–91.
Prentice RL. A case–cohort design for epidemiologic cohort studies and disease prevention trials. Biometrika. 1986;73:1–11.
Self SG, Prentice RL. Asymptotic distribution theory and efficiency results for case–cohort studies. Ann Stat. 1988;16:64–81.
Langholz B, Thomas D. Nested case–control and case–cohort methods of sampling from a cohort: a critical comparison. Am J Epidemiol. 1990;131:169–76.
Barlow WE, Ichikawa L, Rosner D, Izumi S. Analysis of case–cohort designs. J Clin Epidemiol. 1999;52:1165–72.
Barlow WE. Robust variance estimation for the case–cohort design. Biometrics. 1994;50:1064–72.
Samuelsen S. A pseudo-likelihood approach to analysis of nested case–control studies. Biometrika. 1997;84:379–94.
Kim RS, Kaplan R. Analysis of secondary outcomes in nested case–control study designs. Stat Med. 2014;33:4215–26.
Kim RS. Analysis of nested case–control study designs: revisiting the inverse probability weighting method. Commun Stat Appl Methods. 2013;20:455–66.
Lin DY, Ying Z. Cox regression with incomplete covariate measurements. J Am Stat Assoc. 1993;88:1341–9.
Binder DA. Fitting Cox’s proportional hazards models from survey data. Biometrika. 1992;79:139–47.
Lin DY. On fitting Cox’s proportional hazards models to survey data. Biometrika. 2000;87:37–47.
Andersen PK, Gill RD. Cox’s regression model for counting processes: a large sample study. Ann Stat. 1982;10:1100–20.
Borgan O, Goldstein L, Langholz B. Methods for the analysis of sampled cohort data in the Cox proportional hazards model. Ann Stat. 1995;23:1749–78.
Therneau TM, Li H. Computing the Cox model for case cohort designs. Lifetime Data Anal. 1999;5:99–112.
Lin DY, Wei LJ. The robust inference for the Cox proportional hazards model. J Am Stat Assoc. 1989;84:1074–8.
R Development Core Team. R: a language and environment for statistical computing. Vienna: R Development Core Team; 2010.
R code: six case-cohort and two nested case-control methods. http://missionalconsulting.com/methods/rcode-cch-ncc
Zhang H, Goldstein L. Information and asymptotic efficiency of the case–cohort sampling design in Cox’s regression model. J Multivar Anal. 2003;85:292–317.
Goldstein L, Zhang H. Efficiency of the maximum partial likelihood estimator for nested case control sampling. Bernoulli. 2009;15:569–97.
Wacholder S. Practical considerations in choosing between the case–cohort and nested case–control designs. Epidemiology. 1991;2:155–8.
Chen KN. Generalized case–cohort sampling. J R Stat Soc Ser B (Stat Methodol). 2001;63:791–809.
Chen KN. Statistical estimation in the proportional hazards model with risk set sampling. Ann Stat. 2004;32:1513–32.
Chen HY. Double-semiparametric method for missing covariates in Cox regression models. J Am Stat Assoc. 2002;97:565–76.
Scheike TH, Juul A. Maximum likelihood estimation for Cox’s regression model under nested case–control sampling. Biostatistics. 2004;5:193–206.
Prentice RL, Williams BJ, Peterson AV. On the regression analysis of multivariate failure time data. Biometrika. 1981;68:373–9.
Lubin JH. Case–control methods in the presence of multiple failure times and competing risks. Biometrics. 1985;41:49–54.
Zhang H, Schaubel DE, Kalbfleisch JD. Proportional hazards regression for the analysis of clustered survival data from case–cohort studies. Biometrics. 2011;67:18–28.
Chen F, Chen K. Case–cohort analysis of clusters of recurrent events. Lifetime Data Anal. 2014;20:1–15.
Xue X, Xie X, Gunter M, Rohan TE, Wassertheil-Smoller S, Ho GY, et al. Testing the proportional hazards assumption in case–cohort analysis. BMC Med Res Methodol. 2013;13:1–10.
Bellera C, MacGrogan G, Debled M, de Lara C, Brouste V, Mathoulin-Pelissier S. Variables with time-varying effects and the cox model: some statistical concepts illustrated with a prognostic factor study in breast cancer. BMC Med Res Methodol. 2010;10:1–12.
Lu W, Liu M, Chen Y-H. Testing goodness-of-fit for the proportional hazards model based on nested case–control data. Biometrics. 2014. doi:10.1111/biom.12239.
Ranganathan P, Pramesh CS. Censoring in survival analysis: potential for bias. Perspect Clin Res. 2012;3:40.
Meier EN. A sensitivity analysis for clinical trials with informatively censored survival endpoints. Master’s thesis, University of Washington; 2012.
Braekers R, Veraverbeke N. Cox’s regression model under partially informative censoring. Commun Stat Theory Methods. 2005;34:1793–811.
Lin DY, Robins JM, Wei LJ. Comparing two failure time distributions on the presence of dependent censoring. Biometrika. 1996;83:381–93.
Acknowledgments
This work was supported by National Institutes of Health Grants 1UL1RR025750-01 and P30 CA01330-35, and by National Research Foundation of Korea Grant NRF-2012-S1A3A2033416. The author is deeply thankful for the constructive comments from the anonymous referees, which led to significant improvement of this work.
Conflict of interest
None.
Electronic supplementary material
Below is the link to the electronic supplementary material.
10654_2014_9974_MOESM1_ESM.tif
The Empirical Biases of the Estimators of β₂. The considered methods are the full cohort analysis; two nested case–control methods, which are the conditional logistic approach by Thomas (1977) and the inverse probability weighting method by Samuelsen (1997); and four case–cohort methods, which are the inverse probability weighting method by Binder (1992) and the methods by Prentice (1986), Self & Prentice (1988), and Barlow (1994). The average sample size n* and the average subcohort proportion π* are shown in the titles. CCH and NCC are abbreviations for case–cohort and nested case–control designs, respectively (TIFF 173 kb)
10654_2014_9974_MOESM2_ESM.tif
The Empirical Standard Errors of the Estimators of β₂. The empirical standard errors of the β₂ estimators are shown for the full cohort analysis; two nested case–control (NCC) methods, which are the conditional logistic approach by Thomas (1977) and the inverse probability weighting method by Samuelsen (1997); and four case–cohort (CCH) methods, which are the inverse probability weighting method by Binder (1992) and the methods by Prentice (1986), Self & Prentice (1988), and Barlow (1994). The average sample size n* and the average subcohort proportion π* are shown in the titles. Only the results for N = 500 and N = 1,000 are shown (TIFF 170 kb)
10654_2014_9974_MOESM3_ESM.tif
Empirical Power Testing H0: β₂ = 0. The nominal type 1 error rate was 0.05. The empirical power of nine methods is measured: the full cohort analysis; the conditional logistic approach by Thomas (1977); the inverse probability weighting method by Samuelsen (1997) coupled with the approximate jackknife (AJK) variance estimator (Kim 2013); the inverse probability weighting method by Binder (1992) coupled with the AJK variance estimator; Prentice (1986); Prentice (1986) coupled with the AJK variance estimator; Self & Prentice (1988); Self & Prentice (1988) coupled with the AJK variance estimator (i.e. Lin & Ying 1993); and Barlow (1994). The average sample size n* and the average subcohort proportion π* are shown in the titles. CCH and NCC are abbreviations for case–cohort and nested case–control designs, respectively. Only the results for N = 500 and N = 1,000 are shown (TIFF 199 kb)
10654_2014_9974_MOESM4_ESM.tif
N = 1,500. The empirical biases and standard errors of the estimators of β₁, the empirical power testing H0: β₁ = 0, and the empirical standard errors of the estimators of β₂ are shown for N = 1,500 (TIFF 172 kb)
Cite this article
Kim, R.S. A new comparison of nested case–control and case–cohort designs and methods. Eur J Epidemiol 30, 197–207 (2015). https://doi.org/10.1007/s10654-014-9974-4
DOI: https://doi.org/10.1007/s10654-014-9974-4