Improved Horvitz–Thompson Estimation of Model Parameters from Two-phase Stratified Samples: Applications in Epidemiology

Statistics in Biosciences

Abstract

The case-cohort study involves two-phase sampling: simple random sampling from an infinite superpopulation at phase one and stratified random sampling from a finite cohort at phase two. Standard analyses of case-cohort data involve solution of inverse probability weighted (IPW) estimating equations, with weights determined by the known phase two sampling fractions. The variance of parameter estimates in (semi)parametric models, including the Cox model, is the sum of two terms: (i) the model-based variance of the usual estimates that would be calculated if full data were available for the entire cohort; and (ii) the design-based variance from IPW estimation of the unknown cohort total of the efficient influence function (IF) contributions. This second variance component may be reduced by adjusting the sampling weights, either by calibrating them to known cohort totals of auxiliary variables correlated with the IF contributions or by estimating them using these same auxiliary variables. Both adjustment methods are implemented in the R survey package. We derive the limit laws of coefficients estimated using adjusted weights. The asymptotic results suggest practical methods for construction of auxiliary variables that are evaluated by simulation of case-cohort samples from the National Wilms Tumor Study and by log-linear modeling of case-cohort data from the Atherosclerosis Risk in Communities Study. Although not semiparametric efficient, estimators based on adjusted weights may come close to achieving full efficiency within the class of augmented IPW estimators.
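
The calibration idea described in the abstract can be illustrated with a small sketch. It is not the authors' code and does not use the R survey package. It is a standard-library Python toy with a simulated cohort, one auxiliary variable `x` known for everyone, and an outcome `y` standing in for the influence function contributions. All names (`ht_total`, `calibrate_linear`, the strata and sampling fractions) are hypothetical, and the linear form of calibration is an assumption chosen for simplicity.

```python
import random

def ht_total(values, weights):
    """Horvitz-Thompson estimate of a cohort total: sum of weight * value."""
    return sum(w * v for w, v in zip(weights, values))

def calibrate_linear(x_sample, design_weights, cohort_size, cohort_x_total):
    """Linear calibration to the known cohort totals of z = (1, x).

    Returns w_i = d_i * (1 + lam0 + lam1 * x_i), with (lam0, lam1) chosen so
    that sum(w_i) == cohort_size and sum(w_i * x_i) == cohort_x_total.
    """
    d, x = design_weights, x_sample
    # Weighted cross-product matrix sum_i d_i z_i z_i^T for z_i = (1, x_i).
    a00 = sum(d)
    a01 = sum(di * xi for di, xi in zip(d, x))
    a11 = sum(di * xi * xi for di, xi in zip(d, x))
    # Shortfall between the known cohort totals and their HT estimates.
    r0 = cohort_size - a00
    r1 = cohort_x_total - a01
    # Solve the 2x2 linear system by the explicit inverse.
    det = a00 * a11 - a01 * a01
    lam0 = (a11 * r0 - a01 * r1) / det
    lam1 = (a00 * r1 - a01 * r0) / det
    return [di * (1.0 + lam0 + lam1 * xi) for di, xi in zip(d, x)]

# Simulated finite cohort (phase one); all numbers are illustrative assumptions.
random.seed(1)
N = 2000
cohort_x = [random.gauss(0.0, 1.0) for _ in range(N)]
cohort_y = [2.0 + 3.0 * xi + random.gauss(0.0, 0.5) for xi in cohort_x]

# Phase two: stratified random sampling with known sampling fractions.
strata = {
    0: [i for i in range(N) if cohort_x[i] <= 0.0],
    1: [i for i in range(N) if cohort_x[i] > 0.0],
}
fractions = {0: 0.1, 1: 0.5}
sample_idx, d = [], []
for h, members in strata.items():
    n_h = max(2, round(fractions[h] * len(members)))
    sample_idx.extend(random.sample(members, n_h))
    d.extend([len(members) / n_h] * n_h)  # design weight = N_h / n_h

xs = [cohort_x[i] for i in sample_idx]
ys = [cohort_y[i] for i in sample_idx]
x_total = sum(cohort_x)

w = calibrate_linear(xs, d, N, x_total)

print("true total of y:      ", sum(cohort_y))
print("HT, design weights:   ", ht_total(ys, d))
print("HT, calibrated weights:", ht_total(ys, w))
```

By construction, the calibrated weights reproduce the known cohort totals of the auxiliary variable exactly, so any part of `y` that is linear in `x` is estimated without design-based error. That is the mechanism by which calibration can shrink the second variance component when the auxiliary variables are well correlated with the IF contributions.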

Author information

Corresponding author

Correspondence to Norman E. Breslow.

About this article

Cite this article

Breslow, N.E., Lumley, T., Ballantyne, C.M. et al. Improved Horvitz–Thompson Estimation of Model Parameters from Two-phase Stratified Samples: Applications in Epidemiology. Stat Biosci 1, 32–49 (2009). https://doi.org/10.1007/s12561-009-9001-6

  • DOI: https://doi.org/10.1007/s12561-009-9001-6
