Improved Horvitz–Thompson Estimation of Model Parameters from Two-phase Stratified Samples: Applications in Epidemiology

Breslow, Norman E.; Lumley, Thomas; Ballantyne, Christie M.; Chambless, Lloyd E.; Kulich, Michal

doi:10.1007/s12561-009-9001-6

Published: 29 April 2009

Volume 1, pages 32–49 (2009)

Statistics in Biosciences

Abstract

The case-cohort study involves two phases of sampling: simple random sampling from an infinite superpopulation at phase one and stratified random sampling from a finite cohort at phase two. Standard analyses of case-cohort data involve solving inverse probability weighted (IPW) estimating equations, with weights determined by the known phase-two sampling fractions. The variance of parameter estimates in (semi)parametric models, including the Cox model, is the sum of two terms: (i) the model-based variance of the usual estimates that would be calculated if full data were available for the entire cohort; and (ii) the design-based variance from IPW estimation of the unknown cohort total of the efficient influence function (IF) contributions. This second variance component may be reduced by adjusting the sampling weights, either by calibrating them to known cohort totals of auxiliary variables correlated with the IF contributions or by estimating them from these same auxiliary variables. Both adjustment methods are implemented in the R survey package. We derive the limit laws of coefficients estimated using adjusted weights. The asymptotic results suggest practical methods for constructing auxiliary variables, which are evaluated by simulation of case-cohort samples from the National Wilms Tumor Study and by log-linear modeling of case-cohort data from the Atherosclerosis Risk in Communities Study. Although not semiparametric efficient, estimators based on adjusted weights may come close to achieving full efficiency within the class of augmented IPW estimators.
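The weight adjustment described in the abstract can be illustrated with a minimal Python sketch. It assumes the phase-two sample is drawn by stratified random sampling without replacement within strata of the cohort, so each sampled subject's Horvitz–Thompson weight is the inverse of its stratum's sampling fraction. For the adjustment step it uses linear (chi-square distance) calibration in the style of Deville and Särndal (1992), under which the adjusted weights reproduce the known cohort totals of the auxiliary variables exactly. The function names (`ht_weights`, `solve`, `calibrate`), the toy ten-subject cohort and the proxy variable `z` are all hypothetical. This is not the R survey package's implementation, and it does not construct the influence-function-based auxiliary variables studied in the paper.

```python
from collections import Counter
from typing import List, Sequence


def ht_weights(strata: Sequence[str], in_phase2: Sequence[bool]) -> List[float]:
    """Horvitz-Thompson design weights for stratified phase-two sampling.

    Within stratum s, n_s of the N_s cohort members are sampled, so each
    sampled subject gets weight N_s / n_s; unsampled subjects get weight 0.
    """
    n_cohort = Counter(strata)
    n_sampled = Counter(s for s, z in zip(strata, in_phase2) if z)
    return [n_cohort[s] / n_sampled[s] if z else 0.0
            for s, z in zip(strata, in_phase2)]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def calibrate(d: Sequence[float], x: Sequence[Sequence[float]],
              totals: Sequence[float]) -> List[float]:
    """Linear calibration: w_i = d_i * (1 + x_i' lam), with lam chosen so that
    sum_i w_i x_i equals the known cohort totals of the auxiliary variables."""
    p = len(totals)
    # Weighted cross-product matrix M = sum_i d_i x_i x_i'
    m = [[sum(di * xi[j] * xi[k] for di, xi in zip(d, x)) for k in range(p)]
         for j in range(p)]
    # Gap between the known totals and their Horvitz-Thompson estimates
    gap = [totals[j] - sum(di * xi[j] for di, xi in zip(d, x)) for j in range(p)]
    lam = solve(m, gap)
    return [di * (1.0 + sum(xi[j] * lam[j] for j in range(p)))
            for di, xi in zip(d, x)]


# Hypothetical toy cohort: 2 cases (all sampled), 8 controls (4 sampled).
strata = ["case"] * 2 + ["control"] * 8
in_phase2 = [True, True, True, False, True, False, True, False, True, False]
z = [1.0, 3.0, 2.0, 4.0, 1.0, 5.0, 3.0, 2.0, 6.0, 0.0]  # proxy known for everyone
x = [[1.0, zi] for zi in z]                              # intercept + proxy
totals = [float(len(z)), sum(z)]                         # known cohort totals

d = ht_weights(strata, in_phase2)
w = calibrate(d, x, totals)
print("HT estimate of proxy total:", sum(di * zi for di, zi in zip(d, z)))
print("Calibrated estimate:", round(sum(wi * zi for wi, zi in zip(w, z)), 10))
```

In this toy example the Horvitz–Thompson weights already sum to the cohort size of 10, but they estimate the proxy total as 28 rather than its known value of 27. Calibration shifts the weights so that both totals are reproduced. Because the intercept is among the auxiliaries, the calibrated weights still sum to 10. Unsampled subjects keep weight zero because each adjusted weight is a multiple of its design weight.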

Author information

Authors and Affiliations

Department of Biostatistics, University of Washington, Seattle, WA, USA
Norman E. Breslow & Thomas Lumley
Department of Medicine, Baylor College of Medicine, Houston, TX, USA
Christie M. Ballantyne
Department of Biostatistics, University of North Carolina, Chapel Hill, NC, USA
Lloyd E. Chambless
Department of Probability and Mathematical Statistics, Charles University, Prague, Czech Republic
Michal Kulich

Corresponding author

Correspondence to Norman E. Breslow.

About this article

Cite this article

Breslow, N.E., Lumley, T., Ballantyne, C.M. et al. Improved Horvitz–Thompson Estimation of Model Parameters from Two-phase Stratified Samples: Applications in Epidemiology. Stat Biosci 1, 32–49 (2009). https://doi.org/10.1007/s12561-009-9001-6

Received: 19 February 2009
Accepted: 23 February 2009
Published: 29 April 2009
Issue Date: May 2009
DOI: https://doi.org/10.1007/s12561-009-9001-6
