Z-estimation and stratified samples: application to survival models

Lifetime Data Analysis

Abstract

The infinite dimensional Z-estimation theorem offers a systematic approach to joint estimation of both Euclidean and non-Euclidean parameters in probability models for data. It is easily adapted for stratified sampling designs. This is important in applications to censored survival data because the inverse probability weights that modify the standard estimating equations often depend on the entire follow-up history. Since the weights are not predictable, they complicate the usual theory based on martingales. This paper considers joint estimation of regression coefficients and baseline hazard functions in the Cox proportional and Lin–Ying additive hazards models. Weighted likelihood equations are used for the former and weighted estimating equations for the latter. Regression coefficients and baseline hazards may be combined to estimate individual survival probabilities. Efficiency is improved by calibrating or estimating the weights using information available for all subjects. Although inefficient in comparison with likelihood inference for incomplete data, which is often difficult to implement, the approach provides consistent estimates of desired population parameters even under model misspecification.
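For orientation, the standard forms of the two hazard models, and the survival probabilities they imply for a subject with a time-fixed covariate vector \(z\), can be written as follows. The notation (\(\beta\), \(\lambda_0\), \(\Lambda_0\), \(S\)) is assumed here for illustration and is not taken from the abstract; in practice, estimates of the regression coefficients and the cumulative baseline hazard would be substituted for \(\beta\) and \(\Lambda_0\).

\[
\text{Cox:}\quad \lambda(t \mid z) = \lambda_0(t)\, e^{\beta^\top z}, \qquad S(t \mid z) = \exp\{-\Lambda_0(t)\, e^{\beta^\top z}\}
\]

\[
\text{Lin–Ying:}\quad \lambda(t \mid z) = \lambda_0(t) + \beta^\top z, \qquad S(t \mid z) = \exp\{-\Lambda_0(t) - \beta^\top z\, t\}
\]

where \(\Lambda_0(t) = \int_0^t \lambda_0(s)\, ds\) is the cumulative baseline hazard.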

Notes

  1. Godambe (1960) had earlier studied variances based on the “information sandwich”, but was concerned with inefficient estimators on the model rather than with misspecification. Cox (1961) derived the sandwich in an informal treatment of tests of separate families of hypotheses, later crediting Huber for a rigorous discussion of the distributional result.

  2. Although \(h_t\) is not itself in \(H\), it is of bounded variation and hence may be renormalized to be in \(H\), which is all that is needed in the sequel since the estimating equations are linear in \(h\).

  3. Indeed, the term \({\mathbb {G}}_N [(R-\pi _0)/\pi _0] \psi _{\theta _0,h}\) in (1), which has the same limiting distribution whether the \(\psi _{\theta _0,h}\) are regarded as random or fixed by conditioning (van der Vaart and Wellner 1996, Sect. 2.9), is the normalized error arising from IPW estimation of the Phase I total of the scores. The solution to the sample survey problem, to estimate this unknown total using two phase stratified sampling, is best achieved when the calibration variables used to adjust the sampling weights are highly correlated with the scores.

  4. This result would be of no surprise to a survey sampler. For estimation of a population total using stratified Bernoulli sampling, it is well known that conditioning on the Phase II stratum totals \(\{n_1,\ldots ,n_J\}\) (see Table 1) is equivalent to finite population stratified sampling (Särndal et al. 1992, Sect. 9.8, Example 9.14).
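Notes 3 and 4 refer to IPW estimation of a Phase I total under stratified Bernoulli sampling. The following is a minimal Python sketch of that survey problem, not the paper's procedure: the population is simulated, the two strata and their sampling probabilities are made-up values, and the function names `ht_total` and `stratum_weighted_total` are hypothetical. It contrasts the Horvitz–Thompson estimate, which uses the known probabilities \(\pi_j\), with the estimate that replaces \(1/\pi_j\) by \(N_j/n_j\), the weights of finite population stratified sampling.

```python
import random


def ht_total(y, sampled, pi):
    """Horvitz-Thompson (IPW) estimate of the total of y over all Phase I subjects."""
    return sum(yi / p for yi, r, p in zip(y, sampled, pi) if r)


def stratum_weighted_total(y, sampled, stratum):
    """IPW estimate using weights N_j / n_j in place of 1 / pi_j."""
    N, n = {}, {}
    for r, j in zip(sampled, stratum):
        N[j] = N.get(j, 0) + 1        # Phase I stratum size N_j
        n[j] = n.get(j, 0) + int(r)   # Phase II stratum size n_j
    if any(n[j] == 0 for j in N):
        raise ValueError("every stratum needs at least one Phase II subject")
    return sum(yi * N[j] / n[j] for yi, r, j in zip(y, sampled, stratum) if r)


# Simulated example; all numbers below are assumptions for illustration only.
rng = random.Random(0)
pi_by_stratum = {0: 0.2, 1: 0.8}                 # hypothetical sampling probabilities
stratum = [i % 2 for i in range(1000)]           # two strata of 500 subjects each
y = [rng.gauss(5.0 + 3.0 * j, 1.0) for j in stratum]
pi = [pi_by_stratum[j] for j in stratum]
sampled = [rng.random() < p for p in pi]         # Bernoulli Phase II indicators R_i

print("true total:       ", round(sum(y), 1))
print("HT estimate:      ", round(ht_total(y, sampled, pi), 1))
print("N_j/n_j estimate: ", round(stratum_weighted_total(y, sampled, stratum), 1))
```

When the variable being totalled is constant within strata, the \(N_j/n_j\) version reproduces the Phase I total exactly, whereas the Horvitz–Thompson estimate still varies with the realized \(n_j\). This is a simple instance of the gain from adjusting the weights with information available for all subjects.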

References

  • Aalen O (1976) Nonparametric inference in connection with multiple decrement models. Scand J Stat 3:15–27

  • Aalen OO, Borgan O, Gjessing HK (2008) Survival and event history analysis. Springer, New York

  • Andersen PK, Gill RD (1982) Cox’s regression model for counting processes: a large sample study. Ann Stat 10:1100–1120

  • Anderson GL, Manson J, Wallace R, Lund B, Hall D, Davis S, Shumaker S, Wang CY, Stein E, Prentice RL (2003) Implementation of the Women’s Health Initiative study design. Ann Epidemiol 13:S5–S17

  • Barlow R, Bartholomew D, Bremner J, Brunk H (1972) Statistical inference under order restrictions. Wiley, New York

  • Begun JM, Hall WJ, Huang WM, Wellner JA (1983) Information and asymptotic efficiency in parametric–nonparametric models. Ann Stat 11:432–452

  • Bickel P, Klaassen C, Ritov Y, Wellner J (1993) Efficient and adaptive estimation for semiparametric models. The Johns Hopkins University Press, Baltimore

  • Borgan O, Langholz B, Samuelsen SO, Goldstein L, Pogoda J (2000) Exposure stratified case–cohort designs. Lifetime Data Anal 6:39–58

  • Breslow N, Crowley J (1974) A large sample study of the life table and product limit estimates under random censorship. Ann Stat 2:437–453

  • Breslow NE, Lumley T (2013) Semiparametric models and two-phase samples: applications to Cox regression. In: IMS collections, vol. 9, Institute of Mathematical Statistics, Beachwood, OH, pp 65–77

  • Breslow NE, Wellner JA (2007) Weighted likelihood for semiparametric models and two-phase stratified samples, with application to Cox regression. Scand J Stat 34:86–102

  • Breslow NE, Wellner JA (2008) A Z-theorem with estimated nuisance parameters and correction note for ‘Weighted likelihood for semiparametric models and two-phase stratified samples, with application to Cox regression’. Scand J Stat 35:186–192

  • Breslow NE, Lumley T, Ballantyne CM, Chambless LE, Kulich M (2009a) Improved Horvitz–Thompson estimation of model parameters from two-phase stratified samples: applications in epidemiology. Statist Biosci 1:32–49

  • Breslow NE, Lumley T, Ballantyne CM, Chambless LE, Kulich M (2009b) Using the whole cohort in the analysis of case–cohort data. Am J Epidemiol 169:1398–1405

  • Cox DR (1961) Tests of separate families of hypotheses. In: Proceedings of the fourth Berkeley symposium on mathematical statistics and probability, vol. 1, University of California Press, Berkeley, CA, pp 105–123

  • Cox DR (1972) Regression models and life-tables (with discussion). J R Stat Soc (Ser B) 34:187–220

  • Deville JC, Särndal CE (1992) Calibration estimators in survey sampling. J Am Stat Assoc 87:376–382

  • Freedman DA (2006) On the so-called “Huber sandwich estimator” and “robust standard errors”. Am Stat 60:299–302

  • Godambe VP (1960) An optimum property of regular maximum-likelihood estimation. Ann Math Stat 31:1208–1211

  • Huber PJ (1967) The behavior of maximum likelihood estimates under nonstandard conditions. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, vol. 1, University of California Press, Berkeley, CA, pp 221–233

  • Huber PJ (1980) Robust statistics. Wiley, New York

  • Kalbfleisch JD, Prentice R (2002) The statistical analysis of failure time data, 2nd edn. Wiley, Hoboken, NJ

  • Keogh RH, White IR (2013) Using full-cohort data in nested case–control and case–cohort studies by multiple imputation. Stat Med 32:4021–4043

  • Kulich M, Lin DY (2000) Additive hazards regression for case–cohort studies. Biometrika 87:73–87

  • Kulich M, Lin DY (2004) Improving the efficiency of relative-risk estimation in case–cohort studies. J Am Stat Assoc 99:832–844

  • Li G, Tseng CH (2008) Non-parametric estimation of a survival function with two-stage design studies. Scand J Stat 35:193–211

  • Liang KY, Zeger SL (1986) Longitudinal data analysis using generalized linear models. Biometrika 73:13–22

  • Lin DY, Wei LJ (1989) The robust inference for the Cox proportional hazards model. J Am Stat Assoc 84:1074–1078

  • Lin DY, Ying Z (1994) Semiparametric analysis of the additive risk model. Biometrika 81:61–71

  • Little RJA, Rubin DB (2002) Statistical analysis with missing data, 2nd edn. Wiley, New York

  • Lumley T (2009) Robustness of semiparametric efficiency in nearly-correct models for two-phase samples. UW Biostatistics Working Paper Series. http://biostats.bepress.com/uwbiostat/paper351, Accessed 22 November 2014

  • Lumley T (2012) Complex surveys: a guide to analysis using R. Wiley, Hoboken, NJ

  • Lumley T, Shaw PA, Dai JY (2011) Connections between survey calibration estimators and semiparametric models for incomplete data. Int Stat Rev 79:200–220

  • Marti H, Chavance M (2011) Multiple imputation analysis of case–cohort studies. Stat Med 30:1595–1607

  • McKeague IW, Sasieni PD (1994) A partly parametric additive risk model. Biometrika 81:501–514

  • Nan B (2004) Efficient estimation for case–cohort studies. Can J Stat 32:403–419

  • Nan B, Emond M, Wellner JA (2004) Information bounds for Cox regression models with missing data. Ann Stat 32:723–753

  • Nelson W (1972) Theory and applications of hazard plotting for censored failure data. Technometrics 14:945–966

  • Prentice RL (1986) A case-cohort design for epidemiologic cohort studies and disease prevention trials. Biometrika 73:1–11

  • Robins JM, Rotnitzky A, Zhao LP (1994) Estimation of regression-coefficients when some regressors are not always observed. J Am Stat Assoc 89:846–866

  • Royall RM (1986) Model robust confidence-intervals using maximum-likelihood estimators. Int Stat Rev 54:221–226

  • Saegusa T, Wellner JA (2013) Weighted likelihood estimation under two-phase sampling. Ann Stat 41:269–295

  • Särndal C, Swensson B, Wretman J (1992) Model assisted survey sampling. Springer, New York

  • Scheike TH, Martinussen T (2004) Maximum likelihood estimation for Cox’s regression model under case-cohort sampling. Scand J Stat 31:283–293

  • Struthers CA, Kalbfleisch JD (1986) Misspecified proportional hazard models. Biometrika 73:363–369

  • Therneau TM, Grambsch PM (2000) Modeling survival data: extending the Cox model. Springer, New York

  • Tsiatis AA (1981) A large sample study of Cox’s regression model. Ann Stat 9:93–108

  • van der Vaart AW (1995) Efficiency of infinite dimensional M-estimators. Stat Neerl 49:9–30

  • van der Vaart AW (1998) Asymptotic statistics. Cambridge University Press, Cambridge, UK

  • van der Vaart AW, Wellner JA (1996) Weak convergence and empirical processes with applications in statistics. Springer, New York

  • Williams OD (1989) The Atherosclerosis Risk in Communities (ARIC) study—design and objectives. Am J Epidemiol 129:687–702

  • Zeng DL, Lin DY (2014) Efficient estimation of semiparametric transformation models for two-phase cohort studies. J Am Stat Assoc 109:371–383

Acknowledgments

Wellner’s research was supported in part by National Science Foundation Grant DMS-1104832 and National Institute of Allergy and Infectious Diseases Grant 2R01 AI291968-04. Dedicated to Niels Keiding on the occasion of his 70th birthday.

Author information

Corresponding author

Correspondence to Norman E. Breslow.

About this article

Cite this article

Breslow, N.E., Hu, J. & Wellner, J.A. Z-estimation and stratified samples: application to survival models. Lifetime Data Anal 21, 493–516 (2015). https://doi.org/10.1007/s10985-014-9317-5

  • DOI: https://doi.org/10.1007/s10985-014-9317-5
