A case-base sampling method for estimating recurrent event intensities

Abstract

Case-base sampling provides an alternative to methods based on risk-set sampling for estimating hazard regression models, in particular when absolute hazards are of interest in addition to hazard ratios. The case-base approach also yields a likelihood expression of the logistic regression form, but instead of categorizing time, such an expression is obtained by sampling a discrete set of person-time coordinates from all of the follow-up data. In this paper, in the context of a time-dependent exposure such as vaccination and a potentially recurrent adverse event outcome, we show that the resulting partial likelihood for the outcome event intensity has the asymptotic properties of a likelihood. We contrast this approach with self-matched case-base sampling, which involves only within-individual comparisons. The efficiency of the case-base methods is compared with that of standard methods through simulations, which suggest that the information loss due to sampling is minimal.
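
To make the sampling scheme concrete, the following is a minimal illustrative sketch, not the implementation used in the paper, of case-base sampling for a recurrent event outcome with a time-dependent binary exposure. It assumes a piecewise-constant intensity \(\exp(\beta_0 + \beta_1 x_i(t))\), equal follow-up \([0, \tau]\) for all subjects, and a fixed number \(B\) of base-series person-moments drawn uniformly from the total person-time (a close variant of the Poisson-process base series analysed in the paper); all variable names and parameter values are hypothetical.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)

# Hypothetical setup: n subjects followed over [0, tau]; a binary exposure
# switches on at a random time v[i]; recurrent events arise from a
# piecewise-constant intensity exp(b0 + b1 * exposed(t)).
n, tau, b0, b1 = 2000, 1.0, np.log(0.3), np.log(2.0)
v = rng.uniform(0, tau, n)  # exposure onset times

def draw_events(i):
    """Recurrent event times for subject i (homogeneous within each exposure period)."""
    out = []
    for lo, hi, x in [(0.0, v[i], 0), (v[i], tau, 1)]:
        m = rng.poisson(np.exp(b0 + b1 * x) * (hi - lo))
        out.extend(rng.uniform(lo, hi, m))
    return out

case_series = [(t, i) for i in range(n) for t in draw_events(i)]

# Base series: B person-moments sampled uniformly from the total person-time
# n * tau, corresponding to a sampling rate rho = B / (n * tau).
B = 10 * len(case_series)
base_i = rng.integers(0, n, B)
base_t = rng.uniform(0, tau, B)

# Stack case and base series; evaluate exposure status at each person-moment.
t_all = np.concatenate([np.array([t for t, _ in case_series]), base_t])
i_all = np.concatenate([np.array([i for _, i in case_series]), base_i])
y = np.concatenate([np.ones(len(case_series)), np.zeros(B)])
x = (t_all >= v[i_all]).astype(float)

# Logistic-form likelihood: logit P(case | sampled) = b0 + b1 * x + log(n*tau/B),
# so a logistic fit with the known offset log(n*tau/B) recovers (b0, b1).
fit = sm.GLM(y, sm.add_constant(x), family=sm.families.Binomial(),
             offset=np.full(len(y), np.log(n * tau / B))).fit()
print(fit.params)  # should be close to (b0, b1)
```

The known sampling rate enters only through the offset, which is what allows the absolute intensity level \(\beta_0\), and not just the intensity ratio \(\exp(\beta_1)\), to be recovered from the logistic fit.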

References

  1. Aalen O, Borgan Ø, Gjessing HK (2008) Survival and event history analysis: a process point of view. Springer, Berlin
  2. Arjas E, Haara P (1987) A logistic regression model for hazard: asymptotic results. Scand J Stat 14:1–18
  3. Clayton D, Hills M (1993) Statistical models in epidemiology. Oxford University Press, Oxford
  4. Cox DR (1975) Partial likelihood. Biometrika 62:269–276
  5. Cox DR (2006) Principles of statistical inference. Cambridge University Press, Cambridge
  6. Farrington CP (1995) Relative incidence estimation from case series for vaccine safety evaluation. Biometrics 51:228–235
  7. Ghebremichael-Weldeselassie Y, Whitaker HJ, Farrington CP (2014) Self-controlled case series method with smooth age effect. Stat Med 33:639–649
  8. Gill RD, Johansen S (1990) A survey of product-integration with a view toward application in survival analysis. Ann Stat 18:1501–1555
  9. Hanley JA, Miettinen OS (2009) Fitting smooth-in-time prognostic risk functions via logistic regression. Int J Biostat. doi:10.2202/1557-4679.1125
  10. Kalbfleisch JD, Prentice RL (2002) The statistical analysis of failure time data, 2nd edn. Wiley, New York
  11. Langholz B, Goldstein L (2001) Conditional logistic analysis of case–control studies with complex sampling. Biostatistics 2:63–84
  12. Mantel N (1973) Synthetic retrospective studies and related topics. Biometrics 29:479–486
  13. Miettinen OS (2011) Epidemiological research: terms and concepts. Springer, Dordrecht
  14. Saarela O, Arjas E (2015) Non-parametric Bayesian hazard regression for chronic disease risk assessment. Scand J Stat 42:609–626
  15. Saarela O, Hanley JA (2015) Case-base methods for studying vaccination safety. Biometrics 71:42–52
  16. Whitaker HJ, Hocine MN, Farrington CP (2009) The methodology of self-controlled case series studies. Stat Methods Med Res 18:7–26
  17. Woolf B (1955) On estimating the relation between blood group and disease. Ann Hum Genet 19:251–253

Acknowledgments

The author acknowledges the support of the Natural Sciences and Engineering Research Council (NSERC) of Canada, and thanks Prof. Elja Arjas (University of Helsinki) for helpful comments.

Author information

Correspondence to Olli Saarela.

Appendix

Because \(M_i(t)\) and \(M_i^*(t)\) are orthogonal, the predictable variation process of the score process can be expressed as

$$\begin{aligned}
\langle U \rangle(t; \theta)
&= \sum_{i=1}^n \int_0^{t} \left( \frac{\partial}{\partial\theta} \left\{ \log \lambda_i(u; \theta) - \log\left[\lambda_i(u; \theta) + \rho_i(u)\right] \right\} \right)^{\otimes 2} Y_i(u)\,\lambda_i(u; \theta)\,\text{d}u \\
&\quad + \sum_{i=1}^n \int_0^{t} \left( \frac{\partial}{\partial\theta} \log\left[\lambda_i(u; \theta) + \rho_i(u)\right] \right)^{\otimes 2} Y_i(u)\,\rho_i(u)\,\text{d}u \\
&= \sum_{i=1}^n \int_0^{t} \left( \frac{\lambda_i'(u; \theta)}{\lambda_i(u; \theta)} - \frac{\lambda_i'(u; \theta)}{\lambda_i(u; \theta) + \rho_i(u)} \right)^{\otimes 2} Y_i(u)\,\lambda_i(u; \theta)\,\text{d}u \\
&\quad + \sum_{i=1}^n \int_0^{t} \left( \frac{\lambda_i'(u; \theta)}{\lambda_i(u; \theta) + \rho_i(u)} \right)^{\otimes 2} Y_i(u)\,\rho_i(u)\,\text{d}u \\
&= \sum_{i=1}^n \int_0^{t} \left( \frac{\lambda_i'(u; \theta)^{\otimes 2}}{\lambda_i(u; \theta)^2} - \frac{2\,\lambda_i'(u; \theta)^{\otimes 2}}{\lambda_i(u; \theta)\left[\lambda_i(u; \theta) + \rho_i(u)\right]} + \frac{\lambda_i'(u; \theta)^{\otimes 2}}{\left[\lambda_i(u; \theta) + \rho_i(u)\right]^2} \right) Y_i(u)\,\lambda_i(u; \theta)\,\text{d}u \\
&\quad + \sum_{i=1}^n \int_0^{t} \frac{\lambda_i'(u; \theta)^{\otimes 2}}{\left[\lambda_i(u; \theta) + \rho_i(u)\right]^2}\, Y_i(u)\,\rho_i(u)\,\text{d}u \\
&= \sum_{i=1}^n \int_0^{t} \left( \frac{\lambda_i'(u; \theta)^{\otimes 2}}{\lambda_i(u; \theta)} - \frac{\lambda_i'(u; \theta)^{\otimes 2}}{\lambda_i(u; \theta) + \rho_i(u)} \right) Y_i(u)\,\text{d}u.
\end{aligned}$$
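
As a quick check of the final simplification above (scalar \(\theta\), suppressing the arguments and the common factor \(Y_i(u)\,\text{d}u\)), the integrand identity can be verified symbolically; the following sympy sketch, with hypothetical symbol names, is only illustrative.

```python
import sympy as sp

# lam = lambda_i(u; theta), rho = rho_i(u), dlam = d lambda_i(u; theta) / d theta
lam, rho = sp.symbols("lam rho", positive=True)
dlam = sp.symbols("dlam", real=True)

# integrand of <U>(t; theta) before the final simplification ...
before = (dlam/lam - dlam/(lam + rho))**2 * lam + (dlam/(lam + rho))**2 * rho
# ... and the claimed simplified form
after = dlam**2/lam - dlam**2/(lam + rho)

assert sp.simplify(before - after) == 0  # the two integrands agree
```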

The observed information process is given by

$$\begin{aligned}
J(t; \theta)
&= -\sum_{i=1}^n \int_0^{t} \frac{\partial^2}{\partial\theta\,\partial\theta^\top} \log \lambda_i(u; \theta)\,\text{d}N_i(u) \\
&\quad + \sum_{i=1}^n \int_0^{t} \frac{\partial^2}{\partial\theta\,\partial\theta^\top} \log\left[\lambda_i(u; \theta) + \rho_i(u)\right] \text{d}Q_i(u) \\
&= -\sum_{i=1}^n \int_0^{t} \left( \frac{\lambda_i''(u; \theta)\,\lambda_i(u; \theta) - \lambda_i'(u; \theta)^{\otimes 2}}{\lambda_i(u; \theta)^2} \right) \text{d}N_i(u) \\
&\quad + \sum_{i=1}^n \int_0^{t} \left( \frac{\lambda_i''(u; \theta)\left[\lambda_i(u; \theta) + \rho_i(u)\right] - \lambda_i'(u; \theta)^{\otimes 2}}{\left[\lambda_i(u; \theta) + \rho_i(u)\right]^2} \right) \text{d}Q_i(u),
\end{aligned}$$

where we denoted \(\lambda _i''(u; \theta ) \equiv \frac{\partial ^2}{\partial \theta \partial \theta ^\top } \lambda _i(u; \theta )\).

Using the decompositions (4) and (5), the observed information process can be further written as

$$\begin{aligned}
J(t; \theta)
&= -\sum_{i=1}^n \int_0^{t} \left( \frac{\lambda_i''(u; \theta)\,\lambda_i(u; \theta) - \lambda_i'(u; \theta)^{\otimes 2}}{\lambda_i(u; \theta)^2} \right) Y_i(u)\,\lambda_i(u; \theta)\,\text{d}u \\
&\quad + \sum_{i=1}^n \int_0^{t} \left( \frac{\lambda_i''(u; \theta)\left[\lambda_i(u; \theta) + \rho_i(u)\right] - \lambda_i'(u; \theta)^{\otimes 2}}{\left[\lambda_i(u; \theta) + \rho_i(u)\right]^2} \right) Y_i(u)\left[\lambda_i(u; \theta) + \rho_i(u)\right]\text{d}u \\
&\quad + \mathcal{E}(t) \\
&= \sum_{i=1}^n \int_0^{t} \left( \frac{\lambda_i'(u; \theta)^{\otimes 2}}{\lambda_i(u; \theta)} - \frac{\lambda_i'(u; \theta)^{\otimes 2}}{\lambda_i(u; \theta) + \rho_i(u)} \right) Y_i(u)\,\text{d}u + \mathcal{E}(t),
\end{aligned}$$

where we denoted

$$\begin{aligned}
\mathcal{E}(t) \equiv
&-\sum_{i=1}^n \int_0^{t} \left( \frac{\lambda_i''(u; \theta)\,\lambda_i(u; \theta) - \lambda_i'(u; \theta)^{\otimes 2}}{\lambda_i(u; \theta)^2} \right) \text{d}M_i(u) \\
&+ \sum_{i=1}^n \int_0^{t} \left( \frac{\lambda_i''(u; \theta)\left[\lambda_i(u; \theta) + \rho_i(u)\right] - \lambda_i'(u; \theta)^{\otimes 2}}{\left[\lambda_i(u; \theta) + \rho_i(u)\right]^2} \right) \left[\text{d}M_i(u) + \text{d}M_i^*(u)\right].
\end{aligned}$$
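
The martingale integrals in \(\mathcal{E}(t)\) have zero expectation, and the deterministic integrand in the first expression for \(J(t; \theta)\) above indeed collapses to the \(\langle U \rangle\) integrand; this second simplification can be checked symbolically in the same way (again a hedged sympy sketch for scalar \(\theta\), with hypothetical symbol names).

```python
import sympy as sp

lam, rho = sp.symbols("lam rho", positive=True)
dlam, d2lam = sp.symbols("dlam d2lam", real=True)  # first and second theta-derivatives

# deterministic part of the J(t; theta) integrand (scalar theta, per unit Y_i(u) du)
j_part = (-(d2lam * lam - dlam**2) / lam**2 * lam
          + (d2lam * (lam + rho) - dlam**2) / (lam + rho)**2 * (lam + rho))
# integrand of <U>(t; theta)
u_part = dlam**2 / lam - dlam**2 / (lam + rho)

assert sp.simplify(j_part - u_part) == 0  # consistent with E[J] = E[<U>] at theta_0
```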

Therefore, \(E[\langle U \rangle (t; \theta_0)] = E[J(t; \theta_0)]\). With these results, the asymptotic normality of the maximum partial likelihood estimator \(\hat{\theta}\) can be motivated in the same way as for parametric survival models (e.g. Kalbfleisch and Prentice 2002, p. 180). Briefly, assume a scalar \(\theta\) for notational simplicity, and denote \(U(\theta) \equiv U(\tau; \theta)\) and \(J(\theta) \equiv J(\tau; \theta)\). From the martingale central limit theorem, it follows under the standard regularity conditions that

$$\begin{aligned} \frac{\sqrt{n}}{n} U(\theta _0) \mathop {\rightarrow }\limits ^{d} N\left( 0, \varSigma (\theta _0)\right) , \end{aligned}$$

where \(\varSigma(\theta_0)\) is such that \(\frac{1}{n} \langle U \rangle (\tau; \theta_0) \mathop{\rightarrow}\limits^{p} \varSigma(\theta_0)\). The Taylor expansion

$$\begin{aligned} U(\hat{\theta }) = U(\theta _0) - J(\theta _0)\left( \hat{\theta }- \theta _0\right) + \frac{1}{2} \frac{\partial ^3 l(\theta ^*)}{\partial \theta ^3} \left( \hat{\theta }- \theta _0\right) ^2 \end{aligned}$$

can be used to motivate both the consistency and the asymptotic normality of \(\hat{\theta}\), assuming that the third term on the right-hand side is bounded in probability. In particular, since \(U(\hat{\theta}) = 0\) at the maximum, we get

$$\begin{aligned} \sqrt{n} (\hat{\theta }- \theta _0) \mathop {\rightarrow }\limits ^{d} N\left( 0, \varSigma (\theta _0)^{-1}\right) , \end{aligned}$$

where \(\varSigma(\theta_0)\) is in practice estimated by the average observed information \(\frac{1}{n} J(\hat{\theta})\), evaluated at the maximum partial likelihood estimate.
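
As a numerical illustration of this variance result, consider the simplest special case of a constant intensity \(\lambda_i(u; \theta) = e^{\theta}\), a constant base-series rate \(\rho\), and complete follow-up over \([0, \tau]\). The score equation then gives the closed-form estimator \(e^{\hat{\theta}} = \rho N / B\), where \(N\) is the total number of outcome events and \(B\) the total number of base-series points, and the predictable variation formula gives \(\varSigma(\theta_0) = \tau e^{\theta_0} \rho / (e^{\theta_0} + \rho)\). The Monte Carlo sketch below (hypothetical parameter values, not those of the paper's simulation study) checks that the empirical variance of \(\sqrt{n}(\hat{\theta} - \theta_0)\) is close to \(\varSigma(\theta_0)^{-1}\).

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical special case: constant intensity exp(theta0), constant base-series
# rate rho, n subjects fully followed over [0, tau].
theta0, rho, tau, n, n_sim = np.log(0.5), 2.0, 1.0, 500, 5000
lam0 = np.exp(theta0)

# Aggregate event count N and base-series count B are Poisson; the score equation
# N - (N + B) * exp(theta) / (exp(theta) + rho) = 0 yields exp(theta_hat) = rho * N / B.
N = rng.poisson(n * lam0 * tau, size=n_sim)
B = rng.poisson(n * rho * tau, size=n_sim)
theta_hat = np.log(rho * N / B)

# Limiting variance from the predictable variation: Sigma = tau * lam0 * rho / (lam0 + rho).
sigma = tau * lam0 * rho / (lam0 + rho)
print("empirical variance of sqrt(n)*(theta_hat - theta0):",
      np.var(np.sqrt(n) * (theta_hat - theta0)))
print("Sigma(theta0)^(-1):", 1.0 / sigma)
```

Both printed values should be close to 2.5 for these parameter values, up to Monte Carlo error.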

Cite this article

Saarela, O. A case-base sampling method for estimating recurrent event intensities. Lifetime Data Anal 22, 589–605 (2016). https://doi.org/10.1007/s10985-015-9352-x

Keywords

  • Case-base sampling
  • Conditional logistic regression
  • Hazard regression
  • Recurrent events
  • Self-matching