A case-base sampling method for estimating recurrent event intensities

Abstract

Case-base sampling provides an alternative to risk-set-sampling-based methods for estimating hazard regression models, in particular when absolute hazards are of interest in addition to hazard ratios. Case-base sampling leads to a likelihood expression of the logistic regression form, but rather than relying on categorized time, this expression is obtained by sampling a discrete set of person-time coordinates from all of the follow-up data. In this paper, in the context of a time-dependent exposure such as vaccination and a potentially recurrent adverse event outcome, we show that the resulting partial likelihood for the outcome event intensity has the asymptotic properties of a likelihood. We contrast this approach with self-matched case-base sampling, which involves only within-individual comparisons. The efficiency of the case-base methods is compared with that of standard methods through simulations, which suggest that the information loss due to sampling is minimal.
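As a rough illustration of the sampling scheme described above (not of the recurrent-event estimator developed in the paper), the following sketch simulates a single non-recurrent event with a constant hazard and a time-fixed binary covariate, draws a base series of person-moments from the total follow-up time, and recovers the log-hazard parameters by logistic regression with an offset. The variable names and the simulation setup are hypothetical and for illustration only.

```python
# Minimal case-base sampling sketch in a simplified setting: single event,
# constant hazard exp(a0 + a1*x), time-fixed covariate x. Hypothetical setup,
# not the paper's recurrent-event estimator.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)

# Simulate follow-up: exponential event times, administrative censoring at tau.
n, tau = 5000, 5.0
a0, a1 = np.log(0.1), np.log(2.0)          # true log baseline hazard, log hazard ratio
x = rng.binomial(1, 0.5, size=n)
t_event = rng.exponential(1.0 / np.exp(a0 + a1 * x))
follow_up = np.minimum(t_event, tau)
event = (t_event <= tau).astype(int)

# Case series: one record per observed event (outcome 1).
case_x = x[event == 1]

# Base series: b person-moments sampled uniformly from the total person-time B
# (outcome 0); with a time-fixed covariate only the sampled person matters.
B = follow_up.sum()
b = 10 * event.sum()
idx = rng.choice(n, size=b, p=follow_up / B)
base_x = x[idx]

# Logistic regression with offset log(B / b): at a sampled person-moment the
# case/base odds are lambda(x) * B / b, so the fitted coefficients estimate
# the log-hazard parameters (a0, a1).
y = np.concatenate([np.ones(case_x.size), np.zeros(base_x.size)])
X = sm.add_constant(np.concatenate([case_x, base_x]).astype(float))
fit = sm.GLM(y, X, family=sm.families.Binomial(),
             offset=np.full(y.size, np.log(B / b))).fit()
print(fit.params)   # roughly (log 0.1, log 2)
```

In the paper's setting the same logistic-form likelihood is built from time-varying exposure histories and a potentially recurrent outcome, which this sketch does not attempt to capture.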

References

  • Aalen O, Borgan Ø, Gjessing HK (2008) Survival and event history analysis: a process point of view. Springer, Berlin

  • Arjas E, Haara P (1987) A logistic regression model for hazard: asymptotic results. Scand J Stat 14:1–18

  • Clayton D, Hills M (1993) Statistical models in epidemiology. Oxford University Press, Oxford

  • Cox DR (1975) Partial likelihood. Biometrika 62:269–276

  • Cox DR (2006) Principles of statistical inference. Cambridge University Press, Cambridge

  • Farrington CP (1995) Relative incidence estimation from case series for vaccine safety evaluation. Biometrics 51:228–235

  • Ghebremichael-Weldeselassie Y, Whitaker HJ, Farrington CP (2014) Self-controlled case series method with smooth age effect. Stat Med 33:639–649

  • Gill RD, Johansen S (1990) A survey of product-integration with a view toward application in survival analysis. Ann Stat 18:1501–1555

  • Hanley JA, Miettinen OS (2009) Fitting smooth-in-time prognostic risk functions via logistic regression. Int J Biostat. doi:10.2202/1557-4679.1125

  • Kalbfleisch JD, Prentice RL (2002) The statistical analysis of failure time data, 2nd edn. Wiley, New York

  • Langholz B, Goldstein L (2001) Conditional logistic analysis of case–control studies with complex sampling. Biostatistics 2:63–84

  • Mantel N (1973) Synthetic retrospective studies and related topics. Biometrics 29:479–486

  • Miettinen OS (2011) Epidemiological research: terms and concepts. Springer, Dordrecht

  • Saarela O, Arjas E (2015) Non-parametric Bayesian hazard regression for chronic disease risk assessment. Scand J Stat 42:609–626

  • Saarela O, Hanley JA (2015) Case-base methods for studying vaccination safety. Biometrics 71:42–52

  • Whitaker HJ, Hocine MN, Farrington CP (2009) The methodology of self-controlled case series studies. Stat Methods Med Res 18:7–26

  • Woolf B (1955) On estimating the relation between blood group and disease. Ann Hum Genet 19:251–253

Acknowledgments

The author acknowledges the support of the Natural Sciences and Engineering Research Council (NSERC) of Canada, and thanks Prof. Elja Arjas (University of Helsinki) for helpful comments.

Corresponding author

Correspondence to Olli Saarela.

Appendix

Because \(M_i(t)\) and \(M_i^*(t)\) are orthogonal, the predictable variation process of the score process can be expressed as

$$\begin{aligned}&\langle U \rangle (t; \theta ) \\&\quad = \sum _{i=1}^n \int _0^{t} \left( \frac{\partial }{\partial \theta } \left\{ \log \lambda _i(u; \theta ) - \log \left[ \lambda _i(u; \theta ) + \rho _i(u)\right] \right\} \right) ^{\otimes 2}Y_i(u)\lambda _i(u; \theta )\,{\text {d}} u \\&\qquad + \sum _{i=1}^n \int _0^{t} \left( \frac{\partial }{\partial \theta } \log \left[ \lambda _i(u; \theta ) + \rho _i(u)\right] \right) ^{\otimes 2} Y_i(u)\rho _i(u) \,{\text {d}} u \\&\quad = \sum _{i=1}^n \int _0^{t} \left( \frac{\lambda _i'(u; \theta )}{\lambda _i(u; \theta )} - \frac{\lambda _i'(u; \theta )}{\lambda _i(u; \theta ) + \rho _i(u)}\right) ^{\otimes 2}Y_i(u)\lambda _i(u; \theta )\,{\text {d}} u\\&\qquad + \sum _{i=1}^n \int _0^{t} \left( \frac{\lambda _i'(u; \theta )}{\lambda _i(u; \theta ) + \rho _i(u)}\right) ^{\otimes 2} Y_i(u)\rho _i(u) \,{\text {d}} u \\&\quad = \sum _{i=1}^n \int _0^{t} \left( \frac{\lambda _i'(u; \theta )^{\otimes 2}}{\lambda _i(u; \theta )^2}\right. - \frac{2 \lambda _i'(u; \theta )^{\otimes 2}}{\lambda _i(u; \theta )\left[ \lambda _i(u; \theta ) + \rho _i(u)\right] } +\left. \frac{\lambda _i'(u; \theta )^{\otimes 2}}{[\lambda _i(u; \theta ) + \rho _i(u)]^2} \right) \\&\qquad \times Y_i(u)\lambda _i(u; \theta )\,{\text {d}} u + \sum _{i=1}^n \int _0^{t} \frac{\lambda _i'(u; \theta )^{\otimes 2}}{[\lambda _i(u; \theta ) + \rho _i(u)]^2} Y_i(u)\rho _i(u)\,{\text {d}} u \\&\quad = \sum _{i=1}^n \int _0^{t} \left( \frac{\lambda _i'(u; \theta )^{\otimes 2}}{\lambda _i(u; \theta )} - \frac{\lambda _i'(u; \theta )^{\otimes 2}}{\lambda _i(u; \theta ) + \rho _i(u)}\right) Y_i(u)\,{\text {d}} u. \end{aligned}$$

The observed information process is given by

$$\begin{aligned} J(t; \theta )&= -\sum _{i=1}^n \int _0^{t} \frac{\partial ^2}{\partial \theta \partial \theta ^\top }\log \lambda _i(u; \theta ) {\text {d}}N_i(u) \\&\quad + \sum _{i=1}^n \int _0^{t} \frac{\partial ^2}{\partial \theta \partial \theta ^\top } \log \left[ \lambda _i(u; \theta ) + \rho _i(u)\right] {\text {d}}Q_i(u) \\&= -\sum _{i=1}^n \int _0^{t} \left( \frac{\lambda _i''(u; \theta )\lambda _i(u; \theta ) - \lambda _i'(u; \theta )^{\otimes 2}}{\lambda _i(u; \theta )^2}\right) {\text {d}}N_i(u) \\&\quad +\sum _{i=1}^n \int _0^{t} \left( \frac{\lambda _i''(u; \theta )\left[ \lambda _i(u; \theta ) + \rho _i(u)\right] - \lambda _i'(u; \theta )^{\otimes 2}}{\left[ \lambda _i(u; \theta ) + \rho _i(u)\right] ^2}\right) {\text {d}}Q_i(u), \end{aligned}$$

where we denoted \(\lambda _i''(u; \theta ) \equiv \frac{\partial ^2}{\partial \theta \partial \theta ^\top } \lambda _i(u; \theta )\).
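The decompositions (4) and (5) invoked next are stated in the main text and not reproduced in the appendix; as they enter the calculation below they are, up to notation, the martingale decompositions

$$\begin{aligned} {\text {d}}N_i(u)&= Y_i(u)\lambda _i(u; \theta _0)\,{\text {d}} u + {\text {d}} M_i(u), \\ {\text {d}}Q_i(u)&= Y_i(u)\left[ \lambda _i(u; \theta _0) + \rho _i(u)\right] {\text {d}} u + {\text {d}} M_i(u) + {\text {d}} M_i^*(u). \end{aligned}$$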

Using the decompositions (4) and (5), the observed information process can be further written as

$$\begin{aligned}&J(t; \theta ) \\&\quad = -\sum _{i=1}^n \int _0^{t} \left( \frac{\lambda _i''(u; \theta )\lambda _i(u; \theta ) - \lambda _i'(u; \theta )^{\otimes 2}}{\lambda _i(u; \theta )^2}\right) Y_i(u)\lambda _i(u; \theta )\,{\text {d}} u\\&\qquad +\,\sum _{i=1}^n \int _0^{t} \left( \frac{\lambda _i''(u; \theta )[\lambda _i(u; \theta ) + \rho _i(u)] {-} \lambda _i'(u; \theta )^{\otimes 2}}{[\lambda _i(u; \theta ) + \rho _i(u)]^2}\right) Y_i(u)[\lambda _i(u; \theta ) + \rho _i(u)]\,{\text {d}} u \\&\qquad +\,{\mathcal {E}}(t) \\&\qquad = \sum _{i=1}^n \int _0^{t} \left( \frac{\lambda _i'(u; \theta )^{\otimes 2}}{\lambda _i(u; \theta )} - \frac{\lambda _i'(u; \theta )^{\otimes 2}}{\lambda _i(u; \theta ) + \rho _i(u)}\right) Y_i(u) \,{\text {d}} u + {\mathcal {E}}(t), \end{aligned}$$

where we denoted

$$\begin{aligned} {\mathcal {E}}(t)\equiv & {} -\sum _{i=1}^n \int _0^{t} \left( \frac{\lambda _i''(u; \theta )\lambda _i(u; \theta ) - \lambda _i'(u; \theta )^{\otimes 2}}{\lambda _i(u; \theta )^2}\right) {\text {d}} M_i(u)\\&\quad +\,\sum _{i=1}^n \int _0^{t} \left( \frac{\lambda _i''(u; \theta )\left[ \lambda _i(u; \theta ) + \rho _i(u)\right] {-} \lambda _i'(u; \theta )^{\otimes 2}}{\left[ \lambda _i(u; \theta ) + \rho _i(u)\right] ^2}\right) \left[ {\text {d}} M_i(u) + {\text {d}} M_i^*(u)\right] . \end{aligned}$$

Since \({\mathcal {E}}(t)\) is a sum of stochastic integrals with respect to the zero-mean martingales \(M_i\) and \(M_i^*\), it has expectation zero, and therefore \(E[\langle U \rangle (t; \theta _0)] = E[J(t; \theta _0)]\). With these results, motivating the asymptotic normality of the maximum partial likelihood estimator \(\hat{\theta }\) proceeds similarly to the argument for parametric survival models (e.g. Kalbfleisch and Prentice 2002, p. 180). Briefly, assume a scalar \(\theta \) for notational simplicity, and denote \(U(\theta ) \equiv U(\tau ; \theta )\) and \(J(\theta ) \equiv J(\tau ; \theta )\). From the martingale central limit theorem, it follows under the standard regularity conditions that

$$\begin{aligned} \frac{\sqrt{n}}{n} U(\theta _0) \mathop {\rightarrow }\limits ^{d} N\left( 0, \varSigma (\theta _0)\right) , \end{aligned}$$

where the matrix \(\varSigma (\theta _0)\) is such that \(\frac{1}{n} \langle U \rangle (\tau ; \theta _0) \mathop {\rightarrow }\limits ^{p} \varSigma (\theta _0)\). The Taylor expansion

$$\begin{aligned} U(\hat{\theta }) = U(\theta _0) - J(\theta _0)\left( \hat{\theta }- \theta _0\right) + \frac{1}{2} \frac{\partial ^3 l(\theta ^*)}{\partial \theta ^3} \left( \hat{\theta }- \theta _0\right) ^2 \end{aligned}$$

can be used to motivate both the consistency and asymptotic normality of \(\hat{\theta }\) by assuming that the third term on the right hand side is bounded in probability. In particular, we get

$$\begin{aligned} \sqrt{n} (\hat{\theta }- \theta _0) \mathop {\rightarrow }\limits ^{d} N\left( 0, \varSigma (\theta _0)^{-1}\right) , \end{aligned}$$

where \(\varSigma (\theta _0)\) is in practice estimated by the average observed information \(\frac{1}{n} J(\hat{\theta })\) at the maximum likelihood point.
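To make the step from the Taylor expansion to this limiting distribution explicit (a standard argument, spelled out here only for completeness): since \(U(\hat{\theta }) = 0\) at the maximum, the expansion can be rearranged as

$$\begin{aligned} \sqrt{n}\left( \hat{\theta }- \theta _0\right) = \left[ \frac{1}{n} J(\theta _0) - \frac{1}{2n} \frac{\partial ^3 l(\theta ^*)}{\partial \theta ^3} \left( \hat{\theta }- \theta _0\right) \right] ^{-1} \frac{1}{\sqrt{n}} U(\theta _0). \end{aligned}$$

Under the regularity conditions the term in square brackets converges in probability to \(\varSigma (\theta _0)\): \(\frac{1}{n} J(\theta _0) \mathop {\rightarrow }\limits ^{p} \varSigma (\theta _0)\) because \(J(\tau ; \theta _0) - \langle U \rangle (\tau ; \theta _0) = {\mathcal {E}}(\tau )\) is a sum of mean-zero martingale integrals whose average vanishes, \(\hat{\theta }\) is consistent, and the third-derivative term is bounded in probability. Combined with \(\frac{1}{\sqrt{n}} U(\theta _0) \mathop {\rightarrow }\limits ^{d} N(0, \varSigma (\theta _0))\), Slutsky's theorem then gives the limiting distribution \(N(0, \varSigma (\theta _0)^{-1})\) stated above.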

Cite this article

Saarela, O. A case-base sampling method for estimating recurrent event intensities. Lifetime Data Anal 22, 589–605 (2016). https://doi.org/10.1007/s10985-015-9352-x
