A case-base sampling method for estimating recurrent event intensities

Saarela, Olli

doi:10.1007/s10985-015-9352-x


Published: 14 October 2015

Lifetime Data Analysis, volume 22, pages 589–605 (2016)

Abstract

Case-base sampling provides an alternative to risk set sampling based methods to estimate hazard regression models, in particular when absolute hazards are also of interest in addition to hazard ratios. The case-base sampling approach results in a likelihood expression of the logistic regression form, but instead of categorized time, such an expression is obtained through sampling of a discrete set of person-time coordinates from all follow-up data. In this paper, in the context of a time-dependent exposure such as vaccination, and a potentially recurrent adverse event outcome, we show that the resulting partial likelihood for the outcome event intensity has the asymptotic properties of a likelihood. We contrast this approach to self-matched case-base sampling, which involves only within-individual comparisons. The efficiency of the case-base methods is compared to that of standard methods through simulations, suggesting that the information loss due to sampling is minimal.
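As a minimal sketch of the idea, the following simulation draws recurrent events and a base series of person-moments, then estimates the intensities from the resulting counts. Everything specific here is an assumption made for illustration and not taken from the paper: the homogeneous Poisson base series with known rate `rho`, the uniformly distributed vaccination time, the piecewise-constant intensity, and the function names. Under these assumptions each sampled point is an event with probability $\lambda/(\lambda + \rho)$, and for a saturated binary exposure model the maximizer has the closed form $\hat{\lambda}_x = \rho \, c_x / b_x$, where $c_x$ and $b_x$ are the event and base-series counts at exposure level $x$.

```python
import math
import random


def poisson_times(rate, start, stop, rng):
    """Event times of a homogeneous Poisson process with the given rate on (start, stop]."""
    times = []
    t = start + rng.expovariate(rate)
    while t <= stop:
        times.append(t)
        t += rng.expovariate(rate)
    return times


def simulate_case_base(n, tau, lam0, beta, rho, seed=1):
    """Return event counts and base-series counts, each split by exposure status (0/1)."""
    rng = random.Random(seed)
    events = [0, 0]
    base = [0, 0]
    for _ in range(n):
        v = rng.uniform(0.0, tau)  # hypothetical vaccination time (assumption)
        # Recurrent events: rate lam0 before vaccination, lam0 * exp(beta) afterwards.
        events[0] += len(poisson_times(lam0, 0.0, v, rng))
        events[1] += len(poisson_times(lam0 * math.exp(beta), v, tau, rng))
        # Base series: person-moments sampled at rate rho, independently of the events.
        for t in poisson_times(rho, 0.0, tau, rng):
            base[int(t >= v)] += 1
    return events, base


def fit_saturated(events, base, rho):
    """Closed-form maximizer of the logistic-form likelihood for a binary exposure."""
    lam_hat = [rho * events[x] / base[x] for x in (0, 1)]
    beta_hat = math.log(lam_hat[1] / lam_hat[0])
    return lam_hat, beta_hat


if __name__ == "__main__":
    ev, bs = simulate_case_base(n=2000, tau=1.0, lam0=0.5, beta=math.log(2.0), rho=5.0)
    print(fit_saturated(ev, bs, rho=5.0))
```

In this special case the log hazard ratio estimate reduces to the log odds ratio of the two-by-two table of events and base moments by exposure status, while the absolute intensities are recovered by multiplying the odds by the known sampling rate. With covariates or smooth time effects, the closed form would be replaced by a logistic regression fit with offset $-\log \rho$.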


References

Aalen O, Borgan Ø, Gjessing HK (2008) Survival and event history analysis: a process point of view. Springer, Berlin
Arjas E, Haara P (1987) A logistic regression model for hazard: asymptotic results. Scand J Stat 14:1–18
Clayton D, Hills M (1993) Statistical models in epidemiology. Oxford University Press, Oxford
Cox DR (1975) Partial likelihood. Biometrika 62:269–276
Cox DR (2006) Principles of statistical inference. Cambridge University Press, Cambridge
Farrington CP (1995) Relative incidence estimation from case series for vaccine safety evaluation. Biometrics 51:228–235
Ghebremichael-Weldeselassie Y, Whitaker HJ, Farrington CP (2014) Self-controlled case series method with smooth age effect. Stat Med 33:639–649
Gill RD, Johansen S (1990) A survey of product-integration with a view toward application in survival analysis. Ann Stat 18:1501–1555
Hanley JA, Miettinen OS (2009) Fitting smooth-in-time prognostic risk functions via logistic regression. Int J Biostat. doi:10.2202/1557-4679.1125
Kalbfleisch JD, Prentice RL (2002) The statistical analysis of failure time data, 2nd edn. Wiley, New York
Langholz B, Goldstein L (2001) Conditional logistic analysis of case–control studies with complex sampling. Biostatistics 2:63–84
Mantel N (1973) Synthetic retrospective studies and related topics. Biometrics 29:479–486
Miettinen OS (2011) Epidemiological research: terms and concepts. Springer, Dordrecht
Saarela O, Arjas E (2015) Non-parametric Bayesian hazard regression for chronic disease risk assessment. Scand J Stat 42:609–626
Saarela O, Hanley JA (2015) Case-base methods for studying vaccination safety. Biometrics 71:42–52
Whitaker HJ, Hocine MN, Farrington CP (2009) The methodology of self-controlled case series studies. Stat Methods Med Res 18:7–26
Woolf B (1955) On estimating the relationship between blood group and disease. Ann Hum Genet 19:251–253

Acknowledgments

The author acknowledges the support of the Natural Sciences and Engineering Research Council (NSERC) of Canada, and thanks Prof. Elja Arjas (University of Helsinki) for helpful comments.

Author information

Authors and Affiliations

Dalla Lana School of Public Health, University of Toronto, 155 College Street, Toronto, ON, M5T 3M7, Canada
Olli Saarela


Corresponding author

Correspondence to Olli Saarela.

Appendix

Because $M_i(t)$ and $M_i^*(t)$ are orthogonal, the predictable variation process of the score process can be expressed as

$$\begin{aligned}&\langle U \rangle (t; \theta ) \\&\quad = \sum _{i=1}^n \int _0^{t} \left( \frac{\partial }{\partial \theta } \left\{ \log \lambda _i(u; \theta ) - \log \left[ \lambda _i(u; \theta ) + \rho _i(u)\right] \right\} \right) ^{\otimes 2}Y_i(u)\lambda _i(u; \theta )\,{\text {d}} u \\&\qquad + \sum _{i=1}^n \int _0^{t} \left( \frac{\partial }{\partial \theta } \log \left[ \lambda _i(u; \theta ) + \rho _i(u)\right] \right) ^{\otimes 2} Y_i(u)\rho _i(u) \,{\text {d}} u \\&\quad = \sum _{i=1}^n \int _0^{t} \left( \frac{\lambda _i'(u; \theta )}{\lambda _i(u; \theta )} - \frac{\lambda _i'(u; \theta )}{\lambda _i(u; \theta ) + \rho _i(u)}\right) ^{\otimes 2}Y_i(u)\lambda _i(u; \theta )\,{\text {d}} u\\&\qquad + \sum _{i=1}^n \int _0^{t} \left( \frac{\lambda _i'(u; \theta )}{\lambda _i(u; \theta ) + \rho _i(u)}\right) ^{\otimes 2} Y_i(u)\rho _i(u) \,{\text {d}} u \\&\quad = \sum _{i=1}^n \int _0^{t} \left( \frac{\lambda _i'(u; \theta )^{\otimes 2}}{\lambda _i(u; \theta )^2}\right. - \frac{2 \lambda _i'(u; \theta )^{\otimes 2}}{\lambda _i(u; \theta )\left[ \lambda _i(u; \theta ) + \rho _i(u)\right] } +\left. \frac{\lambda _i'(u; \theta )^{\otimes 2}}{[\lambda _i(u; \theta ) + \rho _i(u)]^2} \right) \\&\qquad \times Y_i(u)\lambda _i(u; \theta )\,{\text {d}} u + \sum _{i=1}^n \int _0^{t} \frac{\lambda _i'(u; \theta )^{\otimes 2}}{[\lambda _i(u; \theta ) + \rho _i(u)]^2} Y_i(u)\rho _i(u)\,{\text {d}} u \\&\quad = \sum _{i=1}^n \int _0^{t} \left( \frac{\lambda _i'(u; \theta )^{\otimes 2}}{\lambda _i(u; \theta )} - \frac{\lambda _i'(u; \theta )^{\otimes 2}}{\lambda _i(u; \theta ) + \rho _i(u)}\right) Y_i(u)\,{\text {d}} u. \end{aligned}$$
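The simplification in the last step can be checked pointwise. The sketch below evaluates the integrand before and after simplification for scalar values of $\lambda_i$, $\lambda_i'$ and $\rho_i$, using exact rational arithmetic. It also compares the result with $\lambda_i'^2/\lambda_i$, the standard information integrand for a parametric intensity model fitted to the full follow-up; that comparison and the function names are additions for illustration and are not part of the derivation above.

```python
from fractions import Fraction


def variation_integrand(lam, dlam, rho):
    """Integrand of <U> before simplification: event part plus base-series part."""
    event_part = (dlam / lam - dlam / (lam + rho)) ** 2 * lam
    base_part = (dlam / (lam + rho)) ** 2 * rho
    return event_part + base_part


def simplified_integrand(lam, dlam, rho):
    """Integrand of <U> as given in the final line of the display."""
    return dlam ** 2 / lam - dlam ** 2 / (lam + rho)


def relative_information(lam, rho):
    """Ratio of the simplified integrand to the full-follow-up value dlam**2 / lam."""
    return rho / (lam + rho)


if __name__ == "__main__":
    lam, dlam, rho = Fraction(1, 2), Fraction(3, 7), Fraction(5)
    print(variation_integrand(lam, dlam, rho) == simplified_integrand(lam, dlam, rho))
    print(relative_information(lam, rho))
```

Under this comparison the pointwise ratio is $\rho_i/(\lambda_i + \rho_i)$, so a base-series sampling rate ten times the event intensity retains $10/11$ of the information at that time point.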

The observed information process is given by

$$\begin{aligned} J(t; \theta )&= -\sum _{i=1}^n \int _0^{t} \frac{\partial ^2}{\partial \theta \partial \theta ^\top }\log \lambda _i(u; \theta ) {\text {d}}N_i(u) \\&\quad + \sum _{i=1}^n \int _0^{t} \frac{\partial ^2}{\partial \theta \partial \theta ^\top } \log \left[ \lambda _i(u; \theta ) + \rho _i(u)\right] {\text {d}}Q_i(u) \\&= -\sum _{i=1}^n \int _0^{t} \left( \frac{\lambda _i''(u; \theta )\lambda _i(u; \theta ) - \lambda _i'(u; \theta )^{\otimes 2}}{\lambda _i(u; \theta )^2}\right) {\text {d}}N_i(u) \\&\quad +\sum _{i=1}^n \int _0^{t} \left( \frac{\lambda _i''(u; \theta )\left[ \lambda _i(u; \theta ) + \rho _i(u)\right] - \lambda _i'(u; \theta )^{\otimes 2}}{\left[ \lambda _i(u; \theta ) + \rho _i(u)\right] ^2}\right) {\text {d}}Q_i(u), \end{aligned}$$

where we denoted $\lambda _i''(u; \theta ) \equiv \frac{\partial ^2}{\partial \theta \partial \theta ^\top } \lambda _i(u; \theta )$.
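For a concrete instance, consider a log-linear intensity $\lambda_i(u; \theta) = \exp(\theta z)$ with a scalar covariate $z$; this model and the names in the code are assumptions chosen for illustration. Then $\lambda_i' = z\lambda_i$ and $\lambda_i'' = z^2\lambda_i$, the first integrand vanishes, and the second reduces to $z^2 p(1 - p)$ with $p = \lambda_i/(\lambda_i + \rho_i)$, which is the familiar logistic regression information. The sketch checks this against the displayed formula and against a central finite difference.

```python
import math


def d2_log_sum(theta, z, rho):
    """Second derivative of log[lambda + rho] in theta, using the displayed formula."""
    lam = math.exp(theta * z)  # assumed log-linear intensity
    d1 = z * lam               # lambda'
    d2 = z * z * lam           # lambda''
    return (d2 * (lam + rho) - d1 ** 2) / (lam + rho) ** 2


def logistic_information(theta, z, rho):
    """z^2 * p * (1 - p), where p = lambda / (lambda + rho)."""
    lam = math.exp(theta * z)
    p = lam / (lam + rho)
    return z * z * p * (1.0 - p)


def finite_difference(theta, z, rho, h=1e-4):
    """Central second difference of theta -> log[exp(theta * z) + rho]."""
    def f(th):
        return math.log(math.exp(th * z) + rho)
    return (f(theta + h) - 2.0 * f(theta) + f(theta - h)) / h ** 2


if __name__ == "__main__":
    args = (0.3, 1.5, 2.0)
    print(d2_log_sum(*args), logistic_information(*args), finite_difference(*args))
```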

Using the decompositions (4) and (5), the observed information process can be further written as

$$\begin{aligned}&J(t; \theta ) \\&\quad = -\sum _{i=1}^n \int _0^{t} \left( \frac{\lambda _i''(u; \theta )\lambda _i(u; \theta ) - \lambda _i'(u; \theta )^{\otimes 2}}{\lambda _i(u; \theta )^2}\right) Y_i(u)\lambda _i(u; \theta )\,{\text {d}} u\\&\qquad +\,\sum _{i=1}^n \int _0^{t} \left( \frac{\lambda _i''(u; \theta )[\lambda _i(u; \theta ) + \rho _i(u)] {-} \lambda _i'(u; \theta )^{\otimes 2}}{[\lambda _i(u; \theta ) + \rho _i(u)]^2}\right) Y_i(u)[\lambda _i(u; \theta ) + \rho _i(u)]\,{\text {d}} u \\&\qquad +\,{\mathcal {E}}(t) \\&\qquad = \sum _{i=1}^n \int _0^{t} \left( \frac{\lambda _i'(u; \theta )^{\otimes 2}}{\lambda _i(u; \theta )} - \frac{\lambda _i'(u; \theta )^{\otimes 2}}{\lambda _i(u; \theta ) + \rho _i(u)}\right) Y_i(u) \,{\text {d}} u + {\mathcal {E}}(t), \end{aligned}$$

where we denoted

$$\begin{aligned} {\mathcal {E}}(t)\equiv & {} -\sum _{i=1}^n \int _0^{t} \left( \frac{\lambda _i''(u; \theta )\lambda _i(u; \theta ) - \lambda _i'(u; \theta )^{\otimes 2}}{\lambda _i(u; \theta )^2}\right) {\text {d}} M_i(u)\\&\quad +\,\sum _{i=1}^n \int _0^{t} \left( \frac{\lambda _i''(u; \theta )\left[ \lambda _i(u; \theta ) + \rho _i(u)\right] {-} \lambda _i'(u; \theta )^{\otimes 2}}{\left[ \lambda _i(u; \theta ) + \rho _i(u)\right] ^2}\right) \left[ {\text {d}} M_i(u) + {\text {d}} M_i^*(u)\right] . \end{aligned}$$

At $\theta = \theta _0$, ${\mathcal {E}}(t)$ is a sum of integrals of predictable processes with respect to the martingales $M_i$ and $M_i^*$, and so has expectation zero under the usual integrability conditions. Therefore, $E[\langle U \rangle (t; \theta _0)] = E[J(t; \theta _0)]$. With these results, the asymptotic normality of the maximum partial likelihood estimator $\hat{\theta }$ can be motivated in the same way as for parametric survival models (e.g. Kalbfleisch and Prentice 2002, p. 180). Briefly, assume a scalar $\theta $ for notational simplicity, and denote $U(\theta ) \equiv U(\tau ; \theta )$ and $J(\theta ) \equiv J(\tau ; \theta )$. From the martingale central limit theorem, it follows under the standard regularity conditions that

$$\begin{aligned} \frac{1}{\sqrt{n}} U(\theta _0) \mathop {\rightarrow }\limits ^{d} N\left( 0, \varSigma (\theta _0)\right) , \end{aligned}$$

where the matrix $\varSigma (\theta _0)$ is such that $\frac{1}{n} \langle U \rangle (\tau ; \theta _0) \mathop {\rightarrow }\limits ^{p} \varSigma (\theta _0)$. The Taylor expansion

$$\begin{aligned} U(\hat{\theta }) = U(\theta _0) - J(\theta _0)\left( \hat{\theta }- \theta _0\right) + \frac{1}{2} \frac{\partial ^3 l(\theta ^*)}{\partial \theta ^3} \left( \hat{\theta }- \theta _0\right) ^2 \end{aligned}$$

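where $l$ denotes the log partial likelihood and $\theta ^*$ lies between $\hat{\theta }$ and $\theta _0$, can be used to motivate both the consistency and asymptotic normality of $\hat{\theta }$ by assuming that the third term on the right-hand side is bounded in probability. In particular, since $U(\hat{\theta }) = 0$ at the maximum, we get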

$$\begin{aligned} \sqrt{n} (\hat{\theta }- \theta _0) \mathop {\rightarrow }\limits ^{d} N\left( 0, \varSigma (\theta _0)^{-1}\right) , \end{aligned}$$

where $\varSigma (\theta _0)$ is in practice estimated by the average observed information $\frac{1}{n} J(\hat{\theta })$ evaluated at the maximum partial likelihood estimate.
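To make the last step explicit, the following sketch solves the Taylor expansion for $\sqrt{n}(\hat{\theta } - \theta _0)$. It uses $U(\hat{\theta }) = 0$ and additionally assumes that $\frac{1}{n} J(\theta _0) \mathop {\rightarrow }\limits ^{p} \varSigma (\theta _0)$, which is suggested by $E[\langle U \rangle (t; \theta _0)] = E[J(t; \theta _0)]$ but is treated here as an assumption. The remainder $R_n$ is notation introduced only for this sketch.

$$\begin{aligned} \left[ \frac{1}{n} J(\theta _0) - R_n \right] \sqrt{n} \left( \hat{\theta }- \theta _0\right) = \frac{1}{\sqrt{n}} U(\theta _0), \qquad R_n \equiv \frac{1}{2n} \frac{\partial ^3 l(\theta ^*)}{\partial \theta ^3} \left( \hat{\theta }- \theta _0\right) . \end{aligned}$$

If $\frac{1}{n} \frac{\partial ^3 l(\theta ^*)}{\partial \theta ^3}$ is bounded in probability and $\hat{\theta }$ is consistent, then $R_n \mathop {\rightarrow }\limits ^{p} 0$, the bracketed term converges in probability to $\varSigma (\theta _0)$, and Slutsky's lemma gives

$$\begin{aligned} \sqrt{n} \left( \hat{\theta }- \theta _0\right) \mathop {\rightarrow }\limits ^{d} \varSigma (\theta _0)^{-1} N\left( 0, \varSigma (\theta _0)\right) = N\left( 0, \varSigma (\theta _0)^{-1} \varSigma (\theta _0) \varSigma (\theta _0)^{-1}\right) = N\left( 0, \varSigma (\theta _0)^{-1}\right) . \end{aligned}$$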


About this article

Cite this article

Saarela, O. A case-base sampling method for estimating recurrent event intensities. Lifetime Data Anal 22, 589–605 (2016). https://doi.org/10.1007/s10985-015-9352-x


Received: 23 March 2015
Accepted: 05 October 2015
Published: 14 October 2015
Issue Date: October 2016
DOI: https://doi.org/10.1007/s10985-015-9352-x
