Abstract
Case-base sampling provides an alternative to methods based on risk set sampling for estimating hazard regression models, in particular when absolute hazards are of interest in addition to hazard ratios. The case-base sampling approach results in a likelihood expression of the logistic regression form, but instead of categorizing time, the expression is obtained by sampling a discrete set of person-time coordinates from all follow-up data. In this paper, in the context of a time-dependent exposure such as vaccination and a potentially recurrent adverse event outcome, we show that the resulting partial likelihood for the outcome event intensity has the asymptotic properties of a likelihood. We contrast this approach with self-matched case-base sampling, which involves only within-individual comparisons. The efficiency of the case-base methods is compared with that of standard methods through simulations, which suggest that the information loss due to sampling is minimal.
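As a rough illustration of the logistic form mentioned above, the following sketch simulates recurrent events with a constant intensity, draws a base series of person-moments uniformly from all follow-up time, and fits a logistic regression with offset \(\log (B/b)\), where \(B\) is the total person-time and \(b\) the size of the base series. All settings (`n`, `tau`, `alpha`, `beta`, `b`), the fixed binary exposure, the fixed-size uniform base sampling and the helper `fit_logistic` are assumptions made for this example and are not taken from the paper, which considers time-dependent exposures.

```python
import math
import random

random.seed(1)

# Hypothetical simulation settings (assumptions for this sketch only).
n, tau = 2000, 1.0                           # individuals, follow-up length each
alpha, beta = math.log(0.5), math.log(2.0)   # true log baseline rate, log rate ratio
b = 4000                                     # number of base series person-moments

x = [i % 2 for i in range(n)]                # binary exposure, fixed over time here

# Case series: recurrent events from a homogeneous Poisson process per individual.
rows = []                                    # (y, x): y = 1 case moment, y = 0 base moment
for i in range(n):
    rate = math.exp(alpha + beta * x[i])
    t = random.expovariate(rate)
    while t < tau:
        rows.append((1, x[i]))
        t += random.expovariate(rate)

# Base series: person-moments drawn uniformly from all person-time B = n * tau.
B = n * tau
for _ in range(b):
    i = random.randrange(n)
    _t = random.uniform(0.0, tau)            # sampled time; unused since the rate is constant
    rows.append((0, x[i]))

offset = math.log(B / b)                     # log odds = log intensity + log(B / b)


def fit_logistic(rows, offset, iters=25):
    """Newton-Raphson for logit P(y = 1) = a + c * x + offset."""
    a, c = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = i00 = i01 = i11 = 0.0
        for y, xi in rows:
            p = 1.0 / (1.0 + math.exp(-(a + c * xi + offset)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * xi
            i00 += w
            i01 += w * xi
            i11 += w * xi * xi
        det = i00 * i11 - i01 * i01
        a += (i11 * g0 - i01 * g1) / det     # step = inverse information times score
        c += (i00 * g1 - i01 * g0) / det
    return a, c


alpha_hat, beta_hat = fit_logistic(rows, offset)

# With a single binary covariate the estimates have a closed form,
# which gives a simple check on the iterative fit.
c1 = sum(1 for y, xi in rows if y == 1 and xi == 1)
c0 = sum(1 for y, xi in rows if y == 1 and xi == 0)
b1 = sum(1 for y, xi in rows if y == 0 and xi == 1)
b0 = sum(1 for y, xi in rows if y == 0 and xi == 0)
beta_closed = math.log((c1 / b1) / (c0 / b0))
alpha_closed = math.log(c0 / b0) - offset

print("log rate ratio:", beta_hat, "log baseline rate:", alpha_hat)
```

Because of the offset, the intercept estimates the absolute log rate rather than only a nuisance constant, which is the feature that makes the approach attractive when absolute hazards are of interest.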
References
Aalen O, Borgan Ø, Gjessing HK (2008) Survival and event history analysis: a process point of view. Springer, Berlin
Arjas E, Haara P (1987) A logistic regression model for hazard: asymptotic results. Scand J Stat 14:1–18
Clayton D, Hills M (1993) Statistical models in epidemiology. Oxford University Press, Oxford
Cox DR (1975) Partial likelihood. Biometrika 62:269–276
Cox DR (2006) Principles of statistical inference. Cambridge University Press, Cambridge
Farrington CP (1995) Relative incidence estimation from case series for vaccine safety evaluation. Biometrics 51:228–235
Ghebremichael-Weldeselassie Y, Whitaker HJ, Farrington CP (2014) Self-controlled case series method with smooth age effect. Stat Med 33:639–649
Gill RD, Johansen S (1990) A survey of product-integration with a view toward application in survival analysis. Ann Stat 18:1501–1555
Hanley JA, Miettinen OS (2009) Fitting smooth-in-time prognostic risk functions via logistic regression. Int J Biostat. doi:10.2202/1557-4679.1125
Kalbfleisch JD, Prentice RL (2002) The statistical analysis of failure time data, 2nd edn. Wiley, New York
Langholz B, Goldstein L (2001) Conditional logistic analysis of case–control studies with complex sampling. Biostatistics 2:63–84
Mantel N (1973) Synthetic retrospective studies and related topics. Biometrics 29:479–486
Miettinen OS (2011) Epidemiological research: terms and concepts. Springer, Dordrecht
Saarela O, Arjas E (2015) Non-parametric Bayesian hazard regression for chronic disease risk assessment. Scand J Stat 42:609–626
Saarela O, Hanley JA (2015) Case-base methods for studying vaccination safety. Biometrics 71:42–52
Whitaker HJ, Hocine MN, Farrington CP (2009) The methodology of self-controlled case series studies. Stat Methods Med Res 18:7–26
Woolf B (1955) On estimating the relationship between blood group and disease. Ann Hum Genet 19:251–253
Acknowledgments
The author acknowledges the support of the Natural Sciences and Engineering Research Council (NSERC) of Canada, and thanks Prof. Elja Arjas (University of Helsinki) for helpful comments.
Appendix
Because \(M_i(t)\) and \(M_i^*(t)\) are orthogonal, the predictable variation process of the score process can be expressed as
The observed information process is given by
where we denoted \(\lambda _i''(u; \theta ) \equiv \frac{\partial ^2}{\partial \theta \partial \theta ^\top } \lambda _i(u; \theta )\).
Using the decompositions (4) and (5), the observed information process can be further written as
where we denoted
Therefore, \(E[\langle U \rangle (t; \theta _0)] = E[J(t; \theta _0)]\). With these results, the asymptotic normality of the maximum partial likelihood estimator \(\hat{\theta }\) can be motivated in the same way as for parametric survival models (e.g. Kalbfleisch and Prentice 2002, p. 180). Briefly, assume a scalar \(\theta \) for notational simplicity, and denote \(U(\theta ) \equiv U(\tau ; \theta )\) and \(J(\theta ) \equiv J(\tau ; \theta )\). From the martingale central limit theorem, it follows under the standard regularity conditions that
\[ \frac{1}{\sqrt{n}} U(\theta _0) \mathop {\rightarrow }\limits ^{d} N(0, \varSigma (\theta _0)), \]
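where the matrix \(\varSigma (\theta _0)\) is such that \(\frac{1}{n} \langle U \rangle (\tau ; \theta _0) \mathop {\rightarrow }\limits ^{p} \varSigma (\theta _0)\). The Taylor expansion of the score around \(\theta _0\), with \(\theta ^*\) lying between \(\theta _0\) and \(\hat{\theta }\),
\[ \frac{1}{n} U(\hat{\theta }) = \frac{1}{n} U(\theta _0) - \frac{1}{n} J(\theta _0) (\hat{\theta } - \theta _0) + \frac{1}{2n} \left. \frac{\partial ^2 U(\theta )}{\partial \theta ^2} \right| _{\theta = \theta ^*} (\hat{\theta } - \theta _0)^2 \]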
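can be used to motivate both the consistency and asymptotic normality of \(\hat{\theta }\) by assuming that the third term on the right hand side is bounded in probability. In particular, we get
\[ \sqrt{n} (\hat{\theta } - \theta _0) \mathop {\rightarrow }\limits ^{d} N(0, \varSigma (\theta _0)^{-1}), \]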
where \(\varSigma (\theta _0)\) is in practice estimated by the average observed information \(\frac{1}{n} J(\hat{\theta })\) at the maximum likelihood point.
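The practical content of this argument, that the inverse observed information at the maximum approximates the sampling variance of \(\hat{\theta }\), can be checked numerically in the simplest parametric survival setting. The sketch below is not the case-base partial likelihood itself: it assumes a constant hazard \(\exp (\theta )\) with administrative censoring at `tau`, for which \(U(\theta ) = D - \exp (\theta ) T\) and \(J(\theta ) = \exp (\theta ) T\), with \(D\) the number of events and \(T\) the total person-time. The values of `theta0`, `tau`, `n` and `reps` are hypothetical.

```python
import math
import random

random.seed(2)

theta0 = math.log(0.5)   # true log hazard (hypothetical value)
tau = 2.0                # administrative censoring time
n, reps = 400, 1000      # sample size per replicate, number of replicates

estimates, model_ses = [], []
for _ in range(reps):
    events, person_time = 0, 0.0
    for _ in range(n):
        t = random.expovariate(math.exp(theta0))
        if t < tau:
            events += 1
            person_time += t
        else:
            person_time += tau
    # Score U(theta) = D - exp(theta) * T vanishes at theta_hat = log(D / T).
    theta_hat = math.log(events / person_time)
    # Observed information J(theta) = exp(theta) * T, so J(theta_hat) = D.
    info = math.exp(theta_hat) * person_time
    estimates.append(theta_hat)
    model_ses.append(1.0 / math.sqrt(info))

mean_est = sum(estimates) / reps
empirical_sd = math.sqrt(sum((e - mean_est) ** 2 for e in estimates) / (reps - 1))
mean_model_se = sum(model_ses) / reps

print("empirical SD of theta_hat:", empirical_sd)
print("mean information-based SE:", mean_model_se)
```

If the approximation holds, the two printed quantities should be close, which is what justifies Wald-type intervals based on \(J(\hat{\theta })\).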
Cite this article
Saarela, O. A case-base sampling method for estimating recurrent event intensities. Lifetime Data Anal 22, 589–605 (2016). https://doi.org/10.1007/s10985-015-9352-x
DOI: https://doi.org/10.1007/s10985-015-9352-x