Penalised logistic regression and dynamic prediction for discrete-time recurrent event data

Lifetime Data Analysis

Abstract

We consider methods for the analysis of discrete-time recurrent event data, when interest is mainly in prediction. The Aalen additive model provides an extremely simple and effective method for the determination of covariate effects for this type of data, especially in the presence of time-varying effects and time-varying covariates, including dynamic summaries of prior event history. The method is weakened for predictive purposes by the presence of negative estimates. The obvious alternative of a standard logistic regression analysis at each time point can have problems of stability when event frequency is low and maximum likelihood estimation is used. The Firth penalised likelihood approach is stable but in removing bias in regression coefficients it introduces bias into predicted event probabilities. We propose an alternative modified penalised likelihood, intermediate between Firth and no penalty, as a pragmatic compromise between stability and bias. Illustration on two data sets is provided.

References

  • Aalen OO, Fosen J, Weedon-Fekjær H, Borgan Ø, Husebye E (2004) Dynamic analysis of multivariate failure time data. Biometrics 60:764–773

  • Albert A, Anderson JA (1984) On the existence of maximum likelihood estimates in logistic regression models. Biometrika 71:1–10

  • Anscombe FJ (1956) On estimating binomial response relations. Biometrika 43:461–464

  • Berkson J (1953) A statistically precise and relatively simple method of estimating the bioassay with quantal response, based on the logistic function. J Am Statist Assoc 48:565–599

  • Borgan Ø, Fiaccone RL, Henderson R, Barreto ML (2007) Dynamic analysis of recurrent event data with missing observations, with application to infant diarrhoea in Brazil. Scandinavian J Statist 34:53–69

  • Cox DR (1970) Analysis of binary data, 1st edn. Chapman and Hall, London

  • Diggle PJ, Heagerty PJ, Liang K-Y, Zeger S (2002) Analysis of longitudinal data, 2nd edn. Oxford University Press, Oxford

  • Ferro CAT, Stephenson DB (2012) Deterministic forecasts of extreme events and warnings. In: Jolliffe IT, Stephenson DB (eds) Forecast verification: a practitioner’s guide in atmospheric science, 2nd edn. Wiley, Chichester

  • Firth D (1993) Bias reduction of maximum likelihood estimates. Biometrika 80:27–38

  • Fosen J, Borgan Ø, Weedon-Fekjær H, Aalen OO (2006) Dynamic analysis of recurrent event data using the additive hazard model. Biometr J 48:381–398

  • Haldane JBS (1956) The estimation and significance of the logarithm of a ratio of frequencies. Ann Human Genet 20:309–311

  • Heinze G, Puhr R (2010) Bias-reduced and separation-proof conditional logistic regression with small or sparse data sets. Statist Med 29:770–777

  • Heinze G, Schemper M (2002) A solution to the problem of separation in logistic regression. Statist Med 21:2409–2419

  • Heinze G (2006) A comparative investigation of methods for logistic regression with separated or nearly separated data. Statist Med 25:4216–4226

  • Henderson R, Diggle PJ, Dobson A (2002) Identification and efficacy of longitudinal markers for survival. Biostatistics 3:33–50

  • Henderson R, Keiding N (2005a) Individual survival time prediction using statistical models. (Forudsigelse af individuelle levetider ved hjaelp af statistiske modeller). Danish Med J 167/10:1174–1177

  • Henderson R, Keiding N (2005b) Individual survival time prediction using statistical models. J Med Ethics 31:703–706

  • Jachan M, Feldwisch H, Posdziech F, Brandt A, Altenmüller D-M, Schulze-Bonhage A, Timmer J, Schelter B (2009) Probabilistic forecasts of epileptic seizures and evaluation by the Brier score. Fourth Eur Conf Int Federation Medi Biol Eng Proc 22:1701–1705

  • Martinussen T, Scheike TH (2006) Dynamic regression models for survival data. Springer, New York

  • Mehta CR, Patel NR (1995) Exact logistic regression: theory and examples. Statist Med 14:2143–2160

  • Proust-Lima C, Taylor JMG (2009) Development and validation of a dynamic prognostic tool for prostate cancer recurrence using repeated measures of posttreatment PSA: a joint modeling approach. Biostatistics 10:535–549

  • van Houwelingen H, Putter H (2011) Dynamic prediction in clinical survival analysis. Chapman and Hall/CRC Press, London

Acknowledgments

The research of Rosemeire Fiaccone was supported in part by National Council of Technological and Scientific Development—CNPq, Brazil (Num. 237094/2012-6, 480614/2011-3). Robin Henderson benefited from participation in Deutsche Forschungsgemeinschaft research programme FR 3070/1-1. We are grateful for the comments of the reviewers.

Author information

Corresponding author

Correspondence to R. Henderson.

Appendix: logistic regression with separation not detected in R

In the following, y is a vector of length 100, with all elements zero except the first, which is one, and x1 is a vector of 50 zeros followed by 50 ones, representing two equally sized groups. If we attempt to fit the logistic regression

$$\pi_x = P(Y=1 \mid x) = \mathrm{expit}(\beta_0+\beta_1 x)$$

then clearly a perfect fit is obtained at \(\hat{\beta}_0=\mathrm{logit}(1/50)=-3.892\) and \(\hat{\beta}_1=-\infty\). Some R (version 3.1.2) output, edited to remove unnecessary material (marked by [...]), is:

figure a

Of most concern is the statement of convergence, which is true because the maximised likelihood has indeed converged: moving either of the coefficients away from their current values leads to no improvement. The fitted probabilities \(\hat{\pi }_0\) and \(\hat{\pi }_1\) are accurate but clearly \(\hat{\beta }_1\) is unrealistic. Uncritical assessment of the results might lead to this problem being missed.
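The divergence of the slope can be imitated outside R. The following is a minimal Python sketch, not the authors' R session and not a reproduction of the output in figure a. It applies a fixed number of Newton-Raphson steps to the data described above; the function name newton_logistic and the iteration counts are illustrative assumptions.

```python
import math

def expit(z):
    """Inverse logit, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Data from the appendix: 100 observations, one event, in the x = 0 group.
y = [1] + [0] * 99
x = [0] * 50 + [1] * 50

def newton_logistic(x, y, n_iter):
    """Run n_iter Newton-Raphson steps for logit P(Y=1|x) = b0 + b1*x.

    Hypothetical helper for illustration; starts from b0 = b1 = 0.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        # Score vector and 2x2 Fisher information at the current estimate.
        s0 = s1 = i00 = i01 = i11 = 0.0
        for xi, yi in zip(x, y):
            p = expit(b0 + b1 * xi)
            w = p * (1.0 - p)
            s0 += yi - p
            s1 += xi * (yi - p)
            i00 += w
            i01 += xi * w
            i11 += xi * xi * w
        det = i00 * i11 - i01 * i01
        # Newton step: beta <- beta + I^{-1} * score.
        b0 += (i11 * s0 - i01 * s1) / det
        b1 += (i00 * s1 - i01 * s0) / det
    return b0, b1

for n_iter in (5, 10, 25):
    b0, b1 = newton_logistic(x, y, n_iter)
    # Columns: steps, b0, b1, fitted P(Y=1|x=0), fitted P(Y=1|x=1).
    print(n_iter, round(b0, 3), round(b1, 3), expit(b0), expit(b0 + b1))
```

In this sketch the fitted probabilities settle near 1/50 and 0 after a few steps, while the slope keeps decreasing by roughly one unit per step. The value at which an iterative fit reports the slope therefore depends on where its stopping rule happens to halt.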

If we use the Firth correction as implemented in Kosmidis’ bias reduction package brglm, we obtain:

figure b

Hence the coefficients are stabilised, at the expense of higher values of \(\hat{\pi}_0\) and \(\hat{\pi}_1\), as expected. Heinze’s package logistf gives the same results.
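For this two-group example the effect of the penalty can be worked out by hand, which the Python sketch below illustrates; it is not the brglm or logistf code and does not reproduce figure b. With a single binary covariate the model is saturated, so \(\log |I(\beta)|\) equals a constant plus \(\sum_g \log\{n_g \pi_g(1-\pi_g)\}\), and maximising the log-likelihood plus \(\tau \log |I(\beta)|\) gives \(\hat{\pi}_g = (y_g+\tau)/(n_g+2\tau)\). Taking \(\tau = 1/2\) is the Firth penalty and \(\tau = 0\) is maximum likelihood. The parameter tau and the function name are assumptions made for illustration; intermediate values of tau only indicate what a penalty between the two extremes does and are not claimed to be the modified penalty proposed in the paper.

```python
import math

def logit(p):
    """Log-odds, returning -inf at p = 0 (the separated ML case)."""
    if p <= 0.0:
        return float("-inf")
    return math.log(p / (1.0 - p))

def penalised_two_group_fit(events, sizes, tau):
    """Closed-form maximiser of loglik + tau * log|I(beta)| for two groups.

    Hypothetical helper: events and sizes are [group 0, group 1] counts.
    tau = 0 gives maximum likelihood, tau = 0.5 gives the Firth estimate.
    """
    pi = [(e + tau) / (n + 2.0 * tau) for e, n in zip(events, sizes)]
    beta0 = logit(pi[0])
    beta1 = logit(pi[1]) - beta0
    return beta0, beta1, pi

# Appendix data: one event among 50 with x = 0, none among 50 with x = 1.
for tau in (0.0, 0.1, 0.5):
    beta0, beta1, pi = penalised_two_group_fit([1, 0], [50, 50], tau)
    print(tau, beta0, beta1, pi)
```

Under this derivation the Firth fit has fitted probabilities 1.5/51 and 0.5/51, both above the maximum likelihood values 1/50 and 0, and a finite slope; smaller values of tau pull the probabilities back towards the maximum likelihood values while keeping the slope finite.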

About this article

Cite this article

Elgmati, E., Fiaccone, R.L., Henderson, R. et al. Penalised logistic regression and dynamic prediction for discrete-time recurrent event data. Lifetime Data Anal 21, 542–560 (2015). https://doi.org/10.1007/s10985-015-9321-4

  • DOI: https://doi.org/10.1007/s10985-015-9321-4
