Penalised logistic regression and dynamic prediction for discrete-time recurrent event data

Elgmati, Entisar; Fiaccone, Rosemeire L.; Henderson, R.; Matthews, John N. S.

doi:10.1007/s10985-015-9321-4

Published: 28 January 2015

Lifetime Data Analysis, volume 21, pages 542–560 (2015)


Abstract

We consider methods for the analysis of discrete-time recurrent event data when interest is mainly in prediction. The Aalen additive model provides an extremely simple and effective method for determining covariate effects for this type of data, especially in the presence of time-varying effects and time-varying covariates, including dynamic summaries of prior event history. The method is weakened for predictive purposes by the presence of negative estimates. The obvious alternative of a standard logistic regression analysis at each time point can be unstable when event frequency is low and maximum likelihood estimation is used. The Firth penalised likelihood approach is stable but, in removing bias in regression coefficients, it introduces bias into predicted event probabilities. We propose an alternative modified penalised likelihood, intermediate between Firth and no penalty, as a pragmatic compromise between stability and bias. Illustration on two data sets is provided.

References

Aalen OO, Fosen J, Weedon-Fekjær H, Borgan Ø, Husebye E (2004) Dynamic analysis of multivariate failure time data. Biometrics 60:764–773
Albert A, Anderson JA (1984) On the existence of maximum likelihood estimates in logistic regression models. Biometrika 71:1–10
Anscombe FJ (1956) On estimating binomial response relations. Biometrika 43:461–464
Berkson J (1953) A statistically precise and relatively simple method of estimating the bioassay with quantal response, based on the logistic function. J Am Statist Assoc 48:565–599
Borgan Ø, Fiaccone RL, Henderson R, Barreto ML (2007) Dynamic analysis of recurrent event data with missing observations, with application to infant diarrhoea in Brazil. Scandinavian J Statist 34:53–69
Cox DR (1970) Analysis of binary data, 1st edn. Chapman and Hall, London
Diggle PJ, Heagerty PJ, Liang K-Y, Zeger S (2002) Analysis of longitudinal data, 2nd edn. Oxford University Press, Oxford
Ferro CAT, Stephenson DB (2012) Deterministic forecasts of extreme events and warnings. In: Jolliffe IT, Stephenson DB (eds) Forecast verification: a practitioner’s guide in atmospheric science, 2nd edn. Wiley, Chichester
Firth D (1993) Bias reduction of maximum likelihood estimates. Biometrika 80:27–38
Fosen J, Borgan Ø, Weedon-Fekjær H, Aalen OO (2006) Dynamic analysis of recurrent event data using the additive hazard model. Biometr J 48:381–398
Haldane JBS (1956) The estimation and significance of the logarithm of a ratio of frequencies. Ann Human Genet 20:309–311
Heinze G, Puhr R (2010) Bias-reduced and separation-proof conditional logistic regression with small or sparse data sets. Statist Med 29:770–777
Heinze G, Schemper M (2002) A solution to the problem of separation in logistic regression. Statist Med 21:2409–2419
Heinze G (2006) A comparative investigation of methods for logistic regression with separated or nearly separated data. Statist Med 25:4216–4226
Henderson R, Diggle PJ, Dobson A (2002) Identification and efficacy of longitudinal markers for survival. Biostatistics 3:33–50
Henderson R, Keiding N (2005a) Individual survival time prediction using statistical models. (Forudsigelse af individuelle levetider ved hjaelp af statistiske modeller). Danish Med J 167/10:1174–1177
Henderson R, Keiding N (2005b) Individual survival time prediction using statistical models. J Med Ethics 31:703–706
Jachan M, Feldwisch H, Posdziech F, Brandt A, Altenmüller D-M, Schulze-Bonhage A, Timmer J, Schelter B (2009) Probabilistic forecasts of epileptic seizures and evaluation by the Brier score. Fourth Eur Conf Int Federation Medi Biol Eng Proc 22:1701–1705
Martinussen T, Scheike TH (2006) Dynamic regression models for survival data. Springer, New York
Mehta CR, Patel NR (1995) Exact logistic regression: theory and examples. Statist Med 14:2143–2160
Proust-Lima C, Taylor JMG (2009) Development and validation of a dynamic prognostic tool for prostate cancer recurrence using repeated measures of posttreatment PSA: a joint modeling approach. Biostatistics 10:535–549
van Houwelingen H, Putter H (2011) Dynamic prediction in clinical survival analysis. Chapman and Hall/CRC Press, London

Acknowledgments

The research of Rosemeire Fiaccone was supported in part by the National Council of Technological and Scientific Development (CNPq), Brazil (Num. 237094/2012-6, 480614/2011-3). Robin Henderson benefited from participation in Deutsche Forschungsgemeinschaft research programme FR 3070/1-1. We are grateful for the comments of the reviewers.

Author information

Authors and Affiliations

Department of Statistics, Tripoli University, Tripoli, Libya
Entisar Elgmati
Department of Statistics, Universidade Federal da Bahia, Salvador, Brazil
Rosemeire L. Fiaccone
School of Mathematics and Statistics, Newcastle University, Newcastle, UK
R. Henderson & John N. S. Matthews

Corresponding author

Correspondence to R. Henderson.

Appendix: logistic regression with separation not detected in R

In the following, y is a vector of length 100, with all elements zero except the first, which is one, and x1 is a vector of 50 zeros followed by 50 ones, representing two equally sized groups. If we attempt to fit the logistic regression

$$\begin{aligned} \pi_x = P\big(Y=1 \,\big|\, x\big) = \mathrm{expit}\big(\beta_0+\beta_1 x\big) \end{aligned}$$

then clearly a perfect fit is obtained at $\hat{\beta}_0=\mathrm{logit}(1/50)=-3.892$ and $\hat{\beta}_1=-\infty$. Some R (version 3.1.2) output, edited to remove unnecessary material (marked by [...]), is:

Of most concern is the statement of convergence, which is true because the maximised likelihood has indeed converged: moving either of the coefficients away from their current values leads to no improvement. The fitted probabilities $\hat{\pi }_0$ and $\hat{\pi }_1$ are accurate but clearly $\hat{\beta }_1$ is unrealistic. Uncritical assessment of the results might lead to this problem being missed.
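The behaviour described above can be reproduced outside R. The following is a minimal Python sketch, not the R session from the appendix: it runs plain Newton-Raphson on the unpenalised log-likelihood for the same data and stops when the relative change in deviance is small. The function names, the starting values of zero, and the stopping settings `max_iter=25` and `eps=1e-8` are assumptions of this sketch, chosen to resemble common GLM fitting defaults.

```python
import math

# Data exactly as described in the appendix: one event among 100 observations,
# and a binary covariate splitting them into two groups of 50.
y = [1] + [0] * 99
x = [0] * 50 + [1] * 50


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def deviance(b0, b1):
    """Minus twice the Bernoulli log-likelihood at (b0, b1)."""
    loglik = 0.0
    for yi, xi in zip(y, x):
        eta = b0 + b1 * xi
        loglik += yi * eta - math.log1p(math.exp(eta))
    return -2.0 * loglik


def newton_step(b0, b1):
    """One Newton-Raphson update of the unpenalised log-likelihood."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for yi, xi in zip(y, x):
        p = expit(b0 + b1 * xi)
        w = p * (1.0 - p)
        g0 += yi - p              # score for the intercept
        g1 += (yi - p) * xi       # score for the slope
        h00 += w                  # Fisher information entries
        h01 += w * xi
        h11 += w * xi * xi
    det = h00 * h11 - h01 * h01
    return (b0 + (h11 * g0 - h01 * g1) / det,
            b1 + (h00 * g1 - h01 * g0) / det)


def fit_ml(max_iter=25, eps=1e-8):
    """Iterate until the relative change in deviance falls below eps.

    max_iter and eps are assumptions of this sketch, not values
    taken from the source.
    """
    b0, b1 = 0.0, 0.0
    dev = deviance(b0, b1)
    for iteration in range(1, max_iter + 1):
        b0, b1 = newton_step(b0, b1)
        new_dev = deviance(b0, b1)
        converged = abs(new_dev - dev) / (abs(new_dev) + 0.1) < eps
        dev = new_dev
        if converged:
            return b0, b1, iteration, True
    return b0, b1, max_iter, False


b0_hat, b1_hat, n_iter, converged = fit_ml()
print("converged:", converged, "after", n_iter, "iterations")
print("beta0 = %.3f, beta1 = %.3f" % (b0_hat, b1_hat))
print("fitted pi_0 = %.4f, pi_1 = %.2e" % (expit(b0_hat), expit(b0_hat + b1_hat)))
```

Under these assumptions the deviance-based rule reports convergence while the slope is still drifting towards minus infinity by roughly one unit per iteration: the intercept settles at logit(1/50), the fitted probability in the second group is numerically zero, and the reported slope is simply wherever the iteration happened to stop.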

If we use the Firth correction as implemented in Kosmidis’ bias reduction package brglm, we obtain:

Hence the coefficients are stabilised, at the expense of higher values of $\hat{\pi}_0$ and $\hat{\pi}_1$, as expected. Heinze's package logistf gives the same results.
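For this particular two-group design the effect of the Firth penalty can be checked by hand. The model is saturated, so the penalised log-likelihood separates by group into $r\log p + (n-r)\log(1-p) + \tfrac{1}{2}\log\{p(1-p)\}$, which is maximised at $p=(r+\tfrac{1}{2})/(n+1)$. The Python sketch below uses that closed form; it is specific to this example and is not how brglm or logistf compute estimates in general. The weight `tau` is a hypothetical device of this sketch, included only to show a penalty between none (`tau = 0`) and Firth (`tau = 0.5`); the form of the authors' modified penalty is not given in the text above.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def group_probability(events, n, tau):
    """Maximiser of events*log(p) + (n-events)*log(1-p) + tau*log(p*(1-p)).

    tau = 0 gives maximum likelihood and tau = 0.5 gives the Firth
    (Jeffreys-prior) penalty for this saturated two-group model; other
    values of tau are a hypothetical device of this sketch.
    """
    return (events + tau) / (n + 2.0 * tau)


def fit_two_groups(tau):
    # Appendix data: 1 event in the 50 observations with x = 0,
    # 0 events in the 50 observations with x = 1.
    pi0 = group_probability(1, 50, tau)
    pi1 = group_probability(0, 50, tau)
    beta0 = logit(pi0)
    # With no penalty pi1 is exactly zero, so the slope is minus infinity.
    beta1 = logit(pi1) - beta0 if pi1 > 0.0 else -math.inf
    return beta0, beta1, pi0, pi1


for tau in (0.0, 0.25, 0.5):
    beta0, beta1, pi0, pi1 = fit_two_groups(tau)
    print("tau=%.2f  beta0=%.3f  beta1=%.3f  pi0=%.4f  pi1=%.4f"
          % (tau, beta0, beta1, pi0, pi1))
```

With `tau = 0.5` the fitted probabilities are 1.5/51 and 0.5/51, both larger than the maximum likelihood values 1/50 and 0, while the slope is finite. This is the trade-off the appendix describes: stable coefficients in exchange for upward-shifted event probabilities.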

About this article

Cite this article

Elgmati, E., Fiaccone, R.L., Henderson, R. et al. Penalised logistic regression and dynamic prediction for discrete-time recurrent event data. Lifetime Data Anal 21, 542–560 (2015). https://doi.org/10.1007/s10985-015-9321-4

Received: 18 August 2014
Accepted: 07 January 2015
Issue Date: October 2015
