The Wally plot approach to assess the calibration of clinical prediction models

Lifetime Data Analysis

Abstract

A prediction model is calibrated if, roughly, for any percentage x we can expect that x subjects out of 100 experience the event among all subjects that have a predicted risk of x%. Typically, the calibration assumption is assessed graphically, but in practice it is often challenging to judge whether a “disappointing” calibration plot is the consequence of a departure from the calibration assumption or just “bad luck” due to sampling variability. To address this issue, we propose a graphical approach that visualizes how well a calibration plot agrees with the calibration assumption. The approach is based mainly on generating new plots that mimic the available data under the calibration assumption. The method handles the common non-trivial situations in which the data contain censored observations and occurrences of competing events, by building on ideas from constrained non-parametric maximum likelihood estimation. Two examples from large cohort data illustrate our proposal. The ‘wally’ R package is provided to make the methodology easy to use.
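
The idea of generating plots under the calibration assumption can be illustrated in the simplest setting, which is not the one treated in the article: a binary outcome with no censoring and no competing events. The Python sketch below is only an illustration under that assumption and is unrelated to the ‘wally’ R package. The function names, the choice of equal-sized risk groups, and the idea of hiding the observed panel at a random position among the simulated ones are all hypothetical choices made for this example.

```python
import random


def binned_calibration(risks, outcomes, n_bins=5):
    """Return (mean predicted risk, observed event frequency) per risk group."""
    if len(risks) < n_bins:
        raise ValueError("need at least one subject per risk group")
    pairs = sorted(zip(risks, outcomes))
    size = len(pairs) // n_bins
    rows = []
    for b in range(n_bins):
        # The last group absorbs the remainder when n is not a multiple of n_bins.
        chunk = pairs[b * size:(b + 1) * size] if b < n_bins - 1 else pairs[b * size:]
        mean_risk = sum(r for r, _ in chunk) / len(chunk)
        observed = sum(y for _, y in chunk) / len(chunk)
        rows.append((mean_risk, observed))
    return rows


def simulate_under_calibration(risks, rng):
    """Draw one outcome per subject so that P(event) equals the predicted risk."""
    return [1 if rng.random() < r else 0 for r in risks]


def wally_panels(risks, outcomes, n_sim=8, n_bins=5, seed=1):
    """Build n_sim panels simulated under calibration plus the observed panel.

    Returns the list of panels and the position of the observed one.
    """
    rng = random.Random(seed)
    panels = [
        binned_calibration(risks, simulate_under_calibration(risks, rng), n_bins)
        for _ in range(n_sim)
    ]
    position = rng.randrange(n_sim + 1)
    panels.insert(position, binned_calibration(risks, outcomes, n_bins))
    return panels, position
```

Each panel is a list of points that could be drawn as one calibration plot. If the observed panel cannot be told apart from the simulated ones, the data give no visible evidence against the calibration assumption.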

References

  • Aalen OO, Johansen S (1978) An empirical transition matrix for non-homogeneous Markov chains based on censored observations. Scand J Stat 5:141–150

  • Andersen PK, Borgan Ø, Gill RD, Keiding N (1993) Statistical models based on counting processes. Springer, New York

  • Austin PC, Steyerberg EW (2014) Graphical assessment of internal and external calibration of logistic regression models by using loess smoothers. Stat Med 33(3):517–535

  • Barber S, Jennison C (1999) Symmetric tests and confidence intervals for survival probabilities and quantiles of censored survival data. Biometrics 55(2):430–436

  • Beyersmann J, Allignol A, Schumacher M (2011) Competing risks and multistate models with R. Springer Science & Business Media, Berlin

  • Blanche P (2017) Confidence intervals for the cumulative incidence function via constrained NPMLE. https://ifsv.sund.ku.dk/biostat/biostat_annualreport/index.php5/Research_reports

  • Blanche P, Proust-Lima C, Loubère L, Berr C, Dartigues J-F, Jacqmin-Gadda H (2015) Quantifying and comparing dynamic predictive accuracy of joint models for longitudinal marker and time-to-event in presence of censoring and competing risks. Biometrics 71(1):102–113

  • Bröcker J, Smith LA (2007) Increasing the reliability of reliability diagrams. Weather Forecast 22(3):651–661

  • Buja A, Cook D, Hofmann H, Lawrence M, Lee E-K, Swayne DF, Wickham H (2009) Statistical inference for exploratory data analysis and model diagnostics. Philos Trans R Soc Lond A Math Phys Eng Sci 367(1906):4361–4383

  • Camm A et al (2010) Guidelines for the management of atrial fibrillation: the task force for the management of atrial fibrillation of the European Society of Cardiology (ESC). Eur Heart J 31:2369–2429

  • Crowson CS, Atkinson EJ, Therneau TM (2016) Assessing calibration of prognostic risk scores. Stat Methods Med Res 25:1692–1706

  • Demler OV, Paynter NP, Cook NR (2015) Tests of calibration and goodness-of-fit in the survival setting. Stat Med 34(10):1659–1680

  • Efron B (1981) Censored data and the bootstrap. J Am Stat Assoc 76(374):312–319

  • Ekstrøm CT (2013) Teaching ’instant experience’ with graphical model validation techniques. Teach Stat 36(1):23–26

  • Fournier M-C, Foucher Y, Blanche P, Buron F, Giral M, Dantan E (2016) A joint model for longitudinal and time-to-event data to better assess the specific role of donor and recipient factors on long-term kidney transplantation outcomes. Eur J Epidemiol 31(5):469–479

  • Freedman AN, Seminara D, Gail MH, Hartge P, Colditz GA, Ballard-Barbash R, Pfeiffer RM (2005) Cancer risk prediction models: a workshop on development, evaluation, and application. J Natl Cancer Inst 97(10):715–723

  • Gail MH, Pfeiffer RM (2005) On criteria for evaluating models of absolute risk. Biostatistics 6(2):227–239

  • Gerds TA, Cai T, Schumacher M (2008) The performance of risk prediction models. Biometr J 50(4):457–479

  • Gerds TA, Andersen PK, Kattan MW (2014) Calibration plots for risk prediction models in the presence of competing risks. Stat Med 33(18):3191–3203

  • Geskus RB (2015) Data analysis with competing risks and intermediate states, vol 82. CRC Press, Boca Raton

  • Handford M (2007) Where is Wally?. Walker Books Ltd, London

  • Kaplan E, Meier P (1958) Nonparametric estimation from incomplete observations. J Am Stat Assoc 53(282):457–481

  • Lemeshow S, Hosmer DW (1982) A review of goodness of fit statistics for use in the development of logistic regression models. Am J Epidemiol 115(1):92–106

  • Li G, Sun Y (2000) A simulation-based goodness-of-fit test for survival data. Stat Probab Lett 47(4):403–410

  • Lin DY, Wei L-J, Ying Z (1993) Checking the Cox model with cumulative sums of martingale-based residuals. Biometrika 80(3):557–572

  • Loy A, Follett L, Hofmann H (2016) Variations of Q–Q plots: the power of our eyes!. Am Stat 70(2):202–214

  • Majumder M, Hofmann H, Cook D (2013) Validation of visual statistical inference, applied to linear models. J Am Stat Assoc 108(503):942–956

  • Martinussen T, Scheike T (2006) Dynamic regression models for survival data. Springer, Berlin

  • Pepe M, Janes H (2013) Methods for evaluating prediction performance of biomarkers and tests. In: Lee M-L, Gail G, Cai T, Pfeiffer R, Gandy A (eds) Risk assessment and evaluation of predictions. Springer, Berlin

  • Pepe MS, Feng Z, Huang Y, Longton G, Prentice R, Thompson IM, Zheng Y (2008) Integrating the predictiveness of a marker with its performance as a classifier. Am J Epidemiol 167(3):362–368

  • R Core Team (2017) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna

  • Robins J, Ritov Y et al (1997) Toward a curse of dimensionality appropriate asymptotic theory for semi-parametric models. Stat Med 16(3):285–319

  • Steyerberg E (2009) Clinical prediction models: a practical approach to development, validation, and updating. Springer, Berlin

  • Steyerberg EW, Vickers AJ, Cook NR, Gerds T, Gonen M, Obuchowski N, Pencina MJ, Kattan MW (2010) Assessing the performance of prediction models: a framework for some traditional and novel measures. Epidemiology 21(1):128

  • Thomas DR, Grunkemeier GL (1975) Confidence interval estimation of survival probabilities for censored data. J Am Stat Assoc 70(352):865–871

  • Tukey J (1972) Some graphic and semigraphic displays. In: Bancroft T (ed) Statistical papers in honor of George W. Snedecor. Iowa State University, Ames, Iowa, p 293–316

  • Viallon V, Benichou J, Clavel-Chapelon F, Ragusa S (2009) How to evaluate the calibration of a disease risk prediction tool. Stat Med 28:901–916

  • Vickers A, Cronin A (2010) Everything you always wanted to know about evaluating prediction models (but were too afraid to ask). Urology 76(6):1298–1301

Acknowledgements

PB is grateful to the Bettencourt Schueller foundation for its support. We thank the DIVAT consortium and the Three-City study group for providing the data of the DIVAT and Three-City cohorts. Their sources of support are listed at www.divat.fr and www.three-city-study.com.

Author information

Corresponding author

Correspondence to Paul Blanche.

Appendices

A Constrained Kaplan–Meier estimates

Let us assume that we observe the data \(\big \{\big (\widetilde{T}_i,\varDelta _i\big ),i=1,\ldots ,n\big \}\), where \(\widetilde{T}_i=\min (T_i,C_i)\) and \(\varDelta _i={\mathbb {1}}\{ T_i \le C_i \}\). The constrained Kaplan–Meier estimate \(\widehat{S}(u)\) of \(S(u)=\mathbb {P}(T>u)\), with constraint \(\widehat{S}(t)=p\), for a given value \(p \in ]0,1[\), is defined as

$$\begin{aligned} \widehat{S}(u)&= \prod _{i \, : \, \widetilde{T}_i \le u} \left( 1 - \frac{d_i}{n_i + \lambda }\right) \quad \text{ for } \text{ all } \quad u \le t, \\ \text{ and } \quad \widehat{S}(u)&= \left\{ \prod _{i: \, \widetilde{T}_i \le t} \left( 1 - \frac{d_i}{n_i + \lambda }\right) \right\} \left\{ \prod _{i \, : \, t< \widetilde{T}_i \le u } \left( 1 - \frac{d_i}{n_i }\right) \right\} \quad \text{ for } \text{ all } \quad t < u, \end{aligned}$$

where \(d_i=\sum _{j=1}^n {\mathbb {1}}\{ \widetilde{T}_j = \widetilde{T}_i \}\varDelta _j\) is the number of observed events at time \(\widetilde{T}_i\), \(n_i=\sum _{j=1}^n {\mathbb {1}}\{ \widetilde{T}_j \ge \widetilde{T}_i \}\) is the number of subjects at risk at time \(\widetilde{T}_i\), and \(\lambda \in \mathbb {R}\) is chosen such that \(\widehat{S}(t)=p\) (Thomas and Grunkemeier 1975). The usual Kaplan–Meier estimator corresponds to the above formulas in the special case where \(\lambda =0\).
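
The constrained estimate of this appendix can be sketched numerically. The Python code below is a minimal illustration, not the implementation used in the article or in the ‘wally’ package. It assumes that the product runs over distinct observed event times and finds \(\lambda\) by bisection, using the fact that each factor \(1 - d_i/(n_i+\lambda)\) increases with \(\lambda\) on the admissible range. The function name `constrained_km` and its arguments are hypothetical.

```python
def constrained_km(times, events, t, p, n_iter=200):
    """Kaplan-Meier estimate constrained to equal p at time t.

    times  : observed times (minimum of event and censoring time)
    events : 1 if the event was observed, 0 if censored
    Returns (lam, surv) where surv(u) evaluates the constrained estimate.
    """
    # Distinct event times with number of events d and number at risk n.
    event_times = sorted({u for u, e in zip(times, events) if e == 1})
    table = [
        (u,
         sum(1 for v, e in zip(times, events) if v == u and e == 1),
         sum(1 for v in times if v >= u))
        for u in event_times
    ]
    early = [(d, n) for u, d, n in table if u <= t]
    if not early or not 0.0 < p < 1.0:
        raise ValueError("need an event time <= t and 0 < p < 1")

    def surv_at_t(lam):
        prod = 1.0
        for d, n in early:
            prod *= 1.0 - d / (n + lam)
        return prod

    # At lam = max(d - n) one factor is zero, so the estimate at t is 0 there;
    # it tends to 1 as lam grows, so the root lies in between.
    lo = max(d - n for d, n in early)
    hi = lo + 1.0
    while surv_at_t(hi) < p:
        hi = lo + 2.0 * (hi - lo)
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if surv_at_t(mid) < p:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)

    def surv(u):
        prod = 1.0
        for v, d, n in table:
            if v <= u:
                # lam only modifies the factors at event times up to t.
                prod *= 1.0 - d / (n + (lam if v <= t else 0.0))
        return prod

    return lam, surv
```

For instance, with `times = [1, 2, 3, 4, 5, 6]`, `events = [1, 1, 0, 1, 0, 1]` and `t = 4.5`, the usual Kaplan–Meier estimate at `t` is 4/9, and constraining it to `p = 4/9` returns a value of `lam` that is numerically zero.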

B Constrained Aalen–Johansen estimates

Let us assume that we observe the data \(\big \{\big (\widetilde{T}_i,\widetilde{\eta }_i\big ),i=1,\ldots ,n\big \}\) where \(\widetilde{\eta }_i=\varDelta _i\eta _i\) and \(\eta _i \in \{1,2\}\) denotes the type of event. For the sake of clarity, we further assume that there are no ties in the sample \(\big \{\widetilde{T}_i, i=1,\ldots ,n\big \}\). Therefore, without loss of generality, for the formulas below we assume that we observe \(0< \widetilde{T}_1< \cdots < \widetilde{T}_n\). In particular, this implies \(n_i=\sum _{j=1}^n {\mathbb {1}}\{ \widetilde{T}_j \ge \widetilde{T}_i \}=n-(i-1)\) for all \(i=1,\ldots ,n\).

The constrained Aalen–Johansen estimates \(\widehat{F}_{k}^{(1)}(u)\) of the cumulative incidence functions of event \(k=1,2\) at time u, that is \(F_{k}(u)=\mathbb {P}(T \le u, \eta =k)\), with constraint \(\widehat{F}_{1}^{(1)}(t)=p\), for a given \(p \in ]0,1[\), are defined as

$$\begin{aligned} \widehat{F}_{k}^{(1)}(u)&= \sum _{i \, : \, \widetilde{T}_i \le u}\left\{ \prod _{j=1}^{i-1} \left( 1 - \widehat{a}_{1,j}^{(1)} - \widehat{a}_{2,j}^{(1)} \right) \widehat{a}_{k,i}^{(1)} \right\} \end{aligned}$$

where

$$\begin{aligned} \widehat{a}_{1,i}^{(1)}&= \frac{ {\mathbb {1}}\{ \widetilde{\eta }_i=1 \} }{ n - (i-1) - \lambda \left\{ 1 - \widehat{F}_{2}^{(1)}(\widetilde{T}_{i-1}) - p \right\} } \quad \text{ if } \quad i \quad \text{ is } \text{ such } \text{ that } \quad \widetilde{T}_{i} \le t, \\ \text{ and } \qquad \widehat{a}_{1,i}^{(1)}&= \frac{ {\mathbb {1}}\{ \widetilde{\eta }_i=1 \} }{ n - (i-1) } \qquad \qquad \qquad \qquad \qquad \qquad \ \text{ if } \quad i \quad \text{ is } \text{ such } \text{ that } \quad \widetilde{T}_{i} > t, \nonumber \end{aligned}$$

and

$$\begin{aligned} \widehat{a}_{2,i}^{(1)}&= \frac{ {\mathbb {1}}\{ \widetilde{\eta }_i=2 \} }{ n - (i-1) - \lambda \left\{ \widehat{F}_{1}^{(1)}(\widetilde{T}_{i-1}) - p \right\} } \quad \text{ if } \quad i \quad \text{ is } \text{ such } \text{ that } \quad \widetilde{T}_{i} \le t, \\ \text{ and } \qquad \widehat{a}_{2,i}^{(1)}&= \frac{ {\mathbb {1}}\{ \widetilde{\eta }_i=2 \} }{ n - (i-1) } \qquad \qquad \qquad \qquad \qquad \ \ \text{ if } \quad i \quad \text{ is } \text{ such } \text{ that } \quad \widetilde{T}_{i} > t. \nonumber \end{aligned}$$

In the above equations, we define \(\widetilde{T}_{0}=0\), \(\widehat{F}_{1}^{(1)}(0)=\widehat{F}_{2}^{(1)}(0)=0\) and \(\lambda \in \mathbb {R}\) such that \(\widehat{F}_{1}^{(1)}(t)=p\). The superscript \(^{(1)}\) refers to the fact that the constraint relates to the cumulative incidence function of event 1. Following ideas similar to those used by Thomas and Grunkemeier (1975) to derive the formulas of “Appendix A”, these formulas were derived by maximizing the non-parametric likelihood \(L= \prod _{i=1}^n a_{1,i}^{{\mathbb {1}}\{ \widetilde{\eta }_i=1 \}} a_{2,i}^{{\mathbb {1}}\{ \widetilde{\eta }_i=2 \}} \big ( 1 - a_{1,i} - a_{2,i} \big )^{n-i}\), under the constraint \(\widehat{F}_{1}^{(1)}(t)=p\), using the Lagrange multiplier technique. Extensions of the above formulas can also be derived to account for ties in \(\big \{\widetilde{T}_i, i=1,\ldots ,n\big \}\) (Blanche 2017). The usual Aalen–Johansen estimator corresponds to the above formulas in the special case where \(\lambda =0\).
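
The recursion above can be written as a single pass over the ordered observations, since each \(\widehat{a}_{k,i}^{(1)}\) only depends on the estimates at \(\widetilde{T}_{i-1}\). The Python sketch below is an illustration under the no-ties assumption of this appendix, not the implementation of the ‘wally’ package. The function names are hypothetical, and the search for \(\lambda\) is a plain bisection on a bracket that the user must supply and on which all denominators are assumed to stay positive.

```python
def constrained_aj(times, eta, t, p, lam):
    """Evaluate the formulas of this appendix for a given value of lam.

    times : observed times, assumed distinct (no ties)
    eta   : 0 if censored, 1 or 2 for the observed event type
    Returns two step functions (F1, F2); lam = 0 gives the usual estimator.
    """
    order = sorted(range(len(times)), key=lambda i: times[i])
    n = len(times)
    event_free, F1, F2 = 1.0, 0.0, 0.0
    steps = []  # (time, F1, F2) right after each observation
    for rank, idx in enumerate(order):
        at_risk = n - rank  # n - (i - 1) with i = rank + 1
        shift = lam if times[idx] <= t else 0.0
        a1 = a2 = 0.0
        if eta[idx] == 1:
            # F2 still holds its value at the previous observed time.
            a1 = 1.0 / (at_risk - shift * (1.0 - F2 - p))
        elif eta[idx] == 2:
            a2 = 1.0 / (at_risk - shift * (F1 - p))
        F1 += event_free * a1
        F2 += event_free * a2
        event_free *= 1.0 - a1 - a2
        steps.append((times[idx], F1, F2))

    def cif(k):
        def value(u):
            out = 0.0
            for time, g1, g2 in steps:
                if time <= u:
                    out = g1 if k == 1 else g2
            return out
        return value

    return cif(1), cif(2)


def solve_lambda(times, eta, t, p, lo, hi, n_iter=100):
    """Bisection for lam such that F1(t) = p, on a user-supplied bracket."""
    def gap(lam):
        f1, _ = constrained_aj(times, eta, t, p, lam)
        return f1(t) - p

    if gap(lo) * gap(hi) > 0:
        raise ValueError("bracket does not contain a sign change")
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if gap(lo) * gap(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

With `times = [1, 2, 3, 4, 5, 6]`, `eta = [1, 2, 0, 1, 2, 1]` and `t = 4.5`, setting `lam = 0` gives the usual Aalen–Johansen estimate of 7/18 for event 1 at `t`, and `solve_lambda(times, eta, 4.5, 0.5, 0.0, 3.0)` returns a value of `lam` for which the constrained estimate equals 0.5.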

Cite this article

Blanche, P., Gerds, T.A. & Ekstrøm, C.T. The Wally plot approach to assess the calibration of clinical prediction models. Lifetime Data Anal 25, 150–167 (2019). https://doi.org/10.1007/s10985-017-9414-3

  • DOI: https://doi.org/10.1007/s10985-017-9414-3
