The Wally plot approach to assess the calibration of clinical prediction models

Blanche, Paul; Gerds, Thomas A.; Ekstrøm, Claus T.

doi:10.1007/s10985-017-9414-3


Published: 06 December 2017

Volume 25, pages 150–167 (2019)

Lifetime Data Analysis



Abstract

A prediction model is calibrated if, roughly, for any percentage x, among all subjects with a predicted risk of x% we can expect x out of 100 to experience the event. Calibration is typically assessed graphically, but in practice it is often hard to judge whether a “disappointing” calibration plot reflects a departure from the calibration assumption or just “bad luck” due to sampling variability. To address this issue, we propose a graphical approach that visualizes how well a calibration plot agrees with the calibration assumption. The approach is based mainly on generating new plots that mimic the available data under the calibration assumption. The method handles the common non-trivial situations in which the data contain censored observations and occurrences of competing events, by building on ideas from constrained non-parametric maximum likelihood estimation. Two examples based on large cohort data illustrate our proposal. The ‘wally’ R package is provided to make the methodology easy to use.
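The central idea can be illustrated in its simplest setting. The abstract describes generating new plots that mimic the data under the calibration assumption. The sketch below assumes a binary outcome observed for every subject, with no censoring and no competing risks. It is a minimal illustration, not the censored-data method of the paper nor the interface of the ‘wally’ R package. The function names, the ten equal-size risk groups and the nine-panel layout are hypothetical choices.

```python
import random


def calibration_points(risks, outcomes, n_groups=10):
    """Group subjects by predicted risk and return, for each group, the pair
    (mean predicted risk, observed event proportion).
    Assumes len(risks) >= n_groups."""
    order = sorted(range(len(risks)), key=lambda i: risks[i])
    size = len(order) // n_groups
    points = []
    for g in range(n_groups):
        # The last group also takes any leftover subjects.
        stop = (g + 1) * size if g < n_groups - 1 else len(order)
        idx = order[g * size:stop]
        mean_risk = sum(risks[i] for i in idx) / len(idx)
        observed = sum(outcomes[i] for i in idx) / len(idx)
        points.append((mean_risk, observed))
    return points


def wally_panels(risks, outcomes, n_panels=9, n_groups=10, seed=None):
    """Return n_panels calibration plots (as lists of points) and the index of
    the one computed from the observed outcomes; the others are simulated."""
    rng = random.Random(seed)
    panels = []
    for _ in range(n_panels - 1):
        # Under the calibration assumption, subject i experiences the event
        # with probability equal to its predicted risk.
        simulated = [1 if rng.random() < r else 0 for r in risks]
        panels.append(calibration_points(risks, simulated, n_groups))
    # Hide the observed plot at a random position among the simulated ones.
    position = rng.randrange(n_panels)
    panels.insert(position, calibration_points(risks, outcomes, n_groups))
    return panels, position
```

Drawing the nine point sets in a 3 × 3 grid and asking whether the observed one stands out is the point of the exercise. If the observed plot cannot be singled out, its apparent miscalibration is plausibly just sampling variability. Under censoring or competing risks, the simulation step would need the constrained estimators described in the appendices; this sketch does not handle that case.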


References

Aalen OO, Johansen S (1978) An empirical transition matrix for non-homogeneous Markov chains based on censored observations. Scand J Stat 5:141–150
Andersen PK, Borgan Ø, Gill RD, Keiding N (1993) Statistical models based on counting processes. Springer, New York
Austin PC, Steyerberg EW (2014) Graphical assessment of internal and external calibration of logistic regression models by using loess smoothers. Stat Med 33(3):517–535
Barber S, Jennison C (1999) Symmetric tests and confidence intervals for survival probabilities and quantiles of censored survival data. Biometrics 55(2):430–436
Beyersmann J, Allignol A, Schumacher M (2011) Competing risks and multistate models with R. Springer Science & Business Media, Berlin
Blanche P (2017) Confidence intervals for the cumulative incidence function via constrained NPMLE. https://ifsv.sund.ku.dk/biostat/biostat_annualreport/index.php5/Research_reports
Blanche P, Proust-Lima C, Loubère L, Berr C, Dartigues J-F, Jacqmin-Gadda H (2015) Quantifying and comparing dynamic predictive accuracy of joint models for longitudinal marker and time-to-event in presence of censoring and competing risks. Biometrics 71(1):102–113
Bröcker J, Smith LA (2007) Increasing the reliability of reliability diagrams. Weather Forecast 22(3):651–661
Buja A, Cook D, Hofmann H, Lawrence M, Lee E-K, Swayne DF, Wickham H (2009) Statistical inference for exploratory data analysis and model diagnostics. Philos Trans R Soc Lond A Math Phys Eng Sci 367(1906):4361–4383
Camm A et al (2010) Guidelines for the management of atrial fibrillation: the task force for the management of atrial fibrillation of the European Society of Cardiology (ESC). Eur Heart J 31:2369–2429
Crowson CS, Atkinson EJ, Therneau TM (2016) Assessing calibration of prognostic risk scores. Stat Methods Med Res 25:1692–1706
Demler OV, Paynter NP, Cook NR (2015) Tests of calibration and goodness-of-fit in the survival setting. Stat Med 34(10):1659–1680
Efron B (1981) Censored data and the bootstrap. J Am Stat Assoc 76(374):312–319
Ekstrøm CT (2013) Teaching ‘instant experience’ with graphical model validation techniques. Teach Stat 36(1):23–26
Fournier M-C, Foucher Y, Blanche P, Buron F, Giral M, Dantan E (2016) A joint model for longitudinal and time-to-event data to better assess the specific role of donor and recipient factors on long-term kidney transplantation outcomes. Eur J Epidemiol 31(5):469–479
Freedman AN, Seminara D, Gail MH, Hartge P, Colditz GA, Ballard-Barbash R, Pfeiffer RM (2005) Cancer risk prediction models: a workshop on development, evaluation, and application. J Natl Cancer Inst 97(10):715–723
Gail MH, Pfeiffer RM (2005) On criteria for evaluating models of absolute risk. Biostatistics 6(2):227–239
Gerds TA, Cai T, Schumacher M (2008) The performance of risk prediction models. Biometr J 50(4):457–479
Gerds TA, Andersen PK, Kattan MW (2014) Calibration plots for risk prediction models in the presence of competing risks. Stat Med 33(18):3191–3203
Geskus RB (2015) Data analysis with competing risks and intermediate states, vol 82. CRC Press, Boca Raton
Handford M (2007) Where is Wally? Walker Books Ltd, London
Kaplan E, Meier P (1958) Nonparametric estimation from incomplete observations. J Am Stat Assoc 53(282):457–481
Lemeshow S, Hosmer DW (1982) A review of goodness of fit statistics for use in the development of logistic regression models. Am J Epidemiol 115(1):92–106
Li G, Sun Y (2000) A simulation-based goodness-of-fit test for survival data. Stat Probab Lett 47(4):403–410
Lin DY, Wei L-J, Ying Z (1993) Checking the Cox model with cumulative sums of martingale-based residuals. Biometrika 80(3):557–572
Loy A, Follett L, Hofmann H (2016) Variations of Q–Q plots: the power of our eyes! Am Stat 70(2):202–214
Majumder M, Hofmann H, Cook D (2013) Validation of visual statistical inference, applied to linear models. J Am Stat Assoc 108(503):942–956
Martinussen T, Scheike T (2006) Dynamic regression models for survival data. Springer, Berlin
Pepe M, Janes H (2013) Methods for evaluating prediction performance of biomarkers and tests. In: Lee M-L, Gail G, Cai T, Pfeiffer R, Gandy A (eds) Risk assessment and evaluation of predictions. Springer, Berlin
Pepe MS, Feng Z, Huang Y, Longton G, Prentice R, Thompson IM, Zheng Y (2008) Integrating the predictiveness of a marker with its performance as a classifier. Am J Epidemiol 167(3):362–368
R Core Team (2017) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna
Robins J, Ritov Y et al (1997) Toward a curse of dimensionality appropriate asymptotic theory for semi-parametric models. Stat Med 16(3):285–319
Steyerberg E (2009) Clinical prediction models: a practical approach to development, validation, and updating. Springer, Berlin
Steyerberg EW, Vickers AJ, Cook NR, Gerds T, Gonen M, Obuchowski N, Pencina MJ, Kattan MW (2010) Assessing the performance of prediction models: a framework for some traditional and novel measures. Epidemiology 21(1):128
Thomas DR, Grunkemeier GL (1975) Confidence interval estimation of survival probabilities for censored data. J Am Stat Assoc 70(352):865–871
Tukey J (1972) Some graphic and semigraphic displays. In: Bancroft T (ed) Statistical papers in honor of George W. Snedecor. Iowa State University, Ames, Iowa, p 293–316
Viallon V, Benichou J, Clavel-Chapelon F, Ragusa S (2009) How to evaluate the calibration of a disease risk prediction tool. Stat Med 28:901–916
Vickers A, Cronin A (2010) Everything you always wanted to know about evaluating prediction models (but were too afraid to ask). Urology 76(6):1298–1301


Acknowledgements

PB is grateful to the Bettencourt Schueller Foundation for its support. We thank the DIVAT consortium and the Three-City study group for providing data from the DIVAT and Three-City cohorts. Their sources of support are listed at www.divat.fr and www.three-city-study.com.

Author information

Authors and Affiliations

LMBA, University of South Brittany, Vannes, France
Paul Blanche
Department of Biostatistics, University of Copenhagen, Copenhagen, Denmark
Thomas A. Gerds & Claus T. Ekstrøm


Corresponding author

Correspondence to Paul Blanche.

Appendices

A Constrained Kaplan–Meier estimates

Let us assume that we observe the data $\big \{\big (\widetilde{T}_i,\varDelta _i\big ),i=1,\ldots ,n\big \}$, where $\widetilde{T}_i=\min (T_i,C_i)$ and $\varDelta _i={\mathbb {1}}\{ T_i \le C_i \}$. The constrained Kaplan–Meier estimate $\widehat{S}(u)$ of $S(u)=\mathbb {P}(T>u)$, with constraint $\widehat{S}(t)=p$, for a given value $p \in ]0,1[$, is defined as

$$\begin{aligned} \widehat{S}(u)&= \prod _{i \, : \, \widetilde{T}_i \le u} \left( 1 - \frac{d_i}{n_i + \lambda }\right) \quad \text{ for } \text{ all } \quad u \le t, \\ \text{ and } \quad \widehat{S}(u)&= \left\{ \prod _{i: \, \widetilde{T}_i \le t} \left( 1 - \frac{d_i}{n_i + \lambda }\right) \right\} \left\{ \prod _{i \, : \, t< \widetilde{T}_i \le u } \left( 1 - \frac{d_i}{n_i }\right) \right\} \quad \text{ for } \text{ all } \quad t < u, \end{aligned}$$

where $d_i=\sum _{j=1}^n {\mathbb {1}}\{ \widetilde{T}_j = \widetilde{T}_i \}\varDelta _j$ is the number of observed events at time $\widetilde{T}_i$, $n_i=\sum _{j=1}^n {\mathbb {1}}\{ \widetilde{T}_j \ge \widetilde{T}_i \}$ is the number of subjects at risk at time $\widetilde{T}_i$, and $\lambda \in \mathbb {R}$ is chosen such that $\widehat{S}(t)=p$ (Thomas and Grunkemeier 1975). The usual Kaplan–Meier estimator corresponds to the special case $\lambda =0$.
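A minimal Python sketch of the constrained Kaplan–Meier estimate follows. It uses one factor per distinct observed event time, which coincides with the product above when there are no ties. It finds $\lambda$ by bisection, relying on the fact that each factor $1-d_i/(n_i+\lambda)$ increases with $\lambda$ on the range where $n_i+\lambda>d_i$. The function names are hypothetical and do not come from the paper or the ‘wally’ package.

```python
from collections import Counter


def risk_table(times, status):
    """One (time, d, n) row per distinct observed event time: d events at that
    time and n subjects whose observed time is at least that time."""
    events = Counter(x for x, s in zip(times, status) if s == 1)
    return [(u, events[u], sum(1 for x in times if x >= u)) for u in sorted(events)]


def constrained_km(table, t, lam, u):
    """S_hat(u): lam shifts the at-risk counts only at event times <= t.
    lam = 0 gives the usual Kaplan-Meier estimate."""
    s = 1.0
    for time, d, n in table:
        if time > u:
            break
        s *= 1.0 - d / (n + (lam if time <= t else 0.0))
    return s


def solve_lambda(table, t, p, iterations=200):
    """Bisection for lam such that S_hat(t) = p, with 0 < p < 1."""
    before = [(d, n) for time, d, n in table if time <= t]
    if not before:
        raise ValueError("no observed event at or before t")
    # S_hat(t) increases with lam: it tends to 0 as lam decreases to lo
    # (some factor reaches 0) and to 1 as lam grows without bound.
    lo = float(max(d - n for d, n in before))
    hi = 1.0
    while constrained_km(table, t, hi, t) < p:
        hi *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if constrained_km(table, t, mid, t) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

A negative $\lambda$ pulls $\widehat{S}(t)$ below the usual Kaplan–Meier value and a positive $\lambda$ pushes it above. Factors after time $t$ are the unconstrained ones, as in the second line of the definition.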

B Constrained Aalen–Johansen estimates

Let us assume that we observe the data $\big \{\big (\widetilde{T}_i,\widetilde{\eta }_i\big ),i=1,\ldots ,n\big \}$, where $\widetilde{\eta }_i=\varDelta _i\eta _i$, so that $\widetilde{\eta }_i=0$ for a censored observation and $\widetilde{\eta }_i=\eta _i$ is the observed event type otherwise. For the sake of clarity, we further assume that there are no ties in the sample $\big \{\widetilde{T}_i, i=1,\ldots ,n\big \}$. Hence, without loss of generality, we assume in the formulas below that $0< \widetilde{T}_1< \cdots < \widetilde{T}_n$. In particular, this implies $n_i=\sum _{j=1}^n {\mathbb {1}}\{ \widetilde{T}_j \ge \widetilde{T}_i \}=n-(i-1)$ for all $i=1,\ldots ,n$.

The constrained Aalen–Johansen estimates $\widehat{F}_{k}^{(1)}(u)$ of the cumulative incidence functions of events $k=1,2$ at time $u$, that is, $F_{k}(u)=\mathbb {P}(T \le u, \eta =k)$, with constraint $\widehat{F}_{1}^{(1)}(t)=p$ for a given $p \in ]0,1[$, are defined as

$$\begin{aligned} \widehat{F}_{k}^{(1)}(u)&= \sum _{i \, : \, \widetilde{T}_i \le u}\left\{ \prod _{j=1}^{i-1} \left( 1 - \widehat{a}_{1,j}^{(1)} - \widehat{a}_{2,j}^{(1)} \right) \widehat{a}_{k,i}^{(1)} \right\} \end{aligned}$$

where

$$\begin{aligned} \widehat{a}_{1,i}^{(1)}&= \frac{ {\mathbb {1}}\{ \widetilde{\eta }_i=1 \} }{ n - (i-1) - \lambda \left\{ 1 - \widehat{F}_{2}^{(1)}(\widetilde{T}_{i-1}) - p \right\} } \quad \text{ if } \quad i \quad \text{ is } \text{ such } \text{ that } \quad \widetilde{T}_{i} \le t, \\ \text{ and } \qquad \widehat{a}_{1,i}^{(1)}&= \frac{ {\mathbb {1}}\{ \widetilde{\eta }_i=1 \} }{ n - (i-1) } \qquad \qquad \qquad \qquad \qquad \qquad \ \text{ if } \quad i \quad \text{ is } \text{ such } \text{ that } \quad \widetilde{T}_{i} > t, \nonumber \end{aligned}$$

and

$$\begin{aligned} \widehat{a}_{2,i}^{(1)}&= \frac{ {\mathbb {1}}\{ \widetilde{\eta }_i=2 \} }{ n - (i-1) - \lambda \left\{ \widehat{F}_{1}^{(1)}(\widetilde{T}_{i-1}) - p \right\} } \quad \text{ if } \quad i \quad \text{ is } \text{ such } \text{ that } \quad \widetilde{T}_{i} \le t, \\ \text{ and } \qquad \widehat{a}_{2,i}^{(1)}&= \frac{ {\mathbb {1}}\{ \widetilde{\eta }_i=2 \} }{ n - (i-1) } \qquad \qquad \qquad \qquad \qquad \ \ \text{ if } \quad i \quad \text{ is } \text{ such } \text{ that } \quad \widetilde{T}_{i} > t. \nonumber \end{aligned}$$

In the above equations, we define $\widetilde{T}_{0}=0$ and $\widehat{F}_{1}^{(1)}(0)=\widehat{F}_{2}^{(1)}(0)=0$, and $\lambda \in \mathbb {R}$ is chosen such that $\widehat{F}_{1}^{(1)}(t)=p$. The superscript $^{(1)}$ indicates that the constraint is on the cumulative incidence function of event 1. Following ideas similar to those used by Thomas and Grunkemeier (1975) to derive the formulas of Appendix A, these formulas were obtained by maximizing the non-parametric likelihood $L= \prod _{i=1}^n a_{1,i}^{{\mathbb {1}}\{ \widetilde{\eta }_i=1 \}} a_{2,i}^{{\mathbb {1}}\{ \widetilde{\eta }_i=2 \}} \big ( 1 - a_{1,i} - a_{2,i} \big )^{n-i}$ under the constraint $\widehat{F}_{1}^{(1)}(t)=p$, using the method of Lagrange multipliers. The formulas can be extended to account for ties in $\big \{\widetilde{T}_i, i=1,\ldots ,n\big \}$ (Blanche 2017). The usual Aalen–Johansen estimator corresponds to the special case $\lambda =0$.
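The recursion above can be sketched in Python, assuming distinct observed times and two causes as in this appendix. Each denominator uses $\widehat{F}_{1}^{(1)}$ and $\widehat{F}_{2}^{(1)}$ at the previous time $\widetilde{T}_{i-1}$, and $\lambda$ enters only at times up to $t$. This sketch does not establish that $\widehat{F}_{1}^{(1)}(t)$ is monotone in $\lambda$. For that reason it does not pick a search interval automatically; it asks for a bracket (hypothetical parameters `lo` and `hi`) and applies bisection. All function names are illustrative assumptions.

```python
def constrained_aj(times, types, t, p, lam):
    """Constrained Aalen-Johansen estimates (time, F1, F2) after each observed time.

    Assumes distinct times (no ties). types[i] is the observed event type:
    0 = censored, 1 or 2 = event of that cause. lam = 0 gives the usual
    Aalen-Johansen estimates.
    """
    n = len(times)
    order = sorted(range(n), key=lambda i: times[i])
    surv, f1, f2 = 1.0, 0.0, 0.0  # values at the previous time (T_0 = 0)
    path = []
    for i, idx in enumerate(order, start=1):  # i is the rank used in the formulas
        at_risk = n - (i - 1)
        constrained = times[idx] <= t  # lam only enters at times up to t
        a1 = a2 = 0.0
        if types[idx] == 1:
            den = at_risk - (lam * (1.0 - f2 - p) if constrained else 0.0)
            if den < 1.0:
                raise ValueError("lam gives a cause-1 hazard outside [0, 1]")
            a1 = 1.0 / den
        elif types[idx] == 2:
            den = at_risk - (lam * (f1 - p) if constrained else 0.0)
            if den < 1.0:
                raise ValueError("lam gives a cause-2 hazard outside [0, 1]")
            a2 = 1.0 / den
        # surv is the product over j < i of (1 - a1_j - a2_j).
        f1 += surv * a1
        f2 += surv * a2
        surv *= 1.0 - a1 - a2
        path.append((times[idx], f1, f2))
    return path


def cif1_at(times, types, t, p, lam):
    """F1_hat(t): the value of F1 at the last observed time <= t (0 if none)."""
    value = 0.0
    for time, f1, _ in constrained_aj(times, types, t, p, lam):
        if time > t:
            break
        value = f1
    return value


def solve_lambda_aj(times, types, t, p, lo, hi, iterations=200):
    """Bisection for lam with F1_hat(t) = p on a user-supplied bracket [lo, hi]."""
    g_lo = cif1_at(times, types, t, p, lo) - p
    g_hi = cif1_at(times, types, t, p, hi) - p
    if g_lo == 0.0:
        return lo
    if g_hi == 0.0:
        return hi
    if g_lo * g_hi > 0:
        raise ValueError("F1_hat(t) - p has the same sign at lo and hi")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        g_mid = cif1_at(times, types, t, p, mid) - p
        if (g_mid < 0) == (g_lo < 0):
            lo, g_lo = mid, g_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Because a hazard must lie in $[0,1]$, some values of $\lambda$ are invalid. The sketch raises an error for them rather than silently clipping. A practical implementation would also need the tie-handling extension mentioned above.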


About this article

Cite this article

Blanche, P., Gerds, T.A. & Ekstrøm, C.T. The Wally plot approach to assess the calibration of clinical prediction models. Lifetime Data Anal 25, 150–167 (2019). https://doi.org/10.1007/s10985-017-9414-3


Received: 02 June 2017
Accepted: 29 November 2017
Published: 06 December 2017
Issue Date: 15 January 2019
DOI: https://doi.org/10.1007/s10985-017-9414-3

