Using decision curve analysis to benchmark performance of a magnetic resonance imaging–based deep learning model for prostate cancer risk assessment



Objectives

To benchmark the performance of a calibrated 3D convolutional neural network (CNN) applied to multiparametric MRI (mpMRI) for risk assessment of clinically significant prostate cancer (csPCa) using decision curve analysis (DCA).


Methods

We retrospectively analyzed 499 patients who had positive mpMRI (PI-RADSv2 ≥ 3) and underwent MRI-targeted biopsy. The training cohort comprised 449 men, including a calibration set of 50 men. Four biopsy decision strategies were compared: biopsy guided by risk estimates from the CNN, either original or calibrated; biopsy only in men with PI-RADSv2 ≥ 4; and a combined strategy of biopsy in men with PI-RADSv2 ≥ 4 and additionally in men with PI-RADSv2 3 and PSA density (PSAd) ≥ 0.15 ng/ml/ml. Discrimination, calibration and clinical usefulness in the unseen test cohort (n = 50) were assessed using the C-statistic, calibration plots and DCA, respectively.
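To make the compared strategies concrete, the decision rules described above can be written as simple predicates. This is a minimal sketch for illustration only: the `Patient` record, its field names and the function names are hypothetical and do not come from the study code; only the PI-RADSv2 categories and the PSAd cutoff of 0.15 ng/ml/ml are taken from the text.

```python
from dataclasses import dataclass

PSAD_CUTOFF = 0.15  # ng/ml/ml, cutoff stated in the Methods


@dataclass
class Patient:
    """Hypothetical record; field names are illustrative assumptions."""
    pirads: int         # PI-RADSv2 category (3 to 5 in a PI-RADSv2 >= 3 cohort)
    psa_density: float  # PSA density in ng/ml/ml
    cnn_risk: float     # CNN-predicted probability of csPCa, between 0 and 1


def biopsy_pirads4(p: Patient) -> bool:
    """Biopsy only men with PI-RADSv2 >= 4."""
    return p.pirads >= 4


def biopsy_pirads_psad(p: Patient) -> bool:
    """Combined strategy: PI-RADSv2 >= 4, or PI-RADSv2 3 with PSAd >= cutoff."""
    return p.pirads >= 4 or (p.pirads == 3 and p.psa_density >= PSAD_CUTOFF)


def biopsy_cnn(p: Patient, risk_threshold: float) -> bool:
    """Biopsy when the CNN risk estimate reaches the chosen risk threshold."""
    return p.cnn_risk >= risk_threshold


# Example: an equivocal lesion with elevated PSA density and low CNN risk.
example = Patient(pirads=3, psa_density=0.20, cnn_risk=0.05)
decisions = (
    biopsy_pirads4(example),
    biopsy_pirads_psad(example),
    biopsy_cnn(example, risk_threshold=0.10),
)
```

For this hypothetical patient, only the combined PI-RADSv2 and PSAd strategy recommends biopsy. The same CNN rule applies to the original and the calibrated model; only the risk estimate passed in differs.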


Results

The calibrated CNN achieved moderate calibration (Hosmer-Lemeshow calibration test, p = 0.41) and good discrimination (C = 0.85). DCA revealed consistently higher net benefit and net reduction in biopsies for the calibrated CNN compared with the original CNN, PI-RADSv2 ≥ 4 and the combined strategy of PI-RADSv2 and PSAd. Original CNN predictions were severely miscalibrated (p < 0.0001), resulting in net harm compared with a ‘biopsy all patients’ strategy. At risk thresholds ≥ 10%, the calibrated CNN and the combined strategy reduced the number of biopsies by an estimated 201 and 55 men, respectively, per 1000 men at risk, without missing csPCa, while the original CNN and PI-RADSv2 ≥ 4 could not achieve a net reduction in biopsies.
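For readers unfamiliar with the quantities reported by DCA, the following sketch shows how net benefit and net reduction in biopsies per 1000 men are conventionally computed at a single risk threshold, using the standard decision curve definitions. The function names and the toy cohort are assumptions made for illustration; the numbers are not study data.

```python
def net_benefit(has_cancer, biopsied, threshold):
    """Net benefit of a biopsy strategy at a risk threshold (0 < threshold < 1).

    True positives count as benefit; false positives are weighted by the
    odds of the threshold, threshold / (1 - threshold).
    """
    n = len(has_cancer)
    true_pos = sum(1 for c, b in zip(has_cancer, biopsied) if b and c)
    false_pos = sum(1 for c, b in zip(has_cancer, biopsied) if b and not c)
    odds = threshold / (1 - threshold)
    return true_pos / n - odds * false_pos / n


def net_reduction_per_1000(has_cancer, biopsied, threshold):
    """Net reduction in biopsies per 1000 men, relative to 'biopsy all'."""
    n = len(has_cancer)
    nb_strategy = net_benefit(has_cancer, biopsied, threshold)
    nb_all = net_benefit(has_cancer, [True] * n, threshold)
    odds = threshold / (1 - threshold)
    return (nb_strategy - nb_all) / odds * 1000


# Toy cohort (not study data): 10 men, 4 with csPCa.
has_cancer = [True] * 4 + [False] * 6
# Hypothetical strategy: biopsies all 4 cancers and 2 of the 6 men without cancer.
biopsied = [True] * 4 + [True] * 2 + [False] * 4

reduction = net_reduction_per_1000(has_cancer, biopsied, threshold=0.10)
```

In this toy cohort the strategy avoids 4 of 10 biopsies without missing a cancer, so `reduction` evaluates to about 400 per 1000 men. A strategy that misses cancers can have lower net benefit than ‘biopsy all’, which gives a negative net reduction and corresponds to net harm.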


Conclusions

DCA revealed that our calibrated 3D-CNN resulted in fewer unnecessary biopsies compared with using PI-RADSv2 alone or in combination with PSAd. CNN calibration is important in achieving clinical utility.
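The abstract does not state which calibration method was applied, so the following is only a sketch of one common option, logistic (Platt-style) recalibration fitted on a held-out calibration set. It is not presented as the authors' method. All names and the toy scores are hypothetical, and plain gradient descent is used solely to keep the example dependency-free.

```python
import math


def logit(p, eps=1e-6):
    """Log-odds of a probability, clipped away from 0 and 1."""
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    e = math.exp(z)
    return e / (1 + e)


def fit_logistic_recalibration(scores, labels, lr=0.1, n_iter=5000):
    """Fit slope a and intercept b so that sigmoid(a * logit(score) + b)
    minimizes log loss on the calibration set (simple gradient descent;
    the step size suits small, moderate-valued examples like the one below)."""
    xs = [logit(s) for s in scores]
    n = len(xs)
    a, b = 1.0, 0.0  # start from the identity mapping (no recalibration)
    for _ in range(n_iter):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, labels):
            err = sigmoid(a * x + b) - y
            grad_a += err * x
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def recalibrate(score, a, b):
    """Map an original model score to a recalibrated risk estimate."""
    return sigmoid(a * logit(score) + b)


# Toy calibration set (not study data): scores systematically overestimate risk.
scores = [0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.60, 0.40, 0.30, 0.20]
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

a, b = fit_logistic_recalibration(scores, labels)
calibrated = [recalibrate(s, a, b) for s in scores]
mean_original = sum(scores) / len(scores)
mean_calibrated = sum(calibrated) / len(calibrated)
prevalence = sum(labels) / len(labels)
```

In the toy data the mean original score (0.645) is well above the observed prevalence (0.40). After recalibration the mean predicted risk matches the prevalence closely, which is the kind of correction that matters when risk estimates are compared against a fixed threshold in DCA.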

Key Points

• A 3D deep learning model applied to multiparametric MRI may help to prevent unnecessary prostate biopsies in patients eligible for MRI-targeted biopsy.

• Owing to miscalibration, original risk estimates by the deep learning model require prior calibration to enable clinical utility.

• Decision curve analysis confirmed a net benefit of using our calibrated deep learning model for biopsy decisions compared with alternative strategies, including PI-RADSv2 alone and in combination with prostate-specific antigen density.





Abbreviations

ADC: Apparent diffusion coefficient
AI: Artificial intelligence
AUC: Area under the receiver-operating characteristic curve
CI: Confidence interval
CNN: Convolutional neural network
cs: Clinically significant
DCA: Decision curve analysis
DL: Deep learning
MRI: Magnetic resonance imaging
PCa: Prostate cancer
PI-RADSv2: Prostate Imaging Reporting and Data System version 2.0
PSAd: Prostate-specific antigen density
T2w: T2 weighted
TRUS: Transrectal ultrasound




Guarantors of the integrity of the entire study, D.D., F.K. and M.A.H.; study concepts/study design or data acquisition or data analysis/interpretation, all authors; manuscript drafting or manuscript revision for important intellectual content, all authors; approval of the final version of the submitted manuscript, all authors; literature research, D.D., N.A., F.K. and M.A.H.; clinical studies, D.D., L.M. and M.A.H.; statistical analysis, D.D. and X.D.; and manuscript editing, D.D., F.K. and M.A.H.


Funding

This study received funding from the Ontario Institute for Cancer Research and a Deutsche Forschungsgemeinschaft (DFG; German Research Foundation) fellowship [DE 3207/1-1].

Author information



Corresponding author

Correspondence to Masoom A. Haider.

Ethics declarations


Guarantor

The scientific guarantor of this publication is Masoom A. Haider.

Conflict of interest

The authors of this manuscript declare no relationships with any companies whose products or services may be related to the subject matter of the article.

Statistics and biometry

One of the authors, Xin Dong, has a Master of Science degree in Mathematics with significant statistical expertise.

Informed consent

Written informed consent was waived by the Institutional Review Board.

Ethical approval

Institutional Review Board approval was obtained.

Study subjects or cohorts overlap

Some study subjects or cohorts have been previously reported in Yoo S, Gujrathi I, Haider MA, Khalvati F (2019) Prostate cancer detection using deep convolutional neural networks. Sci Rep 9:19518.


Methodology

• retrospective

• diagnostic or prognostic study

• performed at one institution

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Deniffel, D., Abraham, N., Namdar, K. et al. Using decision curve analysis to benchmark performance of a magnetic resonance imaging–based deep learning model for prostate cancer risk assessment. Eur Radiol 30, 6867–6876 (2020).



Keywords

  • Artificial intelligence
  • Deep learning
  • Magnetic resonance imaging
  • Prostatic neoplasms
  • Decision analysis