External Validation of a “Black-Box” Clinical Predictive Model in Nephrology: Can Interpretability Methods Help Illuminate Performance Differences?

  • Harry F. da Cruz
  • Boris Pfahringer
  • Frederic Schneider
  • Alexander Meyer
  • Matthieu-P. Schapranow
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11526)


The number of published machine learning clinical prediction models is rising, especially as new fields of application are explored in medicine. Notwithstanding these advances, only a few such models are actually deployed in clinical contexts, owing to a lack of validation studies. In this paper, we present and discuss the validation results of a machine learning model for the prediction of acute kidney injury in cardiac surgery patients when applied to an external cohort from a German research hospital. To help account for the performance differences observed, we used interpretability methods that allowed experts to scrutinize model behavior at both the global and local level, yielding further insights into why the model did not behave as expected on the validation cohort. We argue that practitioners should consider such methods as a further tool to help explain performance differences and inform model updating in validation studies.


Clinical predictive modeling · Nephrology · Validation · Interpretability methods



Parts of the given work were generously supported by the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 780495.


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Harry F. da Cruz (1, corresponding author)
  • Boris Pfahringer (2)
  • Frederic Schneider (1)
  • Alexander Meyer (2)
  • Matthieu-P. Schapranow (1)

  1. Digital Health Center, Hasso Plattner Institute, Potsdam, Germany
  2. Department of Cardiothoracic and Vascular Surgery, German Heart Center Berlin, Berlin, Germany
