A Principled Approach to Failure Analysis and Model Repairment: Demonstration in Medical Imaging

Part of the Lecture Notes in Computer Science book series (LNIP, volume 12903)


Machine learning models commonly exhibit unexpected failures post-deployment due to either data shifts or uncommon situations in the training environment. Domain experts typically go through the tedious process of inspecting the failure cases manually, identifying failure modes and then attempting to fix the model. In this work, we aim to standardise and bring principles to this process by answering two critical questions: (i) how do we know that we have identified meaningful and distinct failure types? and (ii) how can we validate that a model has, indeed, been repaired? We suggest that the quality of the identified failure types can be validated by measuring the intra- and inter-type generalisation after fine-tuning, and we introduce metrics to compare different subtyping methods. Furthermore, we argue that a model can be considered repaired if it achieves high accuracy on the failure types while retaining performance on the previously correct data. We combine these two ideas into a principled framework for evaluating the quality of both the identified failure subtypes and model repairment. We evaluate its utility on a classification task and an object detection task. Our code is available at
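To make the two ideas in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical accuracy matrix `acc`, where `acc[i][j]` is the accuracy on held-out failures of type `j` after fine-tuning on failures of type `i`. The function names, the summary statistics and the thresholds are illustrative assumptions; the paper's actual metrics are not stated in this abstract.

```python
from statistics import mean


def generalisation_summary(acc):
    """Summarise intra- and inter-type generalisation (hypothetical metric).

    acc[i][j] is assumed to hold the accuracy on held-out failures of
    type j after fine-tuning on failures of type i. At least two failure
    types are required so that off-diagonal entries exist.
    """
    n = len(acc)
    if n < 2 or any(len(row) != n for row in acc):
        raise ValueError("acc must be a square matrix with at least 2 types")
    # Diagonal entries: fine-tune and evaluate on the same failure type.
    intra = mean(acc[i][i] for i in range(n))
    # Off-diagonal entries: fine-tune on one type, evaluate on another.
    inter = mean(acc[i][j] for i in range(n) for j in range(n) if i != j)
    # A large gap suggests the types are internally consistent yet distinct.
    return intra, inter, intra - inter


def is_repaired(failure_acc, retained_acc,
                min_failure_acc=0.9, min_retained_acc=0.95):
    """Check the repair criterion with illustrative, assumed thresholds.

    failure_acc: accuracy of the repaired model on the failure types.
    retained_acc: accuracy on data the original model classified correctly.
    """
    return failure_acc >= min_failure_acc and retained_acc >= min_retained_acc


if __name__ == "__main__":
    # Made-up numbers for two failure types, for illustration only.
    example = [[0.9, 0.2],
               [0.4, 0.8]]
    print(generalisation_summary(example))
    print(is_repaired(failure_acc=0.92, retained_acc=0.97))
```

Under this sketch, a subtyping method would be preferred when it yields high intra-type and low inter-type generalisation, and a model would count as repaired only when both accuracy conditions hold at once, so that fixing failures at the cost of forgetting previously correct cases does not pass.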


  • Failure analysis
  • Model repairment
  • Deep learning




Author information

Corresponding author

Correspondence to Thomas Henn.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Henn, T. et al. (2021). A Principled Approach to Failure Analysis and Model Repairment: Demonstration in Medical Imaging. In: de Bruijne, M., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2021. MICCAI 2021. Lecture Notes in Computer Science, vol 12903. Springer, Cham.

  • DOI:

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-87198-7

  • Online ISBN: 978-3-030-87199-4

  • eBook Packages: Computer Science, Computer Science (R0)