
Interpreting and Correcting Medical Image Classification with PIP-Net

  • Conference paper
Artificial Intelligence. ECAI 2023 International Workshops (ECAI 2023)

Abstract

Part-prototype models are explainable-by-design image classifiers and a promising alternative to black-box AI. This paper explores the applicability and potential of interpretable machine learning, in particular PIP-Net, for automated diagnosis support on real-world medical imaging data. PIP-Net learns human-understandable prototypical image parts, and we evaluate its accuracy and interpretability for fracture detection and skin cancer diagnosis. We find that PIP-Net's decision-making process is in line with medical classification standards, even though it is provided with only image-level class labels. Because PIP-Net pretrains its prototypes without supervision, data-quality problems such as undesired text in an X-ray or labelling errors can be easily identified. Additionally, we are the first to show that humans can manually correct the reasoning of PIP-Net by directly disabling undesired prototypes. We conclude that part-prototype models are promising for medical applications due to their interpretability and potential for advanced model debugging.
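
To picture the prototype-disabling step mentioned above, here is a minimal sketch. It is a hedged illustration, not the authors' implementation: it assumes a PIP-Net-style architecture in which prototype presence scores feed a sparse linear scoring layer, and all names and sizes (num_prototypes, scoring_layer, disable_prototype) are hypothetical.

```python
import torch
import torch.nn as nn

# Hedged sketch, not the authors' code: assume prototype presence
# scores feed a linear layer whose weight matrix connects the
# num_prototypes prototypes to the class logits.
num_prototypes, num_classes = 512, 2  # illustrative sizes
scoring_layer = nn.Linear(num_prototypes, num_classes, bias=False)

def disable_prototype(layer: nn.Linear, prototype_idx: int) -> None:
    """Zero one prototype's column so it can no longer contribute
    evidence to any class score; no retraining is required."""
    with torch.no_grad():
        layer.weight[:, prototype_idx] = 0.0

# Example: a clinician flags prototype 42 as reacting to text markers
# in an X-ray rather than to anatomy, and suppresses it.
disable_prototype(scoring_layer, 42)
```

Because the correction is a single in-place weight edit, a domain expert could remove an undesired prototype without access to training data or compute.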

Notes

  1. If segmentation masks were not available, patch-related prototypes could efficiently be collected manually, since the sparsity of PIP-Net results in a reasonable number of relevant prototypes (only 119 for ISIC).
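
To make the footnote's sparsity argument concrete, the following is an illustrative sketch under the same assumptions as above (a PIP-Net-style sparse scoring layer; the pruning step merely simulates the sparsity a trained model would already have, and all names and sizes are hypothetical):

```python
import torch
import torch.nn as nn

# Illustrative only: with a sparse scoring layer, the "relevant"
# prototypes are the columns holding a nonzero weight for some class.
scoring_layer = nn.Linear(768, 2, bias=False)  # hypothetical sizes

with torch.no_grad():
    # Simulate trained sparsity by pruning near-zero weights; a trained
    # PIP-Net-style model would already be sparse.
    scoring_layer.weight[scoring_layer.weight.abs() < 0.03] = 0.0
    relevant = (scoring_layer.weight.abs().sum(dim=0) > 0).nonzero().flatten()

print(f"{relevant.numel()} relevant prototypes to inspect manually")
```

A list of at most a few hundred prototype indices is a workload a human annotator can realistically review, which is why the manual collection described in the note is feasible.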

Author information

Correspondence to Meike Nauta.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Nauta, M., Hegeman, J.H., Geerdink, J., Schlötterer, J., van Keulen, M., Seifert, C. (2024). Interpreting and Correcting Medical Image Classification with PIP-Net. In: Nowaczyk, S., et al. Artificial Intelligence. ECAI 2023 International Workshops. ECAI 2023. Communications in Computer and Information Science, vol 1947. Springer, Cham. https://doi.org/10.1007/978-3-031-50396-2_11

  • DOI: https://doi.org/10.1007/978-3-031-50396-2_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-50395-5

  • Online ISBN: 978-3-031-50396-2

  • eBook Packages: Computer Science, Computer Science (R0)
