Evaluation of Importance Estimators in Deep Learning Classifiers for Computed Tomography

  • Conference paper
Explainable and Transparent AI and Multi-Agent Systems (EXTRAAMAS 2022)

Abstract

Deep learning has shown superb performance in detecting objects and classifying images, holding great promise for analyzing medical images. Translating this success to medical imaging, where doctors need to understand the underlying process, requires the capability to interpret and explain the predictions of neural networks. Interpretability of deep neural networks often relies on estimating the importance of input features (e.g., pixels) with respect to the outcome (e.g., class probability). However, many importance estimators (also known as saliency maps) have been developed, and it is unclear which ones are more relevant for medical imaging applications. In the present work, we investigated the performance of several importance estimators in explaining the classification of computed tomography (CT) images by a convolutional deep network, using three distinct evaluation metrics. Specifically, a ResNet-50 was trained to classify CT scans of lungs acquired with and without contrast agents, in which clinically relevant anatomical areas were manually determined by experts as segmentation masks. The three evaluation metrics quantify different aspects of interpretability. First, model-centric fidelity measures the decrease in model accuracy when certain inputs are perturbed. Second, concordance between importance scores and the expert-defined segmentation masks is measured at the pixel level by receiver operating characteristic (ROC) curves. Third, region-wise overlap between an XRAI-based map and the segmentation mask is measured by the Dice Similarity Coefficient (DSC). Overall, two versions of SmoothGrad topped the fidelity and ROC rankings, whereas both Integrated Gradients and SmoothGrad excelled in the DSC evaluation. Interestingly, there was a critical discrepancy between model-centric (fidelity) and human-centric (ROC and DSC) evaluation. Expert expectation and intuition embedded in segmentation maps do not necessarily align with how the model arrived at its prediction. Understanding this difference in interpretability would help harness the power of deep learning in medicine.
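
The three metrics named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the zero fill value used for perturbation, the thresholding of an importance map into a binary mask, and the assumption of single-channel images with the same shape as the importance map are all hypothetical choices made for illustration. In particular, the paper computes the DSC on XRAI-based maps, which this sketch does not reproduce, and only the perturbation step of the fidelity metric is shown (re-evaluating model accuracy on the perturbed images is left out).

```python
import numpy as np


def mask_top_pixels(image, importance, fraction, fill_value=0.0):
    """Perturbation step for a fidelity-style test: overwrite the most
    important pixels. fill_value=0.0 is an assumption, not the paper's choice."""
    image = np.asarray(image, dtype=float)
    importance = np.asarray(importance, dtype=float)
    k = int(round(fraction * importance.size))
    perturbed = image.copy()
    if k > 0:
        # Flat indices of the k highest importance scores.
        top = np.argsort(importance, axis=None)[::-1][:k]
        perturbed.flat[top] = fill_value
    return perturbed


def pixelwise_auc(importance, true_mask):
    """Area under the ROC curve, treating each pixel as one sample:
    importance is the score, the expert mask is the binary label."""
    scores = np.asarray(importance, dtype=float).ravel()
    labels = np.asarray(true_mask, dtype=bool).ravel()
    n_pos = labels.sum()
    n_neg = labels.size - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("mask must contain both foreground and background")
    # Average (tie-aware) 1-based ranks, then the Mann-Whitney U statistic.
    _, inverse, counts = np.unique(scores, return_inverse=True, return_counts=True)
    avg_rank = np.cumsum(counts) - (counts - 1) / 2.0
    ranks = avg_rank[inverse]
    u = ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


def dice_coefficient(pred_mask, true_mask):
    """Dice Similarity Coefficient: 2|A ∩ B| / (|A| + |B|) for binary masks."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    denom = pred.sum() + true.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect overlap (a convention)
    return 2.0 * np.logical_and(pred, true).sum() / denom


# Toy example (made-up values): a 2x2 importance map and expert mask.
importance = np.array([[0.1, 0.9], [0.2, 0.8]])
expert_mask = np.array([[0, 1], [0, 1]])

auc = pixelwise_auc(importance, expert_mask)
dsc = dice_coefficient(importance > 0.5, expert_mask)
perturbed = mask_top_pixels(np.ones((2, 2)), importance, fraction=0.5)
```

A high pixel-wise AUC or DSC says only that the importance map agrees with the expert mask; the perturbation step probes what the model itself relies on, which is why the two kinds of evaluation can disagree.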

L. Brocki, W. Marchadour, M. Hatt, F. Vermet and N. C. Chung—These authors contributed equally.

Notes

  1. The class score \(S_c\) is the activation of the neuron in the prediction vector that corresponds to the class \(c\).

Acknowledgements

This work was partly funded by the ERA-Net CHIST-ERA grant [CHIST-ERA-19-XAI-007] long term challenges in ICT project INFORM (ID: 93603), the French Ministry for Research and Higher Education and the French National Research Agency (ANR) through the CIFRE program, the Brittany Region, the General Secretariat for Research and Innovation (GSRI) of Greece, and the National Science Centre (NCN) of Poland [2020/02/Y/ST6/00071]. This work utilized computing resources from the Interdisciplinary Centre for Mathematical and Computational Modelling (ICM) at the University of Warsaw and the NVIDIA GPU grant.

Corresponding authors

Correspondence to Mathieu Hatt, Franck Vermet or Neo Christopher Chung.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Brocki, L. et al. (2022). Evaluation of Importance Estimators in Deep Learning Classifiers for Computed Tomography. In: Calvaresi, D., Najjar, A., Winikoff, M., Främling, K. (eds) Explainable and Transparent AI and Multi-Agent Systems. EXTRAAMAS 2022. Lecture Notes in Computer Science, vol 13283. Springer, Cham. https://doi.org/10.1007/978-3-031-15565-9_1

  • DOI: https://doi.org/10.1007/978-3-031-15565-9_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-15564-2

  • Online ISBN: 978-3-031-15565-9

  • eBook Packages: Computer Science (R0)
