Abstract
With the increasing availability of annotated X-ray image data, there has been a consequent growth in research on machine learning-based, and in particular deep learning-based, X-ray image analysis. A major problem with this body of work lies in how newly proposed algorithms are evaluated. Usually, comparative analysis is reduced to the presentation of a single metric, often the area under the receiver operating characteristic curve (AUROC), which provides little clinical value or insight and thus fails to communicate the applicability of proposed models. In the present paper we address this limitation of previous work by presenting a thorough analysis of a state-of-the-art learning approach, and thereby illuminate various weaknesses of similar algorithms in the literature which have not yet been fully acknowledged and appreciated. Our analysis is performed on the ChestX-ray14 dataset, which has 14 lung disease labels and meta-information such as patient age, gender, and the X-ray projection direction. We examine the diagnostic significance of different metrics used in the literature, including those proposed by the International Medical Device Regulators Forum, and present a qualitative assessment of the spatial information learnt by the model. We show that models with very similar AUROCs can exhibit widely differing clinical applicability. Our work thus demonstrates the importance of detailed reporting and analysis of the performance of machine learning approaches in this field, which is crucial both for progress in the field and for the adoption of such models in practice.
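The central claim above — that equal AUROCs can hide very different clinical behaviour — can be illustrated with a minimal, self-contained sketch. The scores below are invented for illustration (they are not from the chapter's experiments): two hypothetical classifiers achieve an identical AUROC on the same eight images, yet at a fixed decision threshold one produces a false positive where the other does not, i.e. their specificities differ.

```python
def auroc(neg, pos):
    """AUROC via the Mann-Whitney formulation: the fraction of
    (positive, negative) score pairs ranked correctly, ties counting 0.5."""
    pairs = [(p, n) for p in pos for n in neg]
    wins = sum(p > n for p, n in pairs) + 0.5 * sum(p == n for p, n in pairs)
    return wins / len(pairs)

def sens_spec(neg, pos, thr):
    """Sensitivity and specificity at a fixed operating threshold."""
    sens = sum(p >= thr for p in pos) / len(pos)
    spec = sum(n < thr for n in neg) / len(neg)
    return sens, spec

# Hypothetical per-image disease scores from two models (4 negatives, 4 positives)
neg_a, pos_a = [0.1, 0.2, 0.3, 0.6], [0.4, 0.7, 0.8, 0.9]
neg_b, pos_b = [0.1, 0.2, 0.3, 0.45], [0.35, 0.55, 0.6, 0.7]

print(auroc(neg_a, pos_a), auroc(neg_b, pos_b))   # identical: 0.9375 and 0.9375
print(sens_spec(neg_a, pos_a, 0.5))               # (0.75, 0.75)
print(sens_spec(neg_b, pos_b, 0.5))               # (0.75, 1.0)
```

Both models score AUROC = 0.9375, yet at the deployed threshold of 0.5 model A mislabels one healthy patient while model B mislabels none — exactly the kind of operating-point difference, with direct consequences for patients, that a single ranking-based summary statistic cannot convey.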
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this chapter
Valsson, S., Arandjelović, O. (2023). The Interpretation of Deep Learning Based Analysis of Medical Images—An Examination of Methodological and Practical Challenges Using Chest X-ray Data. In: Shaban-Nejad, A., Michalowski, M., Bianco, S. (eds) Multimodal AI in Healthcare. Studies in Computational Intelligence, vol 1060. Springer, Cham. https://doi.org/10.1007/978-3-031-14771-5_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-14770-8
Online ISBN: 978-3-031-14771-5
eBook Packages: Intelligent Technologies and Robotics (R0)