
Weakly Supervised One-Stage Vision and Language Disease Detection Using Large Scale Pneumonia and Pneumothorax Studies

Conference paper

Medical Image Computing and Computer Assisted Intervention – MICCAI 2020 (MICCAI 2020)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12264)

Abstract

Detecting clinically relevant objects in medical images is a challenge despite the availability of large datasets, owing to the lack of detailed labels. To address the label issue, we utilize scene-level labels with a detection architecture that incorporates natural language information. We present a challenging new set of radiologist-paired bounding box and natural language annotations on the publicly available MIMIC-CXR dataset, focused especially on pneumonia and pneumothorax. Along with the dataset, we present a joint vision-language, weakly supervised, transformer layer-selected, one-stage, dual-head detection architecture (LITERATI), alongside strong baseline comparisons with class activation mapping (CAM), gradient CAM, and relevant implementations on the NIH ChestXray-14 and MIMIC-CXR datasets. Borrowing from advances in vision-language architectures, the LITERATI method demonstrates joint image and referring expression (objects localized in the image using natural language) input for detection, and it scales in a purely weakly supervised fashion. The architectural modifications address three obstacles: implementing a supervised vision and language detection method in a weakly supervised fashion, incorporating clinical referring expression natural language information, and generating high-fidelity detections with map probabilities. Nevertheless, the challenging clinical nature of the radiologist annotations, including subtle references, multi-instance specifications, and relatively verbose underlying medical reports, ensures that the vision-language detection task at scale remains stimulating for future investigation.
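For orientation, the class activation mapping (CAM) baseline mentioned above can be sketched in a few lines of PyTorch: the classifier weights for a target class re-weight the final convolutional feature maps, and the upsampled, thresholded heat map serves as a weak localization. This is a minimal illustration of the CAM baseline only, not the LITERATI architecture; the backbone (a torchvision ResNet-18) and the 0.6 threshold are illustrative assumptions rather than details reported in the paper.

```python
# Minimal sketch of the class activation mapping (CAM) baseline, not the LITERATI
# architecture. Backbone (torchvision ResNet-18) and threshold are illustrative
# assumptions, not choices reported in the paper.
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(pretrained=True).eval()

feature_maps = {}

def save_feature_maps(module, inputs, output):
    # Cache the final convolutional feature maps, shape (N, C, h, w).
    feature_maps["conv"] = output

model.layer4.register_forward_hook(save_feature_maps)

def class_activation_map(image, class_idx):
    """Return a normalized heat map for one class plus its predicted probability."""
    with torch.no_grad():
        logits = model(image)                         # (1, num_classes)
    fmap = feature_maps["conv"]                       # (1, C, h, w)
    weights = model.fc.weight[class_idx]              # (C,) classifier weights for the class
    cam = torch.einsum("c,nchw->nhw", weights, fmap)  # weighted sum over channels
    cam = F.relu(cam)
    cam = F.interpolate(cam.unsqueeze(1), size=image.shape[-2:],
                        mode="bilinear", align_corners=False).squeeze()
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    return cam, logits.softmax(dim=-1)[0, class_idx]

# Usage: a weak "detection" is the high-activation region of the heat map.
image = torch.randn(1, 3, 224, 224)    # stand-in for a preprocessed chest X-ray tensor
heat_map, prob = class_activation_map(image, class_idx=0)
weak_box_mask = heat_map > 0.6         # illustrative threshold for a pseudo bounding region
```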



Author information

Corresponding author: Leo K. Tam.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Tam, L.K., Wang, X., Turkbey, E., Lu, K., Wen, Y., Xu, D. (2020). Weakly Supervised One-Stage Vision and Language Disease Detection Using Large Scale Pneumonia and Pneumothorax Studies. In: Martel, A.L., et al. (eds.) Medical Image Computing and Computer Assisted Intervention – MICCAI 2020. Lecture Notes in Computer Science, vol. 12264. Springer, Cham. https://doi.org/10.1007/978-3-030-59719-1_5

  • DOI: https://doi.org/10.1007/978-3-030-59719-1_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-59718-4

  • Online ISBN: 978-3-030-59719-1

  • eBook Packages: Computer Science, Computer Science (R0)
