
A Self-guided Framework for Radiology Report Generation

  • Conference paper
Medical Image Computing and Computer Assisted Intervention – MICCAI 2022 (MICCAI 2022)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13438))

Abstract

Automatic radiology report generation is essential to computer-aided diagnosis. Building on the success of image captioning, medical report generation has become achievable. However, the lack of annotated disease labels remains the bottleneck in this area, and the image-text data bias and long, complex sentences make it even harder to generate accurate reports. To address these gaps, we present a self-guided framework (SGF), a suite of unsupervised and supervised deep learning methods that mimics the process of human learning and writing. Specifically, our framework obtains domain knowledge from medical reports without extra disease labels and guides itself to extract fine-grained visual features associated with the text. Moreover, SGF improves the accuracy and length of generated medical reports by incorporating a similarity comparison mechanism that imitates the process of human self-improvement through comparative practice. Extensive experiments demonstrate the utility of our SGF in the majority of cases, showing its superior performance over state-of-the-art methods. Our results highlight the capacity of the proposed framework to distinguish fine-grained visual details between words and verify its advantage in generating medical reports.
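The two ideas in the abstract — deriving unsupervised pseudo-labels from unannotated reports, and scoring a generated report against a reference by similarity — can be illustrated with a minimal sketch. This is a toy stand-in, not the paper's method: the actual framework would use learned sentence embeddings and a proper clustering algorithm, whereas here `embed`, `cosine`, `cluster_reports`, and the similarity threshold are all illustrative assumptions using plain bag-of-words vectors.

```python
import math
from collections import Counter

def embed(sentence):
    """Toy bag-of-words embedding; a real system would use a learned text encoder."""
    return Counter(sentence.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(cnt * b.get(term, 0) for term, cnt in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def cluster_reports(reports, threshold=0.5):
    """Greedy single-pass clustering: each cluster id acts as an
    unsupervised pseudo disease label, so no annotation is needed."""
    centroids, labels = [], []
    for report in reports:
        vec = embed(report)
        best, best_sim = None, threshold
        for i, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim >= best_sim:
                best, best_sim = i, sim
        if best is None:
            centroids.append(vec)       # start a new cluster / pseudo-label
            labels.append(len(centroids) - 1)
        else:
            labels.append(best)
    return labels

reports = [
    "the heart size is normal",
    "heart size is within normal limits",
    "there is a left lower lobe opacity",
]
# First two reports describe the same finding and share a pseudo-label.
print(cluster_reports(reports))  # -> [0, 0, 1]

# Similarity comparison: score a generated report against its reference.
generated = "the heart is normal in size"
reference = "the heart size is normal"
print(round(cosine(embed(generated), embed(reference)), 2))  # -> 0.91
```

The greedy threshold clustering here is only a placeholder for the density-based grouping a real pipeline would use; the point is that cluster membership supplies free supervision, and the same similarity function can later rank generated reports against ground truth during training.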



Acknowledgements

This work was supported in part by the Key-Area Research and Development Program of Guangdong Province (No. 2020B0909020002), the National Natural Science Foundation of China (Grant No. 62003330), Shenzhen Fundamental Research Funds (Grant Nos. JCYJ20200109114233670, JCYJ20190807170407391), and the Guangdong Provincial Key Laboratory of Computer Vision and Virtual Reality Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China. This work was also supported by the Guangdong-Hong Kong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institute of Advanced Technology.

Author information


Corresponding authors

Correspondence to Shibo Li or Ying Hu.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Li, J., Li, S., Hu, Y., Tao, H. (2022). A Self-guided Framework for Radiology Report Generation. In: Wang, L., Dou, Q., Fletcher, P.T., Speidel, S., Li, S. (eds) Medical Image Computing and Computer Assisted Intervention – MICCAI 2022. MICCAI 2022. Lecture Notes in Computer Science, vol 13438. Springer, Cham. https://doi.org/10.1007/978-3-031-16452-1_56


  • DOI: https://doi.org/10.1007/978-3-031-16452-1_56

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-16451-4

  • Online ISBN: 978-3-031-16452-1

  • eBook Packages: Computer Science (R0)
