Semantic Tree Driven Thyroid Ultrasound Report Generation by Voice Input

  • Conference paper
Advances in Computer Vision and Computational Biology

Abstract

Automatic speech recognition has achieved quite good performance in the medical domain over the past several years. However, practical solutions that take the characteristics of real applications into account are still lacking. In this work, we develop an approach that automatically generates semantically coherent ultrasound reports from voice input. The solution includes key algorithms based on a proposed semantic tree structure. Radiologists do not need to follow fixed templates; they only need to speak their specific observations for individual patients. We have carried out a set of experiments on a real-world thyroid ultrasound dataset of more than 40,000 reports from a reputable hospital in Shanghai, China. The experimental results show that our proposed solution can generate concise and accurate reports.

Acknowledgement

This work was supported by the National Key R&D Program of China under Grant 2019YFE0190500.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Liu, L., Wang, M., Dong, Y., Zhao, W., Yang, J., Su, J. (2021). Semantic Tree Driven Thyroid Ultrasound Report Generation by Voice Input. In: Arabnia, H.R., Deligiannidis, L., Shouno, H., Tinetti, F.G., Tran, QN. (eds) Advances in Computer Vision and Computational Biology. Transactions on Computational Science and Computational Intelligence. Springer, Cham. https://doi.org/10.1007/978-3-030-71051-4_32

  • DOI: https://doi.org/10.1007/978-3-030-71051-4_32

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-71050-7

  • Online ISBN: 978-3-030-71051-4

  • eBook Packages: Computer Science, Computer Science (R0)
