Abstract
Automatic speech recognition has achieved good performance in the medical domain over the past several years. However, practical solutions that account for the characteristics of real applications are still lacking. In this work, we develop an approach that automatically generates semantically coherent ultrasound reports from voice input. The solution comprises key algorithms built on a proposed semantic tree structure. Radiologists do not need to follow fixed templates; they only need to speak their specific observations for each patient. We carried out a set of experiments on a real-world thyroid ultrasound dataset of more than 40,000 reports from a reputable hospital in Shanghai, China. The experimental results show that the proposed solution can generate concise and accurate reports.
Acknowledgement
This work was supported by the National Key R&D Program of China under Grant 2019YFE0190500.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Liu, L., Wang, M., Dong, Y., Zhao, W., Yang, J., Su, J. (2021). Semantic Tree Driven Thyroid Ultrasound Report Generation by Voice Input. In: Arabnia, H.R., Deligiannidis, L., Shouno, H., Tinetti, F.G., Tran, QN. (eds) Advances in Computer Vision and Computational Biology. Transactions on Computational Science and Computational Intelligence. Springer, Cham. https://doi.org/10.1007/978-3-030-71051-4_32
DOI: https://doi.org/10.1007/978-3-030-71051-4_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-71050-7
Online ISBN: 978-3-030-71051-4
eBook Packages: Computer Science, Computer Science (R0)