Abstract
Purpose
The purpose of this paper is to construct a 3D tongue model and to generate an animation of tongue movement for speech therapy in patients with lateral articulation (LA).
Methods
The 3D tongue model is generated from ultrasound (US) images, which are widely used in clinical practice. The tongue model is constructed by extracting the tongue surfaces from US images with the help of image processing techniques and a deep learning method. A reference tongue model is first generated from US images of a normal speaker, and a model of an LA patient is then constructed by modifying the reference tongue model. An animation of the tongue movement is generated by deforming the model according to a time sequence.
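The surface-extraction and animation steps can be illustrated with a minimal pure-Python sketch. It assumes that the segmentation stage (image processing or a deep network) has already produced a binary mask with row 0 at the top of the image, so that the tongue surface is the top-most foreground pixel in each column. This mask format is hypothetical. The abstract does not specify how the model is deformed over time, so linear interpolation between two key shapes is used here purely as an illustrative stand-in. The names `extract_surface`, `blend`, and `animate` are hypothetical and are not the authors' implementation.

```python
from typing import List, Tuple

# Hypothetical input format: binary mask (1 = tongue pixel), row 0 = top of image.
Mask = List[List[int]]
# (column x, row y) points along the upper tongue boundary.
Curve = List[Tuple[int, int]]


def extract_surface(mask: Mask) -> Curve:
    """Return the top-most tongue pixel in every column that contains one."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    surface: Curve = []
    for x in range(width):
        for y in range(height):
            if mask[y][x]:
                surface.append((x, y))
                break  # keep only the upper boundary of the tongue region
    return surface


def blend(reference: List[float], target: List[float], t: float) -> List[float]:
    """Linearly interpolate surface heights between two key shapes, 0 <= t <= 1."""
    if len(reference) != len(target):
        raise ValueError("both curves must be sampled at the same columns")
    return [(1.0 - t) * r + t * g for r, g in zip(reference, target)]


def animate(reference: List[float], target: List[float], n_frames: int) -> List[List[float]]:
    """Produce n_frames shapes from reference to target (illustrative stand-in only)."""
    if n_frames < 2:
        raise ValueError("need at least two frames")
    return [blend(reference, target, i / (n_frames - 1)) for i in range(n_frames)]


mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [1, 1, 1, 1],
]
surface = extract_surface(mask)
print(surface)  # [(0, 2), (1, 1), (2, 1), (3, 2)]
heights = [float(y) for _, y in surface]
frames = animate(heights, [2.0, 2.0, 2.0, 2.0], 3)
print(frames[1])  # [2.0, 1.5, 1.5, 2.0]
```

In a real pipeline the per-column heights would be converted to millimetres and extended to the 3D surface; this sketch only shows the 2D, single-slice idea.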
Results
The accuracies of the tongue surfaces estimated by the deep learning method were 22/45 (49%) and 29/45 (64%) for the US images of a normal speaker and an LA patient, respectively. In addition, the maximum vertical errors between the ground-truth and estimated spline curves were 1.01 mm and 1.03 mm for the US images of a normal speaker and an LA patient, respectively.
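The two metrics above can be reproduced with a small pure-Python sketch. Two readings are assumptions, not stated in the abstract: that "accuracy" is the share of the 45 test images whose estimated surface was judged correct, and that the "maximum vertical error" is the largest absolute height difference between the ground-truth and estimated curves, sampled at common horizontal positions. The abstract does not say which kind of spline was fitted; a natural cubic spline is used here only as one common choice, and coordinates are assumed to be already converted to millimetres. All function names are hypothetical.

```python
from bisect import bisect_right
from typing import Callable, Sequence


def accuracy_percent(n_correct: int, n_total: int) -> float:
    """Share (in %) of test images whose estimated surface was judged correct."""
    return 100.0 * n_correct / n_total


def natural_cubic_spline(xs: Sequence[float], ys: Sequence[float]) -> Callable[[float], float]:
    """Natural cubic spline through (xs, ys); xs must be strictly increasing."""
    n = len(xs) - 1
    if n < 1:
        raise ValueError("need at least two knots")
    h = [xs[i + 1] - xs[i] for i in range(n)]
    m = [0.0] * (n + 1)  # second derivatives; natural ends keep m[0] = m[n] = 0
    if n > 1:
        # Tridiagonal system for m[1..n-1], solved with the Thomas algorithm.
        a = [h[i - 1] for i in range(1, n)]                 # sub-diagonal
        b = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]  # main diagonal
        c = [h[i] for i in range(1, n)]                     # super-diagonal
        d = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
             for i in range(1, n)]
        for k in range(1, n - 1):  # forward elimination
            w = a[k] / b[k - 1]
            b[k] -= w * c[k - 1]
            d[k] -= w * d[k - 1]
        sol = [0.0] * (n - 1)
        sol[-1] = d[-1] / b[-1]
        for k in range(n - 3, -1, -1):  # back substitution
            sol[k] = (d[k] - c[k] * sol[k + 1]) / b[k]
        m[1:n] = sol

    def evaluate(x: float) -> float:
        i = min(max(bisect_right(xs, x) - 1, 0), n - 1)  # interval containing x
        t0, t1, width = xs[i + 1] - x, x - xs[i], h[i]
        return ((m[i] * t0 ** 3 + m[i + 1] * t1 ** 3) / (6.0 * width)
                + (ys[i] / width - m[i] * width / 6.0) * t0
                + (ys[i + 1] / width - m[i + 1] * width / 6.0) * t1)

    return evaluate


def max_vertical_error(truth, estimate, sample_xs: Sequence[float]) -> float:
    """Largest |y_truth(x) - y_estimate(x)| over the sampled horizontal positions."""
    return max(abs(truth(x) - estimate(x)) for x in sample_xs)


print(round(accuracy_percent(22, 45)), round(accuracy_percent(29, 45)))  # 49 64
truth = natural_cubic_spline([0, 1, 2], [0, 0, 0])
estimate = natural_cubic_spline([0, 1, 2], [0, 0.5, 1])
print(max_vertical_error(truth, estimate, [0, 0.5, 1, 1.5, 2]))  # 1.0
```

The rounding check confirms that 22/45 and 29/45 correspond to the reported 49% and 64%. Sampling both curves at the same horizontal positions keeps the error purely vertical, which matches the wording "maximum vertical error".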
Conclusion
We constructed a tongue model of an LA patient from US images and generated an animation of the patient's tongue movement. The maximum vertical error between the ground-truth and estimated spline curves was only 1.03 mm, and we confirmed that the generated tongue model is very useful for speech therapy in LA patients.
Acknowledgements
This project was carried out by Takahiro Kondo, Tsuyoshi Ishizu, Maiko Kinoshita, Ryoma Yata, and Yutaro Matsutomo, graduates of Tokyo City University. The authors also thank Dr. Yukari Yamashita and Hiroko Yamada (Showa University School of Dentistry) and Dr. Kazuko Hasegawa (Kamiinaseikyo Hospital) for their support.
Ethics declarations
Conflict of interest
The authors declare that they have no conflicts of interest.
Ethical statements
All procedures followed were in accordance with the ethical standards of the responsible committee on human experimentation (institutional and national) and with the Helsinki Declaration of 1964 and later versions.
Informed consent
Informed consent was obtained from all individual participants included in the study.
About this article
Cite this article
Mukai, N., Mori, K. & Takei, Y. Tongue model construction based on ultrasound images with image processing and deep learning method. J Med Ultrasonics 49, 153–161 (2022). https://doi.org/10.1007/s10396-022-01193-8
DOI: https://doi.org/10.1007/s10396-022-01193-8