Tongue model construction based on ultrasound images with image processing and deep learning method

Mukai, Nobuhiko; Mori, Kimie; Takei, Yoshiko

doi:10.1007/s10396-022-01193-8

Original Article—Physics & Engineering
Published: 18 February 2022

Volume 49, pages 153–161 (2022)

Journal of Medical Ultrasonics

Abstract

Purpose

The purpose of this paper is to construct a 3D tongue model and to generate an animation of tongue movement for speech therapy in patients with lateral articulation (LA).

Methods

The 3D tongue model is generated based on ultrasound (US) images, which are widely used in many clinics. A tongue model is constructed by extracting the tongue surfaces from US images with the help of image processing techniques and a deep learning method. A reference tongue model is generated first using US images of a normal speaker, and a model of an LA patient is then constructed by modifying the reference tongue model. An animation of the tongue movement is generated by deforming the model according to a time sequence.
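The last step, deforming the model according to a time sequence, can be pictured with a minimal sketch. It assumes the model is stored as a list of 3D vertices and that a few keyframe shapes are available at known times; the function name `interpolate_model`, the keyframe layout, and the use of linear interpolation are illustrative assumptions, not the authors' stated method.

```python
from bisect import bisect_right


def interpolate_model(keyframes, t):
    """Return the vertex positions at time t.

    keyframes: list of (time, vertices) pairs sorted by time, where
    vertices is a list of (x, y, z) tuples with the same length in
    every keyframe. (Hypothetical layout, assumed for this sketch.)
    """
    times = [time for time, _ in keyframes]
    # Clamp to the first or last keyframe outside the recorded range.
    if t <= times[0]:
        return list(keyframes[0][1])
    if t >= times[-1]:
        return list(keyframes[-1][1])
    # Find the keyframe interval [t0, t1) that contains t.
    i = bisect_right(times, t) - 1
    t0, v0 = keyframes[i]
    t1, v1 = keyframes[i + 1]
    w = (t - t0) / (t1 - t0)
    # Blend each vertex linearly between the two keyframe shapes.
    return [
        tuple(a + w * (b - a) for a, b in zip(p, q))
        for p, q in zip(v0, v1)
    ]


keyframes = [
    (0.0, [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]),
    (1.0, [(0.0, 2.0, 0.0), (1.0, 4.0, 0.0)]),
]
halfway = interpolate_model(keyframes, 0.5)
```

Evaluating the function at successive frame times would yield one deformed shape per animation frame.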

Results

The accuracies of the tongue surfaces estimated by the deep learning method were 22/45 = 49% and 29/45 = 64% for US images of a normal speaker and an LA patient, respectively. In addition, the maximum vertical errors between the ground truth and the estimated spline curves were 1.01 and 1.03 mm for US images of a normal speaker and an LA patient, respectively.
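The two kinds of figures reported here can be reproduced with simple arithmetic. The sketch below assumes that accuracy is the fraction of images judged correct (the abstract does not state the criterion) and that the vertical error is the absolute difference in height between two curves sampled at the same horizontal positions; the function names and the sample values are hypothetical.

```python
def accuracy_percent(correct, total):
    """Share of correctly estimated images, rounded to a whole percent."""
    return round(100 * correct / total)


def max_vertical_error(truth_y, estimate_y):
    """Largest absolute height difference between two curves.

    Both lists hold curve heights (for example in mm) sampled at the
    same horizontal positions. (Assumed sampling scheme.)
    """
    return max(abs(t - e) for t, e in zip(truth_y, estimate_y))


normal_accuracy = accuracy_percent(22, 45)   # counts from the Results
patient_accuracy = accuracy_percent(29, 45)  # counts from the Results

# Made-up sample heights, only to show how the error is computed.
example_error = max_vertical_error([1.0, 2.0, 3.0], [1.5, 2.0, 2.0])
```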

Conclusion

We have constructed a tongue model and generated a tongue movement animation of an LA patient using US images. The maximum vertical error between the ground truth and the estimated spline curves was only 1.03 mm, and we have confirmed that the generated tongue model is very useful for speech therapy in LA patients.

Acknowledgements

This project was performed by Takahiro Kondo, Tsuyoshi Ishizu, Maiko Kinoshita, Ryoma Yata, and Yutaro Matsutomo, who graduated from Tokyo City University. The authors also thank Dr. Yukari Yamashita and Hiroko Yamada (Showa University School of Dentistry) and Dr. Kazuko Hasegawa (Kamiinaseikyo Hospital) for their support.

Author information

Authors and Affiliations

Information Technology, Tokyo City University, 1-28-1 Tamazutsumi, Setagaya, Tokyo, 158-8557, Japan
Nobuhiko Mukai
Institute of Industrial Science, The University of Tokyo, 4-6-1 Komaba, Meguro, Tokyo, 153-8505, Japan
Nobuhiko Mukai
Special Needs Dentistry, Showa University School of Dentistry, 2-1-1 Kitasenzoku, Ohta, Tokyo, 145-8515, Japan
Kimie Mori & Yoshiko Takei

Corresponding author

Correspondence to Nobuhiko Mukai.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest.

Ethical statements

All procedures followed were in accordance with the ethical standards of the responsible committee on human experimentation (institutional and national) and with the Helsinki Declaration of 1964 and later versions.

Informed consent

Informed consent was obtained from all participants for being included in the study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Mukai, N., Mori, K. & Takei, Y. Tongue model construction based on ultrasound images with image processing and deep learning method. J Med Ultrasonics 49, 153–161 (2022). https://doi.org/10.1007/s10396-022-01193-8

Received: 07 September 2021
Accepted: 14 January 2022
Published: 18 February 2022
Issue Date: April 2022
DOI: https://doi.org/10.1007/s10396-022-01193-8
