Tongue model construction based on ultrasound images with image processing and deep learning method

  • Original Article—Physics & Engineering
  • Journal of Medical Ultrasonics

Abstract

Purpose

The purpose of this paper is to construct a 3D tongue model and to generate an animation of tongue movement for speech therapy in patients with lateral articulation (LA).

Methods

The 3D tongue model is generated based on ultrasound (US) images, which are widely used in many clinics. A tongue model is constructed by extracting the tongue surfaces from US images with the help of image processing techniques and a deep learning method. A reference tongue model is generated first using US images of a normal speaker, and a model of an LA patient is then constructed by modifying the reference tongue model. An animation of the tongue movement is generated by deforming the model according to a time sequence.
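The final step, deforming the model according to a time sequence, can be pictured with a minimal sketch. The abstract does not say how the deformation is computed, so everything here is an assumption for illustration: the tongue surface is reduced to a 2D contour of matching points, key contours are blended by plain linear interpolation, and the names `interpolate_contour` and `animate` are hypothetical rather than taken from the paper.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Contour = List[Point]


def interpolate_contour(start: Contour, end: Contour, t: float) -> Contour:
    """Blend two contours point by point; t=0 gives start, t=1 gives end.

    Assumption: both contours have the same number of points and the
    points correspond to each other in order.
    """
    if len(start) != len(end):
        raise ValueError("contours must have the same number of points")
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return [
        ((1.0 - t) * sx + t * ex, (1.0 - t) * sy + t * ey)
        for (sx, sy), (ex, ey) in zip(start, end)
    ]


def animate(keyframes: List[Contour], steps_per_segment: int) -> List[Contour]:
    """Produce in-between frames for a time-ordered list of key contours."""
    if not keyframes:
        raise ValueError("at least one key contour is required")
    if steps_per_segment < 1:
        raise ValueError("steps_per_segment must be at least 1")
    frames: List[Contour] = []
    for start, end in zip(keyframes, keyframes[1:]):
        for step in range(steps_per_segment):
            frames.append(interpolate_contour(start, end, step / steps_per_segment))
    frames.append(list(keyframes[-1]))  # finish exactly on the last key contour
    return frames


if __name__ == "__main__":
    # Made-up coordinates, used only to show the call pattern.
    rest = [(0.0, 0.0), (1.0, 2.0)]
    raised = [(0.0, 2.0), (1.0, 4.0)]
    for frame in animate([rest, raised], steps_per_segment=4):
        print(frame)
```

A real 3D model would deform a surface mesh rather than a short 2D contour, and the interpolation scheme could well differ; the sketch only shows how a time-ordered set of shapes turns into animation frames.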

Results

The accuracy of the tongue surfaces estimated by the deep learning method was 22/45 = 49% for US images of a normal speaker and 29/45 = 64% for US images of an LA patient. The maximum vertical errors between the ground truth and the estimated spline curves were 1.01 mm and 1.03 mm for the normal speaker and the LA patient, respectively.
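The two kinds of figures reported above can be reproduced in form with a short sketch. It rests on several assumptions: each curve is represented as points with strictly increasing x, piecewise-linear interpolation stands in for evaluating the spline, and the helper names `interp_y`, `max_vertical_error`, and `accuracy_percent` are hypothetical. The abstract does not state the criterion by which an estimated surface counts as correct, so the accuracy helper only performs the ratio arithmetic.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple

Curve = List[Tuple[float, float]]  # (x, y) points with strictly increasing x


def interp_y(curve: Curve, x: float) -> float:
    """Evaluate a curve at x by piecewise-linear interpolation."""
    xs = [px for px, _ in curve]
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x lies outside the curve's range")
    i = bisect_right(xs, x) - 1
    if i == len(curve) - 1:  # x coincides with the last sample
        return curve[-1][1]
    (x0, y0), (x1, y1) = curve[i], curve[i + 1]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def max_vertical_error(truth: Curve, estimate: Curve, xs: Sequence[float]) -> float:
    """Largest absolute difference in y between two curves at the given x positions."""
    return max(abs(interp_y(truth, x) - interp_y(estimate, x)) for x in xs)


def accuracy_percent(correct: int, total: int) -> int:
    """Share of correct estimates, rounded to a whole percent."""
    return round(100 * correct / total)


if __name__ == "__main__":
    # The counts come from the abstract; the curves below are made up.
    print(accuracy_percent(22, 45), accuracy_percent(29, 45))
    truth = [(0.0, 0.0), (10.0, 10.0)]
    estimate = [(0.0, 0.5), (10.0, 10.0)]
    print(max_vertical_error(truth, estimate, [0.0, 5.0, 10.0]))
```

Because the error is a maximum rather than a mean, a single poorly fitted region of the curve determines the reported value.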

Conclusion

We have constructed a tongue model and generated a tongue movement animation of an LA patient using US images. The maximum vertical error between the ground truth and the estimated spline curves was only 1.03 mm, and we have confirmed that the generated tongue model is very useful for speech therapy in LA patients.

Acknowledgements

This project was carried out by Takahiro Kondo, Tsuyoshi Ishizu, Maiko Kinoshita, Ryoma Yata, and Yutaro Matsutomo, graduates of Tokyo City University. The authors also thank Dr. Yukari Yamashita and Hiroko Yamada (Showa University School of Dentistry) and Dr. Kazuko Hasegawa (Kamiinaseikyo Hospital) for their support.

Author information

Corresponding author

Correspondence to Nobuhiko Mukai.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest.

Ethical statements

All procedures followed were in accordance with the ethical standards of the responsible committee on human experimentation (institutional and national) and with the Helsinki Declaration of 1964 and later versions.

Informed consent

Informed consent was obtained from all participants for being included in the study.

About this article

Cite this article

Mukai, N., Mori, K. & Takei, Y. Tongue model construction based on ultrasound images with image processing and deep learning method. J Med Ultrasonics 49, 153–161 (2022). https://doi.org/10.1007/s10396-022-01193-8

  • DOI: https://doi.org/10.1007/s10396-022-01193-8