
Tongue Contour Tracking in Ultrasound Images with Spatiotemporal LSTM Networks

  • Enes Aslan
  • Yusuf Sinan Akgul
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11824)

Abstract

Analysis of ultrasound images of the human tongue has many applications, such as tongue modeling, speech therapy, language education, and speech disorder diagnosis. In this paper we propose a novel ultrasound tongue contour tracker that enforces constraints inherent to ultrasound imaging of the tongue, namely the spatial and temporal smoothness of the tongue contours. We use three different LSTM networks in sequence to satisfy these constraints. The first network uses only spatial image information from each video frame separately; the second and third networks add temporal information to the results of the first, spatial network. Our networks are designed by considering the ultrasound image formation process of the human tongue: we work on the polar Brightness-Mode (B-mode) representation of the ultrasound images, which makes it possible to assume that each image column contains at most one contour position. We tested our system on a dataset collected from four volunteers while they read written text. The final accuracy results are very promising: they exceed state-of-the-art results while keeping run times at very reasonable levels (several frames per second). We provide the complete results of our system as supplementary material.
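To make the three-stage design concrete, below is a minimal, hypothetical sketch in Keras of how such a pipeline could be wired. The layer widths, the frame resolution H x W, the clip length T, the bidirectional wiring, and the per-column row regression are all assumptions made for illustration; the paper's actual architecture, losses, and training procedure may differ.

# Hypothetical sketch of a three-stage LSTM contour tracker; sizes and
# wiring are assumptions, not the authors' exact architecture.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

H, W = 64, 128   # assumed polar B-mode frame size (H rows x W columns)
T = 16           # assumed number of frames per clip

def build_spatial_lstm():
    # Stage 1: each frame is read as a sequence of W columns (one
    # H-pixel vector per column). Because a polar B-mode column holds
    # at most one contour point, the network regresses a single row
    # coordinate per column.
    cols = keras.Input(shape=(W, H))
    x = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(cols)
    y = layers.TimeDistributed(layers.Dense(1))(x)   # (W, 1) row estimates
    return keras.Model(cols, layers.Reshape((W,))(y))

def build_temporal_lstm():
    # Stages 2 and 3: take the per-frame contour estimates of the
    # previous stage as a time sequence and smooth them across frames.
    contours = keras.Input(shape=(T, W))
    x = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(contours)
    return keras.Model(contours, layers.TimeDistributed(layers.Dense(W))(x))

spatial = build_spatial_lstm()
temporal1, temporal2 = build_temporal_lstm(), build_temporal_lstm()

# Inference over one clip: per-frame spatial estimates, then two rounds
# of temporal refinement (the second and third networks of the abstract).
clip = np.random.rand(1, T, W, H).astype("float32")   # placeholder video
per_frame = np.stack(
    [spatial.predict(clip[:, t], verbose=0) for t in range(T)], axis=1)
refined = temporal2.predict(temporal1.predict(per_frame, verbose=0), verbose=0)

Running the stages in sequence like this lets the spatial network commit to a per-frame estimate first, with the temporal networks then enforcing smoothness across frames, which matches the constraint ordering described in the abstract.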


Acknowledgement

We would like to thank Dr. Naci Dumlu of Pendik State Hospital, Istanbul, for providing the experiment environment.

Supplementary material

Supplementary material 1: 480714_1_En_36_MOESM1_ESM.zip (18.9 MB)


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Computer Engineering, GIT Vision Lab, Gebze Technical University, Kocaeli, Turkey
  2. R&D Department, Kuveyt Turk Participation Bank, Kocaeli, Turkey
