Tongue Contour Tracking in Ultrasound Images with Spatiotemporal LSTM Networks
Analysis of ultrasound images of the human tongue has many applications, such as tongue modeling, speech therapy, language education, and speech disorder diagnosis. In this paper, we propose a novel ultrasound tongue contour tracker that enforces constraints specific to ultrasound imaging of the tongue, such as the spatial and temporal smoothness of the tongue contours. We use three different LSTM networks in sequence to satisfy these constraints. The first network uses only the spatial image information from each video frame separately. The second and third networks add temporal information to the results of the first, spatial network. Our networks are designed with the ultrasound image formation process of the human tongue in mind. We use the polar Brightness-Mode (B-mode) representation of the ultrasound images, which makes it possible to assume that each column of the image contains at most one contour position. We tested our system on a dataset that we collected from 4 volunteers while they read written text. The final accuracy results are very promising: they exceed the state-of-the-art results while keeping run times at a reasonable level (several frames per second). We provide the complete results of our system as supplementary material.
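The assumption of at most one contour position per column can be illustrated with a minimal sketch. It is not the networks described above: it assumes, hypothetically, that some model has already produced a per-pixel contour score for a polar B-mode frame, and it only shows how such a map could be reduced to one row index per column. The function name `decode_contour`, the list-of-lists score map, and the 0.5 threshold are illustrative assumptions, not details taken from the paper.

```python
from typing import List, Optional


def decode_contour(
    score_map: List[List[float]], threshold: float = 0.5
) -> List[Optional[int]]:
    """Reduce a rows x columns score map to one contour row per column.

    Hypothetical helper: returns None for columns whose best score falls
    below the threshold, i.e. columns assumed to contain no contour point.
    """
    if not score_map:
        return []
    n_rows, n_cols = len(score_map), len(score_map[0])
    contour: List[Optional[int]] = []
    for col in range(n_cols):
        # Pick the single most likely row in this column.
        best_row = max(range(n_rows), key=lambda row: score_map[row][col])
        if score_map[best_row][col] >= threshold:
            contour.append(best_row)
        else:
            contour.append(None)  # no contour position in this column
    return contour


# Toy 3 x 3 frame: the middle column has no confident contour point.
frame = [
    [0.1, 0.0, 0.2],
    [0.8, 0.1, 0.3],
    [0.1, 0.2, 0.9],
]
print(decode_contour(frame))  # [1, None, 2]
```

Under this encoding a contour is a sequence with one entry per image column, which is the kind of ordered input that sequence models such as LSTMs operate on.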
We would like to thank Dr. Naci Dumlu of Pendik State Hospital, Istanbul, for providing the experimental environment.