Animated Pronunciation Generated from Speech for Pronunciation Training

  • Yurie Iribe
  • Silasak Manosavan
  • Kouichi Katsurada
  • Tsuneo Nitta
Part of the Smart Innovation, Systems and Technologies book series (SIST, volume 14)


Computer-assisted pronunciation training (CAPT) has been introduced into language education in recent years. CAPT uses speech recognition technology to score the learner’s pronunciation quality and point out mispronounced phonemes. However, although the learner can thus recognize that his/her speech differs from the teacher’s, this feedback does not show how to control the articulatory organs to pronounce correctly, so the learner cannot understand precisely how to correct the wrong articulatory gestures. We show these differences by visualizing both the learner’s incorrect pronunciation movements and the correct pronunciation movements with CG animation. We propose a system that generates animated pronunciation by automatically estimating a learner’s pronunciation movements from his/her speech. The proposed system uses multi-layer neural networks (MLN) to map speech to the coordinate values needed to generate the animations. We use MRI data to generate smooth animated pronunciations. Additionally, we verify through experimental evaluation whether the vocal tract area and articulatory features are suitable as characteristics of pronunciation movement.
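As an illustration of the speech-to-coordinate mapping described in the abstract, the following is a minimal sketch of a multi-layer network forward pass that turns one frame of speech features into a set of (x, y) coordinate values. It is not the authors' implementation: the layer sizes, the tanh hidden activation, the linear output layer, the helper names (`init_layer`, `dense`, `speech_to_coordinates`) and the random, untrained weights are all assumptions made for the example.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Return (weights, biases) for one layer.

    Weights are small random placeholders (untrained); a real system
    would learn them from paired speech and articulatory data.
    """
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer, activation=None):
    """Apply one fully connected layer to the input vector x."""
    weights, biases = layer
    out = [sum(w * v for w, v in zip(row, x)) + b
           for row, b in zip(weights, biases)]
    return [activation(o) for o in out] if activation else out


def speech_to_coordinates(features, layers):
    """Map one frame of speech features to a list of (x, y) points."""
    h = features
    for layer in layers[:-1]:
        h = dense(h, layer, math.tanh)   # hidden layers (assumed tanh)
    flat = dense(h, layers[-1])          # linear output for regression
    return list(zip(flat[0::2], flat[1::2]))


# Hypothetical sizes: 12 features per frame, 5 contour points to animate.
N_FEATURES, N_HIDDEN, N_POINTS = 12, 16, 5

rng = random.Random(0)
layers = [
    init_layer(N_FEATURES, N_HIDDEN, rng),
    init_layer(N_HIDDEN, N_HIDDEN, rng),
    init_layer(N_HIDDEN, 2 * N_POINTS, rng),
]

# A synthetic feature frame stands in for real acoustic features.
frame = [rng.gauss(0.0, 1.0) for _ in range(N_FEATURES)]
points = speech_to_coordinates(frame, layers)
```

In this sketch each call handles a single frame; applying it to consecutive frames would give one set of points per frame, which an animation layer could then interpolate and draw.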


Keywords: Vocal Tract Area; Articulatory Feature; Animated Pronunciation; Pronunciation Training





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Yurie Iribe (1)
  • Silasak Manosavan (2)
  • Kouichi Katsurada (2)
  • Tsuneo Nitta (2)

  1. Information and Media Center, Toyohashi University of Technology, Toyohashi, Japan
  2. Graduate School of Engineering, Toyohashi University of Technology, Toyohashi, Japan
