A Gesture-Based Concept for Speech Movement Control in Articulatory Speech Synthesis

  • Bernd J. Kröger
  • Peter Birkholz
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4775)


An articulatory speech synthesizer comprising a three-dimensional vocal tract model and a gesture-based concept for the control of articulatory movements is introduced and discussed in this paper. A modular learning concept based on speech perception is outlined for the creation of gestural control rules. The learning concept draws on sensory feedback information for articulatory states produced by the model itself, as well as on auditory and visual information from speech items produced by external speakers. The complete model (control module and synthesizer) is capable of producing high-quality synthetic speech signals and provides a scheme for the natural speech production and speech perception processes.
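As a rough illustration of gesture-based movement control (a sketch, not the authors' implementation), a single articulatory gesture is commonly modeled in the task-dynamics literature as a critically damped second-order system that drives an articulator toward a spatial target without overshoot. The function name and parameter values below are illustrative assumptions.

```python
import math

def simulate_gesture(x0, target, stiffness=400.0, dt=0.001, duration=0.3):
    """Critically damped second-order dynamics: the articulator position x
    approaches `target` asymptotically and without overshoot, a common
    abstraction for a single speech gesture."""
    x, v = x0, 0.0
    damping = 2.0 * math.sqrt(stiffness)  # critical damping coefficient
    trajectory = []
    for _ in range(int(duration / dt)):
        a = stiffness * (target - x) - damping * v  # restoring force minus damping
        v += a * dt  # semi-implicit Euler integration step
        x += v * dt
        trajectory.append(x)
    return trajectory

# Example: move an articulator from rest position 0.0 toward target 1.0.
traj = simulate_gesture(x0=0.0, target=1.0)
```

Overlapping several such systems in time, each tied to its own vocal tract variable, yields the kind of gestural score that a control module can use to drive the synthesizer.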


Keywords: articulation, speech synthesis, speech production, gesture





Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Bernd J. Kröger (1)
  • Peter Birkholz (2)
  1. Department of Phoniatrics, Pedaudiology, and Communication Disorders, University Hospital Aachen and Aachen University, 52074 Aachen, Germany
  2. Institute for Computer Science, University of Rostock, 18059 Rostock, Germany
