Biological Cybernetics, Volume 72, Issue 1, pp 43–53

A neural network model of speech acquisition and motor equivalent speech production

  • Frank H. Guenther


This article describes a neural network model that addresses the acquisition of speaking skills by infants and the subsequent motor equivalent production of speech sounds. The model learns two mappings during a babbling phase. A phonetic-to-orosensory mapping specifies a vocal tract target for each speech sound; these targets take the form of convex regions in orosensory coordinates defining the shape of the vocal tract. The babbling process in which these convex-region targets are formed explains how an infant can learn phoneme-specific and language-specific limits on the acceptable variability of articulator movements. The model also learns an orosensory-to-articulatory mapping in which cells coding desired movement directions in orosensory space learn articulator movements that achieve these orosensory movement directions. The resulting mapping provides a natural explanation for the formation of coordinative structures. This mapping also makes efficient use of redundancy in the articulator system, thereby giving the model motor equivalent capabilities. Simulations verify the model's ability to compensate automatically, without new learning, for constraints or perturbations applied to the articulators, and its ability to explain the contextual variability seen in human speech production.
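The two mappings described in the abstract can be pictured with a small numerical sketch. This is not the model's neural implementation. It assumes (1) that a convex-region target is an axis-aligned box in orosensory space, grown during babbling to enclose every orosensory state that produced the sound; (2) a hypothetical linear forward model `J` relating three articulators to two orosensory dimensions; and (3) the Moore-Penrose pseudoinverse of `J` as a swapped-in stand-in for the learned orosensory-to-articulatory direction mapping. The names `BoxTarget`, `reach`, `J`, and the sound-detection rule are illustrative only.

```python
import numpy as np


class BoxTarget:
    """Hypothetical convex-region target: an axis-aligned box in orosensory space."""

    def __init__(self, dim):
        self.lo = np.full(dim, np.inf)
        self.hi = np.full(dim, -np.inf)

    def expand(self, o):
        # Babbling: enlarge the box just enough to include orosensory state o.
        self.lo = np.minimum(self.lo, o)
        self.hi = np.maximum(self.hi, o)

    def direction(self, o):
        # Desired orosensory movement: toward the nearest point of the box (zero inside it).
        if np.any(self.lo > self.hi):
            raise ValueError("target has not been learned yet")
        return np.clip(o, self.lo, self.hi) - o


def reach(target, J, a0, clamped=None, steps=50):
    """Closed-loop movement toward a target using a fixed direction-to-articulator map.

    P (pseudoinverse of J, an assumed stand-in for the learned mapping) is computed
    once and never updated, even when an articulator is clamped.
    """
    P = np.linalg.pinv(J)
    a = np.asarray(a0, dtype=float).copy()
    fixed = a[clamped] if clamped is not None else None
    for _ in range(steps):
        d = target.direction(J @ a)   # desired orosensory movement direction
        a = a + P @ d                 # articulator movement intended to produce it
        if clamped is not None:
            a[clamped] = fixed        # perturbation: this articulator cannot move
    return a


# Hypothetical forward model: articulators (jaw, tongue, lip) -> 2 orosensory dims.
# Jaw and tongue both affect dimension 0, so the articulator system is redundant.
J = np.array([[1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])

# Babbling: random articulator configurations; a hypothetical rule stands in for
# recognizing that the produced sound was the target phoneme.
rng = np.random.default_rng(0)
target = BoxTarget(dim=2)
for _ in range(200):
    a = rng.uniform(-1.0, 1.0, size=3)
    o = J @ a
    if o[0] > 0.5 and o[1] < -0.2:
        target.expand(o)

a_start = np.zeros(3)
a_free = reach(target, J, a_start)             # all articulators free
a_bite = reach(target, J, a_start, clamped=0)  # jaw fixed, as with a bite block
```

In this toy setting, clamping the jaw leaves the mapping `P` untouched, yet orosensory feedback keeps driving the tongue until the target box is reached; the tongue ends up moving about twice as far as in the unperturbed case. Because the goal is a region rather than a point, movement stops at the nearest edge of the box. This only illustrates the idea of motor equivalence and is not a reproduction of the paper's simulations.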



Copyright information

© Springer-Verlag 1994

Authors and Affiliations

  • Frank H. Guenther, Center for Adaptive Systems and Department of Cognitive and Neural Systems, Boston University, Boston, USA
