Building a Talking Baby Robot: A Contribution to the Study of Speech Acquisition and Evolution
Abstract
Speech is a perceptuo-motor system. A natural computational modelling framework is provided by cognitive robotics, or more precisely speech robotics, which is also based on embodiment, multimodality, development, and interaction. This chapter describes the bases of a virtual baby robot, which comprises an articulatory model that integrates the non-uniform growth of the vocal tract, a set of sensors, and a learning model. The articulatory model delivers the sagittal contour, lip shape and acoustic formants from seven input parameters that characterize the configurations of the jaw, the tongue, the lips and the larynx. To simulate the growth of the vocal tract from birth to adulthood, a process modifies the longitudinal dimension of the vocal tract shape as a function of age. The auditory system of the robot comprises a “phasic” system for event detection over time, and a “tonic” system to track formants. The model of visual perception specifies the basic lip characteristics: height, width, area and protrusion. The orosensorial channel, which provides tactile sensations on the lips, the tongue and the palate, is elaborated as a model for the prediction of tongue–palatal contacts from articulatory commands. Learning relies on Bayesian programming, which proceeds in two phases: (i) specification of the variables, decomposition of the joint distribution and identification of the free parameters through exploration of a learning set; and (ii) utilization, which relies on questions about the joint distribution.
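The two phases of Bayesian programming named above can be illustrated with a toy sketch. This is not the chapter's model: the discrete variables (a motor command M and a sensory percept S), the decomposition P(M, S) = P(M) P(S | M), the Laplace smoothing, the function names and the learning set are all assumptions made up for illustration.

```python
from collections import Counter


def identify(learning_set, motor_values, sensor_values, alpha=1.0):
    """Phase (i): estimate the free parameters P(M) and P(S | M) from
    (motor, sensor) pairs, with Laplace smoothing (pseudo-count alpha)."""
    motor_counts = Counter(m for m, _ in learning_set)
    pair_counts = Counter(learning_set)
    n = len(learning_set)
    p_m = {
        m: (motor_counts[m] + alpha) / (n + alpha * len(motor_values))
        for m in motor_values
    }
    p_s_given_m = {
        m: {
            s: (pair_counts[(m, s)] + alpha)
            / (motor_counts[m] + alpha * len(sensor_values))
            for s in sensor_values
        }
        for m in motor_values
    }
    return p_m, p_s_given_m


def ask_motor_given_sensor(p_m, p_s_given_m, s):
    """Phase (ii): answer the question P(M | S = s) from the joint
    distribution P(M) P(S | M), using Bayes' rule."""
    unnormalized = {m: p_m[m] * p_s_given_m[m][s] for m in p_m}
    total = sum(unnormalized.values())
    return {m: v / total for m, v in unnormalized.items()}


# Hypothetical learning set gathered by "exploration" (invented data).
motor_values = ["jaw_open", "jaw_closed"]
sensor_values = ["high_F1", "low_F1"]
learning_set = (
    [("jaw_open", "high_F1")] * 8
    + [("jaw_open", "low_F1")] * 2
    + [("jaw_closed", "low_F1")] * 9
    + [("jaw_closed", "high_F1")] * 1
)

p_m, p_s_given_m = identify(learning_set, motor_values, sensor_values)
posterior = ask_motor_given_sensor(p_m, p_s_given_m, "high_F1")
print(posterior)
```

In this sketch the same learned joint distribution could also answer the forward question P(S | M = m), which is the sense in which utilization "relies on questions about the joint distribution".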
Keywords
Speech Production, Vocal Tract, Speech Sound, Speech Development, Audiovisual Speech