Multimodal Language Acquisition Based on Motor Learning and Interaction

Hörnstein, Jonas; Gustavsson, Lisa; Santos-Victor, José; Lacerda, Francisco

doi:10.1007/978-3-642-05181-4_20

Part of the book series: Studies in Computational Intelligence (SCI, volume 264)

Abstract

This work presents a developmental and ecological approach to language acquisition in robots, rooted in the interaction between infants and their caregivers. We show that the signal that caregivers direct to infants includes several hints that can facilitate language acquisition and reduce the need for preprogrammed linguistic structure. Moreover, infants also produce sounds, which enables richer types of interaction, such as imitation games, and the use of motor learning. By using a humanoid robot with embodied models of the infant’s ears, eyes, vocal tract, and memory functions, we can mimic the adult-infant interaction and take advantage of the inherent structure in the signal. Two experiments are presented, in which the robot learns a number of word-object associations and the articulatory target positions for a number of vowels.

Author information

Authors and Affiliations

Institute for Systems and Robotics (ISR), Instituto Superior Técnico, Lisbon, Portugal
Jonas Hörnstein & José Santos-Victor
Department of Linguistics, Stockholm University, Stockholm, Sweden
Lisa Gustavsson & Francisco Lacerda

Editor information

Editors and Affiliations

Institut des Systèmes Intelligents et de Robotique (CNRS UMR 7222), Université Pierre et Marie Curie Pyramide, Tour 55 Boîte courrier 173, 4 Place Jussieu, 75252, PARIS cedex 05, France
Olivier Sigaud
Dept. Schölkopf, Max-Planck Institute for Biological Cybernetics, Spemannstraße 38, Rm 223, 72076, Tübingen, Germany
Jan Peters

About this chapter

Cite this chapter

Hörnstein, J., Gustavsson, L., Santos-Victor, J., Lacerda, F. (2010). Multimodal Language Acquisition Based on Motor Learning and Interaction. In: Sigaud, O., Peters, J. (eds) From Motor Learning to Interaction Learning in Robots. Studies in Computational Intelligence, vol 264. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-05181-4_20

DOI: https://doi.org/10.1007/978-3-642-05181-4_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-05180-7
Online ISBN: 978-3-642-05181-4
eBook Packages: Engineering, Engineering (R0)
