Multimodal Language Acquisition Based on Motor Learning and Interaction

From Motor Learning to Interaction Learning in Robots

Part of the book series: Studies in Computational Intelligence (SCI, volume 264)

Abstract

This work presents a developmental and ecological approach to language acquisition in robots, rooted in the interaction between infants and their caregivers. We show that the signal caregivers direct to infants includes several hints that can facilitate language acquisition and reduce the need for preprogrammed linguistic structure. Moreover, infants also produce sounds, which enables richer types of interaction, such as imitation games, and the use of motor learning. By using a humanoid robot with embodied models of the infant’s ears, eyes, vocal tract, and memory functions, we can mimic the adult-infant interaction and take advantage of the inherent structure in the signal. Two experiments are presented, in which the robot learns a number of word-object associations and the articulatory target positions for a number of vowels.

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Hörnstein, J., Gustavsson, L., Santos-Victor, J., Lacerda, F. (2010). Multimodal Language Acquisition Based on Motor Learning and Interaction. In: Sigaud, O., Peters, J. (eds) From Motor Learning to Interaction Learning in Robots. Studies in Computational Intelligence, vol 264. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-05181-4_20

  • DOI: https://doi.org/10.1007/978-3-642-05181-4_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-05180-7

  • Online ISBN: 978-3-642-05181-4

  • eBook Packages: Engineering, Engineering (R0)
