Robots That Learn Language: Developmental Approach to Human-Machine Conversations

  • Naoto Iwahashi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4211)


This paper describes a machine learning method that enables robots to learn to communicate linguistically from scratch through verbal and nonverbal interaction with users. The method addresses two major problems that need to be solved to realize natural human-machine conversation: a scalable grounded symbol system and belief sharing. Learning takes place during joint perception and joint action with a user. The method enables the robot to learn beliefs for communication by combining speech, visual, and behavioral reinforcement information in a probabilistic framework. The learned beliefs include speech units such as phonemes or syllables, a lexicon, grammar, and pragmatic knowledge, and they are integrated into a system represented by a dynamical graphical model. The method also enables the user and the robot to infer the state of each other’s beliefs related to communication. To facilitate such inference, the robot’s belief system has a structure that represents the assumption of shared beliefs and can be adapted quickly and robustly through communication with the user. This adaptive behavior is modeled as the structural coupling of the belief systems held by the robot and the user, and it is carried out through incremental online optimization during interaction. Experimental results show that, after a practically small number of learning episodes with a user, the robot was eventually able to understand even fragmentary and ambiguous utterances, act upon them, and generate utterances appropriate for the given situation. The paper also discusses the importance of properly handling the risk of being misunderstood in order to facilitate mutual understanding and keep the coupling effective.


Keywords: Belief System, Belief Function, Linguistic Knowledge, Behavioral Context, Input Word
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Naoto Iwahashi (1, 2)
  1. National Institute of Information and Communication Technology
  2. ATR Spoken Language Communication Research Labs, Kyoto, Japan
