Speech and Language in Humanoid Robots

  • Angelo Cangelosi
  • Tetsuya Ogata
Living reference work entry


A fundamental behavioral and cognitive capability of humanoid robots is speech, as spoken language is the primary means of communication between humans. Communication between people, and between humans and robots, is not based on speech alone, however: it is a rich multimodal process that combines spoken language with a variety of nonverbal behaviors such as eye gaze, gestures, tactile interaction, and emotional cues. This chapter gives an overview of the state of the art in language and speech capabilities for robots (i.e., the “speech interface”), with an emphasis on multimodal approaches. The chapter considers the different levels of analysis used in language studies. Computational solutions at the phonetic, lexical, and syntactic levels are general to linguistic analysis and do not require specific consideration from a robotics point of view. Other aspects of language analysis, such as semantics and pragmatics, do have specific peculiarities in robotics, given their relationship to the difficult problem of “symbol grounding.” In robot language research, two main approaches have been used to design speech interfaces: one is based on standard, predefined natural language processing (NLP) techniques, and the second is based on learning methods. The chapter introduces the main NLP methods used in robot language research and then examines the speech interfaces built on these methods, also considering their use in multimodal interfaces. It then turns to language learning approaches, distinguishing between developmental learning systems, in which the robot goes through a series of developmental training phases inspired by human language learning, and machine learning approaches, in which a set of learning techniques is used to engineer communication capabilities by training multimodal speech interfaces. Finally, the chapter gives a critical assessment of the current state of the art and identifies future lines of work.
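The learning-based route to symbol grounding mentioned above can be made concrete with a toy example. The sketch below is a minimal, hedged illustration (not taken from the chapter) of cross-situational word learning, one simple mechanism studied in developmental robotics: a robot hears utterances while observing sets of objects and grounds each word in the percept it most often co-occurs with. All names here (`GroundedLexicon`, `observe`, `meaning`) are illustrative assumptions, not an API from the literature.

```python
# Minimal cross-situational word-learning sketch: a "robot" accumulates
# word-percept co-occurrence counts across ambiguous learning episodes
# and grounds each word in its most frequent co-occurring percept.
from collections import defaultdict


class GroundedLexicon:
    def __init__(self):
        # counts[word][percept] = number of co-occurrences observed
        self.counts = defaultdict(lambda: defaultdict(int))

    def observe(self, utterance, visible_objects):
        """One learning episode: every word co-occurs with every percept."""
        for word in utterance.split():
            for obj in visible_objects:
                self.counts[word][obj] += 1

    def meaning(self, word):
        """The percept most strongly associated with a word, if any."""
        candidates = self.counts.get(word)
        if not candidates:
            return None
        return max(candidates, key=candidates.get)


lex = GroundedLexicon()
# Ambiguous episodes: each utterance is heard with several objects in view,
# so no single episode identifies the referent of "ball".
lex.observe("look at the ball", {"ball", "cup"})
lex.observe("the red ball", {"ball", "box"})
lex.observe("take the cup", {"cup", "box"})
print(lex.meaning("ball"))  # → "ball" (co-occurred in both ball episodes)
```

Across episodes the ambiguity cancels out: "ball" co-occurs with the ball percept twice but with any distractor only once, so the statistics alone single out the referent. Real developmental-robotics models replace the counting with neural or probabilistic learning over continuous sensorimotor features, but the cross-situational principle is the same.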


Keywords: Speech · Language learning · Developmental robotics · NLP (natural language processing) · Deep learning



This work was supported by the H2020 Marie Skłodowska-Curie projects APRIL [674868] and DCOMM [676063] and the UK EPSRC project BABEL (Cangelosi), and through the Program for the Top Global University Project “Waseda Goes Global” by the Ministry of Education, Culture, Sports, Science and Technology.



Recommended Readings

  1. A. Cangelosi, M. Schlesinger, Developmental Robotics: From Babies to Robots (MIT Press, Cambridge, MA, 2015); see Chapters 7 and 8
  2. A. Cangelosi, G. Metta, G. Sagerer, S. Nolfi, C.L. Nehaniv, K. Fischer, J. Tani, T. Belpaeme, G. Sandini, L. Fadiga, B. Wrede, K. Rohlfing, E. Tuci, K. Dautenhahn, J. Saunders, A. Zeschel, Integration of action and language knowledge: a roadmap for developmental robotics. IEEE Trans. Auton. Ment. Dev. 2(3), 167–195 (2010)
  3. T. Taniguchi, T. Nagai, T. Nakamura, N. Iwahashi, T. Ogata, H. Asoh, Symbol emergence in robotics: a survey. Adv. Robot. 30(11–12), 706–728 (2016)

Copyright information

© Springer Science+Business Media B.V. 2017

Authors and Affiliations

  1. Centre for Robotics and Neural Systems, School of Computing and Mathematics, Plymouth University, Plymouth, UK
  2. Department of Intermedia Art and Science, Faculty of Science and Engineering, Waseda University, Tokyo, Japan
