Encyclopedia of Robotics

Living Edition
Editors: Marcelo H Ang, Oussama Khatib, Bruno Siciliano

Voice Speech Interfaces

  • Angelo Cangelosi
  • Tetsuya Ogata
Living reference work entry
DOI: https://doi.org/10.1007/978-3-642-41610-1_28-1



Voice speech interfaces concern the design and use of algorithms and tools based on natural language and machine-learning methods for human-robot communication.


A fundamental behavioral and cognitive capability of a robot interacting with a human user is speech, since spoken language is the primary means by which people communicate with each other. However, communication between people, and between humans and robots, is not based on speech alone. Rather, it is a rich multimodal process that combines spoken language with a variety of nonverbal behaviors such as eye gaze, hand gestures, tactile interaction, and emotional cues (Mavridis 2015; Cangelosi and Schlesinger 2015). Speech-based interfaces, complemented by multimodal communication, can contribute to forming a consistent and robust recognition process for the robot (and humans) by reducing ambiguity about the sensory...



  1. Antunes A, Saponaro G, Morse A, Jamone L, Santos-Victor J, Cangelosi A (2017) Learn, plan, remember: a developmental robot architecture for task solving. In: Proceedings of 2017 IEEE joint international conference on development and learning and epigenetic robotics (ICDL-EpiRob), Lisbon
  2. Araki T, Nakamura T, Nagai T, Funakoshi K, Nakano M, Iwahashi N (2011) Autonomous acquisition of multimodal information for online object concept formation by robots. In: IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 1540–1547
  3. Cangelosi A (2010) Grounding language in action and perception: from cognitive agents to humanoid robots. Phys Life Rev 7(2):139–151
  4. Cangelosi A, Ogata T (2017) Language and speech in humanoid robots. In: Vadakkepat P, Goswami A (eds) Humanoid robotics: a reference. Springer
  5. Cangelosi A, Schlesinger M (2015) Developmental robotics: from babies to robots. MIT Press, Cambridge, MA (see chapters 7 and 8)
  6. Cangelosi A, Metta G, Sagerer G, Nolfi S, Nehaniv CL, Fischer K, Tani J, Belpaeme T, Sandini G, Fadiga L, Wrede B, Rohlfing K, Tuci E, Dautenhahn K, Saunders J, Zeschel A (2010) Integration of action and language knowledge: a roadmap for developmental robotics. IEEE Trans Auton Ment Dev 2(3):167–195
  7. Celikkanat H, Orhan G, Pugeault N, Guerin F, Erol S, Kalkan S (2014) Learning and using context on a humanoid robot using latent Dirichlet allocation. In: Joint IEEE international conferences on development and learning and epigenetic robotics (ICDL-Epirob), pp 201–207
  8. Hara I, Asano F, Asoh H, Ogata J, Ichimura N, Kawai Y (2004) Robust speech interface based on audio and video information fusion for humanoid HRP-2. In: 2004 IEEE/RSJ international conference on intelligent robots and systems (IROS) (IEEE Cat. No.04CH37566), vol 3, pp 2404–2410
  9. Hayashi K, Kanda T, Miyashita T, Ishiguro H, Hagita N (2008) Robot manzai: robot conversation as a passive–social medium. Int J Humanoid Rob 5(01):67–86
  10. Ishiguro H (2007) Android science. In: Robotics research. Springer, Berlin/Heidelberg, pp 118–127
  11. Kennedy J, de Greeff J, Read R, Baxter P, Belpaeme T (2014) The Chatbot strikes back. In: Proceedings of the 9th IEEE/ACM conference on human-robot interaction (HRI2014). IEEE/ACM Press, Bielefeld
  12. Lallee S, Ford Dominey P (2013) Multi-modal convergence maps: from body schema and self-representation to mental imagery. Adapt Behav 21:274
  13. Mavridis N (2015) A review of verbal and non-verbal human–robot interactive communication. Robot Auton Syst 63:22–35
  14. Morse A, Cangelosi A (2017) Why are there developmental stages in language learning? A developmental robotics model of language development. Cogn Sci 41:32
  15. Morse AF, de Greeff J, Belpaeme T, Cangelosi A (2010) Epigenetic robotics architecture (ERA). IEEE Trans Auton Ment Dev 2(4):325–339
  16. Morse A, Belpaeme T, Smith L, Cangelosi A (2015) Posture affects how robots and infants map words to objects. PLoS One 10(3)
  17. Nakamura T, Ando Y, Nagai T, Kaneko M (2015) Concept formation by robots using an infinite mixture of models. In: IEEE/RSJ international conference on intelligent robots and systems (IROS)
  18. Nefian AV, Liang L, Pi X, Liu X, Murphy K (2002) Dynamic Bayesian networks for audio-visual speech recognition. EURASIP J Appl Sig Process 2002(11):1274–1288
  19. Noda K, Arie H, Suga Y, Ogata T (2014) Multimodal integration learning of robot behavior using deep neural networks. Robot Auton Syst 62(6):721–736
  20. Noda K, Yamaguchi Y, Nakadai K, Okuno HG, Ogata T (2015) Audio-visual speech recognition using deep learning. Appl Intell 42(4):722–737
  21. Pastra K, Aloimonos Y (2012) The minimalist grammar of action. Philos Trans R Soc Lond B Biol Sci 367(1585):103–117
  22. Samuelson LK, Smith LB, Perry LK, Spencer JP (2011) Grounding word learning in space. PLoS One 6(12):e28095
  23. Shiomi M, Sakamoto D, Kanda T, Ishi CT, Ishiguro H, Hagita N (2008) A semi-autonomous communication robot: a field trial at a train station. In: Proceedings of the 3rd ACM/IEEE international conference on human robot interaction, ACM, pp 303–310
  24. Steels L (ed) (2012) Experiments in cultural language evolution, vol 3. John Benjamins Publishing, Amsterdam/Philadelphia
  25. Sugita Y, Tani J (2005) Learning semantic combinatoriality from the interaction between linguistic and behavioral processes. Adapt Behav 13(1):33–52
  26. Taniguchi T, Nagai T, Nakamura T, Iwahashi N, Ogata T, Asoh H (2016) Symbol emergence in robotics: a survey
  27. Tikhanoff V, Cangelosi A, Metta G (2011) Language understanding in humanoid robots: iCub simulation experiments. IEEE Trans Auton Ment Dev 3(1):17–29
  28. Tuci E, Ferrauto T, Zeschel A, Massera G, Nolfi S (2011) An experiment on behaviour generalisation and the emergence of linguistic compositionality in evolving robots. IEEE Trans Auton Ment Dev 3(2):176–118
  29. Twomey KE, Morse AF, Cangelosi A, Horst J (2016) Children’s referent selection and word learning: insights from a developmental robotic system. Interact Stud 17(1):101–127
  30. Wallace RS (2009) The anatomy of A.L.I.C.E. In: Epstein R, Roberts G, Beber G (eds) Parsing the Turing test. Springer Science+Business Media, London, pp 181–210
  31. Yamashita Y, Tani J (2008) Emergence of functional hierarchy in a multiple timescale neural network model: a humanoid robot experiment. PLoS Comput Biol 4(11):e1000220
  32. Yang Y, Li Y, Fermüller C, Aloimonos Y (2015) Robot learning manipulation action plans by “Watching” unconstrained videos from the World Wide Web. In: The twenty-ninth AAAI conference on artificial intelligence
  33. Zhong J, Cangelosi A, Ogata T (2017) Understanding natural language sentences with word embedding and multi-modal interaction. In: Proceedings of 2017 IEEE joint international conference on development and learning and epigenetic robotics (ICDL-EpiRob), Lisbon

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. Centre for Robotics and Neural Systems, School of Computing and Mathematics, Plymouth University, Plymouth, UK
  2. Faculty of Science and Engineering, Waseda University, Tokyo, Japan

Section editors and affiliations

  • Jee-Hwan Ryu
  1. School of Mechanical Engineering, Korea University of Technology & Education, Cheon-An, Republic of Korea