Applied Intelligence, Volume 44, Issue 1, pp 43–66

Speaky for robots: the development of vocal interfaces for robotic applications

  • Emanuele Bastianelli
  • Daniele Nardi
  • Luigia Carlucci Aiello
  • Fabrizio Giacomelli
  • Nicolamaria Manes


Speech technologies currently available on mobile devices perform well in terms of both reliability and the range of language they are able to capture. The availability of high-performance speech recognition engines may also support the deployment of vocal interfaces in consumer robots. However, the design and implementation of such interfaces still require significant work. The language processing chain and the domain knowledge must be built for the specific features of the robotic platform, the deployment environment and the tasks to be performed. Hence, such interfaces are currently built in a completely ad hoc way. In this paper, we present a design methodology, together with a support tool, that aims to streamline and improve the implementation of dedicated vocal interfaces for robots. This work was developed within an experimental project called Speaky for Robots. We extend the existing vocal interface development framework to target robotic applications. The proposed solution is built bottom-up, refining the language processing chain through the development of vocal interfaces for different robotic platforms and domains. The approach is validated both in experiments involving several research prototypes and in tests involving end-users.


  • Human robot interaction
  • Natural language interfaces
  • Spoken language understanding
  • Knowledge representation



Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  • Emanuele Bastianelli (1)
  • Daniele Nardi (1)
  • Luigia Carlucci Aiello (1)
  • Fabrizio Giacomelli (2)
  • Nicolamaria Manes (2)

  1. Department of Computer, Control and Management Engineering, Sapienza University of Rome, Rome, Italy
  2. Mediavoice S.r.l., Rome, Italy
