Speaky for robots: the development of vocal interfaces for robotic applications

Applied Intelligence

Abstract

Speech technologies currently available on mobile devices perform effectively, both in reliability and in the range of language they can capture. The availability of high-performance speech recognition engines may also support the deployment of vocal interfaces in consumer robots. However, the design and implementation of such interfaces still require significant work: the language processing chain and the domain knowledge must be tailored to the specific features of the robotic platform, the deployment environment, and the tasks to be performed. As a result, such interfaces are currently built in a completely ad hoc way. In this paper, we present a design methodology, together with a support tool, that aims to streamline and improve the implementation of dedicated vocal interfaces for robots. This work was developed within an experimental project called Speaky for Robots. We extend the existing vocal interface development framework to target robotic applications. The proposed solution is built bottom-up, refining the language processing chain through the development of vocal interfaces for different robotic platforms and domains. The approach is validated both in experiments involving several research prototypes and in tests involving end users.

Notes

  1. www.mediavoice.it

  2. www.echord.eu

  3. http://www.loquendo.com/

  4. http://msdn.microsoft.com/en-us/library/office/hh361572%28v=office.14%29.aspx

  5. http://www.w3.org/TR/speech-grammar/

  6. Available at http://www.dis.uniroma1.it/~labrococo/S4R/interface.zip

  7. http://www.aldebaran.com/en/humanoid-robot/nao-robot

  8. http://www.ros.org/

  9. http://www.dis.uniroma1.it/~spqr/MARRtino

  10. Available at http://sag.art.uniroma2.it/huric

Author information

Corresponding author

Correspondence to Emanuele Bastianelli.

About this article

Cite this article

Bastianelli, E., Nardi, D., Aiello, L.C. et al. Speaky for robots: the development of vocal interfaces for robotic applications. Appl Intell 44, 43–66 (2016). https://doi.org/10.1007/s10489-015-0695-5

  • DOI: https://doi.org/10.1007/s10489-015-0695-5
