Intelligent Service Robotics

Volume 1, Issue 1, pp 3–26

Low-level grounding in a multimodal mobile service robot conversational system using graphical models

  • Plamen Prodanov
  • Andrzej Drygajlo
  • Jonas Richiardi
  • Anil Alexander
Original Research Paper


The main task of a service robot with a voice-enabled communication interface is to engage a user in dialogue that provides access to the services the robot is designed for. In managing such interaction, the key issue is inferring the user goal (intention) from the service request at each dialogue turn. Under service robot deployment conditions, speech recognition limitations combined with noisy speech input and inexperienced users may jeopardize user goal identification. In this paper, we introduce a grounding state-based model motivated by reducing the risk of communication failure due to incorrect user goal identification. The model exploits the multiple modalities available in the service robot system to provide evidence for reaching grounding states. For the speech input to be treated as sufficiently grounded (correctly understood) by the robot, four proposed states have to be reached. Bayesian networks combining speech and non-speech modalities during user goal identification are used to estimate the probability that each grounding state has been reached. These probabilities serve as a basis for detecting whether the user is attending to the conversation, as well as for deciding on an alternative input modality (e.g., buttons) when the speech modality is unreliable. The Bayesian networks used in the grounding model are specially designed for modularity and computationally efficient inference. The potential of the proposed model is demonstrated by comparing a conversational system for the mobile service robot RoboX that employs only speech recognition for user goal identification with a system equipped with multimodal grounding. The evaluation experiments use component-level and system-level metrics for technical (objective) and user-based (subjective) evaluation, with multimodal data collected during conversations between RoboX and users.
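As a rough illustration of the idea of fusing speech and non-speech evidence, the following minimal sketch computes the posterior probability that a single grounding state has been reached. It is not the model from the paper: it assumes one binary grounding variable with two conditionally independent observations, and the modality names (speech_confidence, face_detected), all probability values, and the 0.6 decision threshold are hypothetical, chosen only for illustration.

```python
def posterior_grounded(prior, likelihoods, observations):
    """Return P(G = true | observations) for a naive-Bayes style network.

    prior:        P(G = true), the prior that the grounding state is reached
    likelihoods:  {modality: {True: {value: p}, False: {value: p}}},
                  i.e. P(observation value | G) for each modality
    observations: {modality: observed value}
    """
    joint_true = prior
    joint_false = 1.0 - prior
    # Observations are assumed conditionally independent given G,
    # so the joint probability factorises into a product.
    for modality, value in observations.items():
        joint_true *= likelihoods[modality][True][value]
        joint_false *= likelihoods[modality][False][value]
    # Normalise over the two possible values of G.
    return joint_true / (joint_true + joint_false)


def choose_input_modality(p_grounded, threshold=0.6):
    """Keep speech if the state is likely reached, else fall back to buttons."""
    return "speech" if p_grounded >= threshold else "buttons"


# Hypothetical parameters, not taken from the paper.
PRIOR = 0.5
LIKELIHOODS = {
    "speech_confidence": {
        True: {"high": 0.8, "low": 0.2},
        False: {"high": 0.3, "low": 0.7},
    },
    "face_detected": {
        True: {"yes": 0.9, "no": 0.1},
        False: {"yes": 0.4, "no": 0.6},
    },
}

if __name__ == "__main__":
    good = posterior_grounded(
        PRIOR, LIKELIHOODS, {"speech_confidence": "high", "face_detected": "yes"}
    )
    poor = posterior_grounded(
        PRIOR, LIKELIHOODS, {"speech_confidence": "low", "face_detected": "no"}
    )
    print(good, choose_input_modality(good))
    print(poor, choose_input_modality(poor))
```

With these made-up numbers, consistent evidence from both modalities keeps the interaction in speech mode, while weak evidence from both triggers the fallback to buttons. A full system along the lines of the abstract would estimate one such probability per grounding state.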


Service robots, Spoken interaction, Grounding, Bayesian networks, Efficient inference





Copyright information

© Springer-Verlag 2007

Authors and Affiliations

  • Plamen Prodanov (1, 3)
  • Andrzej Drygajlo (1)
  • Jonas Richiardi (1)
  • Anil Alexander (2)
  1. Perceptual Artificial Intelligence Laboratory, Signal Processing Institute, Swiss Federal Institute of Technology Lausanne (EPFL), Lausanne, Switzerland
  2. Clarifying Technologies Ltd, Oxford, UK
  3. TBS Holding AG, Pfäffikon, Switzerland
