The main task of a service robot with a voice-enabled communication interface is to engage a user in dialogue that provides access to the services the robot is designed for. In managing such interaction, the key issue is inferring the user goal (intention) from the service request at each dialogue turn. Under service robot deployment conditions, speech recognition limitations arising from noisy speech input and inexperienced users may jeopardize user goal identification. In this paper, we introduce a grounding state-based model aimed at reducing the risk of communication failure due to incorrect user goal identification. The model exploits the multiple modalities available in the service robot system to provide evidence for reaching grounding states. For the speech input to be treated as sufficiently grounded (correctly understood) by the robot, four proposed states have to be reached. Bayesian networks combining speech and non-speech modalities during user goal identification are used to estimate the probability that each grounding state has been reached. These probabilities serve as a basis for detecting whether the user is attending to the conversation, and for deciding on an alternative input modality (e.g., buttons) when the speech modality is unreliable. The Bayesian networks used in the grounding model are specifically designed for modularity and computationally efficient inference. The potential of the proposed model is demonstrated by comparing two conversational systems for the mobile service robot RoboX: one that employs only speech recognition for user goal identification, and one equipped with multimodal grounding. The evaluation experiments use component- and system-level metrics for technical (objective) and user-based (subjective) evaluation, with multimodal data collected during conversations between RoboX and its users.
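The abstract does not specify the network structures, so the following is only a minimal sketch of the general idea, under stated assumptions: each grounding state is modeled as one binary variable whose posterior is computed from binary multimodal cues with a naive-Bayes network (the simplest Bayesian network structure, swapped in here for illustration; it is not the paper's model). The cue values, probabilities, state labels `s1` to `s4`, the 0.8 threshold and the "all states above threshold" fallback rule are all hypothetical.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class Cue:
    """A binary observation from one modality (hypothetical example cues).

    Probabilities must lie strictly between 0 and 1.
    """
    observed: bool
    p_if_reached: float      # P(cue is True | grounding state reached)
    p_if_not_reached: float  # P(cue is True | grounding state not reached)


def posterior_reached(prior: float, cues: list[Cue]) -> float:
    """P(state reached | cues) for one binary grounding state.

    Naive-Bayes assumption: cues are conditionally independent given the state.
    """
    def likelihood(p_true: float, observed: bool) -> float:
        return p_true if observed else 1.0 - p_true

    like_reached = prod(likelihood(c.p_if_reached, c.observed) for c in cues)
    like_not = prod(likelihood(c.p_if_not_reached, c.observed) for c in cues)
    joint_reached = prior * like_reached
    return joint_reached / (joint_reached + (1.0 - prior) * like_not)


def choose_input_modality(state_probs: dict[str, float], threshold: float = 0.8) -> str:
    """Hypothetical rule: treat speech as grounded only if every state is
    probably reached; otherwise fall back to button input."""
    if all(p >= threshold for p in state_probs.values()):
        return "speech"
    return "buttons"


if __name__ == "__main__":
    # Hypothetical cues for one state: a face is detected and the user is in range.
    p1 = posterior_reached(0.5, [Cue(True, 0.9, 0.2), Cue(True, 0.95, 0.3)])
    print(round(p1, 3))  # 0.934
    # Suppose low speech-recognition confidence leaves one state below the threshold.
    probs = {"s1": p1, "s2": 0.95, "s3": 0.4, "s4": 0.9}
    print(choose_input_modality(probs))  # buttons
```

Keeping each grounding state as a separate small network mirrors, in a very loose way, the modularity the abstract mentions: a state's posterior can be recomputed from its own cues without touching the others, and the fallback decision only needs the four resulting probabilities.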
Cite this article
Prodanov, P., Drygajlo, A., Richiardi, J. et al. Low-level grounding in a multimodal mobile service robot conversational system using graphical models. Intel Serv Robotics 1, 3–26 (2008). https://doi.org/10.1007/s11370-006-0001-9
Keywords
- Service robots
- Spoken interaction
- Bayesian networks
- Efficient inference