Tasking robots through multimodal interfaces: The “Coach Metaphor”

  • Luc Julia
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1456)


This paper presents multimodal interfaces for tasking and controlling multiple robots through an agent-based architecture. For the past few years, SRI International has followed an approach based on the “Coach Metaphor”. In sports or business, coaches apply predefined strategies to their teams or, if something goes wrong, devise new plans during an ongoing game so as to retask either the entire team or specific players. This is also the challenge facing a robot operator. SRI's agent-based framework, the Open Agent Architecture™ (OAA), provides communication between the members of a team and the external world. The coach, or robot operator, who is an active member of the team, is provided with a multimodal interface that uses pen and voice. The analogy of a coach talking and drawing on a clipboard that represents the virtual world where the players develop their game reinforces the metaphor. We present several interfaces specifically developed for SRI's robots, and we show an example (controlling robots on a soccer field) where the metaphor matches the real world one to one. To clarify our views, we give an overview of the technologies in use, such as the agent architecture, the speech and gesture recognizers, and the robot controller.
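The tasking pattern the abstract describes can be made concrete with a minimal sketch. This is not the OAA API; all names here (`Facilitator`, `RobotAgent`, the `goto` capability) are hypothetical, illustrating only the general idea of agents advertising capabilities to a facilitator that routes the coach's multimodal commands to them.

```python
# Hypothetical sketch of facilitator-mediated tasking, loosely in the
# spirit of an OAA-style architecture; names are illustrative only.

class Facilitator:
    """Routes a task to every agent that advertised the matching capability."""

    def __init__(self):
        self.registry = {}  # capability name -> list of agent callbacks

    def register(self, capability, handler):
        self.registry.setdefault(capability, []).append(handler)

    def dispatch(self, capability, **params):
        # Broadcast the request; each registered agent handles it in turn.
        return [handler(**params) for handler in self.registry.get(capability, [])]


class RobotAgent:
    """A robot teammate that can be (re)tasked by the coach."""

    def __init__(self, name):
        self.name = name

    def goto(self, x, y):
        return f"{self.name} moving to ({x}, {y})"


facilitator = Facilitator()
for robot in (RobotAgent("robot-1"), RobotAgent("robot-2")):
    facilitator.register("goto", robot.goto)

# The coach's voice supplies the verb ("go to"); the pen gesture on the
# map supplies the coordinates. Both fuse into one dispatched task.
results = facilitator.dispatch("goto", x=3, y=5)
print(results)
```

In the paper's metaphor, the facilitator plays the role of the shared "clipboard": the coach issues one multimodal command, and the architecture decides which players carry it out.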


Keywords: Speech Recognition · Speech Recognition System · Multimodal Interface · Task Robot · Soccer Field





Copyright information

© Springer-Verlag Berlin Heidelberg 1998

Authors and Affiliations

  • Luc Julia
  1. STAR Laboratory, SRI International, Menlo Park
