Loosely-coupled approach towards multi-modal browsing

  • Jan Kleindienst
  • Ladislav Seredi
  • Pekka Kapanen
  • Janne Bergman
Special issue on multimodality: a step towards universal access

Abstract

Multi-modal browsing is emerging as a "killer" technology for universal access: it promises broader and more flexible access to information, faster task completion, and an improved user experience. Inheriting the best of both GUI and speech, multi-modality's great advantage is that it provides application developers with a scalable blend of input and output channels that can accommodate any user, device, and platform, depending on the circumstances, hardware capabilities, and environment. This article describes a flexible multi-modal browser architecture, named Ferda the Ant, which reuses uni-modal browser technologies available for VoiceXML, WML, and HTML browsing. A central component, the Virtual Proxy, acts as a synchronization coordinator. The browser architecture can be implemented either in a single-client configuration or by distributing the browser components across the network. We have defined and implemented a synchronization protocol that communicates changes occurring in the context of one component browser to the other browsers participating in the multi-modal browser framework. Browser wrappers implement the required synchronization protocol functionality at each of the component browsers. The component browsers comply with existing content-authoring standards, and we have designed a set of markup-level authoring conventions that facilitate maintaining browser synchronization.
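The coordination pattern described in the abstract can be sketched as a simple publish/relay scheme, with the Virtual Proxy relaying context changes from one component browser to all others. All class, method, and field names below are illustrative assumptions, not taken from the paper:

```python
class BrowserWrapper:
    """Adapts one uni-modal browser (e.g. HTML, WML, or VoiceXML)
    to the synchronization protocol. Illustrative sketch only."""

    def __init__(self, modality):
        self.modality = modality
        self.context = {}  # mirrored interaction context (form fields, focus)

    def apply_update(self, field, value):
        # Mirror a context change that originated in another modality.
        self.context[field] = value


class VirtualProxy:
    """Central coordinator: receives a change from the source browser
    and propagates it to every other registered wrapper."""

    def __init__(self):
        self.wrappers = []

    def register(self, wrapper):
        self.wrappers.append(wrapper)

    def notify(self, source, field, value):
        for wrapper in self.wrappers:
            if wrapper is not source:
                wrapper.apply_update(field, value)


proxy = VirtualProxy()
html_view = BrowserWrapper("html")
voice_view = BrowserWrapper("voicexml")
proxy.register(html_view)
proxy.register(voice_view)

# The user fills a form field by voice; the GUI view is kept in sync.
proxy.notify(voice_view, "city", "Prague")
print(html_view.context["city"])  # -> Prague
```

In the paper's distributed configuration, the in-process `notify` call would instead be a message of the synchronization protocol sent over the network, but the relay logic is the same.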

Keywords

Multi-modal browser, VoiceXML, HTML, WML

Abbreviations

  • MM: multi-modal
  • DOM: Document Object Model
  • VP: Virtual Proxy
  • GUI: Graphical User Interface
  • NLU: Natural Language Understanding
  • WML: Wireless Markup Language
  • HTML: HyperText Markup Language
  • WWW: World-Wide Web
  • WAP: Wireless Application Protocol
  • W3C: World-Wide Web Consortium
  • VoiceXML: Voice eXtensible Markup Language
  • COM: Component Object Model
  • HTTP: HyperText Transfer Protocol
  • API: Application Programming Interface
  • UI: User Interface
  • FIA: Form Interpretation Algorithm

Copyright information

© Springer-Verlag 2003

Authors and Affiliations

  • Jan Kleindienst (1)
  • Ladislav Seredi (1)
  • Pekka Kapanen (2)
  • Janne Bergman (2)
  1. IBM Voice Technologies & Systems, Praha 10, Czech Republic
  2. Nokia Research Center, Tampere, Finland