Abstract
This paper describes the World Wide Web Consortium’s (W3C) Multimodal Architecture and Interfaces (MMI Architecture) standard, an architecture and communications protocol that enables a wide variety of independent modalities to be integrated into multimodal applications. By encapsulating the functionalities of modality components and requiring all control information to go through the Interaction Manager, the MMI Architecture simplifies integrating components from multiple sources.
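As a concrete illustration of the control flow the abstract describes, the sketch below builds an MMI StartRequest life-cycle event as an XML message, of the kind an Interaction Manager sends to a modality component. The element and attribute names (`mmi`, `startRequest`, `context`, `source`, `target`, `requestID`) and the namespace follow the MMI Architecture specification; the transport and the component URIs are illustrative assumptions, not part of the standard.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the MMI Architecture specification.
MMI_NS = "http://www.w3.org/2008/04/mmi-arch"

def start_request(context, source, target, request_id):
    """Build an MMI StartRequest life-cycle event as an XML string.

    The field names follow the MMI Architecture spec; the URIs the
    caller passes in are illustrative, not defined by the standard.
    """
    ET.register_namespace("mmi", MMI_NS)
    root = ET.Element(f"{{{MMI_NS}}}mmi", {"version": "1.0"})
    ET.SubElement(root, f"{{{MMI_NS}}}startRequest", {
        "context": context,       # identifies the interaction session
        "source": source,         # address of the Interaction Manager
        "target": target,         # address of the modality component
        "requestID": request_id,  # pairs this request with its StartResponse
    })
    return ET.tostring(root, encoding="unicode")

msg = start_request(
    context="ctx-1",
    source="http://example.com/im",         # hypothetical IM address
    target="http://example.com/speech-mc",  # hypothetical speech MC address
    request_id="req-1",
)
print(msg)
```

Because every such event passes through the Interaction Manager, a modality component never needs to know which other components exist, only how to answer life-cycle events addressed to it.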
Notes
For example, the StartRequest event might be mapped to a “startListening” method used by a modality-specific API.
In this example, we assume that the speech recognition component provides an interpretation of the input, in addition to the literal tokens of input, to allow for the user to express this request in other words, such as “Tell me about today’s weather”, or even “Will I need my umbrella?” However, the architecture supports interpreting the user’s input with a separate natural language understanding MC.
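The mapping these notes describe can be sketched as a thin adapter around a modality-specific API. Everything here is hypothetical — the `SpeechEngine` class, its `start_listening` method, and the canned recognition result stand in for whatever API a real modality component wraps — and the reply is compressed to a single notification rather than the full StartResponse/DoneNotification exchange the specification defines.

```python
class SpeechEngine:
    """Stand-in for a modality-specific speech API (hypothetical)."""

    def start_listening(self):
        # A real engine would capture and recognize audio; here we return
        # a canned result carrying both the literal tokens and an
        # interpretation, as the note describes.
        return {
            "tokens": "will I need my umbrella",
            "interpretation": {"intent": "weather_query", "day": "today"},
        }

class SpeechModalityComponent:
    """Adapter mapping MMI life-cycle events onto the engine's API."""

    def __init__(self, engine):
        self.engine = engine

    def handle_event(self, event_name, request_id):
        if event_name == "StartRequest":
            # StartRequest is mapped to the engine's startListening method.
            result = self.engine.start_listening()
            return {
                "event": "DoneNotification",
                "requestID": request_id,
                "data": result,
            }
        # Other life-cycle events (PrepareRequest, CancelRequest, ...)
        # would be mapped to the engine's API in the same way.
        return {"event": "StatusResponse", "requestID": request_id}

mc = SpeechModalityComponent(SpeechEngine())
reply = mc.handle_event("StartRequest", "req-42")
print(reply["event"])  # DoneNotification
```

If interpretation were instead delegated to a separate natural language understanding MC, the adapter would return only the literal tokens and the Interaction Manager would route them onward.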
Acknowledgments
The W3C Multimodal Architecture and Interfaces and EMMA specifications represent the work of many individuals who have participated in the Multimodal Interaction Working Group. In particular, I would like to acknowledge the work of the following authors of the MMI Architecture and EMMA specifications and related documents: Kazuyuki Ashimura, Jim Barnett, Paolo Baggia, Michael Bodell, Daniel C. Burnett, Jerry Carter, Michael Johnston, Nagesh Kharidi, Ingmar Kliche, Jim Larson, Raj Tumuluri, Brad Porter, Dave Raggett, T. V. Raman, B. Helena Rodriguez, Muthuselvam Selvaraj, Andrew Wahbe, Piotr Wiechno, and Moshe Yudkowsky. Special thanks go to Kazuyuki Ashimura, the W3C Team Contact for the Multimodal Interaction Working Group, for his guidance through the W3C process, and to Jim Barnett, the Editor-in-Chief of the Multimodal Architecture and Interfaces specification.
Dahl, D.A. The W3C multimodal architecture and interfaces standard. J Multimodal User Interfaces 7, 171–182 (2013). https://doi.org/10.1007/s12193-013-0120-5