
The W3C multimodal architecture and interfaces standard

  • Original Paper
Journal on Multimodal User Interfaces

Abstract

This paper describes the World Wide Web Consortium’s (W3C) Multimodal Architecture and Interfaces (MMI Architecture) standard, an architecture and communications protocol that enables a wide variety of independent modalities to be integrated into multimodal applications. By encapsulating the functionalities of modality components and requiring all control information to go through the Interaction Manager, the MMI Architecture simplifies integrating components from multiple sources.
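The control information exchanged through the Interaction Manager takes the form of XML life-cycle events defined by the MMI Architecture specification, each carrying fields such as Context, Source, Target, and RequestID in the MMI namespace. The sketch below builds one such event, a StartRequest, in Python; the field names and namespace follow the specification, while the component addresses ("IM-1", "speech-mc-1") and identifiers are invented for illustration.

```python
# Sketch (not normative): serializing an MMI StartRequest life-cycle event.
# Field names and namespace follow the MMI Architecture spec; the Source,
# Target, Context, and RequestID values here are made up for illustration.
import xml.etree.ElementTree as ET

MMI_NS = "http://www.w3.org/2008/04/mmi-arch"

def start_request(source, target, context, request_id):
    """Build an mmi:StartRequest event as an XML string."""
    ET.register_namespace("mmi", MMI_NS)
    root = ET.Element(f"{{{MMI_NS}}}mmi", {"version": "1.0"})
    ET.SubElement(root, f"{{{MMI_NS}}}StartRequest", {
        "Source": source,
        "Target": target,
        "Context": context,
        "RequestID": request_id,
    })
    return ET.tostring(root, encoding="unicode")

print(start_request("IM-1", "speech-mc-1", "ctx-42", "req-1"))
```

Because every modality component speaks this same small event vocabulary, the Interaction Manager can address a speech recognizer, a GUI, or a handwriting component uniformly, which is what makes mixing components from different vendors practical.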


Notes

  1. For example, the StartRequest event might be mapped to a “startListening” method used by a modality-specific API.

  2. In this example, we assume that the speech recognition component provides an interpretation of the input, in addition to the literal tokens of input, to allow for the user to express this request in other words, such as “Tell me about today’s weather”, or even “Will I need my umbrella?” However, the architecture supports interpreting the user’s input with a separate natural language understanding MC.
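The mapping described in note 1 can be sketched in a few lines: a modality component wraps a modality-specific API and translates incoming life-cycle events into calls on it, replying with the matching Response event. The Recognizer class and its method names below are hypothetical stand-ins (as in the note's "startListening"), not part of the MMI standard.

```python
# Illustrative sketch only: Recognizer and its start_listening /
# stop_listening methods are hypothetical stand-ins for a modality-specific
# API, as in note 1; only the event names come from the MMI Architecture.
class Recognizer:
    """Hypothetical modality-specific speech API."""
    def __init__(self):
        self.listening = False

    def start_listening(self):
        self.listening = True

    def stop_listening(self):
        self.listening = False


class SpeechModalityComponent:
    """Maps MMI life-cycle events onto the wrapped recognizer's API."""
    def __init__(self, recognizer):
        self.recognizer = recognizer
        self._handlers = {
            "StartRequest": recognizer.start_listening,
            "CancelRequest": recognizer.stop_listening,
        }

    def handle(self, event_name):
        # Invoke the modality-specific call, then name the Response event
        # the component would send back to the Interaction Manager.
        self._handlers[event_name]()
        return event_name.replace("Request", "Response")


mc = SpeechModalityComponent(Recognizer())
print(mc.handle("StartRequest"))  # -> StartResponse
```

The point of the wrapper is that the Interaction Manager never sees the modality-specific API at all; swapping in a different recognizer only changes the handler table, not the event protocol.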

References

  1. Turing A (1950) Computing machinery and intelligence. Mind 59:433–460


  2. Johnston M, Bangalore S, Vasireddy G, Stent A, Ehlen P, Walker M, Whittaker S, Maloor P (2001) MATCH: an architecture for multimodal dialogue systems. In: Proceedings of the 40th annual meeting of the Association for Computational Linguistics, Philadelphia, pp 376–383

  3. Bayer S (2005) Building a standards and research community with the Galaxy Communicator software infrastructure. In: Dahl DA (ed) Practical spoken dialog systems, vol 26. Text, Speech and Language Technology. Kluwer Academic Publishers, Dordrecht, pp 166–196

  4. Oviatt SL (1999) Ten myths of multimodal interaction. Commun ACM 42:74–81


  5. Seneff S, Lau R, Polifroni J (1999) Organization, communication, and control in the Galaxy-II Conversational System. In: Proceedings of Eurospeech 1999, Budapest

  6. Barnett J, Bodell M, Dahl DA, Kliche I, Larson J, Porter B, Raggett D, Raman TV, Rodriguez BH, Selvaraj M, Tumuluri R, Wahbe A, Wiechno P, Yudkowsky M (2012) Multimodal Architecture and Interfaces. World Wide Web Consortium. http://www.w3.org/TR/mmi-arch/. Accessed November 20 2012

  7. Barnett J, Akolkar R, Auburn RJ, Bodell M, Burnett DC, Carter J, McGlashan S, Lager T, Helbing M, Hosn R, Raman TV, Reifenrath K, Rosenthal N (2012) State chart XML (SCXML): state machine notation for control abstraction. World Wide Web Consortium. http://www.w3.org/TR/scxml/. Accessed November 20 2012

  8. McGlashan S, Burnett DC, Carter J, Danielsen P, Ferrans J, Hunt A, Lucas B, Porter B, Rehor K, Tryphonas S (2004) Voice Extensible Markup Language (VoiceXML 2.0). W3C. http://www.w3.org/TR/voicexml20/. Accessed November 9 2012

  9. Kopp S, Krenn B, Marsella S, Marshall A, Pelachaud C, Pirker H, Thórisson KR, Vilhjálmsson H (2006) Towards a common framework for multimodal generation: The behavior markup language. In: International conference on intelligent virtual agents, Marina del Rey, California

  10. Heylen D, Kopp S, Marsella S, Pelachaud C, Vilhjalmsson H (2008) The next step towards a functional markup language. Paper presented at the Proceeding of Intelligent Virtual Agents (IVA 2008), Tokyo

  11. Scherer S, Marsella S, Stratou G, Xu Y, Morbini F, Egan A, Rizzo A, Morency L-P (2012) Perception markup language: towards a standardized representation of perceived nonverbal behaviors. In: Nakano Y, Neff M, Paiva A, Walker M (eds) Intelligent virtual agents, vol 7502. Lecture Notes in Computer Science. Springer Berlin Heidelberg, pp 455–463. doi:10.1007/978-3-642-33197-8_47

  12. Araki M, Tachibana K (2006) Multimodal dialog description language for Rapid system development. In: 7th SIGdial workshop on discourse and dialogue, Sydney

  13. Rodriguez BH, Wiechno P, Dahl DA, Ashimura K, Tumuluri R (2012) Registration & discovery of multimodal modality components in multimodal systems: use cases and requirements. World Wide Web Consortium. http://www.w3.org/TR/mmi-discovery/. Accessed November 26 2012

  14. Johnston M, Baggia P, Burnett D, Carter J, Dahl DA, McCobb G, Raggett D (2009) EMMA: extensible multimodal annotation markup language. W3C. http://www.w3.org/TR/emma/. Accessed November 9 2012

  15. Bray T, Paoli J, Sperberg-McQueen CM, Maler E, Yergeau F (2004) Extensible Markup Language (XML) 1.0 (Third Edition). World Wide Web Consortium. http://www.w3.org/TR/2004/REC-xml-20040204/. Accessed November 9 2012

  16. Burnett DC, Walker MR, Hunt A (2004) W3C speech synthesis markup language (SSML). W3C. http://www.w3.org/TR/speech-synthesis/

  17. Oshry M, Auburn RJ, Baggia P, Bodell M, Burke D, Burnett DC, Candell E, Carter J, McGlashan S, Lee A, Porter B, Rehor K (2007) Voice extensible markup language (VoiceXML) 2.1. http://www.w3.org/TR/voicexml21/. Accessed November 9 2012

  18. Popescu A (2012) Geolocation API specification. World Wide Web Consortium. http://www.w3.org/TR/geolocation-API/. Accessed November 27 2012

  19. Kostiainen A, Oksanen I, Hazaël-Massieux D (2012) HTML media capture. World Wide Web Consortium. http://www.w3.org/TR/capture-api/. Accessed November 27 2012

  20. Microsoft (2007) Microsoft speech API 5.3 (SAPI). http://msdn2.microsoft.com/en-us/library/ms723627.aspx

  21. Java Speech API (1998) Sun microsystems. http://java.sun.com/products/java-media/speech/

  22. SALT Forum (2002) Speech application language tags (SALT). http://www.saltforum.org

  23. IBM (2003) X+V 1.1. http://www-3.ibm.com/software/pervasive/multimodal/x+v/11/spec.htm

  24. Bodell M, Bringert B, Brown R, Burnett DC, Dahl DA, Druta D, Ehlen P, Hemphill C, Johnston M, Pettay O, Sampath S, Schröder M, Shires G, Tumuluri R, Young M (2011) HTML speech incubator group final report. World Wide Web Consortium. http://www.w3.org/2005/Incubator/htmlspeech/XGR-htmlspeech-20111206/. Accessed November 27 2012

  25. Kliche I, Dahl DA, Larson JA, Rodriguez BH, Selvaraj M (2011) Best practices for creating MMI modality components. World Wide Web Consortium. http://www.w3.org/TR/2011/NOTE-mmi-mcbp-20110301/. Accessed November 20 2012

  26. Watt SM, Underhill T, Chee Y-M, Franke K, Froumentin M, Madhvanath S, Magaña J-A, Pakosz G, Russell G, Selvaraj M, Seni G, Tremblay C, Yaeger L (2011) Ink markup language (InkML). World Wide Web Consortium. http://www.w3.org/TR/InkML. Accessed November 27 2012

  27. Hickson I (2012) Server-sent Events. World Wide Web Consortium. http://www.w3.org/TR/eventsource/. Accessed November 20 2012

  28. Hickson I (2012) The WebSocket API. The World Wide Web Consortium. http://www.w3.org/TR/websockets/. Accessed November 20 2012

  29. Hunt A, McGlashan S (2004) W3C speech recognition grammar specification (SRGS). W3C. http://www.w3.org/TR/speech-grammar/. Accessed November 9 2012

  30. Van Tichelen L, Burke D (2007) Semantic Interpretation for Speech Recognition. W3C. http://www.w3.org/TR/semantic-interpretation/. Accessed November 9 2012

  31. Kliche I, Kharidi N, Wiechno P (2012) MMI interoperability test report. World Wide Web Consortium. http://www.w3.org/TR/2012/NOTE-mmi-interop-20120124/. Accessed November 27 2012

  32. Fette I, Melnikov A (2011) RFC 6455: The WebSocket protocol. Internet engineering task force. http://tools.ietf.org/html/rfc6455. Accessed November 20 2012

  33. Bergkvist A, Burnett DC, Jennings C, Narayanan A (2012) WebRTC 1.0: real-time communication between browsers. World Wide Web Consortium. http://www.w3.org/TR/webrtc/. Accessed November 28 2012

  34. Johnston M, Dahl DA, Kliche I, Baggia P, Burnett DC, Burkhardt F, Ashimura K (2009) Use cases for possible future EMMA features. World Wide Web Consortium. http://www.w3.org/TR/emma-usecases/

  35. Wiechno P, Kharidi N, Kliche I, Rodriguez BH, Schnelle-Walka D, Dahl DA, Ashimura K (2012) Multimodal architecture and interfaces 1.0 implementation report. World Wide Web Consortium. http://www.w3.org/2002/mmi/2012/mmi-arch-ir/. Accessed November 27 2012

  36. Openstream Inc (2013) Solutions. http://www.openstream.com/solutions.htm. Accessed March 15 2013

  37. Rodriguez BH, Moissinac J-C, Demeure I (2010) Multimodal instantiation of assistance services. In: Proceedings of the 12th international conference on information integration and web-based applications & services (iiWAS '10), ACM, Paris, France, pp 934–937

  38. Pous M, Ceccaroni L (2010) Multimodal interaction in distributed and ubiquitous computing. In: Fifth international conference on internet and web applications and services (ICIW), Barcelona, Spain


Acknowledgments

The W3C Multimodal Architecture and Interfaces and EMMA specifications represent the work of many individuals who have participated in the Multimodal Interaction Working Group. In particular, I would like to acknowledge the following authors of the MMI Architecture and EMMA specifications and related documents: Kazuyuki Ashimura, Jim Barnett, Paolo Baggia, Michael Bodell, Daniel C. Burnett, Jerry Carter, Michael Johnston, Nagesh Kharidi, Ingmar Kliche, Jim Larson, Raj Tumuluri, Brad Porter, Dave Raggett, T. V. Raman, B. Helena Rodriguez, Muthuselvam Selvaraj, Andrew Wahbe, Piotr Wiechno, and Moshe Yudkowsky. Special thanks go to Kazuyuki Ashimura, the W3C Team Contact for the Multimodal Interaction Working Group, for his guidance through the W3C process, and to Jim Barnett, the Editor-in-Chief of the Multimodal Architecture and Interfaces specification.

Author information

Correspondence to Deborah A. Dahl.


Cite this article

Dahl, D.A. The W3C multimodal architecture and interfaces standard. J Multimodal User Interfaces 7, 171–182 (2013). https://doi.org/10.1007/s12193-013-0120-5
