
JVoiceXML as a modality component in the W3C multimodal architecture

Experiences implementing the W3C standard

  • Original Paper
  • Published in: Journal on Multimodal User Interfaces

Abstract

Research on multimodal interaction has led to a multitude of proposals for suitable software architectures. Because each architecture describes multimodal systems differently, interoperability between them is severely hindered. The W3C MMI architecture is a proposed recommendation for a common architecture. In this article, we describe our experiences integrating JVoiceXML into the W3C MMI architecture and identify general limitations with regard to the available design space.
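To illustrate the kind of integration the abstract describes: in the W3C MMI architecture, an interaction manager drives a modality component such as JVoiceXML by exchanging XML life-cycle events. A sketch of a StartRequest asking the voice modality component to run a VoiceXML dialog might look as follows (element and attribute names follow the 2012 proposed recommendation; the source, target, context, and dialog URI values are illustrative placeholders, not taken from the paper):

```xml
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
  <!-- interaction manager asks the voice modality component to start a dialog -->
  <mmi:startRequest source="IM-1" target="VoiceMC-1"
                    context="ctx-42" requestID="req-1">
    <!-- the VoiceXML document to execute; placeholder URI -->
    <mmi:contentURL href="http://example.org/dialog.vxml"/>
  </mmi:startRequest>
</mmi:mmi>
```

The modality component would answer with a corresponding StartResponse carrying the same context and requestID, which is how the architecture correlates asynchronous events between components.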


Notes

  1. It is unfortunate and confusing that the W3C MMI framework describes concepts comparable in granularity to what related work calls architectures, while the actual W3C MMI architecture describes only a subset.

  2. A video of the scenario is available at http://www.youtube.com/watch?v=edXjU5ZVVnM.


Author information

Corresponding author

Correspondence to Dirk Schnelle-Walka.


About this article

Cite this article

Schnelle-Walka, D., Radomski, S. & Mühlhäuser, M. JVoiceXML as a modality component in the W3C multimodal architecture. J Multimodal User Interfaces 7, 183–194 (2013). https://doi.org/10.1007/s12193-013-0119-y
