Journal on Multimodal User Interfaces, Volume 7, Issue 3, pp 183–194

JVoiceXML as a modality component in the W3C multimodal architecture

Experiences implementing the W3C standard
  • Dirk Schnelle-Walka
  • Stefan Radomski
  • Max Mühlhäuser
Original Paper


Research on multimodal interaction has led to a multitude of proposals for suitable software architectures. Because these architectures describe multimodal systems differently, interoperability is severely hindered. The W3C MMI architecture is a proposed recommendation for a common architecture. In this article, we describe our experiences integrating JVoiceXML into the W3C MMI architecture and identify general limitations with regard to the available design space.


Multimodality, Software architectures, Standardization



Copyright information

© OpenInterface Association 2013

Authors and Affiliations

  • Dirk Schnelle-Walka (1)
  • Stefan Radomski (1)
  • Max Mühlhäuser (1)
  1. Technische Universität Darmstadt, Darmstadt, Germany
