
JVoiceXML as a modality component in the W3C multimodal architecture

Experiences implementing the W3C standard

  • Original Paper
  • Published in: Journal on Multimodal User Interfaces

Abstract

Research on multimodal interaction has led to a multitude of proposals for suitable software architectures. Because each architecture describes multimodal systems differently, interoperability between them is severely hindered. The W3C MMI architecture is a proposed recommendation for a common architecture. In this article, we describe our experiences integrating JVoiceXML into the W3C MMI architecture and identify general limitations with regard to the available design space.
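To illustrate the kind of integration the abstract describes: in the W3C MMI architecture, an interaction manager drives a modality component such as JVoiceXML by exchanging XML life-cycle events. A sketch of a StartRequest asking the voice modality component to run a VoiceXML dialog might look as follows (element and attribute names follow the 2012 proposed recommendation; the source, target, context, and dialog URI values are illustrative placeholders, not taken from the paper):

```xml
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
  <!-- interaction manager asks the voice modality component to start a dialog -->
  <mmi:startRequest source="IM-1" target="VoiceMC-1"
                    context="ctx-42" requestID="req-1">
    <!-- the VoiceXML document to execute; placeholder URI -->
    <mmi:contentURL href="http://example.org/dialog.vxml"/>
  </mmi:startRequest>
</mmi:mmi>
```

The modality component would answer with a corresponding StartResponse carrying the same context and requestID, which is how the architecture correlates asynchronous events between components.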


Notes

  1. It is unfortunate and confusing that the W3C MMI framework describes concepts comparable in granularity to what related work calls architectures, while the actual W3C MMI architecture describes only a subset.

  2. A video of the scenario is available at http://www.youtube.com/watch?v=edXjU5ZVVnM.


Author information

Corresponding author

Correspondence to Dirk Schnelle-Walka.


About this article

Cite this article

Schnelle-Walka, D., Radomski, S. & Mühlhäuser, M. JVoiceXML as a modality component in the W3C multimodal architecture. J Multimodal User Interfaces 7, 183–194 (2013). https://doi.org/10.1007/s12193-013-0119-y
