Abstract
Historically, research in the Multimedia community has focused on output modalities, through studies on timing and multimedia processing. The Multimodal Interaction community, in turn, has focused on user-generated modalities, through studies on Multimodal User Interfaces (MUIs). In this paper, aiming to assist the development of multimedia applications with MUIs, we propose integrating concepts from those two communities in a unified high-level programming framework. The framework integrates user modalities, both user-generated (e.g., speech, gestures) and user-consumed (e.g., audiovisual, haptic), into declarative programming languages for the specification of interactive multimedia applications. To illustrate our approach, we instantiate the framework in the NCL (Nested Context Language) multimedia language. NCL is the declarative language for developing interactive applications for Brazilian Digital TV and an ITU-T Recommendation for IPTV services. To help evaluate our approach, we discuss a usage scenario and implement it as an NCL application extended with the proposed multimodal features. We also compare the expressiveness of the multimodal NCL against existing multimedia and multimodal languages, for both input and output modalities.
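To give a concrete flavor of the kind of specification the framework targets, the sketch below shows a standard NCL document in which a hypothetical speech-input media object triggers the presentation of an image. The `onRecognize` role, the `commands.srgs` grammar file, and the `application/srgs` media type are illustrative assumptions of ours, not the actual syntax proposed in the paper; the surrounding document structure (head, connector base, body, ports, media, and links) follows standard NCL 3.0.

```xml
<ncl id="multimodalExample"
     xmlns="http://www.ncl.org.br/NCL3.0/EDTVProfile">
  <head>
    <connectorBase>
      <!-- hypothetical connector: when an utterance is recognized,
           start the bound target node -->
      <causalConnector id="onRecognizeStart">
        <simpleCondition role="onRecognize"/>
        <simpleAction role="start"/>
      </causalConnector>
    </connectorBase>
  </head>
  <body>
    <port id="entry" component="video"/>
    <!-- user-consumed modality: an audiovisual media object -->
    <media id="video" src="movie.mp4"/>
    <!-- user-generated modality (illustrative): a speech-input media
         object whose content is an SRGS recognition grammar -->
    <media id="speech" src="commands.srgs" type="application/srgs"/>
    <media id="icon" src="icon.png"/>
    <link xconnector="onRecognizeStart">
      <bind role="onRecognize" component="speech"/>
      <bind role="start" component="icon"/>
    </link>
  </body>
</ncl>
```

The point of the sketch is that, in such a framework, input modalities are first-class `media` nodes, so the same causal link machinery NCL already uses for temporal synchronization can relate user-generated events to presentation actions.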
Notes
Recommending modalities based on users' sensory capabilities should draw on results from accessibility research. Table 1 simply illustrates that our approach makes it feasible to build MUIs that take those capabilities into account.
References
ABNT (2008) ABNT NBR 15606-2: Televisão digital terrestre – Codificação de dados e especificações de transmissão para radiodifusão digital Parte 2: Ginga-NCL para receptores fixos e móveis – Linguagem de aplicação XML para codificação de aplicações. http://forumsbtvd.org.br/acervo-online/normas-brasileiras-de-tv-digital/. Accessed 3 Mar 2016
Allen JF (1983) Maintaining knowledge about temporal intervals. Commun ACM 26:832–843. doi:10.1145/182.358434
Angeluci ACB, de Albuquerque Azevedo RG, Soares LFG (2009) O uso da linguagem declarativa do Ginga-NCL na construção de conteúdos audiovisuais interativos: a experiência do “Roteiros do Dia.” 1o Simpósio Int Telev Digit SIMTVD 91
Beckham JL, Fabbrizio GD, Klarlund N (2001) Towards SMIL as a foundation for multimodal, multimedia applications. In: Dalsgaard P, Lindberg B, Benner H, Tan Z-H (eds) EUROSPEECH 2001 Scand. 7th Eur Conf Speech Commun Technol ISCA, 1363–1366
Bolt RA (1980) Put-that-there: voice and gesture at the graphics interface. Proc 7th Annu Conf Comput Graph Interact Tech. ACM, New York, NY, USA, 262–270
Bulterman DCA, Hardman L (2005) Structured multimedia authoring. ACM Trans Multimed Comput Commun Appl 1:89–109. doi:10.1145/1047936.1047943
Bulterman DCA, Rutledge LW (2008) SMIL 3.0: Flexible multimedia for web, mobile devices and daisy talking books, 2nd ed. Springer Publishing Company, Incorporated
Carvalho LAMC, Guimarães AP, Macêdo HT (2008) Architectures for interactive vocal environment to Brazilian digital TV middleware. Proc 2008 Euro Am Conf Telemat Inf Syst ACM, New York, NY, USA, 22:1–22:8
Carvalho L, Macedo H (2010) Estendendo a NCL para promover interatividade vocal em Aplicações Ginga na TVDi Brasileira. WebMedia 10 Proc. 16th Braz Symp Multimed Web
Costa D, Duarte C (2011) Adapting multimodal fission to user’s abilities. In: Stephanidis C (ed) Univers. Access Hum.-Comput. Interact. Des. EInclusion. Springer, Berlin Heidelberg, pp 347–356
Coutaz J, Nigay L, Salber D, Blandford A, May J, Young RM (1995) Four easy pieces for assessing the usability of multimodal interaction: the CARE properties. In: InterAct, 115–120
Dumas B, Lalanne D, Ingold R (2009) HephaisTK: a toolkit for rapid prototyping of multimodal interfaces. Proc 2009 Int Conf. Multimodal Interfaces. ACM, New York, NY, USA, 231–232
Dumas B, Lalanne D, Ingold R (2010) Description languages for multimodal interaction: a set of guidelines and its illustration with SMUIML. J Multimodal User Interfaces 3:237–247. doi:10.1007/s12193-010-0043-3
Dumas B, Lalanne D, Oviatt S (2009) Multimodal Interfaces: A Survey of Principles, Models and Frameworks. In: Kohlas J, Lalanne D (eds) Hum. Mach. Interact. Springer, Berlin Heidelberg, pp 3–26
Ghinea G, Timmerer C, Lin W, Gulliver SR (2014) Mulsemedia: state of the art, perspectives, and challenges. ACM Trans Multimed Comput Commun Appl 11:17:1–17:23. doi:10.1145/2617994
Hachaj T, Ogiela MR (2014) Rule-based approach to recognizing human body poses and gestures in real time. Multimed Syst 20:81–99. doi:10.1007/s00530-013-0332-2
Huang C-M, Wang C (1998) Synchronization for interactive multimedia presentations. IEEE Multimed 5:44–62. doi:10.1109/93.735868
Ideum Inc (2016) Gesture markup language. http://www.gestureml.org/. Accessed 3 Mar 2016
ISO/IEC (2013) ISO/IEC 23005-3:2013 Information Technology - Media Context and Control - Part 3: Sensory Information. http://www.iso.org/iso/home/store/catalogue_ics/catalogue_detail_ics.htm?csnumber=60391. Accessed 3 Mar 2016
ISO/IEC (2014) ISO/IEC 23005-1:2014 Information technology - Media context and control - Part 1: Architecture. http://www.iso.org/iso/home/store/catalogue_ics/catalogue_detail_ics.htm?csnumber=60359. Accessed 3 Mar 2016
ITU (2015) ITU Recommendation H.761: Nested context language (NCL) and Ginga-NCL for IPTV services. http://handle.itu.int/11.1002/1000/12237. Accessed 3 Mar 2016
Duddington J (2016) eSpeak text to speech engine. http://espeak.sourceforge.net/. Accessed 3 Mar 2016
Katsurada K, Yamada H, Nakamura Y, Kobayashi S, Nitta T (2005) XISL: A Modality-Independent MMI Description Language. In: Bühler D, Dybkjær L, Minker W (eds) Spok. Multimodal Hum.-Comput. Dialogue Mob. Environ. Springer, Netherlands, pp 133–148
Kopp S, Krenn B, Marsella S, Marshall AN, Pelachaud C, Pirker H, Thórisson KR, Vilhjálmsson H (2006) Towards a common framework for multimodal generation: the behavior markup language. In: Gratch J, Young M, Aylett R, Ballin D, Olivier P (eds) Intell. virtual agents. Springer Berlin Heidelberg, Berlin, pp 205–217
Lazar J, Feng JH, Hochheiser H (2010) Research methods in human-computer interaction. Wiley Publishing
Leap Motion Inc (2016) Leap motion controller. https://www.leapmotion.com/. Accessed 3 Mar 2016
Lee Laboratory of Nagoya Institute of Technology (2016) Julius speech recognition engine. http://julius.osdn.jp/. Accessed 3 Mar 2016
Meixner B, Kosch H (2012) Interactive non-linear video: definition and XML structure. Proc 2012 ACM Symp Doc Eng ACM, New York, NY, USA, 49–58
Oviatt S (2007) Multimodal Interfaces. Hum-Comput Interact Handb. CRC Press, 413–432
Rainer B, Timmerer C (2014) A generic utility model representing the quality of sensory experience. ACM Trans Multimed Comput Commun Appl 11:14:1–14:17. doi:10.1145/2648429
Rowe LA (2013) Looking forward 10 years to multimedia successes. ACM Trans Multimed Comput Commun Appl 9:37:1–37:7. doi:10.1145/2490825
SALT Forum. Speech Application Language Tags (SALT) specification. http://www.saltforum.org. Accessed 3 Mar 2016
Schnelle-Walka D, Radomski S, Mühlhäuser M (2013) JVoiceXML as a modality component in the W3C multimodal architecture. J Multimodal User Interfaces 7:183–194. doi:10.1007/s12193-013-0119-y
Shneiderman B (1997) Designing the user interface: strategies for effective human-computer interaction, 3rd edn. Addison-Wesley Longman Publishing Co., Inc., Boston
Soares LFG (2009) Nested context model 3.0: Part 1 – NCM Core. Monogr Comput Sci. PUC-Rio Inf MCC1805. ftp://obaluae.inf.puc-rio.br/pub/docs/techreports/05_18_soares.pdf. Accessed 3 Mar 2016
Soares LFG, Lima GF (2015) NCL handbook. Monogr Comput Sci. PUC-Rio Inf MCC1813. handbook.ncl.org.br. Accessed 3 Mar 2016
Soares LFG, Moreno MF, Soares Neto CS, Moreno MF (2010) Ginga-NCL: declarative middleware for multimedia IPTV services. IEEE Commun Mag 48:74–81. doi:10.1109/MCOM.2010.5473867
Turk M (2014) Multimodal interaction: a review. Pattern Recognit Lett 36:189–195. doi:10.1016/j.patrec.2013.07.003
W3C (2001) XHTML + Voice Profile 1.0. http://www.w3.org/TR/xhtml+voice/. Accessed 3 Mar 2016
W3C (2003) Multimodal interaction framework. www.w3.org/TR/mmi-framework/. Accessed 3 Mar 2016
W3C (2004) Speech recognition grammar specification version 1.0. http://www.w3.org/TR/speech-grammar/. Accessed 3 Mar 2016
W3C (2007) Voice Extensible Markup Language (VoiceXML) 2.1. http://www.w3.org/TR/voicexml21/. Accessed 3 Mar 2016
W3C (2009) EMMA: Extensible MultiModal Annotation markup language. http://www.w3.org/TR/2009/REC-emma-20090210/. Accessed 3 Mar 2016
W3C (2010) Speech Synthesis Markup Language (SSML) Version 1.1. http://www.w3.org/TR/speech-synthesis11/. Accessed 3 Mar 2016
W3C (2011) Ink Markup Language (InkML). http://www.w3.org/TR/2011/REC-InkML-20110920/. Accessed 3 Mar 2016
W3C (2012) Multimodal Architecture and Interfaces. http://www.w3.org/TR/mmi-arch/. Accessed 3 Mar 2016
W3C (2012) State Chart XML (SCXML): State Machine Notation for Control Abstraction. http://www.w3.org/TR/scxml/. Accessed 3 Mar 2016
Wang K (2002) SALT: a spoken language interface for web-based multimodal dialog systems. Proc Int Conf Spok Lang Process
Acknowledgments
First, we are deeply thankful to Prof. Luiz Fernando Gomes Soares (in memoriam) for his profound guidance and friendship, which were essential to this work and its authors. We also thank Carlos Salles, Marcos Roriz, and all TeleMidia Lab researchers, who provided thoughtful discussions on this work. Finally, we thank the Brazilian National Council for Scientific and Technological Development (CNPq, process #309828/2015-5) and the Foundation for Research of the State of Rio de Janeiro (FAPERJ) for their financial support.
Cite this article
Guedes, Á.L.V., Azevedo, R.G.d.A. & Barbosa, S.D.J. Extending multimedia languages to support multimodal user interactions. Multimed Tools Appl 76, 5691–5720 (2017). https://doi.org/10.1007/s11042-016-3846-8