Abstract
Historically, research in the Multimedia community has focused on output modalities, through studies on timing and multimedia processing. The Multimodal Interaction community, in turn, has focused on user-generated modalities, through studies on Multimodal User Interfaces (MUIs). In this paper, aiming to assist the development of multimedia applications with MUIs, we propose integrating concepts from those two communities in a unified high-level programming framework. The framework integrates user modalities, both user-generated (e.g., speech, gestures) and user-consumed (e.g., audiovisual, haptic), into declarative programming languages for the specification of interactive multimedia applications. To illustrate our approach, we instantiate the framework in the NCL (Nested Context Language) multimedia language. NCL is the declarative language for developing interactive applications for Brazilian Digital TV and an ITU-T Recommendation for IPTV services. To help evaluate our approach, we discuss a usage scenario and implement it as an NCL application extended with the proposed multimodal features. We also compare the expressiveness of the multimodal NCL against existing multimedia and multimodal languages, for both input and output modalities.
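To give a concrete flavor of the kind of specification the framework targets, the sketch below shows a standard NCL document in which a hypothetical speech-input media object triggers the presentation of an image. The `onRecognize` role, the `commands.srgs` grammar file, and the `application/srgs` media type are illustrative assumptions of ours, not the actual syntax proposed in the paper; the surrounding document structure (head, connector base, body, ports, media, and links) follows standard NCL 3.0.

```xml
<ncl id="multimodalExample"
     xmlns="http://www.ncl.org.br/NCL3.0/EDTVProfile">
  <head>
    <connectorBase>
      <!-- hypothetical connector: when an utterance is recognized,
           start the bound target node -->
      <causalConnector id="onRecognizeStart">
        <simpleCondition role="onRecognize"/>
        <simpleAction role="start"/>
      </causalConnector>
    </connectorBase>
  </head>
  <body>
    <port id="entry" component="video"/>
    <!-- user-consumed modality: an audiovisual media object -->
    <media id="video" src="movie.mp4"/>
    <!-- user-generated modality (illustrative): a speech-input media
         object whose content is an SRGS recognition grammar -->
    <media id="speech" src="commands.srgs" type="application/srgs"/>
    <media id="icon" src="icon.png"/>
    <link xconnector="onRecognizeStart">
      <bind role="onRecognize" component="speech"/>
      <bind role="start" component="icon"/>
    </link>
  </body>
</ncl>
```

The point of the sketch is that, in such a framework, input modalities are first-class `media` nodes, so the same causal link machinery NCL already uses for temporal synchronization can relate user-generated events to presentation actions.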
Notes
Recommending modalities based on users' sensory capabilities should draw on results from accessibility research. Table 1 simply illustrates that our approach makes it feasible to build MUIs that take those capabilities into account.
References
ABNT (2008) ABNT NBR 15606-2: Televisão digital terrestre – Codificação de dados e especificações de transmissão para radiodifusão digital Parte 2: Ginga-NCL para receptores fixos e móveis – Linguagem de aplicação XML para codificação de aplicações. http://forumsbtvd.org.br/acervo-online/normas-brasileiras-de-tv-digital/. Accessed 3 Mar 2016
Allen JF (1983) Maintaining knowledge about temporal intervals. Commun ACM 26:832–843. doi:10.1145/182.358434
Angeluci ACB, de Albuquerque Azevedo RG, Soares LFG (2009) O uso da linguagem declarativa do Ginga-NCL na construção de conteúdos audiovisuais interativos: a experiência do “Roteiros do Dia.” 1o Simpósio Int Telev Digit SIMTVD 91
Beckham JL, Fabbrizio GD, Klarlund N (2001) Towards SMIL as a foundation for multimodal, multimedia applications. In: Dalsgaard P, Lindberg B, Benner H, Tan Z-H (eds) EUROSPEECH 2001 Scand. 7th Eur Conf Speech Commun Technol ISCA, 1363–1366
Bolt RA (1980) Put-that-there: voice and gesture at the graphics interface. Proc 7th Annu Conf Comput Graph Interact Tech. ACM, New York, NY, USA, 262–270
Bulterman DCA, Hardman L (2005) Structured multimedia authoring. ACM Trans Multimed Comput Commun Appl 1:89–109. doi:10.1145/1047936.1047943
Bulterman DCA, Rutledge LW (2008) SMIL 3.0: Flexible multimedia for web, mobile devices and daisy talking books, 2nd ed. Springer Publishing Company, Incorporated
Carvalho LAMC, Guimarães AP, Macêdo HT (2008) Architectures for interactive vocal environment to Brazilian digital TV middleware. Proc 2008 Euro Am Conf Telemat Inf Syst ACM, New York, NY, USA, 22:1–22:8
Carvalho L, Macedo H (2010) Estendendo a NCL para promover interatividade vocal em Aplicações Ginga na TVDi Brasileira. WebMedia 10 Proc. 16th Braz Symp Multimed Web
Costa D, Duarte C (2011) Adapting multimodal fission to user’s abilities. In: Stephanidis C (ed) Univers. Access Hum.-Comput. Interact. Des. EInclusion. Springer, Berlin Heidelberg, pp 347–356
Coutaz J, Nigay L, Salber D, Blandford A, May J, Young RM (1995) Four easy pieces for assessing the usability of multimodal interaction: the CARE properties. In: InterAct, 115–120
Dumas B, Lalanne D, Ingold R (2009) HephaisTK: a toolkit for rapid prototyping of multimodal interfaces. Proc 2009 Int Conf. Multimodal Interfaces. ACM, New York, NY, USA, 231–232
Dumas B, Lalanne D, Ingold R (2010) Description languages for multimodal interaction: a set of guidelines and its illustration with SMUIML. J Multimodal User Interfaces 3:237–247. doi:10.1007/s12193-010-0043-3
Dumas B, Lalanne D, Oviatt S (2009) Multimodal Interfaces: A Survey of Principles, Models and Frameworks. In: Kohlas J, Lalanne D (eds) Hum. Mach. Interact. Springer, Berlin Heidelberg, pp 3–26
Ghinea G, Timmerer C, Lin W, Gulliver SR (2014) Mulsemedia: state of the art, perspectives, and challenges. ACM Trans Multimed Comput Commun Appl 11:17:1–17:23. doi:10.1145/2617994
Hachaj T, Ogiela MR (2014) Rule-based approach to recognizing human body poses and gestures in real time. Multimed Syst 20:81–99. doi:10.1007/s00530-013-0332-2
Huang C-M, Wang C (1998) Synchronization for interactive multimedia presentations. IEEE Multimed 5:44–62. doi:10.1109/93.735868
Ideum Inc (2016) Gesture markup language. http://www.gestureml.org/. Accessed 3 Mar 2016
ISO/IEC (2013) ISO/IEC 23005-3:2013 Information Technology - Media Context and Control - Part 3: Sensory Information. http://www.iso.org/iso/home/store/catalogue_ics/catalogue_detail_ics.htm?csnumber=60391. Accessed 3 Mar 2016
ISO/IEC (2014) ISO/IEC 23005-1:2014 Information technology - Media context and control - Part 1: Architecture. http://www.iso.org/iso/home/store/catalogue_ics/catalogue_detail_ics.htm?csnumber=60359. Accessed 3 Mar 2016
ITU (2015) ITU Recommendation H.761: Nested context language (NCL) and Ginga-NCL for IPTV services. http://handle.itu.int/11.1002/1000/12237. Accessed 3 Mar 2016
Duddington J (2016) eSpeak text to speech engine. http://espeak.sourceforge.net/. Accessed 3 Mar 2016
Katsurada K, Yamada H, Nakamura Y, Kobayashi S, Nitta T (2005) XISL: A Modality-Independent MMI Description Language. In: Bühler D, Dybkjær L, Minker W (eds) Spok. Multimodal Hum.-Comput. Dialogue Mob. Environ. Springer, Netherlands, pp 133–148
Kopp S, Krenn B, Marsella S, Marshall AN, Pelachaud C, Pirker H, Thórisson KR, Vilhjálmsson H (2006) Towards a common framework for multimodal generation: the behavior markup language. In: Gratch J, Young M, Aylett R, Ballin D, Olivier P (eds) Intell. virtual agents. Springer Berlin Heidelberg, Berlin, pp 205–217
Lazar J, Feng JH, Hochheiser H (2010) Research methods in human-computer interaction. Wiley Publishing
Leap Motion Inc (2016) Leap motion controller. https://www.leapmotion.com/. Accessed 3 Mar 2016
Lee Laboratory of Nagoya Institute of Technology (2016) Julius speech recognition engine. http://julius.osdn.jp/. Accessed 3 Mar 2016
Meixner B, Kosch H (2012) Interactive non-linear video: definition and XML structure. Proc 2012 ACM Symp Doc Eng ACM, New York, NY, USA, 49–58
Oviatt S (2007) Multimodal Interfaces. Hum-Comput Interact Handb. CRC Press, 413–432
Rainer B, Timmerer C (2014) A generic utility model representing the quality of sensory experience. ACM Trans Multimed Comput Commun Appl 11:14:1–14:17. doi:10.1145/2648429
Rowe LA (2013) Looking forward 10 years to multimedia successes. ACM Trans Multimed Comput Commun Appl 9:37:1–37:7. doi:10.1145/2490825
SALT Forum. Speech Application Language Tags (SALT) specification. http://www.saltforum.org. Accessed 3 Mar 2016
Schnelle-Walka D, Radomski S, Mühlhäuser M (2013) JVoiceXML as a modality component in the W3C multimodal architecture. J Multimodal User Interfaces 7:183–194. doi:10.1007/s12193-013-0119-y
Shneiderman B (1997) Designing the user interface: strategies for effective human-computer interaction, 3rd edn. Addison-Wesley Longman Publishing Co., Inc., Boston
Soares LFG (2009) Nested context model 3.0: Part 1 – NCM Core. Monogr Comput Sci. PUC-Rio Inf MCC1805. ftp://obaluae.inf.puc-rio.br/pub/docs/techreports/05_18_soares.pdf. Accessed 3 Mar 2016
Soares LFG, Lima GF (2015) NCL handbook. Monogr Comput Sci. PUC-Rio Inf MCC1813. handbook.ncl.org.br. Accessed 3 Mar 2016
Soares LFG, Moreno MF, Soares Neto CS, Moreno MF (2010) Ginga-NCL: declarative middleware for multimedia IPTV services. IEEE Commun Mag 48:74–81. doi:10.1109/MCOM.2010.5473867
Turk M (2014) Multimodal interaction: a review. Pattern Recognit Lett 36:189–195. doi:10.1016/j.patrec.2013.07.003
W3C (2001) XHTML + Voice Profile 1.0. http://www.w3.org/TR/xhtml+voice/. Accessed 3 Mar 2016
W3C (2003) Multimodal interaction framework. www.w3.org/TR/mmi-framework/. Accessed 3 Mar 2016
W3C (2004) Speech recognition grammar specification version 1.0. http://www.w3.org/TR/speech-grammar/. Accessed 3 Mar 2016
W3C (2007) Voice Extensible Markup Language (VoiceXML) 2.1. http://www.w3.org/TR/voicexml21/. Accessed 3 Mar 2016
W3C (2009) EMMA: Extensible MultiModal Annotation markup language. http://www.w3.org/TR/2009/REC-emma-20090210/. Accessed 3 Mar 2016
W3C (2010) Speech Synthesis Markup Language (SSML) Version 1.1. http://www.w3.org/TR/speech-synthesis11/. Accessed 3 Mar 2016
W3C (2011) Ink Markup Language (InkML). http://www.w3.org/TR/2011/REC-InkML-20110920/. Accessed 3 Mar 2016
W3C (2012) Multimodal Architecture and Interfaces. http://www.w3.org/TR/mmi-arch/. Accessed 3 Mar 2016
W3C (2012) State Chart XML (SCXML): State Machine Notation for Control Abstraction. http://www.w3.org/TR/scxml/. Accessed 3 Mar 2016
Wang K (2002) SALT: a spoken language interface for web-based multimodal dialog systems. Proc Int Conf Spok Lang Process
Acknowledgments
First, we are deeply thankful to Prof. Luiz Fernando Gomes Soares (in memoriam) for his profound guidance and friendship, which were essential to this work and its authors. We also thank Carlos Salles, Marcos Roriz, and all TeleMidia Lab researchers, who provided thoughtful discussions on this work. Finally, we thank the Brazilian National Council for Scientific and Technological Development (CNPq, process #309828/2015-5) and the Foundation for Research of the State of Rio de Janeiro (FAPERJ) for their financial support.
Cite this article
Guedes, Á.L.V., Azevedo, R.G.d.A. & Barbosa, S.D.J. Extending multimedia languages to support multimodal user interactions. Multimed Tools Appl 76, 5691–5720 (2017). https://doi.org/10.1007/s11042-016-3846-8