
Extending multimedia languages to support multimodal user interactions

Published in: Multimedia Tools and Applications

Abstract

Historically, research in the Multimedia community has focused on output modalities, through studies on timing and multimedia processing. The Multimodal Interaction community, on the other hand, has focused on user-generated modalities, through studies on Multimodal User Interfaces (MUIs). In this paper, aiming to assist the development of multimedia applications with MUIs, we propose integrating concepts from those two communities into a single high-level programming framework. The framework integrates user modalities, both user-generated (e.g., speech, gestures) and user-consumed (e.g., audiovisual, haptic), into declarative programming languages for the specification of interactive multimedia applications. To illustrate our approach, we instantiate the framework in the NCL (Nested Context Language) multimedia language. NCL is the declarative language for developing interactive applications for Brazilian Digital TV and an ITU-T Recommendation for IPTV services. To help evaluate our approach, we discuss a usage scenario and implement it as an NCL application extended with the proposed multimodal features. We also compare the expressiveness of the multimodal NCL against existing multimedia and multimodal languages, for both input and output modalities.
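As a concrete illustration of the kind of specification the framework targets, the sketch below shows what an NCL document extended with an input modality might look like: a speech grammar is declared as a media object (a user-generated modality), and a causal link starts a video (a user-consumed modality) when the spoken command is recognized. The onRecognition role, the SRGS media type, and the file names are illustrative assumptions, not necessarily the exact syntax proposed in the paper.

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <ncl id="speechStartsVideo" xmlns="http://www.ncl.org.br/NCL3.0/EDTVProfile">
      <head>
        <connectorBase>
          <!-- When the speech command is recognized, start the video.
               onRecognition is a hypothetical input-event role; standard NCL
               defines roles such as onSelection for remote-control interaction. -->
          <causalConnector id="onRecognitionStart">
            <simpleCondition role="onRecognition"/>
            <simpleAction role="start"/>
          </causalConnector>
        </connectorBase>
      </head>
      <body>
        <port id="entry" component="speechCmd"/>
        <!-- User-generated (input) modality: a speech grammar treated as a media object. -->
        <media id="speechCmd" src="commands.srgs" type="application/srgs+xml"/>
        <!-- User-consumed (output) modality: an ordinary audiovisual media object. -->
        <media id="video" src="movie.mp4"/>
        <!-- Causal link binding the recognition event to the start action. -->
        <link xconnector="onRecognitionStart">
          <bind role="onRecognition" component="speechCmd"/>
          <bind role="start" component="video"/>
        </link>
      </body>
    </ncl>

The point of the sketch is that input modalities can be described with the same media, link, and connector abstractions that NCL already uses for output media, rather than through a separate dialog-oriented formalism.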




Notes

  1. Recommending modalities based on users’ sensory capabilities should draw on results from accessibility research. Table 1 merely illustrates that our approach makes it feasible to build MUIs that take those capabilities into account.


Acknowledgments

First, we are deeply grateful to Prof. Luiz Fernando Gomes Soares (in memoriam) for his profound guidance and friendship, which were essential to this work and to its authors. We also thank Carlos Salles, Marcos Roriz, and all researchers of the TeleMidia Lab, who provided thoughtful discussions on this work. Finally, we thank the Brazilian National Council for Scientific and Technological Development (CNPq, process #309828/2015-5) and the Foundation for Research of the State of Rio de Janeiro (FAPERJ) for their financial support.

Author information

Corresponding author

Correspondence to Álan Lívio Vasconcelos Guedes.


About this article


Cite this article

Guedes, Á.L.V., Azevedo, R.G.d.A. & Barbosa, S.D.J. Extending multimedia languages to support multimodal user interactions. Multimed Tools Appl 76, 5691–5720 (2017). https://doi.org/10.1007/s11042-016-3846-8

