Multimedia Tools and Applications, Volume 76, Issue 4, pp 5691–5720

Extending multimedia languages to support multimodal user interactions

  • Álan Lívio Vasconcelos Guedes
  • Roberto Gerson de Albuquerque Azevedo
  • Simone Diniz Junqueira Barbosa


Historically, research in the Multimedia community has focused on output modalities, through studies on timing and multimedia processing. The Multimodal Interaction community, on the other hand, has focused on user-generated modalities, through studies on Multimodal User Interfaces (MUIs). In this paper, aiming to support the development of multimedia applications with MUIs, we propose integrating concepts from these two communities into a single high-level programming framework. The framework integrates user modalities, both user-generated (e.g., speech, gestures) and user-consumed (e.g., audiovisual, haptic), into declarative programming languages for the specification of interactive multimedia applications. To illustrate our approach, we instantiate the framework in the NCL (Nested Context Language) multimedia language. NCL is the declarative language for developing interactive applications for Brazilian Digital TV and an ITU-T Recommendation for IPTV services. To help evaluate our approach, we discuss a usage scenario and implement it as an NCL application extended with the proposed multimodal features. We also compare the expressiveness of multimodal NCL against existing multimedia and multimodal languages, for both input and output modalities.
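The central idea of treating user-generated and user-consumed modalities uniformly inside a declarative model can be illustrated with a small sketch. This is a toy interpreter under stated assumptions, not the multimodal NCL syntax or player proposed in the paper: the `Node`, `Link`, and `Player` names, the `"recognized"` event, and the `sleeping`/`occurring` states are all hypothetical and chosen only for illustration. Input nodes (e.g., a speech recognizer) and output nodes (e.g., video, haptics) live in the same structure, and declarative causal rules connect events on one to actions on the other.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical modality categories mirroring the abstract's split.
USER_GENERATED = {"speech", "gesture"}      # input produced by the user
USER_CONSUMED = {"audiovisual", "haptic"}   # output perceived by the user


@dataclass
class Node:
    """A media node bound to a single modality (illustrative, not NCL syntax)."""
    node_id: str
    modality: str
    state: str = "sleeping"

    @property
    def is_input(self) -> bool:
        return self.modality in USER_GENERATED


@dataclass
class Link:
    """Declarative rule: when `condition` occurs on `source`, apply `action` to `target`."""
    source: str
    condition: str  # hypothetical event name, e.g. "recognized"
    target: str
    action: str     # "start" or "stop"


@dataclass
class Player:
    """Toy interpreter that handles input and output nodes through the same links."""
    nodes: Dict[str, Node] = field(default_factory=dict)
    links: List[Link] = field(default_factory=list)
    log: List[str] = field(default_factory=list)

    def add(self, node: Node) -> None:
        if node.modality not in USER_GENERATED | USER_CONSUMED:
            raise ValueError(f"unknown modality: {node.modality}")
        self.nodes[node.node_id] = node

    def link(self, rule: Link) -> None:
        self.links.append(rule)

    def notify(self, source: str, condition: str) -> None:
        """Deliver an event (e.g. from a speech recognizer) and fire every matching link."""
        for rule in self.links:
            if rule.source == source and rule.condition == condition:
                target = self.nodes[rule.target]
                target.state = "occurring" if rule.action == "start" else "sleeping"
                self.log.append(f"{rule.action} {target.node_id} ({target.modality})")


# Usage: one voice command starts a video and a haptic cue together.
player = Player()
player.add(Node("voiceCmd", "speech"))
player.add(Node("video", "audiovisual"))
player.add(Node("vibration", "haptic"))
player.link(Link("voiceCmd", "recognized", "video", "start"))
player.link(Link("voiceCmd", "recognized", "vibration", "start"))
player.notify("voiceCmd", "recognized")
print(player.log)  # ['start video (audiovisual)', 'start vibration (haptic)']
```

The design choice being illustrated is that the author never writes imperative event-handling code: a recognized speech input is just another node event, and the same kind of rule that would react to a video ending can react to it.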


Keywords: Multimedia languages; Multimodal user interactions; MUI; Nested context language; NCL



First, we are deeply grateful to Prof. Luiz Fernando Gomes Soares (in memoriam) for his profound guidance and friendship, which were essential to this work and to its authors. We also thank Carlos Salles, Marcos Roriz, and all of TeleMidia Lab’s researchers for thoughtful discussions on this work. Finally, we thank the Brazilian National Council for Scientific and Technological Development (CNPq, process #309828/2015-5) and the Foundation for Research of the State of Rio de Janeiro (FAPERJ) for their financial support.


Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • Álan Lívio Vasconcelos Guedes (1)
  • Roberto Gerson de Albuquerque Azevedo (1)
  • Simone Diniz Junqueira Barbosa (1)
  1. Pontifical Catholic University of Rio de Janeiro, Rio de Janeiro, Brazil
