Multimodal Fusion and Fission within the W3C MMI Architectural Pattern

  • Dirk Schnelle-Walka
  • Carlos Duarte
  • Stefan Radomski


The current W3C recommendation for multimodal interfaces standardizes the message exchange and overall structure of modality components in multimodal applications. However, it leaves unspecified the details of multimodal fusion, which combines inputs arriving from modality components, and of multimodal fission, which prepares multimodal presentations. This chapter provides a first analysis of how several fusion and fission approaches could be integrated, and of their implications with regard to the standard.
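To illustrate the fusion side of this gap, the following is a minimal sketch of frame-based late fusion, merging interpretations delivered by two hypothetical modality components (e.g. speech and pointing). All names and the slot representation are illustrative assumptions; the W3C MMI recommendation deliberately does not prescribe any such logic.

```python
# Sketch of frame-based (late) fusion over two modality interpretations.
# Each frame is a dict with 'slots', a 'confidence' in [0, 1], and a
# 'timestamp' in seconds. This is an illustrative assumption, not part
# of the W3C MMI architecture itself.

def fuse(speech, gesture, max_skew=1.5):
    """Unify a speech frame with a gesture frame if they are close in time.

    Returns the merged frame, or None when the inputs are temporally
    unrelated or their slot values conflict.
    """
    if abs(speech["timestamp"] - gesture["timestamp"]) > max_skew:
        return None  # temporally unrelated inputs are not fused
    merged = dict(speech["slots"])
    for key, value in gesture["slots"].items():
        if key in merged and merged[key] != value:
            return None  # conflicting slot values block unification
        merged[key] = value
    return {
        "slots": merged,
        # a simple product combination of the per-modality confidences
        "confidence": speech["confidence"] * gesture["confidence"],
    }

# "Move ... there" plus a pointing gesture resolving object and target:
speech = {"slots": {"action": "move"}, "confidence": 0.9, "timestamp": 3.1}
gesture = {"slots": {"object": "lamp", "target": "table"},
           "confidence": 0.8, "timestamp": 3.4}
result = fuse(speech, gesture)
```

In an MMI deployment such a fusion step would sit inside or alongside the interaction manager, consuming the `ExtensionNotification` payloads emitted by the modality components; the point of the sketch is only that the merging policy (temporal window, conflict handling, confidence combination) is application-defined rather than standardized.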





Copyright information

© Springer International Publishing Switzerland 2017

Authors and Affiliations

  • Dirk Schnelle-Walka, Harman International, Connected Car Division, Stuttgart, Germany
  • Carlos Duarte, LaSIGE, Faculdade de Ciências, Universidade de Lisboa, Lisboa, Portugal
  • Stefan Radomski, Telecooperation Lab, Technische Universität Darmstadt, Darmstadt, Germany
