Multimodal Interfaces: A Survey of Principles, Models and Frameworks

Chapter
Human Machine Interaction

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 5440)

Abstract

The grand challenge of multimodal interface creation is to build reliable processing systems able to analyze and understand multiple communication means in real time. This challenge opens a number of associated issues covered in this chapter, such as the fusion of heterogeneous data types, architectures for real-time processing, dialog management, machine learning for multimodal interaction, modeling languages, and frameworks. The chapter does not attempt to cover exhaustively every issue related to the creation of multimodal interfaces, and some hot topics, such as error handling, have been left aside. It starts with the features and advantages associated with multimodal interaction, focusing on particular findings and guidelines as well as the cognitive foundations underlying multimodal interaction. It then turns to the driving theoretical principles, time-sensitive software architectures, and multimodal fusion and fission issues. Modeling of multimodal interaction, as well as tools allowing the rapid creation of multimodal interfaces, are then presented. The chapter concludes with an outline of the current state of multimodal interaction research in Switzerland and a summary of the major future challenges in the field.
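To make the fusion issue concrete, consider the common baseline of decision-level ("late") fusion, in which events recognized independently in each modality are combined on the basis of their time stamps. The sketch below is a minimal illustration under assumed conventions, not code from the chapter: the event type, the 1.5-second integration window, and the nearest-neighbour pairing rule are all hypothetical choices made for brevity.

    # A minimal late-fusion sketch (illustrative only; not the chapter's code).
    # Time-stamped events from independent recognizers are paired whenever
    # they fall within a fixed temporal integration window.
    from dataclasses import dataclass

    @dataclass
    class InputEvent:
        modality: str     # e.g. "speech" or "gesture"
        content: str      # recognized token, e.g. "delete" or "point(120,88)"
        timestamp: float  # seconds since session start

    FUSION_WINDOW = 1.5   # assumed integration window, in seconds

    def fuse(speech_events, gesture_events):
        """Pair each speech event with the temporally closest gesture
        event inside the fusion window (decision-level late fusion)."""
        commands = []
        for s in speech_events:
            nearby = [g for g in gesture_events
                      if abs(s.timestamp - g.timestamp) <= FUSION_WINDOW]
            if nearby:
                best = min(nearby, key=lambda g: abs(s.timestamp - g.timestamp))
                commands.append((s.content, best.content))
        return commands

    # Example: the spoken command "delete" fuses with a pointing gesture
    # issued 0.2 s earlier into one complete multimodal command.
    speech = [InputEvent("speech", "delete", 2.0)]
    gestures = [InputEvent("gesture", "point(120,88)", 1.8)]
    print(fuse(speech, gestures))  # [('delete', 'point(120,88)')]

Real systems typically go beyond such a fixed window, for instance by adapting to users' multimodal integration patterns or by imposing semantic constraints (such as unification over typed feature structures) on what may be combined.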





Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Dumas, B., Lalanne, D., Oviatt, S. (2009). Multimodal Interfaces: A Survey of Principles, Models and Frameworks. In: Lalanne, D., Kohlas, J. (eds) Human Machine Interaction. Lecture Notes in Computer Science, vol 5440. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00437-7_1

  • DOI: https://doi.org/10.1007/978-3-642-00437-7_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-00436-0

  • Online ISBN: 978-3-642-00437-7

  • eBook Packages: Computer Science (R0)
