Multimodal Interfaces: A Survey of Principles, Models and Frameworks

Chapter
Human Machine Interaction

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 5440)

Abstract

The grand challenge of multimodal interface creation is to build reliable processing systems able to analyze and understand multiple communication means in real time. This challenge opens a number of associated issues covered in this chapter, such as the fusion of heterogeneous data types, architectures for real-time processing, dialog management, machine learning for multimodal interaction, modeling languages, and frameworks. The chapter does not attempt to cover exhaustively every issue related to the creation of multimodal interfaces, and some hot topics, such as error handling, have been left aside. It starts with the features and advantages associated with multimodal interaction, focusing on particular findings and guidelines as well as the cognitive foundations underlying multimodal interaction. It then turns to the driving theoretical principles, time-sensitive software architectures, and multimodal fusion and fission issues. Modeling of multimodal interaction, as well as tools allowing the rapid creation of multimodal interfaces, are then presented. The chapter concludes with an outline of the current state of multimodal interaction research in Switzerland and a summary of the major future challenges in the field.
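To make the fusion issue concrete, consider the common baseline of decision-level ("late") fusion, in which events recognized independently in each modality are combined on the basis of their time stamps. The sketch below is a minimal illustration under assumed conventions, not code from the chapter: the event type, the 1.5-second integration window, and the nearest-neighbour pairing rule are all hypothetical choices made for brevity.

    # A minimal late-fusion sketch (illustrative only; not the chapter's code).
    # Time-stamped events from independent recognizers are paired whenever
    # they fall within a fixed temporal integration window.
    from dataclasses import dataclass

    @dataclass
    class InputEvent:
        modality: str     # e.g. "speech" or "gesture"
        content: str      # recognized token, e.g. "delete" or "point(120,88)"
        timestamp: float  # seconds since session start

    FUSION_WINDOW = 1.5   # assumed integration window, in seconds

    def fuse(speech_events, gesture_events):
        """Pair each speech event with the temporally closest gesture
        event inside the fusion window (decision-level late fusion)."""
        commands = []
        for s in speech_events:
            nearby = [g for g in gesture_events
                      if abs(s.timestamp - g.timestamp) <= FUSION_WINDOW]
            if nearby:
                best = min(nearby, key=lambda g: abs(s.timestamp - g.timestamp))
                commands.append((s.content, best.content))
        return commands

    # Example: the spoken command "delete" fuses with a pointing gesture
    # issued 0.2 s earlier into one complete multimodal command.
    speech = [InputEvent("speech", "delete", 2.0)]
    gestures = [InputEvent("gesture", "point(120,88)", 1.8)]
    print(fuse(speech, gestures))  # [('delete', 'point(120,88)')]

Real systems typically go beyond such a fixed window, for instance by adapting to users' multimodal integration patterns or by imposing semantic constraints (such as unification over typed feature structures) on what may be combined.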





Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Dumas, B., Lalanne, D., Oviatt, S. (2009). Multimodal Interfaces: A Survey of Principles, Models and Frameworks. In: Lalanne, D., Kohlas, J. (eds) Human Machine Interaction. Lecture Notes in Computer Science, vol 5440. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00437-7_1

  • DOI: https://doi.org/10.1007/978-3-642-00437-7_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-00436-0

  • Online ISBN: 978-3-642-00437-7

  • eBook Packages: Computer Science (R0)
