Abstract
Multimodal-multisensor interfaces have coevolved rapidly with the emergence of mobile devices (e.g., smartphones), and they are now the dominant computer interface worldwide. This chapter summarizes the state of the art of multimodal-multisensor interfaces, including their major advantages, cognitive and neuroscience foundations, language-processing methods, multimodal machine learning techniques, commercialization trends, and future directions. It highlights changes in multimodal-multisensor interfaces during the last decade, including their: (1) incorporation of a larger number of increasingly heterogeneous information sources (e.g., physiological, behavioral, and contextual); (2) more robust processing based on machine learning and deep learning techniques; (3) modeling and prediction of complex and often hidden human mental and physical states (e.g., deception, neurodegenerative disease); and (4) widespread commercialization across business sectors. As multimodal-multisensor interfaces have become more deeply human-centered and powerful, as exemplified by multimodal behavioral analytics, new concerns have been raised about developing strategies for protecting individuals' privacy. Additional ethical concerns about the fairness, explainability, and societal impact of the applications enabled by this technology will need to be examined in the future.
© 2022 Springer Nature Switzerland AG
Oviatt, S. (2022). Multimodal Interaction, Interfaces, and Analytics. In: Vanderdonckt, J., Palanque, P., Winckler, M. (eds) Handbook of Human Computer Interaction. Springer, Cham. https://doi.org/10.1007/978-3-319-27648-9_22-1