Emotions can be recognized whether they are conveyed by facial expressions, linguistic cues (semantics), or prosody (tone of voice). However, few studies have empirically documented the extent to which multi-modal emotion perception differs from uni-modal emotion perception. Here, we tested whether emotion recognition is more accurate for multi-modal stimuli by presenting stimuli with different combinations of facial, semantic, and prosodic cues. Participants judged the emotion conveyed by short utterances in six channel conditions. Results indicated that emotion recognition is significantly better for multi-modal than for uni-modal stimuli. When stimuli contained only one emotional channel, recognition tended to be higher in the visual modality (i.e., facial expressions and semantic information conveyed by text) than in the auditory modality (prosody), although this pattern was not uniform across emotion categories. The advantage for multi-modal recognition may reflect the automatic integration of congruent emotional information across channels, which enhances the accessibility of emotion-related knowledge in memory.
For the uni-modal face condition, stimuli were initially extracted from both the lexical utterances and the pseudo-utterances, saved as silent .avi video clips, and presented to a group of raters. Raters' identification of emotions did not differ significantly between uni-modal face stimuli extracted from lexical utterances and those extracted from pseudo-utterances. Because including both sets would have doubled the number of items in this one condition, only uni-modal face stimuli extracted from pseudo-utterances were used.
The authors would like to thank Meg Webb and Catherine Knowles for help with the stimuli and data acquisition. This work was supported by a postdoctoral fellowship from the German Academic Exchange Service (DAAD) awarded to the first author, and by a Discovery grant awarded to the second author by the Natural Sciences and Engineering Research Council of Canada.
About this article
Cite this article
Paulmann, S., Pell, M.D. Is there an advantage for recognizing multi-modal emotional stimuli? Motiv Emot 35, 192–201 (2011). https://doi.org/10.1007/s11031-011-9206-0
- Emotional prosody
- Emotional semantics
- Emotional facial expressions