Abstract
This experiment examines how emotion is perceived from a speaker's facial and vocal cues. Three levels of facial affect were presented using a computer-generated face, and three levels of vocal affect were obtained by recording the voice of a male amateur actor who spoke a semantically neutral word in different simulated emotional states. These two independent variables were presented to subjects in all possible combinations: visual cues alone, vocal cues alone, and visual and vocal cues together, for a total of 15 stimuli. Subjects judged the emotion of each stimulus in a two-alternative forced-choice task (either HAPPY or ANGRY). The results indicate that subjects evaluate and integrate information from both modalities to perceive emotion; the influence of one modality was greater to the extent that the other was ambiguous (neutral). The fuzzy logical model of perception (FLMP) fit the judgments significantly better than an additive model did, which weakens theories based on an additive combination of modalities, on categorical perception, or on influence from only a single modality.
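The contrast between the FLMP and an additive model can be sketched numerically. In the FLMP (Massaro & Friedman, 1990), each source is evaluated as a truth value in [0, 1] indicating its support for a category, the sources are integrated multiplicatively, and a decision is made by relative goodness. The sketch below is illustrative only, not the authors' code or fitted parameters; the values v and a stand for hypothetical facial and vocal support for HAPPY, with 0.5 representing a maximally ambiguous (neutral) cue.

```python
def flmp(v, a):
    """FLMP: multiplicative integration followed by a relative goodness rule.

    v, a: support for HAPPY from the face and the voice, each in [0, 1].
    Returns the predicted probability of a HAPPY judgment.
    """
    happy = v * a                  # joint support for HAPPY
    angry = (1 - v) * (1 - a)      # joint support for ANGRY
    return happy / (happy + angry)

def additive(v, a, w=0.5):
    """Additive model: a weighted average of the two sources."""
    return w * v + (1 - w) * a

# When one source is neutral (0.5), the FLMP prediction equals the other
# source's support, so the unambiguous cue dominates; the additive model
# instead pulls the prediction toward the midpoint.
print(flmp(0.9, 0.5))      # 0.9
print(additive(0.9, 0.5))  # 0.7
```

This captures the abstract's key finding in miniature: under the FLMP, one modality's influence grows as the other becomes ambiguous, whereas an additive model predicts a constant contribution from each source.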
References
Archer, D. (Producer). (1991). A world of gestures: Culture and nonverbal communication [Videorecording]. (Available from University of California Extension Media Center, Berkeley)
Cahn, J. E. (1990). The generation of affect in synthesized speech. Journal of the American Voice I/O Society, 8, 1–19.
Carlson, R., Granström, B., & Nord, L. (1993). Synthesis experiments with mixed feelings: A progress report. In Fonetik-93: Papers from the Seventh Swedish Phonetics Conference (Uppsala University Linguistics No. 23, pp. 65–68). Uppsala, Sweden.
Chandler, J. P. (1969). Subroutine STEPIT: Finds local minima of a smooth function of several parameters. Behavioral Science, 14, 81–82.
Cohen, M. M., & Massaro, D. W. (1993). Modeling coarticulation in synthetic visual speech. In N. M. Thalmann & D. Thalmann (Eds.), Models and techniques in computer animation (pp. 139–156). Tokyo: Springer-Verlag.
Cohen, M. M., & Massaro, D. W. (1994). Development and experimentation with synthetic visible speech. Behavior Research Methods, Instruments, & Computers, 26, 260–265.
Cutting, J. E., Bruno, N., Brady, N. P., & Moore, C. (1992). Selectivity, scope, and simplicity of models: A lesson from fitting judgments of perceived depth. Journal of Experimental Psychology: General, 121, 364–381.
Darwin, C. (1872). The expressions of emotion in man and animals. London: John Murray.
Ekman, P. (1984). Expression and the nature of emotion. In K. Scherer & P. Ekman (Eds.), Approaches to emotion (pp. 319–343). Hillsdale, NJ: Erlbaum.
Ekman, P. (1993). Facial expression and emotion. American Psychologist, 48, 384–392.
Ellison, J. W., & Massaro, D. W. (1995). Featural evaluation, integration, and judgment of facial affect. Unpublished manuscript.
Etcoff, N. L., & Magee, J. J. (1992). Categorical perception of facial expressions. Cognition, 44, 227–240.
Fridlund, A. J. (1991). Evolution and facial action in reflex, social motive, and paralanguage. Biological Psychology, 32, 3–100.
Fridlund, A. J. (1994). Human facial expression: An evolutionary view. San Diego, CA: Academic Press.
Huber, L., & Lenz, R. (1993). A test of the linear feature model of polymorphous concept discrimination with pigeons. Quarterly Journal of Experimental Psychology, 46B, 1–18.
Johnson, W. F., Emde, R. N., Scherer, K. R., & Klinnert, M. D. (1986). Recognition of emotion from vocal cues. Archives of General Psychiatry, 43, 280–283.
Kramer, E. (1963). Judgment of personal characteristics and emotions from nonverbal properties of speech. Psychological Bulletin, 60, 408–420.
Massaro, D. W. (1987a). Categorical partition: A fuzzy logical model of categorization behavior. In S. Harnad (Ed.), Categorical perception: The groundwork of cognition (pp. 254–283). New York: Cambridge University Press.
Massaro, D. W. (1987b). Speech perception by eye and by ear: A paradigm for psychological inquiry. Hillsdale, NJ: Erlbaum.
Massaro, D. W. (1988). Ambiguity in perception and experimentation. Journal of Experimental Psychology: General, 117, 417–421.
Massaro, D. W. (1989). Experimental psychology: An information processing approach. San Diego, CA: Harcourt Brace Jovanovich.
Massaro, D. W., & Cohen, M. M. (1990). Perception of synthesized audible and visible speech. Psychological Science, 1, 55–63.
Massaro, D. W., & Cohen, M. M. (1993). The paradigm and the fuzzy logical model of perception are alive and well. Journal of Experimental Psychology: General, 122, 115–124.
Massaro, D. W., & Cohen, M. M. (1994). Visual, orthographic, phonological, and lexical influences in reading. Journal of Experimental Psychology: Human Perception & Performance, 20, 1107–1128.
Massaro, D. W., & Ferguson, E. L. (1993). Cognition style and perception: The relationship between category width and speech perception, categorization, and discrimination. American Journal of Psychology, 106, 25–49.
Massaro, D. W., & Friedman, D. (1990). Models of integration given multiple sources of information. Psychological Review, 97, 225–252.
Meltzoff, A. N., & Moore, M. K. (1977). Imitation of facial and manual gestures in human neonates. Science, 198, 75–78.
Murray, I. R., & Arnott, J. L. (1993). Toward the simulation of emotion in synthetic speech: A review of the literature on human vocal emotion. Journal of the Acoustical Society of America, 93, 1097–1108.
Noller, P. (1985). Video primacy: A further look. Journal of Nonverbal Behavior, 9, 28–47.
Ohala, J. J. (1981). The nonlinguistic components of speech. In J. Darby (Ed.), Speech evaluation in psychiatry (pp. 39–49). New York: Grune & Stratton.
Ohala, J. J. (1984). An ethological perspective on common crosslanguage utilization of F0 of voice. Phonetica, 41, 1–16.
Pollack, I., Rubenstein, H., & Horowitz, A. (1960). Communication of verbal modes of expression. Language & Speech, 3, 121–130.
Scherer, K. R., Banse, R., Wallbott, H. G., & Goldbeck, T. (1991). Vocal cues in emotion encoding and decoding. Motivation & Emotion, 15, 123–148.
Summerfield, A. Q. (1979). Use of visual information in phonetic perception. Phonetica, 36, 314–331.
Tanaka, J. W., & Farah, M. J. (1993). Parts and wholes in face recognition. Quarterly Journal of Experimental Psychology, 46A, 225–245.
Tartter, V. C., & Braun, D. (1994). Hearing smiles and frowns in normal and whisper registers. Journal of the Acoustical Society of America, 96, 2101–2107.
Williams, C. E., & Stevens, K. N. (1972). Emotions and speech: Some acoustic correlates. Journal of the Acoustical Society of America, 52, 1238–1250.
Additional information
This research was supported, in part, by grants from the National Institute on Deafness and Other Communication Disorders, the National Institutes of Health (2 R01 DC00236-13A1), the National Science Foundation (BNS 8812728), and the University of California, Santa Cruz. The authors thank Michael M. Cohen for help at all stages of this research.
Cite this article
Massaro, D.W., Egan, P.B. Perceiving affect from the voice and the face. Psychonomic Bulletin & Review 3, 215–221 (1996). https://doi.org/10.3758/BF03212421