The Role of Perceived Voice and Speech Characteristics in Vocal Emotion Communication

  • Original Paper
Journal of Nonverbal Behavior

Abstract

Aiming at a more comprehensive assessment of nonverbal vocal emotion communication, this article presents the development and validation of a new rating instrument for the assessment of perceived voice and speech features. In two studies, using two different sets of emotion portrayals by German and French actors, ratings of perceived voice and speech characteristics (loudness, pitch, intonation, sharpness, articulation, roughness, instability, and speech rate) were obtained from non-expert (untrained) listeners. In addition, standard acoustic parameters were extracted from the voice samples. Overall, highly similar patterns of results were found in both studies. Rater agreement (reliability) reached highly satisfactory levels for most features. Multiple discriminant analysis results reveal that both perceived vocal features and acoustic parameters allow a high degree of differentiation of the actor-portrayed emotions. Positive emotions can be classified with a higher hit rate on the basis of perceived vocal features, confirming suggestions in the literature that it is difficult to find acoustic valence indicators. The results show that the suggested scales (Geneva Voice Perception Scales) can be reliably measured and make a substantial contribution to a more comprehensive assessment of the process of emotion inferences from vocal expression.
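As a rough illustration of what "rater agreement (reliability)" can mean for scales like these, the sketch below computes Cronbach's alpha with raters treated as items and voice samples as cases. This is an assumption made for illustration only: the abstract does not say which reliability coefficient the authors used, and the function name `cronbach_alpha` and the example ratings are hypothetical.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha for a samples-by-raters table of scores.

    ratings: list of rows, one row per voice sample, one score per rater.
    Raters play the role of "items" (an illustrative assumption).
    """
    n_samples = len(ratings)
    n_raters = len(ratings[0])
    if n_samples < 2 or n_raters < 2:
        raise ValueError("need at least two samples and two raters")

    # Sample variance of each rater's scores across all voice samples.
    rater_variances = [
        variance([row[j] for row in ratings]) for j in range(n_raters)
    ]
    # Sample variance of the summed score per voice sample.
    total_variance = variance([sum(row) for row in ratings])

    return (n_raters / (n_raters - 1)) * (
        1 - sum(rater_variances) / total_variance
    )


# Hypothetical loudness ratings: 3 voice samples rated by 2 listeners.
example = [[1, 2], [2, 1], [3, 3]]
alpha = cronbach_alpha(example)
```

Values close to 1 indicate that raters order the samples consistently. Identical ratings from all raters give exactly 1, provided the ratings vary across samples.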

Notes

  1. Mean values for the case of Western encoders and decoders. Recognition accuracy for static photos of facial expressions reaches 77.8%, probably due to highly prototypical facial muscle configurations (often explicitly specified to the actors). However, these static facial stimuli cannot reasonably be compared to vocal stimuli, which are necessarily dynamic; we therefore report only the mean value for the few studies that investigated dynamic video stimuli of facial expression.

  2. The development of the French voice rating scales was based on earlier collaborative work for an English scale with Lou Boves and Renée van Bezooijen.

  3. The scripts used for the acoustic analyses can be downloaded at the following address: http://www.affective-sciences.org/gemep/perceived_voice. Further supplementary materials (audio examples and details of statistical results) are also available at the same address.

  4. This parameter (F0 floor, see Tolkmitt and Scherer 1986) was added on the assumption that, although pitch and intensity are highly correlated in both Study 1 and Study 2, they may still constitute relevant and partly independent aspects of vocal communication of emotion. We included a measure of average acoustic intensity in Study 1 using the same rationale.

  5. These recordings are all produced by one speaker (a research collaborator and expert in speech analysis). They are not used as anchors for the ratings, but only to illustrate the specific contrast represented in each scale (i.e., they help to define the scales for the raters). It was clear to the raters that, for example, the pitch level of the emotional expressions was not to be compared directly with the pitch range given by the examples. The recordings can be accessed along with the supplementary materials on the website indicated in footnote 3.

  6. Additional ratings were collected (concerning the emotions perceived in the recordings) from the same raters in Study 2. Those additional ratings are not described in the current paper and were collected in a second step; i.e., a new set of instructions was presented to the raters after the procedure described in the current article and the recordings were replayed in a new random order for the collection of additional ratings.

References

  • Banse, R., & Scherer, K. R. (1996). Acoustic profiles in vocal emotion expression. Journal of Personality and Social Psychology, 70(3), 614–636.

  • Bänziger, T., & Scherer, K. R. (2003). A study of perceived vocal features in emotional speech. In Voice quality: Functions, analysis and synthesis (VOQUAL’03), ISCA tutorial and research workshop, Geneva, Switzerland, pp. 169–172.

  • Bänziger, T. (2004). Communication vocale des émotions: Perception de l’expression vocale et attributions émotionnelles. Unpublished doctoral thesis, University of Geneva.

  • Bänziger, T., Mortillaro, M., & Scherer, K. R. (2012). Introducing the Geneva Multimodal Expression corpus for experimental research on emotion perception. Emotion, 12(5), 1161–1179.

  • Bänziger, T., & Scherer, K. R. (2010). Introducing the Geneva Multimodal Emotion Portrayal (GEMEP) corpus. In K. R. Scherer, T. Bänziger, & E. B. Roesch (Eds.), Blueprint for affective computing: A sourcebook (pp. 271–294). Oxford, England: Oxford University Press.

  • Biemans, M. A. J. (2000). Gender variation in voice quality. Unpublished doctoral thesis, University of Nijmegen.

  • Boersma, P., & Weenink, D. (2012). Praat: Doing phonetics by computer [Computer program]. Retrieved from http://www.praat.org/.

  • Davitz, J. R. (1964). The communication of emotional meaning. Oxford, England: McGraw Hill.

  • Granqvist, S. (1996). Enhancements to the visual analogue scale. Speech, Music and Hearing: Quarterly Progress and Status Report, 4, 61–65.

  • Hall, J. A., & Knapp, M. L. (2013). Nonverbal communication. Boston: de Gruyter Mouton.

  • Henrich, N., Bezard, P., Expert, R., Garnier, M., Guerin, C., Pillot, C., et al. (2008). Towards a common terminology to describe voice quality in western lyrical singing: Contribution of a multidisciplinary research group. Journal of Interdisciplinary Music Studies, 2(1&2), 71–93.

  • Juslin, P. N., & Laukka, P. (2003). Communication of emotions in vocal expression and music performance: Different channels, same code? Psychological Bulletin, 129(5), 770–814.

  • Juslin, P. N., & Scherer, K. R. (2005). Vocal expression of affect. In J. A. Harrigan, R. Rosenthal, & K. Scherer (Eds.), The new handbook of methods in nonverbal behavior research (pp. 65–135). Oxford, UK: Oxford University Press.

  • Kreiman, J., & Gerratt, B. R. (1998). Validity of rating scale measures of voice quality. Journal of the Acoustical Society of America, 104(3), 1598–1608.

  • Laver, J. (1980). The phonetic description of voice quality. Cambridge, England: Cambridge University Press.

  • Patel, S., & Scherer, K. R. (2013). Vocal behaviour. In J. A. Hall & M. L. Knapp (Eds.), Handbook of nonverbal communication (pp. 167–204). Berlin: Mouton-DeGruyter.

  • Robson, J., & Beck, J. M. (1999). Hearing smiles: Perceptual, acoustic and production aspects of labial spreading. In Proceedings of the XIVth international congress of phonetic sciences, pp. 219–222.

  • Rosenthal, R. (1987). Judgment studies. New York: Cambridge University Press.

  • Sangsue, J., Siegwart, H., Cosnier, J., Cornu, J., & Scherer, K. R. (1997). Développement d’un questionnaire d’évaluation subjective de la qualité de la voix et de la parole, QEV. Geneva Studies in Emotion and Communication, 11(1). Retrieved from http://www.affective-sciences.org/system/files/1997_Sangsue_Genstudies_VoiceQuality.pdf.

  • Scherer, K. R. (1978). Personality inference from voice quality: The loud voice of extroversion. European Journal of Social Psychology, 8, 467–487.

  • Scherer, K. R. (1986). Vocal affect expression: A review and a model for future research. Psychological Bulletin, 99(2), 143–165.

  • Scherer, K. R. (2003). Vocal communication of emotion: A review of research paradigms. Speech Communication, 40, 227–256.

  • Scherer, K. R. (2013). Emotion in action, interaction, music, and speech. In M. A. Arbib (Ed.), Language, music, and the brain: A mysterious relationship (pp. 107–139). Cambridge, MA: MIT Press.

  • Scherer, K. R., Clark-Polner, E., & Mortillaro, M. (2011). In the eye of the beholder? Universality and cultural specificity in the expression and perception of emotion. International Journal of Psychology, 46(6), 401–435.

  • Scherer, K. R., & Ellgring, H. (2007a). Are facial expressions of emotion produced by categorical affect programs or dynamically driven by appraisal? Emotion, 7(1), 113–130.

  • Scherer, K. R., & Ellgring, H. (2007b). Multimodal expression of emotion: Affect programs or componential appraisal patterns? Emotion, 7(1), 158–171.

  • Sundberg, J., Patel, S., Björkner, E., & Scherer, K. R. (2011). Interdependencies among voice source parameters in emotional speech. IEEE Transactions on Affective Computing, 2(3), 162–174.

  • Tartter, V. C., & Braun, D. (1994). Hearing smiles and frowns in normal and whisper registers. Journal of the Acoustical Society of America, 96(4), 2101–2107.

  • Tolkmitt, F., & Scherer, K. R. (1986). Effects of experimentally induced stress on vocal parameters. Journal of Experimental Psychology: Human Perception and Performance, 12, 302–313.

  • van Bezooijen, R. A. (1984). Characteristics and recognizability of vocal expressions of emotion. Doctoral dissertation. Dordrecht, The Netherlands: Foris Publications.

  • van Bezooijen, R. (1986). Lay ratings of long-term voice-and-speech characteristics. In F. Beukema & A. Hulk (Eds.), Linguistics in the Netherlands 1986 (pp. 1–7). Dordrecht, The Netherlands: Foris Publications.

Acknowledgments

The authors would like to thank Olivier Rosset for writing the scripts for data collection (Study 2), and Tamara Ott for subject recruitment and testing (Study 2). Study 2 was supported by a Swiss National Science Foundation Grant (100014-122491) to K. R. Scherer.

Author information

Correspondence to Klaus R. Scherer.

Additional information

T. Bänziger conducted Study 1 as part of an unpublished doctoral dissertation (supervised by K. R. Scherer).

About this article

Cite this article

Bänziger, T., Patel, S. & Scherer, K.R. The Role of Perceived Voice and Speech Characteristics in Vocal Emotion Communication. J Nonverbal Behav 38, 31–52 (2014). https://doi.org/10.1007/s10919-013-0165-x
