Emotion in Speech, Singing, and Sound Effects

  • Duncan Williams
Part of the International Series on Computer Entertainment and Media Technology book series (ISCEMT)


This chapter explores the theoretical context of emotion studies in relation to speech and sound effects, and in particular the concept of affective potential. Voice actors in game soundtracking can have a particularly powerful impact on the emotional presentation of a narrative, and this affective control can extend beyond that of the actor alone when combined with emotionally targeted signal processing (for example, sound design and audio processing techniques). The prospect of synchronizing emotionally congruent sound effects remains a fertile area for further work, but an initial study presented later in this chapter suggests that timbral features of speech and sound effects can influence the perceived emotional response of a listener in the context of dynamic soundtracking for video games. This chapter extends material originally presented at the Audio Engineering Society conference on video game soundtracking in London, UK, in 2015 (Williams et al. 2015), and subsequently material on the specific design of affect in vocal production presented at the Audio Engineering Society Convention in New York, 2015 (Williams 2015a).

Prosodic (nonverbal) speech features have been the subject of a considerable amount of research (Gobl 2003; Pell 2006). The role of such features as a communicative tool in emotion studies suggests that acoustic manipulation of prosody could be a useful way to explore emotional communication (Frick 1985; Baum and Nowicki 1998). For example, studies that take a dimensional approach to emotion report acoustic correlations in prosody between emotional arousal and pitch height, pitch range, rate of speech, and loudness. Some emotional cues can be derived acoustically from prosody (Bach et al. 2008) by time-series analysis, in a manner analogous to the temporal characteristics used to determine such cues in musical sequences (Gobl 2003; Juslin and Laukka 2006; Kotlyar and Morozov 1976; Deng and Leung 2013). Some of this research suggests, for instance, that pitch height and range, loudness, and density correlate strongly with affective arousal.
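The frame-by-frame (time-series) extraction of arousal-related prosodic cues described above can be illustrated with a minimal sketch. It is a hypothetical example, not the method used in the cited studies: it assumes a mono signal supplied as a list of floats at sample rate `sr` with full scale 1.0, uses a crude autocorrelation pitch tracker, and covers only three cues (pitch height, pitch range, and loudness), omitting rate of speech and density. The function names, the 75 to 400 Hz pitch search band, the 0.5 voicing threshold, and the `arousal_cue_count` heuristic are all illustrative assumptions.

```python
import math
from statistics import mean
from typing import Dict, List, Optional, Sequence

# Hypothetical sketch: cue names, search band and voicing threshold are
# illustrative assumptions, not values taken from the cited studies.


def frame_level_db(frame: Sequence[float]) -> float:
    """RMS level of one frame in dB relative to a full scale of 1.0 (floor -120 dB)."""
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    return 20.0 * math.log10(max(rms, 1e-6))


def frame_f0_hz(frame: Sequence[float], sr: int, fmin: float = 75.0,
                fmax: float = 400.0, threshold: float = 0.5) -> Optional[float]:
    """Crude autocorrelation F0 estimate; returns None if the frame looks unvoiced.

    The autocorrelation is normalised by the whole frame's energy (a biased
    estimate), so it decays with lag and tends to favour the true period over
    its multiples.
    """
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return None
    min_lag = max(1, int(sr / fmax))
    max_lag = min(int(sr / fmin), len(frame) - 1)
    best_lag, best_r = None, threshold  # only peaks above the threshold count
    for lag in range(min_lag, max_lag + 1):
        r = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r
    return None if best_lag is None else sr / best_lag


def prosodic_summary(signal: Sequence[float], sr: int, frame_len: int = 400,
                     hop: int = 200) -> Dict[str, float]:
    """Summarise arousal-related cues from a frame-by-frame (time-series) analysis."""
    levels: List[float] = []
    f0s: List[float] = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        levels.append(frame_level_db(frame))
        f0 = frame_f0_hz(frame, sr)
        if f0 is not None:
            f0s.append(f0)
    if not levels:
        raise ValueError("signal is shorter than one frame")
    return {
        "mean_level_db": mean(levels),                      # loudness
        "mean_f0_hz": mean(f0s) if f0s else 0.0,            # pitch height
        "f0_range_semitones": (12.0 * math.log2(max(f0s) / min(f0s))
                               if f0s else 0.0),            # pitch range
        "voiced_fraction": len(f0s) / len(levels),
    }


# Hypothetical heuristic, not a validated emotion model: counts how many
# arousal-related cues are higher in summary `a` than in summary `b`.
AROUSAL_CUES = ("mean_f0_hz", "f0_range_semitones", "mean_level_db")


def arousal_cue_count(a: Dict[str, float], b: Dict[str, float]) -> int:
    return sum(1 for key in AROUSAL_CUES if a[key] > b[key])


def tone(freq: float, amp: float, seconds: float, sr: int = 8000) -> List[float]:
    """Synthetic sine tone used as a stand-in for a voiced speech segment."""
    return [amp * math.sin(2 * math.pi * freq * n / sr) for n in range(int(seconds * sr))]


if __name__ == "__main__":
    sr = 8000
    calm = prosodic_summary(tone(100, 0.05, 1.0), sr)
    # hop == frame_len so that no frame straddles the 100 Hz / 200 Hz boundary
    excited = prosodic_summary(tone(100, 0.5, 0.5) + tone(200, 0.5, 0.5), sr, hop=400)
    print(calm)
    print(excited)
    print(arousal_cue_count(excited, calm))
```

On these synthetic tones, the louder signal spanning an octave scores higher than the quiet steady tone on all three cues. Real speech would need a more robust pitch tracker (one that copes with noise and octave errors) and normalization against a speaker or listener baseline before such cue comparisons could say much about perceived arousal.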


References

  1. Bach, D.R., Grandjean, D., Sander, D., Herdener, M., Strik, W.K., Seifritz, E.: The effect of appraisal level on processing of emotional prosody in meaningless speech. NeuroImage 42, 919–927 (2008)
  2. Baum, K.M., Nowicki Jr., S.: Perception of emotion: measuring decoding accuracy of adult prosodic cues varying in intensity. J. Nonverbal Behav. 22, 89–107 (1998)
  3. Brookes, T., Williams, D.: Perceptually-motivated audio morphing: Brightness. In: Audio Engineering Society Convention 122. Audio Engineering Society (2007)
  4. Caetano, M., Rodet, X.: Independent manipulation of high-level spectral envelope shape features for sound morphing by means of evolutionary computation. In: Proceedings of the 13th International Conference on Digital Audio Effects (DAFx), vol. 21 (2010)
  5. Cospito, G., de Tintis, R.: Morph: Timbre hybridization tools based on frequency. In: Workshop on Digital Audio Effects (DAFx-98) (1998)
  6. Coutinho, E., Cangelosi, A.: Musical emotions: predicting second-by-second subjective feelings of emotion from low-level psychoacoustic features and physiological measurements. Emotion 11, 921–937 (2011)
  7. Daly, I., Williams, D., Hallowell, J., Hwang, F., Kirke, A., Malik, A., Weaver, J., Miranda, E., Nasuto, S.J.: Music-induced emotions can be predicted from a combination of brain activity and acoustic features. Brain Cogn. 101, 1 (2015)
  8. Deng, J.J., Leung, C.H.C.: Music retrieval in joint emotion space using audio features and emotional tags. In: Li, S., Saddik, A., Wang, M., Mei, T., Sebe, N., Yan, S., Hong, R., Gurrin, C. (eds.) Advances in Multimedia Modeling Lecture Notes in Computer Science, vol. 7732, pp. 524–534. Springer, Berlin (2013)
  9. Frick, R.W.: Communicating emotion: the role of prosodic features. Psychol. Bull. 97, 412 (1985)
  10. Garrard, C., Williams, D.: Tools for fashioning voices: an interview with Trevor Wishart. Contemp. Music. Rev. 32, 511–525 (2013)
  11. Gobl, C.: The role of voice quality in communicating emotion, mood and attitude. Speech Comm. 40, 189–212 (2003)
  12. Haken, L., Fitz, K., Christensen, P.: Beyond traditional sampling synthesis: real-time timbre morphing using additive synthesis. In: Analysis, Synthesis, and Perception of Musical Sounds, pp. 122–144. Springer, New York (2007)
  13. Josephson, D.: A brief tutorial on proximity effect. In: Audio Engineering Society Convention 107. Audio Engineering Society (1999)
  14. Juslin, P.N., Laukka, P.: Emotional expression in speech and music. Ann. N. Y. Acad. Sci. 1000, 279–282 (2006)
  15. Kotlyar, G.M., Morozov, V.P.: Acoustical correlates of the emotional content of vocalized speech. Sov. Phys. Acoust. 22, 208–211 (1976)
  16. Le Groux, S., Verschure, P.F.M.J.: Emotional responses to the perceptual dimensions of timbre: a pilot study using physically informed sound synthesis. In: Proceedings of the 7th International Symposium on Computer Music Modeling and Retrieval, CMMR (2010)
  17. Mo, R., Bin, W., Horner, A.: The effects of reverberation on the emotional characteristics of musical instruments. J. Audio Eng. Soc. 63, 966–979 (2016)
  18. Olivero, A., Depalle, P., Torrésani, B., Kronland-Martinet, R.: Sound morphing strategies based on alterations of time-frequency representations by Gabor multipliers. In: Audio Engineering Society Conference: 45th International Conference: Applications of Time-Frequency Processing in Audio. Audio Engineering Society (2012)
  19. Pell, M.D.: Cerebral mechanisms for understanding emotional prosody in speech. Brain Lang. 96, 221–234 (2006)
  20. Sethares, W.A., Milne, A.J., Tiedje, S., Prechtl, A., Plamondon, J.: Spectral tools for dynamic tonality and audio morphing. Comput. Music. J. 33, 71–84 (2009)
  21. Williams, D.: Affective potential in vocal production. In: Audio Engineering Society Convention 139. Audio Engineering Society (2015a)
  22. Williams, D.: Developing a timbrometer: perceptually-motivated audio signal metering. In: Audio Engineering Society Convention 139. Audio Engineering Society (2015b)
  23. Williams, D., Kirke, A., Eaton, J., Miranda, E., Daly, I., Hallowell, J., Roesch, E., Hwang, F., Nasuto, S.J.: Dynamic game soundtrack generation in response to a continuously varying emotional trajectory. In: Audio Engineering Society Conference: 56th International Conference: Audio for Games. Audio Engineering Society (2015)
  24. Wishart, T.: The composition of “Vox-5”. Comput. Music. J. 12, 21–27 (1988)

Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  1. Digital Creativity Labs, University of York, York, UK