Emotion Modelling via Speech Content and Prosody: In Computer Games and Elsewhere

  • Björn Schuller
Part of the Socio-Affective Computing book series (SAC, volume 4)


This chapter describes a typical modern speech emotion recognition engine as it can be used to enhance the emotional intelligence of computer games and other technical systems. The acquisition of human affect from the spoken content, its prosody, and further acoustic features is highlighted. Features for both of these information streams are briefly discussed, along with the chunking of the stream. Decision making both with and without training data is presented. A particular focus is then laid on autonomous learning and adaptation methods, as well as the required calculation of confidence measures. Practical aspects include the encoding of the information, the distribution of the processing, and available toolkits. Benchmark performances are given for the field's typical competitive challenges.
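The feature-extraction stage of such an engine can be sketched in a few lines: frame the signal, compute low-level acoustic descriptors per frame, and summarise each descriptor series with statistical functionals to obtain a fixed-size vector for a static classifier. The sketch below is illustrative only; the function name, the particular descriptors (frame energy, zero-crossing rate), and the chosen functionals are assumptions for the example, not the chapter's actual feature set (engines such as openSMILE use far richer descriptor inventories).

```python
import numpy as np

def prosodic_features(signal, frame_len=400, hop=160):
    """Toy acoustic feature vector: per-frame RMS energy and
    zero-crossing rate, summarised by (mean, std, range) functionals."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    energy = np.array([np.sqrt(np.mean(f ** 2)) for f in frames])
    zcr = np.array([np.mean(np.abs(np.diff(np.sign(f))) > 0) for f in frames])
    # Functionals turn the variable-length descriptor series into a
    # fixed-size vector, independent of the utterance's duration.
    feats = []
    for series in (energy, zcr):
        feats += [series.mean(), series.std(), series.max() - series.min()]
    return np.array(feats)

# Illustrative use on a synthetic one-second, amplitude-modulated tone.
t = np.linspace(0, 1, 16000, endpoint=False)
utterance = np.sin(2 * np.pi * 220 * t) * (0.5 + 0.5 * np.sin(2 * np.pi * 3 * t))
vec = prosodic_features(utterance)
print(vec.shape)  # (6,) -- (mean, std, range) x (energy, zcr)
```

The resulting fixed-length vector would then feed a classifier or regressor in the decision-making stage; the chunking discussed in the chapter corresponds to choosing the unit of analysis over which such functionals are applied.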


Keywords: Speech signal · Emotion recognition · Acoustic feature · Independent component analysis · Automatic speech recognition



The author acknowledges the support of the European Union’s Horizon 2020 Framework Programme under grant agreement no. 645378 (ARIA-VALUSPA).



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Imperial College London, London, UK
