Abstract
In this chapter, we focus on the automatic recognition of emotional states, using acoustic and linguistic parameters as features and classifiers as tools to predict the ‘correct’ emotional states. We first sketch the history and state of the art in this field; we then describe the process of ‘corpus engineering’, i.e. the design and recording of databases, the annotation of emotional states, and further processing such as manual or automatic segmentation. Next, we present an overview of acoustic and linguistic features that are extracted automatically or manually. In the section on classifiers, we deal with topics such as the curse of dimensionality, the sparse data problem, classifier types, and evaluation. At the end of each section, we point out important aspects that should be taken into account when planning or assessing studies. The subject area of this chapter is not emotions in some narrow sense but in a wider sense, encompassing emotion-related states such as moods, attitudes, or interpersonal stances as well. We do not aim at an in-depth treatment of specific aspects or algorithms but at an overview of approaches and strategies that have been used or should be used.
© 2011 Springer-Verlag Berlin Heidelberg
Cite this chapter
Batliner, A. et al. (2011). The Automatic Recognition of Emotions in Speech. In: Cowie, R., Pelachaud, C., Petta, P. (eds) Emotion-Oriented Systems. Cognitive Technologies. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15184-2_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15183-5
Online ISBN: 978-3-642-15184-2
eBook Packages: Computer Science (R0)