Abstract
In this chapter, we focus on the automatic recognition of emotional states, using acoustic and linguistic parameters as features and classifiers as tools to predict the ‘correct’ emotional states. We first sketch the history and state of the art in this field; we then describe the process of ‘corpus engineering’, i.e. the design and recording of databases, the annotation of emotional states, and further processing such as manual or automatic segmentation. Next, we present an overview of acoustic and linguistic features that are extracted automatically or manually. In the section on classifiers, we deal with topics such as the curse of dimensionality, the sparse data problem, classifier types, and evaluation. At the end of each section, we point out important aspects that should be taken into account when planning or assessing studies. The subject area of this chapter is not emotions in some narrow sense but in a wider sense, encompassing emotion-related states such as moods, attitudes, or interpersonal stances as well. We do not aim at an in-depth treatment of specific aspects or algorithms but at an overview of approaches and strategies that have been used or should be used.
© 2011 Springer-Verlag Berlin Heidelberg
Cite this chapter
Batliner, A. et al. (2011). The Automatic Recognition of Emotions in Speech. In: Cowie, R., Pelachaud, C., Petta, P. (eds) Emotion-Oriented Systems. Cognitive Technologies. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15184-2_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15183-5
Online ISBN: 978-3-642-15184-2
eBook Packages: Computer Science (R0)