Emotion Recognition from Speech

Wendemuth, Andreas; Vlasenko, Bogdan; Siegert, Ingo; Böck, Ronald; Schwenker, Friedhelm; Palm, Günther

doi:10.1007/978-3-319-43665-4_20

Part of the book series: Cognitive Technologies (COGTECH)

Abstract

Spoken language is one of the main interaction patterns in human-human as well as in natural, companion-like human-machine interactions. Speech conveys content, but also emotions and interaction patterns that determine the nature and quality of the user’s relationship to their counterpart. Hence, we consider emotion recognition from speech in the wider sense of its application in Companion-systems. This requires a dedicated annotation process to label emotions and to describe their temporal evolution, so that a system’s reaction can be properly regulated and controlled. This problem is specific to naturalistic interactions, where the emotional labels are no longer given a priori. It calls for generating and measuring a reliable ground truth, where the measurement is closely related to the use of appropriate emotional features and classification techniques. Further, acted and naturalistic spoken data have to be available in operational form (corpora) for the development of emotion classification; we address the difficulties arising from the variety of these data sources. Speaker clustering and speaker adaptation will also improve the emotional modeling. Additionally, combining the acoustic affective evaluation with the interpretation of non-verbal interaction patterns will lead to a better understanding of, and reaction to, user-specific emotional behavior.

Notes

1. The term “naturalistic” is used to make clear that a computer system is always a less powerful conversational partner than a human, and thus HCI cannot be a natural interaction.

Acknowledgements

This work was done within the Transregional Collaborative Research Centre SFB/TRR 62 “Companion-Technology for Cognitive Technical Systems” funded by the German Research Foundation (DFG).

Author information

Authors and Affiliations

Cognitive Systems Group, Otto von Guericke University, PF-4120, 39016, Magdeburg, Germany
Andreas Wendemuth, Bogdan Vlasenko, Ingo Siegert & Ronald Böck
Center for Behavioral Brain Sciences, 39118, Magdeburg, Germany
Andreas Wendemuth
Institute for Neural Information Processing, University of Ulm, 89069, Ulm, Germany
Friedhelm Schwenker & Günther Palm

Corresponding author

Correspondence to Andreas Wendemuth.

Editor information

Editors and Affiliations

Institute of Artificial Intelligence, Universität Ulm, Ulm, Germany
Susanne Biundo
Cognitive Systems Group, Institute for Information Technology and Communications (IIKT) and Center for Behavioral Brain Sciences (CBBS), Otto-von-Guericke Universität Magdeburg, Magdeburg, Germany
Andreas Wendemuth

About this chapter

Cite this chapter

Wendemuth, A., Vlasenko, B., Siegert, I., Böck, R., Schwenker, F., Palm, G. (2017). Emotion Recognition from Speech. In: Biundo, S., Wendemuth, A. (eds) Companion Technology. Cognitive Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-43665-4_20

DOI: https://doi.org/10.1007/978-3-319-43665-4_20
Published: 05 December 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-43664-7
Online ISBN: 978-3-319-43665-4
eBook Packages: Computer Science, Computer Science (R0)
