
Emotion Recognition from Speech by Combining Databases and Fusion of Classifiers

Conference paper
Text, Speech and Dialogue (TSD 2010)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6231)


Abstract

We explore possibilities for enhancing the generality, portability and robustness of emotion recognition systems by combining databases and by fusing classifiers. In a first experiment, we investigate the performance of an emotion detection system tested on a given database when it is trained on speech from the same database, a different database, or a mix of both. We observe that performance generally drops when the test database does not match the training material, although there are a few exceptions. The performance also drops when a mixed corpus of acted databases is used for training and testing is carried out on real-life recordings. In a second experiment, we investigate the effect of training multiple emotion detectors and fusing them into a single detection system. Fusion using FoCal [1] reduces the Equal Error Rate (EER) from 19.0% on average for the 4 individual detectors to 4.2%.
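The two evaluation ingredients the abstract relies on can be sketched compactly: the Equal Error Rate (the operating point where the false-accept and false-reject rates coincide) and score-level fusion of several detectors. The sketch below is illustrative only; FoCal trains a linear logistic regression over the detector scores, whereas here fusion is simplified to an unweighted average, and all scores and labels are hypothetical.

```python
def eer(scores, labels):
    """Equal Error Rate: sweep thresholds over the observed scores and
    return the mean of FAR and FRR at the point where they are closest.
    labels: 1 = target emotion present, 0 = absent."""
    targets = [s for s, l in zip(scores, labels) if l == 1]
    nontargets = [s for s, l in zip(scores, labels) if l == 0]
    best = None
    for t in sorted(set(scores)):
        far = sum(s >= t for s in nontargets) / len(nontargets)  # false accepts
        frr = sum(s < t for s in targets) / len(targets)         # false rejects
        if best is None or abs(far - frr) < abs(best[0] - best[1]):
            best = (far, frr)
    return (best[0] + best[1]) / 2

def fuse(score_lists):
    """Score-level fusion by averaging per-detector scores.
    (A simplification: FoCal instead learns fusion weights by
    linear logistic regression on a held-out calibration set.)"""
    return [sum(col) / len(col) for col in zip(*score_lists)]

# Hypothetical example: two detectors scoring four trials.
detector_a = [0.9, 0.8, 0.3, 0.2]
detector_b = [0.7, 0.6, 0.4, 0.1]
fused = fuse([detector_a, detector_b])
print(eer(fused, [1, 1, 0, 0]))  # perfectly separated scores give EER 0.0
```

In the paper's setting the same machinery would be applied per emotion-detection task, with the fused EER compared against the average of the individual detectors' EERs.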



References

  1. Brümmer, N., Burget, L., Cernocky, J., Glembek, O., Grezl, F., Karafiat, M., van Leeuwen, D.A., Matejka, P., Schwarz, P., Strasheim, A.: Fusion of Heterogeneous Speaker Recognition Systems in the STBU Submission for the NIST Speaker Recognition Evaluation 2006. IEEE Transactions on Audio, Speech, and Language Processing 15(7), 2072–2084 (2007)

  2. Pantic, M., Rothkrantz, L.J.M.: Towards an Affect-Sensitive Multimodal Human-Computer Interaction. Proceedings of the IEEE, 1370–1390 (2003)

  3. Schuller, B., Steidl, S., Batliner, A.: The INTERSPEECH 2009 Emotion Challenge. In: Proceedings of Interspeech, pp. 312–315. ISCA (2009)

  4. Steidl, S.: Automatic Classification of Emotion-Related User States in Spontaneous Children's Speech, 1st edn. Logos Verlag, Berlin (2009)

  5. Batliner, A., Steidl, S., Schuller, B., Seppi, D., Vogt, T., Wagner, J., Devillers, L., Vidrascu, L., Aharonson, V., Kessous, L., Amir, N.: Whodunnit – Searching for the Most Important Feature Types Signalling Emotion-Related User States in Speech. Computer Speech and Language (2010)

  6. Vogt, T., Andre, E.: Comparing Feature Sets for Acted and Spontaneous Speech in View of Automatic Emotion Recognition. In: IEEE International Conference on Multimedia and Expo, pp. 474–477 (July 2005)

  7. Shami, M., Verhelst, W.: Automatic Classification of Expressiveness in Speech: A Multi-corpus Study. Speaker Classification II: Selected Projects, 43–56 (2007)

  8. Vidrascu, L., Devillers, L.: Anger Detection Performances Based on Prosodic and Acoustic Cues in Several Corpora. In: LREC 2008 (2008)

  9. Vlasenko, B., Schuller, B., Wendemuth, A., Rigoll, G.: Combining Frame and Turn-Level Information for Robust Recognition of Emotions within Speech. In: Proceedings of Interspeech (2007)

  10. Burkhardt, F., Paeschke, A., Rolfes, M., Sendlmeier, W.F., Weiss, B.: A Database of German Emotional Speech. In: Proceedings of Interspeech, pp. 1517–1520 (2005)

  11. Engberg, I.S., Hansen, A.V.: Documentation of the Danish Emotional Speech Database (DES). Internal AAU report, Center for Person Kommunikation (1996)

  12. Martin, O., Kotsia, I., Macq, B., Pitas, I.: The eNTERFACE'05 Audio-Visual Emotion Database. In: 22nd International Conference on Data Engineering Workshops (2006)

  13. Lefter, I., Rothkrantz, L.J.M., Wiggers, P., van Leeuwen, D.A.: Automatic Stress Detection in Emergency (Telephone) Calls. Int. J. on Intelligent Defence Support Systems (2010) (submitted)

  14. Auckenthaler, R., Carey, M., Lloyd-Thomas, H.: Score Normalization for Text-Independent Speaker Verification Systems. Digital Signal Processing 10, 42–54 (2000)

  15. Juslin, P.N., Scherer, K.R.: Vocal Expression of Affect. In: Harrigan, J., Rosenthal, R., Scherer, K. (eds.) The New Handbook of Methods in Nonverbal Behavior Research, pp. 65–135. Oxford University Press, Oxford (2005)

  16. Truong, K.P., Raaijmakers, S.: Automatic Recognition of Spontaneous Emotions in Speech Using Acoustic and Lexical Features. In: Popescu-Belis, A., Stiefelhagen, R. (eds.) MLMI 2008. LNCS, vol. 5237, pp. 161–172. Springer, Heidelberg (2008)

  17. Boersma, P.: Praat, a System for Doing Phonetics by Computer. Glot International 5(9/10), 341–345 (2001)

  18. Chang, C.C., Lin, C.J.: LIBSVM: A Library for Support Vector Machines (2001)

  19. Hermansky, H., Morgan, N., Bayya, A., Kohn, P.: RASTA-PLP Speech Analysis Technique. In: IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 121–124 (1992)

  20. Reynolds, D.A., Quatieri, T.F., Dunn, R.B.: Speaker Verification Using Adapted Gaussian Mixture Models. Digital Signal Processing 10, 19–41 (2000)

  21. Campbell, W., Sturim, D., Reynolds, D.: Support Vector Machines Using GMM Supervectors for Speaker Verification. IEEE Signal Processing Letters 13(5), 308–311 (2006)

  22. Brümmer, N.: Discriminative Acoustic Language Recognition via Channel-Compensated GMM Statistics. In: Proceedings of Interspeech. ISCA (2009)

  23. Martin, A., Doddington, G., Kamm, T., Ordowski, M., Przybocki, M.: The DET Curve in Assessment of Detection Task Performance. In: Proceedings Eurospeech 1997, pp. 1895–1898 (1997)



Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lefter, I., Rothkrantz, L.J.M., Wiggers, P., van Leeuwen, D.A. (2010). Emotion Recognition from Speech by Combining Databases and Fusion of Classifiers. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2010. Lecture Notes in Computer Science, vol 6231. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15760-8_45


  • DOI: https://doi.org/10.1007/978-3-642-15760-8_45

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-15759-2

  • Online ISBN: 978-3-642-15760-8

  • eBook Packages: Computer Science, Computer Science (R0)
