Evaluation of Speech Emotion Classification Based on GMM and Data Fusion

  • Conference paper
Cross-Modal Analysis of Speech, Gestures, Gaze and Facial Expressions

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5641)

Abstract

This paper continues our research on automatic emotion recognition from speech based on Gaussian Mixture Models (GMM). We use a technique for emotion recognition similar to the one used for speaker recognition. Our previous research suggests that it is better to use fewer GMM components than are used for speaker recognition, and that better results are also achieved when a greater number of speech parameters is used for GMM modeling. In previous experiments we used suprasegmental and segmental parameters separately and also together, which can be described as fusion at the feature level. The experiment described in this paper evaluates score-level fusion of two GMM classifiers used separately for segmental and suprasegmental parameters. We evaluate two techniques of score-level fusion: the dot product of the scores from both classifiers, and maximum selection (maximum confidence selection).
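As an illustration of score-level fusion, the following minimal sketch combines per-emotion scores from two classifiers. The abstract does not give the exact formulas, so everything here is an assumption: the scores are taken to be normalized per-class values, the product rule is read as an element-wise product of the two score vectors, and maximum confidence selection is read as trusting whichever classifier has the single highest score. The function names and the example numbers are hypothetical.

```python
import math


def normalize_scores(log_likelihoods):
    """Turn per-class log-likelihoods into scores that sum to 1 (assumed normalization)."""
    peak = max(log_likelihoods)
    # Subtract the peak before exponentiating to avoid numerical underflow.
    weights = [math.exp(ll - peak) for ll in log_likelihoods]
    total = sum(weights)
    return [w / total for w in weights]


def argmax(values):
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def fuse_product(scores_a, scores_b):
    """Product rule: multiply the two classifiers' scores class by class."""
    fused = [a * b for a, b in zip(scores_a, scores_b)]
    return argmax(fused)


def fuse_max_confidence(scores_a, scores_b):
    """Maximum confidence rule: follow the classifier with the highest top score."""
    chosen = scores_a if max(scores_a) >= max(scores_b) else scores_b
    return argmax(chosen)


# Hypothetical scores for three emotion classes from the two classifiers.
segmental = [0.6, 0.3, 0.1]
suprasegmental = [0.1, 0.45, 0.45]

product_choice = fuse_product(segmental, suprasegmental)
confidence_choice = fuse_max_confidence(segmental, suprasegmental)
```

With these made-up scores the two rules disagree: the product rule favors the class both classifiers find plausible, while maximum confidence selection follows the segmental classifier alone. Which rule performs better on real data is the question the paper evaluates; the sketch makes no claim about that.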

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Vondra, M., Vích, R. (2009). Evaluation of Speech Emotion Classification Based on GMM and Data Fusion. In: Esposito, A., Vích, R. (eds) Cross-Modal Analysis of Speech, Gestures, Gaze and Facial Expressions. Lecture Notes in Computer Science, vol 5641. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03320-9_10

  • DOI: https://doi.org/10.1007/978-3-642-03320-9_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-03319-3

  • Online ISBN: 978-3-642-03320-9

  • eBook Packages: Computer Science, Computer Science (R0)
