Evaluation of Speech Emotion Classification Based on GMM and Data Fusion

Vondra, Martin; Vích, Robert

doi:10.1007/978-3-642-03320-9_10

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5641)

Abstract

This paper describes a continuation of our research on automatic emotion recognition from speech based on Gaussian Mixture Models (GMM). We use a technique for emotion recognition similar to that used for speaker recognition. Our previous research suggests that emotion recognition works better with fewer GMM components than are used for speaker recognition, and that better results are also achieved when a greater number of speech parameters is used for GMM modeling. In previous experiments we used suprasegmental and segmental parameters separately and also together, which can be described as fusion at the feature level. The experiment described in this paper evaluates score-level fusion of two GMM classifiers, one for segmental and one for suprasegmental parameters. We evaluate two techniques of score-level fusion: the dot product of the scores from both classifiers and maximum confidence selection.
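The two score-level fusion rules named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give formulas, so the sketch assumes that "dot product" means multiplying the two classifiers' normalized scores class by class, and that "maximum confidence selection" means taking the decision of whichever classifier assigns the higher score to its own best class. The function names, the emotion labels, and the log-likelihood values are hypothetical and chosen only for illustration.

```python
import math


def normalize(log_likelihoods):
    """Map per-class log-likelihoods to scores that sum to 1 (softmax)."""
    peak = max(log_likelihoods.values())
    exp = {c: math.exp(v - peak) for c, v in log_likelihoods.items()}
    total = sum(exp.values())
    return {c: v / total for c, v in exp.items()}


def product_fusion(scores_a, scores_b):
    """Assumed reading of 'dot product': multiply scores class by class."""
    fused = {c: scores_a[c] * scores_b[c] for c in scores_a}
    return max(fused, key=fused.get)


def max_confidence_fusion(scores_a, scores_b):
    """Assumed reading: trust the classifier whose top score is higher."""
    best_a = max(scores_a, key=scores_a.get)
    best_b = max(scores_b, key=scores_b.get)
    return best_a if scores_a[best_a] >= scores_b[best_b] else best_b


# Hypothetical per-emotion log-likelihoods from the two GMM classifiers
# for a single utterance (made-up numbers, not results from the paper).
seg = normalize({"anger": -100.0, "joy": -100.5, "neutral": -104.0})
sup = normalize({"anger": -51.0, "joy": -50.0, "neutral": -50.2})

print("product fusion:       ", product_fusion(seg, sup))
print("max confidence fusion:", max_confidence_fusion(seg, sup))
```

In this made-up example the two rules disagree: the product rule favors the class that both classifiers rate reasonably well, while the maximum confidence rule follows the single more confident classifier.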

Author information

Authors and Affiliations

Institute of Photonics and Electronics, Academy of Sciences of the Czech Republic, Chaberská 57, CZ 18251, Prague 8, Czech Republic
Martin Vondra & Robert Vích

Editor information

Editors and Affiliations

Department of Psychology, Second University of Naples, and IIASS, Via G. Pellegrino 19, 84019, Vietri sul Mare, (SA), Italy
Anna Esposito
Institute of Photonics and Electronics, Academy of Sciences of the Czech Republic, Chaberská 57, 182 52, Prague 8, Czech Republic
Robert Vích

About this paper

Cite this paper

Vondra, M., Vích, R. (2009). Evaluation of Speech Emotion Classification Based on GMM and Data Fusion. In: Esposito, A., Vích, R. (eds) Cross-Modal Analysis of Speech, Gestures, Gaze and Facial Expressions. Lecture Notes in Computer Science, vol 5641. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03320-9_10

DOI: https://doi.org/10.1007/978-3-642-03320-9_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03319-3
Online ISBN: 978-3-642-03320-9
eBook Packages: Computer Science, Computer Science (R0)