The GMM-SVM Supervector Approach for the Recognition of the Emotional Status from Speech

  • Friedhelm Schwenker
  • Stefan Scherer
  • Yasmine M. Magdi
  • Günther Palm
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5768)


Emotion recognition from speech is an important field of research in human-machine interfaces and has various applications, for instance in call centers. In the proposed classifier system, RASTA-PLP (perceptual linear prediction) features are extracted from the speech signals. The first step is to compute a universal background model (UBM) representing the general structure of the underlying feature space of speech signals; this UBM is modeled as a Gaussian mixture model (GMM). After the UBM has been computed, the sequence of feature vectors extracted from an utterance is used to re-train (adapt) the UBM. From the adapted GMM the mean vectors are extracted and concatenated into a so-called GMM supervector, which is then passed to a support vector machine (SVM) classifier. The overall system has been evaluated on utterances from the public Berlin emotional speech database. Using the proposed features, a recognition rate of 79% (utterance-based) has been achieved, which is close to human performance on this database.
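The pipeline in the abstract (UBM training, per-utterance mean adaptation, supervector extraction, SVM classification) can be sketched as follows. This is a minimal illustration using scikit-learn, not the authors' implementation: the RASTA-PLP front end is replaced by random stand-in features, and the component count, relevance factor `r`, and SVM settings are illustrative assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# 1. Train a universal background model (UBM) on pooled background frames.
#    Each row stands in for one RASTA-PLP feature vector.
background = rng.normal(size=(500, 13))
ubm = GaussianMixture(n_components=4, covariance_type="diag", random_state=0)
ubm.fit(background)

def supervector(frames, ubm, r=16.0):
    """MAP-adapt the UBM means to one utterance and concatenate them."""
    post = ubm.predict_proba(frames)              # frame-level responsibilities
    n_k = post.sum(axis=0)                        # soft counts per component
    # Responsibility-weighted mean of the utterance frames per component
    ex = (post.T @ frames) / np.maximum(n_k[:, None], 1e-10)
    alpha = (n_k / (n_k + r))[:, None]            # data-dependent adaptation weights
    means = alpha * ex + (1.0 - alpha) * ubm.means_
    return means.ravel()                          # the GMM supervector

# 2. Build supervectors for labelled utterances and train the SVM.
utterances = [rng.normal(loc=l, size=(80, 13)) for l in (0.0, 0.0, 1.0, 1.0)]
labels = [0, 0, 1, 1]
X = np.array([supervector(u, ubm) for u in utterances])
clf = SVC(kernel="linear").fit(X, labels)
print(clf.predict(X))
```

Concatenating the adapted means yields one fixed-length vector per utterance (here 4 components x 13 dimensions = 52), which is what makes a variable-length frame sequence usable as SVM input.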


Keywords: Support Vector Machine, Speech Signal, Gaussian Mixture Model, Emotion Recognition, Automatic Speech Recognition





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Friedhelm Schwenker (1)
  • Stefan Scherer (1)
  • Yasmine M. Magdi (2)
  • Günther Palm (1)
  1. Institute of Neural Information Processing, University of Ulm, Ulm, Germany
  2. Computer Science and Engineering Department, German University in Cairo, Heliopolis, Egypt
