Study on Speaker-Independent Emotion Recognition from Speech on Real-World Data

  • Theodoros Kostoulas
  • Todor Ganchev
  • Nikos Fakotakis
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5042)


In the present work we report results from an ongoing research activity in the area of speaker-independent emotion recognition. Experiments examine the behavior of a detector of negative emotional states over non-acted and acted speech. Furthermore, a score-level fusion of two classifiers at the utterance level is applied in an attempt to improve the performance of the emotion recognizer. Experimental results demonstrate significant differences in recognizing emotions in acted versus real-world speech.
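The score-level fusion described in the abstract can be sketched as follows. This is a minimal illustration only: the linear weighting rule, the weight value, and the decision threshold are assumptions for the example, since the abstract does not specify the fusion scheme used.

```python
# Illustrative sketch of utterance-level score fusion for a detector of
# negative emotional states. Each of the two classifiers produces one
# score per utterance; the fused score is a weighted sum (an assumed
# fusion rule -- the paper's actual scheme may differ).

def fuse_scores(score_a, score_b, weight_a=0.5):
    """Linearly combine two classifiers' scores for one utterance."""
    return weight_a * score_a + (1.0 - weight_a) * score_b

def detect_negative(score_a, score_b, weight_a=0.5, threshold=0.5):
    """Flag the utterance as a negative emotional state when the
    fused score exceeds the (assumed) decision threshold."""
    return fuse_scores(score_a, score_b, weight_a) > threshold
```

With equal weights, an utterance scored 0.8 by one classifier and 0.4 by the other fuses to 0.6 and is flagged as negative under a 0.5 threshold.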


Keywords: Emotion recognition · Real-life data





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Theodoros Kostoulas (1)
  • Todor Ganchev (1)
  • Nikos Fakotakis (1)
  1. Artificial Intelligence Group, Wire Communications Laboratory, Electrical and Computer Engineering Department, University of Patras, Rion-Patras, Greece
