Audio-Based Emotion Recognition from Natural Conversations Based on Co-Occurrence Matrix and Frequency Domain Energy Distribution Features

  • Aya Sayedelahl
  • Pouria Fewzee
  • Mohamed S. Kamel
  • Fakhri Karray
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6975)


Recognizing emotion from natural speech is a challenging problem. The audio sub-challenge is an initial step towards an efficient audio-visual emotion recognition system that can detect emotions in real-life applications (e.g., human-machine interaction and communication). The SEMAINE database, which consists of emotionally colored conversations, serves as the benchmark. This paper presents our speech-based emotion recognition system, which labels speech in terms of positive/negative valence and high/low arousal, expectancy, and power. We introduce a new feature set comprising co-occurrence matrix based features and frequency domain energy distribution based features, and compare it against well-known prosodic and spectral features. Classification with the proposed features shows promising results relative to the classical features on both the development and test data sets.
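The abstract names the two feature families without defining them, so the following is only a minimal sketch of what such features could look like, not the authors' method. It assumes a co-occurrence matrix built from a quantized one-dimensional contour (for example a per-frame energy track) at a fixed lag, summarized with Haralick-style statistics, and an energy distribution computed as the fraction of spectral energy in equal-width frequency bands. All function names, the number of quantization levels, the lag, and the band layout are hypothetical choices made for illustration.

```python
import cmath
import math


def quantize(values, levels):
    """Map each value to an integer bin in [0, levels - 1] by min-max scaling."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    scale = (levels - 1) / (hi - lo)
    return [int(round((v - lo) * scale)) for v in values]


def cooccurrence_matrix(symbols, levels, lag=1):
    """Normalized counts of the pairs (symbols[t], symbols[t + lag]), lag >= 1."""
    counts = [[0.0] * levels for _ in range(levels)]
    for a, b in zip(symbols, symbols[lag:]):
        counts[a][b] += 1.0
    total = sum(sum(row) for row in counts)
    if total == 0:
        return counts
    return [[c / total for c in row] for row in counts]


def cooccurrence_stats(p):
    """Haralick-style summaries of a normalized co-occurrence matrix."""
    energy = contrast = homogeneity = 0.0
    for i, row in enumerate(p):
        for j, v in enumerate(row):
            energy += v * v
            contrast += (i - j) ** 2 * v
            homogeneity += v / (1 + abs(i - j))
    return {"energy": energy, "contrast": contrast, "homogeneity": homogeneity}


def band_energy_distribution(frame, n_bands):
    """Fraction of spectral energy in each of n_bands equal-width bands.

    Uses a naive DFT over the first half of the spectrum to stay
    dependency-free; an FFT library would be the practical choice.
    """
    n = len(frame)
    half = n // 2
    power = []
    for k in range(half):
        s = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame))
        power.append(abs(s) ** 2)
    total = sum(power) or 1.0  # avoid division by zero on silent frames
    width = half / n_bands
    bands = [0.0] * n_bands
    for k, pw in enumerate(power):
        bands[min(int(k / width), n_bands - 1)] += pw / total
    return bands
```

In this sketch, a feature vector for one utterance would concatenate the values returned by `cooccurrence_stats` with the band fractions from `band_energy_distribution`, and could then be passed to a classifier alongside conventional prosodic and spectral features for comparison.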


Speech Emotion Recognition, Co-Occurrence Matrix, Frequency Domain Energy Distribution





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Aya Sayedelahl (1)
  • Pouria Fewzee (1)
  • Mohamed S. Kamel (1)
  • Fakhri Karray (1)
  1. Pattern Analysis and Machine Intelligence Lab, Electrical and Computer Engineering, University of Waterloo, Canada
