Robust Acoustic Emotion Recognition Based on Cascaded Normalization and Extreme Learning Machines

  • Heysem Kaya
  • Alexey A. Karpov
  • Albert Ali Salah
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9719)


A key challenge in speech emotion recognition is making it robust and speaker-independent. In this paper, we take a cascaded normalization approach that combines linear speaker-level, nonlinear value-level, and feature-vector-level normalization to minimize speaker-related effects and to maximize class separability with linear kernel classifiers. We use extreme learning machine classifiers on a four-class problem (joy, anger, sadness, neutral). We show the efficacy of the proposed method on the recently collected Turkish Emotional Speech Database.
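The cascade described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name the exact operations, so per-speaker z-normalization (linear, speaker level), signed power normalization (nonlinear, value level), and L2 normalization (feature vector level) are assumptions, as are all function names, the exponent `alpha`, and the regularization constant `C`. The classifier is a linear-kernel ELM in its regularized least-squares form, with output weights beta = (I/C + K)^-1 T.

```python
import numpy as np

def speaker_znorm(X, speakers):
    """Linear speaker-level step (assumed): z-normalize features per speaker."""
    Xn = np.empty_like(X, dtype=float)
    for s in np.unique(speakers):
        idx = speakers == s
        mu = X[idx].mean(axis=0)
        sd = X[idx].std(axis=0)
        # Guard against constant features within a speaker.
        Xn[idx] = (X[idx] - mu) / np.where(sd > 0, sd, 1.0)
    return Xn

def power_norm(X, alpha=0.5):
    """Nonlinear value-level step (assumed): sign(x) * |x|**alpha."""
    return np.sign(X) * np.abs(X) ** alpha

def l2_norm(X):
    """Feature-vector-level step (assumed): scale each row to unit length."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.where(norms > 0, norms, 1.0)

def cascaded_normalize(X, speakers, alpha=0.5):
    """Apply the three steps in sequence."""
    return l2_norm(power_norm(speaker_znorm(X, speakers), alpha))

def kernel_elm_fit(K, y, n_classes, C=1.0):
    """Solve (I/C + K) beta = T, where T holds +1 for the true class, -1 otherwise."""
    T = -np.ones((len(y), n_classes))
    T[np.arange(len(y)), y] = 1.0
    return np.linalg.solve(np.eye(len(y)) / C + K, T)

def kernel_elm_predict(K_test, beta):
    """Pick the class with the largest output score."""
    return np.argmax(K_test @ beta, axis=1)

# Synthetic stand-in data: 4 hypothetical speakers, 4 emotion classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 6))
speakers = np.repeat(np.arange(4), 10)
y = rng.integers(0, 4, size=40)

Z = cascaded_normalize(X, speakers)
K = Z @ Z.T                      # linear kernel on normalized features
beta = kernel_elm_fit(K, y, n_classes=4)
pred = kernel_elm_predict(K, beta)
```

In a speaker-independent evaluation, the test kernel would be computed between test and training features, and the speaker-level statistics would come from each test speaker's own utterances; both choices are assumptions about the protocol rather than details stated here.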


Acoustic emotion recognition, Speech emotion recognition, Cascaded normalization, Extreme learning machines, ELM



This research is partially supported by the Council for Grants of the President of the Russian Federation (Project № MD-3035.2015.8) and by the Government of the Russian Federation (Grant № 074-U01).



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Heysem Kaya (1)
  • Alexey A. Karpov (2, 3)
  • Albert Ali Salah (4)

  1. Department of Computer Engineering, Çorlu Faculty of Engineering, Namik Kemal University, Çorlu, Tekirdağ, Turkey
  2. St. Petersburg Institute for Informatics and Automation of Russian Academy of Sciences, St. Petersburg, Russia
  3. ITMO University, St. Petersburg, Russia
  4. Department of Computer Engineering, Boğaziçi University, Istanbul, Turkey
