Robust Acoustic Emotion Recognition Based on Cascaded Normalization and Extreme Learning Machines

  • Conference paper
Advances in Neural Networks – ISNN 2016 (ISNN 2016)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9719)

Abstract

One of the challenges in speech emotion recognition is achieving robust, speaker-independent performance. In this paper, we take a cascaded normalization approach that combines linear speaker-level, nonlinear value-level, and feature-vector-level normalization to minimize speaker-related effects and to maximize class separability with linear kernel classifiers. We use extreme learning machine classifiers on a four-class problem (joy, anger, sadness, neutral). We show the efficacy of the proposed method on the recently collected Turkish Emotional Speech Database.
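As an illustration only, the pipeline named in the abstract can be sketched in a few lines of NumPy. The abstract does not spell out the individual steps, so the choices below are assumptions: per-speaker z-normalization for the linear speaker-level step, signed power normalization for the nonlinear value-level step, L2 normalization for the feature-vector-level step, and a linear-kernel extreme learning machine solved in regularized least-squares form. The function names, the exponent `alpha`, the regularization constant `C`, and the random toy data are all hypothetical.

```python
import numpy as np

def speaker_z_norm(X, speakers):
    """Assumed speaker-level step: z-normalize each feature within each speaker."""
    X = np.asarray(X, dtype=float)
    speakers = np.asarray(speakers)
    out = np.empty_like(X)
    for s in np.unique(speakers):
        idx = speakers == s
        mu = X[idx].mean(axis=0)
        sd = X[idx].std(axis=0)
        sd[sd == 0] = 1.0  # guard against constant features
        out[idx] = (X[idx] - mu) / sd
    return out

def power_norm(X, alpha=0.5):
    """Assumed value-level step: signed power normalization of every value."""
    return np.sign(X) * np.abs(X) ** alpha

def l2_norm(X):
    """Assumed feature-vector-level step: scale each row to unit L2 norm."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero rows unchanged
    return X / norms

def cascade(X, speakers, alpha=0.5):
    """Apply the three normalization steps in sequence."""
    return l2_norm(power_norm(speaker_z_norm(X, speakers), alpha))

def kernel_elm_fit(K, labels, n_classes, C=1.0):
    """Solve (I/C + K) beta = T, with T holding +1/-1 class targets."""
    n = len(labels)
    T = -np.ones((n, n_classes))
    T[np.arange(n), labels] = 1.0
    return np.linalg.solve(np.eye(n) / C + K, T)

def kernel_elm_predict(K_test, beta):
    """Pick the class with the largest output score for each test row."""
    return np.argmax(K_test @ beta, axis=1)

# Toy data (random, not from any corpus): 4 speakers, 10 utterances each,
# 6 features per utterance, 4 emotion classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 6))
speakers = np.repeat(np.arange(4), 10)
labels = rng.integers(0, 4, size=40)

Z = cascade(X, speakers)
K = Z @ Z.T  # linear kernel on the normalized features
beta = kernel_elm_fit(K, labels, n_classes=4)
pred = kernel_elm_predict(K, beta)
```

In a speaker-independent evaluation, the speaker-level statistics would be computed from each test speaker's own utterances and the kernel would be formed between test and training rows; the toy run above only checks that the pieces fit together.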

Notes

  1. The tool is available at http://www.openaudio.eu/.

Acknowledgments

This research is partially supported by the Council for Grants of the President of the Russian Federation (Project № MD-3035.2015.8) and by the Government of the Russian Federation (Grant № 074-U01).

Author information

Corresponding author

Correspondence to Heysem Kaya.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Kaya, H., Karpov, A.A., Salah, A.A. (2016). Robust Acoustic Emotion Recognition Based on Cascaded Normalization and Extreme Learning Machines. In: Cheng, L., Liu, Q., Ronzhin, A. (eds) Advances in Neural Networks – ISNN 2016. ISNN 2016. Lecture Notes in Computer Science, vol 9719. Springer, Cham. https://doi.org/10.1007/978-3-319-40663-3_14

  • DOI: https://doi.org/10.1007/978-3-319-40663-3_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-40662-6

  • Online ISBN: 978-3-319-40663-3

  • eBook Packages: Computer Science, Computer Science (R0)
