Robust Acoustic Emotion Recognition Based on Cascaded Normalization and Extreme Learning Machines

Kaya, Heysem; Karpov, Alexey A.; Salah, Albert Ali

doi:10.1007/978-3-319-40663-3_14

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9719)

Included in the following conference series:

International Symposium on Neural Networks

Abstract

A key challenge in speech emotion recognition is achieving robust, speaker-independent performance. In this paper, we take a cascaded normalization approach that combines linear speaker-level, nonlinear value-level, and feature-vector-level normalization to minimize speaker-related effects and to maximize class separability with linear kernel classifiers. We use extreme learning machine classifiers on a four-class problem (joy, anger, sadness, and neutral). We show the efficacy of the proposed method on the recently collected Turkish Emotional Speech Database.
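The pipeline the abstract describes can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not name the three normalization steps, so the sketch assumes per-speaker z-normalization (linear, speaker level), signed power normalization with a hypothetical exponent `alpha` (nonlinear, value level), and L2 normalization (feature vector level). The classifier is a regularized least-squares kernel ELM with a linear kernel, in the spirit of the formulation in Huang et al. (2012). All function names and the parameters `alpha` and `C` are chosen here for illustration.

```python
import numpy as np

def speaker_z_norm(X, speakers):
    """Assumed speaker-level step: standardize each feature per speaker."""
    X = np.asarray(X, dtype=float)
    speakers = np.asarray(speakers)
    Xn = np.empty_like(X)
    for s in np.unique(speakers):
        idx = speakers == s
        mu = X[idx].mean(axis=0)
        sd = X[idx].std(axis=0)
        sd[sd == 0] = 1.0  # avoid division by zero for constant features
        Xn[idx] = (X[idx] - mu) / sd
    return Xn

def power_norm(X, alpha=0.5):
    """Assumed value-level step: signed power, sign(x) * |x|**alpha."""
    return np.sign(X) * np.abs(X) ** alpha

def l2_norm(X):
    """Assumed feature-vector-level step: scale each row to unit L2 norm."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero rows unchanged
    return X / norms

def cascaded_norm(X, speakers, alpha=0.5):
    """Apply the three steps in sequence."""
    return l2_norm(power_norm(speaker_z_norm(X, speakers), alpha))

def kernel_elm_fit(K, labels, n_classes, C=1.0):
    """Solve (I/C + K) beta = T, where T holds +1/-1 class targets."""
    labels = np.asarray(labels)
    n = len(labels)
    T = -np.ones((n, n_classes))
    T[np.arange(n), labels] = 1.0
    return np.linalg.solve(np.eye(n) / C + K, T)

def kernel_elm_predict(K_test, beta):
    """Pick the class with the highest output score for each test row."""
    return np.argmax(K_test @ beta, axis=1)
```

With a linear kernel, the training kernel is `Z_train @ Z_train.T` and the test kernel is `Z_test @ Z_train.T`, where `Z_train` and `Z_test` are the normalized feature matrices. Note that the speaker-level step as written computes statistics from each speaker's own utterances, so at test time it assumes several utterances per test speaker are available.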

Notes

1. The tool is available at http://www.openaudio.eu/.

Acknowledgments

This research is partially supported by the Council for Grants of the President of the Russian Federation (Project № MD-3035.2015.8) and by the Government of the Russian Federation (Grant № 074-U01).

Author information

Authors and Affiliations

Department of Computer Engineering, Çorlu Faculty of Engineering, Namik Kemal University, Çorlu, Tekirdağ, Turkey
Heysem Kaya
St. Petersburg Institute for Informatics and Automation of Russian Academy of Sciences, St. Petersburg, Russia
Alexey A. Karpov
ITMO University, St. Petersburg, Russia
Alexey A. Karpov
Department of Computer Engineering, Boğaziçi University, Bebek, Istanbul, Turkey
Albert Ali Salah

Corresponding author

Correspondence to Heysem Kaya.

Editor information

Editors and Affiliations

The Chinese Academy of Sciences, Beijing, China
Long Cheng
Huazhong University of Science and Tech., Wuhan, Jiangsu, China
Qingshan Liu
Russian Academy of Sciences, SPIIRAS, St. Petersburg, Russia
Andrey Ronzhin

About this paper

Cite this paper

Kaya, H., Karpov, A.A., Salah, A.A. (2016). Robust Acoustic Emotion Recognition Based on Cascaded Normalization and Extreme Learning Machines. In: Cheng, L., Liu, Q., Ronzhin, A. (eds) Advances in Neural Networks – ISNN 2016. ISNN 2016. Lecture Notes in Computer Science, vol 9719. Springer, Cham. https://doi.org/10.1007/978-3-319-40663-3_14

DOI: https://doi.org/10.1007/978-3-319-40663-3_14
Published: 02 July 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-40662-6
Online ISBN: 978-3-319-40663-3
eBook Packages: Computer Science, Computer Science (R0)