Abstract
Achieving robust, speaker-independent performance is one of the main challenges in speech emotion recognition. In this paper, we take a cascaded normalization approach that combines linear speaker-level, nonlinear value-level, and feature-vector-level normalization to minimize speaker-related effects and to maximize class separability with linear kernel classifiers. We use extreme learning machine classifiers on a four-class problem (joy, anger, sadness, and neutral). We show the efficacy of the proposed method on the recently collected Turkish Emotional Speech Database.
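To make the pipeline described in the abstract concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. The abstract names the three normalization levels but not the exact transforms, so the choices here are assumptions: per-speaker z-normalization for the linear speaker-level step, signed power normalization with a hypothetical exponent `alpha` for the nonlinear value-level step, and L2 normalization for the feature-vector-level step. The classifier uses the closed-form kernel extreme learning machine solution with a linear kernel, as commonly stated in the ELM literature; the regularization constant `C` and all function names are illustrative.

```python
import numpy as np

def speaker_znorm(X, speakers):
    """Assumed speaker-level step: z-normalize each feature per speaker."""
    X = np.asarray(X, dtype=float)
    speakers = np.asarray(speakers)
    out = np.empty_like(X)
    for s in np.unique(speakers):
        idx = speakers == s
        mu = X[idx].mean(axis=0)
        sd = X[idx].std(axis=0)
        sd[sd == 0] = 1.0  # guard against constant features
        out[idx] = (X[idx] - mu) / sd
    return out

def power_norm(X, alpha=0.5):
    """Assumed value-level step: signed power, sign(x) * |x|**alpha."""
    return np.sign(X) * np.abs(X) ** alpha

def l2_norm(X):
    """Assumed feature-vector-level step: scale each row to unit length."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero rows unchanged
    return X / norms

def cascaded_normalize(X, speakers, alpha=0.5):
    """Apply the three steps in order: speaker, value, feature vector."""
    return l2_norm(power_norm(speaker_znorm(X, speakers), alpha))

def train_linear_kernel_elm(X, y, n_classes, C=1.0):
    """Kernel ELM with a linear kernel: solve (K + I/C) beta = T."""
    y = np.asarray(y, dtype=int)
    K = X @ X.T  # linear kernel matrix over training samples
    T = -np.ones((len(y), n_classes))  # one-vs-all targets in {-1, +1}
    T[np.arange(len(y)), y] = 1.0
    return np.linalg.solve(K + np.eye(len(y)) / C, T)

def predict_linear_kernel_elm(X_train, beta, X_test):
    """Predict the class whose output score is largest."""
    return np.argmax((X_test @ X_train.T) @ beta, axis=1)
```

In this sketch, `cascaded_normalize` would be applied to utterance-level feature vectors before training, with speaker identities (or speaker clusters, if identities are unknown) supplying the `speakers` argument. Because the final step produces unit-length vectors, the linear kernel between two samples reduces to their cosine similarity.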
Notes
1. The tool is available at http://www.openaudio.eu/.
Acknowledgments
This research is partially supported by the Council for Grants of the President of the Russian Federation (Project № MD-3035.2015.8) and by the Government of the Russian Federation (Grant № 074-U01).
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Kaya, H., Karpov, A.A., Salah, A.A. (2016). Robust Acoustic Emotion Recognition Based on Cascaded Normalization and Extreme Learning Machines. In: Cheng, L., Liu, Q., Ronzhin, A. (eds) Advances in Neural Networks – ISNN 2016. ISNN 2016. Lecture Notes in Computer Science, vol 9719. Springer, Cham. https://doi.org/10.1007/978-3-319-40663-3_14
DOI: https://doi.org/10.1007/978-3-319-40663-3_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-40662-6
Online ISBN: 978-3-319-40663-3
eBook Packages: Computer Science (R0)