Combined Feature Representation for Emotion Classification from Russian Speech

Verkholyak, Oxana; Karpov, Alexey

doi:10.1007/978-3-319-71746-3_6

Part of the book series: Communications in Computer and Information Science (CCIS, volume 789)

Included in the following conference series:

Conference on Artificial Intelligence and Natural Language

Abstract

Acoustic features for emotion classification can be extracted at different levels. Frame-level features are low-level descriptors that preserve the temporal structure of the utterance. Utterance-level features, by contrast, are functionals applied to the low-level descriptors and carry important information about the speaker's emotional state. Utterance-level features are particularly useful for determining emotion intensity; however, they lose information about temporal changes in the signal. Another drawback is that the number of feature vectors is often insufficient for complex classification tasks. One way to overcome these problems is to combine frame-level and utterance-level features and take advantage of both. This paper proposes to obtain a low-level feature representation by feeding frame-level descriptor sequences to a Long Short-Term Memory (LSTM) network, combine the outcome with a Principal Component Analysis (PCA) representation of the utterance-level features, and make the final prediction with a logistic regression classifier.
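
The combination described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and everything specific in it is an assumption made for illustration: the synthetic data, the four functionals (mean, standard deviation, minimum, maximum), the dimensions, the choice of the LSTM's final hidden state as its "outcome", and the random, untrained LSTM weights. In practice the LSTM would be trained and the descriptors would come from an acoustic feature extractor.

```python
import numpy as np

def functionals(frames):
    """Utterance-level vector: statistics over time of each low-level descriptor."""
    return np.concatenate([frames.mean(axis=0), frames.std(axis=0),
                           frames.min(axis=0), frames.max(axis=0)])

def pca_fit(X, n_components):
    """Return the feature mean and the top principal directions (via SVD)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_final_state(frames, W, U, b):
    """Forward pass of one LSTM layer; returns the hidden state after the last frame."""
    hidden = U.shape[1]
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x in frames:
        # Pre-activations for input gate, forget gate, output gate, candidate cell.
        i, f, o, g = np.split(W @ x + U @ h + b, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return h

rng = np.random.default_rng(0)
n_descriptors, hidden, n_components = 5, 4, 3  # illustrative sizes only

# Synthetic stand-in for frame-level descriptors: 6 utterances of varying length.
utterances = [rng.normal(size=(rng.integers(20, 40), n_descriptors))
              for _ in range(6)]

# Random (untrained) LSTM weights, used here only to show the data flow.
W = rng.normal(scale=0.1, size=(4 * hidden, n_descriptors))
U = rng.normal(scale=0.1, size=(4 * hidden, hidden))
b = np.zeros(4 * hidden)

# Utterance-level branch: functionals followed by PCA.
utt = np.stack([functionals(u) for u in utterances])
mean, components = pca_fit(utt, n_components)
utt_pca = (utt - mean) @ components.T

# Frame-level branch: one fixed-size vector per utterance from the LSTM.
seq = np.stack([lstm_final_state(u, W, U, b) for u in utterances])

# Combined representation, one row per utterance.
combined = np.hstack([seq, utt_pca])
print(combined.shape)  # (6, 7)
```

Each row of `combined` could then be passed to a logistic regression classifier (for example, scikit-learn's `LogisticRegression`) together with the emotion labels. Note that the frame-level branch keeps the order of frames, while the functionals discard it.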

Acknowledgments

This work was financially supported by the Ministry of Education and Science of the Russian Federation (contract 14.575.21.0132, ID RFMEFI57517X0132), as well as by the Council for grants of the President of the Russian Federation (project № MD–254.2017.8) and by the RFBR (project № 16-3760100).

Author information

Authors and Affiliations

ITMO University, St. Petersburg, Russia
Oxana Verkholyak & Alexey Karpov
SPIIRAS Institute, St. Petersburg, Russia
Oxana Verkholyak & Alexey Karpov

Authors

Oxana Verkholyak
Alexey Karpov

Corresponding author

Correspondence to Oxana Verkholyak.

Editor information

Editors and Affiliations

ITMO University, St. Petersburg, Russia
Andrey Filchenkov
University of Helsinki, Helsinki, Finland
Lidia Pivovarova
Mendel University, Brno, Czech Republic
Jan Žižka

About this paper

Cite this paper

Verkholyak, O., Karpov, A. (2018). Combined Feature Representation for Emotion Classification from Russian Speech. In: Filchenkov, A., Pivovarova, L., Žižka, J. (eds) Artificial Intelligence and Natural Language. AINL 2017. Communications in Computer and Information Science, vol 789. Springer, Cham. https://doi.org/10.1007/978-3-319-71746-3_6

DOI: https://doi.org/10.1007/978-3-319-71746-3_6
Published: 28 November 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-71745-6
Online ISBN: 978-3-319-71746-3
eBook Packages: Computer Science, Computer Science (R0)