Abstract
In this paper, we tackle the problem of identifying emotions from speech using features derived from spectrogram patterns. To this end, we create a spectrogram for each speech signal. Each spectrogram is divided into non-overlapping partitions corresponding to different frequency ranges. After discretizing each partition, we mine partition-specific patterns that discriminate one emotion from all other emotions. A classifier is then trained on features obtained from the extracted patterns. Our experimental evaluation indicates that the spectrogram-based patterns outperform a standard set of acoustic features, and that the results improve further as the number of spectrogram partitions increases.
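As a rough illustration of the front end described above, the sketch below builds a magnitude spectrogram, splits it into non-overlapping frequency-band partitions, and discretizes each partition's energy contour into a symbol sequence suitable for pattern mining. This is a minimal sketch under stated assumptions: the function name, the number of partitions, the quantile-based discretization scheme, and all parameter values are illustrative choices, not the implementation evaluated in the paper.

    # Hypothetical sketch of spectrogram partitioning and discretization;
    # partition count, bin count, and discretization scheme are assumptions.
    import numpy as np
    from scipy.signal import spectrogram

    def partitioned_symbol_sequences(signal, fs, n_partitions=4, n_symbols=8):
        """Split a spectrogram into frequency-band partitions and discretize
        each band's energy contour into a symbol sequence."""
        # Magnitude spectrogram: rows are frequency bins, columns are time frames.
        freqs, times, sxx = spectrogram(signal, fs=fs, nperseg=512)
        log_sxx = 10.0 * np.log10(sxx + 1e-10)  # log-energy in dB

        sequences = []
        # Non-overlapping partitions along the frequency axis.
        for band in np.array_split(log_sxx, n_partitions, axis=0):
            contour = band.mean(axis=0)  # per-frame energy of this band
            # Discretize into n_symbols equal-frequency bins (one of several
            # plausible discretization schemes).
            edges = np.quantile(contour, np.linspace(0, 1, n_symbols + 1)[1:-1])
            sequences.append(np.digitize(contour, edges))
        return sequences  # one symbol sequence per partition

In the full approach, the symbol sequence of each partition would then be mined for patterns that discriminate one emotion from the rest, and statistics over the extracted patterns would form the feature vector passed to the classifier.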
Cite this paper
Avci, U. (2020). Speech Emotion Recognition Using Spectrogram Patterns as Features. In: Karpov, A., Potapova, R. (eds.) Speech and Computer. SPECOM 2020. Lecture Notes in Computer Science, vol. 12335. Springer, Cham. https://doi.org/10.1007/978-3-030-60276-5_6