Abstract
In this paper, we tackle the problem of identifying emotions from speech using features derived from spectrogram patterns. To this end, we create a spectrogram for each speech signal. Each spectrogram is divided into non-overlapping partitions corresponding to different frequency ranges. After discretizing each partition, we mine partition-specific patterns that discriminate one emotion from all other emotions. A classifier is then trained on features obtained from the extracted patterns. Our experimental evaluation indicates that the spectrogram-based patterns outperform a standard set of acoustic features, and that the results improve further as the number of spectrogram partitions increases.
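As a rough illustration of the front end described above, the sketch below builds a magnitude spectrogram, splits it into non-overlapping frequency-band partitions, and discretizes each partition's energy contour into a symbol sequence suitable for pattern mining. This is a minimal sketch under stated assumptions: the function name, the number of partitions, the quantile-based discretization scheme, and all parameter values are illustrative choices, not the implementation evaluated in the paper.

    # Hypothetical sketch of spectrogram partitioning and discretization;
    # partition count, bin count, and discretization scheme are assumptions.
    import numpy as np
    from scipy.signal import spectrogram

    def partitioned_symbol_sequences(signal, fs, n_partitions=4, n_symbols=8):
        """Split a spectrogram into frequency-band partitions and discretize
        each band's energy contour into a symbol sequence."""
        # Magnitude spectrogram: rows are frequency bins, columns are time frames.
        freqs, times, sxx = spectrogram(signal, fs=fs, nperseg=512)
        log_sxx = 10.0 * np.log10(sxx + 1e-10)  # log-energy in dB

        sequences = []
        # Non-overlapping partitions along the frequency axis.
        for band in np.array_split(log_sxx, n_partitions, axis=0):
            contour = band.mean(axis=0)  # per-frame energy of this band
            # Discretize into n_symbols equal-frequency bins (one of several
            # plausible discretization schemes).
            edges = np.quantile(contour, np.linspace(0, 1, n_symbols + 1)[1:-1])
            sequences.append(np.digitize(contour, edges))
        return sequences  # one symbol sequence per partition

In the full approach, the symbol sequence of each partition would then be mined for patterns that discriminate one emotion from the rest, and statistics over the extracted patterns would form the feature vector passed to the classifier.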
Cite this paper
Avci, U. (2020). Speech Emotion Recognition Using Spectrogram Patterns as Features. In: Karpov, A., Potapova, R. (eds.) Speech and Computer. SPECOM 2020. Lecture Notes in Computer Science, vol. 12335. Springer, Cham. https://doi.org/10.1007/978-3-030-60276-5_6