Speech Emotion Recognition Using Spectrogram Patterns as Features

  • Conference paper
Speech and Computer (SPECOM 2020)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12335)


Abstract

In this paper, we tackle the problem of identifying emotions from speech using features derived from spectrogram patterns. Toward this goal, we create a spectrogram for each speech signal. The resulting spectrograms are divided into non-overlapping partitions covering different frequency ranges. After discretizing each partition, we mine partition-specific patterns that discriminate one emotion from all others. A classifier is then trained on features obtained from the extracted patterns. Our experimental evaluation indicates that the spectrogram-based patterns outperform a standard set of acoustic features, and that the results improve further as the number of spectrogram partitions increases.
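The preprocessing pipeline the abstract describes (spectrogram, non-overlapping frequency partitions, per-partition discretization) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the partition count, symbol count, STFT parameters, and quantile-based discretization are all assumptions, and the subsequent pattern-mining and classification steps are omitted.

```python
import numpy as np

def stft_spectrogram(signal, frame_len=256, hop=128):
    """Power spectrogram via a framed, Hann-windowed FFT (illustrative parameters)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2  # shape: (frames, freq_bins)

def partitioned_symbol_sequences(signal, n_partitions=4, n_symbols=5):
    """Spectrogram -> non-overlapping frequency partitions -> one discretized
    symbol sequence per partition, in the spirit of the abstract."""
    Sxx = stft_spectrogram(signal)  # time x frequency
    # Split the frequency axis into non-overlapping partitions.
    bands = np.array_split(np.arange(Sxx.shape[1]), n_partitions)
    sequences = []
    for band in bands:
        # Mean band energy per time frame.
        energy = Sxx[:, band].mean(axis=1)
        # Discretize into n_symbols levels with equal-frequency (quantile) bins;
        # the interior quantiles serve as bin edges.
        edges = np.quantile(energy, np.linspace(0, 1, n_symbols + 1)[1:-1])
        sequences.append(np.digitize(energy, edges))
    return sequences
```

Each returned sequence is a string of discrete symbols over time for one frequency band, which is the kind of representation a sequential pattern miner could then consume.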



Author information

Corresponding author: Umut Avci.



Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Avci, U. (2020). Speech Emotion Recognition Using Spectrogram Patterns as Features. In: Karpov, A., Potapova, R. (eds) Speech and Computer. SPECOM 2020. Lecture Notes in Computer Science, vol. 12335. Springer, Cham. https://doi.org/10.1007/978-3-030-60276-5_6

  • DOI: https://doi.org/10.1007/978-3-030-60276-5_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-60275-8

  • Online ISBN: 978-3-030-60276-5

  • eBook Packages: Computer Science; Computer Science (R0)
