Emotion Recognition in Sound

  • Anastasiya S. Popova
  • Alexandr G. Rassadin
  • Alexander A. Ponomarenko
Conference paper
Part of the Studies in Computational Intelligence book series (SCI, volume 736)


In this paper we consider the problem of automatic emotion recognition, focusing on the processing of digital audio signals. We consider and verify a straightforward approach in which the classification of a sound fragment is reduced to an image recognition problem: the waveform and the spectrogram are used as visual representations of the sound. The computational experiment was performed on the open RAVDESS dataset, which covers 8 different emotions: “neutral”, “calm”, “happy”, “sad”, “angry”, “scared”, “disgust”, and “surprised”. Our best accuracy, 71%, was achieved by the combination of a mel-spectrogram and the VGG-16 convolutional neural network.
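The abstract does not give the preprocessing details, so the following is only a minimal sketch, assuming NumPy, of how a sound fragment can be turned into the two kinds of images mentioned: a rasterised waveform and a log-scaled mel-spectrogram. The function names (`waveform_image`, `mel_filterbank`, `log_mel_spectrogram`, `to_image`), the HTK-style mel formula, and every parameter (1024-sample FFT, hop of 256 samples, 128 mel bands, image sizes) are illustrative assumptions, not the authors' settings. Resizing to the VGG-16 input size and the network itself are not shown.

```python
import numpy as np

# Hypothetical helpers illustrating "sound -> image"; all parameters are assumptions.

def waveform_image(signal, height=128, width=256):
    """Rasterise a waveform: in each column, light the pixels between the chunk's min and max amplitude.
    Assumes len(signal) >= width so that no column is empty."""
    img = np.zeros((height, width), dtype=np.uint8)
    peak = float(np.max(np.abs(signal))) or 1.0  # normalise to [-1, 1]; avoid dividing by zero on silence
    for x, chunk in enumerate(np.array_split(signal, width)):
        lo_amp, hi_amp = chunk.min() / peak, chunk.max() / peak
        # Amplitude +1 maps to the top row (0), -1 to the bottom row (height - 1).
        top = int(round((1.0 - hi_amp) / 2.0 * (height - 1)))
        bottom = int(round((1.0 - lo_amp) / 2.0 * (height - 1)))
        img[top:bottom + 1, x] = 255
    return img

def hz_to_mel(f):
    # HTK-style mel scale (one of several common variants).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters mapping the n_fft // 2 + 1 FFT bins onto n_mels mel bands."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    # n_mels + 2 points equally spaced on the mel scale give each filter a left edge, centre and right edge.
    hz_points = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        left, center, right = hz_points[i:i + 3]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(signal, sr, n_fft=1024, hop=256, n_mels=128):
    """Hann-windowed frames -> power spectrum -> mel bands -> decibels; returns (n_mels, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop  # assumes len(signal) >= n_fft
    frames = np.stack([signal[t * hop:t * hop + n_fft] * window for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2   # (n_frames, n_fft // 2 + 1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T   # (n_frames, n_mels)
    return 10.0 * np.log10(mel + 1e-10).T               # frequency on rows, time on columns

def to_image(spec):
    """Min-max scale to 0..255 and copy into 3 channels, the layout an RGB-pretrained CNN expects."""
    lo, hi = spec.min(), spec.max()
    scaled = (spec - lo) / (hi - lo) if hi > lo else np.zeros_like(spec)
    img = np.round(scaled * 255.0).astype(np.uint8)
    return np.repeat(img[:, :, None], 3, axis=2)      # (height, width, 3)

if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr                        # one second of audio
    tone = np.sin(2 * np.pi * 440.0 * t)          # synthetic stand-in for a speech clip
    print(waveform_image(tone).shape)             # (128, 256)
    print(to_image(log_mel_spectrogram(tone, sr)).shape)  # (128, 59, 3)
```

Copying the single-channel spectrogram into three identical channels is one simple way to feed it to a network such as VGG-16 that was designed for RGB images; the mel scale compresses high frequencies, which is a common motivation for preferring a mel-spectrogram over a linear one.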


Keywords: Deep learning · Classification · Convolutional neural networks · Audio recognition · Emotion recognition · Speech recognition



The article was prepared within the framework of the Academic Fund Program at the National Research University Higher School of Economics (HSE) in 2017 (Grant №17-05-0007) and was supported by the Russian Academic Excellence Project “5-100”.

Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  • Anastasiya S. Popova (1)
  • Alexandr G. Rassadin (1)
  • Alexander A. Ponomarenko (1)
  1. National Research University Higher School of Economics, Nizhniy Novgorod, Russian Federation