Emotion Prediction of Sound Events Based on Transfer Learning

  • Conference paper
Engineering Applications of Neural Networks (EANN 2017)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 744)

Abstract

Processing generalized sound events with the purpose of predicting the emotion they might evoke is a relatively young research field. Tools, datasets, and methodologies to address such a challenging task are still under development, far from any standardized format. This work aims to fill this gap by revealing and exploiting potential similarities in the perception of emotions evoked by sound events and music. To this end, we propose (a) the use of temporal modulation features and (b) a transfer learning module based on an Echo State Network that assists the prediction of the valence and arousal measurements associated with generalized sound events. The effectiveness of the proposed transfer learning solution is demonstrated through a thoroughly designed experimental phase employing both sound and music data. The results demonstrate the importance of transfer learning in this field and encourage further research on approaches that address the problem in a cooperative way.
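To make the modelling idea above more concrete, the sketch below shows a generic Echo State Network regressor in Python/NumPy: a fixed random reservoir is driven by frame-level feature vectors, and a ridge-regression readout maps the time-averaged reservoir state to a (valence, arousal) pair. This is a minimal illustration of the ESN component only; the reservoir size, spectral radius, ridge coefficient, and the toy feature data are assumptions for demonstration and do not reproduce the paper's configuration or its cross-domain transfer step.

```python
import numpy as np

# Minimal Echo State Network sketch for valence/arousal regression.
# Illustrative assumptions: reservoir size, spectral radius, ridge
# coefficient, and toy data are generic choices, not the paper's setup.

rng = np.random.default_rng(0)

n_in, n_res = 23, 400                         # feature dimension, reservoir size (assumed)
W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))  # fixed input weights
W = rng.uniform(-0.5, 0.5, (n_res, n_res))    # fixed recurrent weights
W *= 0.9 / max(abs(np.linalg.eigvals(W)))     # rescale spectral radius below 1

def reservoir_states(X):
    """Run a feature sequence X (T x n_in) through the reservoir."""
    x = np.zeros(n_res)
    states = []
    for u in X:
        x = np.tanh(W_in @ u + W @ x)
        states.append(x)
    return np.asarray(states)

def train_readout(state_seqs, targets, ridge=1e-2):
    """Ridge-regression readout: mean reservoir state -> (valence, arousal)."""
    S = np.vstack([s.mean(axis=0) for s in state_seqs])   # one row per clip
    Y = np.asarray(targets)                               # shape (n_clips, 2)
    return np.linalg.solve(S.T @ S + ridge * np.eye(n_res), S.T @ Y)

# Toy usage: random stand-ins for temporal-modulation feature sequences.
clips = [rng.normal(size=(100, n_in)) for _ in range(8)]
va = rng.uniform(-1.0, 1.0, (8, 2))                      # toy valence/arousal labels
W_out = train_readout([reservoir_states(c) for c in clips], va)
pred = reservoir_states(clips[0]).mean(axis=0) @ W_out   # predicted (valence, arousal)
```

In the paper's setting, such a readout would first be fitted on music annotated with valence and arousal and then adapted to generalized sound events; only the basic regression mechanics are sketched here.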


Notes

  1. http://freemusicarchive.org/.


Acknowledgment

The research leading to these results has received partial funding from the European Union HORIZON 2020 Fast Track to Innovation project no. 691131 REMOSIS.

Author information


Corresponding author

Correspondence to Stavros Ntalampiras.



Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Ntalampiras, S., Potamitis, I. (2017). Emotion Prediction of Sound Events Based on Transfer Learning. In: Boracchi, G., Iliadis, L., Jayne, C., Likas, A. (eds) Engineering Applications of Neural Networks. EANN 2017. Communications in Computer and Information Science, vol 744. Springer, Cham. https://doi.org/10.1007/978-3-319-65172-9_26

  • DOI: https://doi.org/10.1007/978-3-319-65172-9_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-65171-2

  • Online ISBN: 978-3-319-65172-9
