Speech Emotion Recognition Using Spiking Neural Networks

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNAI, volume 4203)

Abstract

Human social communication depends largely on exchanges of non-verbal signals, including non-lexical expression of emotions in speech. In this work, we propose a biologically plausible methodology for emotion recognition, based on extracting vowel information from an input speech signal and classifying the extracted information with a spiking neural network. First, the speech signal is segmented into vowel parts, which are represented by a set of salient features related to the Mel-frequency cepstrum. A spiking neural network then classifies these features into five emotion classes.
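As a rough illustration of the spiking units that such a network is built from, the sketch below simulates a single leaky integrate-and-fire neuron. The abstract does not state which neuron model the authors use, so the model, the function name lif_spike_times, and all parameter values here are assumptions chosen for illustration only, not the paper's method.

```python
def lif_spike_times(input_current, dt=1.0, tau=10.0,
                    v_rest=0.0, v_thresh=1.0, v_reset=0.0):
    """Simulate one leaky integrate-and-fire neuron (illustrative only).

    input_current: sequence of input values, one per time step.
    Returns the list of step indices at which the neuron spikes.
    All parameter values are hypothetical, not taken from the paper.
    """
    v = v_rest
    spikes = []
    for step, current in enumerate(input_current):
        # Forward-Euler update: the potential leaks toward v_rest
        # and is driven upward by the input.
        v += dt * (-(v - v_rest) + current) / tau
        if v >= v_thresh:
            # Threshold crossing: record a spike and reset the potential.
            spikes.append(step)
            v = v_reset
    return spikes


if __name__ == "__main__":
    # A stronger constant input produces a denser spike train.
    print(lif_spike_times([2.0] * 20))
    print(lif_spike_times([3.0] * 20))
```

In a sketch like this, a feature value such as a cepstral coefficient could act as the input drive, so that larger values yield earlier and more frequent spikes. How the features are actually encoded into spike trains is not described in the abstract.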

Keywords

  • Speech Signal
  • Emotion Recognition
  • Spike Train
  • Interactive Voice Response System
  • Spike Neural Network

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Buscicchio, C.A., Górecki, P., Caponetti, L. (2006). Speech Emotion Recognition Using Spiking Neural Networks. In: Esposito, F., Raś, Z.W., Malerba, D., Semeraro, G. (eds) Foundations of Intelligent Systems. ISMIS 2006. Lecture Notes in Computer Science, vol 4203. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11875604_6

  • DOI: https://doi.org/10.1007/11875604_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-45764-0

  • Online ISBN: 978-3-540-45766-4

  • eBook Packages: Computer Science, Computer Science (R0)
