Speech Emotion Recognition Using Spiking Neural Networks

  • Cosimo A. Buscicchio
  • Przemysław Górecki
  • Laura Caponetti
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4203)


Human social communication depends largely on exchanges of non-verbal signals, including non-lexical expression of emotions in speech. In this work, we propose a biologically plausible methodology for the problem of emotion recognition, based on the extraction of vowel information from an input speech signal and on the classification of the extracted information by a spiking neural network. First, the speech signal is segmented into vowel parts, each represented by a set of salient features related to the Mel-frequency cepstrum. A spiking neural network then classifies these representations into five emotion classes.
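As a rough illustration of how a feature value can drive a spiking neuron, the following minimal sketch simulates a leaky integrate-and-fire unit. The abstract does not specify the neuron model, the encoding scheme, or any parameter values, so the function names (`lif_spike_times`, `encode_features`), the constants, and the sample feature values are all assumptions made for this example only.

```python
def lif_spike_times(input_current, n_steps=200, dt=1.0, tau=20.0,
                    threshold=1.0, v_reset=0.0):
    """Simulate one leaky integrate-and-fire neuron with constant input.

    Returns the time steps at which the neuron fires. All parameter
    values are illustrative assumptions, not taken from the paper.
    """
    v = 0.0
    spikes = []
    for step in range(n_steps):
        # Euler step of dv/dt = (-v + input_current) / tau
        v += dt * (-v + input_current) / tau
        if v >= threshold:
            spikes.append(step)
            v = v_reset  # reset the membrane potential after a spike
    return spikes


def encode_features(features, **kwargs):
    """Map each non-negative, pre-scaled feature value to a spike train."""
    return [lif_spike_times(x, **kwargs) for x in features]


if __name__ == "__main__":
    # Hypothetical, already-normalised cepstral-like feature values.
    values = [0.5, 1.5, 3.0]
    for value, train in zip(values, encode_features(values)):
        print(value, len(train))
```

In this sketch an input below the threshold never produces a spike, and larger inputs produce denser spike trains, which is one simple way to turn real-valued features into the spike trains a spiking network consumes.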


Keywords: Speech Signal, Emotion Recognition, Spike Train, Interactive Voice Response System, Spike Neural Network
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Cosimo A. Buscicchio (1)
  • Przemysław Górecki (1)
  • Laura Caponetti (1)
  1. Dipartimento di Informatica, Università degli Studi di Bari, Bari, Italy
