Abstract
Human speech, which is generated through the vibration of the vocal cords, is affected by the emotional state of the speaker. Accurate recognition of the emotions concealed in human speech is a significant factor in further improving the quality of Human–Computer Interaction (HCI). However, a satisfactory level of accuracy has not yet been achieved, mainly because there is no well-accepted standard feature set. Emotions are hard to distinguish from speech even for humans, which is why a standard feature set is difficult to extract. This paper presents a model that classifies emotions from speech signals with high accuracy compared to the present state of the art. The dataset used in this experiment consists of speech recordings that are specifically labeled with the emotions of the speakers. A novel wavelet-based feature set is extracted from the speech signals, and a Neural Network (NN) with a single hidden layer is then trained on this feature set to classify the different emotions. The feature set is newly introduced and is tested here with an NN architecture for the first time; the classification results are also compared with those of other prominent classification techniques.
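The abstract does not specify the wavelet family, the exact features, or the network size, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a Haar wavelet, three decomposition levels, relative sub-band energies plus their Shannon entropy as the feature vector, and an untrained single-hidden-layer network with random weights and no bias terms. The names `haar_dwt`, `wavelet_features`, and `forward`, the synthetic input signal, and the choice of 8 hidden units and 4 classes are all hypothetical.

```python
import math
import random

def haar_dwt(signal):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    s = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return approx, detail

def wavelet_features(signal, levels=3):
    """Relative energy of each sub-band, followed by the entropy of those energies."""
    energies = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        energies.append(sum(d * d for d in detail))
    energies.append(sum(a * a for a in approx))
    total = sum(energies) or 1.0  # guard against an all-zero signal
    rel = [e / total for e in energies]
    entropy = -sum(p * math.log(p) for p in rel if p > 0)
    return rel + [entropy]

def forward(features, w_hidden, w_out):
    """Single tanh hidden layer, then a softmax over the emotion classes."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, features)))
              for row in w_hidden]
    scores = [sum(w * h for w, h in zip(row, hidden)) for row in w_out]
    top = max(scores)
    exps = [math.exp(sc - top) for sc in scores]  # shift for numerical stability
    norm = sum(exps)
    return [e / norm for e in exps]

# Synthetic stand-in for a speech frame (assumption: not real speech data).
rng = random.Random(0)
signal = [math.sin(2 * math.pi * 5 * t / 64) + 0.1 * rng.uniform(-1, 1)
          for t in range(64)]
feats = wavelet_features(signal, levels=3)

# Hypothetical sizes: 8 hidden units, 4 emotion classes; weights are random,
# so the output is a valid probability vector but not a meaningful prediction.
n_hidden, n_classes = 8, 4
w_hidden = [[rng.uniform(-1, 1) for _ in feats] for _ in range(n_hidden)]
w_out = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_classes)]
probs = forward(feats, w_hidden, w_out)
```

In practice the weights would be learned from labeled recordings, and a wavelet library would likely replace the hand-written Haar transform; the sketch only shows how a fixed-length feature vector derived from wavelet sub-bands can feed a single-hidden-layer classifier.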
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Roy, T., Marwala, T., Chakraverty, S. (2020). Speech Emotion Recognition Using Neural Network and Wavelet Features. In: Chakraverty, S., Biswas, P. (eds) Recent Trends in Wave Mechanics and Vibrations. Lecture Notes in Mechanical Engineering. Springer, Singapore. https://doi.org/10.1007/978-981-15-0287-3_30
DOI: https://doi.org/10.1007/978-981-15-0287-3_30
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-0286-6
Online ISBN: 978-981-15-0287-3
eBook Packages: Engineering, Engineering (R0)