Speech Emotion Recognition Using Neural Network and Wavelet Features

Roy, Tanmoy; Marwala, Tshilidzi; Chakraverty, S.

doi:10.1007/978-981-15-0287-3_30

Part of the book series: Lecture Notes in Mechanical Engineering (LNME)

Abstract

Human speech, which is generated through the vibration of the vocal cords, is affected by the emotional state of the speaker. Accurate recognition of the emotions concealed in human speech is a significant factor in further improving the quality of Human–Computer Interaction (HCI). However, a satisfactory level of accuracy has not yet been achieved, mainly because there is no well-accepted standard feature set. Emotions are hard to distinguish from speech even for humans, which is why a standard feature set is difficult to extract. This paper presents a model that classifies emotions from speech signals with high accuracy compared to the present state of the art. The speech dataset used in this experiment consists of speech recordings that are specifically labeled with the different emotions of the speakers. A novel wavelet-based feature set is extracted from the speech signals, and a Neural Network (NN) with a single hidden layer is then trained on this feature set to classify the different emotions. The feature set is newly introduced and is tested here with an NN architecture for the first time; the classification results are also compared with those of other prominent classification techniques.
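The abstract does not specify the feature set or the network configuration, so the following is only a rough sketch of the general pipeline it outlines (wavelet features fed to a single-hidden-layer network), not the authors' method. It assumes a Haar wavelet, relative sub-band energies plus their Shannon entropy as features, a tanh hidden layer, and a softmax output; the function names, the three decomposition levels, the eight hidden units, and the four output classes are all hypothetical choices made for illustration. Training is omitted, so the weights are random.

```python
import math
import random

def haar_dwt(signal):
    """One level of the Haar DWT: returns (approximation, detail) coefficients."""
    if len(signal) % 2:                      # pad odd-length input with its last sample
        signal = signal + [signal[-1]]
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

def wavelet_features(signal, levels=3):
    """Relative energy of each sub-band, followed by the entropy of those energies.

    Assumed feature definition for illustration; the chapter's own set may differ.
    """
    bands, approx = [], list(signal)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        bands.append(detail)                 # detail band of this level
    bands.append(approx)                     # final approximation band
    energies = [sum(c * c for c in band) for band in bands]
    total = sum(energies) or 1.0             # avoid division by zero on silence
    rel = [e / total for e in energies]
    entropy = -sum(p * math.log(p) for p in rel if p > 0)
    return rel + [entropy]

def init_network(n_in, n_hidden, n_out, seed=0):
    """Random weights for one hidden layer (untrained, illustration only)."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
    return w1, [0.0] * n_hidden, w2, [0.0] * n_out

def forward(x, params):
    """Forward pass: tanh hidden layer, then softmax over the emotion classes."""
    w1, b1, w2, b2 = params
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    logits = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(w2, b2)]
    m = max(logits)                          # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Synthetic stand-in for a speech frame (hypothetical data, not from any dataset).
signal = [math.sin(0.05 * n) + 0.3 * math.sin(0.9 * n) for n in range(256)]
features = wavelet_features(signal, levels=3)    # 4 relative energies + 1 entropy
probs = forward(features, init_network(len(features), 8, 4))
```

With three levels the feature vector has five entries: the relative energies of three detail bands and one approximation band, which sum to one, and their entropy. In a real experiment the weights would be fitted to labeled recordings, for example by backpropagation, before the class probabilities carry any meaning.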

Author information

Authors and Affiliations

Electrical and Electronic Engineering, University of Johannesburg, Johannesburg, South Africa
Tanmoy Roy & Tshilidzi Marwala
Department of Mathematics, National Institute of Technology, Rourkela, India
S. Chakraverty

Corresponding author

Correspondence to Tanmoy Roy.

Editor information

Editors and Affiliations

Department of Mathematics, National Institute of Technology Rourkela, Rourkela, Odisha, India
S. Chakraverty
Vibration Research Group, Von Karman Society, Jalpaiguri, India
Paritosh Biswas

About this paper

Cite this paper

Roy, T., Marwala, T., Chakraverty, S. (2020). Speech Emotion Recognition Using Neural Network and Wavelet Features. In: Chakraverty, S., Biswas, P. (eds) Recent Trends in Wave Mechanics and Vibrations. Lecture Notes in Mechanical Engineering. Springer, Singapore. https://doi.org/10.1007/978-981-15-0287-3_30

DOI: https://doi.org/10.1007/978-981-15-0287-3_30
Published: 13 November 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-0286-6
Online ISBN: 978-981-15-0287-3
eBook Packages: Engineering, Engineering (R0)
