Abstract
The textual and display-based control paradigm in human–computer interaction (HCI) has given way to more natural control modalities such as voice and gesture. Speech, in particular, carries a great deal of information, revealing the speaker's inner state and intention. While word-level analysis makes it possible to understand the speaker's request, other aspects of speech reveal the speaker's attitude, goal, and motivation. Recognizing emotion from speech has therefore become crucial for modern human–computer interface systems. Numerous techniques for sound analysis have been developed in the past. This work aims to detect human emotions from voice snippets; for this, the open-source English-language Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) and the Hindi-language IITKGP-SEHSC dataset are used. RAVDESS contains over 2000 voice samples recorded by 24 actors, covering eight emotions: anger, fear, neutral, calmness, happiness, sadness, disgust, and surprise. The proposed model uses an Adam-optimized deep learning model together with MFCC, chroma, and Mel-band spectral energy (MBSE) features to classify and recognize eight different human vocal emotions. A multilayer perceptron (MLP) classifier is used for classification. The efficiency of the proposed model was compared with other state-of-the-art approaches and the outcomes were assessed. Using the proposed model structure on the RAVDESS and IITKGP-SEHSC datasets, overall accuracies of 85.19% and 80%, respectively, were achieved.
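The classification stage described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature vectors here are synthetic stand-ins (in the paper each sample would be a vector of MFCC, chroma, and MBSE values extracted from an audio clip, e.g. with librosa), and the layer sizes and sample counts are assumptions chosen only to make the toy example learnable.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-ins for per-clip features: e.g. 40 MFCCs + 12 chroma
# bins + 128 Mel-band spectral energies = 180 values per sample.
n_samples, n_features, n_emotions = 400, 180, 8
X = rng.normal(size=(n_samples, n_features))
y = rng.integers(0, n_emotions, size=n_samples)
# Shift class means apart so the toy problem is actually learnable.
X += y[:, None] * 0.5

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0)

# MLP classifier trained with the Adam optimizer, mirroring the
# paper's general setup (hidden layer size is an assumption).
clf = MLPClassifier(hidden_layer_sizes=(300,), solver="adam",
                    max_iter=500, random_state=0)
clf.fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
print(f"held-out accuracy: {acc:.2f}")
```

On real data, `X` would be built by averaging frame-level MFCC, chroma, and Mel-spectrogram values per clip, and `y` would hold the eight emotion labels listed in the abstract.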
Data availability
The experiment uses RAVDESS, an open-source speech emotion dataset. This dataset can be shared upon request.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest. No funding was received to conduct this experiment and study.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Khurana, S., Dev, A. & Bansal, P. ADAM optimised human speech emotion recogniser based on statistical information distribution of chroma, MFCC, and MBSE features. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-19321-6