
Optimal trained ensemble of classification model for speech emotion recognition: Considering cross-lingual and multi-lingual scenarios

Published in: Multimedia Tools and Applications

Abstract

Speech plays a significant role in conveying emotional information, and speech emotion recognition (SER) has emerged as a crucial component of human–computer interfaces, with demanding real-time and accuracy requirements. This paper proposes a novel Improved Coot Optimization-based Ensemble Classification (ICO-EC) model for SER that follows three stages: preprocessing, feature extraction, and classification. In the preprocessing stage, the class imbalance problem is resolved using Improved SMOTE-ENC. In the feature extraction stage, IMFCC-based, chroma-based, ZCR-based, and spectral roll-off-based features are extracted. In the final classification stage, an ensemble model combines three classifiers: Deep Maxout, LSTM, and ICNN. The training process is made optimal via Improved Coot Optimization (ICO), which tunes the ensemble's weights. Finally, the performance of the developed model is validated against conventional methods on four different databases. In the cross-lingual setting, the proposed model achieves accuracies of 92.76% for Hindi, 92.95% for Kannada, 93.85% for Telugu, and 95.97% for Urdu. On the Hindi dataset, the ICO-EC model exceeded 93% accuracy, outperforming the other models.
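The paper does not publish code, but the feature extraction stage lends itself to a brief illustration. The sketch below shows baseline versions of the four feature families named in the abstract, computed with the open-source librosa library. It is an approximation under stated assumptions, not the authors' implementation: the paper's improved IMFCC variant is replaced by standard MFCCs, and the mean-pooling convention and parameter choices (sampling rate, number of coefficients) are illustrative defaults.

```python
# Illustrative sketch only: baseline versions of the four feature families
# used in the ICO-EC pipeline, extracted with librosa. The paper's improved
# IMFCC variant and the ICO-weighted ensemble are NOT reproduced here.
import numpy as np
import librosa

def extract_features(wav_path, sr=16000, n_mfcc=13):
    """Return one fixed-length feature vector per utterance by
    mean-pooling frame-level features over time (a common convention)."""
    y, sr = librosa.load(wav_path, sr=sr)

    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # cepstral features
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)          # chroma features
    zcr = librosa.feature.zero_crossing_rate(y)               # ZCR per frame
    rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr)    # spectral roll-off

    # Average each feature matrix across frames, then concatenate
    # into a single utterance-level vector.
    return np.concatenate([
        mfcc.mean(axis=1),
        chroma.mean(axis=1),
        zcr.mean(axis=1),
        rolloff.mean(axis=1),
    ])
```

In the full ICO-EC pipeline, vectors like these (with IMFCC in place of plain MFCC) would be fed to the Deep Maxout, LSTM, and ICNN classifiers, whose ensemble combination weights are tuned by Improved Coot Optimization.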


Data availability

The results were generated using speech emotion datasets for Hindi, Urdu, Kannada, and Telugu.

Abbreviations

SER:

Speech Emotion Recognition

ASR:

Automatic Speech Recognition

HMM:

Hidden Markov Models

DTW:

Dynamic Time Warping

MFCC:

Mel-frequency Cepstral Coefficients

NN:

Neural Network

ML:

Machine Learning

Taylor-DBN:

Taylor series-based Deep Belief Network

MKMFCC:

Multiple Kernel Mel Frequency Cepstral Coefficients

ECSO:

Enhanced Cat Swarm Optimization

OBL:

Opposition-Based Learning

DL:

Deep Learning

BDBN:

Bimodal Deep Belief Network

CNN:

Convolutional Neural Network

MEDC:

Mel Energy Spectrum Dynamic Coefficients

SVM:

Support Vector Machine

RNN:

Recurrent Neural Network


Author information


Corresponding author

Correspondence to Rupali Ramdas Kawade.

Ethics declarations

Informed consent

Not applicable.

Ethical approval

Not applicable.

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article


Cite this article

Kawade, R.R., Jagtap, S.K. Optimal trained ensemble of classification model for speech emotion recognition: Considering cross-lingual and multi-lingual scenarios. Multimed Tools Appl 83, 54331–54365 (2024). https://doi.org/10.1007/s11042-023-17097-9

