Abstract
Machine learning techniques are among the most effective approaches to Voice and Emotion Recognition (VER), and the automatic recognition of voice and emotion is essential for smooth psychosocial interaction between humans and machines. VER research has made substantial progress by combining spectrogram representations with deep learning features. However, although single machine learning (ML) methods deliver acceptable results, they do not yet meet the required standards. This motivates strategies that combine several ML techniques and target multiple aspects of voice recognition. This article proposes an ensemble classifier that combines the outputs of two base classifiers, a Capsule Network (CapsNet) and a Recurrent Neural Network (RNN), for VER. The CapsNet model captures the spatial correlations of vital speech information in spectrograms, while the RNN excels at processing time-series data; both are well known for their classification performance. Stacked generalization is used to construct the ensemble, integrating the predictions of the CapsNet and RNN classifiers. The ensemble achieves an overall accuracy of 96.05%, outperforming either CapsNet or RNN used individually. A notable strength of the proposed classifier is its detection of the emotional class 'FEAR', with a recognition rate of 96.68% among the eight classes considered.
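The stacking strategy described above can be sketched in a few lines. In this minimal, illustrative example the deep CapsNet and RNN base learners are replaced by two lightweight scikit-learn stand-ins (an MLP and a decision tree), and the emotion dataset by synthetic 8-class data; only the stacked-generalization mechanics — base classifiers whose out-of-fold predicted probabilities feed a meta-learner — match the paper's approach. All model choices here are assumptions for demonstration, not the authors' implementation.

```python
# Sketch of stacked generalization (stacking): base classifiers' out-of-fold
# predicted probabilities become input features for a meta-learner.
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic 8-class data standing in for spectrogram-derived emotion features.
X, y = make_classification(n_samples=800, n_features=20, n_informative=10,
                           n_classes=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# The two base estimators play the roles of CapsNet and RNN (stand-ins only).
stack = StackingClassifier(
    estimators=[("capsnet_standin", MLPClassifier(max_iter=500, random_state=0)),
                ("rnn_standin", DecisionTreeClassifier(random_state=0))],
    final_estimator=LogisticRegression(max_iter=1000),  # meta-learner
    stack_method="predict_proba",  # stack class probabilities, not labels
    cv=5)                          # out-of-fold predictions avoid leakage
stack.fit(X_tr, y_tr)
print(f"stacked test accuracy: {stack.score(X_te, y_te):.3f}")
```

In practice the base models would be the trained CapsNet and RNN, and their class-probability outputs on held-out folds would form the meta-learner's training set.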
Ethics declarations
Conflict of interest
The author declares that he has no conflict of interest.
Funding statement
No funding was provided for the completion of this research paper.
Cite this article
Alharbi, Y. Effective ensembling classification strategy for voice and emotion recognition. Int J Syst Assur Eng Manag 15, 334–345 (2024). https://doi.org/10.1007/s13198-022-01729-8