Abstract
Robots that react inaccurately to human emotions have long been a concern. As technology has advanced, robots such as service robots now interact with speakers of many different languages. The traditional Speech Emotion Recognition (SER) approach trains and tests the classifier on the same corpus, which yields accurate recognition but does not generalize to multi-lingual (multi-language) settings, an essential requirement for robots used worldwide. This research proposes an ensemble learning method that combines HMLSTM and CapsNet through majority voting for a cross-corpus, multi-lingual SER system. Three corpora (EMO-DB, URDU, and SAVEE) covering three languages (German, Urdu, and English) are used to evaluate multi-language SER. Features are first extracted with the Refined Attention Pyramid Network (RAPNet). In the pre-processing step, the data are normalized using min–max normalization, and IGAN is applied to address data imbalance. The HMLSTM and CapsNet ensemble then classifies the emotion in each speech sample into its appropriate category. The proposed ensemble learning approach improves emotion recognition with reasonable accuracy, and its effectiveness is compared against existing traditional learning methods. To test multi-lingual emotion identification, the study evaluates each classifier on a corpus different from the one it was trained on; in these experiments, different classifiers achieve high accuracy on different corpora.
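The two building blocks named in the abstract, min–max normalization of features and a majority vote over the two base classifiers, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the arrays standing in for HMLSTM and CapsNet predictions are hypothetical, and with only two voters a tie falls back to the numerically smaller label (real systems typically use an odd number of voters or weighted/soft voting).

```python
import numpy as np

def min_max_normalize(x, eps=1e-8):
    """Scale each feature column to the [0, 1] range (min-max normalization)."""
    x = np.asarray(x, dtype=float)
    mn, mx = x.min(axis=0), x.max(axis=0)
    return (x - mn) / (mx - mn + eps)

def majority_vote(predictions):
    """Combine per-classifier label predictions by majority vote.

    predictions: list of 1-D integer label arrays, one per classifier,
    all of the same length. Ties resolve to the smallest label.
    """
    stacked = np.stack(predictions)          # (n_classifiers, n_samples)
    voted = []
    for sample_votes in stacked.T:           # iterate over samples
        labels, counts = np.unique(sample_votes, return_counts=True)
        voted.append(labels[np.argmax(counts)])
    return np.array(voted)

# Toy predictions standing in for the two base classifiers
# (labels here are arbitrary emotion-class indices).
hmlstm_preds = np.array([0, 1, 2, 1])
capsnet_preds = np.array([0, 1, 1, 1])
print(majority_vote([hmlstm_preds, capsnet_preds]))  # [0 1 1 1]
```

In a cross-corpus setup, the base classifiers would be trained on one corpus (e.g., EMO-DB) and the vote taken over their predictions on a different corpus (e.g., SAVEE), with features normalized per corpus before classification.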
Data availability
Data will be made available on request.
References
Kwon, S.: MLT-DNet: speech emotion recognition using 1D dilated CNN based on multi-learning trick approach. Expert Syst. Appl. 167, 114177 (2021)
Zhang, S., Tao, X., Chuang, Y., Zhao, X.: Learning deep multimodal affective features for spontaneous speech emotion recognition. Speech Commun. 127, 73–81 (2021)
Kwon, S.: Optimal feature selection based speech emotion recognition using two-stream deep convolutional neural network. Int. J. Intell. Syst. 36(9), 5116–5135 (2021)
Meena, G., Mohbey, K.K., Kumar, S., Lokesh, K.: A hybrid deep learning approach for detecting sentiment polarities and knowledge graph representation on monkeypox tweets. Decis. Anal. J. 7, 100243 (2023)
Tuncer, T., Dogan, S., Acharya, U.R.: Automated accurate speech emotion recognition system using twine shuffle pattern and iterative neighborhood component analysis techniques. Knowl. Syst. 211, 106547 (2021)
Zhao, Z., Li, Q., Zhang, Z., Cummins, N., Wang, H., Tao, J., Schuller, B.W.: Combining a parallel 2D CNN with a self-attention dilated residual network for CTC-based discrete speech emotion recognition. Neural Netw. 141, 52–60 (2021)
Mohbey, K.K., Meena, G., Kumar, S., Lokesh, K.: A CNN-LSTM-based hybrid deep learning approach for sentiment analysis on Monkeypox tweets. New Gener. Comput. 14, 1–19 (2023)
Yildirim, S., Kaya, Y., Kılıç, F.: A modified feature selection method based on metaheuristic algorithms for speech emotion recognition. Appl. Acoust. 173, 107721 (2021)
Li, S., Xing, X., Fan, W., Cai, B., Fordson, P., Xu, X.: Spatiotemporal and frequential cascaded attention networks for speech emotion recognition. Neurocomputing 448, 238–248 (2021)
Liu, Z.T., Rehman, A., Wu, M., Cao, W.H., Hao, M.: Speech emotion recognition based on formant characteristics feature extraction and phoneme type convergence. Inf. Sci. 563, 309–325 (2021)
Abdulmohsin, H.A.: A new proposed statistical feature extraction method in speech emotion recognition. Comput. Electr. Eng. 93, 107172 (2021)
Hansen, L., Zhang, Y.P., Wolf, D., Sechidis, K., Ladegaard, N., Fusaroli, R.: A generalizable speech emotion recognition model reveals depression and remission. Acta Psychiatr. Scand. 145(2), 186–199 (2022)
Fu, C., Dissanayake, T., Hosoda, K., Maekawa, T., Ishiguro, H.: Similarity of speech emotion in different languages revealed by a neural network with attention. In: 2020 IEEE 14th International Conference on Semantic Computing (ICSC), pp. 381–386. IEEE (2020)
Kumaran, U., Radha Rammohan, S., Nagarajan, S.M., Prathik, A.: Fusion of mel and gammatone frequency cepstral coefficients for speech emotion recognition using deep C-RNN. Int. J. Speech Technol. 24, 303–314 (2021)
Senthilkumar, N., Karpakam, S., Devi, M.G., Balakumaresan, R., Dhilipkumar, P.: Speech emotion recognition based on Bi-directional LSTM architecture and deep belief networks. Mater. Today Proc. 57, 2180–2184 (2022)
Qadri, S.A.A., Gunawan, T.S., Kartiwi, M., Mansor, H., Wani, T.M.: Speech emotion recognition using feature fusion of TEO and MFCC on multilingual databases. In: Recent Trends in Mechatronics Towards Industry 4.0: Selected Articles from iM3F 2020, Malaysia, pp. 681–691. Springer, Singapore (2022)
Ma, Y., Wang, W.: MSFL: explainable multitask-based shared feature learning for multilingual speech emotion recognition. Appl. Sci. 12(24), 12805 (2022)
Alsabhan, W.: Human-computer interaction with a real-time speech emotion recognition with ensembling techniques 1D convolution neural network and attention. Sensors 23(3), 1386 (2023)
Gomathy, M.: Optimal feature selection for speech emotion recognition using enhanced cat swarm optimization algorithm. Int. J. Speech Technol. 24(1), 155–163 (2021)
Ahmed, M.R., Islam, S., Islam, A.M., Shatabda, S.: An ensemble 1D-CNN-LSTM-GRU model with data augmentation for speech emotion recognition. Expert Syst. Appl. 218, 119633 (2023)
Pham, N.T., Dang, D.N., Nguyen, N.D., Nguyen, T.T., Nguyen, H., Manavalan, B., Lim, C.P., Nguyen, S.D.: Hybrid data augmentation and deep attention-based dilated convolutional-recurrent neural networks for speech emotion recognition. Expert Syst. Appl. 230, 120608 (2023)
Chen, W., Hu, H.: Generative attention adversarial classification network for unsupervised domain adaptation. Pattern Recogn. 107, 107440 (2020)
Kanna, P.R., Santhi, P.: Unified deep learning approach for efficient intrusion detection system using integrated spatial–temporal features. Knowl. Syst. 226, 107132 (2021)
Wang, Z., Zheng, L., Du, W., Cai, W., Zhou, J., Wang, J., He, G.: A novel method for intelligent fault diagnosis of bearing based on capsule neural network. Complexity 2019(2019), 1 (2019)
SAVEE dataset: https://www.kaggle.com/datasets/ejlok1/surrey-audiovisual-expressed-emotion-savee
EMO-DB dataset: https://www.kaggle.com/datasets/piyushagni5/berlin-database-of-emotional-speech-emodb
URDU dataset: https://www.kaggle.com/datasets/hazrat/urdu-speech-dataset?select=files
Al-onazi, B.B., Nauman, M.A., Jahangir, R., Malik, M.M., Alkhammash, E.H., Elshewey, A.M.: Transformer-based multilingual speech emotion recognition using data augmentation and feature fusion. Appl. Sci. 12(18), 9188 (2022)
Khan, A.: Improved multi-lingual sentiment analysis and recognition using deep learning. J. Inform. Sci. 12, 01655515221137270 (2023)
Funding
No funding was received to assist with the preparation of this manuscript.
Author information
Authors and Affiliations
Contributions
The contributions of the authors are as follows: Anumula Sruthi, Anumula Kalyan Kumar, Kishore Dasari, and Yenugu Sivaramaiah contributed to conceptualization, methodology, software, formal analysis, investigation, resources, writing—original draft, review & editing, and visualization. Garikapati Divya and G. Sai Chaitanya Kumar contributed to conceptualization and writing—review & editing.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Sruthi, A., Kumar, A.K., Dasari, K. et al. Multi-language: ensemble learning-based speech emotion recognition. Int J Data Sci Anal (2024). https://doi.org/10.1007/s41060-024-00553-6