
Multi-language: ensemble learning-based speech emotion recognition

  • Regular Paper
  • Published in: International Journal of Data Science and Analytics (2024)

Abstract

Inaccurate emotional responses from robots have been a recurring problem reported in previous work. As technology has advanced, robots such as service robots increasingly communicate with people who speak many different languages. The traditional Speech Emotion Recognition (SER) approach trains and tests the classifier on the same corpus to identify emotions accurately; however, this approach does not transfer well to multi-lingual (multi-language) settings, which are essential for robots used worldwide. This research proposes an ensemble learning method (HMLSTM and CapsNet) that uses majority voting for a cross-corpus, multi-lingual SER system. Three corpora (EMO-DB, URDU, and SAVEE) covering three languages (German, Urdu, and English) are used to evaluate multi-language SER. First, the Refined Attention Pyramid Network (RAPNet) extracts the speech and emotion features. In the pre-processing step, the data are normalized with min–max normalization, and IGAN is applied to address data imbalance. The ensemble of HMLSTM and CapsNet then classifies the speech emotions into the appropriate categories. The proposed ensemble learning approach improves emotion recognition with reasonable accuracy, and its effectiveness is compared with existing traditional learning methods. The study also evaluates classifier performance for multi-lingual emotion identification by testing on a corpus different from the one used for training. In these experiments, different classifiers achieve excellent accuracy on the diverse corpora.
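As a rough illustration of the cross-corpus setup described above, the sketch below trains a hard majority-voting ensemble on features from one corpus and evaluates it on another. It is not the authors' implementation: the RAPNet features and IGAN balancing are omitted, and two off-the-shelf scikit-learn classifiers stand in for HMLSTM and CapsNet; only the min–max normalization and the majority vote follow the pipeline as described.

```python
# Minimal sketch (not the authors' implementation) of cross-corpus SER with
# min-max normalization and a hard majority-vote ensemble. Synthetic features
# stand in for RAPNet embeddings; MLP and SVM stand in for HMLSTM and CapsNet.
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.ensemble import VotingClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Placeholder features/labels for two corpora: train on one (e.g. EMO-DB-like)
# and test on another (e.g. URDU-like) to mimic the cross-corpus evaluation.
X_train, y_train = rng.normal(size=(400, 64)), rng.integers(0, 4, size=400)
X_test, y_test = rng.normal(size=(100, 64)), rng.integers(0, 4, size=100)

# Min-max normalization fitted on the training corpus only.
scaler = MinMaxScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Hard (majority) voting over two heterogeneous base classifiers.
ensemble = VotingClassifier(
    estimators=[("mlp", MLPClassifier(max_iter=300)), ("svm", SVC())],
    voting="hard",
)
ensemble.fit(X_train, y_train)
print("cross-corpus accuracy:", accuracy_score(y_test, ensemble.predict(X_test)))
```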


Data availability

Data will be made available upon request.


Funding

No funding was received to assist with the preparation of this manuscript.

Author information


Contributions

The authors' contributions are as follows: Anumula Sruthi, Anumula Kalyan Kumar, Kishore Dasari, and Yenugu Sivaramaiah contributed to conceptualization, methodology, software, formal analysis, investigation, resources, writing (original draft, review and editing), and visualization. Garikapati Divya and Gunupudi Sai Chaitanya Kumar contributed to conceptualization and writing (review and editing).

Corresponding author

Correspondence to Gunupudi Sai Chaitanya Kumar.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Sruthi, A., Kumar, A.K., Dasari, K. et al. Multi-language: ensemble learning-based speech emotion recognition. Int J Data Sci Anal (2024). https://doi.org/10.1007/s41060-024-00553-6


  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s41060-024-00553-6
