Long-Time Speech Emotion Recognition Using Feature Compensation and Accentuation-Based Fusion

Sun, Jiu; Zhu, Jinxin; Shao, Jun

doi:10.1007/s00034-023-02480-6

Published: 11 September 2023

Volume 43, pages 916–940 (2024)

Circuits, Systems, and Signal Processing

Jiu Sun¹,
Jinxin Zhu¹ &
Jun Shao¹

Abstract

In this paper, we study speech emotion feature optimization using stochastic optimization algorithms and feature compensation using deep neural networks, and we propose accentuation-based fusion for long-time speech emotion recognition. First, we study the extraction of emotional features and construct a set of speech features for emotion recognition. Second, we propose a sample adaptation method based on a denoising autoencoder, which maps sample features to make them more general and more adaptive. Third, a genetic algorithm (GA) and the shuffled frog leaping algorithm (SFLA) are used to optimize the combination of features and improve emotion recognition at the utterance level. Finally, we use a transformer model to implement accentuation-based emotion fusion in long-time speech. A continuous long-time speech corpus and the publicly available EMO-DB are used for the experiments. The results show that the proposed method can effectively improve the performance of long-time speech emotion recognition.
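
The abstract does not give the GA or SFLA configuration, the fitness function, or the feature set, so the following is only a generic sketch of how a genetic algorithm can search for a feature combination. Each candidate is a binary mask over the features. The function name `evolve_feature_mask`, the toy fitness `toy_fitness`, the set `INFORMATIVE`, and all hyperparameters are hypothetical illustrations, not the authors' method. In practice the fitness would be something like the validation accuracy of an utterance-level classifier trained on the selected features.

```python
import random


def evolve_feature_mask(fitness, n_features, pop_size=20, generations=30,
                        mutation_rate=0.05, seed=0):
    """Search for a 0/1 feature mask that maximizes `fitness` with a basic GA.

    Hypothetical helper for illustration; requires n_features >= 2.
    """
    rng = random.Random(seed)  # local generator keeps runs reproducible
    # Each individual is a list of 0/1 flags, one flag per candidate feature.
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)

    def tournament():
        # Binary tournament selection: the fitter of two random individuals.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: the best mask survives unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randint(1, n_features - 1)  # single-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation: toggle each flag with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = children
        best = max(population, key=fitness)
    return best


# Toy stand-in for classifier accuracy (an assumption for this sketch):
# features 0, 3 and 5 are "informative", every other selected feature
# costs a small penalty.
INFORMATIVE = {0, 3, 5}


def toy_fitness(mask):
    hits = sum(mask[i] for i in INFORMATIVE)
    extras = sum(mask) - hits
    return hits - 0.1 * extras


best_mask = evolve_feature_mask(toy_fitness, n_features=10)
print(best_mask, toy_fitness(best_mask))
```

Because the best mask is copied into every new generation, the best fitness never decreases from one generation to the next. SFLA follows the same evaluate-and-update pattern but organizes candidates into groups (memeplexes) that are periodically shuffled; it is not shown here.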

Data Availability

EMO-DB is a freely available German emotional speech database. It can be downloaded from https://www.kaggle.com/datasets/piyushagni5/berlin-database-of-emotional-speech-emodb.

Author information

Authors and Affiliations

School of Information Technology, Yancheng Institute of Technology, Yancheng, 224051, China
Jiu Sun, Jinxin Zhu & Jun Shao

Corresponding author

Correspondence to Jiu Sun.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest related to this work.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Sun, J., Zhu, J. & Shao, J. Long-Time Speech Emotion Recognition Using Feature Compensation and Accentuation-Based Fusion. Circuits Syst Signal Process 43, 916–940 (2024). https://doi.org/10.1007/s00034-023-02480-6

Received: 02 June 2023
Revised: 26 July 2023
Accepted: 27 July 2023
Published: 11 September 2023
Issue Date: February 2024
DOI: https://doi.org/10.1007/s00034-023-02480-6
