Abstract
Speech emotion recognition (SER) is a task that cannot be accomplished by linguistic models alone, owing in part to figures of speech; for more accurate emotion prediction, researchers have adopted acoustic modelling. The complexity of SER stems from the variety of acoustic features, the similarity between certain emotions, and other factors. In this paper, we propose a framework named Cross Languages One-Versus-All Speech Emotion Classifier (CLOVASEC) that identifies the emotions of speech in both Chinese and English. Acoustic features are preprocessed first with the Synthetic Minority Oversampling Technique (SMOTE) to diminish the impact of class imbalance, then with principal component analysis (PCA) to reduce dimensionality. The resulting features are fed into a classifier composed of eight sub-classifiers, each tasked with distinguishing one class from the other seven. The framework significantly outperformed regular classifiers on the Chinese Natural Audio-Visual Emotion Database (CHEAVD) and on an English dataset from Deng.
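The SMOTE → PCA → one-versus-all pipeline described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses three toy classes instead of eight, a hand-rolled one-class SMOTE interpolation, and logistic regression as a placeholder base classifier.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.default_rng(0)

def simple_smote(X, y, target_class, n_new):
    """Minimal SMOTE for one minority class: synthesize samples by
    interpolating between a sample and a random same-class neighbour."""
    Xc = X[y == target_class]
    synth = []
    for _ in range(n_new):
        i, j = rng.choice(len(Xc), size=2, replace=False)
        lam = rng.random()
        synth.append(Xc[i] + lam * (Xc[j] - Xc[i]))
    return (np.vstack([X] + synth[:1] + synth[1:]),
            np.concatenate([y, [target_class] * n_new]))

# Imbalanced toy "acoustic feature" data: 3 emotion classes, class 2 rare
X, y = make_classification(n_samples=300, n_features=40, n_informative=10,
                           n_classes=3, weights=[0.7, 0.2, 0.1],
                           random_state=0)
X, y = simple_smote(X, y, target_class=2, n_new=150)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# PCA for dimensionality reduction, then a one-versus-all classifier:
# one binary sub-classifier per emotion class, as in CLOVASEC
pca = PCA(n_components=10).fit(X_tr)
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
clf.fit(pca.transform(X_tr), y_tr)
acc = clf.score(pca.transform(X_te), y_te)
```

After fitting, `clf.estimators_` holds one binary classifier per class; a real system would substitute genuine acoustic features and eight emotion classes.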
References
Abdi, H., Williams, L.J.: Principal component analysis. Wiley Interdiscip. Rev. Comput. Stat. 2(4), 433–459 (2010). https://doi.org/10.1002/wics.101
Akçay, M.B., Oğuz, K.: Speech emotion recognition: emotional models, databases, features, preprocessing methods, supporting modalities, and classifiers. Speech Commun. 116, 56–76 (2020). https://doi.org/10.1016/j.specom.2019.12.001
Badshah, A.M., Ahmad, J., Rahim, N., Baik, S.W.: Speech emotion recognition from spectrograms with deep convolutional neural network. In: 2017 International Conference on Platform Technology and Service, PlatCon 2017 - Proceedings, pp. 3–7 (2017). https://doi.org/10.1109/PlatCon.2017.7883728
Bong, S.Z., Wan, K., Murugappan, M., Ibrahim, N.M., Rajamanickam, Y., Mohamad, K.: Implementation of wavelet packet transform and non linear analysis for emotion classification in stroke patient using brain signals. Biomed. Signal Process. Control 36, 102–112 (2017). https://doi.org/10.1016/j.bspc.2017.03.016
Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16(2), 321–357 (2002). https://doi.org/10.1613/jair.953
Chen, M., Zhao, X.: A multi-scale fusion framework for bimodal speech emotion recognition. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2020-October, pp. 374–378 (2020). https://doi.org/10.21437/Interspeech.2020-3156
Chiba, Y., Nose, T., Ito, A.: Multi-stream attention-based BLSTM with feature segmentation for speech emotion recognition. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2020-October, pp. 3301–3305 (2020). https://doi.org/10.21437/Interspeech.2020-1199
Deng, J., Zhang, Z., Marchi, E., Schuller, B.: Sparse autoencoder-based feature transfer learning for speech emotion recognition. In: Proceedings - 2013 Humaine Association Conference on Affective Computing and Intelligent Interaction, ACII 2013, pp. 511–516 (2013). https://doi.org/10.1109/ACII.2013.90
El Ayadi, M., Kamel, M.S., Karray, F.: Survey on speech emotion recognition: features, classification schemes, and databases. Pattern Recogn. 44(3), 572–587 (2011). https://doi.org/10.1016/j.patcog.2010.09.020
Feng, H., Ueno, S., Kawahara, T.: End-to-end speech emotion recognition combined with acoustic-to-word ASR model. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2020-October, pp. 501–505 (2020). https://doi.org/10.21437/Interspeech.2020-1180
Fujioka, T., Homma, T., Nagamatsu, K.: Meta-learning for speech emotion recognition considering ambiguity of emotion labels. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2020-October, pp. 2332–2336 (2020). https://doi.org/10.21437/Interspeech.2020-1082
Gunes, H., Piccardi, M.: Bi-modal emotion recognition from expressive face and body gestures. J. Netw. Comput. Appl. 30(4), 1334–1345 (2007). https://doi.org/10.1016/j.jnca.2006.09.007
Ilin, A., Raiko, T.: Practical approaches to principal component analysis in the presence of missing values. J. Mach. Learn. Res. 11, 1957–2000 (2010)
Issa, D., Demirci, M.F., Yazici, A.: Speech emotion recognition with deep convolutional neural networks. Biomed. Signal Process. Control 59, 101894 (2020). https://doi.org/10.1016/j.bspc.2020.101894
Latif, S., Asim, M., Rana, R., Khalifa, S., Jurdak, R., Schuller, B.W.: Augmenting Generative Adversarial Networks for Speech Emotion Recognition. arXiv, pp. 521–525 (2020)
Li, Y., Tao, J., Chao, L., Bao, W., Liu, Y.: CHEAVD: a Chinese natural emotional audio-visual database. J. Ambient Intell. Humanized Comput. 8(6), 913–924 (2017). https://doi.org/10.1007/s12652-016-0406-z
Li, Y., Tao, J., Jiang, D., Shan, S., Jia, J.: MEC 2017: Multimodal Emotion Recognition Challenge (2018)
Lim, W., Jang, D., Lee, T.: Speech emotion recognition using convolutional and recurrent neural networks. In: 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016, pp. 3–6 (2017). https://doi.org/10.1109/APSIPA.2016.7820699
Mayer, J.D.: Emotional intelligence. Imagination Cogn. Pers. 9(3), 185–211 (1989). https://doi.org/10.2190/DUGG-P24E-52WK-6CDG
Nardelli, M., Valenza, G., Greco, A., Lanata, A., Scilingo, E.P.: Recognizing emotions induced by affective sounds through heart rate variability. IEEE Trans. Affect. Comput. 6(4), 385–394 (2015). https://doi.org/10.1109/TAFFC.2015.2432810
Niu, Y., Zou, D., Niu, Y., He, Z., Tan, H.: Improvement on speech emotion recognition based on deep convolutional neural networks (2018). https://doi.org/10.1145/3194452.3194460
Nwe, T.L., Foo, S.W., De Silva, L.C.: Speech emotion recognition using hidden Markov models. Speech Commun. 41(4), 603–623 (2003). https://doi.org/10.1016/S0167-6393(03)00099-2
Schuller, B., Rigoll, G., Lang, M.: Hidden Markov model-based speech emotion recognition. In: Proceedings - IEEE International Conference on Multimedia and Expo 1, pp. I401–I404 (2003). https://doi.org/10.1109/ICME.2003.1220939
Shen, G., et al.: WISE: word-level interaction-based multimodal fusion for speech emotion recognition. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2020-October, pp. 369–373 (2020). https://doi.org/10.21437/Interspeech.2020-3131
Su, B.H., Chang, C.M., Lin, Y.S., Lee, C.C.: Improving speech emotion recognition using graph attentive Bi-directional gated recurrent unit network. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2020-October, pp. 506–510 (2020). https://doi.org/10.21437/Interspeech
Sun, Y., Wen, G., Wang, J.: Weighted spectral features based on local Hu moments for speech emotion recognition. Biomed. Signal Process. Control 18, 80–90 (2015). https://doi.org/10.1016/j.bspc.2014.10.008
Giannakopoulos, T.: pyAudioAnalysis: a Python library for audio feature extraction, classification, segmentation and applications. https://github.com/tyiannak/pyAudioAnalysis
Trigeorgis, G., et al.: Adieu features? End-to-end speech emotion recognition using a deep convolutional recurrent network. In: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings 2016-May, pp. 5200–5204 (2016). https://doi.org/10.1109/ICASSP.2016.7472669
Yuvaraj, R., et al.: Detection of emotions in Parkinson’s disease using higher order spectral features from brain’s electrical activity. Biomed. Signal Process. Control 14(1), 108–116 (2014). https://doi.org/10.1016/j.bspc.2014.07.005
Zhang, X., Xu, M., Zheng, T.F.: Ensemble system for multimodal emotion recognition challenge. In: 2018 1st Asian Conference on Affective Computing and Intelligent Interaction, ACII Asia 2018 (MEC 2017), pp. 7–12 (2018). https://doi.org/10.1109/ACIIAsia.2018.8470352
Zhu, T., Lin, Y., Liu, Y.: Synthetic minority oversampling technique for multiclass imbalance problems. Pattern Recogn. 72, 327–340 (2017). https://doi.org/10.1016/j.patcog.2017.07.024
Acknowledgement
This work was supported by the Six-Talent Peaks Project of Jiangsu Province (XYDXX-204), the Open Fund of Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources (KF-2019-04-011, KF-2019-04-065), and Angel Project of Suzhou City science and technology (Grant No. CYTS2018233).
Copyright information
© 2021 Springer Nature Singapore Pte Ltd.
Cite this paper
Liu, X., Bin, J., Li, H. (2021). Cross Languages One-Versus-All Speech Emotion Classifier. In: Zhang, H., Yang, Z., Zhang, Z., Wu, Z., Hao, T. (eds) Neural Computing for Advanced Applications. NCAA 2021. Communications in Computer and Information Science, vol 1449. Springer, Singapore. https://doi.org/10.1007/978-981-16-5188-5_15
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-5187-8
Online ISBN: 978-981-16-5188-5