
Cross Languages One-Versus-All Speech Emotion Classifier

  • Conference paper
  • First Online:
Neural Computing for Advanced Applications (NCAA 2021)

Abstract

Speech emotion recognition (SER) cannot be accomplished by relying on linguistic models alone, owing to the presence of figures of speech. For more accurate prediction of emotions, researchers have adopted acoustic modelling. The complexity of SER stems from, among other factors, the wide variety of acoustic features and the similarity among certain emotions. In this paper, we propose a framework named Cross Languages One-Versus-All Speech Emotion Classifier (CLOVASEC) that identifies the emotion of speech in both Chinese and English. Acoustic features are preprocessed with the Synthetic Minority Oversampling Technique (SMOTE) to diminish the impact of class imbalance, and then with principal component analysis (PCA) to reduce their dimensionality. The resulting features are fed into a classifier composed of eight sub-classifiers, each trained to distinguish one emotion class from the other seven. The framework significantly outperformed regular classifiers on the Chinese Natural Audio-Visual Emotion Database (CHEAVD) and on an English dataset from Deng et al.
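The abstract describes a three-stage pipeline: SMOTE to rebalance the training data, PCA to reduce the dimensionality of the acoustic features, and a one-versus-all ensemble of eight binary sub-classifiers. The snippet below is a minimal sketch of that setup, not the authors' implementation; the scikit-learn/imbalanced-learn APIs, the SVM base learner, and all hyperparameters are illustrative assumptions.

```python
# Sketch of a CLOVASEC-style pipeline (assumed, not the paper's code):
# SMOTE rebalances the emotion classes, PCA reduces feature dimensionality,
# and a one-versus-all wrapper trains eight binary sub-classifiers,
# one per emotion class.
from imblearn.over_sampling import SMOTE
from sklearn.decomposition import PCA
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC


def train_ova_emotion_classifier(X, y, n_components=50):
    """X: (n_samples, n_acoustic_features) array, y: integer labels for 8 emotions."""
    # Oversample minority emotions so every class is equally represented.
    X_bal, y_bal = SMOTE(random_state=0).fit_resample(X, y)

    # Project the (rebalanced) acoustic features onto the leading principal components.
    pca = PCA(n_components=n_components).fit(X_bal)
    X_red = pca.transform(X_bal)

    # One-versus-all: one binary sub-classifier per emotion class.
    # The RBF-kernel SVM base learner is an assumption for illustration.
    clf = OneVsRestClassifier(SVC(kernel="rbf", probability=True))
    clf.fit(X_red, y_bal)
    return pca, clf


def predict_emotion(pca, clf, X_new):
    """Apply the fitted PCA projection, then pick the highest-scoring emotion."""
    return clf.predict(pca.transform(X_new))
```

The choice of base learner is incidental here; the point is only that each of the eight binary models scores "its" emotion against the remaining seven, and the class with the highest score is returned.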


References

  1. Abdi, H., Williams, L.J.: Principal component analysis. Wiley Interdiscip. Rev. Comput. Stat. 2(4), 433–459 (2010). https://doi.org/10.1002/wics.101


  2. Akçay, M.B., Oğuz, K.: Speech emotion recognition: emotional models, databases, features, preprocessing methods, supporting modalities, and classifiers. Speech Commun. 116, 56–76 (2020). https://doi.org/10.1016/j.specom.2019.12.001


  3. Badshah, A.M., Ahmad, J., Rahim, N., Baik, S.W.: Speech emotion recognition from spectrograms with deep convolutional neural network. In: 2017 International Conference on Platform Technology and Service, PlatCon 2017 - Proceedings, pp. 3–7 (2017). https://doi.org/10.1109/PlatCon.2017.7883728

  4. Bong, S.Z., Wan, K., Murugappan, M., Ibrahim, N.M., Rajamanickam, Y., Mohamad, K.: Implementation of wavelet packet transform and non linear analysis for emotion classification in stroke patient using brain signals. Biomed. Signal Process. Control 36, 102–112 (2017). https://doi.org/10.1016/j.bspc.2017.03.016


  5. Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16(2), 321–357 (2002). https://doi.org/10.1613/jair.953

  6. Chen, M., Zhao, X.: A multi-scale fusion framework for bimodal speech emotion recognition. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2020-October, pp. 374–378 (2020). https://doi.org/10.21437/Interspeech.2020-3156

  7. Chiba, Y., Nose, T., Ito, A.: Multi-stream attention-based BLSTM with feature segmentation for speech emotion recognition. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2020-October, pp. 3301–3305 (2020). https://doi.org/10.21437/Interspeech.2020-1199

  8. Deng, J., Zhang, Z., Marchi, E., Schuller, B.: Sparse autoencoder-based feature transfer learning for speech emotion recognition. In: Proceedings - 2013 Humaine Association Conference on Affective Computing and Intelligent Interaction, ACII 2013, pp. 511–516 (2013). https://doi.org/10.1109/ACII.2013.90

  9. El Ayadi, M., Kamel, M.S., Karray, F.: Survey on speech emotion recognition: features, classification schemes, and databases. Pattern Recogn. 44(3), 572–587 (2011). https://doi.org/10.1016/j.patcog.2010.09.020


  10. Feng, H., Ueno, S., Kawahara, T.: End-to-end speech emotion recognition combined with acoustic-to-word ASR model. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2020-October, pp. 501–505 (2020). https://doi.org/10.21437/Interspeech.2020-1180

  11. Fujioka, T., Homma, T., Nagamatsu, K.: Meta-learning for speech emotion recognition considering ambiguity of emotion labels. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2020-October, pp. 2332–2336 (2020). https://doi.org/10.21437/Interspeech.2020-1082

  12. Gunes, H., Piccardi, M.: Bi-modal emotion recognition from expressive face and body gestures. J. Netw. Comput. Appl. 30(4), 1334–1345 (2007). https://doi.org/10.1016/j.jnca.2006.09.007


  13. Ilin, A., Raiko, T.: Practical approaches to principal component analysis in the presence of missing values. J. Mach. Learn. Res. 11, 1957–2000 (2010)


  14. Issa, D., Demirci, M.F., Yazici, A.: Speech emotion recognition with deep convolutional neural networks. Biomed. Signal Process. Control 59, 101894 (2020). https://doi.org/10.1016/j.bspc.2020.101894


  15. Latif, S., Asim, M., Rana, R., Khalifa, S., Jurdak, R., Schuller, B.W.: Augmenting Generative Adversarial Networks for Speech Emotion Recognition. arXiv, pp. 521–525 (2020)


  16. Li, Y., Tao, J., Chao, L., Bao, W., Liu, Y.: CHEAVD: a Chinese natural emotional audio-visual database. J. Ambient Intell. Humanized Comput. 8(6), 913–924 (2017). https://doi.org/10.1007/s12652-016-0406-z


  17. Li, Y., Tao, J., Jiang, D., Shan, S., Jia, J.: MEC 2017: Multimodal Emotion Recognition Challenge (2018)

  18. Lim, W., Jang, D., Lee, T.: Speech emotion recognition using convolutional and recurrent neural networks. In: 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016, pp. 3–6 (2017). https://doi.org/10.1109/APSIPA.2016.7820699

  19. Mayer, J.D.: Emotional intelligence. Imagination Cogn. Pers. 9(3), 185–211 (1989). https://doi.org/10.2190/DUGG-P24E-52WK-6CDG


  20. Nardelli, M., Valenza, G., Greco, A., Lanata, A., Scilingo, E.P.: Recognizing emotions induced by affective sounds through heart rate variability. IEEE Trans. Affect. Comput. 6(4), 385–394 (2015). https://doi.org/10.1109/TAFFC.2015.2432810


  21. Niu, Y., Zou, D., Niu, Y., He, Z., Tan, H.: Improvement on speech emotion recognition based on deep convolutional neural networks (2018). https://doi.org/10.1145/3194452.3194460

  22. Nwe, T.L., Foo, S.W., De Silva, L.C.: Speech emotion recognition using hidden Markov models. Speech Commun. 41(4), 603–623 (2003). https://doi.org/10.1016/S0167-6393(03)00099-2


  23. Schuller, B., Rigoll, G., Lang, M.: Hidden Markov model-based speech emotion recognition. In: Proceedings - IEEE International Conference on Multimedia and Expo 1, pp. I401–I404 (2003). https://doi.org/10.1109/ICME.2003.1220939

  24. Shen, G., et al.: WISE: word-level interaction-based multimodal fusion for speech emotion recognition. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2020-October, pp. 369–373 (2020). https://doi.org/10.21437/Interspeech.2020-3131

  25. Su, B.H., Chang, C.M., Lin, Y.S., Lee, C.C.: Improving speech emotion recognition using graph attentive Bi-directional gated recurrent unit network. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2020-October, pp. 506–510 (2020). https://doi.org/10.21437/Interspeech

  26. Sun, Y., Wen, G., Wang, J.: Weighted spectral features based on local Hu moments for speech emotion recognition. Biomed. Signal Process. Control 18, 80–90 (2015). https://doi.org/10.1016/j.bspc.2014.10.008


  27. Giannakopoulos, T.: pyAudioAnalysis: a Python library for audio feature extraction, classification, segmentation and applications. https://github.com/tyiannak/pyAudioAnalysis

  28. Trigeorgis, G., et al.: Adieu features? End-to-end speech emotion recognition using a deep convolutional recurrent network. In: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings 2016-May, pp. 5200–5204 (2016). https://doi.org/10.1109/ICASSP.2016.7472669

  29. Yuvaraj, R., et al.: Detection of emotions in Parkinson’s disease using higher order spectral features from brain’s electrical activity. Biomed. Signal Process. Control 14(1), 108–116 (2014). https://doi.org/10.1016/j.bspc.2014.07.005


  30. Zhang, X., Xu, M., Zheng, T.F.: Ensemble system for multimodal emotion recognition challenge. In: 2018 1st Asian Conference on Affective Computing and Intelligent Interaction, ACII Asia 2018 (MEC 2017), pp. 7–12 (2018). https://doi.org/10.1109/ACIIAsia.2018.8470352

  31. Zhu, T., Lin, Y., Liu, Y.: Synthetic minority oversampling technique for multiclass imbalance problems. Pattern Recogn. 72, 327–340 (2017). https://doi.org/10.1016/j.patcog.2017.07.024


Download references

Acknowledgement

This work was supported by the Six-Talent Peaks Project of Jiangsu Province (XYDXX-204), the Open Fund of the Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources (KF-2019-04-011, KF-2019-04-065), and the Angel Project of Suzhou City Science and Technology (Grant No. CYTS2018233).

Author information


Corresponding author

Correspondence to Huakang Li.



Copyright information

© 2021 Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Liu, X., Bin, J., Li, H. (2021). Cross Languages One-Versus-All Speech Emotion Classifier. In: Zhang, H., Yang, Z., Zhang, Z., Wu, Z., Hao, T. (eds) Neural Computing for Advanced Applications. NCAA 2021. Communications in Computer and Information Science, vol 1449. Springer, Singapore. https://doi.org/10.1007/978-981-16-5188-5_15

Download citation

  • DOI: https://doi.org/10.1007/978-981-16-5188-5_15

  • Published:

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-16-5187-8

  • Online ISBN: 978-981-16-5188-5

  • eBook Packages: Computer Science, Computer Science (R0)
