
Multimodal Emotion Distribution Learning

Cognitive Computation

Abstract

Background

Emotion recognition is a challenging problem that has attracted considerable attention in recent years. To express emotions more accurately, emotion distribution learning (EDL) introduces the emotion description degree, which quantifies how strongly each basic emotion is expressed in a sample; together, the degrees form a fine-grained emotion distribution that describes a mixture of multiple basic emotions at different levels.
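As a concrete illustration (ours, not from the paper), the following is a minimal Python sketch of the emotion-distribution representation, assuming a six-emotion label space; the emotion set and the raw scores are purely hypothetical.

```python
import numpy as np

# Hypothetical label space; the actual set of basic emotions is dataset-dependent.
EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise"]

def to_distribution(scores):
    """Turn raw per-emotion scores into an emotion distribution:
    each description degree lies in [0, 1] and the degrees sum to 1."""
    scores = np.clip(np.asarray(scores, dtype=float), 0.0, None)
    return scores / scores.sum()

# A sample that is mostly happy, with some surprise mixed in.
d = to_distribution([0.1, 0.0, 0.0, 3.0, 0.2, 1.0])
print(dict(zip(EMOTIONS, d.round(3))))
```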

Challenge

Existing EDL research has shown a strong representation ability for emotion recognition, but all existing studies are based on unimodal information, so their results may be one-sided.

Method

As the first investigation of multimodal emotion distribution learning, we present a corresponding learning method named MEDL. First, for each modality, we learn an emotion distribution and obtain the corresponding label correlation matrix. Second, we constrain the label correlation matrices of the different modalities to be consistent with one another, which exploits the complementarity between modalities. Finally, the final emotion distribution is obtained through a simple decision fusion strategy.
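The paper does not provide reference code, so the following is only a rough, hypothetical Python sketch of the three steps at prediction time: label correlation is estimated here via Pearson correlation (the paper's exact definition may differ), cross-modal consistency is measured by a squared Frobenius distance (in MEDL this acts as a training constraint; here it is merely computed), and decision fusion is a weighted average of per-modality distributions. All function names and weights are our assumptions, not the authors' implementation.

```python
import numpy as np

def label_correlation(D):
    """Label correlation matrix from an (n_samples, n_labels) matrix of
    predicted emotion distributions; Pearson correlation is one common
    choice, assumed here for illustration."""
    return np.corrcoef(D, rowvar=False)

def consistency_penalty(C_a, C_b):
    """Squared Frobenius distance between two modalities' label
    correlation matrices; MEDL constrains such disagreement so the
    modalities agree on how emotions co-occur."""
    return np.linalg.norm(C_a - C_b, ord="fro") ** 2

def decision_fusion(dists, weights=None):
    """Simple decision-level fusion: a (weighted) average of per-modality
    emotion distributions, renormalized to sum to 1."""
    dists = np.stack(dists)                  # (n_modalities, n_labels)
    if weights is None:
        weights = np.full(len(dists), 1.0 / len(dists))
    fused = np.tensordot(weights, dists, axes=1)
    return fused / fused.sum()

# Toy example with two modalities (e.g., audio and video) over 6 emotions;
# random Dirichlet draws stand in for learned per-modality distributions.
rng = np.random.default_rng(0)
audio = rng.dirichlet(np.ones(6), size=100)
video = rng.dirichlet(np.ones(6), size=100)

penalty = consistency_penalty(label_correlation(audio), label_correlation(video))
fused = decision_fusion([audio[0], video[0]])
print(f"consistency penalty: {penalty:.3f}, fused distribution: {fused.round(3)}")
```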

Results and Conclusions

The experimental results demonstrate that our method outperforms several state-of-the-art multimodal emotion recognition methods and unimodal emotion distribution learning methods.




Acknowledgements

This work is partially supported by the National Natural Science Foundation of China (Grant No. 62176123), the Natural Science Foundation of Jiangsu Province of China (Grant No. BK20191287), and the Fundamental Research Funds for the Central Universities (Grant No. 30920021131).

Author information

Corresponding author

Correspondence to Xiuyi Jia.

Ethics declarations

Ethics Approval and Consent to Participate

This article does not contain any studies with animals performed by any of the authors.

Competing Interests

The authors declare that there is no conflict of interest regarding the publication of this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Jia, X., Shen, X. Multimodal Emotion Distribution Learning. Cogn Comput 14, 2141–2152 (2022). https://doi.org/10.1007/s12559-021-09927-5

