Exploring Perception Uncertainty for Emotion Recognition in Dyadic Conversation and Music Listening

Abstract

Automatic emotion prediction is an active field of research in affective computing. Because emotion perception is inherently subjective, the label of an emotional instance is usually derived from the opinions of multiple annotators. The labelled instance is therefore often accompanied by the corresponding inter-rater disagreement, which we call here the perception uncertainty. As shown in previous studies, such uncertainty can provide supplementary information that improves recognition performance in this subjective task. In this paper, we propose a multi-task learning framework that leverages perception uncertainty to improve prediction performance. In contrast to a conventional multi-task learning framework, which merely estimates the emotional state and the perception uncertainty simultaneously, our novel framework exploits the perception uncertainty explicitly to manipulate an initial prediction dynamically. To evaluate the feasibility and effectiveness of the proposed method, we perform extensive experiments on time- and value-continuous emotion prediction in audiovisual conversation and music listening scenarios. Compared with other state-of-the-art approaches, our approach yields notable performance improvements on both datasets. The obtained results indicate that integrating perception uncertainty information can enhance the learning process.
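
The mechanism the abstract describes, predicting emotion and perception uncertainty jointly and then using the predicted uncertainty explicitly to adjust an initial prediction, can be illustrated in code. The following is a minimal PyTorch sketch, not the authors' implementation: the GRU encoder, the Softplus uncertainty head, the residual correction module, and the loss weight `alpha` are all illustrative assumptions; only the overall idea follows the abstract.

```python
# Minimal sketch (not the paper's exact model) of uncertainty-aware
# multi-task emotion prediction: a shared encoder feeds (1) an initial
# emotion head and (2) a perception-uncertainty head, and the predicted
# uncertainty is then used explicitly to correct the initial prediction.
import torch
import torch.nn as nn


class UncertaintyAwareEmotionModel(nn.Module):
    def __init__(self, n_features: int, hidden: int = 64):
        super().__init__()
        # Shared recurrent encoder over the frame-level feature sequence.
        self.encoder = nn.GRU(n_features, hidden, batch_first=True)
        # Task 1: initial time-continuous emotion estimate (e.g. arousal).
        self.emotion_head = nn.Linear(hidden, 1)
        # Task 2: perception uncertainty, kept non-negative via Softplus
        # (e.g. trained against the inter-rater standard deviation).
        self.uncertainty_head = nn.Sequential(nn.Linear(hidden, 1), nn.Softplus())
        # Correction module: maps (initial estimate, uncertainty) to a
        # residual adjustment, so the uncertainty manipulates the initial
        # prediction explicitly rather than being only a side output.
        self.correction = nn.Sequential(
            nn.Linear(2, hidden), nn.Tanh(), nn.Linear(hidden, 1)
        )

    def forward(self, x):
        h, _ = self.encoder(x)       # (batch, time, hidden)
        y0 = self.emotion_head(h)    # initial emotion estimate
        u = self.uncertainty_head(h) # predicted perception uncertainty
        # Refine the initial estimate using the predicted uncertainty.
        y = y0 + self.correction(torch.cat([y0, u], dim=-1))
        return y, y0, u


def multitask_loss(y, u, y_true, u_true, alpha: float = 0.5):
    """Joint objective: fit the refined emotion trace to the gold standard
    and the uncertainty output to the annotator disagreement; `alpha`
    weights the auxiliary task (an illustrative choice)."""
    mse = nn.functional.mse_loss
    return mse(y, y_true) + alpha * mse(u, u_true)


if __name__ == "__main__":
    model = UncertaintyAwareEmotionModel(n_features=88)  # e.g. an eGeMAPS-sized input
    x = torch.randn(4, 100, 88)  # 4 sequences, 100 frames, 88 features each
    y, y0, u = model(x)
    print(y.shape, u.shape)      # torch.Size([4, 100, 1]) for both
```

In this sketch the uncertainty enters the prediction path as an input to a learned residual correction, one simple way to realise "manipulating an initial prediction dynamically"; the training target for the uncertainty head would come from the disagreement among the multiple annotators mentioned above.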

Funding

This study was partially supported by the Trans-Atlantic Platform “Digging into Data” collaboration grant (ACLEW: Analysing Child Language Experiences Around The World), with the support of the UK’s Economic and Social Research Council through research Grant No. HJ-253479, and by the European Union’s Horizon 2020 Research and Innovation Programme under Marie Skłodowska-Curie grant agreement No. 766287 (TAPAS).

Author information

Corresponding author

Correspondence to Jing Han.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Ethical Approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Han, J., Zhang, Z., Ren, Z. et al. Exploring Perception Uncertainty for Emotion Recognition in Dyadic Conversation and Music Listening. Cogn Comput 13, 231–240 (2021). https://doi.org/10.1007/s12559-019-09694-4
