Abstract
Automatic emotion prediction is an active research field in affective computing. Because emotion perception is inherently subjective, the label of an emotional instance is usually derived from the opinions of multiple annotators; the labelled instance is therefore often accompanied by inter-rater disagreement information, which we refer to here as the perception uncertainty. As previous studies have shown, such uncertainty information can provide supplementary cues that improve recognition performance on this subjective task. In this paper, we propose a multi-task learning framework that leverages knowledge of the perception uncertainty to improve prediction performance. In particular, rather than merely estimating the emotional state and the perception uncertainty simultaneously, as in a conventional multi-task learning framework, our novel framework exploits the perception uncertainty explicitly to adjust an initial prediction dynamically. To evaluate the feasibility and effectiveness of the proposed method, we perform extensive experiments on time- and value-continuous emotion prediction in audiovisual conversation and music listening scenarios. Compared with other state-of-the-art approaches, our approach yields notable performance improvements on both datasets. The obtained results indicate that integrating perception uncertainty information can enhance the learning process.
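The core idea of the abstract can be illustrated with a minimal sketch: a shared representation feeds two task heads, one producing an initial continuous emotion estimate and one producing a perception-uncertainty estimate, and the uncertainty is then used explicitly to adjust the initial prediction rather than being a mere auxiliary output. The specific architecture below (a single shared layer, a sigmoid uncertainty head, and uncertainty-gated temporal smoothing) is an illustrative assumption, not the paper's exact model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy frame-level features (T frames, D dims) standing in for
# audio/visual descriptors of a conversation or music clip.
T, D = 100, 16
X = rng.normal(size=(T, D))

# Shared representation followed by two task heads (multi-task learning):
# one for the initial emotion estimate, one for the perception uncertainty.
W_shared = rng.normal(scale=0.1, size=(D, 8))
H = np.tanh(X @ W_shared)

w_emotion = rng.normal(scale=0.1, size=8)
w_uncert = rng.normal(scale=0.1, size=8)

y_init = H @ w_emotion                      # initial arousal/valence estimate
u = 1.0 / (1.0 + np.exp(-(H @ w_uncert)))   # predicted uncertainty in (0, 1)

# Explicit use of the uncertainty: where annotators would likely disagree
# (high u), shrink the prediction toward a temporally smoothed estimate;
# where perception is consistent (low u), trust the initial prediction.
kernel = np.ones(5) / 5
y_smooth = np.convolve(y_init, kernel, mode="same")
y_final = (1.0 - u) * y_init + u * y_smooth

print(y_final.shape)  # one adjusted prediction per frame
```

In a trained model, both heads would be fitted jointly against gold-standard labels and inter-rater disagreement measures; the gating step is what distinguishes this explicit use of uncertainty from a conventional multi-task setup that only predicts both quantities side by side.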
Funding
This study was partially supported by the Trans-Atlantic Platform "Digging into Data" collaboration grant (ACLEW: Analysing Child Language Experiences Around The World), with the support of the UK's Economic and Social Research Council through research grant no. HJ-253479, and by the European Union's Horizon 2020 Research and Innovation programme under Marie Skłodowska-Curie grant agreement no. 766287 (TAPAS).
Ethics declarations
Conflict of Interest
The authors declare that they have no conflict of interest.
Ethical Approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Han, J., Zhang, Z., Ren, Z. et al. Exploring Perception Uncertainty for Emotion Recognition in Dyadic Conversation and Music Listening. Cogn Comput 13, 231–240 (2021). https://doi.org/10.1007/s12559-019-09694-4