Abstract
When a person recognizes another's emotion, he or she perceives the facial features associated with emotional expression. So, for a machine to recognize facial emotions, the features related to emotional expression must be represented and described properly. However, prior methods based on label supervision not only fail to explicitly capture features related to emotional expression, but also make no attempt to learn emotional representations themselves. This paper proposes a novel approach that generates features related to emotional expression through feature transformation and uses them for emotional representation learning. Specifically, the contrast between the generated features and the overall facial features is quantified through contrastive representation learning, and facial emotions are then recognized in terms of the angle and intensity that describe the emotional representation in polar coordinates, i.e., in the Arousal-Valence space. Experimental results show that the proposed method improves PCC/CCC performance by more than 10% over the runner-up method on in-the-wild datasets and is also qualitatively better in terms of neural activation maps. Code is available at https://github.com/kdhht2334/AVCE_FER.
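The angle/intensity description above follows directly from treating an (arousal, valence) point as a vector in polar coordinates: the angle locates the emotion on the circumplex, and the distance from the origin (the neutral state) gives its intensity. This is not the authors' implementation, only a minimal sketch of that coordinate conversion; the function name and example values are illustrative.

```python
import math

def av_to_polar(valence, arousal):
    """Map a point in the Valence-Arousal space to polar coordinates.

    Returns (angle, intensity): the angle (radians) locates the emotion
    category on the circumplex; the intensity is the distance from the
    origin, i.e., from the neutral state.
    """
    intensity = math.hypot(valence, arousal)
    angle = math.atan2(arousal, valence)  # in (-pi, pi]
    return angle, intensity

# A hypothetical point with high valence and moderate arousal
# (roughly a "pleased" expression on the circumplex):
angle, intensity = av_to_polar(0.8, 0.4)
```

Under this view, two faces with the same angle but different intensities express the same category of emotion at different strengths, which is what the angle- and intensity-aware recognition described above exploits.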
Acknowledgements
This work was supported by IITP grants funded by the Korea government (MSIT) (No. 2021-0-02068, AI Innovation Hub, and RS-2022-00155915, Artificial Intelligence Convergence Research Center (Inha University)), and by the NRF grant funded by the Korea government (MSIT) (No. 2022R1A2C2010095 and No. 2022R1A4A1033549).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Kim, D., Song, B.C. (2022). Emotion-aware Multi-view Contrastive Learning for Facial Emotion Recognition. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13673. Springer, Cham. https://doi.org/10.1007/978-3-031-19778-9_11
Print ISBN: 978-3-031-19777-2
Online ISBN: 978-3-031-19778-9
eBook Packages: Computer Science (R0)