
Emotion Understanding in Videos Through Body, Context, and Visual-Semantic Embedding Loss

Part of the Lecture Notes in Computer Science book series (LNIP, volume 12535)

Abstract

We present our winning submission to the First International Workshop on Bodily Expressed Emotion Understanding (BEEU) challenge. Motivated by recent literature on the effect of context and environment on emotion, as well as on visual representations that carry semantic meaning through word embeddings, we extend the Temporal Segment Networks framework to accommodate both. Our method is evaluated on the validation set of the Body Language Dataset (BoLD) and achieves an Emotion Recognition Score of 0.26235 on the test set, surpassing the previous best result of 0.2530.
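The visual-semantic embedding loss mentioned above can be illustrated with a minimal sketch: a visual feature projected into the word-embedding space is pulled toward the embeddings of the annotated emotion words by minimizing their cosine distance. The function names, the toy 3-dimensional vectors, and the label format below are hypothetical illustrations, not the paper's implementation (which is available in the linked PyTorch repository).

```python
import math

def cosine(u, v):
    # Cosine similarity between two vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def embedding_loss(visual_embedding, word_embeddings, labels):
    # Mean cosine distance between the projected visual feature and the
    # word embeddings of the emotion categories present (label == 1).
    dists = [1.0 - cosine(visual_embedding, word_embeddings[w])
             for w, y in labels.items() if y == 1]
    return sum(dists) / len(dists)

# Toy 3-d stand-ins for GloVe vectors of two emotion words (made-up values).
word_embeddings = {"happiness": [0.9, 0.1, 0.0], "anger": [-0.8, 0.2, 0.1]}
labels = {"happiness": 1, "anger": 0}

# A visual feature already projected into the embedding space; when it is
# perfectly aligned with the "happiness" vector, the loss is zero.
visual = [0.9, 0.1, 0.0]
print(round(embedding_loss(visual, word_embeddings, labels), 4))  # prints 0.0
```

In practice the real GloVe vectors are 300-dimensional and the projection is a learned layer trained jointly with the rest of the network, but the objective has this shape.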

Keywords

  • Emotion
  • Body
  • Context
  • Visual-semantic
  • BEEU challenge


Notes

  1. PyTorch code available at https://github.com/filby89/NTUA-BEEU-eccv2020.


Acknowledgments

This research is carried out and funded in the context of the project “Intelligent Child-Robot Interaction System for designing and implementing edutainment scenarios with emphasis on visual information” (MIS 5049533), under the call for proposals “Researchers’ support with an emphasis on young researchers - 2nd Cycle”. The project is co-financed by Greece and the European Union (European Social Fund - ESF) through the Operational Programme Human Resources Development, Education and Lifelong Learning 2014–2020.

Author information


Corresponding author

Correspondence to Panagiotis Paraskevas Filntisis.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Filntisis, P.P., Efthymiou, N., Potamianos, G., Maragos, P. (2020). Emotion Understanding in Videos Through Body, Context, and Visual-Semantic Embedding Loss. In: Bartoli, A., Fusiello, A. (eds.) Computer Vision – ECCV 2020 Workshops. ECCV 2020. Lecture Notes in Computer Science, vol. 12535. Springer, Cham. https://doi.org/10.1007/978-3-030-66415-2_52


  • DOI: https://doi.org/10.1007/978-3-030-66415-2_52

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-66414-5

  • Online ISBN: 978-3-030-66415-2

  • eBook Packages: Computer Science (R0)