
Facial expression recognition via transfer learning in cooperative game paradigms for enhanced social AI

  • Original Paper
  • Published:
Journal on Multimodal User Interfaces

Abstract

Facial Expression Recognition (FER) is an effortless task for humans, and such non-verbal communication is intricately related to how we relate to others beyond the explicit content of our speech. Facial expressions can convey how we are feeling, as well as our intentions, and are thus a key component of multimodal social interaction. Recent computational advances, such as promising results from Convolutional Neural Networks (CNNs), have drawn increasing attention to the potential of FER to enhance human–agent interaction (HAI) and human–robot interaction (HRI), but questions remain as to how “transferable” the learned knowledge is from one task environment to another. In this paper, we explore how FER can be deployed in HAI cooperative game paradigms, where a human subject interacts with a virtual avatar in a goal-oriented environment in which they must cooperate to survive. The primary question was whether transfer learning (TL) would offer an advantage for FER over pre-trained models based on a similar (but not identical) task environment. The final results showed that TL achieved significantly improved results (94.3% accuracy) without the need for an extensive task-specific corpus. We discuss how such approaches could be used to flexibly create more life-like robots and avatars, capable of fluid social interactions within cooperative multimodal environments.
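The transfer-learning setup described in the abstract (reuse a network trained on a similar task, then retrain only a small task-specific portion on a modest corpus) can be sketched in miniature. The example below is purely illustrative and is not the authors' implementation: a frozen random projection stands in for the pre-trained CNN backbone, synthetic vectors stand in for the gameplay corpus, and only a new logistic-regression "head" is trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pre-trained CNN backbone: a frozen feature extractor.
# In a real FER pipeline these would be convolutional layers trained on a
# large face corpus; here a fixed random projection + ReLU plays that role.
W_frozen = rng.normal(size=(64, 16))

def extract_features(x):
    """Frozen 'pretrained' features: never updated during fine-tuning."""
    return np.maximum(x @ W_frozen, 0.0)

# Synthetic stand-in for a small task-specific corpus (two expression classes).
X = rng.normal(size=(200, 64))
true_w = rng.normal(size=(16,))
y = (extract_features(X) @ true_w > 0).astype(float)

# Standardize the frozen features for stable gradient descent.
feats = extract_features(X)
feats = (feats - feats.mean(axis=0)) / (feats.std(axis=0) + 1e-8)

# Transfer learning step: train only a new classifier head on frozen features.
w_head = np.zeros(16)
b_head = 0.0
lr = 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(feats @ w_head + b_head)))  # sigmoid
    grad_w = feats.T @ (p - y) / len(y)                   # logistic-loss gradient
    grad_b = np.mean(p - y)
    w_head -= lr * grad_w
    b_head -= lr * grad_b

pred = 1.0 / (1.0 + np.exp(-(feats @ w_head + b_head))) > 0.5
accuracy = np.mean(pred == y)
print(f"training accuracy of the new head: {accuracy:.2f}")
```

The design choice this toy mirrors is the one the paper tests: because the backbone already encodes useful structure, only a small head needs task-specific data, avoiding the cost of an extensive task-specific corpus.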


[Figures 1–12 appear in the full article; one figure reproduced © 2014, IEEE]


Data availability

The datasets generated and/or analyzed during the current study are not publicly available because they comprise video and audio recordings of identifiable human subjects during gameplay. However, extracted de-identified data may be made available from the corresponding author upon reasonable request.


Funding

This work was supported through funding by a Grant from the National Research Foundation of Korea (NRF Grant# 2021R1G1A1003801).

Author information


Corresponding author

Correspondence to Casey C. Bennett.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Ethical approval

This study was conducted in accordance with the Declaration of Helsinki, and approved by the Institutional Review Board of Hanyang University (protocol #HYU2021-138) for studies involving humans. Informed consent was obtained from all subjects involved in this study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 294 kb)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Sánchez, P.C., Bennett, C.C. Facial expression recognition via transfer learning in cooperative game paradigms for enhanced social AI. J Multimodal User Interfaces 17, 187–201 (2023). https://doi.org/10.1007/s12193-023-00410-z

