
Synthesizing facial expressions in dyadic human–robot interaction

  • Original Paper
  • Published in Signal, Image and Video Processing

Abstract

Generative artificial intelligence (GenAI) can create facial expressions for artificial human characters in real time, based on a training dataset. However, the bottleneck that prevents natural dyadic interaction between an artificial character and a human lies in GenAI's limited ability to recognize dynamically changing contexts. To tackle this issue, we investigated how deep learning (DL) techniques could synthesize facial reaction emotions from a sequence of preceding emotions. We applied action units from the Facial Action Coding System to manipulate the facial points of an artificial character inside Unreal Engine 4 using the OpenFace API. First, the artificial character's facial behavior was programmed to mimic human facial expressions shown on screen. Then, to generate adequate reaction emotions, we trained a DL model consisting of an autoencoder with a long short-term memory (LSTM) network. To validate the trained model, we compared its predicted reaction expressions with our test dataset using the average root-mean-square error. Furthermore, sixteen test participants rated the apparent naturalness of the character's reactions to dynamic human expressions. Our findings are promising steps toward developing facial reaction emotion synthesis into a dynamic system that can adapt to the user's specific needs and context.
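
As a concrete illustration of the pipeline described above, the sketch below shows one way an LSTM autoencoder could map a sequence of action-unit (AU) intensities, such as those reported by OpenFace, to a predicted sequence of reaction AUs, and evaluate the result with the average root-mean-square error. This is a minimal sketch under stated assumptions, not the authors' exact architecture: the Keras framework, the sequence length, the latent size, and the 17-AU input dimension are hypothetical choices made only for illustration.

    # Minimal sketch (hypothetical shapes and architecture): an LSTM autoencoder
    # that maps a sequence of one person's action-unit (AU) intensities to a
    # predicted sequence of the partner's reaction AUs.
    import numpy as np
    from tensorflow import keras
    from tensorflow.keras import layers

    SEQ_LEN, N_AUS, LATENT = 30, 17, 64  # assumed sequence length, AU count, latent size

    model = keras.Sequential([
        keras.Input(shape=(SEQ_LEN, N_AUS)),
        layers.LSTM(LATENT),                          # encoder: compress the input AU sequence
        layers.RepeatVector(SEQ_LEN),                 # repeat the latent code for each output step
        layers.LSTM(LATENT, return_sequences=True),   # decoder: unroll the reaction sequence
        layers.TimeDistributed(layers.Dense(N_AUS)),  # per-frame reaction AU intensities
    ])
    model.compile(optimizer="adam", loss="mse")

    # Random arrays stand in for paired (expression, reaction) AU sequences.
    x = np.random.rand(8, SEQ_LEN, N_AUS).astype("float32")
    y = np.random.rand(8, SEQ_LEN, N_AUS).astype("float32")
    model.fit(x, y, epochs=1, verbose=0)

    # Average root-mean-square error between predicted and ground-truth reaction AUs.
    pred = model.predict(x, verbose=0)
    print("average RMSE:", float(np.sqrt(np.mean((pred - y) ** 2))))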


Availability of data and materials

No datasets were generated or analyzed during the current study.


Acknowledgements

This work was supported by the EU Mobilitas Pluss Grant (MOBTT90) and by the grant "Context-embedded social characters for virtual simulation - an enactive media approach to human-robot co-learning" (TF/1522) awarded to Dr. Pia Tikka, Tallinn University.

Funding

Not applicable

Author information


Contributions

Sham and Anbarjafari were involved in conceptualization, formal analysis, methodology, software, validation, visualization, writing—original draft, and writing—reviewing and editing. Tikka and Lamas were responsible for conceptualization, formal analysis, validation, writing—original draft, and writing—reviewing and editing. All authors reviewed the manuscript.

Corresponding author

Correspondence to Abdallah Hussein Sham.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Ethical approval

Not applicable

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Sham, A.H., Tikka, P., Lamas, D. et al. Synthesizing facial expressions in dyadic human–robot interaction. SIViP (2024). https://doi.org/10.1007/s11760-024-03202-4

