Photorealistic Facial Synthesis in the Dimensional Affect Space
Abstract
This paper presents a novel approach to synthesizing facial affect, based on annotating 600,000 frames of the 4DFAB database in terms of valence and arousal. The approach takes as input a pair of these emotional state descriptors and a neutral 2D image of the person for whom the corresponding affect is to be synthesized. Given the target valence-arousal pair, a set of 3D facial meshes is selected and used to build a blendshape model that generates the new facial affect. To transfer this affect to the 2D neutral image, 3DMM fitting is performed and the reconstructed face is deformed to produce the target facial expression. Finally, the new face is rendered into the original image. Both qualitative and quantitative experimental studies illustrate that realistic images are generated when the neutral input image is sampled from a variety of well-known databases, such as Aff-Wild, AFEW, Multi-PIE, AFEW-VA, BU-3DFE, and Bosphorus.
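To make the pipeline concrete, the following minimal sketch outlines the steps above in Python. It is an illustration under stated assumptions, not the published implementation: meshes are assumed to be flattened vertex arrays with one valence-arousal annotation each, the nearest-annotation selection rule is one plausible reading of the mesh-selection step, plain PCA stands in for a more structured blendshape construction (cf. sparse localized deformation components [27]), and the 3DMM fitting, rendering, and blending stages are indicated only as comments. All function names and parameter values (e.g. k=50) are hypothetical.

```python
# Illustrative sketch of the synthesis pipeline (not the authors' code).
# Assumes `meshes` is an (N, 3V) array of flattened vertex positions and
# `va_labels` an (N, 2) array of per-mesh (valence, arousal) annotations.
import numpy as np

def select_meshes(meshes, va_labels, target_va, k=50):
    """Pick the k meshes whose (valence, arousal) annotation lies
    closest to the requested target pair."""
    dists = np.linalg.norm(va_labels - np.asarray(target_va), axis=1)
    return meshes[np.argsort(dists)[:k]]

def build_blendshapes(meshes, n_components=10):
    """Derive an expression basis from the selected meshes. Plain PCA
    is used here for brevity; the paper's blendshape construction may
    differ (cf. sparse localized deformation components [27])."""
    mean_shape = meshes.mean(axis=0)
    _, _, vt = np.linalg.svd(meshes - mean_shape, full_matrices=False)
    return mean_shape, vt[:n_components]

def synthesize(neutral_image, target_va, meshes, va_labels, weights):
    """Data flow from a (valence, arousal) target to an expressive face."""
    selected = select_meshes(meshes, va_labels, target_va)
    mean_shape, basis = build_blendshapes(selected)
    expression = weights @ basis  # deformation toward the target affect
    # Remaining stages, indicated only as placeholders:
    # 1. fit a 3D morphable model to `neutral_image` to recover the
    #    subject's identity mesh, camera, and facial texture;
    # 2. add `expression` to the reconstructed identity mesh;
    # 3. render the deformed face and blend it back into the original
    #    image (e.g. Poisson blending [29]) for a seamless result.
    return mean_shape + expression
```

The point of the sketch is the data flow: a single (valence, arousal) pair first determines which annotated meshes form the expression basis, and only then does standard 3DMM-based expression transfer take over.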
Keywords
Dimensional facial affect synthesis · Valence · Arousal · Discretization · Blendshape models · 3DMM fitting · 4DFAB · Aff-Wild · AFEW · AFEW-VA · Multi-PIE · BU-3DFE · Bosphorus · Deep neural networks
Acknowledgements
The authors would like to thank Evangelos Ververas for assisting with the affect synthesis. The work of Dimitris Kollias was funded by a Teaching Fellowship of Imperial College London. The work of S. Cheng was funded by the EPSRC projects EP/J017787/1 (4D-FAB) and EP/N007743/1 (FACER2VM).
References
- 1. Bach, F., Jenatton, R., Mairal, J., Obozinski, G.: Optimization with sparsity-inducing penalties. Found. Trends Mach. Learn. 4(1), 1–106 (2012). https://doi.org/10.1561/2200000015
- 2. Basak, D., Pal, S., Patranabis, D.C.: Support vector regression. Neural Inf. Process.-Lett. Rev. 11(10), 203–224 (2007)
- 3. Blanz, V., Basso, C., Poggio, T., Vetter, T.: Reanimating faces in images and video. In: Computer Graphics Forum, vol. 22, pp. 641–650. Wiley Online Library (2003)
- 4. Booth, J., Antonakos, E., Ploumpis, S., Trigeorgis, G., Panagakis, Y., Zafeiriou, S.: 3D face morphable models "in-the-wild". In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017. https://arxiv.org/abs/1701.05360
- 5. Booth, J., Roussos, A., Ponniah, A., Dunaway, D., Zafeiriou, S.: Large scale 3D morphable models. Int. J. Comput. Vis. 126(2–4), 233–254 (2018)
- 6. Cao, Q., Shen, L., Xie, W., Parkhi, O.M., Zisserman, A.: VGGFace2: a dataset for recognising faces across pose and age. In: 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), pp. 67–74. IEEE (2018)
- 7. Chang, W.Y., Hsu, S.H., Chien, J.H.: FATAUVA-Net: an integrated deep learning framework for facial attribute recognition, action unit (AU) detection, and valence-arousal estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (2017)
- 8. Cheng, S., Kotsia, I., Pantic, M., Zafeiriou, S.: 4DFAB: a large scale 4D database for facial expression analysis and biometric applications. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018), Salt Lake City, Utah, USA, June 2018
- 9. Dhall, A., Goecke, R., Ghosh, S., Joshi, J., Hoey, J., Gedeon, T.: From individual to group-level emotion recognition: EmotiW 5.0. In: Proceedings of the 19th ACM International Conference on Multimodal Interaction, pp. 524–528. ACM (2017)
- 10. Ding, H., Sricharan, K., Chellappa, R.: ExprGAN: facial expression editing with controllable expression intensity. arXiv preprint arXiv:1709.03842 (2017)
- 11. Du, S., Tao, Y., Martinez, A.: Compound facial expressions of emotion. Proc. Natl. Acad. Sci. U.S.A. 111(15), 1454–1462 (2014). https://doi.org/10.1073/pnas.1322355111
- 12. Ghodrati, A., Jia, X., Pedersoli, M., Tuytelaars, T.: Towards automatic image editing: learning to see another you. arXiv preprint arXiv:1511.08446 (2015)
- 13. Goodfellow, I., et al.: Generative adversarial nets. In: Advances in Neural Information Processing Systems, pp. 2672–2680 (2014)
- 14. Gross, R., Matthews, I., Cohn, J., Kanade, T., Baker, S.: Multi-PIE. Image Vis. Comput. 28(5), 807–813 (2010)
- 15. Hardoon, D.R., Szedmak, S., Shawe-Taylor, J.: Canonical correlation analysis: an overview with application to learning methods. Technical report, Royal Holloway, University of London, May 2003. http://eprints.soton.ac.uk/259225/
- 16. Huang, R., Zhang, S., Li, T., He, R., et al.: Beyond face rotation: global and local perception GAN for photorealistic and identity preserving frontal view synthesis. arXiv preprint arXiv:1704.04086 (2017)
- 17. Jonze, S., Cusack, J., Diaz, C., Keener, C., Kaufman, C.: Being John Malkovich. Universal Studios (1999)
- 18. Kollias, D., Nicolaou, M.A., Kotsia, I., Zhao, G., Zafeiriou, S.: Recognition of affect in the wild using deep neural networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 1972–1979. IEEE (2017)
- 19. Kollias, D., Tagaris, A., Stafylopatis, A.: On line emotion detection using retrainable deep neural networks. In: 2016 IEEE Symposium Series on Computational Intelligence (SSCI), pp. 1–8. IEEE (2016)
- 20. Kollias, D., et al.: Deep affect prediction in-the-wild: Aff-Wild database and challenge, deep architectures, and beyond. arXiv preprint arXiv:1804.10938 (2018)
- 21. Kollias, D., Yu, M., Tagaris, A., Leontidis, G., Stafylopatis, A., Kollias, S.: Adaptation and contextualization of deep neural network models. In: 2017 IEEE Symposium Series on Computational Intelligence (SSCI), pp. 1–8. IEEE (2017)
- 22. Kollias, D., Zafeiriou, S.: Training deep neural networks with different datasets in-the-wild: the emotion recognition paradigm. In: 2018 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE (2018)
- 23. Kossaifi, J., Tzimiropoulos, G., Todorovic, S., Pantic, M.: AFEW-VA database for valence and arousal estimation in-the-wild. Image Vis. Comput. 65, 23–36 (2017)
- 24. Larsen, A.B.L., Sønderby, S.K., Larochelle, H., Winther, O.: Autoencoding beyond pixels using a learned similarity metric. arXiv preprint arXiv:1512.09300 (2015)
- 25. Makhzani, A., Shlens, J., Jaitly, N., Goodfellow, I., Frey, B.: Adversarial autoencoders. arXiv preprint arXiv:1511.05644 (2015)
- 26. Mohammed, U., Prince, S.J., Kautz, J.: Visio-lization: generating novel facial images. ACM Trans. Graph. (TOG) 28(3), 57 (2009)
- 27. Neumann, T., Varanasi, K., Wenger, S., Wacker, M., Magnor, M., Theobalt, C.: Sparse localized deformation components. ACM Trans. Graph. (TOG) 32(6), 179 (2013)
- 28. Parkhi, O.M., Vedaldi, A., Zisserman, A.: Deep face recognition. In: BMVC, vol. 1, p. 6 (2015)
- 29. Pérez, P., Gangnet, M., Blake, A.: Poisson image editing. In: ACM SIGGRAPH 2003 Papers, pp. 313–318. ACM, New York (2003). https://doi.org/10.1145/1201775.882269
- 30. Pighin, F., Hecker, J., Lischinski, D., Szeliski, R., Salesin, D.H.: Synthesizing realistic facial expressions from photographs. In: ACM SIGGRAPH 2006 Courses, p. 19. ACM (2006)
- 31. Russell, J.A.: Evidence of convergent validity on the dimensions of affect. J. Pers. Soc. Psychol. 36(10), 1152 (1978)
- 32. Savran, A., et al.: Bosphorus database for 3D face analysis. In: Schouten, B., Juul, N.C., Drygajlo, A., Tistarelli, M. (eds.) BioID 2008. LNCS, vol. 5372, pp. 47–56. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-89991-4_6
- 33. Sun, Y., Chen, Y., Wang, X., Tang, X.: Deep learning face representation by joint identification-verification. In: Advances in Neural Information Processing Systems, pp. 1988–1996 (2014)
- 34. Susskind, J.M., Hinton, G.E., Movellan, J.R., Anderson, A.K.: Generating facial expressions with deep belief nets. In: Affective Computing. InTech (2008)
- 35. Taigman, Y., Yang, M., Ranzato, M., Wolf, L.: DeepFace: closing the gap to human-level performance in face verification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1701–1708 (2014)
- 36. Theobald, B.J., et al.: Mapping and manipulating facial expression. Lang. Speech 52(2–3), 369–386 (2009)
- 37. Thies, J., Zollhöfer, M., Nießner, M., Valgaerts, L., Stamminger, M., Theobalt, C.: Real-time expression transfer for facial reenactment. ACM Trans. Graph. 34(6), 183 (2015)
- 38. Thomaz, C.E., Giraldi, G.A.: A new ranking method for principal components analysis and its application to face image analysis. Image Vis. Comput. 28(6), 902–913 (2010)
- 39. Whissell, C.M.: The dictionary of affect in language. In: The Measurement of Emotions, pp. 113–131. Elsevier (1989)
- 40. Wright, S.J., Nowak, R.D., Figueiredo, M.A.T.: Sparse reconstruction by separable approximation. IEEE Trans. Signal Process. 57(7), 2479–2493 (2009). https://doi.org/10.1109/TSP.2009.2016892
- 41. Yan, X., Yang, J., Sohn, K., Lee, H.: Attribute2Image: conditional image generation from visual attributes. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9908, pp. 776–791. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46493-0_47
- 42. Yang, F., Bourdev, L., Shechtman, E., Wang, J., Metaxas, D.: Facial expression editing in video using a temporally-smooth factorization. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 861–868. IEEE (2012)
- 43. Yang, F., Wang, J., Shechtman, E., Bourdev, L., Metaxas, D.: Expression flow for 3D-aware face component transfer. ACM Trans. Graph. (TOG) 30(4), 60 (2011)
- 44. Yeh, R., Liu, Z., Goldman, D.B., Agarwala, A.: Semantic facial expression editing using autoencoded flow. arXiv preprint arXiv:1611.09961 (2016)
- 45. Yin, L., Wei, X., Sun, Y., Wang, J., Rosato, M.J.: A 3D facial expression database for facial behavior research. In: 7th International Conference on Automatic Face and Gesture Recognition (FGR 2006), pp. 211–216. IEEE (2006)
- 46. Zafeiriou, S., Kollias, D., Nicolaou, M.A., Papaioannou, A., Zhao, G., Kotsia, I.: Aff-Wild: valence and arousal 'in-the-wild' challenge. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 1980–1987. IEEE (2017)
- 47. Zhang, Q., Liu, Z., Guo, B., Terzopoulos, D., Shum, H.Y.: Geometry-driven photorealistic facial expression synthesis. IEEE Trans. Vis. Comput. Graph. 12(1), 48–60 (2006)
- 48. Zhang, S., Wu, Z., Meng, H.M., Cai, L.: Facial expression synthesis based on emotion dimensions for affective talking avatar. In: Nishida, T., Jain, L.C., Faucher, C. (eds.) Modeling Machine Emotions for Realizing Intelligence, pp. 109–132. Springer, Heidelberg (2010)
- 49. Zhang, Z., Song, Y., Qi, H.: Age progression/regression by conditional adversarial autoencoder. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol. 2 (2017)