Abstract
Existing generative adversarial network (GAN)-based methods for facial expression synthesis require large training datasets, and their performance degrades noticeably when trained on smaller ones. Moreover, they incur high computational and memory costs at inference, making them unsuitable for resource-constrained devices. To address these limitations, this paper presents a linear formulation that learns Localized and Sparse Receptive Fields (LSRF) for facial expression synthesis while accounting for global face context. We extend the sparsity-inducing formulation of the Orthogonal Matching Pursuit (OMP) algorithm with a locality constraint which ensures that i) each output pixel observes a localized region of the input face image and ii) neighboring output pixels attend to proximate input regions. Extensive qualitative and quantitative experiments demonstrate that the proposed method generates realistic facial expressions and outperforms existing methods. Furthermore, the proposed method can be trained on significantly smaller datasets while generalizing well to out-of-distribution images.
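The paper's exact LSRF formulation is not reproduced on this page; as a rough illustrative sketch only, the idea of combining OMP's greedy sparse selection with a locality constraint can be written as follows. The function name, the per-output-pixel setup, and the Euclidean-radius locality mask are assumptions for illustration, and the sketch omits the paper's global-context modeling:

```python
import numpy as np

def local_omp(X, y, center, coords, radius, sparsity):
    """Locality-constrained OMP for a single output pixel (illustrative sketch).

    X        : (n_samples, n_pixels) vectorized training face images
    y        : (n_samples,) training values of one output pixel
    center   : (row, col) grid location of the output pixel
    coords   : (n_pixels, 2) grid coordinates of each input pixel
    radius   : only input pixels within this distance are candidates
    sparsity : maximum number of input pixels to select
    """
    # Locality constraint: restrict candidate atoms to input pixels
    # lying near the output pixel's location.
    allowed = np.linalg.norm(coords - np.asarray(center), axis=1) <= radius
    candidates = np.flatnonzero(allowed)

    support, residual = [], y.copy()
    for _ in range(min(sparsity, candidates.size)):
        # Greedy OMP step: pick the allowed pixel most correlated
        # with the current residual.
        scores = np.abs(X[:, candidates].T @ residual)
        j = candidates[np.argmax(scores)]
        if j in support:
            break
        support.append(j)
        # Re-fit least squares on the current support and update the residual.
        w, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
        residual = y - X[:, support] @ w

    weights = np.zeros(X.shape[1])
    weights[support] = w
    return weights
```

The returned weight vector is sparse and, by construction, nonzero only inside the local neighborhood, so each output pixel is a sparse linear function of a localized input region.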
References
Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. In: Advances in neural information processing systems, pp 2672–2680
Mirza M, Osindero S (2014) Conditional generative adversarial nets. arXiv:1411.1784
Perarnau G, van de Weijer J, Raducanu B, Álvarez JM (2016) Invertible conditional GANs for image editing. arXiv:1611.06355
Karras T, Aila T, Laine S, Lehtinen J (2018) Progressive growing of GANs for improved quality, stability, and variation. In: International conference on learning representations
Chen Y, Tai Y, Liu X, Shen C, Yang J (2018) FSRNet: end-to-end learning face super-resolution with facial priors. In: IEEE conference on computer vision and pattern recognition, pp 2492–2501
Chen C, Gong D, Wang H, Li Z, Wong K-YK (2020) Learning spatial attention for face super-resolution. IEEE Trans Image Process 30:1219–1231
Hou H, Xu J, Hou Y, Hu X, Wei B, Shen D (2023) Semi-cycled generative adversarial networks for real-world face super-resolution. IEEE Trans Image Process 32:1184–1199
He Z, Zuo W, Kan M, Shan S, Chen X (2019) AttGAN: facial attribute editing by only changing what you want. IEEE Trans Image Process 28(11):5464–5478
Liu M, Ding Y, Xia M, Liu X, Ding E, Zuo W, Wen S (2019) STGAN: a unified selective transfer network for arbitrary image attribute editing. In: IEEE international conference on computer vision, pp 3673–3682
Gao Y, Wei F, Bao J, Gu S, Chen D, Wen F, Lian Z (2021) High-fidelity and arbitrary face editing. In: IEEE conference on computer vision and pattern recognition, pp 16115–16124
Choi Y, Choi M, Kim M, Ha J-W, Kim S, Choo J (2018) StarGAN: unified generative adversarial networks for multi-domain image-to-image translation. In: IEEE conference on computer vision and pattern recognition, pp 8789–8797
Pumarola A, Agudo A, Martinez AM, Sanfeliu A, Moreno-Noguer F (2020) GANimation: one-shot anatomically consistent facial animation. Int J Comput Vis 128(3):698–713
Wu R, Zhang G, Lu S, Chen T (2020) Cascade EF-GAN: progressive facial expression editing with local focuses. In: IEEE conference on computer vision and pattern recognition, pp 5021–5030
Akram A, Khan N (2023) US-GAN: on the importance of ultimate skip connection for facial expression synthesis. Multimed Tools Appl
Akram A, Khan N (2023) SARGAN: spatial attention-based residuals for facial expression manipulation. IEEE Trans Circuits Syst Video Technol
Khan N, Akram A, Mahmood A, Ashraf S, Murtaza K (2020) Masked linear regression for learning local receptive fields for facial expression synthesis. Int J Comput Vis 128(5):1433–1454
Akram A, Khan N (2021) Pixel-based facial expression synthesis. In: International conference on pattern recognition, pp 9733–9739. IEEE
Pati YC, Rezaiifar R, Krishnaprasad PS (1993) Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition. In: Proceedings of 27th Asilomar conference on signals, systems and computers, pp 40–44. IEEE
Tropp JA, Gilbert AC (2007) Signal recovery from random measurements via orthogonal matching pursuit. IEEE Trans Inf Theory 53(12):4655–4666
Karras T, Laine S, Aila T (2019) A style-based generator architecture for generative adversarial networks. In: IEEE conference on computer vision and pattern recognition, pp 4401–4410
Isola P, Zhu J-Y, Zhou T, Efros AA (2017) Image-to-image translation with conditional adversarial networks. In: IEEE conference on computer vision and pattern recognition, pp 1125–1134
Zhang Z, Song Y, Qi H (2017) Age progression/regression by conditional adversarial autoencoder. In: IEEE conference on computer vision and pattern recognition, pp 5810–5818
Lee H-Y, Tseng H-Y, Mao Q, Huang J-B, Lu Y-D, Singh M, Yang M-H (2020) DRIT++: diverse image-to-image translation via disentangled representations. Int J Comput Vis 128:2402–2417
Nirkin Y, Keller Y, Hassner T (2022) FSGANv2: improved subject agnostic face swapping and reenactment. IEEE Trans Pattern Anal Mach Intell 45(1):560–575
Tang H, Sebe N (2022) Facial expression translation using landmark guided GANs. IEEE Trans Affect Comput 13(4):1986–1997
Shen W, Liu R (2017) Learning residual images for face attribute manipulation. In: IEEE conference on computer vision and pattern recognition, pp 4030–4038
Chen Y-C, Xu X, Jia J (2020) Domain adaptive image-to-image translation. In: IEEE conference on computer vision and pattern recognition, pp 5274–5283
Arjovsky M, Chintala S, Bottou L (2017) Wasserstein generative adversarial networks. In: International conference on machine learning, pp 214–223
Gulrajani I, Ahmed F, Arjovsky M, Dumoulin V, Courville A (2017) Improved training of Wasserstein GANs. arXiv:1704.00028
Zhang H, Xu T, Li H, Zhang S, Huang X, Wang X, Metaxas D (2017) StackGAN: text to photo-realistic image synthesis with stacked generative adversarial networks. In: IEEE international conference on computer vision, pp 5907–5915
Patashnik O, Wu Z, Shechtman E, Cohen-Or D, Lischinski D (2021) StyleCLIP: text-driven manipulation of StyleGAN imagery. In: IEEE international conference on computer vision, pp 2085–2094
Song L, Lu Z, He R, Sun Z, Tan T (2018) Geometry guided adversarial facial expression synthesis. In: Proceedings of the 26th ACM international conference on multimedia, pp 627–635
Qiao F, Yao N, Jiao Z, Li Z, Chen H, Wang H (2018) Geometry-contrastive generative adversarial network for facial expression synthesis. arXiv:1802.01822
Ding H, Sricharan K, Chellappa R (2018) ExprGAN: facial expression editing with controllable expression intensity. In: Proceedings of the AAAI conference on artificial intelligence, vol 32
Ling J, Xue H, Song L, Yang S, Xie R, Gu X (2020) Toward fine-grained facial expression manipulation. In: European conference on computer vision, pp 37–53. Springer
Ronneberger O, Fischer P, Brox T (2015) U-Net: convolutional networks for biomedical image segmentation. In: International conference on medical image computing and computer-assisted intervention, pp 234–241. Springer
Xia Y, Zheng W, Wang Y, Yu H, Dong J, Wang F-Y (2021) Local and global perception generative adversarial network for facial expression synthesis. IEEE Trans Circuits Syst Video Technol 32(3):1443–1452
d’Apolito S, Paudel DP, Huang Z, Romero A, Van Gool L (2021) GANmut: learning interpretable conditional space for gamut of emotions. In: IEEE conference on computer vision and pattern recognition, pp 568–577
Fabian Benitez-Quiroz C, Srinivasan R, Martinez AM (2016) EmotioNet: an accurate, real-time algorithm for the automatic annotation of a million facial expressions in the wild. In: IEEE conference on computer vision and pattern recognition, pp 5562–5570
Megvii Inc (2019) Face++. https://www.faceplusplus.com/
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: IEEE conference on computer vision and pattern recognition, pp 770–778
Du S, Tao Y, Martinez AM (2014) Compound facial expressions of emotion. Proc Natl Acad Sci 111(15):1454–1462
Chen C, Li X, Yang L, Lin X, Zhang L, Wong K-YK (2021) Progressive semantic-aware style transformation for blind face restoration. In: IEEE conference on computer vision and pattern recognition, pp 11896–11905
Ethics declarations
Conflicts of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Akram, A., Khan, N. LSRF: localized and sparse receptive fields for linear facial expression synthesis based on global face context. Multimed Tools Appl 83, 31341–31360 (2024). https://doi.org/10.1007/s11042-023-16822-8