
Fast face swapping with high-fidelity lightweight generator assisted by online knowledge distillation

  • Research
  • Published in The Visual Computer

Abstract

Advanced face swapping approaches have achieved high-fidelity results. However, the success of most methods hinges on large parameter counts and high computational costs. With the growing demand for real-time face swapping, these factors have become obstacles that limit swapping speed and practical application. To overcome these challenges, we propose a high-fidelity lightweight generator (HFLG) for face swapping, a compressed version of the existing SimSwap network that uses one quarter of its channels. Moreover, to stabilize the learning of HFLG, we introduce feature-map-based online knowledge distillation into our training process and improve the teacher–student architecture. Specifically, we first enhance our teacher generator to provide more effective guidance, minimizing the loss of detail on the lower face. In addition, a new identity-irrelevant similarity loss is proposed to improve the preservation of non-facial regions in the teacher generator's results. Furthermore, HFLG uses an extended identity injection module to inject identity more efficiently, and it gradually learns face swapping by imitating the feature maps and outputs of the teacher generator online. Extensive experiments on faces in the wild demonstrate that our method achieves results comparable to other methods while having fewer parameters, lower computational cost, and faster inference. The code is available at https://github.com/EifelTing/HFLFS.
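The feature-map imitation described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the channel widths, the 1x1 channel adapter, and the loss weighting are all assumed for the example. The student's narrower feature maps (one quarter of the teacher's channels, mirroring HFLG's compression) are lifted to the teacher's width before an L1 imitation term is computed, alongside an L1 term on the final outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: the teacher feature map has 64 channels and the
# student has 1/4 as many (16), mirroring HFLG's channel compression.
C_T, C_S, H, W = 64, 16, 8, 8
teacher_feat = rng.standard_normal((C_T, H, W))
student_feat = rng.standard_normal((C_S, H, W))
teacher_out = rng.standard_normal((3, H, W))   # teacher's swapped-face output
student_out = rng.standard_normal((3, H, W))   # student's swapped-face output

# A learnable 1x1 adapter would normally lift the student's channels to the
# teacher's width; a fixed random matrix stands in for it here.
adapter = rng.standard_normal((C_T, C_S)) / np.sqrt(C_S)

def distill_loss(s_feat, t_feat, s_out, t_out, adapter, lam=1.0):
    """L1 imitation of the teacher's intermediate feature maps and output."""
    lifted = np.einsum("ts,shw->thw", adapter, s_feat)  # (C_T, H, W)
    feat_term = np.abs(lifted - t_feat).mean()          # feature-map imitation
    out_term = np.abs(s_out - t_out).mean()             # output imitation
    return feat_term + lam * out_term

loss = distill_loss(student_feat, teacher_feat, student_out, teacher_out, adapter)
print(float(loss))
```

In the online setting, the teacher and student are trained jointly, so this loss would be evaluated against the teacher's current (not pretrained and frozen) features at each step; in practice a term like this is summed over several layers, not just one.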


(Figures 1–19 and Algorithm 1 appear in the full article.)


Data availability

The data are available from the corresponding author on reasonable request. Three datasets are used in our experiments: VGGFace2 (https://www.robots.ox.ac.uk/~vgg/data/vgg_face2/), FaceForensics++ (https://github.com/ondyari/FaceForensics), and CelebA-HQ (https://github.com/switchablenorms/CelebAMask-HQ).



Acknowledgements

This work was supported in part by the Anhui Natural Science Foundation of China under grant number 2308085MF218, in part by the Academic Funding Project for Top Talents in University Disciplines under grant number gxbjZD2021050, in part by the Anhui Provincial Higher Education Institutions Scientific Research Project under grant number 2022AH040113, and in part by the Anhui University of Science and Technology 2023 Graduate Innovation Fund Project under grant number 2023cx2136.

Author information

Authors and Affiliations

Authors

Contributions

Yifeng Ding performed the experiments, analyzed the data, and wrote the original manuscript. All authors reviewed and edited the manuscript.

Corresponding author

Correspondence to Yifeng Ding.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Ethical approval

This study was conducted with the highest regard for ethical standards and in accordance with relevant guidelines and regulations. While no ethical review or approval was required for this study, the principles of academic integrity and research ethics were strictly adhered to throughout the research process.

Human and animal rights

The research protocol did not require ethical review or approval as it did not involve human participants, animals, or sensitive data. All data used in this study were obtained from publicly available sources and were properly cited and acknowledged. No private or personally identifiable information was used or accessed during this research.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Yang, G., Ding, Y., Fang, X. et al. Fast face swapping with high-fidelity lightweight generator assisted by online knowledge distillation. Vis Comput (2024). https://doi.org/10.1007/s00371-024-03414-2

