
A Style-Based GAN Encoder for High Fidelity Reconstruction of Images and Videos

  • Conference paper
Computer Vision – ECCV 2022 (ECCV 2022)

Abstract

We present a new encoder architecture for the inversion of Generative Adversarial Networks (GANs). The task is to reconstruct a real image from the latent space of a pre-trained GAN. Unlike previous encoder-based methods, which predict only a latent code from a real image, the proposed encoder simultaneously maps the given image to both a latent code and a feature tensor. The feature tensor is key to accurate inversion, yielding better perceptual quality and lower reconstruction error. We conduct extensive experiments on several style-based generators pre-trained on different data domains. Our method is the first feed-forward encoder to include a feature tensor in the inversion, and it outperforms state-of-the-art encoder-based methods for GAN inversion. Additionally, experiments on video inversion show that our method yields more accurate and stable inversion for videos, opening the possibility of real-time video editing. Code is available at https://github.com/InterDigitalInc/FeatureStyleEncoder.
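
To make the abstract's central idea concrete, below is a minimal PyTorch sketch of an encoder with two jointly predicted outputs: per-layer style latent codes and a spatial feature tensor meant to replace an intermediate generator activation during reconstruction. The class name, backbone, and tensor shapes here are illustrative assumptions made for this example, not the authors' actual architecture (see the linked repository for that).

```python
# A minimal, illustrative sketch of the two-output encoder idea from the
# abstract. Backbone, head shapes, and names are assumptions for this example.
import torch
import torch.nn as nn

class FeatureStyleEncoder(nn.Module):
    """Maps an image to (a) per-layer style latent codes and (b) a spatial
    feature tensor intended to replace an intermediate generator activation."""

    def __init__(self, n_styles=18, style_dim=512, feat_channels=512, feat_size=16):
        super().__init__()
        # Shared convolutional trunk (stand-in for a real ResNet/FPN backbone).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(feat_size),
        )
        # Head 1: one style vector per generator layer (a W+-style code).
        self.to_latent = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * feat_size * feat_size, n_styles * style_dim),
        )
        # Head 2: the feature tensor, predicted jointly with the latent code.
        self.to_feature = nn.Conv2d(128, feat_channels, kernel_size=1)
        self.n_styles, self.style_dim = n_styles, style_dim

    def forward(self, x):
        h = self.backbone(x)
        w = self.to_latent(h).view(-1, self.n_styles, self.style_dim)
        f = self.to_feature(h)
        return w, f  # latent codes and feature tensor, predicted in one pass

enc = FeatureStyleEncoder()
w_plus, feat = enc(torch.randn(1, 3, 256, 256))
print(w_plus.shape, feat.shape)  # (1, 18, 512) and (1, 512, 16, 16)
```

In such a design, a single feed-forward pass yields both outputs, which is what distinguishes this family of encoders from optimization-based inversion and from encoders that predict only a latent code.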



Author information

Corresponding author

Correspondence to Xu Yao.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 11731 KB)

Rights and permissions

Reprints and permissions

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Yao, X., Newson, A., Gousseau, Y., Hellier, P. (2022). A Style-Based GAN Encoder for High Fidelity Reconstruction of Images and Videos. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13675. Springer, Cham. https://doi.org/10.1007/978-3-031-19784-0_34

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-19784-0_34

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19783-3

  • Online ISBN: 978-3-031-19784-0

  • eBook Packages: Computer Science, Computer Science (R0)
