
Generative Multiplane Images: Making a 2D GAN 3D-Aware

Conference paper in Computer Vision – ECCV 2022 (ECCV 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13665)

Abstract

What is really needed to make an existing 2D GAN 3D-aware? To answer this question, we modify a classical GAN, i.e., StyleGANv2, as little as possible. We find that only two modifications are absolutely necessary: 1) a multiplane image style generator branch which produces a set of alpha maps conditioned on their depth; 2) a pose-conditioned discriminator. We refer to the generated output as a ‘generative multiplane image’ (GMPI) and emphasize that its renderings are not only high-quality but also guaranteed to be view-consistent, which makes GMPIs different from many prior works. Importantly, the number of alpha maps can be dynamically adjusted and can differ between training and inference, alleviating memory concerns and enabling fast training of GMPIs in less than half a day at a resolution of \(1024^2\). Our findings are consistent across three challenging and common high-resolution datasets, including FFHQ, AFHQv2 and MetFaces.

X. Zhao—Work done as part of an internship at Apple.
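To make the first modification concrete, the sketch below shows the standard back-to-front "over" compositing that turns a multiplane image into a rendered view. It is a minimal PyTorch illustration, assuming one RGB image shared across all planes plus per-plane alpha maps as the abstract describes; the function name composite_mpi, the tensor shapes, and the omission of the per-depth homography warp into the target camera are illustrative simplifications, not the paper's implementation.

```python
# Minimal sketch (not the authors' code): render an MPI by back-to-front
# alpha ("over") compositing. Assumes one RGB image shared by all planes
# and L per-plane alpha maps, ordered from the farthest plane to the nearest.
import torch

def composite_mpi(rgb: torch.Tensor, alphas: torch.Tensor) -> torch.Tensor:
    """rgb: (3, H, W); alphas: (L, H, W); returns the composited (3, H, W) image."""
    out = rgb * alphas[0]                # the farthest plane seeds the canvas
    for a in alphas[1:]:                 # walk towards the camera
        out = rgb * a + out * (1.0 - a)  # standard 'over' operation
    return out

# Toy usage with 32 planes at 64x64. A real novel-view rendering would first
# warp each plane into the target camera with a depth-dependent homography.
image = composite_mpi(torch.rand(3, 64, 64), torch.rand(32, 64, 64))
```

Because the planes form a fixed, explicit 3D representation, every rendered view is composited from the same geometry, which is what guarantees the view consistency noted in the abstract.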


Notes

  1. Concurrently, EG3D [9] also finds that pose conditioning of the discriminator is required for their tri-plane representation to produce 3D-aware results, corroborating that this form of inductive bias is indeed necessary (see the sketch after these notes).

  2. https://github.com/NVlabs/stylegan2-ada-pytorch.

  3. https://metmuseum.github.io/.
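As a hedged illustration of the pose conditioning discussed in note 1, the sketch below appends a flattened camera pose to the features entering a discriminator's final real/fake head. The module name PoseConditionedHead, the 25-dimensional pose encoding (a flattened 4x4 extrinsic matrix plus a 3x3 intrinsic matrix), and the two-layer head are assumptions for illustration; neither GMPI nor EG3D is implemented this way verbatim.

```python
# Illustrative sketch (not the papers' code): condition a discriminator's
# decision on the camera pose by concatenating a flattened pose vector
# with the image features before the final real/fake logit.
import torch
import torch.nn as nn

class PoseConditionedHead(nn.Module):
    def __init__(self, feat_dim: int, pose_dim: int = 25):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(feat_dim + pose_dim, feat_dim),
            nn.LeakyReLU(0.2),
            nn.Linear(feat_dim, 1),  # real/fake logit
        )

    def forward(self, feats: torch.Tensor, pose: torch.Tensor) -> torch.Tensor:
        # feats: (B, feat_dim) image features; pose: (B, pose_dim) camera pose
        return self.head(torch.cat([feats, pose], dim=1))

# Toy usage: batch of 4 feature vectors, each paired with a 25-dim pose.
logits = PoseConditionedHead(512)(torch.randn(4, 512), torch.randn(4, 25))
```

With such a head, the discriminator can penalize generated images whose appearance does not match the camera pose they were rendered from, which is the inductive bias both papers find necessary.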

References

  1. Cat hipsterizer (2022). https://github.com/kairess/cat_hipsterizer. Accessed 06 Mar 2022

  2. Aneja, J., Schwing, A.G., Kautz, J., Vahdat, A.: A contrastive learning approach for training variational autoencoder priors. In: ICLR (2020)

  3. Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein GAN. arXiv (2017)

  4. Avidan, S., Shashua, A.: Novel view synthesis in tensor space. In: CVPR (1997)

  5. Berthelot, D., Schumm, T., Metz, L.: BEGAN: boundary equilibrium generative adversarial networks. arXiv preprint arXiv:1703.10717 (2017)

  6. Bińkowski, M., Sutherland, D.J., Arbel, M., Gretton, A.: Demystifying MMD GANs. In: ICLR (2018)

  7. Brock, A., Donahue, J., Simonyan, K.: Large scale GAN training for high fidelity natural image synthesis. In: ICLR (2019)

  8. Buehler, C., Bosse, M., McMillan, L., Gortler, S., Cohen, M.: Unstructured lumigraph rendering. In: SIGGRAPH (2001)

  9. Chan, E.R., et al.: Efficient geometry-aware 3D generative adversarial networks. In: CVPR (2022)

  10. Chan, E.R., Monteiro, M., Kellnhofer, P., Wu, J., Wetzstein, G.: pi-GAN: periodic implicit generative adversarial networks for 3D-aware image synthesis. In: CVPR (2021)

  11. Chen, S.E., Williams, L.: View interpolation for image synthesis. In: SIGGRAPH (1993)

  12. Choi, Y., Uh, Y., Yoo, J., Ha, J.W.: StarGAN v2: diverse image synthesis for multiple domains. In: CVPR (2020)

  13. Wang, R., Cully, A., Chang, H.J., Demiris, Y.: MAGAN: margin adaptation for generative adversarial networks. arXiv (2017)

  14. Debevec, P.E., Taylor, C.J., Malik, J.: Modeling and rendering architecture from photographs: a hybrid geometry- and image-based approach. In: SIGGRAPH (1996)

  15. Deng, J., Guo, J., Zafeiriou, S.: ArcFace: additive angular margin loss for deep face recognition. In: CVPR (2019)

  16. Deng, Y., Yang, J., Xiang, J., Tong, X.: GRAM: generative radiance manifolds for 3D-aware image generation. In: CVPR (2022)

  17. Deng, Y., Yang, J., Xu, S., Chen, D., Jia, Y., Tong, X.: Accurate 3D face reconstruction with weakly-supervised learning: from single image to image set. In: CVPRW (2019)

  18. Deshpande, I., Zhang, Z., Schwing, A.G.: Generative modeling using the sliced Wasserstein distance. In: CVPR (2018)

  19. Deshpande, I., et al.: Max-sliced Wasserstein distance and its use for GANs. In: CVPR (2019)

  20. Gadelha, M., Maji, S., Wang, R.: 3D shape induction from 2D views of multiple objects. In: 3DV (2017)

  21. Ghosh, S., Lv, Z., Matsuda, N., Xiao, L., Berkovich, A., Cossairt, O.: LiveView: dynamic target-centered MPI for view synthesis. arXiv preprint arXiv:2107.05113 (2021)

  22. Goodfellow, I.J., et al.: Generative adversarial nets. In: NeurIPS (2014)

  23. Gu, J., Liu, L., Wang, P., Theobalt, C.: StyleNeRF: a style-based 3D-aware generator for high-resolution image synthesis. In: ICLR (2022)

  24. Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., Courville, A.: Improved training of Wasserstein GANs. In: NeurIPS (2017)

  25. Habtegebrial, T., Jampani, V., Gallo, O., Stricker, D.: Generative view synthesis: from single-view semantics to novel-view images. In: NeurIPS (2020)

  26. Hao, Z., Mallya, A., Belongie, S., Liu, M.Y.: GANcraft: unsupervised 3D neural rendering of Minecraft worlds. In: ICCV (2021)

  27. Henzler, P., Mitra, N.J., Ritschel, T.: Escaping Plato’s cave: 3D shape from adversarial rendering. In: ICCV (2019)

  28. Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., Hochreiter, S.: GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In: NeurIPS (2017)

  29. Huang, X., Belongie, S.J.: Arbitrary style transfer in real-time with adaptive instance normalization. In: ICCV (2017)

  30. Karras, T., Aila, T., Laine, S., Lehtinen, J.: Progressive growing of GANs for improved quality, stability, and variation. In: ICLR (2018)

  31. Karras, T., Aittala, M., Hellsten, J., Laine, S., Lehtinen, J., Aila, T.: Training generative adversarial networks with limited data. In: NeurIPS (2020)

  32. Karras, T., et al.: Alias-free generative adversarial networks. In: NeurIPS (2021)

  33. Karras, T., Laine, S., Aila, T.: A style-based generator architecture for generative adversarial networks. In: CVPR (2019)

  34. Karras, T., Laine, S., Aittala, M., Hellsten, J., Lehtinen, J., Aila, T.: Analyzing and improving the image quality of StyleGAN. In: CVPR (2020)

  35. Kingma, D.P., Welling, M.: Auto-encoding variational Bayes. In: ICLR (2014)

  36. Kolouri, S., Rohde, G.K., Hoffman, H.: Sliced Wasserstein distance for learning Gaussian mixture models. In: CVPR (2018)

  37. Laine, S., Hellsten, J., Karras, T., Seol, Y., Lehtinen, J., Aila, T.: Modular primitives for high-performance differentiable rendering. TOG 39, 1–14 (2020)

  38. Levoy, M., Hanrahan, P.: Light field rendering. In: SIGGRAPH (1996)

  39. Li, C.L., Chang, W.C., Cheng, Y., Yang, Y., Póczos, B.: MMD GAN: towards deeper understanding of moment matching network. In: NeurIPS (2017)

  40. Li, J., Feng, Z., She, Q., Ding, H., Wang, C., Lee, G.H.: MINE: towards continuous depth MPI with NeRF for novel view synthesis. In: ICCV (2021)

  41. Li, Y., Schwing, A.G., Wang, K.C., Zemel, R.: Dualing GANs. In: NeurIPS (2017)

  42. Lin, Z., Khetan, A., Fanti, G., Oh, S.: PacGAN: the power of two samples in generative adversarial networks. In: NeurIPS (2018)

  43. Mescheder, L.M., Geiger, A., Nowozin, S.: Which training methods for GANs do actually converge? In: ICML (2018)

  44. Mildenhall, B., Srinivasan, P.P., Tancik, M., Barron, J.T., Ramamoorthi, R., Ng, R.: NeRF: representing scenes as neural radiance fields for view synthesis. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12346, pp. 405–421. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58452-8_24

  45. Miyato, T., Koyama, M.: cGANs with projection discriminator. In: ICLR (2018)

  46. Mroueh, Y., Sercu, T.: Fisher GAN. In: NeurIPS (2017)

  47. Mroueh, Y., Sercu, T., Goel, V.: McGan: mean and covariance feature matching GAN. arXiv (2017)

  48. Nguyen-Phuoc, T., Li, C., Theis, L., Richardt, C., Yang, Y.L.: HoloGAN: unsupervised learning of 3D representations from natural images. In: ICCV (2019)

  49. Nguyen-Phuoc, T., Richardt, C., Mai, L., Yang, Y.L., Mitra, N.J.: BlockGAN: learning 3D object-aware scene representations from unlabelled images. In: NeurIPS (2020)

  50. Niemeyer, M., Geiger, A.: GIRAFFE: representing scenes as compositional generative neural feature fields. In: CVPR (2021)

  51. Or-El, R., Luo, X., Shan, M., Shechtman, E., Park, J.J., Kemelmacher-Shlizerman, I.: StyleSDF: high-resolution 3D-consistent image and geometry generation. In: CVPR (2022)

  52. Pan, X., Xu, X., Loy, C.C., Theobalt, C., Dai, B.: A shading-guided generative implicit model for shape-accurate 3D-aware image synthesis. In: NeurIPS (2021)

  53. Salimans, T., Zhang, H., Radford, A., Metaxas, D.: Improving GANs using optimal transport. In: ICLR (2018)

  54. Schwarz, K., Liao, Y., Niemeyer, M., Geiger, A.: GRAF: generative radiance fields for 3D-aware image synthesis. In: NeurIPS (2020)

  55. Serengil, S.I., Ozpinar, A.: LightFace: a hybrid deep face recognition framework. In: 2020 Innovations in Intelligent Systems and Applications Conference (ASYU). IEEE (2020)

  56. Shade, J., Gortler, S., He, L.W., Szeliski, R.: Layered depth images. In: SIGGRAPH (1998)

  57. Shi, Y., Aggarwal, D., Jain, A.K.: Lifting 2D StyleGAN for 3D-aware face generation. In: CVPR (2021)

  58. Srinivasan, P.P., Tucker, R., Barron, J.T., Ramamoorthi, R., Ng, R., Snavely, N.: Pushing the boundaries of view extrapolation with multiplane images. In: CVPR (2019)

  59. Sun, R., Fang, T., Schwing, A.G.: Towards a better global loss landscape of GANs. In: NeurIPS (2020)

  60. Tucker, R., Snavely, N.: Single-view view synthesis with multiplane images. In: CVPR (2020)

  61. Wu, J., Zhang, C., Xue, T., Freeman, W.T., Tenenbaum, J.B.: Learning a probabilistic latent space of object shapes via 3D generative-adversarial modeling. In: NeurIPS (2016)

  62. Xu, Y., Peng, S., Yang, C., Shen, Y., Zhou, B.: 3D-aware image synthesis via learning structural and textural representations. In: CVPR (2022)

  63. Zhou, P., Xie, L., Ni, B., Tian, Q.: CIPS-3D: a 3D-aware generator of GANs based on conditionally-independent pixel synthesis. arXiv (2021)

  64. Zhou, T., Tucker, R., Flynn, J., Fyffe, G., Snavely, N.: Stereo magnification: learning view synthesis using multiplane images. TOG (2018)

  65. Zhu, J.Y., et al.: Visual object networks: image generation with disentangled 3D representations. In: NeurIPS (2018)

Acknowledgements

We thank Eric Ryan Chan for discussions and for providing the processed AFHQv2-Cats dataset. Supported in part by NSF grants 1718221, 2008387, 2045586, 2106825, MRI #1725729, and NIFA award 2020-67021-32799.

Author information

Corresponding author

Correspondence to Xiaoming Zhao.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 7602 KB)

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Zhao, X., Ma, F., Güera, D., Ren, Z., Schwing, A.G., Colburn, A. (2022). Generative Multiplane Images: Making a 2D GAN 3D-Aware. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13665. Springer, Cham. https://doi.org/10.1007/978-3-031-20065-6_2

  • DOI: https://doi.org/10.1007/978-3-031-20065-6_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20064-9

  • Online ISBN: 978-3-031-20065-6

  • eBook Packages: Computer Science, Computer Science (R0)
