High-Fidelity Synthesis with Disentangled Representation

  • Conference paper

Computer Vision – ECCV 2020 (ECCV 2020)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12371)

Abstract

Learning disentangled representations of data without supervision is an important step towards improving the interpretability of generative models. Despite recent advances in disentangled representation learning, existing approaches often suffer from a trade-off between representation learning and generation performance (i.e., improving generation quality sacrifices disentanglement performance). We propose the Information-Distillation Generative Adversarial Network (ID-GAN), a simple yet generic framework that easily incorporates existing state-of-the-art models for both disentanglement learning and high-fidelity synthesis. Our method learns a disentangled representation using VAE-based models, and distills the learned representation, together with an additional nuisance variable, into a separate GAN-based generator for high-fidelity synthesis. To ensure that both generative models are aligned to render the same generative factors, we further constrain the GAN generator to maximize the mutual information between the learned latent code and its output. Despite its simplicity, we show that the proposed method is highly effective, achieving image generation quality comparable to state-of-the-art methods while using the disentangled representation. We also show that the proposed decomposition leads to an efficient and stable model design, and we demonstrate photo-realistic high-resolution image synthesis results (\(1024\times 1024\) pixels) for the first time using disentangled representations. Our code is available at https://www.github.com/1Konny/idgan.
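To make the two-step pipeline concrete, the sketch below shows the second (GAN) stage with the information-distillation term, in PyTorch. It is a minimal illustration under stated assumptions: the toy Encoder and Generator modules, the latent sizes C_DIM and S_DIM, and the weight lambda_info are all hypothetical, and the adversarial loss is omitted; the authors' actual implementation is in the linked repository.

```python
# Minimal ID-GAN-style two-step sketch (PyTorch). All module definitions,
# latent sizes, and hyperparameters are illustrative assumptions, not the
# authors' released code (see https://www.github.com/1Konny/idgan for that).
import torch
import torch.nn as nn
import torch.nn.functional as F

C_DIM, S_DIM = 10, 246  # disentangled code c / nuisance variable s (assumed sizes)

class Encoder(nn.Module):
    """Toy stand-in for a pre-trained VAE encoder q_phi(c|x) on 64x64 images."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 256), nn.ReLU())
        self.mu, self.logvar = nn.Linear(256, C_DIM), nn.Linear(256, C_DIM)

    def forward(self, x):
        h = self.body(x)
        return self.mu(h), self.logvar(h)

class Generator(nn.Module):
    """Toy GAN generator G(c, s) producing higher-resolution images."""
    def __init__(self, out_res=128):
        super().__init__()
        self.out_res = out_res
        self.fc = nn.Linear(C_DIM + S_DIM, 3 * out_res * out_res)

    def forward(self, c, s):
        z = torch.cat([c, s], dim=1)  # generator input is [c; s]
        return torch.tanh(self.fc(z)).view(-1, 3, self.out_res, self.out_res)

# Step 1 (omitted): train a VAE-based model (beta-VAE, FactorVAE, ...) for
# disentanglement. Here we just pretend `enc` is that trained encoder, frozen.
enc = Encoder()
for p in enc.parameters():
    p.requires_grad_(False)

# Step 2: train the GAN generator with the information-distillation term.
gen = Generator()
opt_g = torch.optim.Adam(gen.parameters(), lr=1e-4)
lambda_info = 1.0  # assumed weight on the distillation term

real = torch.rand(8, 3, 64, 64)  # stand-in for a data batch x ~ p(x)

# Sample c from the aggregate posterior q(c): x ~ p(x), then c ~ q(c|x) (note 1).
mu, logvar = enc(real)
c = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
s = torch.randn(real.size(0), S_DIM)  # nuisance variable s ~ p(s)

fake = gen(c.detach(), s)

# Bilinearly downsample the generator output to the encoder resolution (note 4).
fake_small = F.interpolate(fake, size=(64, 64), mode='bilinear', align_corners=False)

# Distillation term: the frozen encoder should recover c from the generated
# image; a Gaussian log-likelihood gives this (constant terms dropped).
mu_f, logvar_f = enc(fake_small)
info_loss = 0.5 * (((c.detach() - mu_f) ** 2) / logvar_f.exp() + logvar_f).sum(1).mean()

# adv_loss = ...  # any standard GAN loss on `fake`, omitted for brevity
loss = lambda_info * info_loss
opt_g.zero_grad()
loss.backward()
opt_g.step()
```

Freezing the encoder and detaching \(c\) means only the generator is updated by this term, which matches the two-step decomposition described above; gradients still flow to the generator through the downsampled fake image.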


Notes

  1. In practice, we can easily sample \(c\) from the aggregate posterior \(q_\phi(c)\) by ancestral sampling: draw \(x \sim p(x)\), then \(c \sim q_\phi(c|x)\) (see the identity sketched after this list).

  2. In practice, we learn the encoder \(q_\phi\) and the generator \(G\) independently by Eq. (6) and Eq. (7), respectively, in a two-step training procedure.

  3. GILBO is formulated similarly to \(\mathcal {R}_{\text {ID}}\) (Eq. (4)), but it is optimized over a separate auxiliary encoder network, distinct from the one used in \(\mathcal {R}_{\text {ID}}\); a generic form of this bound is also sketched after this list.

  4. We simply downsample the generator output by bilinear interpolation to match the dimensions between the generator and the encoder.
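
Since Eq. (4) itself is not reproduced on this page, the following block sketches the generic variational lower bound on mutual information (the form popularized by InfoGAN [10]) that objectives like \(\mathcal {R}_{\text {ID}}\) and GILBO instantiate, together with the aggregate-posterior identity behind note 1. The auxiliary encoder \(q_\psi\) and this exact notation are assumptions for illustration, not the paper's own equations.

```latex
% A sketch, not the paper's Eq. (4): the aggregate posterior used for
% sampling c (note 1) and the standard variational lower bound on the mutual
% information between the code c and the generator output G(c, s) (note 3).
% q_psi is an auxiliary (or shared) encoder; H(c) is constant w.r.t. G.
\begin{align}
  q_\phi(c) &= \int q_\phi(c \mid x)\, p(x)\, \mathrm{d}x \\
  I\big(c;\, G(c, s)\big) &\ge
    \mathbb{E}_{c \sim q_\phi(c),\, s \sim p(s)}
      \big[\log q_\psi\big(c \mid G(c, s)\big)\big] + H(c)
\end{align}
```

Maximizing the expectation over \(q_\psi\) tightens the bound, and maximizing it over \(G\) raises the mutual information itself; per note 3, GILBO uses the same form but with an encoder trained independently of the one used for distillation.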

References

  1. Achille, A., Soatto, S.: Information dropout: learning optimal representations through noisy computation. TPAMI (2018)

  2. Alemi, A.A., Fischer, I.: GILBO: one metric to measure them all. In: NeurIPS (2018)

  3. Aubry, M., Maturana, D., Efros, A., Russell, B., Sivic, J.: Seeing 3D chairs: exemplar part-based 2D–3D alignment using a large dataset of CAD models. In: CVPR (2014)

  4. Bao, J., Chen, D., Wen, F., Li, H., Hua, G.: CVAE-GAN: fine-grained image generation through asymmetric training. In: ICCV (2017)

  5. Bengio, Y., Courville, A., Vincent, P.: Representation learning: a review and new perspectives. TPAMI (2013)

  6. Brock, A., Lim, T., Ritchie, J.M., Weston, N.: Neural photo editing with introspective adversarial networks. In: ICLR (2017)

  7. Burgess, C.P., et al.: Understanding disentangling in \(\beta \)-VAE. In: NeurIPS (2017)

  8. Chen, J., Batmanghelich, K.: Weakly supervised disentanglement by pairwise similarities. In: AAAI (2020)

  9. Chen, T.Q., Li, X., Grosse, R., Duvenaud, D.: Isolating sources of disentanglement in variational autoencoders. In: NeurIPS (2018)

  10. Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I., Abbeel, P.: InfoGAN: interpretable representation learning by information maximizing generative adversarial nets. In: NeurIPS (2016)

  11. Creager, E., et al.: Flexibly fair representation learning by disentanglement. In: ICML (2019)

  12. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: CVPR (2009)

  13. Dosovitskiy, A., Brox, T.: Generating images with perceptual similarity metrics based on deep networks. In: NeurIPS (2016)

  14. Fréchet, M.: Sur la distance de deux lois de probabilité. Comptes Rendus Hebdomadaires des Séances de l'Académie des Sciences (1957)

  15. Goodfellow, I.J., et al.: Generative adversarial nets. In: NeurIPS (2014)

  16. Gómez-Bombarelli, R., et al.: Automatic chemical design using a data-driven continuous representation of molecules. ACS Cent. Sci. 4, 268–276 (2018)

  17. Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., Hochreiter, S.: GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In: NeurIPS (2017)

  18. Higgins, I., et al.: \(\beta \)-VAE: learning basic visual concepts with a constrained variational framework. In: ICLR (2017)

  19. Hoffman, M.D., Johnson, M.J.: ELBO surgery: yet another way to carve up the variational evidence lower bound. In: NeurIPS (2016)

  20. Huang, H., Li, Z., He, R., Sun, Z., Tan, T.: IntroVAE: introspective variational autoencoders for photographic image synthesis. In: NeurIPS (2018)

  21. Huang, X., Liu, M.-Y., Belongie, S., Kautz, J.: Multimodal unsupervised image-to-image translation. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11207, pp. 179–196. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01219-9_11

  22. Jeon, I., Lee, W., Kim, G.: IB-GAN: disentangled representation learning with information bottleneck GAN (2019). https://openreview.net/forum?id=ryljV2A5KX

  23. Johnson, J., Alahi, A., Fei-Fei, L.: Perceptual losses for real-time style transfer and super-resolution. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9906, pp. 694–711. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46475-6_43

  24. Karras, T., Aila, T., Laine, S., Lehtinen, J.: Progressive growing of GANs for improved quality, stability, and variation. In: ICLR (2018)

  25. Khemakhem, I., Kingma, D., Hyvärinen, A.: Variational autoencoders and nonlinear ICA: a unifying framework. arXiv preprint arXiv:1907.04809 (2019)

  26. Kim, H., Mnih, A.: Disentangling by factorising. In: ICML (2018)

  27. Krause, J., Stark, M., Deng, J., Fei-Fei, L.: 3D object representations for fine-grained categorization. In: 4th International IEEE Workshop on 3D Representation and Recognition (3dRR-13) (2013)

  28. Lample, G., Zeghidour, N., Usunier, N., Bordes, A., Denoyer, L., Ranzato, M.A.: Fader networks: manipulating images by sliding attributes. In: NeurIPS (2017)

  29. Larsen, A.B.L., Sønderby, S.K., Larochelle, H., Winther, O.: Autoencoding beyond pixels using a learned similarity metric. In: ICML (2016)

  30. Ledig, C., et al.: Photo-realistic single image super-resolution using a generative adversarial network. In: CVPR (2017)

  31. Lee, H.-Y., Tseng, H.-Y., Huang, J.-B., Singh, M., Yang, M.-H.: Diverse image-to-image translation via disentangled representations. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11205, pp. 36–52. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01246-5_3

  32. Lezama, J.: Overcoming the disentanglement vs reconstruction trade-off via Jacobian supervision. In: ICLR (2019)

  33. Lin, Z., Thekumparampil, K.K., Fanti, G.C., Oh, S.: InfoGAN-CR: disentangling generative adversarial networks with contrastive regularizers. In: ICML (2020)

  34. Liu, B., Zhu, Y., Fu, Z., de Melo, G., Elgammal, A.: OOGAN: disentangling GAN with one-hot sampling and orthogonal regularization. In: AAAI (2020)

  35. Liu, Z., Luo, P., Wang, X., Tang, X.: Deep learning face attributes in the wild. In: ICCV (2015)

  36. Locatello, F., Abbati, G., Rainforth, T., Bauer, S., Schölkopf, B., Bachem, O.: On the fairness of disentangled representations. In: NeurIPS (2019)

  37. Locatello, F., Bauer, S., Lucic, M., Gelly, S., Schölkopf, B., Bachem, O.: Challenging common assumptions in the unsupervised learning of disentangled representations. In: ICML (2019)

  38. Locatello, F., Tschannen, M., Bauer, S., Rätsch, G., Schölkopf, B., Bachem, O.: Disentangling factors of variations using few labels. In: ICLR (2020)

  39. Makhzani, A., Shlens, J., Jaitly, N., Goodfellow, I., Frey, B.: Adversarial autoencoders. In: ICLR (2016)

  40. Mathieu, E., Rainforth, T., Siddharth, N., Teh, Y.W.: Disentangling disentanglement in variational auto-encoders. In: Bayesian Deep Learning Workshop, NeurIPS (2018)

  41. Matthey, L., Higgins, I., Hassabis, D., Lerchner, A.: dSprites: disentanglement testing sprites dataset (2017). https://github.com/deepmind/dsprites-dataset/

  42. Mescheder, L., Nowozin, S., Geiger, A.: Which training methods for GANs do actually converge? In: ICML (2018)

  43. Minka, T.: Divergence measures and message passing. Technical report, Microsoft Research (2005)

  44. Narayanaswamy, S., et al.: Learning disentangled representations with semi-supervised deep generative models. In: NeurIPS (2017)

  45. Nguyen-Phuoc, T., Li, C., Theis, L., Richardt, C., Yang, Y.L.: HoloGAN: unsupervised learning of 3D representations from natural images. In: ICCV (2019)

  46. Kingma, D.P., Welling, M.: Auto-encoding variational Bayes. In: ICLR (2014)

  47. Peng, X.B., Kanazawa, A., Toyer, S., Abbeel, P., Levine, S.: Variational discriminator bottleneck: improving imitation learning, inverse RL, and GANs by constraining information flow. In: ICLR (2019)

  48. Ruiz, A., Martínez, O., Binefa, X., Verbeek, J.: Learning disentangled representations with reference-based variational autoencoders. arXiv preprint arXiv:1901.08534 (2019)

  49. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: CVPR (2016)

  50. Tolstikhin, I., Bousquet, O., Gelly, S., Schoelkopf, B.: Wasserstein auto-encoders. In: ICLR (2018)

  51. Tschannen, M., Bachem, O.F., Lučić, M.: Recent advances in autoencoder-based representation learning. In: Bayesian Deep Learning Workshop, NeurIPS (2018)

  52. Watanabe, S.: Information theoretical analysis of multivariate correlation. IBM J. Res. Dev. 4, 66–82 (1960)

  53. Watters, N., Matthey, L., Burgess, C.P., Lerchner, A.: Spatial broadcast decoder: a simple architecture for learning disentangled representations in VAEs. arXiv preprint arXiv:1901.07017 (2019)

  54. Wei, X., Liu, Z., Wang, L., Gong, B.: Improving the improved training of Wasserstein GANs. In: ICLR (2018)

  55. Zhu, J.Y., et al.: Toward multimodal image-to-image translation. In: NeurIPS (2017)

Acknowledgement

This work was supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (2020-0-00153 and 2016-0-00464).

Author information

Corresponding author

Correspondence to Seunghoon Hong.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 17895 KB)

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Lee, W., Kim, D., Hong, S., Lee, H. (2020). High-Fidelity Synthesis with Disentangled Representation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12371. Springer, Cham. https://doi.org/10.1007/978-3-030-58574-7_10

  • DOI: https://doi.org/10.1007/978-3-030-58574-7_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58573-0

  • Online ISBN: 978-3-030-58574-7

  • eBook Packages: Computer Science (R0)
