The Hessian Penalty: A Weak Prior for Unsupervised Disentanglement

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12351)

Abstract

Existing disentanglement methods for deep generative models rely on hand-picked priors and complex encoder-based architectures. In this paper, we propose the Hessian Penalty, a simple regularization term that encourages the Hessian of a generative model with respect to its input to be diagonal. We introduce a model-agnostic, unbiased stochastic approximation of this term based on Hutchinson’s estimator to compute it efficiently during training. Our method can be applied to a wide range of deep generators with just a few lines of code. We show that training with the Hessian Penalty often causes axis-aligned disentanglement to emerge in latent space when applied to ProGAN on several datasets. Additionally, we use our regularization term to identify interpretable directions in BigGAN’s latent space in an unsupervised fashion. Finally, we provide empirical evidence that the Hessian Penalty encourages substantial shrinkage when applied to over-parameterized latent spaces. Videos of our disentanglement results are available at www.wpeebles.com/hessian-penalty, and code is available at https://github.com/wpeebles/hessian_penalty.
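
To make the idea of a Hutchinson-style approximation concrete, here is a minimal NumPy sketch. It is not the authors' released implementation. It assumes a scalar-valued function G, Rademacher probe vectors v, a central finite-difference estimate of the second directional derivative v^T H v, and the sample variance of those estimates as the penalty. For a symmetric Hessian H, the variance of v^T H v over Rademacher v equals twice the sum of squared off-diagonal entries, so it is zero exactly when H is diagonal. The names `second_directional_derivative`, `hessian_penalty`, `G_entangled`, and `G_diagonal` are hypothetical and chosen for illustration only.

```python
import numpy as np


def second_directional_derivative(G, z, v, epsilon=0.1):
    """Central finite-difference estimate of v^T H v, with H the Hessian of G at z."""
    return (G(z + epsilon * v) - 2.0 * G(z) + G(z - epsilon * v)) / epsilon**2


def hessian_penalty(G, z, num_samples=8, epsilon=0.1, rng=None):
    """Sample variance of v^T H v over random Rademacher directions v.

    Assumption for this sketch: G maps a 1-D array to a scalar.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Each row is one probe vector with independent entries in {-1, +1}.
    vs = rng.choice([-1.0, 1.0], size=(num_samples, z.shape[0]))
    estimates = np.array(
        [second_directional_derivative(G, z, v, epsilon) for v in vs]
    )
    # ddof=1 gives the unbiased sample variance.
    return estimates.var(ddof=1)


# Toy functions (illustrative only): quadratics whose Hessians are known exactly.
A = np.array([[1.0, 0.5, 0.0],
              [0.5, 2.0, -1.0],
              [0.0, -1.0, 3.0]])


def G_entangled(z):
    # Hessian is A, which has non-zero off-diagonal entries.
    return 0.5 * z @ A @ z


def G_diagonal(z):
    # Hessian is diag(A), so the penalty should vanish.
    return 0.5 * z @ np.diag(np.diag(A)) @ z


z0 = np.array([0.3, -1.2, 0.7])

# Analytic target for the entangled case: 2 * sum of squared off-diagonal entries.
off_diag_energy = 2.0 * (np.sum(A**2) - np.sum(np.diag(A) ** 2))

penalty_entangled = hessian_penalty(G_entangled, z0, num_samples=4000,
                                    rng=np.random.default_rng(0))
penalty_diagonal = hessian_penalty(G_diagonal, z0, num_samples=16,
                                   rng=np.random.default_rng(0))
```

In this toy setting `penalty_entangled` should land close to `off_diag_energy`, while `penalty_diagonal` should be zero up to floating-point error. Because the sketch only needs forward evaluations of G, the same pattern could in principle be wrapped around any generator, which is one way to read the "model-agnostic" claim in the abstract.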

Notes

Acknowledgments

We thank Pieter Abbeel, Taesung Park, Richard Zhang, Mathieu Aubry, Ilija Radosavovic, Tim Brooks, Karttikeya Mangalam, and all of BAIR for valuable discussions and encouragement. This work was supported, in part, by grants from SAP, Adobe, and Berkeley DeepDrive.

Supplementary material

Supplementary material 1 (pdf 8648 KB)

Supplementary material 2 (mp4 94740 KB)

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. University of California, Berkeley, Berkeley, USA
  2. Yale University, New Haven, USA
  3. Adobe Research, San Jose, USA
  4. MIT CSAIL, Cambridge, USA
