The Hessian Penalty: A Weak Prior for Unsupervised Disentanglement

Peebles, William; Peebles, John; Zhu, Jun-Yan; Efros, Alexei; Torralba, Antonio

doi:10.1007/978-3-030-58539-6_35

William Peebles¹²,
John Peebles¹³,
Jun-Yan Zhu¹⁴,
Alexei Efros¹² &
…
Antonio Torralba¹⁵

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 12351))

Included in the following conference series:

European Conference on Computer Vision

3955 Accesses
38 Citations

Abstract

Existing disentanglement methods for deep generative models rely on hand-picked priors and complex encoder-based architectures. In this paper, we propose the Hessian Penalty, a simple regularization term that encourages the Hessian of a generative model with respect to its input to be diagonal. We introduce a model-agnostic, unbiased stochastic approximation of this term based on Hutchinson’s estimator to compute it efficiently during training. Our method can be applied to a wide range of deep generators with just a few lines of code. We show that training with the Hessian Penalty often causes axis-aligned disentanglement to emerge in latent space when applied to ProGAN on several datasets. Additionally, we use our regularization term to identify interpretable directions in BigGAN’s latent space in an unsupervised fashion. Finally, we provide empirical evidence that the Hessian Penalty encourages substantial shrinkage when applied to over-parameterized latent spaces. We encourage readers to view videos of our disentanglement results at www.wpeebles.com/hessian-penalty, and code at https://github.com/wpeebles/hessian_penalty.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 84.99; Price excludes VAT (USA)

Softcover Book: USD 109.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

Avron, H., Toledo, S.: Randomized algorithms for estimating the trace of an implicit symmetric positive semi-definite matrix. J. ACM (JACM) 58(2), 1–34 (2011)
Article MathSciNet Google Scholar
Bau, D., et al.: Semantic photo manipulation with a generative image prior. In: SIGGRAPH (2019)
Google Scholar
Bau, D., et al.: Gan dissection: visualizing and understanding generative adversarial networks. In: International Conference on Learning Representations (ICLR) (2019)
Google Scholar
Bengio, Y.: Deep learning of representations: looking forward. In: Dediu, A.-H., Martín-Vide, C., Mitkov, R., Truthe, B. (eds.) SLSP 2013. LNCS (LNAI), vol. 7978, pp. 1–37. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-39593-2_1
Chapter Google Scholar
Brock, A., Donahue, J., Simonyan, K.: Large scale GAN training for high fidelity natural image synthesis. In: International Conference on Learning Representations (ICLR) (2019)
Google Scholar
Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I., Abbeel, P.: InfoGAN: interpretable representation learning by information maximizing generative adversarial nets. In: Advances in Neural Information Processing Systems, pp. 2172–2180 (2016)
Google Scholar
Collins, E., Bala, R., Price, B., Susstrunk, S.: Editing in style: uncovering the local semantics of GANs. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2020)
Google Scholar
Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2009)
Google Scholar
Goodfellow, I., et al.: Generative adversarial nets. In: NIPS (2014)
Google Scholar
Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., Courville, A.C.: Improved training of Wasserstein GANs. In: Advances in Neural Information Processing Systems (2017)
Google Scholar
Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., Hochreiter, S.: GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In: Advances in Neural Information Processing Systems (2017)
Google Scholar
Higgins, I., et al.: beta-VAE: learning basic visual concepts with a constrained variational framework. In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, 24–26 April 2017, Conference Track Proceedings. OpenReview.net (2017). https://openreview.net/forum?id=Sy2fzU9gl
Hu, Q., Szabó, A., Portenier, T., Favaro, P., Zwicker, M.: Disentangling factors of variation by mixing them. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018)
Google Scholar
Huang, X., Liu, M.-Y., Belongie, S., Kautz, J.: Multimodal unsupervised image-to-image translation. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11207, pp. 179–196. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01219-9_11
Chapter Google Scholar
Hutchinson, M.F.: A stochastic estimator of the trace of the influence matrix for Laplacian smoothing splines. Commun. Stat.-Simul. Comput. 18(3), 1059–1076 (1989)
Article MathSciNet Google Scholar
Hyvärinen, A., Oja, E.: Independent component analysis: algorithms and applications. Neural Netw. 13(4–5), 411–430 (2000)
Article Google Scholar
Isola, P., Zhu, J.Y., Zhou, T., Efros, A.A.: Image-to-image translation with conditional adversarial networks. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
Google Scholar
Jahanian, A., Chai, L., Isola, P.: On the “steerability” of generative adversarial networks. arXiv preprint arXiv:1907.07171 (2019)
Jha, A.H., Anand, S., Singh, M., Veeravasarapu, V.S.R.: Disentangling factors of variation with cycle-consistent variational auto-encoders. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11207, pp. 829–845. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01219-9_49
Chapter Google Scholar
Johnson, J., Hariharan, B., van der Maaten, L., Fei-Fei, L., Lawrence Zitnick, C., Girshick, R.: CLEVR: a diagnostic dataset for compositional language and elementary visual reasoning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2901–2910 (2017)
Google Scholar
Karaletsos, T., Belongie, S., Rätsch, G.: Bayesian representation learning with oracle constraints. arXiv preprint arXiv:1506.05011 (2015)
Karras, T., Aila, T., Laine, S., Lehtinen, J.: Progressive growing of GANs for improved quality, stability, and variation. In: International Conference on Learning Representations (ICLR) (2018)
Google Scholar
Karras, T., Laine, S., Aila, T.: A style-based generator architecture for generative adversarial networks. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2019)
Google Scholar
Karras, T., Laine, S., Aittala, M., Hellsten, J., Lehtinen, J., Aila, T.: Analyzing and improving the image quality of StyleGAN. CoRR abs/1912.04958 (2019)
Google Scholar
Khemakhem, I., Kingma, D., Monti, R., Hyvarinen, A.: Variational autoencoders and nonlinear ICA: a unifying framework. In: International Conference on Artificial Intelligence and Statistics, pp. 2207–2217 (2020)
Google Scholar
Khemakhem, I., Monti, R.P., Kingma, D.P., Hyvärinen, A.: ICE-Beem: identifiable conditional energy-based deep models. arXiv preprint arXiv:2002.11537 (2020)
Kim, H., Mnih, A.: Disentangling by factorising. arXiv preprint arXiv:1802.05983 (2018)
Kingma, D., Ba, J.: Adam: a method for stochastic optimization. In: International Conference on Learning Representations (ICLR) (2015)
Google Scholar
Kingma, D.P., Welling, M.: Auto-encoding variational bayes. In: International Conference on Learning Representations (ICLR) (2014)
Google Scholar
Kulkarni, T.D., Whitney, W.F., Kohli, P., Tenenbaum, J.: Deep convolutional inverse graphics network. In: Advances in Neural Information Processing Systems, pp. 2539–2547 (2015)
Google Scholar
LeCun, Y., Chopra, S., Hadsell, R., Ranzato, M., Huang, F.: A tutorial on energy-based learning. Predicting Struct. Data 1 (2006)
Google Scholar
Locatello, F., et al.: Challenging common assumptions in the unsupervised learning of disentangled representations. In: International Conference on Machine Learning, pp. 4114–4124 (2019)
Google Scholar
Mathieu, M.F., Zhao, J., Ramesh, A., Sprechmann, P., LeCun, Y.: Disentangling factors of variation in deep representation using adversarial training. In: Advances in Neural Information Processing Systems (2016)
Google Scholar
Moosavi-Dezfooli, S.M., Fawzi, A., Uesato, J., Frossard, P.: Robustness via curvature regularization, and vice versa. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9078–9086 (2019)
Google Scholar
Nguyen-Phuoc, T., Li, C., Theis, L., Richardt, C., Yang, Y.L.: HoloGAN: unsupervised learning of 3D representations from natural images. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 7588–7597 (2019)
Google Scholar
Odena, A., et al.: Is generator conditioning causally related to GAN performance? arXiv preprint arXiv:1802.08768 (2018)
Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolutional generative adversarial networks. In: International Conference on Learning Representations (ICLR) (2016)
Google Scholar
Shen, Y., Gu, J., Tang, X., Zhou, B.: Interpreting the latent space of GANs for semantic face editing. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2020)
Google Scholar
Singh, K.K., Ojha, U., Lee, Y.J.: FineGAN: unsupervised hierarchical disentanglement for fine-grained object generation and discovery. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2019)
Google Scholar
Voynov, A., Babenko, A.: Unsupervised discovery of interpretable directions in the GAN latent space (2020)
Google Scholar
White, T.: Sampling generative networks. arXiv preprint arXiv:1609.04468 (2016)
Yang, D., Hong, S., Jang, Y., Zhao, T., Lee, H.: Diversity-sensitive conditional generative adversarial networks. arXiv preprint arXiv:1901.09024 (2019)
Yu, A., Grauman, K.: Fine-grained visual comparisons with local learning. In: CVPR (2014)
Google Scholar
Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O.: The unreasonable effectiveness of deep features as a perceptual metric. In: CVPR (2018)
Google Scholar
Zhu, J.Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. In: IEEE International Conference on Computer Vision (ICCV) (2017)
Google Scholar
Zhu, J.Y., et al.: Toward multimodal image-to-image translation. In: Advances in Neural Information Processing Systems (2017)
Google Scholar
Zhu, J.Y., et al.: Visual object networks: image generation with disentangled 3D representations. In: Advances in Neural Information Processing Systems, pp. 118–129 (2018)
Google Scholar

Download references

Acknowledgments

We thank Pieter Abbeel, Taesung Park, Richard Zhang, Mathieu Aubry, Ilija Radosavovic, Tim Brooks, Karttikeya Mangalam, and all of BAIR for valuable discussions and encouragement. This work was supported, in part, by grants from SAP, Adobe, and Berkeley DeepDrive.

Author information

Authors and Affiliations

University of California, Berkeley, Berkeley, USA
William Peebles & Alexei Efros
Yale University, New Haven, USA
John Peebles
Adobe Research, San Jose, USA
Jun-Yan Zhu
MIT CSAIL, Cambridge, USA
Antonio Torralba

Authors

William Peebles
View author publications
You can also search for this author in PubMed Google Scholar
John Peebles
View author publications
You can also search for this author in PubMed Google Scholar
Jun-Yan Zhu
View author publications
You can also search for this author in PubMed Google Scholar
Alexei Efros
View author publications
You can also search for this author in PubMed Google Scholar
Antonio Torralba
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to William Peebles .

Editor information

Editors and Affiliations

University of Oxford, Oxford, UK
Andrea Vedaldi
Graz University of Technology, Graz, Austria
Horst Bischof
University of Freiburg, Freiburg im Breisgau, Germany
Thomas Brox
University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Jan-Michael Frahm

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 2 (mp4 94740 KB)

Supplementary material 1 (pdf 8648 KB)

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Peebles, W., Peebles, J., Zhu, JY., Efros, A., Torralba, A. (2020). The Hessian Penalty: A Weak Prior for Unsupervised Disentanglement. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science(), vol 12351. Springer, Cham. https://doi.org/10.1007/978-3-030-58539-6_35

Download citation

DOI: https://doi.org/10.1007/978-3-030-58539-6_35
Published: 07 November 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58538-9
Online ISBN: 978-3-030-58539-6
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics