dpVAEs: Fixing Sample Generation for Regularized VAEs

  • Conference paper
Computer Vision – ACCV 2020 (ACCV 2020)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12625)

Abstract

Unsupervised representation learning via generative modeling is a staple of many computer vision applications in the absence of labeled data. Variational Autoencoders (VAEs) are powerful generative models that learn representations useful for data generation. However, due to inherent challenges in the training objective, VAEs fail to learn representations amenable to downstream tasks. Regularization-based methods that attempt to improve the representation learning aspect of VAEs come at a price: poor sample generation. In this paper, we explore this representation-generation trade-off for regularized VAEs and introduce a new family of priors, namely decoupled priors, or dpVAEs, that decouple the representation space from the generation space. This decoupling enables the use of VAE regularizers on the representation space without impacting the distribution used for sample generation, thereby reaping the representation learning benefits of the regularizers without sacrificing sample generation. dpVAE leverages invertible networks to learn a bijective mapping from an arbitrarily complex representation distribution to a simple, tractable, generative distribution. Decoupled priors can be adapted to state-of-the-art VAE regularizers without additional hyperparameter tuning. We showcase the use of dpVAEs with different regularizers. Experiments on MNIST, SVHN, and CelebA demonstrate, quantitatively and qualitatively, that dpVAE fixes sample generation and improves representation learning for regularized VAEs.
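The mechanism the abstract describes, mapping a simple, tractable generation-space density through an invertible network into a flexible representation-space prior, is the standard normalizing-flow construction. Below is a minimal NumPy sketch of a single RealNVP-style affine coupling layer used as such a decoupled prior; it is an illustration under stated assumptions, not the authors' implementation, and every name in it (coupling_forward, log_prior, the toy weight matrices) is hypothetical.

```python
# Sketch of a decoupled prior: an invertible map f takes a simple
# generation-space variable z0 ~ N(0, I) to a flexible representation-space
# variable z = f(z0); log p(z) then follows from change of variables.
import numpy as np

rng = np.random.default_rng(0)
D = 4  # latent dimensionality (illustrative choice)

# Toy scale/shift "networks": one fixed random linear layer each, acting on
# the first half of the latent vector. A real model would use trained MLPs.
W_s = 0.1 * rng.standard_normal((D // 2, D // 2))
W_t = 0.1 * rng.standard_normal((D // 2, D // 2))

def coupling_forward(z0):
    """z = f(z0): identity on the first half, affine on the second half."""
    a, b = z0[: D // 2], z0[D // 2 :]
    s, t = np.tanh(W_s @ a), W_t @ a            # bounded scale, free shift
    z = np.concatenate([a, b * np.exp(s) + t])
    return z, s.sum()                            # log|det dz/dz0| = sum(s)

def coupling_inverse(z):
    """z0 = f^{-1}(z): exact inverse of the affine coupling."""
    a, bz = z[: D // 2], z[D // 2 :]
    s, t = np.tanh(W_s @ a), W_t @ a
    b = (bz - t) * np.exp(-s)
    return np.concatenate([a, b]), -s.sum()      # log|det dz0/dz| = -sum(s)

def log_prior(z):
    """log p(z) = log N(f^{-1}(z); 0, I) + log|det d f^{-1}/dz|."""
    z0, log_det_inv = coupling_inverse(z)
    log_n01 = -0.5 * (z0 @ z0 + D * np.log(2 * np.pi))
    return log_n01 + log_det_inv

# Sampling goes the other way: draw z0 from the simple generative
# distribution and push it through f, then decode z as in a standard VAE.
z0 = rng.standard_normal(D)
z, _ = coupling_forward(z0)
print("z =", z, " log p(z) =", log_prior(z))
```

In a full dpVAE, f would be a deep stack of trained coupling layers: the KL term of the ELBO evaluates log_prior on encoder samples in the representation space, while sample generation draws z0 from the standard normal and pushes it through the forward map before decoding.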

Notes

  1. Complete derivation can be found in the supplementary material.

  2. The ELBOs for these regularizers can be found in the supplementary material (a generic sketch of the decoupled-prior ELBO follows these notes).

  3. Architectures and hyperparameters are described in the supplementary material, which also presents results showing that representation learning (specifically disentanglement) is not adversely affected by the introduction of decoupled priors.
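For orientation, the quantity these notes refer to is the standard VAE evidence lower bound with the decoupled prior substituted in via the change-of-variables identity. The following is a reconstruction from standard VAE and normalizing-flow formulas, not the paper's exact derivation:

\begin{align}
\log p_\theta(x) &\geq \mathbb{E}_{q_\phi(z|x)}\left[\log p_\theta(x|z)\right] - \mathrm{KL}\left(q_\phi(z|x)\,\|\,p(z)\right),\\
\log p(z) &= \log p_0\!\left(f^{-1}(z)\right) + \log\left|\det \frac{\partial f^{-1}(z)}{\partial z}\right|,
\end{align}

where $p_0$ is a simple base density on the generation space, such as $\mathcal{N}(0, I)$, and $f$ is the invertible network defining the decoupled prior.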

Author information

Corresponding author

Correspondence to Shireen Elhabian.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 13044 KB)

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Bhalodia, R., Lee, I., Elhabian, S. (2021). dpVAEs: Fixing Sample Generation for Regularized VAEs. In: Ishikawa, H., Liu, C.L., Pajdla, T., Shi, J. (eds) Computer Vision – ACCV 2020. ACCV 2020. Lecture Notes in Computer Science, vol 12625. Springer, Cham. https://doi.org/10.1007/978-3-030-69538-5_39

  • DOI: https://doi.org/10.1007/978-3-030-69538-5_39

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-69537-8

  • Online ISBN: 978-3-030-69538-5

  • eBook Packages: Computer Science, Computer Science (R0)
