Abstract

Chapter 12 explained that learning models can be divided into discriminative and generative models. The Variational Autoencoder (VAE), introduced in this chapter, is a generative model. Variational inference is a technique that derives a lower bound on the log-likelihood of the data and maximizes this bound rather than the log-likelihood itself, as Maximum Likelihood Estimation (MLE) does (see Chap. 12). This lower bound is usually referred to as the Evidence Lower Bound (ELBO). Learning the parameters of the latent space can be done using Expectation Maximization (EM), as in factor analysis (see Chap. 12). The VAE implements variational inference in an autoencoder neural network setup, where the encoder and decoder model the E-step (expectation step) and the M-step (maximization step) of EM, respectively. In practice, however, the VAE is usually trained with backpropagation. Variational inference and the VAE appear in many Bayesian analysis applications; for example, variational inference has been used in 3D human motion analysis, and the VAE has been used in forecasting.
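
To make the correspondence between the encoder, the decoder, and the ELBO concrete, below is a minimal PyTorch sketch of a VAE trained by backpropagation on the negative ELBO. It is only an illustration under assumed choices: the layer sizes, the 784-dimensional input with a Bernoulli (sigmoid) decoder, and the names VAE and negative_elbo are not taken from the chapter.

```python
# Minimal VAE sketch (illustrative assumptions: layer sizes, Bernoulli decoder, MNIST-like 784-dim input).
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, x_dim=784, h_dim=256, z_dim=20):
        super().__init__()
        # Encoder q(z|x): plays the role of the E-step, producing the
        # variational posterior's mean and log-variance.
        self.enc = nn.Linear(x_dim, h_dim)
        self.mu = nn.Linear(h_dim, z_dim)
        self.logvar = nn.Linear(h_dim, z_dim)
        # Decoder p(x|z): plays the role of the M-step, reconstructing x from z.
        self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(),
                                 nn.Linear(h_dim, x_dim), nn.Sigmoid())

    def forward(self, x):
        h = F.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: z = mu + sigma * eps keeps sampling differentiable.
        eps = torch.randn_like(mu)
        z = mu + torch.exp(0.5 * logvar) * eps
        return self.dec(z), mu, logvar

def negative_elbo(x, x_hat, mu, logvar):
    # Reconstruction term: expected log-likelihood under a Bernoulli decoder.
    rec = F.binary_cross_entropy(x_hat, x, reduction='sum')
    # KL(q(z|x) || N(0, I)) in closed form for diagonal Gaussians.
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kl  # minimizing this maximizes the ELBO

# Usage sketch: one gradient step on a random batch (stand-in for real data).
model = VAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(32, 784)
x_hat, mu, logvar = model(x)
loss = negative_elbo(x, x_hat, mu, logvar)
opt.zero_grad(); loss.backward(); opt.step()
```

The reparameterization step is what keeps the sampled latent code differentiable with respect to the encoder outputs, so the gradient of the negative ELBO can flow back through the encoder during backpropagation.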

References

  1. Jincheng Bai, Qifan Song, and Guang Cheng. “Nearly Optimal Variational Inference for High Dimensional Regression with Shrinkage Priors”. In: arXiv preprint arXiv:2010.12887 (2020).

    Google Scholar 

  2. Christopher M Bishop. Pattern recognition and machine learning. Springer, 2006.

    Google Scholar 

  3. Guillaume Bouchard and Bill Triggs. “The tradeoff between generative and discriminative classifiers”. In: 16th IASC International Symposium on Computational Statistics. 2004.

    Google Scholar 

  4. Anthony L Caterini, Arnaud Doucet, and Dino Sejdinovic. “Hamiltonian variational auto-encoder”. In: Advances in Neural Information Processing Systems 31 (2018), pp. 8167–8177.

    Google Scholar 

  5. Carl Doersch. “Tutorial on variational autoencoders”. In: arXiv preprint arXiv:1606.05908 (2016).

    Google Scholar 

  6. John Duchi. Derivations for linear algebra and optimization. Tech. rep. Berkeley, California, 2007.

    Google Scholar 

  7. Mikhail Figurnov, Shakir Mohamed, and Andriy Mnih. “Implicit reparameterization gradients”. In: Advances in Neural Information Processing Systems 31 (2018), pp. 441–452.

    Google Scholar 

  8. Benjamin Fruchter. Introduction to factor analysis. Van Nostrand, 1954.

    MATH  Google Scholar 

  9. Benyamin Ghojogh et al. “Sampling Algorithms, from Survey Sampling to Monte Carlo Methods: Tutorial and Literature Review”. In: arXiv preprint arXiv:2011.00901 (2020).

    Google Scholar 

  10. Ian Goodfellow et al. “Generative adversarial nets”. In: Advances in neural information processing systems, 2014, pp. 2672–2680.

    Google Scholar 

  11. Gaëtan Hadjeres, Frank Nielsen, and François Pachet. “GLSR-VAE: Geodesic latent space regularization for variational autoencoder architectures”. In: 2017 IEEE Symposium Series on Computational Intelligence. IEEE. 2017, pp. 1–7.

    Google Scholar 

  12. John R Hershey and Peder A Olsen “Approximating the Kullback Leibler divergence between Gaussian mixture models”. In: 2007 IEEE International Conference on Acoustics, Speech and Signal Processing. Vol. 4. IEEE. 2007, pp. IV–317.

    Google Scholar 

  13. Xianxu Hou et al. “Deep feature consistent variational autoencoder”. In: 2017 IEEE Winter Conference on Applications of Computer Vision. IEEE. 2017, pp. 1133–1141.

    Google Scholar 

  14. Prateek Jain and Purushottam Kar. “Non-convex optimization for machine learning”. In: Foundations and Trends®in Machine Learning 10.3–4 (2017), pp. 142–336.

    Google Scholar 

  15. Diederik P Kingma and Max Welling. “Auto-encoding variational Bayes”. In: International Conference on Learning Representations. 2014.

    Google Scholar 

  16. Shiqi Liu et al. “Discovering influential factors in variational autoencoders”. In: Pattern Recognition 100 (2020), p. 107166.

    Google Scholar 

  17. Lars Mescheder, Sebastian Nowozin, and Andreas Geiger. “Adversarial variational bayes: Unifying variational autoencoders and generative adversarial networks”. In: International Conference on Machine Learning. 2017.

    Google Scholar 

  18. Andrew Y Ng and Michael I Jordan. “On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes”. In: Advances in neural information processing systems. 2002, pp. 841–848.

    Google Scholar 

  19. Stephen Odaibo. “Tutorial: Deriving the Standard Variational Autoencoder (VAE) Loss Function”. In: arXiv preprint arXiv:1907.08956 (2019).

    Google Scholar 

  20. Yunchen Pu et al. “Variational autoencoder for deep learning of images, labels and captions”. In: Advances in neural information processing systems. 2016, pp. 2352–2360.

    Google Scholar 

  21. Danilo Jimenez Rezende, Shakir Mohamed, and Daan Wierstra. “Stochastic backpropagation and approximate inference in deep generative models”. In: International Conference on Machine Learning. 2014.

    Google Scholar 

  22. David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams. “Learning representations by backpropagating errors”. In: Nature 323.6088 (1986), pp. 533–536.

    Google Scholar 

  23. Cristian Sminchisescu and Allan Jepson. “Generative modeling for continuous non-linearly embedded visual inference”. In: Proceedings of the twenty-first international conference on Machine learning. 2004, p. 96.

    Google Scholar 

  24. Hiroshi Takahashi et al. “Variational autoencoder with implicit optimal priors”. In: Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 33. 2019, pp. 5066–5073.

    Google Scholar 

  25. Michalis Titsias and Miguel Làzaro-Gredilla. “Doubly stochastic variational Bayes for non-conjugate inference”. In: International conference on machine learning. 2014, pp. 1971–1979.

    Google Scholar 

  26. Arash Vahdat and Jan Kautz. “NVAE: A deep hierarchical variational autoencoder”. In: Advances in Neural Information Processing Systems 33 (2020).

    Google Scholar 

  27. Jacob Walker et al. “An uncertain future: Forecasting from static images using variational autoencoders”. In: European Conference on Computer Vision. Springer. 2016, pp. 835–851.

    Google Scholar 

  28. Weichang Yu, Lamiae Azizi, and John T Ormerod. “Variational nonparametric discriminant analysis”. In: Computational Statistics & Data Analysis 142 (2020), p. 106817.

    Google Scholar 

  29. Shengjia Zhao, Jiaming Song, and Stefano Ermon. “Towards deeper understanding of variational autoencoding models”. In: arXiv preprint arXiv:1702.08658 (2017).

    Google Scholar 

Download references

Appendix 20.1: Proof for Lemma 20.1

$$\displaystyle \begin{aligned} \text{KL}(p_1\|p_2) &= \int p_1(x) \log\big(\frac{p_1(x)}{p_2(x)}\big) dx \\ &=\int p_1(x) \log(p_1(x))\, dx - \int p_1(x) \log(p_2(x))\, dx. \end{aligned} $$

The first integral is the negative differential entropy of a Gaussian:

$$\displaystyle \begin{aligned} \int p_1(x) \log(p_1(x))\, dx = -\frac{1}{2} (1 + \log (2 \pi \sigma_1^2)). \end{aligned} $$

For the second integral:

$$\displaystyle \begin{aligned} &-\int p_1(x) \log(p_2(x))\, dx = -\int p_1(x) \log\Big( \frac{1}{\sqrt{2 \pi \sigma_2^2}} e^{-\frac{(x - \mu_2)^2}{2 \sigma_2^2}} \Big)\, dx \\ &= \frac{1}{2} \log( 2 \pi \sigma_2^2 ) \underbrace{\int p_1(x) dx}_{=1} -\int p_1(x) \log\big( e^{-\frac{(x - \mu_2)^2}{2 \sigma_2^2}} \big)\, dx \\ &= \frac{1}{2} \log( 2 \pi \sigma_2^2 ) +\int p_1(x) \frac{(x - \mu_2)^2}{2 \sigma_2^2}\, dx \\ &= \frac{1}{2} \log( 2 \pi \sigma_2^2 ) + \frac{1}{2 \sigma_2^2} \Big(\int p_1(x) x^2 dx - \int p_1(x) 2x\mu_2 dx + \int p_1(x) \mu_2^2 dx\Big) \\ &= \frac{1}{2} \log( 2 \pi \sigma_2^2 ) + \frac{1}{2 \sigma_2^2} \Big(\mathbb{E}_{\sim p_1(x)} [x^2] - 2 \mu_2 \mathbb{E}_{\sim p_1(x)} [x] + \mu_2^2 \Big). \end{aligned} $$

It is known that:

$$\displaystyle \begin{aligned} \mathbb{V}\text{ar}_{\sim p_1(x)}[x] = \mathbb{E}_{\sim p_1(x)}[x^2] - \mathbb{E}_{\sim p_1(x)}[x]^2 \implies \mathbb{E}_{\sim p_1(x)}[x^2] = \sigma_1^2 + \mu_1^2. \end{aligned} $$

Therefore:

$$\displaystyle \begin{aligned} -\int p_1(x) \log(p_2(x))\, dx &= \frac{1}{2} \log( 2 \pi \sigma_2^2 ) + \frac{1}{2 \sigma_2^2} \Big(\sigma_1^2 + \mu_1^2 - 2 \mu_2 \mu_1 + \mu_2^2 \Big) \\ &= \frac{1}{2} \log( 2 \pi \sigma_2^2 ) + \frac{1}{2 \sigma_2^2} \big( \sigma_1^2 + (\mu_1 - \mu_2)^2 \big). \end{aligned} $$

Lastly:

$$\displaystyle \begin{aligned} \text{KL}(p_1\|p_2) &= -\frac{1}{2} (1 + \log (2 \pi \sigma_1^2)) + \frac{1}{2} \log( 2 \pi \sigma_2^2 ) + \frac{1}{2 \sigma_2^2} \big( \sigma_1^2 + (\mu_1 - \mu_2)^2 \big) \\ &= \log\Big(\frac{\sigma_2}{\sigma_1}\Big) + \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2 \sigma_2^2} - \frac{1}{2}. \end{aligned} $$
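
As a quick sanity check on this closed form (an illustrative addition, not part of the proof; the parameter values below are arbitrary), the following Python snippet compares the lemma's formula with a Monte Carlo estimate of the same KL divergence using samples from p1:

```python
# Numerical check of KL(p1 || p2) between two univariate Gaussians (illustrative parameter values).
import numpy as np

rng = np.random.default_rng(0)
mu1, sigma1 = 0.5, 1.2    # parameters of p1 (assumed for the check)
mu2, sigma2 = -1.0, 2.0   # parameters of p2 (assumed for the check)

def log_gaussian(x, mu, sigma):
    # Log-density of N(mu, sigma^2) evaluated at x.
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)

# Closed form from the lemma: log(sigma2/sigma1) + (sigma1^2 + (mu1 - mu2)^2) / (2 sigma2^2) - 1/2.
kl_closed = np.log(sigma2 / sigma1) + (sigma1**2 + (mu1 - mu2)**2) / (2 * sigma2**2) - 0.5

# Monte Carlo estimate of E_{p1}[log p1(x) - log p2(x)] using samples from p1.
x = rng.normal(mu1, sigma1, size=1_000_000)
kl_mc = np.mean(log_gaussian(x, mu1, sigma1) - log_gaussian(x, mu2, sigma2))

print(f"closed form: {kl_closed:.4f}, Monte Carlo: {kl_mc:.4f}")  # the two values should agree closely
```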

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Ghojogh, B., Crowley, M., Karray, F., Ghodsi, A. (2023). Variational Autoencoders. In: Elements of Dimensionality Reduction and Manifold Learning. Springer, Cham. https://doi.org/10.1007/978-3-031-10602-6_20

  • DOI: https://doi.org/10.1007/978-3-031-10602-6_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-10601-9

  • Online ISBN: 978-3-031-10602-6

  • eBook Packages: Computer Science, Computer Science (R0)
