
Learning Multi-layer Latent Variable Model via Variational Optimization of Short Run MCMC for Approximate Inference

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12351)

Abstract

This paper studies the fundamental problem of learning deep generative models that consist of multiple layers of latent variables organized in top-down architectures. Such models have high expressivity and allow for learning hierarchical representations. Learning such a generative model requires inferring the latent variables for each training example from their posterior distribution. This inference typically requires Markov chain Monte Carlo (MCMC), which can be time-consuming. In this paper, we propose to use noise-initialized, non-persistent short-run MCMC, such as finite-step Langevin dynamics initialized from the prior distribution of the latent variables, as an approximate inference engine, where the step size of the Langevin dynamics is variationally optimized by minimizing the Kullback-Leibler divergence between the distribution produced by the short-run MCMC and the posterior distribution. Our experiments show that the proposed method outperforms the variational auto-encoder (VAE) in terms of reconstruction error and synthesis quality. The proposed method has the advantage of being simple and automatic, with no need to design an inference model.
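For concreteness, the following is a minimal sketch (not the authors' implementation) of short-run Langevin inference for a single-layer generator with a Gaussian prior and Gaussian observation noise. The architecture, the fixed step size, and all names (Generator, short_run_langevin, sigma, n_steps) are illustrative assumptions; the paper's multi-layer latent structure and the variational optimization of the step size are omitted here.

# Sketch of noise-initialized, non-persistent short-run Langevin inference
# for a toy generator z -> x. Assumptions: single latent layer, fixed step size.
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Toy top-down generator g_theta mapping latent z to observation x."""
    def __init__(self, z_dim=16, x_dim=784):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim, 256), nn.GELU(),
            nn.Linear(256, x_dim),
        )

    def forward(self, z):
        return self.net(z)

def log_joint(g, x, z, sigma=0.3):
    """log p(x, z) up to a constant: Gaussian likelihood plus standard normal prior."""
    recon = g(z)
    log_lik = -((x - recon) ** 2).sum(dim=1) / (2.0 * sigma ** 2)
    log_prior = -(z ** 2).sum(dim=1) / 2.0
    return log_lik + log_prior

def short_run_langevin(g, x, z_dim=16, n_steps=20, step_size=0.1, sigma=0.3):
    """Short-run Langevin dynamics initialized from the prior.

    z_0 ~ N(0, I); z_{k+1} = z_k + (s^2 / 2) * grad_z log p(x, z_k) + s * eps_k.
    In the paper the step size s is tuned variationally; here it is a fixed
    hyperparameter for illustration.
    """
    z = torch.randn(x.size(0), z_dim, device=x.device)  # initialize from the prior
    for _ in range(n_steps):
        z = z.detach().requires_grad_(True)
        grad = torch.autograd.grad(log_joint(g, x, z, sigma).sum(), z)[0]
        z = z + 0.5 * step_size ** 2 * grad + step_size * torch.randn_like(z)
    return z.detach()

# Usage: infer latents for a batch, then update the generator by reconstruction.
g = Generator()
x = torch.randn(8, 784)                     # placeholder data batch
z = short_run_langevin(g, x)                # approximate posterior samples
loss = ((x - g(z)) ** 2).sum(dim=1).mean()  # maximum-likelihood style update
loss.backward()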

Notes

Acknowledgments

The work is supported by NSF DMS-2015577, DARPA XAI N66001-17-2-4029, ARO W911NF1810296, ONR MURI N00014-16-1-2007, and XSEDE grant ASC170063. We thank NVIDIA for the donation of Titan V GPUs. We thank Eric Fischer for the assistance with experiments.

Supplementary material

504443_1_En_22_MOESM1_ESM.pdf — Supplementary material 1 (PDF, 209 KB)


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. University of California, Los Angeles, USA
  2. Stevens Institute of Technology, Hoboken, USA
