Machine Learning, Volume 108, Issue 8–9, pp 1701–1727

Stochastic gradient Hamiltonian Monte Carlo with variance reduction for Bayesian inference

  • Zhize Li
  • Tianyi Zhang
  • Shuyu Cheng
  • Jun Zhu
  • Jian Li
Part of the following topical collections:
  1. Special Issue of the ECML PKDD 2019 Journal Track

Abstract

Gradient-based Monte Carlo sampling algorithms, such as Langevin dynamics and Hamiltonian Monte Carlo, are important methods for Bayesian inference. In large-scale settings, full gradients are not affordable, so stochastic gradients evaluated on mini-batches are used as a replacement. To reduce the high variance of these noisy stochastic gradients, Dubey et al. (in: Advances in neural information processing systems, pp 1154–1162, 2016) applied the standard variance reduction technique to stochastic gradient Langevin dynamics and obtained both theoretical and experimental improvements. In this paper, we apply the same variance reduction techniques to Hamiltonian Monte Carlo and achieve better theoretical convergence results than variance-reduced Langevin dynamics. Moreover, we apply a symmetric splitting scheme in our variance-reduced Hamiltonian Monte Carlo algorithms to further improve the theoretical results. The experimental results are consistent with the theory: variance-reduced Hamiltonian Monte Carlo outperforms variance-reduced Langevin dynamics on Bayesian regression and classification tasks on real-world datasets.
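
The full algorithm and proofs appear in the paper body; to make the abstract's recipe concrete, the following Python is a minimal sketch of an SVRG-style control variate plugged into the SGHMC update of Chen et al. (2014). It is not the authors' code: the toy Gaussian potential `grad_u_i`, the function name `svrg_sghmc`, and all hyperparameter defaults are illustrative assumptions, and the symmetric splitting integrator discussed in the paper is omitted for brevity.

```python
import numpy as np

def grad_u_i(theta, x_i):
    """Per-datum gradient of U_i(theta) = 0.5 * ||theta - x_i||^2, i.e. the
    negative log-likelihood of a unit-variance Gaussian with mean theta.
    This toy model is an illustrative assumption, not the paper's benchmark."""
    return theta - x_i

def svrg_sghmc(data, theta0, n_outer=20, n_inner=50, batch=10,
               eta=1e-3, alpha=0.1, seed=0):
    """SVRG-style variance-reduced SGHMC sketch.

    Treats U(theta) as the *average* of the per-datum potentials U_i, so no
    extra n-scaling appears in the update. Hyperparameter names and defaults
    are assumptions for illustration."""
    rng = np.random.default_rng(seed)
    n = len(data)
    theta = np.asarray(theta0, dtype=float).copy()
    v = np.zeros_like(theta)  # momentum variable
    samples = []
    for _ in range(n_outer):
        snapshot = theta.copy()
        # SVRG anchor: one full-gradient pass at the snapshot point.
        full_grad = np.mean([grad_u_i(snapshot, x) for x in data], axis=0)
        for _ in range(n_inner):
            idx = rng.choice(n, size=batch, replace=False)
            # Control-variate gradient estimator: unbiased, and its variance
            # shrinks as theta approaches the snapshot.
            g = full_grad + np.mean(
                [grad_u_i(theta, data[i]) - grad_u_i(snapshot, data[i])
                 for i in idx], axis=0)
            # SGHMC step: friction alpha plus injected Gaussian noise.
            noise = np.sqrt(2.0 * alpha * eta) * rng.standard_normal(theta.shape)
            v = (1.0 - alpha) * v - eta * g + noise
            theta = theta + v
            samples.append(theta.copy())
    return np.array(samples)

# Usage: sample the posterior over the mean of 1-D Gaussian data; the draws
# should concentrate near the data average.
if __name__ == "__main__":
    data = np.random.default_rng(1).normal(2.0, 1.0, size=(500, 1))
    draws = svrg_sghmc(data, theta0=np.zeros(1))
    print(draws[len(draws) // 2:].mean(), data.mean())
```

The key point mirrored from the paper's setup: the mini-batch gradient is corrected by the snapshot's mini-batch gradient and the snapshot's full gradient, so the estimator stays unbiased while its variance decreases as the chain nears the snapshot, which is what yields the improved convergence bounds over plain stochastic gradients.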

Keywords

Hamiltonian Monte Carlo · Variance reduction · Bayesian inference

Acknowledgements

Funding was provided by National Basic Research Program of China (Grant No. 2015CB358700), National Natural Science Foundation of China (Grant Nos. 61772297, 61632016, 61761146003) and Microsoft Research Asia. We would like to thank Chang Liu for useful discussions.

References

  1. Ahn, S., Korattikara, A., & Welling, M. (2012). Bayesian posterior sampling via stochastic gradient Fisher scoring. In Proceedings of the 29th international conference on machine learning (pp. 1771–1778).
  2. Ahn, S., Shahbaba, B., & Welling, M. (2014). Distributed stochastic gradient MCMC. In International conference on machine learning (pp. 1044–1052).
  3. Allen-Zhu, Z., & Hazan, E. (2016). Variance reduction for faster non-convex optimization. In International conference on machine learning (pp. 699–707).
  4. Byrne, S., & Girolami, M. (2013). Geodesic Monte Carlo on embedded manifolds. Scandinavian Journal of Statistics, 40(4), 825–845.
  5. Chen, C., Ding, N., & Carin, L. (2015). On the convergence of stochastic gradient MCMC algorithms with high-order integrators. In Advances in neural information processing systems (pp. 2278–2286).
  6. Chen, T., Fox, E., & Guestrin, C. (2014). Stochastic gradient Hamiltonian Monte Carlo. In International conference on machine learning (pp. 1683–1691).
  7. Defazio, A., Bach, F., & Lacoste-Julien, S. (2014). SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives. In Advances in neural information processing systems (pp. 1646–1654).
  8. Ding, N., Fang, Y., Babbush, R., Chen, C., Skeel, R. D., & Neven, H. (2014). Bayesian sampling using stochastic gradient thermostats. In Advances in neural information processing systems (pp. 3203–3211).
  9. Duane, S., Kennedy, A. D., Pendleton, B. J., & Roweth, D. (1987). Hybrid Monte Carlo. Physics Letters B, 195(2), 216–222.
  10. Dubey, K. A., Reddi, S. J., Williamson, S. A., Poczos, B., Smola, A. J., & Xing, E. P. (2016). Variance reduction in stochastic gradient Langevin dynamics. In Advances in neural information processing systems (pp. 1154–1162).
  11. Ge, R., Li, Z., Wang, W., & Wang, X. (2019). Stabilized SVRG: Simple variance reduction for nonconvex optimization. In Conference on learning theory.
  12. Girolami, M., & Calderhead, B. (2011). Riemann manifold Langevin and Hamiltonian Monte Carlo methods. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 73(2), 123–214.
  13. Johnson, R., & Zhang, T. (2013). Accelerating stochastic gradient descent using predictive variance reduction. In Advances in neural information processing systems (pp. 315–323).
  14. Leimkuhler, B., & Shang, X. (2016). Adaptive thermostats for noisy gradient systems. SIAM Journal on Scientific Computing, 38(2), A712–A736.
  15. Li, Z. (2019). SSRGD: Simple stochastic recursive gradient descent for escaping saddle points. arXiv preprint arXiv:1904.09265.
  16. Li, Z., & Li, J. (2018). A simple proximal stochastic gradient method for nonsmooth nonconvex optimization. In Advances in neural information processing systems (pp. 5569–5579).
  17. Liu, C., Zhu, J., & Song, Y. (2016). Stochastic gradient geodesic MCMC methods. In Advances in neural information processing systems (pp. 3009–3017).
  18. Ma, Y. A., Chen, T., & Fox, E. (2015). A complete recipe for stochastic gradient MCMC. In Advances in neural information processing systems (pp. 2917–2925).
  19. Neal, R. M. (2011). MCMC using Hamiltonian dynamics. Handbook of Markov Chain Monte Carlo, 2(11), 139–188.
  20. Patterson, S., & Teh, Y. W. (2013). Stochastic gradient Riemannian Langevin dynamics on the probability simplex. In Advances in neural information processing systems (pp. 3102–3110).
  21. Reddi, S. J., Hefny, A., Sra, S., Póczos, B., & Smola, A. (2016). Stochastic variance reduction for nonconvex optimization. In International conference on machine learning (pp. 314–323).
  22. Robbins, H., & Monro, S. (1951). A stochastic approximation method. The Annals of Mathematical Statistics, 22(3), 400–407.
  23. Welling, M., & Teh, Y. W. (2011). Bayesian learning via stochastic gradient Langevin dynamics. In Proceedings of the 28th international conference on machine learning (pp. 681–688).
  24. Zou, D., Xu, P., & Gu, Q. (2018). Stochastic variance-reduced Hamilton Monte Carlo methods. arXiv preprint arXiv:1802.04791.

Copyright information

© The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature 2019

Authors and Affiliations

  • Zhize Li (1)
  • Tianyi Zhang (1)
  • Shuyu Cheng (1)
  • Jun Zhu (1)
  • Jian Li (1, corresponding author)

  1. Tsinghua University, Beijing, China
