Deep Boltzmann Machines and the Centering Trick

  • Grégoire Montavon
  • Klaus-Robert Müller
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7700)

Abstract

Deep Boltzmann machines are in theory capable of learning efficient representations of seemingly complex data. In practice, however, designing an algorithm that learns these representations effectively faces multiple difficulties. In this chapter, we present the “centering trick”, which consists of rewriting the energy of the system as a function of centered states. Centering improves the conditioning of the underlying optimization problem and makes learning more stable, leading to models with better generative and discriminative properties.
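
As a minimal sketch of the idea (the notation below is ours, not taken verbatim from the chapter): consider a restricted Boltzmann machine with binary visible units v, hidden units h, weight matrix W and biases b, c, whose standard energy is E(v, h) = -v^\top W h - b^\top v - c^\top h. Centering rewrites this energy in terms of centered states,

    E(v, h) = -(v - \beta)^\top W (h - \gamma) - b^\top (v - \beta) - c^\top (h - \gamma),

where the offset vectors \beta and \gamma are set to (estimates of) the mean activities of the visible and hidden units, for example \beta to the mean of the training data. Setting \beta = \gamma = 0 recovers the standard parameterization; nonzero offsets subtract the means from the statistics that enter the gradient, which is what improves the conditioning of the learning problem. The same reparameterization extends layerwise to the deep Boltzmann machine.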

Keywords

Deep Boltzmann machine · Centering · Reparameterization · Unsupervised learning · Optimization · Representations

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Grégoire Montavon (1)
  • Klaus-Robert Müller (1, 2)
  1. Machine Learning Group, Technische Universität Berlin, Berlin, Germany
  2. Department of Brain and Cognitive Engineering, Korea University, Seoul, Korea
