Neural Networks: Tricks of the Trade, pp. 621–637
Deep Boltzmann Machines and the Centering Trick
Abstract
Deep Boltzmann machines are in theory capable of learning efficient representations of seemingly complex data. In practice, however, designing an algorithm that learns these representations effectively is subject to several difficulties. In this chapter, we present the “centering trick”, which consists of rewriting the energy of the system as a function of centered states. The centering trick improves the conditioning of the underlying optimization problem and makes learning more stable, leading to models with better generative and discriminative properties.
Keywords
Deep Boltzmann machine · centering · reparameterization · unsupervised learning · optimization · representations
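To make the reparameterization concrete, below is a minimal NumPy sketch of a centered restricted Boltzmann machine, the two-layer building block of the deep models discussed in the chapter. The energy depends on the states only through the centered quantities v − β and h − γ. The offsets β and γ, the variable names, and the CD-1 update shown are illustrative assumptions for this sketch, not the authors' reference implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def centered_energy(v, h, W, a, b, beta, gamma):
    """Energy of a centered RBM (sketch):

        E(v, h) = -(v - beta)^T W (h - gamma)
                  - a^T (v - beta) - b^T (h - gamma)

    States enter only as centered quantities (v - beta), (h - gamma).
    """
    vc, hc = v - beta, h - gamma
    return -vc @ W @ hc - a @ vc - b @ hc

def cd1_step(V, W, a, b, beta, gamma, lr=0.01):
    """One CD-1 gradient step on a mini-batch V (rows = samples).

    The conditionals and gradients follow from the centered energy
    above; all sufficient statistics are computed from centered states,
    which keeps them near zero even for units with high mean activity.
    """
    # Positive phase: hidden probabilities given the data.
    Hp = sigmoid((V - beta) @ W + b)
    # Negative phase: one Gibbs step (sample hiddens, reconstruct, re-infer).
    Hs = (rng.random(Hp.shape) < Hp).astype(float)
    Vn = sigmoid((Hs - gamma) @ W.T + a)
    Hn = sigmoid((Vn - beta) @ W + b)
    # Centered gradient estimates (data term minus model term).
    n = V.shape[0]
    dW = ((V - beta).T @ (Hp - gamma) - (Vn - beta).T @ (Hn - gamma)) / n
    da = (V - Vn).mean(axis=0)       # beta cancels in the difference
    db = (Hp - Hn).mean(axis=0)      # gamma cancels in the difference
    return W + lr * dW, a + lr * da, b + lr * db
```

In practice the offsets would typically be set to the mean activations of the corresponding units (e.g., β to the data mean) and updated as learning proceeds; keeping the centered states near zero is what improves the conditioning referred to in the abstract.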
Copyright information
© Springer-Verlag Berlin Heidelberg 2012