Improved Learning of Gaussian-Bernoulli Restricted Boltzmann Machines

  • KyungHyun Cho
  • Alexander Ilin
  • Tapani Raiko
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6791)

Abstract

We propose a few remedies to improve training of Gaussian-Bernoulli restricted Boltzmann machines (GBRBM), which is known to be difficult. Firstly, we use a different parameterization of the energy function, which allows for more intuitive interpretation of the parameters and facilitates learning. Secondly, we propose parallel tempering learning for GBRBM. Lastly, we use an adaptive learning rate which is selected automatically in order to stabilize training. Our extensive experiments show that the proposed improvements indeed remove most of the difficulties encountered when training GBRBMs using conventional methods.

Keywords

Restricted Boltzmann Machine Gaussian-Bernoulli Restricted Boltzmann Machine Adaptive Learning Rate Parallel Tempering 

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. 1.
    Ackley, D.H., Hinton, G.E., Sejnowski, T.J.: A learning algorithm for Boltzmann machines. Cognitive Science 9, 147–169 (1985)CrossRefGoogle Scholar
  2. 2.
    Cho, K.: Improved Learning Algorithms for Restricted Boltzmann Machines. Master’s thesis, Aalto University School of Science (2011)Google Scholar
  3. 3.
    Cho, K., Raiko, T., Ilin, A.: Parallel tempering is efficient for learning restricted boltzmann machines. In: Proceedings of the International Joint Conference on Neural Networks (IJCNN 2010), Barcelona, Spain (July 2010)Google Scholar
  4. 4.
    Coates, A., Lee, H., Ng, A.Y.: An Analysis of Single-Layer Networks in Unsupervised Feature Learning. In: NIPS 2010 Workshop on Deep Learning and Unsupervised Feature Learning (2010)Google Scholar
  5. 5.
    Desjardins, G., Courville, A., Bengio, Y.: Adaptive Parallel Tempering for Stochastic Maximum Likelihood Learning of RBMs. In: NIPS 2010 Workshop on Deep Learning and Unsupervised Feature Learning (2010)Google Scholar
  6. 6.
    Desjardins, G., Courville, A., Bengio, Y., Vincent, P., Delalleau, O.: Parallel Tempering for Training of Restricted Boltzmann Machines. In: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 145–152 (2010)Google Scholar
  7. 7.
    Fischer, A., Igel, C.: Empirical analysis of the divergence of Gibbs sampling based learning algorithms for restricted Boltzmann machines. In: Diamantaras, K., Duch, W., Iliadis, L.S. (eds.) ICANN 2010. LNCS, vol. 6354, pp. 208–217. Springer, Heidelberg (2010)CrossRefGoogle Scholar
  8. 8.
    Hinton, G.E., Salakhutdinov, R.R.: Reducing the Dimensionality of Data with Neural Networks. Science 313(5786), 504–507 (2006)CrossRefMATHMathSciNetGoogle Scholar
  9. 9.
    Hinton, G.: A Practical Guide to Training Restricted Boltzmann Machines. Tech. Rep. Department of Computer Science, University of Toronto (2010)Google Scholar
  10. 10.
    Hyvärinen, A., Karhunen, J., Oja, E.: Independent Component Analysis, 1st edn. Wiley Interscience, Hoboken (2001)CrossRefGoogle Scholar
  11. 11.
    Krizhevsky, A.: Learning multiple layers of features from tiny images. Tech. Rep. Computer Science Department, University of Toronto (2009)Google Scholar
  12. 12.
    Krizhevsky, A.: Convolutional Deep Belief Networks on CIFAR-2010. Tech. Rep. Computer Science Department, University of Toronto (2010)Google Scholar
  13. 13.
    MIT Center For Biological and Computation Learning: CBCL Face Database #1, http://www.ai.mit.edu/projects/cbcl
  14. 14.
    Ranzato, M.A., Hinton, G.E.: Modeling pixel means and covariances using factorized third-order Boltzmann machines. In: CVPR, pp. 2551–2558 (2010)Google Scholar
  15. 15.
    Salakhutdinov, R.: Learning Deep Generative Models. Ph.D. thesis, University of Toronto (2009)Google Scholar
  16. 16.
    Schulz, H., Müller, A., Behnke, S.: Investigating Convergence of Restricted Boltzmann Machine Learning. In: NIPS 2010 Workshop on Deep Learning and Unsupervised Feature Learning (2010)Google Scholar
  17. 17.
    Smolensky, P.: Information processing in dynamical systems: foundations of harmony theory. In: Parallel Distributed processing: Explorations in the Microstructure of Cognition, Foundations, vol. 1, USA, pp. 194–281. MIT Press, Cambridge (1986)Google Scholar
  18. 18.
    Tieleman, T.: Training restricted Boltzmann machines using approximations to the likelihood gradient. In: Proceedings of the 25th International Conference on Machine Learning, ICML 2008, pp. 1064–1071. ACM Press, New York (2008)CrossRefGoogle Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • KyungHyun Cho
    • 1
  • Alexander Ilin
    • 1
  • Tapani Raiko
    • 1
  1. 1.Department of Information and Computer ScienceAalto University School of ScienceFinland

Personalised recommendations