Abstract
Restricted Boltzmann Machines (RBMs) are unsupervised probabilistic neural networks that can be stacked to form Deep Belief Networks. Given the recent popularity of RBMs and the increasing availability of parallel computing architectures, it becomes interesting to investigate learning algorithms for RBMs that benefit from parallel computation. In this paper, we look at two extensions of the parallel tempering algorithm, a Markov chain Monte Carlo method for approximating the likelihood gradient. The first extension is directed at a more effective exchange of information among the parallel sampling chains. The second extension estimates gradients by averaging over chains at different temperatures. We investigate the efficiency of the proposed methods and demonstrate their usefulness on the MNIST dataset. The weighted averaging in particular appears to benefit maximum likelihood learning.
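For reference, below is a minimal sketch (in Python/NumPy) of the standard parallel tempering sampler that the paper builds on: several Gibbs chains run at different inverse temperatures, and chains at adjacent temperatures exchange states via Metropolis swaps. The function names, toy dimensions, and adjacent-pairs swap scheme are illustrative assumptions, not the authors' code; the paper's two contributions (a richer inter-chain exchange scheme and weighted averaging of gradient statistics across temperatures) are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def energy(v, h, W, b, c):
    # RBM energy: E(v, h) = -b'v - c'h - v'Wh
    return -(b @ v + c @ h + v @ W @ h)

def gibbs_sweep(v, W, b, c, beta):
    # One block-Gibbs sweep at inverse temperature beta; the tempered
    # model p_beta(v, h) is obtained by scaling all parameters by beta.
    h = (rng.random(c.size) < sigmoid(beta * (v @ W + c))).astype(float)
    v = (rng.random(b.size) < sigmoid(beta * (W @ h + b))).astype(float)
    return v, h

def pt_step(states, W, b, c, betas):
    # Advance every tempered chain by one Gibbs sweep, then propose
    # Metropolis swaps between chains at adjacent temperatures.
    states = [gibbs_sweep(v, W, b, c, beta)
              for (v, _), beta in zip(states, betas)]
    for i in range(len(betas) - 1):
        e_i = energy(*states[i], W, b, c)
        e_j = energy(*states[i + 1], W, b, c)
        # Replica-exchange acceptance: min(1, exp((beta_i - beta_j)(E_i - E_j)))
        log_a = (betas[i] - betas[i + 1]) * (e_i - e_j)
        if np.log(rng.random()) < log_a:
            states[i], states[i + 1] = states[i + 1], states[i]
    return states

# Toy usage: five tempered chains for a small random RBM.
n_v, n_h, n_chains = 6, 4, 5
W = 0.1 * rng.standard_normal((n_v, n_h))
b, c = np.zeros(n_v), np.zeros(n_h)
betas = np.linspace(0.0, 1.0, n_chains)  # beta = 1 is the model of interest
states = [(rng.integers(0, 2, n_v).astype(float),
           rng.integers(0, 2, n_h).astype(float)) for _ in range(n_chains)]
for _ in range(100):
    states = pt_step(states, W, b, c, betas)
v_neg, h_neg = states[-1]  # negative-phase sample from the beta = 1 chain
```

In standard parallel tempering only the beta = 1 chain supplies the negative-phase statistics for the gradient; the paper's second extension instead forms a weighted average over chains at all temperatures.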
© 2012 Springer-Verlag Berlin Heidelberg
Cite this paper
Brakel, P., Dieleman, S., Schrauwen, B. (2012). Training Restricted Boltzmann Machines with Multi-tempering: Harnessing Parallelization. In: Villa, A.E.P., Duch, W., Érdi, P., Masulli, F., Palm, G. (eds) Artificial Neural Networks and Machine Learning – ICANN 2012. ICANN 2012. Lecture Notes in Computer Science, vol 7553. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33266-1_12
Print ISBN: 978-3-642-33265-4
Online ISBN: 978-3-642-33266-1