
A Two-Stage Pretraining Algorithm for Deep Boltzmann Machines

  • Conference paper
Artificial Neural Networks and Machine Learning – ICANN 2013 (ICANN 2013)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8131)

Abstract

A deep Boltzmann machine (DBM) is a recently introduced Markov random field model that has multiple layers of hidden units. It has been shown empirically that a DBM is difficult to train with approximate maximum-likelihood learning using the stochastic gradient, unlike its simpler special case, the restricted Boltzmann machine (RBM). In this paper, we propose a novel pretraining algorithm that consists of two stages: obtaining approximate posterior distributions over hidden units from a simpler model, and maximizing the variational lower bound given the fixed hidden posterior distributions. We show empirically that the proposed method overcomes the difficulty in training DBMs from randomly initialized parameters and results in a better, or comparable, generative model when compared to the conventional pretraining algorithm.
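The two-stage idea in the abstract can be illustrated with a minimal sketch. This is not the authors' exact procedure: the choice of CD-1 updates for the simpler model, the layer sizes, the learning rate, and the reduction of stage two to RBM-style fitting on the fixed posteriors are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy binary data: 100 samples, 8 visible units (illustrative only).
V = (rng.random((100, 8)) < 0.3).astype(float)
n_vis, n_hid1, n_hid2 = 8, 6, 4
lr = 0.1

# --- Stage 1: train a simpler model (here, a single RBM with CD-1)
# to obtain approximate posteriors Q1 = q(h1 | v). ---
W1 = 0.01 * rng.standard_normal((n_vis, n_hid1))
for _ in range(200):
    ph = sigmoid(V @ W1)                       # q(h1 = 1 | v)
    h = (rng.random(ph.shape) < ph).astype(float)
    v_neg = sigmoid(h @ W1.T)                  # one Gibbs half-step back
    ph_neg = sigmoid(v_neg @ W1)
    W1 += lr * (V.T @ ph - v_neg.T @ ph_neg) / len(V)

Q1 = sigmoid(V @ W1)   # frozen approximate posterior over h1

# --- Stage 2: with Q1 held fixed, maximize the variational lower bound
# over the remaining parameters; for a fixed posterior this reduces to
# RBM-style learning treating the real-valued Q1 as data. ---
W2 = 0.01 * rng.standard_normal((n_hid1, n_hid2))
for _ in range(200):
    ph2 = sigmoid(Q1 @ W2)
    h2 = (rng.random(ph2.shape) < ph2).astype(float)
    q_neg = sigmoid(h2 @ W2.T)
    ph2_neg = sigmoid(q_neg @ W2)
    W2 += lr * (Q1.T @ ph2 - q_neg.T @ ph2_neg) / len(Q1)

# W1 and W2 would then serve as the initialization for joint DBM training.
print(W1.shape, W2.shape)
```

The point of the sketch is the separation of concerns: stage one only has to produce reasonable posteriors over the first hidden layer, after which stage two is a comparatively easy optimization because the hidden posteriors no longer move.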





Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Cho, K., Raiko, T., Ilin, A., Karhunen, J. (2013). A Two-Stage Pretraining Algorithm for Deep Boltzmann Machines. In: Mladenov, V., Koprinkova-Hristova, P., Palm, G., Villa, A.E.P., Appollini, B., Kasabov, N. (eds) Artificial Neural Networks and Machine Learning – ICANN 2013. ICANN 2013. Lecture Notes in Computer Science, vol 8131. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40728-4_14

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-40728-4_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40727-7

  • Online ISBN: 978-3-642-40728-4

  • eBook Packages: Computer Science, Computer Science (R0)
