Soft Computing, Volume 21, Issue 21, pp 6471–6479

Finding a good initial configuration of parameters for restricted Boltzmann machine pre-training

  • Chunzhi Xie
  • Jiancheng Lv
  • Xiaojie Li
Methodologies and Application


Restricted Boltzmann machines (RBMs) have been successfully applied in unsupervised learning and image density-based modeling. The aim of the pre-training step for RBMs is to discover, from the sample data, an unknown stationary distribution that has the lowest energy. However, conventional RBM pre-training is sensitive to the initial weights and biases. The choice of initial values directly affects the capability and efficiency of the learning process. This paper uses principal component analysis to capture the principal component directions of the training data. A set of initial parameter values for the RBM can then be obtained by computing the same reconstruction of the data. Experiments on the Yale and MNIST datasets show that the proposed method not only retains a strong learning ability, but also significantly accelerates learning.
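The abstract does not spell out how the initial parameters are derived, so the following is only a minimal sketch of one plausible PCA-based initialisation, not the authors' procedure. The function name `pca_init_rbm`, the `scale` parameter, and the hidden-bias rule are assumptions made for illustration. The visible-bias rule log[p/(1-p)] follows the common recommendation in Hinton's practical guide. The sketch assumes inputs scaled to [0, 1] and at most as many hidden units as there are principal directions.

```python
import numpy as np


def pca_init_rbm(X, n_hidden, scale=1.0):
    """Hypothetical PCA-based initial parameters for an RBM.

    X        : array of shape (n_samples, n_visible), values in [0, 1].
    n_hidden : number of hidden units (<= number of principal directions).
    scale    : assumed scaling factor applied to the principal directions.
    Returns (W, b, c): weights (n_visible, n_hidden), visible bias, hidden bias.
    """
    mean = X.mean(axis=0)
    Xc = X - mean  # centre the data before PCA

    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    if n_hidden > Vt.shape[0]:
        raise ValueError("n_hidden exceeds the number of principal directions")

    # Assumption: one hidden unit per leading principal direction.
    W = scale * Vt[:n_hidden].T

    # Visible bias: log-odds of the mean activation, clipped to avoid infinities.
    p = np.clip(mean, 1e-3, 1.0 - 1e-3)
    b = np.log(p / (1.0 - p))

    # Assumption: choose the hidden bias so that the hidden pre-activation
    # x @ W + c equals the PCA projection of the centred input (x - mean) @ W.
    c = -mean @ W
    return W, b, c


# Usage on random stand-in data (not Yale or MNIST).
rng = np.random.default_rng(0)
X_demo = rng.random((50, 8))
W0, b0, c0 = pca_init_rbm(X_demo, n_hidden=3)
```

With `scale=1.0` the columns of `W0` are orthonormal, so the hidden pre-activations start out as the coordinates of each centred sample along the leading principal directions. Whether this matches the reconstruction criterion used in the paper is not something the abstract allows one to confirm.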


Keywords: RBM, PCA, Pre-training, Unsupervised learning



This work was supported by the National Science Foundation of China under Grants 61375065, 61432014 and 61432012.

Compliance with ethical standards

Conflict of interest


Ethical standards

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards.

Human and animal rights

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed consent

Informed consent was obtained from all individual participants included in the study.



Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  1. Machine Intelligence Laboratory, College of Computer Science, Sichuan University, Chengdu, People’s Republic of China
