Finding a good initial configuration of parameters for restricted Boltzmann machine pre-training
Restricted Boltzmann machines (RBMs) have been successfully applied to unsupervised learning and density-based image modeling. The aim of the pre-training step for RBMs is to learn, from the sample data, an unknown stationary distribution with the lowest energy. However, conventional RBM pre-training is sensitive to the initial weights and biases: the choice of initial values directly affects both the capability and the efficiency of the learning process. This paper uses principal component analysis (PCA) to capture the principal component directions of the training data. A set of initial RBM parameters can then be obtained by requiring the RBM to compute the same reconstruction of the data. Experiments on the Yale and MNIST datasets show that the proposed method not only retains strong learning ability but also significantly accelerates learning.
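The abstract only sketches the initialization scheme, so the following is a minimal illustrative sketch, not the paper's exact construction: it assumes the top principal directions of the centered training data are used as the initial columns of the RBM weight matrix (so that the RBM's linear reconstruction matches the PCA reconstruction), with the visible bias set from the data mean and the hidden bias set to zero. The function name `pca_init_rbm` and the bias choices are assumptions for illustration.

```python
import numpy as np

def pca_init_rbm(X, n_hidden):
    """Sketch of PCA-based RBM parameter initialization.

    X        : (n_samples, n_visible) training data
    n_hidden : number of hidden units (<= n_visible)
    Returns W (n_visible, n_hidden), visible bias, hidden bias.
    """
    # Center the data; PCA directions are defined on centered data.
    mu = X.mean(axis=0)
    Xc = X - mu

    # Principal directions are the right singular vectors of the
    # centered data matrix (rows of Vt, ordered by singular value).
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)

    # Use the top n_hidden principal directions as initial weights,
    # so W @ W.T projects onto the PCA subspace (same reconstruction
    # as rank-n_hidden PCA). Bias choices below are assumptions.
    W = Vt[:n_hidden].T            # (n_visible, n_hidden), orthonormal columns
    b_visible = mu.copy()          # visible bias from the data mean
    b_hidden = np.zeros(n_hidden)  # hidden bias initialized to zero
    return W, b_visible, b_hidden
```

With this choice the initial weight columns are orthonormal, so the RBM's initial linear reconstruction `(X - mu) @ W @ W.T + mu` coincides with the rank-`n_hidden` PCA reconstruction of the data; standard contrastive-divergence training would then proceed from these values instead of small random ones.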
Keywords: RBM, PCA, Pre-training, Unsupervised learning
This work was supported by the National Natural Science Foundation of China under Grants 61375065, 61432014 and 61432012.
Compliance with ethical standards
Ethical approval
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards.
Human and animal rights
This article does not contain any studies with human participants or animals performed by any of the authors.
Informed consent
All procedures performed in studies involving human participants were conducted with the informed consent of all individual participants included in the study.