An Empirical Investigation of Minimum Probability Flow Learning Under Different Connectivity Patterns
Energy-based models are popular in machine learning due to the elegance of their formulation and their relationship to statistical physics. Among these, the Restricted Boltzmann Machine (RBM) and its staple training algorithm, contrastive divergence (CD), have served as the prototype for several recent advances in the unsupervised training of deep neural networks. However, CD has limited theoretical motivation and can, in some cases, produce undesirable behaviour. Here, we investigate the performance of Minimum Probability Flow (MPF) learning for training RBMs. Unlike CD, which approximates the gradient contribution of the intractable partition function using short runs of Gibbs sampling, MPF proposes a tractable, consistent objective function defined by a first-order Taylor expansion, in time, of the KL divergence between the data distribution and the distribution obtained by running a chosen sampling dynamics from it. We further propose a more general form for the sampling dynamics in MPF and explore the consequences of different choices of these dynamics for training RBMs. Experimental results show MPF outperforming CD across a range of RBM configurations.
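To make the MPF objective concrete, the following is a minimal sketch, assuming a binary RBM scored through its visible free energy F(v) and the single-bit-flip connectivity commonly used with MPF for binary states. For each training vector x, the objective sums g(x, x') exp((F(x) - F(x'))/2) over connected states x', so minimising it lowers the free energy of the data relative to its neighbours. The names `free_energy`, `single_bit_flips` and `mpf_objective`, the `epsilon` scale and the pure-Python implementation are illustrative choices, not the paper's code. The `connectivity` argument marks where a more general sampling dynamics would plug in; its interface here is hypothetical.

```python
import math
from typing import Callable, Iterator, List, Sequence, Tuple

State = List[int]
Matrix = Sequence[Sequence[float]]
# Hypothetical interface: a connectivity function maps a state to
# (neighbour, weight g) pairs defining which transitions MPF considers.
Connectivity = Callable[[State], Iterator[Tuple[State, float]]]


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def free_energy(v: State, W: Matrix, b: Sequence[float], c: Sequence[float]) -> float:
    """Binary RBM free energy: F(v) = -b.v - sum_j softplus(c_j + sum_i v_i W_ij)."""
    visible = sum(b_i * v_i for b_i, v_i in zip(b, v))
    hidden = sum(
        softplus(c[j] + sum(v[i] * W[i][j] for i in range(len(v))))
        for j in range(len(c))
    )
    return -visible - hidden


def single_bit_flips(v: State) -> Iterator[Tuple[State, float]]:
    """Connect v to every state at Hamming distance 1, each with weight g = 1."""
    for k in range(len(v)):
        neighbour = list(v)
        neighbour[k] = 1 - neighbour[k]
        yield neighbour, 1.0


def mpf_objective(
    data: Sequence[State],
    W: Matrix,
    b: Sequence[float],
    c: Sequence[float],
    connectivity: Connectivity = single_bit_flips,
    epsilon: float = 1.0,
) -> float:
    """K = (epsilon/|D|) * sum_{x in D} sum_{x'} g(x, x') * exp((F(x) - F(x')) / 2).

    Simplification (assumption): neighbours x' are not filtered against the
    dataset, although the exact objective only counts flow into non-data states.
    """
    total = 0.0
    for x in data:
        f_x = free_energy(x, W, b, c)
        for x_prime, g in connectivity(x):
            total += g * math.exp(0.5 * (f_x - free_energy(x_prime, W, b, c)))
    return epsilon * total / len(data)


if __name__ == "__main__":
    W = [[0.0, 0.0] for _ in range(3)]  # 3 visible units, 2 hidden units
    c = [0.0, 0.0]
    # With all parameters zero every state has the same free energy,
    # so each of the 3 single-bit-flip terms equals 1 and K = 3.
    print(mpf_objective([[0, 1, 0], [1, 0, 1]], W, [0.0] * 3, c))  # 3.0
    # Visible biases favouring the data state lower its free energy
    # relative to its neighbours, which lowers K to 3 * exp(-1).
    print(mpf_objective([[1, 1, 1]], W, [2.0] * 3, c))
```

The sketch only evaluates the objective; in practice the RBM parameters would be fit by minimising K with a gradient-based optimiser. Swapping `single_bit_flips` for another connectivity function changes which transitions contribute to K without touching the rest of the code.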
Keywords: Markov Chain Monte Carlo, Transition Matrix, Hidden Unit, Connectivity Function, Deep Neural Network