Machine Learning, Volume 107, Issue 6, pp 943–968

On better training the infinite restricted Boltzmann machines

  • Xuan Peng
  • Xunzhang Gao
  • Xiang Li


The infinite restricted Boltzmann machine (iRBM) is an extension of the classic RBM. It has the useful property of automatically determining the size of its hidden layer from the training data. With sufficient training, the iRBM can achieve performance competitive with that of the classic RBM. However, learning in the iRBM converges slowly because the model is sensitive to the ordering of its hidden units: the learned filters change slowly from the left-most hidden unit to the right-most. To break this dependency between neighboring hidden units and speed up convergence, a novel training strategy is proposed. Its key idea is to randomly regroup the hidden units before each gradient descent step. Potentially, this learning method achieves a mixture of infinitely many iRBMs with different permutations of the hidden units, which, like dropout, helps prevent the model from over-fitting. The original iRBM is also modified so that it can be trained discriminatively. To evaluate the impact of our method on the convergence speed of learning and on the model’s generalization ability, several experiments were performed on the binarized MNIST and CalTech101 Silhouettes datasets. The experimental results indicate that the proposed training strategy can greatly accelerate learning and enhance the generalization ability of iRBMs.
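The regrouping idea can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the regrouping is drawn, so a uniform random permutation of all hidden units is assumed here, and `regroup_hidden_units`, `train`, and the `gradient_step` callback are hypothetical names introduced only for illustration. The sketch shows the one property that must hold, namely that each hidden unit keeps its own weights and bias while its position in the ordering changes.

```python
import random


def regroup_hidden_units(weights, hidden_bias, rng):
    """Apply one shared random permutation to the hidden units.

    weights: list of rows, one row of visible-to-hidden weights per hidden unit.
    hidden_bias: list with one bias per hidden unit.
    Returns the reordered weights, the reordered biases, and the permutation.
    Assumption: the permutation is uniform over all hidden units.
    """
    order = list(range(len(hidden_bias)))
    rng.shuffle(order)
    new_weights = [weights[i] for i in order]
    new_bias = [hidden_bias[i] for i in order]
    return new_weights, new_bias, order


def train(weights, hidden_bias, batches, gradient_step, seed=0):
    """Regroup the hidden units before every gradient step.

    gradient_step is a hypothetical callback that takes
    (weights, hidden_bias, batch) and returns updated parameters.
    """
    rng = random.Random(seed)
    for batch in batches:
        weights, hidden_bias, _ = regroup_hidden_units(weights, hidden_bias, rng)
        weights, hidden_bias = gradient_step(weights, hidden_bias, batch)
    return weights, hidden_bias
```

Because the same permutation is applied to the weight rows and to the biases, the model's parameters are unchanged as a set; only the ordering that the iRBM is sensitive to differs from step to step.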


Infinite restricted Boltzmann machines · Model averaging · Regularization · Discriminative and generative training objective



Copyright information

© The Author(s) 2018

Authors and Affiliations

  1. College of Electronic Science, National University of Defense Technology, Changsha, China
