Deep Learning Model Selection of Suboptimal Complexity

Automation and Remote Control


We consider the problem of model selection for deep learning models of suboptimal complexity. The complexity of a model is understood as the minimum description length of the combination of the sample and the classification or regression model. Suboptimal complexity is understood as an approximate estimate of the minimum description length, obtained with Bayesian inference and variational methods. We introduce probabilistic assumptions about the distribution of parameters. Based on Bayesian inference, we propose the likelihood function of the model. To obtain an estimate for the likelihood, we apply variational methods with gradient optimization algorithms. We perform a computational experiment on several samples.
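As a rough illustration of the variational idea in the abstract, the sketch below maximizes an evidence lower bound, log p(y) >= E_q[log p(y | w)] - KL(q || p), by plain gradient ascent. It is a toy under stated assumptions and not the authors' deep learning procedure: the model is a one-weight linear regression with a Gaussian prior and a Gaussian variational distribution, and every name in it (`make_data`, `sufficient_stats`, `elbo`, `fit`, `log_evidence`, `alpha`, `beta`, `lr`, `steps`) is hypothetical. This toy is chosen because its exact log evidence is available in closed form, so the bound can be checked directly.

```python
import math
import random


def make_data(n=40, w_true=1.5, noise_std=0.5, seed=0):
    """Synthetic 1-D regression data (illustrative only)."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [w_true * x + rng.gauss(0.0, noise_std) for x in xs]
    return xs, ys


def sufficient_stats(xs, ys):
    """Return (n, sum x^2, sum x*y, sum y^2)."""
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    syy = sum(y * y for y in ys)
    return len(xs), sxx, sxy, syy


def elbo(stats, m, rho, alpha, beta):
    """Lower bound on log p(y) for q(w) = N(m, exp(2*rho)).

    Assumed model: prior w ~ N(0, 1/alpha), noise precision beta.
    """
    n, sxx, sxy, syy = stats
    s2 = math.exp(2.0 * rho)
    # Expected log-likelihood under q, in closed form.
    exp_loglik = 0.5 * n * math.log(beta / (2.0 * math.pi)) - 0.5 * beta * (
        syy - 2.0 * m * sxy + (m * m + s2) * sxx
    )
    # KL divergence from q to the Gaussian prior.
    kl = 0.5 * (alpha * (s2 + m * m) - 1.0 - math.log(alpha * s2))
    return exp_loglik - kl


def fit(stats, alpha, beta, lr=0.01, steps=2000):
    """Gradient ascent on the bound over (m, rho); lr suits this toy data."""
    _, sxx, sxy, _ = stats
    m, rho = 0.0, 0.0
    for _ in range(steps):
        grad_m = beta * (sxy - m * sxx) - alpha * m
        grad_rho = 1.0 - (beta * sxx + alpha) * math.exp(2.0 * rho)
        m += lr * grad_m
        rho += lr * grad_rho
    return m, rho


def log_evidence(stats, alpha, beta):
    """Exact log p(y) for the same conjugate toy model."""
    n, sxx, sxy, syy = stats
    a = alpha + beta * sxx  # posterior precision of w
    return (
        0.5 * n * math.log(beta / (2.0 * math.pi))
        - 0.5 * beta * syy
        + 0.5 * math.log(alpha / a)
        + 0.5 * beta * beta * sxy * sxy / a
    )


if __name__ == "__main__":
    stats = sufficient_stats(*make_data())
    m, rho = fit(stats, alpha=1.0, beta=4.0)
    print("bound:", elbo(stats, m, rho, 1.0, 4.0))
    print("exact:", log_evidence(stats, 1.0, 4.0))
```

In this one-weight case the Gaussian family contains the exact posterior, so the optimized bound approaches the exact log evidence. For a deep model the posterior is not Gaussian, so the optimized bound generally stays strictly below the true value; this is the sense in which such an estimate can be called approximate.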




Author information

Corresponding author

Correspondence to O. Yu. Bakhteev.

Additional information

Original Russian Text © O.Yu. Bakhteev, V.V. Strijov, 2018, published in Avtomatika i Telemekhanika, 2018, No. 8, pp. 129–147.


Cite this article

Bakhteev, O.Y., Strijov, V.V. Deep Learning Model Selection of Suboptimal Complexity. Autom Remote Control 79, 1474–1488 (2018).
