We consider the problem of model selection for deep learning models of suboptimal complexity. The complexity of a model is understood as the minimum description length of the combination of the sample and the classification or regression model; suboptimal complexity is an approximate estimate of this minimum description length, obtained with Bayesian inference and variational methods. We introduce probabilistic assumptions about the distribution of the model parameters and, based on Bayesian inference, propose a likelihood function for the model. To estimate this likelihood, we apply variational methods with gradient optimization algorithms. We perform computational experiments on several datasets.
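To illustrate the variational estimation with gradient optimization described above, the following is a minimal sketch, not the authors' implementation: it maximizes the evidence lower bound (ELBO) for a toy linear regression model, assuming a mean-field Gaussian variational posterior q(w) = N(mu, diag(sigma^2)), a standard normal prior p(w) = N(0, I), and hypothetical synthetic data.

```python
# Minimal sketch (assumptions: mean-field Gaussian posterior, standard
# normal prior, synthetic data, a single linear layer for brevity).
import torch

torch.manual_seed(0)

# Hypothetical regression sample, for illustration only.
X = torch.randn(64, 3)
y = X @ torch.tensor([1.0, -2.0, 0.5]) + 0.1 * torch.randn(64)

n_w = 3                                     # number of model parameters
mu = torch.zeros(n_w, requires_grad=True)   # variational means
rho = torch.zeros(n_w, requires_grad=True)  # sigma = softplus(rho) > 0

opt = torch.optim.Adam([mu, rho], lr=0.05)

for step in range(500):
    sigma = torch.nn.functional.softplus(rho)
    eps = torch.randn(n_w)
    w = mu + sigma * eps                    # reparameterization trick

    # Gaussian log-likelihood of the sample given sampled weights
    # (up to an additive constant; noise std 0.1 is assumed known).
    pred = X @ w
    log_lik = -0.5 * ((y - pred) ** 2 / 0.1 ** 2).sum()

    # Closed-form KL(q || p) between diagonal Gaussians.
    kl = 0.5 * (sigma ** 2 + mu ** 2 - 1.0 - 2.0 * torch.log(sigma)).sum()

    elbo = log_lik - kl                     # lower bound on log-evidence
    loss = -elbo
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"ELBO estimate: {elbo.item():.2f}")
```

The resulting ELBO value serves as the approximate (suboptimal) estimate of the model evidence that the abstract refers to; comparing such estimates across candidate models is the basis for model selection.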
Original Russian Text © O.Yu. Bakhteev, V.V. Strijov, 2018, published in Avtomatika i Telemekhanika, 2018, No. 8, pp. 129–147.
Cite this article
Bakhteev, O.Y., Strijov, V.V. Deep Learning Model Selection of Suboptimal Complexity. Autom Remote Control 79, 1474–1488 (2018). https://doi.org/10.1134/S000511791808009X