
Two-Point Step Size Gradient Method for Solving a Deep Learning Problem

Published in Computational Mathematics and Modeling

This paper analyzes the rate of deep belief learning by multilayer neural networks. In designing neural networks, many authors apply the mean field approximation (MFA) to approximate the probability that neurons in the hidden layers are active. To study the convergence of the MFA, we transform the original problem into a minimization problem. The object of investigation is the Barzilai–Borwein method for solving the resulting optimization problem. The essence of this two-point step size gradient method is its variable steplength, and the appropriate steplength depends on the objective functional. New steplengths are derived and compared with the classical steplength. Sufficient conditions for the existence and uniqueness of a weak solution are established, and a rigorous proof of the convergence theorem is presented. Numerical tests with various kinds of weight matrices are discussed.
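The paper's objective functional arises from the mean field approximation and is not reproduced on this page. As a minimal, self-contained sketch of the two-point step size iteration itself, the following Python snippet applies the classical Barzilai–Borwein (BB1) steplength alpha_k = (s^T s)/(s^T y), where s and y are the differences of successive iterates and gradients, to an illustrative quadratic objective. The test function, starting point, and all parameter values are assumptions chosen for demonstration, not taken from the paper.

```python
import numpy as np

def bb_gradient_descent(grad, x0, alpha0=1e-3, max_iter=500, tol=1e-8):
    """Minimize a smooth function by gradient descent with the BB1 steplength."""
    x_prev = np.asarray(x0, dtype=float)
    g_prev = grad(x_prev)
    x = x_prev - alpha0 * g_prev          # first step is fixed: BB needs two iterates
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        s = x - x_prev                    # change in iterates
        y = g - g_prev                    # change in gradients
        sy = s @ y
        # BB1 steplength (s^T s)/(s^T y); fall back to alpha0 if s^T y <= 0
        alpha = (s @ s) / sy if sy > 0 else alpha0
        x_prev, g_prev = x, g
        x = x - alpha * g
    return x

# Illustrative ill-conditioned quadratic f(x) = 0.5 x^T A x - b^T x
# (this test problem is an assumption, not one from the paper)
A = np.diag([1.0, 10.0, 100.0])
b = np.ones(3)
x_min = bb_gradient_descent(lambda x: A @ x - b, np.zeros(3))
print(np.allclose(A @ x_min, b, atol=1e-5))   # True: x_min is close to A^{-1} b
```

The fallback to a fixed steplength when s^T y <= 0 is a common safeguard for nonconvex objectives; the paper instead derives steplengths adapted to its specific objective functional and compares them with the classical Barzilai–Borwein choice.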



Author information


Correspondence to T. D. Todorov.



Cite this article

Todorov, T.D., Tsanev, G.S. Two-Point Step Size Gradient Method for Solving a Deep Learning Problem. Comput Math Model 30, 427–438 (2019). https://doi.org/10.1007/s10598-019-09468-5

