Abstract
We consider the class of incremental gradient methods for minimizing a sum of continuously differentiable functions. An important novel feature of our analysis is that the stepsizes are kept bounded away from zero. We derive the first convergence results of any kind for this computationally important case. In particular, we show that a certain ε-approximate solution can be obtained and establish the linear dependence of ε on the stepsize limit. Incremental gradient methods are particularly well-suited for large neural network training problems where obtaining an approximate solution is typically sufficient and is often preferable to computing an exact solution. Thus, in the context of neural networks, the approach presented here is related to the principle of tolerant training. Our results justify numerous stepsize rules that were derived on the basis of extensive numerical experimentation but for which no theoretical analysis was previously available. In addition, convergence to (exact) stationary points is established when the gradient satisfies a certain growth property.
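The abstract does not reproduce the algorithm itself. As an informal illustration only, the sketch below (not taken from the paper) implements the basic cyclic incremental gradient iteration x ← x − η∇f_i(x), visiting the component functions f_i one at a time with a constant stepsize η kept bounded away from zero; the function name incremental_gradient, the least-squares components, and all parameter values are illustrative assumptions, not the paper's construction.

import numpy as np

def incremental_gradient(grads, x0, stepsize=1e-2, epochs=100):
    # Minimize f(x) = sum_i f_i(x) by cycling through the component
    # gradients with a fixed stepsize (bounded away from zero).
    #   grads    : list of callables, grads[i](x) = gradient of f_i at x
    #   x0       : initial point (numpy array)
    #   stepsize : constant stepsize eta > 0
    #   epochs   : number of full passes (cycles) through the components
    x = np.asarray(x0, dtype=float)
    for _ in range(epochs):
        # One cycle: update after each component gradient,
        # not after the gradient of the full sum.
        for g in grads:
            x = x - stepsize * g(x)
    return x

# Illustrative least-squares example: f_i(x) = 0.5 * (a_i^T x - b_i)^2
rng = np.random.default_rng(0)
A, b = rng.normal(size=(50, 5)), rng.normal(size=50)
grads = [lambda x, a=a, bi=bi: a * (a @ x - bi) for a, bi in zip(A, b)]
x_approx = incremental_gradient(grads, np.zeros(5), stepsize=0.01, epochs=200)

With a constant stepsize such an iteration does not, in general, converge to an exact stationary point; consistent with the abstract, one can only expect an approximate solution whose accuracy degrades as the stepsize bound grows.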
Cite this article
Solodov, M.V. Incremental Gradient Algorithms with Stepsizes Bounded Away from Zero. Computational Optimization and Applications 11, 23–35 (1998). https://doi.org/10.1023/A:1018366000512