Deterministic Convergence of an Online Gradient Method with Momentum

  • Naimin Zhang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4113)


An online gradient method with momentum for feedforward neural networks is considered. The learning rate is a constant, and the momentum coefficient is an adaptive variable. Both weak and strong convergence results are proved, together with convergence rates for the error function and for the weights.
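As a rough illustration of the kind of iteration the abstract describes, the sketch below runs an online (sample-by-sample) gradient update with a momentum term on a single linear neuron. The abstract does not state the update formulas, so everything specific here is an assumption: the function name `train_online_momentum`, the squared-error loss, the linear neuron, and in particular the adaptive rule that scales the momentum coefficient by the ratio of the current gradient norm to the previous step norm. It should not be read as the rule analysed in the paper.

```python
import math


def train_online_momentum(samples, eta=0.1, mu=0.5, epochs=200):
    """Online gradient descent with momentum for one linear neuron y = w . x.

    eta is the constant learning rate. mu is a hypothetical scale factor
    for the adaptive momentum coefficient (an assumption of this sketch).
    """
    dim = len(samples[0][0])
    w = [0.0] * dim
    prev_step = [0.0] * dim
    for _ in range(epochs):
        for x, target in samples:  # samples are visited in a fixed order
            err = sum(wi * xi for wi, xi in zip(w, x)) - target
            grad = [err * xi for xi in x]  # gradient of 0.5 * err**2
            grad_norm = math.sqrt(sum(g * g for g in grad))
            step_norm = math.sqrt(sum(s * s for s in prev_step))
            # Assumed adaptive rule: the momentum term has norm
            # mu * eta * grad_norm, so it vanishes as the gradient does.
            tau = mu * eta * grad_norm / step_norm if step_norm > 0.0 else 0.0
            step = [-eta * g + tau * s for g, s in zip(grad, prev_step)]
            w = [wi + si for wi, si in zip(w, step)]
            prev_step = step
    return w


if __name__ == "__main__":
    # Toy data generated by the weights (2, -1); purely illustrative.
    data = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]
    print(train_online_momentum(data))
```

Setting `mu=0.0` removes the momentum term and reduces the loop to a plain online gradient method, which makes it easy to compare the two variants on the same data.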





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Naimin Zhang, School of Mathematics and Information Science, Wenzhou University, Wenzhou, P.R. China
