Computational Properties of Cyclic and Almost-Cyclic Learning with Momentum for Feedforward Neural Networks

  • Jian Wang
  • Wei Wu
  • Jacek M. Zurada
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7367)


Two backpropagation algorithms with momentum for feedforward neural networks with a single hidden layer are considered. The training samples are assumed to be supplied to the network in either a cyclic or an almost-cyclic fashion during learning. A restart strategy is adopted for the momentum: the momentum coefficient is set to zero at the beginning of each training cycle. Weak convergence (the gradient of the error function tends to zero) and strong convergence (the weight sequence tends to a fixed point) results are presented. The convergence conditions on the learning rate, the momentum coefficient and the activation functions are considerably relaxed compared with those of existing results. Numerical examples support the theoretical results and demonstrate that almost-cyclic learning with momentum (ACMFNN) performs much better than cyclic learning with momentum (CMFNN) in both convergence speed and generalization ability.
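
The difference between the two sample orderings, and the momentum restart, can be sketched in a few lines of plain Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names (`sample_order`, `forward`, `mse`, `train`), the tanh hidden layer with a linear output, the constant momentum coefficient `beta`, and the toy data set are all hypothetical choices made here for illustration.

```python
import math
import random


def sample_order(n_samples, almost_cyclic, rng):
    """Cyclic: same fixed order every cycle. Almost-cyclic: new permutation per cycle."""
    order = list(range(n_samples))
    if almost_cyclic:
        rng.shuffle(order)
    return order


def forward(V, w, x):
    """Single hidden layer (assumed form): hidden = tanh(V x), output = w . hidden."""
    hidden = [math.tanh(sum(v * xj for v, xj in zip(row, x))) for row in V]
    return hidden, sum(wi * hi for wi, hi in zip(w, hidden))


def mse(V, w, samples):
    """Mean squared error of the network over all samples."""
    return sum((forward(V, w, x)[1] - t) ** 2 for x, t in samples) / len(samples)


def train(samples, n_hidden=3, cycles=200, lr=0.05, beta=0.5,
          almost_cyclic=False, seed=0):
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    V = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    w = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    dV = [[0.0] * n_in for _ in range(n_hidden)]  # previous hidden-weight updates
    dw = [0.0] * n_hidden                         # previous output-weight updates
    for _ in range(cycles):
        order = sample_order(len(samples), almost_cyclic, rng)
        for step, idx in enumerate(order):
            x, target = samples[idx]
            hidden, out = forward(V, w, x)
            err = out - target
            # Restart: momentum coefficient is zero on the first step of a cycle.
            # A constant beta elsewhere is an assumption of this sketch.
            mom = 0.0 if step == 0 else beta
            for i in range(n_hidden):
                # Backpropagated error uses w[i] before it is updated below.
                delta_h = err * w[i] * (1.0 - hidden[i] ** 2)
                for j in range(n_in):
                    dV[i][j] = -lr * delta_h * x[j] + mom * dV[i][j]
                    V[i][j] += dV[i][j]
                dw[i] = -lr * err * hidden[i] + mom * dw[i]
                w[i] += dw[i]
    return V, w


# Toy regression data (hypothetical); the constant second input acts as a bias.
samples = [([x, 1.0], 0.8 * x) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]
for flag in (False, True):
    V, w = train(samples, almost_cyclic=flag)
    print("almost-cyclic" if flag else "cyclic", mse(V, w, samples))
```

The only difference between the two variants in this sketch is whether `sample_order` reshuffles the indices at the start of each cycle; the restart rule is the same in both. A toy run like this says nothing about the relative speed or generalization of the two schemes.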


Keywords: backpropagation, momentum, cyclic, almost-cyclic, convergence





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Jian Wang (1, 2)
  • Wei Wu (1)
  • Jacek M. Zurada (2)
  1. Dalian University of Technology, Dalian, China
  2. University of Louisville, Louisville, U.S.A.
