
Convergence of a modified gradient-based learning algorithm with penalty for single-hidden-layer feed-forward networks


Based on the upper-layer-solution-aware (USA) algorithm, this paper studies a new algorithm for training single-hidden-layer feed-forward neural networks, named USA with penalty, in which a penalty term is introduced into the empirical risk. Both theoretical analysis and numerical results show that the method controls the magnitude of the network weights. A deterministic convergence analysis of the new algorithm is given: the empirical risk with the penalty term decreases monotonically during training, and the weak and strong convergence results show that the gradient of the total error function with respect to the weights tends to zero and that the weight sequence converges to a fixed point as the number of iterations approaches infinity. Numerical experiments verify the proved theoretical results.
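The mechanism described above can be illustrated with a minimal sketch (not the authors' exact USA scheme): batch gradient descent for a single-hidden-layer network whose objective is the empirical risk plus an L2 penalty on the weights. The hidden size, penalty coefficient `lam`, and learning rate `eta` are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data
X = rng.normal(size=(64, 3))
y = np.sin(X.sum(axis=1, keepdims=True))

n_hidden, lam, eta = 8, 1e-3, 0.05           # assumed hyperparameters
W = rng.normal(scale=0.5, size=(3, n_hidden))  # input -> hidden weights
b = np.zeros((1, n_hidden))
v = rng.normal(scale=0.5, size=(n_hidden, 1))  # hidden -> output weights

def penalized_risk(W, b, v):
    """Empirical risk plus L2 penalty lam * (||W||^2 + ||v||^2)."""
    H = np.tanh(X @ W + b)
    err = H @ v - y
    return 0.5 * np.mean(err ** 2) + lam * (np.sum(W ** 2) + np.sum(v ** 2))

losses = []
for _ in range(200):
    H = np.tanh(X @ W + b)
    err = (H @ v - y) / len(X)          # gradient of the mean squared error
    grad_v = H.T @ err + 2 * lam * v    # penalty adds 2*lam*weight to each gradient
    dH = (err @ v.T) * (1 - H ** 2)     # back-propagate through tanh
    grad_W = X.T @ dH + 2 * lam * W
    grad_b = dH.sum(axis=0, keepdims=True)
    v -= eta * grad_v
    W -= eta * grad_W
    b -= eta * grad_b
    losses.append(penalized_risk(W, b, v))

print(f"initial vs final penalized risk: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

With a sufficiently small learning rate the penalized risk decreases monotonically, and the penalty term keeps the weight norms bounded during training — the two properties the paper establishes for its algorithm.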







Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (No. 61305075, 11604181), the Natural Science Foundation of Shandong Province (No. ZR2015AL014, ZR201709220208) and the Fundamental Research Funds for the Central Universities (No. 15CX08011A, 18CX02036A, 16CX02012A).

Author information



Corresponding author

Correspondence to Zhaoyang Sang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.


About this article


Cite this article

Wang, J., Zhang, B., Sang, Z. et al. Convergence of a modified gradient-based learning algorithm with penalty for single-hidden-layer feed-forward networks. Neural Comput & Applic 32, 2445–2456 (2020).



Keywords

  • Neural networks
  • Penalty
  • Gradient
  • Monotonicity
  • Convergence