Neural Computing and Applications, Volume 26, Issue 2, pp 383–390

Batch gradient training method with smoothing \(\boldsymbol{\ell}_{\bf 0}\) regularization for feedforward neural networks

  • Huisheng Zhang
  • Yanli Tang
  • Xiaodong Liu
Original Article


This paper considers the batch gradient method with smoothing \(\ell _0\) regularization (BGSL0) for training and pruning feedforward neural networks. We show why BGSL0 can produce sparse weights, which are crucial for pruning networks. We prove both weak and strong convergence of BGSL0 under mild conditions, and we establish that the error function decreases monotonically during training. Two examples are given to substantiate the theoretical analysis and to show that BGSL0 yields sparser weights than three typical \(\ell _p\) regularization methods.
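To make the idea of a smoothed \(\ell _0\) penalty concrete, the following is a minimal sketch and not the authors' implementation. It assumes the Gaussian surrogate \(1 - \exp(-w^2/(2\sigma^2))\) as the smoothing function, which this page does not specify, and it applies one full-batch gradient step to a single linear unit rather than to a feedforward network. The names `smoothed_l0`, `smoothed_l0_grad`, `batch_gradient_step`, `lam`, and `sigma` are hypothetical.

```python
import math


def smoothed_l0(weights, sigma):
    """Assumed smooth surrogate for the count of nonzero weights.

    Each term is 0 at w = 0 and approaches 1 once |w| is large relative to sigma.
    """
    return sum(1.0 - math.exp(-w * w / (2.0 * sigma ** 2)) for w in weights)


def smoothed_l0_grad(weights, sigma):
    """Derivative of each surrogate term with respect to its weight."""
    return [(w / sigma ** 2) * math.exp(-w * w / (2.0 * sigma ** 2)) for w in weights]


def batch_gradient_step(weights, inputs, targets, lr, lam, sigma):
    """One full-batch step on half the mean squared error plus lam * smoothed_l0.

    The model is a single linear unit (an illustrative simplification).
    """
    n = len(inputs)
    grad = [0.0] * len(weights)
    for x, t in zip(inputs, targets):
        # Prediction error of the linear unit on this sample.
        err = sum(w * xi for w, xi in zip(weights, x)) - t
        for j, xj in enumerate(x):
            grad[j] += err * xj / n
    penalty_grad = smoothed_l0_grad(weights, sigma)
    return [w - lr * (g + lam * p) for w, g, p in zip(weights, grad, penalty_grad)]
```

In this sketch the penalty gradient is largest for weights whose magnitude is comparable to `sigma` and vanishes both at zero and for large weights, so small weights are pushed toward zero while large ones are left almost untouched. This is one plausible reading of how such a penalty can encourage sparsity, under the stated assumptions.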


Feedforward neural networks · Gradient method · \(\ell _0\) regularization · Sparsity · Convergence



We are grateful to the reviewers for their insightful comments. This research is supported by the National Natural Science Foundation of China (No. 61101228) and the China Postdoctoral Science Foundation (No. 2012M520623).



Copyright information

© The Natural Computing Applications Forum 2014

Authors and Affiliations

  1. Department of Mathematics, Dalian Maritime University, Dalian, People’s Republic of China
  2. Research Center of Information and Control, Dalian University of Technology, Dalian, People’s Republic of China
