Piecewise Polynomial Activation Functions for Feedforward Neural Networks

  • Ezequiel López-Rubio
  • Francisco Ortega-Zamorano
  • Enrique Domínguez
  • José Muñoz-Pérez


Abstract

Since the origins of artificial neural network research, many models of feedforward networks have been proposed. This paper presents an algorithm that adapts the shape of the activation function to the training data, so that it is learned along with the connection weights. The activation function is interpreted as a piecewise polynomial approximation to the distribution function of the argument of the activation function. An online learning procedure is given, and it is formally proved that, except in extreme cases, each update makes the training error decrease or stay the same. Moreover, the model is computationally simpler than standard feedforward networks, so it is suitable for implementation on FPGAs and microcontrollers. However, the present proposal is limited to two-layer, one-output-neuron architectures, because the learned activation functions are not differentiable with respect to the node locations. Experimental results are provided, which show the performance of the proposed algorithm for classification and regression applications.
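The core idea of the abstract can be sketched in code. The fragment below is an illustrative simplification, not the authors' algorithm: it uses a piecewise linear (rather than higher-order polynomial) approximation to the empirical distribution function of the pre-activation values, and the function names and the quantile-based node placement are assumptions introduced here for clarity.

```python
import numpy as np

def fit_piecewise_activation(z_samples, num_nodes=8):
    """Fit a piecewise linear approximation to the empirical CDF of the
    pre-activation values z (illustrative helper, not the paper's method).

    Nodes are placed at evenly spaced quantiles of the observed
    pre-activations, so the learned activation adapts to their distribution.
    """
    qs = np.linspace(0.0, 1.0, num_nodes)
    nodes = np.quantile(z_samples, qs)
    # The activation value at each node is the empirical CDF level itself,
    # so the resulting activation is a monotone map from z onto [0, 1].
    values = qs
    return nodes, values

def piecewise_activation(z, nodes, values):
    """Evaluate the learned activation by linear interpolation between nodes."""
    return np.interp(z, nodes, values)

# Example: skewed pre-activation samples. A fixed sigmoid would waste most
# of its range here, whereas the fitted activation matches the data's shape.
rng = np.random.default_rng(0)
z = rng.exponential(scale=1.0, size=10_000)
nodes, values = fit_piecewise_activation(z, num_nodes=16)
out = piecewise_activation(z, nodes, values)
```

In the paper, the node locations and polynomial pieces are updated online together with the connection weights; this sketch only shows the static "activation as distribution-function approximation" interpretation.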


Keywords: Activation functions · Feedforward neural networks · Supervised learning · Regression · Classification



Acknowledgements

This work is partially supported by the Ministry of Economy and Competitiveness of Spain under Grants TIN2014-53465-R, project name Video surveillance by active search of anomalous events, and TIN2014-57341-R, project name Metaheuristics, holistic intelligence and smart mobility. It is also partially supported by the Autonomous Government of Andalusia (Spain) under project P12-TIC-657, project name Self-organizing systems and robust estimators for video surveillance. All of these include funds from the European Regional Development Fund (ERDF). The authors gratefully acknowledge the computer resources, technical expertise and assistance provided by the SCBI (Supercomputing and Bioinformatics) center of the University of Málaga. They also thank NVIDIA Corporation for the donation of two Titan X GPUs used for this research.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Computer Languages and Computer Science, University of Málaga, Málaga, Spain
