
Neural Computing and Applications, Volume 29, Issue 7, pp 305–315

Constructive lower bounds on model complexity of shallow perceptron networks

  • Věra Kůrková
EANN 2016

Abstract

Limitations of shallow (one-hidden-layer) perceptron networks are investigated with respect to computing multivariable functions on finite domains. Lower bounds are derived on the growth of the number of network units or of the sizes of output weights in terms of variations of the functions to be computed. A concrete construction is presented of a class of functions that cannot be computed by signum or Heaviside perceptron networks with considerably smaller numbers of units and smaller output weights than the sizes of the functions' domains. A subclass of these functions is described whose elements can be computed by two-hidden-layer perceptron networks with a number of units depending linearly on the logarithm of the size of the domain.
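
As a purely illustrative aside (not part of the original article), finite functions of the kind alluded to in the abstract and keywords can be generated from Hadamard matrices. The short Python sketch below, with ad hoc names sylvester_hadamard and hadamard_function, builds a Sylvester-type Hadamard matrix and reads it off as a ±1-valued function on a finite square domain; the orthogonality of the rows is what suggests such functions as candidates for being "highly varying".

import numpy as np


def sylvester_hadamard(m):
    """Sylvester's recursive construction of a 2^m x 2^m Hadamard matrix.

    Entry (i, j) equals (-1) raised to the inner product (mod 2) of the
    binary expansions of i and j, so the rows are pairwise orthogonal.
    """
    H = np.array([[1]])
    for _ in range(m):
        H = np.block([[H, H], [H, -H]])
    return H


def hadamard_function(H):
    """Read a Hadamard matrix off as a +/-1-valued function on a finite square domain."""
    n = H.shape[0]
    return {(i, j): int(H[i, j]) for i in range(n) for j in range(n)}


if __name__ == "__main__":
    m = 3
    H = sylvester_hadamard(m)                       # 8 x 8 matrix with +/-1 entries
    assert np.array_equal(H @ H.T, 2**m * np.eye(2**m, dtype=int))  # rows are orthogonal
    f = hadamard_function(H)                        # function on {0,...,7} x {0,...,7}
    print(f[(3, 5)])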

Keywords

Shallow and deep networks · Model complexity and sparsity · Signum perceptron networks · Finite mappings · Variational norms · Hadamard matrices

Notes

Acknowledgments

This work was partially supported by the Czech Grant Agency grant GA15-18108S and institutional support of the Institute of Computer Science RVO 67985807.

Compliance with ethical standards

Conflict of interest

The author declares that she has no conflict of interest.


Copyright information

© The Natural Computing Applications Forum 2017

Authors and Affiliations

  1. Institute of Computer Science, Czech Academy of Sciences, Prague, Czech Republic
