Abstract
Limitations of the capability of shallow networks to efficiently compute real-valued functions on finite domains are investigated. Efficiency is studied in terms of network sparsity and its approximate measures. It is shown that when a dictionary of computational units is not sufficiently large, computation of almost any uniformly randomly chosen function represents either a well-conditioned task performed by a large network or an ill-conditioned task performed by a network of moderate size. The probabilistic results are complemented by a concrete example of a class of functions that cannot be computed efficiently by shallow perceptron networks. The class is constructed using pseudo-noise sequences, which share many features of random sequences but can be generated using special polynomials. Connections to the No Free Lunch Theorem and to the central paradox of coding theory are discussed.
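To illustrate the kind of construction mentioned in the abstract (a minimal sketch, not the paper's exact class of functions): pseudo-noise sequences of maximal period can be generated by a linear recurrence over GF(2) whose characteristic polynomial is primitive. The degree-4 polynomial, the seed, and the helper name pn_sequence below are illustrative assumptions made only for this example.

# Minimal sketch: a pseudo-noise (maximal-length) binary sequence produced by
# the linear recurrence a_n = a_{n-1} XOR a_{n-4} over GF(2). Its characteristic
# polynomial x^4 + x^3 + 1 is primitive, so any nonzero 4-bit seed yields the
# maximal period 2^4 - 1 = 15. The degree, seed, and function name are
# illustrative assumptions, not the construction used in the paper.

def pn_sequence(seed, length):
    """Return `length` bits of the pseudo-noise sequence started from `seed`."""
    a = list(seed)
    while len(a) < length:
        a.append(a[-1] ^ a[-4])  # linear feedback from the last and fourth-to-last bits
    return a[:length]

if __name__ == "__main__":
    bits = pn_sequence(seed=[1, 0, 0, 0], length=30)
    print(bits)                          # repeats with period 15
    signs = [2 * b - 1 for b in bits]    # the same sequence mapped to +/-1 values
    print(signs)

For a primitive polynomial of degree k, the period of such a sequence is 2^k - 1, which is why these deterministically generated sequences exhibit many statistical features of random sequences.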
Acknowledgements
This work was partially supported by the Czech Science Foundation Grants GA15-18108S and GA18-23827S and by institutional support of the Institute of Computer Science RVO 67985807.
Ethics declarations
Conflict of interest
The author declares that she has no conflict of interest.
Cite this article
Kůrková, V. Limitations of shallow networks representing finite mappings. Neural Comput & Applic 31, 1783–1792 (2019). https://doi.org/10.1007/s00521-018-3680-1