F. Anselmi, L. Rosasco, C. Tan, T. Poggio. Deep Convolutional Networks are Hierarchical Kernel Machines, Center for Brains, Minds and Machines (CBMM) Memo No. 035, The Center for Brains, Minds and Machines, USA, 2015.
T. Poggio, L. Rosasco, A. Shashua, N. Cohen, F. Anselmi. Notes on Hierarchical Splines, DCLNs and i-theory, Center for Brains, Minds and Machines (CBMM) Memo No. 037, The Center for Brains, Minds and Machines, USA, 2015.
T. Poggio, F. Anselmi, L. Rosasco. I-theory on Depth vs Width: Hierarchical Function Composition, Center for Brains, Minds and Machines (CBMM) Memo No. 041, The Center for Brains, Minds and Machines, USA, 2015.
H. Mhaskar, Q. L. Liao, T. Poggio. Learning Real and Boolean Functions: When is Deep Better than Shallow, Center for Brains, Minds and Machines (CBMM) Memo No. 045, The Center for Brains, Minds and Machines, USA, 2016.
H. N. Mhaskar, T. Poggio. Deep Vs. Shallow Networks: An Approximation Theory Perspective, Center for Brains, Minds and Machines (CBMM) Memo No. 054, The Center for Brains, Minds and Machines, USA, 2016.
D. L. Donoho. High-dimensional data analysis: The curses and blessings of dimensionality. Lecture, Math Challenges of the 21st Century, vol. 13, pp. 178–183, 2000.
Y. LeCun, Y. Bengio, G. Hinton. Deep learning. Nature, vol. 521, no. 7553, pp. 436–444, 2015.
K. Fukushima. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, vol. 36, no. 4, pp. 193–202, 1980.
M. Riesenhuber, T. Poggio. Hierarchical models of object recognition in cortex. Nature Neuroscience, vol. 2, no. 11, pp. 1019–1025, 1999.
H. N. Mhaskar. Approximation properties of a multilayered feedforward artificial neural network. Advances in Computational Mathematics, vol. 1, no. 1, pp. 61–80, 1993.
C. K. Chui, X. Li, H. Mhaskar. Neural networks for localized approximation. Mathematics of Computation, vol. 63, no. 208, pp. 607–623, 1994.
C. K. Chui, X. Li, H. N. Mhaskar. Limitations of the approximation capabilities of neural networks with one hidden layer. Advances in Computational Mathematics, vol. 5, no. 1, pp. 233–243, 1996.
A. Pinkus. Approximation theory of the MLP model in neural networks. Acta Numerica, vol. 8, pp. 143–195, 1999.
T. Poggio, S. Smale. The mathematics of learning: Dealing with data. Notices of the American Mathematical Society, vol. 50, no. 5, pp. 537–544, 2003.
B. Moore, T. Poggio. Representation properties of multilayer feedforward networks. Neural Networks, vol. 1, no. S1, p. 203, 1988.
R. Livni, S. Shalev-Shwartz, O. Shamir. A provably efficient algorithm for training deep networks. CoRR, abs/1304.7045, 2013.
O. Delalleau, Y. Bengio. Shallow vs. deep sum-product networks. In Proceedings of Advances in Neural Information Processing Systems 24, NIPS, Granada, Spain, pp. 666–674, 2011.
G. F. Montufar, R. Pascanu, K. Cho, Y. Bengio. On the number of linear regions of deep neural networks. In Proceedings of Advances in Neural Information Processing Systems 27, NIPS, Montreal, Canada, pp. 2924–2932, 2014.
H. N. Mhaskar. Neural networks for localized approximation of real functions. In Proceedings of the IEEE-SP Workshop on Neural Networks for Signal Processing III, pp. 190–196, IEEE, Linthicum Heights, USA, 1993.
N. Cohen, O. Sharir, A. Shashua. On the expressive power of deep learning: A tensor analysis. arXiv:1509.05009v1, 2015.
F. Anselmi, J. Z. Leibo, L. Rosasco, J. Mutch, A. Tacchetti, T. Poggio. Unsupervised Learning of Invariant Representations With Low Sample Complexity: The Magic of Sensory Cortex or A New Framework for Machine Learning? Center for Brains, Minds and Machines (CBMM) Memo No. 001, The Center for Brains, Minds and Machines, USA, 2014.
F. Anselmi, J. Z. Leibo, L. Rosasco, J. Mutch, A. Tacchetti, T. Poggio. Unsupervised learning of invariant representations. Theoretical Computer Science, vol. 633, pp. 112–121, 2016.
T. Poggio, L. Rosasco, A. Shashua, N. Cohen, F. Anselmi. Notes on Hierarchical Splines, DCLNs and i-theory, Center for Brains, Minds and Machines (CBMM) Memo No. 037, The Center for Brains, Minds and Machines, USA, 2015.
Q. L. Liao, T. Poggio. Bridging the Gaps between Residual Learning, Recurrent Neural Networks and Visual Cortex, Center for Brains, Minds and Machines (CBMM) Memo No. 047, The Center for Brains, Minds and Machines, 2016.
M. Telgarsky. Representation benefits of deep feedforward networks. arXiv:1509.08101v2, 2015.
I. Safran, O. Shamir. Depth separation in ReLU networks for approximating smooth non-linear functions. arXiv:1610.09887v1, 2016.
H. N. Mhaskar. Neural networks for optimal approximation of smooth and analytic functions. Neural Computation, vol. 8, no. 1, pp. 164–177, 1996.
E. Corominas, F. S. Balaguer. Conditions for an infinitely differentiable function to be a polynomial. Revista Matemática Hispanoamericana, vol. 14, no. 1–2, pp. 26–43, 1954. (in Spanish)
T. Poggio, H. Mhaskar, L. Rosasco, B. Miranda, Q. L. Liao. Why and when can deep–but not shallow–networks avoid the curse of dimensionality: A review. arXiv:1611.00740v3, 2016.
R. A. DeVore, R. Howard, C. A. Micchelli. Optimal nonlinear approximation. Manuscripta Mathematica, vol. 63, no. 4, pp. 469–478, 1989.
H. N. Mhaskar. On the tractability of multivariate integration and approximation by neural networks. Journal of Complexity, vol. 20, no. 4, pp. 561–590, 2004.
F. Bach. Breaking the curse of dimensionality with convex neural networks. arXiv:1412.8690, 2014.
D. Kingma, J. Ba. Adam: A method for stochastic optimization. arXiv:1412.6980, 2014.
J. Bergstra, Y. Bengio. Random search for hyper-parameter optimization. Journal of Machine Learning Research, vol. 13, no. 1, pp. 281–305, 2012.
M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. F. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Q. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, X. Q. Zheng. TensorFlow: Large-scale machine learning on heterogeneous distributed systems. arXiv:1603.04467, 2016.
R. Eldan, O. Shamir. The power of depth for feedforward neural networks. arXiv:1512.03965v4, 2016.
H. W. Lin, M. Tegmark. Why does deep and cheap learning work so well? arXiv:1608.08225, 2016.
J. T. Håstad. Computational Limitations of Small-Depth Circuits, Cambridge, MA, USA: MIT Press, 1987.
N. Linial, Y. Mansour, N. Nisan. Constant depth circuits, Fourier transform, and learnability. Journal of the ACM, vol. 40, no. 3, pp. 607–620, 1993.
Y. Bengio, Y. LeCun. Scaling learning algorithms towards AI. Large-Scale Kernel Machines, L. Bottou, O. Chapelle, D. DeCoste, J. Weston, Eds., Cambridge, MA, USA: MIT Press, 2007.
Y. Mansour. Learning Boolean functions via the Fourier transform. Theoretical Advances in Neural Computation and Learning, V. Roychowdhury, K. Y. Siu, A. Orlitsky, Eds., pp. 391–424, US: Springer, 1994.
M. Anthony, P. Bartlett. Neural Network Learning: Theoretical Foundations, Cambridge, UK: Cambridge University Press, 2002.
F. Anselmi, L. Rosasco, C. Tan, T. Poggio. Deep Convolutional Networks are Hierarchical Kernel Machines, Center for Brains, Minds and Machines (CBMM) Memo No. 035, The Center for Brains, Minds and Machines, USA, 2015.
B. M. Lake, R. Salakhutdinov, J. B. Tenenbaum. Human-level concept learning through probabilistic program induction. Science, vol. 350, no. 6266, pp. 1332–1338, 2015.
A. Maurer. Bounds for linear multi-task learning. Journal of Machine Learning Research, vol. 7, no. 1, pp. 117–139, 2006.
S. Soatto. Steps towards a theory of visual information: Active perception, signal-to-symbol conversion and the interplay between sensing and control. arXiv:1110.2053, 2011.
T. A. Poggio, F. Anselmi. Visual Cortex and Deep Networks: Learning Invariant Representations, Cambridge, MA, USA: MIT Press, 2016.
L. Grasedyck. Hierarchical singular value decomposition of tensors. SIAM Journal on Matrix Analysis and Applications, vol. 31, no. 4, pp. 2029–2054, 2010.
S. Shalev-Shwartz, S. Ben-David. Understanding Machine Learning: From Theory to Algorithms, Cambridge, UK: Cambridge University Press, 2014.
T. Poggio, W. Reichardt. On the representation of multi-input systems: Computational properties of polynomial algorithms. Biological Cybernetics, vol. 37, no. 3, pp. 167–186, 1980.
M. L. Minsky, S. A. Papert. Perceptrons: An Introduction to Computational Geometry, Cambridge, MA, USA: MIT Press, 1972.