The Unbearable Shallow Understanding of Deep Learning

Original Paper · Minds and Machines

Abstract

This paper analyzes the rapid and unexpected rise of deep learning within Artificial Intelligence and its applications. It examines the possible reasons for this remarkable success, offering candidate paths towards a satisfactory explanation of why deep learning works so well, at least in some domains. A historical account is given of the ups and downs that have characterized neural network research and its evolution from “shallow” to “deep” learning architectures. A precise account of “success” is provided, in order to sift out aspects pertaining to marketing or the sociology of research; the remaining aspects seem to certify a genuine value of deep learning, calling for explanation. The two main factors alleged to propel deep learning, namely computing hardware performance and neuroscience findings, are scrutinized and judged relevant but insufficient for a comprehensive explanation. We review the various attempts that have been made to provide mathematical foundations able to justify the efficiency of deep learning, and we deem this the most promising road to follow, even though the current achievements are scattered and hold only for very limited classes of deep neural models. The authors’ take is that most of what explains why deep learning works at all, and works so well across so many domains of application, remains to be understood, and further research addressing the theoretical foundations of artificial learning is still very much needed.

Notes

  1. We are grateful to an anonymous reviewer for pointing this out.

Author information

Corresponding author: Alessio Plebe.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Plebe, A., Grasso, G. The Unbearable Shallow Understanding of Deep Learning. Minds & Machines 29, 515–553 (2019). https://doi.org/10.1007/s11023-019-09512-8
