Abstract
In artificial intelligence, recent research has demonstrated the remarkable potential of Deep Convolutional Neural Networks (DCNNs), which seem to exceed state-of-the-art performance in new domains weekly, especially on the sorts of very difficult perceptual discrimination tasks that skeptics thought would remain beyond the reach of artificial intelligence. However, it has proven difficult to explain why DCNNs perform so well. In philosophy of mind, empiricists have long suggested that complex cognition is based on information derived from sensory experience, often appealing to a faculty of abstraction. Rationalists have frequently complained, however, that empiricists never adequately explained how this faculty of abstraction actually works. In this paper, I tie these two questions together, to the mutual benefit of both disciplines. I argue that the architectural features that distinguish DCNNs from earlier neural networks allow them to implement a form of hierarchical processing that I call “transformational abstraction”. Transformational abstraction iteratively converts sensory-based representations of category exemplars into new formats that are increasingly tolerant to “nuisance variation” in input. Reflecting upon the way that DCNNs leverage a combination of linear and non-linear processing to efficiently accomplish this feat allows us to understand how the brain is capable of bi-directional travel between exemplars and abstractions, addressing longstanding problems in empiricist philosophy of mind. I end by considering the prospects for future research on DCNNs, arguing that rather than simply implementing 80s connectionism with more brute-force computation, transformational abstraction counts as a qualitatively distinct form of processing ripe with philosophical and psychological significance, because it is significantly better suited to depict the generic mechanism responsible for this important kind of psychological processing in the brain.
Notes
This question has been raised as the “interpretation problem”; however, this label has been used too broadly and inconsistently to admit of a single solution. Some commentators use it to broach the question addressed here—why do DCNNs succeed where other neural network architectures struggle—while others use it to raise other questions, such as semantic interpretability or decision justification.
Some residual problems may be extracted from the critiques, however, especially regarding the biological plausibility of the procedures used to train DCNNs. I address these residual concerns in the final section.
Even three-layer perceptrons have been trained to categorize triangle exemplars with a high degree of accuracy (Spasojević et al. 2012).
This is but the barest gloss on a rich research area in the foundations of logic and math going back to Hilbert—for a recent overview, see Antonelli (2010).
Achille and Soatto (2017) have recently argued that implicit or explicit regularization is a fourth crucially important feature in generalizing DCNN performance (to prevent them from simply memorizing the mapping for every exemplar in the training set), but since there is significant diversity in regularization procedures and this idea is more preliminary, I do not discuss it further here.
Note that when DCNNs are deployed for categorization or other forms of decision-making, the final layer of the network will typically be a fully-connected classifier that takes input from all late-stage nodes (i.e. a fully connected layer of nodes or set of category-specific support-vector machines). These are used to draw the boundaries between the different category manifolds in the transformed similarity space. Since these components are deployed in many other machine learning methods that do not model transformational abstraction, I do not discuss them further here.
An important current point of controversy is whether max-pooling specifically is required to reduce the search space and avoid overfitting, or whether other downsampling methods might be as effective. For two poles in this debate, see Patel et al. (2016) and Springenberg et al. (2014). The present paper holds that even if alternative solutions are also practically effective, biologically-relevant networks must somehow implement the aggregative role of complex cells, though max-pooling is perhaps only one possible technique in a family of downsampling operations that could accomplish this (DiCarlo and Cox 2007).
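To make the aggregative role at issue here concrete, the following is a minimal Python sketch of two members of the downsampling family, max-pooling and average-pooling over non-overlapping windows. It is not drawn from any of the cited papers; the function name pool2d, the 2 x 2 window, and the toy 4 x 4 feature maps are all illustrative assumptions.

```python
def pool2d(grid, size, reduce_fn):
    """Downsample a 2D list by applying reduce_fn to each
    non-overlapping size x size window (hypothetical helper)."""
    rows, cols = len(grid), len(grid[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        pooled_row = []
        for c in range(0, cols - size + 1, size):
            window = [grid[r + dr][c + dc]
                      for dr in range(size)
                      for dc in range(size)]
            pooled_row.append(reduce_fn(window))
        pooled.append(pooled_row)
    return pooled


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Toy feature map (assumed values): a single strong activation.
feature_map = [
    [0, 0, 0, 0],
    [0, 9, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

# The same activation moved to another position inside the same 2 x 2 window.
shifted_map = [
    [9, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

max_original = pool2d(feature_map, 2, max)
max_shifted = pool2d(shifted_map, 2, max)
avg_original = pool2d(feature_map, 2, mean)
avg_shifted = pool2d(shifted_map, 2, mean)

print(max_original, max_shifted)
print(avg_original, avg_shifted)
```

In this toy case both operations return the same pooled map whether the activation sits at one position in the window or another, which is the sense in which each tolerates small positional variation. They differ only in what they retain about the window: its peak or its mean.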
For some early empirical support for this view, see Achille and Soatto (2017).
For a worked example, see Goodfellow et al. (2016, p. 334), who show that edge detection alone can be roughly 60,000 times more computationally efficient when performed by a DCNN, compared to a traditional 3-layer perceptron.
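The order of magnitude of this comparison can be reproduced with simple arithmetic. The sketch below assumes a 320 x 280 pixel image and a two-element horizontal difference kernel; these figures are assumed here for illustration and should be checked against the source, since the note above reports only the ratio. It counts two multiplications and one addition per output pixel for the convolution, and one multiplication and one addition per matrix entry for a dense layer connecting every input pixel to every output pixel. All function names are hypothetical.

```python
def horizontal_edges(image):
    """Edge detection by subtracting each pixel's left neighbour.
    The output is one column narrower than the input."""
    return [[row[c + 1] - row[c] for c in range(len(row) - 1)]
            for row in image]


def conv_edge_flops(width, height):
    """Operations for the two-element kernel: two multiplications
    and one addition per output pixel."""
    return (width - 1) * height * 3


def dense_edge_flops(width, height):
    """Operations for the same mapping as a dense matrix product:
    one multiplication and one addition per matrix entry."""
    n_inputs = width * height
    n_outputs = (width - 1) * height
    return 2 * n_inputs * n_outputs


# Assumed image size (illustrative, not stated in the note above).
WIDTH, HEIGHT = 320, 280

conv_ops = conv_edge_flops(WIDTH, HEIGHT)
dense_ops = dense_edge_flops(WIDTH, HEIGHT)
ratio = dense_ops / conv_ops

# Tiny demonstration of the operation itself on a 2 x 4 toy image.
toy_edges = horizontal_edges([[0, 0, 5, 5],
                              [0, 0, 5, 5]])

print(conv_ops, dense_ops, round(ratio))
print(toy_edges)
```

Under these assumptions the ratio comes out a little under 60,000, in line with the figure cited. The saving reflects the sparse connectivity and parameter sharing of the convolutional layer, not any difference in what is computed.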
One could also worry here that AlphaGo did not learn the rules of Go from experience, but this does not impugn the point. What is claimed is rather that once these rules are provided, a DCNN can learn strategies without any domain-specific strategy heuristics (which knowledge of the rules does not provide). This is especially driven home by AlphaGo Zero, which acquired strategies entirely through self-play (Silver et al. 2017).
Interestingly, the DeepArt team found that average-pooling was more effective than max-pooling when the network was in generation mode.
A likelier critical outcome is that both DCNNs and mammalian neocortex are members of the LN generic mechanism family, but there are other members in this family besides DCNNs that provide a tighter fit in performance and structure to humans. For example, while a more recent study by DiCarlo and co-authors confirmed that DCNNs predict many low-resolution patterns in human perceptual similarity judgments and do so using the same sorts of features that are found in late-stage ventral stream processing in V4/5 and IT, they found that these models were not as predictive of high-resolution, image-by-image comparisons in humans as were rhesus monkeys (Rajalingham et al. 2018). They speculate that an alternative but nearby subfamily of models that tweaks one or more typical features of DCNNs—i.e. their diet of training on static images, or lack of recurrent connections between layers—might provide an even better mechanistic model of human perceptual similarity and categorization judgments without unduly complicating the model. However, whether this prospect will pay off—and do so without inhibiting the ability of DCNNs to generalize to non-primate species—remains an open empirical question, and DCNNs remain the most successful mechanistic model of primate visual perception that we have to date.
References
Achille, A., & Soatto, S. (2017). Emergence of invariance and disentangling in deep representations. arXiv Preprint arXiv:1706.01350.
Antonelli, G. A. (2010). Notions of invariance for abstraction principles. Philosophia Mathematica, 18(3), 276–292.
Barsalou, L. W. (1999). Perceptual symbol systems. Behavioral and Brain Sciences, 22, 577–660.
Bechtel, W., & Abrahamsen, A. (2005). Explanation: A mechanist alternative. Studies in History and Philosophy of Science Part C: Studies in History and Philosophy of Biological and Biomedical Sciences, 36(2), 421–441.
Berkeley, G. (1710/1982). A treatise concerning the principles of human knowledge. Indianapolis: Hackett. (Original work published in 1710).
Beth, E. W. (1957). Über Lockes “Allgemeines Dreieck”. Kant-Studien, 1(48), 361–380.
Blundell, C., Uria, B., Pritzel, A., Li, Y., Ruderman, A., Leibo, J. Z., et al. (2016). Model-free episodic control. arXiv Preprint arXiv:1606.04460.
Boone, W., & Piccinini, G. (2016). Mechanistic abstraction. Philosophy of Science, 83(5), 686–697.
Botvinick, M., Barrett, D. G., Battaglia, P., de Freitas, N., Kumaran, D., Leibo, J. Z., et al. (2017). Building machines that learn and think for themselves. Behavioral and Brain Sciences, 40, 26–28.
Boyd, R. (1999). Kinds, complexity and multiple realization. Philosophical Studies, 95(1–2), 67–98.
Brooks, R. A. (1991). Intelligence without representation. Artificial Intelligence, 47(1–3), 139–159.
Buckner, C. (2011). Two approaches to the distinction between cognition and “mere association”. International Journal of Comparative Psychology, 24(4), 314–348.
Buckner, C. (2015). Functional kinds: A skeptical look. Synthese, 192(12), 3915–3942.
Buckner, C., & Garson, J. (2018). Connectionism: Roots, revolution, and radiation. In M. Sprevak & M. Colombo (Eds.), The Routledge handbook of the computational mind. New York: Routledge.
Camp, E. (2015). Logical concepts and associative characterizations. In E. Margolis & S. Laurence (Eds.), The conceptual mind: New directions in the study of concepts (pp. 591–621). Cambridge: MIT Press.
Chatterjee, A. (2010). Disembodying cognition. Language and Cognition, 2(1), 79–116.
Churchland, P. M. (1989). A neurocomputational perspective: The nature of mind and the structure of science. Cambridge: MIT Press.
Clark, A. (1989). Microcognition: Philosophy, cognitive science, and parallel distributed processing (Vol. 6). Cambridge: MIT Press.
Craver, C., & Kaplan, D. M. (2018). Are more details better? On the norms of completeness for mechanistic explanations. The British Journal for the Philosophy of Science, axy015. https://doi.org/10.1093/bjps/axy015.
Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems, 2(4), 303–314.
DeMers, D., & Cottrell, G. W. (1993). Non-linear dimensionality reduction. In S. J. Hanson, J. D. Cowan & C. L. Giles (Eds.), Advances in neural information processing systems (NIPS) 5 (pp. 580–587). San Mateo: Morgan Kaufmann.
DiCarlo, J. J., & Cox, D. D. (2007). Untangling invariant object recognition. Trends in Cognitive Sciences, 11(8), 333–341. https://doi.org/10.1016/j.tics.2007.06.010.
DiCarlo, J. J., Zoccolan, D., & Rust, N. C. (2012). How does the brain solve visual object recognition? Neuron, 73(3), 415–434.
Dosovitskiy, A., Springenberg, J. T., & Brox, T. (2015). Learning to generate chairs with convolutional neural networks. In 2015 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1538–1546). https://doi.org/10.1109/CVPR.2015.7298761.
Elsayed, G. F., Shankar, S., Cheung, B., Papernot, N., Kurakin, A., Goodfellow, I., & Sohl-Dickstein, J. (2018). Adversarial examples that fool both human and computer vision. arXiv Preprint arXiv:1802.08195.
Fukushima, K. (1979). Neural network model for a mechanism of pattern recognition unaffected by shift in position-Neocognitron. IEICE Technical Report, A, 62(10), 658–665.
Fukushima, K. (2003). Neocognitron for handwritten digit recognition. Neurocomputing, 51, 161–180.
Gärdenfors, P. (2004). Conceptual spaces: The geometry of thought. Cambridge: MIT Press.
Gatys, L. A., Ecker, A. S., & Bethge, M. (2016). Image style transfer using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 2414–2423).
Gauker, C. (2011). Words and images: An essay on the origin of ideas. Oxford: OUP.
Glennan, S. (2002). Rethinking mechanistic explanation. Philosophy of Science, 69(S3), S342–S353.
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. Book in preparation for MIT Press. http://www.deeplearningbook.org.
Goodfellow, I. J., Shlens, J., & Szegedy, C. (2014). Explaining and harnessing adversarial examples. arXiv Preprint arXiv:1412.6572.
Gray, H. (1918). Anatomy of the human body, rev. and re-edited by Warren H. Lewis. Philadelphia: Lea & Febiger.
Grósz, T., & Nagy, I. (2014). Document classification with deep rectifier neural networks and probabilistic sampling. In Proceedings of the international conference on text, speech, and dialogue (pp. 108–115). Cham: Springer.
Hahnloser, R. H., Sarpeshkar, R., Mahowald, M. A., Douglas, R. J., & Seung, H. S. (2000). Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit. Nature, 405(6789), 947.
Hassabis, D., Kumaran, D., Summerfield, C., & Botvinick, M. (2017). Neuroscience-inspired artificial intelligence. Neuron, 95(2), 245–258.
Hinton, G. E., & Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786), 504–507.
Hochreiter, S. (1991). Untersuchungen zu dynamischen neuronalen Netzen. Diploma thesis for Technische Universität München, München.
Hong, H., Yamins, D. L., Majaj, N. J., & DiCarlo, J. J. (2016). Explicit information for category-orthogonal object properties increases along the ventral stream. Nature Neuroscience, 19(4), 613.
Hornik, K. (1991). Approximation capabilities of multilayer feedforward networks. Neural Networks, 4(2), 251–257.
Hubel, D. H., & Wiesel, T. N. (1962). Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex. The Journal of Physiology, 160(1), 106–154.
Hume, D. (1739). A treatise of human nature. Oxford: Oxford University Press.
Kaplan, D. M., & Craver, C. F. (2011). The explanatory force of dynamical and mathematical models in neuroscience: A mechanistic perspective. Philosophy of Science, 78(4), 601–627.
Khaligh-Razavi, S.-M., & Kriegeskorte, N. (2014). Deep supervised, but not unsupervised, models may explain IT cortical representation. PLoS Computational Biology, 10(11), e1003915. https://doi.org/10.1371/journal.pcbi.1003915.
Kumaran, D., Hassabis, D., & McClelland, J. L. (2016). What learning systems do intelligent agents need? Complementary learning systems theory updated. Trends in Cognitive Sciences, 20(7), 512–534.
Lake, B. M., Ullman, T. D., Tenenbaum, J. B., & Gershman, S. J. (2017). Building machines that learn and think like people. Behavioral and Brain Sciences, 40, E253.
Laurence, S., & Margolis, E. (2012). Abstraction and the origin of general ideas. Philosopher’s Imprint, 12(19), 1–22.
Laurence, S., & Margolis, E. (2015). Concept nativism and neural plasticity. In E. Margolis & S. Laurence (Eds.), The conceptual mind: New directions in the study of concepts (pp. 117–147). Cambridge: MIT Press.
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444.
LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., et al. (1989). Backpropagation applied to handwritten zip code recognition. Neural Computation, 1(4), 541–551.
LeCun, Y., Boser, B. E., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W. E., & Jackel, L. D. (1990). Handwritten digit recognition with a back-propagation network. In Advances in neural information processing systems (pp. 396–404).
Levy, A., & Bechtel, W. (2013). Abstraction and the organization of mechanisms. Philosophy of Science, 80(2), 241–261.
Lillicrap, T. P., Cownden, D., Tweed, D. B., & Akerman, C. J. (2016). Random synaptic feedback weights support error backpropagation for deep learning. Nature Communications. https://doi.org/10.1038/ncomms13276.
Luc, P., Neverova, N., Couprie, C., Verbeek, J., & LeCun, Y. (2017). Predicting deeper into the future of semantic segmentation. In IEEE international conference on computer vision (ICCV) (Vol. 1).
Machamer, P., Darden, L., & Craver, C. F. (2000). Thinking about mechanisms. Philosophy of Science, 67(1), 1–25.
Machery, E. (2009). Doing without concepts. Oxford: Oxford University Press.
Marcus, G. (2018). Deep learning: A critical appraisal. arXiv Preprint arXiv:1801.00631.
McClelland, J. L. (1988). Connectionist models and psychological evidence. Journal of Memory and Language, 27(2), 107–123.
McClelland, J. L., Botvinick, M. M., Noelle, D. C., Plaut, D. C., Rogers, T. T., Seidenberg, M. S., et al. (2010). Letting structure emerge: Connectionist and dynamical systems approaches to cognition. Trends in Cognitive Sciences, 14(8), 348–356.
Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., & Riedmiller, M. (2013). Playing Atari with deep reinforcement learning. arXiv Preprint arXiv:1312.5602.
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., et al. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529.
Montúfar, G. F., Pascanu, R., Cho, K., & Bengio, Y. (2014). On the number of linear regions of deep neural networks. In Advances in neural information processing systems (pp. 2924–2932).
Odena, A., Dumoulin, V., & Olah, C. (2016). Deconvolution and checkerboard artifacts. Distill, 1(10), e3.
Patel, A. B., Nguyen, M. T., & Baraniuk, R. (2016). A probabilistic framework for deep learning. In Advances in Neural Information Processing Systems (pp. 2558–2566).
Perry, C. J., & Fallah, M. (2014). Feature integration and object representations along the dorsal stream visual hierarchy. Frontiers in Computational Neuroscience, 8, 84. https://doi.org/10.3389/fncom.2014.00084.
Piccinini, G., & Craver, C. (2011). Integrating psychology and neuroscience: Functional analyses as mechanism sketches. Synthese, 183(3), 283–311.
Priebe, N. J., Mechler, F., Carandini, M., & Ferster, D. (2004). The contribution of spike threshold to the dichotomy of cortical simple and complex cells. Nature Neuroscience, 7(10), 1113.
Quine, W. V. (1971). Epistemology naturalized. Akten Des XIV. Internationalen Kongresses Für Philosophie, 6, 87–103.
Rajalingham, R., Issa, E. B., Bashivan, P., Kar, K., Schmidt, K., & DiCarlo, J. J. (2018). Large-scale, high-resolution comparison of the core visual object recognition behavior of humans, monkeys, and state-of-the-art deep artificial neural networks. bioRxiv, 240614.
Ren, S., He, K., Girshick, R., & Sun, J. (2017). Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(6), 1137–1149. https://doi.org/10.1109/TPAMI.2016.2577031.
Ritter, S., Barrett, D. G., Santoro, A., & Botvinick, M. M. (2017). Cognitive psychology for deep neural networks: A shape bias case study. arXiv Preprint arXiv:1706.08606.
Rogers, T. T., & McClelland, J. L. (2014). Parallel distributed processing at 25: Further explorations in the microstructure of cognition. Cognitive Science, 38(6), 1024–1077. https://doi.org/10.1111/cogs.12148.
Rosch, E. (1978). Principles of categorization. In E. Rosch & B. Lloyd (Eds.), Cognition and categorization (pp. 27–48). Hillsdale, NJ: Erlbaum.
Scellier, B., & Bengio, Y. (2017). Equilibrium propagation: Bridging the gap between energy-based models and backpropagation. Frontiers in Computational Neuroscience, 11, 24. https://doi.org/10.3389/fncom.2017.00024.
Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural Networks, 61, 85–117.
Sejnowski, T. J., Koch, C., & Churchland, P. S. (1988). Computational neuroscience. Science, 241(4871), 1299–1306.
Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., van den Driessche, G., et al. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), 484–489. https://doi.org/10.1038/nature16961.
Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., et al. (2017). Mastering the game of Go without human knowledge. Nature, 550(7676), 354.
Singhal, H. (2017). Convolutional neural network with TensorFlow implementation. Retrieved September 7, 2018, from https://medium.com/data-science-group-iitr/building-a-convolutional-neural-network-in-python-with-tensorflow-d251c3ca8117.
Spasojević, S. S., Šušić, M. Z., & Đurović, Ž. M. (2012). Recognition and classification of geometric shapes using neural networks. In 2012 11th symposium on neural network applications in electrical engineering (NEUREL) (pp. 71–76). IEEE.
Springenberg, J. T., Dosovitskiy, A., Brox, T., & Riedmiller, M. (2014). Striving for simplicity: The all convolutional net. arXiv Preprint arXiv:1412.6806. Retrieved from https://arxiv.org/abs/1412.6806
Stinson, C. (2016). Mechanisms in psychology: ripping nature at its seams. Synthese, 193(5), 1585–1614.
Stinson, C. (2017). Back to the cradle: Mechanism schemata from Piaget to DNA. In M. Adams, Z. Biener, U. Feest, & J. Sullivan (Eds.), Eppur si muove: Doing history and philosophy of science with Peter Machamer (pp. 183–194). Cham: Springer.
Stinson, C. (2018). Explanation and connectionist models. In M. Colombo & M. Sprevak (Eds.), The Routledge handbook of the computational mind. New York, NY: Routledge.
Vidyasagar, T. R. (2013). Reading into neuronal oscillations in the visual system: implications for developmental dyslexia. Frontiers in Human Neuroscience. https://doi.org/10.3389/fnhum.2013.00811.
Weiskopf, D. A. (2011a). Models and mechanisms in psychological explanation. Synthese, 183(3), 313.
Weiskopf, D. A. (2011b). The functional unity of special science kinds. The British Journal for the Philosophy of Science, 62(2), 233–258.
Yamins, D. L., & DiCarlo, J. J. (2016). Using goal-driven deep learning models to understand sensory cortex. Nature Neuroscience, 19(3), 356.
Ylikoski, P., & Kuorikoski, J. (2010). Dissecting explanatory power. Philosophical Studies, 148(2), 201–219.
Yu, C., & Smith, L. B. (2011). What you learn is what you see: using eye movements to study infant cross-situational word learning. Developmental Science, 14(2), 165–180.
Acknowledgements
This paper benefitted from an extraordinary amount of feedback from others, far too many to mention individually here. Particular thanks are due to Colin Allen, David Barack, Hayley Clatterbuck, Christopher Gauker, Bob Kentridge, Marcin Miłkowski, Mathias Niepert, Gualtiero Piccinini, Brendan Ritchie, Bruce Rushing, Whit Schonbein, Susan Sterrett, Evan Westra, Jessey Wright, two anonymous reviewers for this journal, and audiences at the University of Evansville, the Society for Philosophy and Psychology, the Southern Society for Philosophy and Psychology, Rice University’s CogTea, and the UH Department of Philosophy’s “works in progress” colloquium.
Cite this article
Buckner, C. Empiricism without magic: transformational abstraction in deep convolutional neural networks. Synthese 195, 5339–5372 (2018). https://doi.org/10.1007/s11229-018-01949-1
DOI: https://doi.org/10.1007/s11229-018-01949-1