Do Machine-Learning Machines Learn?

  • Conference paper

Part of the book series: Studies in Applied Philosophy, Epistemology and Rational Ethics (SAPERE, volume 44)

Abstract

We answer the present paper’s title in the negative. We begin by introducing and characterizing “real learning” (\(\mathcal {RL}\)) in the formal sciences, a phenomenon that has been firmly in place in homes and schools since at least Euclid. The defense of our negative answer pivots on an integration of reductio and proof by cases, and constitutes a general method for showing that any contemporary form of machine learning (ML) isn’t real learning. Along the way, we canvass the many different conceptions of “learning” in not only AI but also psychology and its allied disciplines; none of these conceptions (with one exception, arising from the view of cognitive development espoused by Piaget) aligns with real learning. We explain in this context, in four steps, how to broadly characterize and arrive at a focus on \(\mathcal {RL}\).

Notes

  1.

    The need for the qualifications (i.e. determinate, non-question-begging) should be obvious. The reply to the present paper’s title question according to which a machine that machine-learns by definition learns, since ‘learn’ appears in ‘machine-learn,’ assumes at the outset that what is called ‘machine learning’ today is real learning; but that’s precisely what’s under question; hence the petitio.

  2.

    All mathematical models of learning relevant to the present discussion that we are aware of take learning to consist fundamentally in the learning of number-theoretic functions from \(\mathbb {N} \times \mathbb {N} \times \cdots \times \mathbb {N}\) to \(\mathbb {N}\). Even when computational learning was firmly and exclusively rooted in classical recursion theory, and dedicated statistical formalisms were nowhere to be found, the target of learning was a function of this kind; see e.g. (Gold 1965; Putnam 1965), a modern, comprehensive version of which is given in (Jain et al. 1999). We have been surprised to hear that some in our audience aren’t aware of the basic, uncontroversial fact, readily appreciated by consulting the standard textbooks we cite here and below, that machine learning in its many guises takes the target of learning to be number-theoretic functions. A “shortcut” to grasping a priori that all systematic, rigorously described forms of learning in matters and activities computational and mechanistic must be rooted in number-theoretic functions, is to simply note that computer science itself consists in the study and embodiment of number-theoretic functions, defined and ordered in hierarchies (e.g. see Davis and Weyuker 1983). We by the way focus herein on unary functions \(f : \mathbb {N} \mapsto \mathbb {N}\) only for ease of exposition.
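
    To make this concrete, here is a minimal, illustrative sketch (ours, not drawn from the cited texts) of Gold-style identification in the limit for a unary number-theoretic function, under the simplifying assumption of a toy hypothesis class of linear functions \(n \mapsto an + b\). The learner is fed the graph of the target one pair at a time and always conjectures the first enumerated hypothesis consistent with the data seen so far; it “learns” the target just in case its conjectures eventually stabilize on a correct one.

      from itertools import count
      from typing import Callable, Iterator, Tuple

      def hypotheses() -> Iterator[Tuple[int, int]]:
          # Enumerate every candidate (a, b) for h(n) = a*n + b, ordered by a + b,
          # so that each hypothesis in this toy class appears eventually.
          for s in count():
              for a in range(s + 1):
                  yield (a, s - a)

      def identify_in_the_limit(stream: Iterator[Tuple[int, int]], steps: int) -> Tuple[int, int]:
          # Gold-style learner: after each new data point, conjecture the first
          # enumerated hypothesis consistent with everything seen so far.
          data = []
          conjecture = (0, 0)
          for _ in range(steps):
              data.append(next(stream))
              for (a, b) in hypotheses():
                  if all(a * n + b == y for (n, y) in data):
                      conjecture = (a, b)
                      break
          return conjecture

      def graph(f: Callable[[int], int]) -> Iterator[Tuple[int, int]]:
          # Present the target function as an enumeration of its graph.
          return ((n, f(n)) for n in count())

      if __name__ == "__main__":
          target = lambda n: 3 * n + 2  # the "unknown" f : N -> N
          print(identify_in_the_limit(graph(target), steps=10))  # stabilizes on (3, 2)

    Note, in keeping with the point pressed in the present paper, that the learner at no point produces a proof that it has identified the target; stabilization is guaranteed only relative to the assumed enumeration.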

  3.

    A biconditional isn’t needed. We use only a weaker set of necessary conditions, not a set of necessary and sufficient conditions.

  4.

    Not to be confused with RL, reinforcement learning, in which real learning, as revealed herein, doesn’t happen.

  5.

    As many readers will know, Searle’s (1980) Chinese Room Argument (CRA) is intended to show that computing machines can’t understand anything. It’s true that Bringsjord has refined, expanded, and defended CRA (e.g. see Bringsjord 1992; Bringsjord and Noel 2002; Bringsjord 2015), but bringing this argumentation to bear here in support of the present paper’s main claim would instantly demand an enormous amount of additional space. And besides, as we now explain, calling upon this argumentation is unnecessary.

  6.

    Since at bottom, as noted (see note 2), the target of learning should be taken for generality and rigor to be a number-theoretic function, it’s natural to consider learning in the realm of the formal sciences.

  7.

    Just as (computer) programs can be correct or incorrect, so too proofs can be correct or incorrect. For more on this, see e.g. (Arkoudas and Bringsjord 2007).

  8.

    If we regard Turing as having been speaking of modern AI in his famous (Turing 1950), note then too that his orientation is test-based: it is there, of course, that he gave the famous ‘Turing Test.’

  9.

    In fact, this is why real learning for humans in mathematics is challenging; see e.g. (Moore 1994).

  10.

    Our assumption here thus specifically invokes connectionist ML. But this causes no loss of generality, as we explain by way of the “tour” of ML taken in Sect. 6.1, and the fact that the proof in the Appendix, as explained there, is a general method that will work for any contemporary form of ML.

  11.

    This is a rough-and-ready extraction from (Jain et al. 1999), and must be sufficient given the space limitations of the present short paper, at least for now. Of course, there are many forms of ML/machine learning in play in AI of today. In Sect. 6.1 we consider different forms of ML in contemporary AI. In Sect. 6.2 we consider different types of “learning” in psychology and allied disciplines.

  12.

    Lathrop (1996) shows, it might be asserted, that uncomputable functions can be machine-learned. But in his scheme, there is only a probabilistic approximation of real learning, and—in clear tension with (c1\('\))–(c3)—no proof in support of the notion that anything has been learned. The absence of such proofs is specifically called out in the formal deduction given in the Appendix.

  13.

    A pair of additional works helps to further seal our case: (Kearns and Vazirani 1994; Shalev-Shwartz and Ben-David 2014). Study of these texts will reveal that \(\mathcal {RL}\) as per (c1\('\))–(c3) is nowhere to be found.
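
    For readers who wish to verify this quickly, the following is our paraphrase (an approximation, not a quotation) of the agnostic PAC-learnability criterion at the heart of Shalev-Shwartz and Ben-David (2014): a hypothesis class \(\mathcal {H}\) is learnable by algorithm \(A\) just in case

    $$\begin{aligned} \forall \epsilon , \delta \in (0,1)\ \exists m_{\mathcal {H}}(\epsilon ,\delta ) \text { s.t. } \forall m \ge m_{\mathcal {H}}(\epsilon ,\delta )\ \forall \mathcal {D}: \Pr _{S \sim \mathcal {D}^m}\!\left[ L_{\mathcal {D}}(A(S)) \le \min _{h \in \mathcal {H}} L_{\mathcal {D}}(h) + \epsilon \right] \ge 1 - \delta . \end{aligned}$$

    What the learner delivers, then, is a high-probability error bound relative to a sample; nothing in the criterion requires the learner to produce, or even be able to check, anything answering to (c1\('\))–(c3).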

  14.

    We of course join epistemological cognoscenti in being aware of Gettier-style cases, but they can be safely left aside here. For the record, Bringsjord claims to have a solution anyway—one that is at least generally in the spirit of Chisholm’s (1966) proposed solution, which involves requiring that the justification in justified-true-belief accounts of knowledge be of a certain sort. For Gettier’s landmark paper, see (Gettier 1963).

  15.

    Specifically, we shall see that the formal deduction of the Appendix is actually a method for showing that other forms of “modern” ML, not just those that rely on ANNs, don’t enable machines to really learn anything. E.g., the method can take Bayesian learning in, and yield as output that such learning isn’t real learning.

  16.

    Shakespeare himself, or better yet even Ibsen, or better better yet Bellow, couldn’t have invented a story dripping with this much irony—a story in which the machine-learning people persecuted the logicians for building “brittle” systems, and then the persecutors promptly proceeded to blithely build comically brittle systems as their trophies (given to themselves).

  17.

    In which, by the way, hypercomputational artificial neural networks are cited.

  18.

    E.g. even beginning textbooks introducing single-variable differential/integral calculus ask for verification of human learning by asking for proofs. The cornerstone and early-on-introduced concept of a limit is accordingly accompanied by requests to students that they supply proofs in order to confirm that they understand this concept. Thus we e.g. have on p. 67 of (Stewart, 2016) a request that our reader prove that

    $$\begin{aligned} \lim _{x \rightarrow 3} g(x) = \lim _{x \rightarrow 3} (4x - 5) = 7. \end{aligned}$$

    What machine-learning machine that has learned the function g here can do that?
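
    For comparison, the sort of verification the textbook requests is a short \(\epsilon \)–\(\delta \) argument along roughly the following lines (our reconstruction, not Stewart’s wording):

    $$\begin{aligned} \text {Given } \epsilon > 0, \text { take } \delta = \epsilon /4. \text { If } 0< |x - 3| < \delta , \text { then } |(4x - 5) - 7| = |4x - 12| = 4|x - 3| < 4\delta = \epsilon . \end{aligned}$$

    Producing (and checking) such an argument, not merely computing values of g, is what the request tests.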

  19.

    This is essentially the Short Short Story Game of (Bringsjord 1998), much harder than such Turing-computable games as Checkers, Chess, and Go, which are all at the same easy level of difficulty (EXPTIME).

  20.

    Outside of the present paper, we have carried out a second analysis that confirms this, by examining learning in AI as characterized in (Russell and Norvig 2009), and invite skeptical readers to carry out their own analysis for this textbook, and indeed for any comprehensive, mainstream textbook. The upshot will be the stark fact that \(\mathcal {RL}\), firmly in place since Euclid as what learning in the formal sciences is, will be utterly absent.

  21.

    Instead of looking to published attempts to systematically present AI (such as the textbooks upon which we rely herein), one could survey practitioners in AI, and see if their views harmonize with the publications explicitly designed to present all of AI (from a high-altitude perspective). E.g., one could turn to such reports as (Müller and Bostrom 2016), in which the authors report on a specific question, given at a conference that celebrated AI’s “turning 50” (AI@50), which asked for an opinion as to the earliest date (computing) machines would be able to simulate human-level learning. It’s rather interesting that 41% of respondents said this would never happen. It would be interesting to know if, in the context of the attention ML receives these days, the number of these pessimists would be markedly smaller. If so, that may well be because, intuitively, plenty of people harbor suspicions that ML in point of fact hasn’t achieved any human-level real learning.

  22.

    Luger’s book revolves around a fundamental distinction between what he calls weak problem-solving versus strong problem-solving.

  23.

    There are a few exceptions. Hummel (2010) has explained that sophisticated and powerful forms of symbolic learning, ones aligned with second-order logic, are superior to associative forms of learning. Additionally, there’s one clear historical exception, but it’s now merely a sliver in psychology (specifically, in psychology of reasoning), and hence presently has insufficient adherents to merit inclusion in the ontology we now proceed to canvass. We refer here to the type of learning over the years of human development and formal education posited by Piaget; e.g. see (Inhelder and Piaget 1958). Piaget’s view, in a barbaric nutshell, is that, given solid academic education, nutrition, and parenting, humans develop the capacity to reason with and even eventually over first-order and modal logic—which means that such humans would develop the capacity to learn in \(\mathcal {RL}\) fashion, in school. Since the attacks on Piaget’s view, starting originally with those of Wason and Johnson-Laird (e.g. see Wason and Johnson-Laird 1972), many psychologists have rejected Piaget’s position. For what it’s worth, Bringsjord has defended Piaget; see e.g. (Bringsjord et al. 1998).

  24.

    We are happy to concede that years of laborious (and tedious?) study of conditioning using appetitive and aversive reinforcement (and such phenomena as inhibitory conditioning, conditioned suppression, higher-order conditioning, conditioned reinforcement, and blocking) have revealed that conditioning can’t be literally reduced to new reflexes, but there is no denying that in conditioning, any new knowledge and representation that takes form falls light years short of \(\mathcal {RL}\).

  25.

    Note that, in keeping with the psychometric operationalization introduced at the outset in order not to rely on the murky concept of understanding, that operationalization could be invoked for all occurrences of ‘understanding’ in the itemized list that follows; but doing so would take much space and time, and be quite inelegant.

  26.

    Peano Arithmetic (PA) is rarely introduced by name in K–12 education, but all the axioms of it, save perhaps for the Induction Schema, are introduced and taught there.
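
    For concreteness, the axioms in question, in one standard formulation (the exact presentation varies across textbooks), are

    $$\begin{aligned} \forall x\, \lnot (s(x) = 0), \quad \forall x \forall y\, (s(x) = s(y) \rightarrow x = y), \quad \forall x\, (x + 0 = x), \quad \forall x \forall y\, (x + s(y) = s(x + y)), \quad \forall x\, (x \times 0 = 0), \quad \forall x \forall y\, (x \times s(y) = (x \times y) + x), \end{aligned}$$

    together with the Induction Schema, which licenses, for each formula \(\phi \), the inference from \(\phi (0) \wedge \forall x\, (\phi (x) \rightarrow \phi (s(x)))\) to \(\forall x\, \phi (x)\).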

  27.

    This conception matches that of an agent in orthodox AI: see the textbooks, e.g. (Luger 2008; Russell and Norvig 2009).

References

  • Achab, M., Bacry, E., Gaïffas, S., Mastromatteo, I., Muzy, J.F.: Uncovering causality from multivariate Hawkes integrated cumulants. In: Precup, D., Teh, Y.W. (eds) Proceedings of the 34th International Conference on Machine Learning, PMLR, International Convention Centre, Sydney, Australia. Proceedings of Machine Learning Research, vol. 70, pp. 1–10 (2017). http://proceedings.mlr.press/v70/achab17a.html

  • Arkoudas, K.: Denotational proof languages. Ph.D. thesis, MIT (2000)

  • Arkoudas, K., Bringsjord, S.: Computers, justification, and mathematical knowledge. Minds Mach. 17(2), 185–202 (2007)

  • Arkoudas, K., Musser, D.: Fundamental Proof Methods in Computer Science: A Computer-Based Approach. MIT Press, Cambridge (2017)

  • Bandura, A., Walters, R.H.: Social Learning Theory, vol. 1. Prentice-Hall, Englewood Cliffs (1977)

  • Bandura, A., Ross, D., Ross, S.A.: Transmission of aggression through imitation of aggressive models. J. Abnorm. Soc. Psychol. 63(3), 575 (1961)

  • Barrett, L.: Beyond the Brain: How Body and Environment Shape Animal and Human Minds. Princeton University Press, Princeton (2015)

  • Bellman, A., Bragg, S., Handlin, W.: Algebra 2: Common Core. Pearson, Upper Saddle River (2012). Series Authors: Charles, R., Kennedy, D., Hall, B., Consulting Authors: Murphy, S.G

  • Boolos, G.S., Burgess, J.P., Jeffrey, R.C.: Computability and Logic, 4th edn. Cambridge University Press, Cambridge (2003)

  • Bringsjord, S.: What Robots Can and Can’t Be. Kluwer, Dordrecht (1992)

  • Bringsjord, S.: Chess is too easy. Technol. Rev. 101(2), 23–28 (1998). http://kryten.mm.rpi.edu/SELPAP/CHESSEASY/chessistooeasy.pdf

  • Bringsjord, S.: Psychometric artificial intelligence. J. Exp. Theor. Artif. Intell. 23(3), 271–277 (2011)

  • Bringsjord, S.: The symbol grounding problem-remains unsolved. J. Exp. Theor. Artif. Intell. 27(1), 63–72 (2015). https://doi.org/10.1080/0952813X.2014.940139

  • Bringsjord, S., Arkoudas, K.: The modal argument for hypercomputing minds. Theor. Comput. Sci. 317, 167–190 (2004)

  • Bringsjord, S., Noel, R.: Real robots and the missing thought experiment in the Chinese room dialectic. In: Preston, J., Bishop, M. (eds.) Views into the Chinese Room: New Essays on Searle and Artificial Intelligence, pp. 144–166. Oxford University Press, Oxford (2002)

  • Bringsjord, S., Schimanski, B.: What is artificial intelligence? psychometric AI as an answer. In: Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI 2003), pp. 887–893. Morgan Kaufmann, San Francisco (2003). http://kryten.mm.rpi.edu/scb.bs.pai.ijcai03.pdf

  • Bringsjord, S., Zenzen, M.: Superminds: People Harness Hypercomputation, and More. Kluwer Academic Publishers, Dordrecht (2003)

  • Bringsjord, S., Bringsjord, E., Noel, R.: In defense of logical minds. In: Proceedings of the 20th Annual Conference of the Cognitive Science Society, pp. 173–178. Lawrence Erlbaum, Mahwah (1998)

  • Bringsjord, S., Kellett, O., Shilliday, A., Taylor, J., van Heuveln, B., Yang, Y., Baumes, J., Ross, K.: A new Gödelian argument for hypercomputing minds based on the busy beaver problem. Appl. Math. Comput. 176, 516–530 (2006)

  • Chisholm, R.: Theory of Knowledge. Prentice-Hall, Englewood Cliffs (1966)

  • Davis, M., Weyuker, E.: Computability, Complexity, and Languages: Fundamentals of Theoretical Computer Science, 1st edn. Academic Press, New York (1983)

  • Dodig-Crnkovic, G., Giovagnoli, R. (eds.): Computing Nature: Turing Centenary Perspective. Springer, Berlin (2013). https://www.springer.com/us/book/9783642372247

  • Domjan, M.: The Principles of Learning and Behavior, 7th edn. Cengage Learning, Stamford (2015)

  • Gallistel, C.R.: Learning and representation. In: Learning and Memory: A Comprehensive Reference, vol. 1. Elsevier (2008)

  • Gettier, E.: Is justified true belief knowledge? Analysis 23, 121–123 (1963). http://www.ditext.com/gettier/gettier.html

  • Gold, M.: Limiting recursion. J. Symb. Logic 30(1), 28–47 (1965)

  • Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge (2016). http://www.deeplearningbook.org

  • Goodstein, R.: On the restricted ordinal theorem. J. Symb. Logic 9(31), 33–41 (1944)

  • Huitt, W.: Classroom Instruction. Educational Psychology Interactive (2003)

  • Hummel, J.: Symbolic versus associative learning. Cogn. Sci. 34(6), 958–965 (2010)

  • Inhelder, B., Piaget, J.: The Growth of Logical Thinking from Childhood to Adolescence. Basic Books, New York (1958)

  • Jain, S., Osherson, D., Royer, J., Sharma, A.: Systems That Learn: An Introduction to Learning Theory, 2nd edn. MIT Press, Cambridge (1999)

  • Kearns, M., Vazirani, U.: An Introduction to Computational Learning Theory. MIT Press, Cambridge (1994)

  • Kitzelmann, E.: Inductive programming: a survey of program synthesis techniques. In: International Workshop on Approaches and Applications of Inductive Programming, pp 50–73. Springer (2009)

  • Lathrop, R.: On the learnability of the uncomputable. In: Saitta, L. (ed.) Proceedings of the 13th International Conference on Machine Learning (Italy, 3–6 July 1996), pp. 302–309. Morgan Kaufmann, San Francisco (1996). https://pdfs.semanticscholar.org/6919/b6ad91d9c3aa47243c3f641ffd30e0918a46.pdf

  • LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)

  • Luger, G.: Artificial Intelligence: Structures and Strategies for Complex Problem Solving, 6th edn. Pearson, London (2008)

  • Mackintosh, N.J.: Conditioning and Associative Learning. Clarendon Press, Oxford (1983)

  • Marblestone, A.H., Wayne, G., Kording, K.P.: Toward an integration of deep learning and neuroscience. Front. Comput. Neurosci. 10(94) (2016). https://doi.org/10.3389/fncom.2016.00094

  • Moore, R.C.: Making the transition to formal proof. Educ. Stud. Math. 27(3), 249–266 (1994)

  • Müller, V., Bostrom, N.: Future progress in artificial intelligence: a survey of expert opinion. In: Müller, V. (ed.) Fundamental Issues of Artificial Intelligence (Synthese Library), pp. 553–571. Springer, Berlin (2016)

  • Penn, D., Holyoak, K., Povinelli, D.: Darwin’s mistake: explaining the discontinuity between human and nonhuman minds. Behav. Brain Sci. 31, 109–178 (2008)

  • Putnam, H.: Trial and error predicates and a solution to a problem of Mostowski. J. Symbolic Logic 30(1), 49–57 (1965)

  • Rado, T.: On non-computable functions. Bell Syst. Tech. J. 41, 877–884 (1963)

  • Ross, J.: Immaterial aspects of thought. J. Philos. 89(3), 136–150 (1992)

  • Russell, S., Norvig, P.: Artificial Intelligence: A Modern Approach, 3rd edn. Prentice Hall, Upper Saddle River (2009)

  • Schapiro, A., Turk-Browne, N.: Statistical learning. Brain Mapp. Encyclopedic Ref. 3, 501–506 (2015)

  • Searle, J.: Minds, brains and programs. Behav. Brain Sci. 3, 417–424 (1980)

  • Shalev-Shwartz, S., Ben-David, S.: Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, Cambridge (2014)

  • Stewart, J.: Calculus, 8th edn. Cengage Learning, Boston (2016). We refer here to an electronic version of the print textbook. The “Student Edition” of the hard-copy textbook has ISBN 978-1-305-27176-0.

  • Titley, H.K., Brunel, N., Hansel, C.: Toward a neurocentric view of learning. Neuron 95(1), 19–32 (2017)

  • Turing, A.: Computing machinery and intelligence. Mind 59(236), 433–460 (1950)

  • Wason, P., Johnson-Laird, P.: Psychology of Reasoning: Structure and Content. Harvard University Press, Cambridge (1972)

  • Wolpert, D.H.: The lack of a priori distinctions between learning algorithms. Neural Comput. 8(7), 1341–1390 (1996)

Acknowledgement

We are deeply appreciative of feedback received at PT-AI 2017, the majority of which is addressed herein. The first author is also specifically indebted to John Hummel for catalyzing, in vibrant discussions at MAICS 2017, the search for formal arguments and/or theorems establishing the proposition Hummel and Bringsjord co-affirm: viz., statistical machine learning simply doesn’t enable machines to actually learn, period. Bringsjord is also thankful to Sergei Nirenburg for valuable conversations. Many readers of previous drafts have been seduced by it’s-not-really-learning forms of learning (including worse-off-than artificial neural network (ANN) based deep learning (DL) folks: Bayesians), and have offered spirited objections, all of which are refuted herein; yet we are grateful for the valiant tries. Bertram Malle stimulated and guided our sustained study of types of learning in play in psychology\(^+\), and we are thankful. Jim Hendler graciously read an early draft; his resistance has been helpful (though perhaps now he’s a convert). The authors are also grateful for five anonymous reviews, some portions of which reflected at least partial and passable understanding of our logico-mathematical perspective, from which informal notions of “learning” are inadmissible in such debates as the present one. Two perspicacious comments and observations from two particular PT-AI 2017 participants, subsequent to the conference, proved productive to deeply ponder. We acknowledge the invaluable support of “Advanced Logicist Machine Learning” from ONR, and of “Great Computational Intelligence” from AFOSR. Finally, without the wisdom, guidance, leadership, and raw energy of Vincent Müller, PT-AI 2017, and any ideas of ours that have any merit at all, and that were expressed there and/or herein, would not have formed.

Author information

Correspondence to Selmer Bringsjord.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Bringsjord, S., Govindarajulu, N.S., Banerjee, S., Hummel, J. (2018). Do Machine-Learning Machines Learn? In: Müller, V. (eds) Philosophy and Theory of Artificial Intelligence 2017. PT-AI 2017. Studies in Applied Philosophy, Epistemology and Rational Ethics, vol 44. Springer, Cham. https://doi.org/10.1007/978-3-319-96448-5_14

  • DOI: https://doi.org/10.1007/978-3-319-96448-5_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-96447-8

  • Online ISBN: 978-3-319-96448-5
