1 Artificial Intelligence and Arithmetic

Approaches to artificial intelligence (AI) can be categorized in many ways, but one of the most fundamental distinctions concerns the subject matter itself, i.e., what exactly counts as the intelligence that is pursued artificially. Standardly, this division is made along two dimensions (e.g., Russell & Norvig, 2020). The first dimension concerns whether intelligence is a property of behavior or of internal processes (e.g., thought or reasoning). The division along the second dimension is based on whether intelligence should be human-like or rational in a more general, abstract sense.

These differences are particularly interesting in relation to human cognitive abilities in processing numerosities.Footnote 1 The history of machine solutions to numerical problems goes back at least to the designs of Leonardo da Vinci, and the first functional mechanical calculators for arithmetical operations were introduced in the seventeenth century by Wilhelm Schickard and Blaise Pascal (Russell & Norvig, 2020). The development of electronic calculators in the 1940s and 50s, and in particular the introduction of solid-state transistors in the 1960s, was transformative for the field (Hamrick, 1996). The speed and reliability of the new electronic calculators conclusively established the calculation of arithmetical operations as a domain in which machines outperform humans. Since the 1970s, pocket-sized electronic calculators have been available, and they (and later smartphones) have revolutionized the way arithmetical operations are carried out. Instead of using traditional cognitive tools that facilitate arithmetical calculations, such as pen and paper, the abacus, and the slide rule, modern people can perform arithmetical operations simply by feeding the input to a machine and reading the output.Footnote 2

In calculating arithmetical operations, the electronic calculator follows algorithms that, while human-programmed, are not meant to simulate or emulate human processes of arithmetical reasoning in, e.g., mental arithmetic or pen-and-paper calculations.Footnote 3 Electronic calculators do not memorize, for example, addition or multiplication tables. They may follow a multiplication algorithm that resembles the way humans conduct long multiplication with pen and paper (called the schoolbook or standard algorithm), but calculators operate on binary representations of numbers and (standardly) compute every multiplication using the same algorithm. This is clearly different from human performance in multiplication, in which the result of 9 * 9, for example, is usually recalled by rote memory, whereas the result of 11 * 11 needs to be (at least for many people) calculated following the schoolbook algorithm.
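To make the contrast concrete, here is a minimal illustrative sketch of the schoolbook long multiplication algorithm over base-10 digit lists. It is not the routine of any actual calculator, which would run an analogous procedure over binary representations in hardware, but the digit-by-digit, carry-propagating structure is the same idea:

```python
# A minimal, illustrative sketch of the schoolbook (long) multiplication
# algorithm over base-10 digits; real calculators run analogous routines
# over binary representations in hardware.
def schoolbook_multiply(a: int, b: int) -> int:
    digits_a = [int(d) for d in str(a)][::-1]  # least significant digit first
    digits_b = [int(d) for d in str(b)][::-1]
    result = [0] * (len(digits_a) + len(digits_b))
    for i, da in enumerate(digits_a):
        carry = 0
        for j, db in enumerate(digits_b):
            total = result[i + j] + da * db + carry
            result[i + j] = total % 10  # keep the ones digit
            carry = total // 10         # carry the rest to the next column
        result[i + len(digits_b)] += carry
    return int("".join(map(str, reversed(result))))

assert schoolbook_multiply(9, 9) == 81
assert schoolbook_multiply(11, 11) == 121
```

The point of the contrast in the text is visible here: the machine runs this same uniform procedure for every input, whereas a human retrieves 9 * 9 from rote memory and falls back on the algorithm only for harder cases like 11 * 11.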

Therefore, when comparing the human cognitive ability to conduct arithmetical calculations (possibly with cognitive tools such as pen and paper) to machine calculation, we can see that the artificial intelligence (the calculator) is designed to follow a particular type of processing rather than to simulate the human-like ability with arithmetical operations. It is unlikely that this state of affairs will change, for a simple reason: arithmetical operations have for many decades been something that machines are extremely good at. In comparison, humans are woefully slow and unreliable in conducting arithmetical calculations. It is hard to see why an engineer would try to build a machine that calculates arithmetical operations in the manner of human agents.

However, this is only the case if the focus is on calculating the results of arithmetical operations. Human numerical ability encompasses much more than arithmetical calculations. Even when limited to the natural numbers, arithmetic is a discipline that includes a multitude of activities beyond calculating the results of arithmetical operations. In addition, arithmetic is associated with a wide range of applications both in science and in everyday life. On the highest level, what human mathematicians typically consider interesting about arithmetic is not calculation but proving general theorems about all natural numbers. In this way, human arithmetic, and therefore also artificial human-like arithmetical intelligence, is clearly a much wider field than what can be carried out by electronic calculators. But what exactly is it? This is the question I will focus on first in this paper.

When we want to simulate human-like numerical cognition and arithmetical intelligence, what precisely is the target phenomenon? This what-question, as will be seen, prompts further questions. How could human-like artificial arithmetical intelligence be developed? And, particularly interestingly for philosophy, why would we want to do that? In this paper, I will treat all three questions: the what, the how, and the why. In Sect. 2, regarding the what-question, I show that human numerical cognition is in many ways peculiar and differs from the kind of numerical manipulation associated with formal arithmetic. In Sect. 3, regarding the how-question, I present recent progress in simulating human numerical cognition with deep neural networks. In Sect. 4, regarding the why-question, I analyze the relevance of these developments to the study of numerical cognition in humans and, by extension, the epistemology of arithmetic. Finally, in Sect. 5, I present some problems and future prospects related to developing artificial human-like arithmetical intelligence.Footnote 4

2 Human Numerical Cognition

In order to model human-like arithmetical intelligence, we need a good understanding of what the relevant cognitive processes are. Since we are ultimately interested in elucidating arithmetic as a human activity, I contend that we should not be excessively restrictive a priori about what counts as a relevant cognitive process. For example, learning and conducting formal proof procedures are clearly part of arithmetical intelligence. But how much will they tell us about what makes a theorem interesting to a human arithmetician? To understand arithmetical cognition, we need to understand arithmetic as a human endeavor. The first step is to study how human arithmetical intelligence develops, starting from basic numerical cognition.Footnote 5

The problem is that human numerical cognition is full of peculiar phenomena that are seemingly detrimental to the development of arithmetical intelligence, as it is understood in terms of modern mathematics. For example, numerical cognition includes a Stroop effect, a delay in reaction time in the presence of incongruent stimuli. Experiments show that judgments of numerosity can be influenced by non-numerical magnitude stimuli, such as object size. In one example, Henik and Tzelgov (1982) showed that adult humans exhibit both longer reaction times and lower accuracy in judging which number is larger when the physical size of the number symbol is incongruent with its numerical value (e.g., a pair in which the numerically larger digit is printed in a physically smaller font is more difficult to process than a pair in which physical size and numerical value are congruent). Presumably, humans confuse the physical size with the numerical value and, when prompted to answer which one is “larger”, need to process the answer for longer when the stimuli are incongruent. It should be noted that the difference was not due to the subjects being confused about what they were asked. Even though it was clear that it was the size of the numerical value that they were asked about, the Stroop effect remained.

At first glance, the Stroop effect on numerosities seems like something we should hardly consider a goal for artificial agents. It is difficult to see it serving any beneficial purpose for the development of arithmetical intelligence. Yet this kind of judgment is potentially problematic when we consider the possibility of constructing human-like artificial intelligence. We do not know everything that is essential to the development of human arithmetical intelligence. While the numerical Stroop effect may be an unwanted side effect, its existence could also be indicative of some important characteristic of the way human beings cognize about numerosities and magnitudes.

This brings us to the question of just what we mean by human numerical cognition and arithmetical intelligence. This question divides into two subquestions. First, we need to ask what, in general, the target phenomenon is that we are trying to capture in the study of cognition and intelligence. Second, we must ask how similar that cognition and intelligence is across cultures and individuals.

For the first question, the traditional approach in artificial intelligence and cognitive science has been to focus on human competence rather than performance (Chomsky, 1965). Individuals differ, of course, in their abilities, so Chomsky was looking for a way to talk about human abilities and capacities in a general way. Chomsky introduced the distinction for linguistics, but it has since been widely applied to other cognitive phenomena as well (e.g., vision; see Marr, 1982). Fundamentally, the idea is that in cognitive modelling we are interested in ideal human agents rather than actual individuals.Footnote 6

However, when it comes to something like arithmetical intelligence, this approach is potentially problematic. Arithmetic, unlike language and vision, is a cognitive ability that does not exist universally among the human species. It has developed independently several times over the course of human history, but there are also cultures that have never developed arithmetic, or even basic numeral words (Gordon, 2004; Ifrah, 1998; Pica et al., 2004). Therefore, to answer the second question presented above, arithmetical cognition not only differs across individuals, but also greatly among cultures. In order to deal with this integral cultural aspect of arithmetical cognition, we have argued for enculturated competence as an amendment of the Chomskyan notion of competence (Fabry & Pantsar, 2021). The target phenomenon, when it comes to human arithmetical intelligence, is not some universal ability. Instead, it is a culturally shaped ability, which should be studied in that particular context (Pantsar, 2019).

However, as argued by many researchers (Butterworth, 1999; Carey, 2009; Dehaene, 2011), it is quite likely that human arithmetical intelligence develops based on universal, evolutionarily developed abilities to observe and process numerosities, what Carey (2009) calls core cognitive abilities. These abilities are thought to be present already in infancy and shared with many non-human animals. Such non-symbolic proto-arithmetical abilities (Pantsar, 2014) are standardly divided into two in the literature. First is subitizing, the ability to determine the number of objects in our field of vision accurately without counting (Dehaene, 2011; Knops, 2020; Pantsar, 2019, 2021a; Starkey & Cooper, 1980). The subitizing ability is accurate, but it stops working when the number of objects exceeds three or four, which is generally thought to be the limit of the object tracking system (OTS). The OTS, also called the parallel individuation system in the literature (see, e.g., Carey, 2009), allows observing multiple objects in parallel and is often thought to be the cognitive system behind the subitizing ability (Knops, 2020). Because the OTS serves other functions besides subitizing the numerosity of objects, it is not considered to be numerosity-specific.

The second ability is estimating numerosities without counting, which is usually attributed to an approximate number system (ANS) (or “number sense”) (Dehaene, 2011; Spelke, 2000). Unlike the subitizing ability, the estimation ability is not limited to small numerosities. However, the estimations get increasingly inaccurate as the sizes of the estimated collections increase. This logarithmic nature of the ANS is captured by the so-called Weber-Fechner Law (or Weber’s Law), according to which humans can detect (without counting) differences in the sizes of magnitudes only when the magnitudes differ by more than a certain ratio (Knops, 2020).
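In one common textbook formulation (stated here as a standard rendering, not a formula from the cited sources), the Weber-Fechner relationship says that perceived magnitude grows logarithmically with stimulus magnitude, and that two numerosities can be discriminated without counting only when their difference is large relative to their size:

$$
P = k \ln \frac{S}{S_0},
\qquad
\frac{|n_1 - n_2|}{\min(n_1, n_2)} > w,
$$

where $P$ is the perceived magnitude of a stimulus of magnitude $S$, $S_0$ is a threshold magnitude, $k$ is a constant, and $w$ is the individual's Weber fraction. On this formulation, discriminating 8 dots from 16 is as easy as discriminating 80 from 160, since the ratios are the same.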

Consistent with the Weber-Fechner Law, the ANS is thought to be responsible for two peculiar effects, its standard behavioral signatures, in human numerical cognition. First is the distance effect: distinguishing between numerosities becomes easier as the numerical distance between them increases. Second is the size effect: numerosity estimations become less accurate as the numerical size of the estimated collection increases.
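Both signatures fall out of a simple noisy-magnitude model. The following toy simulation (assuming logarithmic internal coding with fixed noise, in the spirit of Dehaene's log-Gaussian model, not code from any cited study) shows comparison errors decreasing with numerical distance and increasing with numerical size:

```python
# Toy simulation: represent each numerosity as a noisy log-scaled magnitude
# and compare samples; errors then show distance and size effects.
import math
import random

random.seed(1)

def compare_error_rate(n1, n2, noise=0.15, trials=10_000):
    """Fraction of trials in which the noisy comparison of n1 vs n2 goes wrong."""
    errors = 0
    for _ in range(trials):
        s1 = random.gauss(math.log(n1), noise)  # internal representation of n1
        s2 = random.gauss(math.log(n2), noise)  # internal representation of n2
        if (s1 > s2) != (n1 > n2):
            errors += 1
    return errors / trials

# Distance effect: 5 vs 9 (far) is easier than 5 vs 6 (near).
print(compare_error_rate(5, 6), compare_error_rate(5, 9))
# Size effect: same distance, larger numbers -> more errors (5 vs 6 vs 25 vs 26).
print(compare_error_rate(5, 6), compare_error_rate(25, 26))
```

With logarithmic coding, 5 vs. 6 and 25 vs. 26 have the same absolute distance but very different internal distances, which is what produces the size effect.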

These ANS signatures also influence human performance in numerical tasks after humans have developed the symbolic ability to process numerosities. Consider, for example, the following task: which of the two numbers in the pair below is bigger?

[Figure omitted: a pair of numbers that are numerically close to each other.]

Then consider the same task for the following pair:

[Figure omitted: a pair of numbers with a large numerical distance between them.]

One might expect that an arithmetically trained adult would find both tasks equally easy to solve. But the data in fact show that the reaction time with the first, numerically close pair is considerably longer than with the second pair, consistent with the distance effect (Dehaene, 2011). Indeed, the distance effect remains in place even when solving the problem only requires comparing the first digits of two-digit numbers. For example, compare the pairings:

[Figure omitted: a pair of two-digit numbers whose first digits are numerically close.]

And:

[Figure omitted: a pair of two-digit numbers whose first digits are numerically distant.]

Even though the second digit makes no difference for solving the task, the data again show a distance effect that makes the reaction time associated with the first pair longer (Hinrichs et al., 1981; Pinel et al., 2001). The distance effect remains even when the task is to determine whether two numbers are the same, in which case all that is needed is to distinguish between two number symbols (Dehaene & Akhavein, 1995). These data strongly suggest that the ANS cannot be “turned off”: it remains a part of our numerical cognition even after we have developed symbolic abilities to process numerosities.Footnote 7

3 Modelling and Simulating Human-Like Numerical Cognition

The phenomena above present an interesting matter for the question of modelling human numerical cognition and developing human-like artificial arithmetical intelligence. As originally envisioned by Newell and Simon (1961), the aim of artificial intelligence was to emulate human cognition. However, the distance and size effects, as well as the Stroop effect, seem like sub-optimal and confusing aspects of human cognition. Hence, any rational approach to AI (as opposed to a human-like approach) would be likely to see them as the kind of baggage that should not be included in artificial intelligence. On the other hand, if we want to model human arithmetical intelligence, how can we know that the proto-arithmetical abilities and their peculiarities are not in fact important? Even though the Stroop, size, and distance effects might not by themselves be useful for an arithmetical AI, they could be symptoms of underlying features that are essential to the way human numerical cognition functions, and thus also matter for the question of simulating human-like arithmetical intelligence.

Recently, a direction of AI research has emerged that deems it important to simulate human numerical cognition, warts and all. The Stanford research group led by Jay McClelland has established a project to train artificial neural networks to learn in a two-dimensional visual environment in a way that aims to mirror the way children develop early numerical cognition (McClelland et al., 2016). Instead of symbol-based processing, their use of artificial neural networks (from here on, just neural networks) is meant to simulate how training with visual stimuli can lead an AI to develop its own ability to process numerosities. An important pioneering experiment in this research area was reported by Stoianov and Zorzi (2012), who used a “stochastic hierarchical generative model”: a generic multilayer (i.e., deep) neural network with one visible layer that encodes sensory data and hidden layers that generate increasingly complex nonlinear representations of that data. Such features are seen as attractive for cognitive modelling, since they can mirror the neuronal structure of the brain (Stoianov & Zorzi, 2012).

Stoianov and Zorzi presented the deep neural network with pairs of images containing dots of different sizes and numbers, as is standard in ANS number comparison tasks with humans (Dehaene, 2011). The learning was unsupervised, so the neural network was not trained to focus on numerosity, area size, or any other factor of the visual stimuli. The researchers found that the neural network learned to perform the numerosity comparison task with behavioral signatures similar to those of humans and non-human animals (Stoianov & Zorzi, 2012). Furthermore, the response profiles of the emergent “numerosity detectors” in the neural network resembled those reported earlier in the lateral intraparietal area of macaque brains (Roitman et al., 2007).
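The experimental logic can be caricatured in code. The sketch below is a heavily simplified stand-in, not the authors' model (they trained stacked restricted Boltzmann machines): it generates dot patterns with varying dot number and size, trains a one-hidden-layer autoencoder on them without any numerosity labels, and then checks whether numerosity can be linearly read out from the learned hidden features.

```python
# Simplified sketch of the experimental logic (not the Stoianov & Zorzi model):
# unsupervised learning on dot images, then a linear numerosity readout.
import numpy as np

rng = np.random.default_rng(0)
SIZE = 20  # images are SIZE x SIZE

def dot_image(n):
    """Render n square 'dots' of random size at random positions (overlaps allowed)."""
    img = np.zeros((SIZE, SIZE))
    for _ in range(n):
        s = rng.integers(1, 3)                    # dot side length
        x, y = rng.integers(0, SIZE - s, size=2)  # dot position
        img[y:y + s, x:x + s] = 1.0
    return img.ravel()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Unlabeled training set with numerosities 1..16.
numerosities = rng.integers(1, 17, size=2000)
X = np.stack([dot_image(n) for n in numerosities])

# One-hidden-layer autoencoder trained with full-batch gradient descent on MSE.
H, D, lr = 64, SIZE * SIZE, 0.1
W1 = rng.normal(0, 0.1, (D, H)); b1 = np.zeros(H)
W2 = rng.normal(0, 0.1, (H, D)); b2 = np.zeros(D)
for epoch in range(200):
    h = sigmoid(X @ W1 + b1)   # hidden features
    out = h @ W2 + b2          # linear reconstruction of the input image
    err = out - X
    # Backpropagate the mean squared reconstruction error.
    gW2 = h.T @ err / len(X); gb2 = err.mean(0)
    dh = err @ W2.T * h * (1 - h)
    gW1 = X.T @ dh / len(X); gb1 = dh.mean(0)
    W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2

# Linear readout: can log-numerosity be recovered from the hidden layer?
h = sigmoid(X @ W1 + b1)
A = np.hstack([h, np.ones((len(h), 1))])
coef, *_ = np.linalg.lstsq(A, np.log(numerosities), rcond=None)
pred = A @ coef
print("readout correlation:", np.corrcoef(pred, np.log(numerosities))[0, 1])
```

Even this toy setup illustrates the key point: numerosity is a statistical property of the images that an unsupervised learner can come to encode without ever being told about number.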

In the experiment reported by Stoianov and Zorzi, numerosity emerges as a statistical property of the input images. In the Stanford group, this line of research has been developed further. In an experiment reported by Testolin and colleagues (Testolin et al., 2020), numerosity discrimination is reported in generic deep neural networks, starting from a randomly weighted neural network. In addition to vector images of dots, as in the Stoianov and Zorzi study, the network of Testolin and colleagues learns numerical acuity from “natural” visual stimuli (derived, for example, from pictures of groups of animals), resulting in developmental trajectories highly similar to those observed in human longitudinal studies (Halberda & Feigenson, 2008; Piazza et al., 2010). Furthermore, the final competence of the neural network approximates that of human adults (Testolin et al., 2020).Footnote 8

What do these results imply? As argued by Testolin and colleagues, they suggest that the human numerical estimation ability is not necessarily due to an evolutionarily developed numerosity-specific system, as assumed in the ANS hypothesis. Rather, a generic unsupervised neural network can end up with a similar numerical acuity, both by observing collections of dots and by observing images derived from natural scenes. Of course, this does not mean that the ANS hypothesis is incorrect, but it should be enough to make us consider whether the estimation ability is due to some more general cognitive characteristic involved in learning.

Intriguingly, the data reported in such experiments do not fully conform to the human proto-arithmetical abilities involving numerosities. Another experiment, reported in Chen et al. (2018), shows that this kind of emerging numerical acuity has a coefficient of variation consistent with the Weber-Fechner Law, but only starting from four items. This is in line with the argumentation of Carey, Beck, and others who believe the OTS to be integral to the development of exact number concepts (Beck, 2017; Carey, 2009). Hence we should consider the possibility that the OTS is an innate cognitive system while, instead of stemming from an approximate number system, the numerosity estimation ability emerges generally in sufficiently complex neuronal systems given appropriate training material. To test this hypothesis, we would need neural networks designed to simulate the OTS by limiting the number of objects they can “observe” in parallel.
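The combined signature is easy to state as a toy model. The sketch below (purely illustrative, not the model of Chen et al.) pairs an exact object tracking stage with a scalar-noise estimation stage; the coefficient of variation of its responses is near zero up to the tracking limit and roughly constant above it, matching the pattern the cited data describe:

```python
# Toy model (not from the cited studies): exact object tracking up to a limit,
# noisy ratio-dependent estimation beyond it. The coefficient of variation (CV)
# of responses is ~0 within the tracking range and roughly constant above it.
import math
import random

random.seed(0)
OTS_LIMIT = 4  # items trackable in parallel
WEBER = 0.2    # noise scales with numerosity above the limit

def respond(n):
    if n <= OTS_LIMIT:
        return n                                      # subitizing: exact
    return max(1, round(random.gauss(n, WEBER * n)))  # estimation: scalar noise

for n in (2, 3, 4, 6, 10, 20):
    responses = [respond(n) for _ in range(5000)]
    mean = sum(responses) / len(responses)
    sd = math.sqrt(sum((r - mean) ** 2 for r in responses) / len(responses))
    print(f"n={n:2d}  CV={sd / mean:.2f}")
```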

So far we have discussed only proto-arithmetical abilities, but there have already been initial steps toward training neural networks to learn culturally specific numerical procedures, such as counting. Unlike learning the estimation ability, teaching a neural network to count involves supervised learning. In an experiment reported by Fang and colleagues (Fang et al., 2018), a recurrent neural network goes through a process of teacher-guided or teacher-monitored learning in which it is given two-dimensional images of blobs.Footnote 9 The general idea is to teach the network to “touch” the blobs and connect each touch to a numeral word, simulating the way children learn to count.Footnote 10 In teacher-guided learning, the teacher gives the network the targets of the next count word and the next blob location (or, at the end of the counting procedure, a signal to end the process). In teacher-monitored learning, the network attempts a procedure, which the teacher then corrects by demonstrating the correct counting procedure (ibid.). The results showed that after mastering the task of “touching” the blobs, the networks achieved near-perfect scores in counting to six after 2,000 training trials, after which they learned to count further with more trials. Importantly, the networks were generic systems that had no previous ability to process numerosities. Obviously, there are many aspects of children’s acquisition of number concepts that are not reflected in the results, but the data from the study are consistent with data on children’s learning of counting. Also for children, counting does not seem to require a full grasp of number concepts (Davidson et al., 2012), and gestures like pointing have been shown to be advantageous (see, e.g., Alibali & DiRusso, 1999).
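To make the structure of teacher-guided supervision concrete, here is a hypothetical sketch of the target signal described above (a reconstruction from the description, not code from Fang et al.): at each step the teacher supplies the next count word and the next blob to “touch”, and finally a signal to end the procedure.

```python
# Hypothetical sketch of the teacher-guided supervision signal described above
# (not code from Fang et al.): at every step the teacher provides the next
# count word and the next blob location; an end signal closes the procedure.
COUNT_WORDS = ["one", "two", "three", "four", "five", "six"]

def teacher_targets(blob_locations):
    """Yield (count_word, blob_to_touch) training targets, then an end signal."""
    for word, loc in zip(COUNT_WORDS, blob_locations):
        yield word, loc
    yield "END", None  # signal to terminate the counting procedure

blobs = [(3, 7), (12, 2), (8, 8)]  # hypothetical 2D blob coordinates
for target in teacher_targets(blobs):
    print(target)
# ('one', (3, 7)), ('two', (12, 2)), ('three', (8, 8)), ('END', None)
```

In teacher-monitored learning, by contrast, the network first produces its own sequence, and a target sequence like the one above is supplied only as a correction afterwards.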

This research direction was taken further in the experiment reported in Di Nuovo and McClelland (2019), in which a humanoid robot with two functional “five-fingered hands” was trained to use them to represent spoken digits. The robot AI receives proprioceptive information from the robot hands, simulating tactile and proprioceptive sensory input in humans. The analysis showed that the proprioceptive information improved accuracy in recognizing spoken numeral words: the AI was faster in creating a uniform number line than a control AI without the robot hands. Similar results were reported for a humanoid robot in Pecyna et al. (2020). This type of research, while still in very early stages, could help reveal why finger counting procedures are advantageous for human children as well in learning to count and acquiring number concepts (see, e.g., Bender & Beller, 2012).Footnote 11

Without further evidence, simply learning a counting process and recognizing numeral words is not necessarily connected to number concepts, but simulations like those referenced above could prove important in explaining number concept acquisition. After that initial stage, the next challenge would be to simulate the learning of arithmetical operations. Ultimately, the aim of human-like arithmetical intelligence would be to prove theorems of arithmetic. As should have become clear by now, the current state of the art is far from such human arithmetical ability. However, in only a few years there has been progress in machine learning with deep neural networks that already points to relevance in explaining human development in numerical cognition. At this point we should start considering just how relevant the data are, and in what way.Footnote 12

4 Relevance for Research on Numerical Cognition and Epistemology of Arithmetic

It should be stressed that a lot of further research is needed before results like those considered in the previous section can be seen as solid evidence for or against particular views concerning the development of human numerical cognition and arithmetical intelligence. However, the potential of this type of AI research for explaining the emergence of human numerical cognition can be immediately seen. Aside from the localization of neuronal activity in numerosity processing to certain parts of the brain (most importantly the parietal and frontal lobes, and specifically the intraparietal sulcus; see, e.g., Bugden et al., 2012; Fias et al., 2007), the evidence for the existence of the ANS is primarily behavioral. But if the behavioral signatures of the ANS also emerge from generic deep neural networks, and if we believe that artificial neural networks can help explain human cognition, the ANS hypothesis may need to be reassessed. If the hypothesis receives no further support, we should at least be open to the possibility that there are no numerosity-specific innate cognitive systems, given that the OTS is not numerosity-specific, either. In that case, numerical ability would develop based on more general cognitive characteristics. This would be a major departure from much of the present literature on numerical cognition and therefore a highly important topic for the epistemology of arithmetic.Footnote 13

It is important to note that, if correct, the absence of numerosity-specific innate cognitive systems would not imply that there are no evolutionarily developed proto-arithmetical abilities. The research on artificial neural networks in no way suggests that subitizing and estimating are not evolutionarily developed proto-arithmetical abilities. What it suggests is the possibility that one of these abilities (estimating) may emerge from more general features of learning based on sensory input of discrete objects. This is an interesting prospect that should be taken into consideration in the neuropsychological study of human numerical cognition as well.Footnote 14

However, subitizing and estimating are proto-arithmetical, not arithmetical, abilities. What could research on deep neural networks tell us about the development of proper arithmetical abilities? As mentioned above, aside from behavioral signatures, the evidence for the existence of the ANS in humans is mainly based on fMRI studies that show consistent activation of certain brain regions, including the intraparietal sulcus (IPS), in numerical estimation tasks (Cantlon et al., 2006). Since the IPS is also activated in symbolic numerical tasks, it has been argued that in the course of the development of arithmetical cognition, the neural circuits originally developed for the ANS are recycled for new, arithmetical purposes (Menary, 2015). This principle of neuronal recycling was originally proposed by Dehaene, and Menary made it the foundation of his account of enculturation (Dehaene, 2009; Menary, 2014, 2015). Enculturation refers to the transformative processes in which interactions with the surrounding culture shape how cognitive practices are acquired and developed, leading to both structural and functional changes in the brain (Fabry, 2018; Menary, 2015). The key idea behind the enculturation framework is to stress how evolutionarily developed biological faculties are transformed through the cultural transmission of cognitive practices in the course of ontogenetic development.

Arithmetic consists of such cognitive practices, and starting from the acquisition of number concepts, processes of enculturation shape the development of our arithmetical ability (Fabry, 2020; Jones, 2020; Pantsar, 2019, 2021a). But what are the biological faculties that are transformed in this process? This is where the research on human-like AI in deep neural networks could become highly relevant. It is possible that the IPS being the location of the “number sense” (Dehaene, 2011), for example, is not due to any numerosity-specific biological cognitive system located there. One of the functions the IPS is primarily associated with is visual attention, and it is thought to play a key role in visuospatial working memory (Todd & Marois, 2004). It could be that the numerosity estimation ability is located in the IPS because it is closely connected to visual attention, in particular visuospatial working memory. Since the neural network experiments reported by Stoianov and Zorzi (2012) and Testolin et al. (2020) use 2D images as their sole input, this connection is certainly worth studying. Perhaps there is something general about processing visual stimuli of objects that leads to the emergence of the estimation ability?

Relatedly, it has been suggested that instead of neuronal recycling, a different, more general principle of neural reuse (Anderson, 2010) is the foundation of enculturation (Fabry, 2020; Jones, 2020). According to the neural reuse hypothesis, there is greater structural and functional plasticity in the brain than neuronal recycling assumes, which allows for more variation in the allocation of neural resources for particular cognitive functions. In light of the evidence from neural networks, neural reuse could indeed provide a better explanation of learning. There is reason to believe that learning is not tied as strongly to particular “modules” of the brain as the neuronal recycling principle suggests (Fabry, 2020; Pantsar, 2019). While there are undoubtedly many functions that in neurotypical individuals are located in the same areas of the brain, the brain also has a great deal of plasticity. This is supported by data showing that non-typical brain regions can also be recruited for cognitive purposes (Anderson, 2010, 2015). Unlike the so-called blank-slate theories of mind, however, the neural reuse account includes the notion of functional bias, which can explain why cognitive abilities are typically associated with particular brain regions (Anderson, 2015).

This is consistent with the data on congenitally blind individuals showing that numerical cognition maps onto brain regions that are typically associated with visuo-spatial processing in the sighted, including the IPS (Crollen & Collignon, 2020). Moreover, blindness does not prevent the development of the estimation ability; indeed, data show that blind subjects have higher acuity in (tactile and auditory) numerosity estimation tasks than sighted subjects (Castronovo & Seron, 2007). In addition, early (but curiously not late) blindness is associated with better arithmetic ability in adults compared to sighted subjects (Dormal et al., 2016). These data suggest that while visual processing may be closely connected to the development of numerosity estimation and later arithmetical ability, it is neither indispensable nor even ultimately advantageous in that development. The study of Crollen and Collignon also suggests that there is some other reason why numerical ability is associated with brain regions such as the IPS, one not due to the connection to processing visual stimuli.

Does this indicate that there exists an ANS after all, located (to an important extent) in the IPS? While this possibility cannot be ruled out, I want to suggest another possible explanation. The object tracking system has also been located (partly; other brain regions are also associated with it) in the IPS (Blumberg et al., 2015; Howe et al., 2009). While often associated with visual object tracking, the object tracking ability is not limited to vision. Interestingly, auditory object tracking has been shown to share common attentional resources with visual object tracking (Fougnie et al., 2018). It could be that a general object tracking ability is responsible for the emergence of both proto-arithmetical abilities: subitizing and estimation. Instead of being tied to vision, our proto-arithmetical abilities would be associated with the IPS because it is (partly) responsible for our general ability to track objects. Here we can see a potential connection to the neural network studies presented in the previous section. The neural networks learned from visual input, but what they ultimately responded to were discrete objects. Perhaps that could be the key to explaining why the estimation ability emerged in the generic deep neural networks: what they had in common with both sighted and congenitally blind people was that they processed inputs of clearly distinguishable objects.

5 Future Prospects and Problems

At the end of the previous section, I pondered different hypotheses concerning the emergence of the proto-arithmetical abilities. Those hypotheses are highly speculative and not meant to suggest anything more than possibilities for future research. An interesting hypothesis to test, for example, would be whether neural networks develop the estimation ability when not shown clearly distinguishable discrete objects. However, what the confirmation or disconfirmation of that hypothesis would entail for explaining the development of human proto-arithmetical abilities and numerical cognition would hardly be unambiguous. These questions require a lot more research before any connection between AI research and human cognition can be considered corroborated. But I contend that the experimental work reviewed in this paper shows potential for such research. Ultimately, I believe that AI research in this area should be conducted in close connection with cognitive neuroscience, in a genuinely interdisciplinary approach in which philosophical research should not be forgotten.

Of course, scientific AI research is valuable on its own, but in explaining the development of numerical and mathematical cognition, it could have relevance for a wide field of questions. To give just one example, a basic question prompted by the research presented above is what the minimal system is that can develop ANS-like behavior. This could be relevant for determining what kind of neuronal structure is required for proto-arithmetical abilities to arise.

More generally, AI research on deep neural networks could help us get a better grasp of the different influences (e.g., proto-arithmetical and cultural) on the development of arithmetical intelligence. In a collaborative interdisciplinary research program, one key issue would be to better understand just how human numerical cognition develops. In order to make an AI learn (even roughly) like humans, or to recognize that it indeed learns like humans, we need to know more about the way humans learn. I have proposed that the framework of enculturation is important for this purpose. Some AI learning associated with numerical abilities (like the experiments on the ANS signatures) is unsupervised, but if we want to simulate arithmetical learning, it has to be supervised and structured, to mirror the way humans acquire number concepts and arithmetical abilities under the guidance of parents, teachers, and others. In order to do this, we need to know how human learning is supervised and structured, and what effect this has on the development of cognitive capacities.

Aside from the question of modelling human (or animal) cognition, the most obvious connection is to the field of education science. If an AI can simulate the development of human mathematical cognition, it could be used to design new teaching contents and methods. With human children, seeing the effect of new contents and methods takes years. In addition, such experimentation is ethically problematic, since a group of children serving as test subjects may be exposed to inferior teaching, which could damage their future prospects in education and in life more widely. With artificial intelligences, there is no such risk, and the process takes far less time. The simulation of years’ worth of education could be conducted in a matter of hours.Footnote 15

While research on deep artificial neural networks shows promise, we should also maintain a healthy dose of skepticism regarding the kind of claims that can be made. One important issue is that the training sets for neural networks tend to be several orders of magnitude larger than what humans typically experience in their learning process. Given the potential differences in the respective learning processes, it is reasonable to be skeptical about how well the way deep neural networks learn fits actual human learning. This is a general problem that goes beyond numerical and mathematical cognition. As remarked by Dehaene, for example, the way babies can learn new words with only one or two repetitions is in stark contrast to the inefficient way neural networks learn content (Dehaene, 2020). This is something that Bayesian approaches to AI are designed to tackle, but the progress so far still leaves an enormous gap between the way humans (and other animals) learn and machine learning.

However, while there is reason for skepticism, this matter should be seen as a challenge to be tackled rather than an insurmountable problem.Footnote 16 Different training set sizes can be experimented with and compared with the amounts of similar experiences that human children typically encounter in their ontogeny. For example, as reported by Testolin and colleagues (2020), the artificial neural networks developed an ANS-type numerosity discrimination ability up to Weber fraction 0.4 after roughly 1.5 million pattern presentations. Similar ability in humans is present around four years of age (Halberda & Feigenson, 2008; Piazza et al., 2010). Are these figures comparable? How many relevant natural patterns can a four-year-old be presumed to have experienced? This is not easy to establish, but doing so would be important for determining the similarity (or difference) of human and AI learning processes. This is particularly relevant for supervised and structured learning. While thus far deep neural networks have tended to require huge training sets, with different kinds of teacher-guided or teacher-monitored learning the sets could be made much smaller. For example, in the counting experiment reported by Fang and colleagues, the supervised neural network managed to count to six after 2,000 training trials. This could be comparable to the number of trials children require in learning to count.
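As a crude back-of-envelope version of that comparison (the calculation below only spreads the reported figure evenly over a child's first four years; the per-day figure is an illustrative assumption, not a measured quantity):

```python
# Crude back-of-envelope comparison; the human-side estimate is an assumption
# for illustration, not measured data.
network_presentations = 1_500_000  # reported by Testolin et al. (2020)
child_age_years = 4                # age at which similar acuity is reported
experiences_per_day = network_presentations / (child_age_years * 365)
print(round(experiences_per_day))  # ~1027 relevant experiences per day
```

On this naive estimate, matching the network's exposure would require over a thousand relevant numerosity experiences per day throughout early childhood, which illustrates why comparing training set sizes across humans and machines is far from straightforward.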

Generally, while we can expect many similar problems as AI research advances, I believe that AI has established itself as a relevant research topic for the study of human numerical cognition and arithmetical intelligence. We are only in the early stages of that research, but there is reason for optimism, as well as for a healthy dose of skepticism. While the empirical data are far from conclusive, the modern methods of machine learning provide an interesting new dimension to studying the development of human numerical and arithmetical cognition. Although the neural networks in the research discussed in this paper can, at least for now, only be assessed in terms of their behavior, they are an important development over the “Turing test” kind of approach to AI, in which a computer is specifically programmed to simulate human behavior. Unlike that kind of programming, the neural networks provide a bottom-up approach to emulating human cognition. That generic neural networks can learn behaviors that have been considered the domain of human (and animal) cognition, like ANS-based estimation, is a highly interesting result that should not be dismissed.