1 Introduction

Our mathematical capacities are impressive. Somehow, without clear evolutionary precedents, we are able to theorize about numbers, sets, and other mathematical objects. Mathematics also makes extensive use of symbols, and in such a way that, for example, Dutilh Novaes (2013) has argued that symbol use is in some ways constitutive of doing mathematics. It thus comes as no surprise that accounts of cognition which stress the interaction between the brain and the environment have seen mathematics as an important case study (De Cruz 2008; Menary and Kirchhoff 2014; Menary 2015). More specifically, our arithmetical capacities—extensively studied by cognitive scientists—are an important example for such accounts of cognition. Typically, philosophers pointing to our arithmetical capacities draw a distinction between our innate quantity-related capacities (shared with a large number of other animals) and our learned, culturally influenced, arithmetical capacities—which I will here discuss primarily from the Western context, since most of the empirical studies on the subject used Western participants.

Our innate quantity-related capacities are usually divided into two systems, which are functionally different and located in different parts of the brain (Feigenson et al. 2004). First of all, we possess what is known as the parallel individuation system, which keeps track of up to three or four objects at the same time. This system does not explicitly represent number, but it does allow infants to exhibit surprise (in the form of longer looking times) when one object unexpectedly disappears from a collection of two objects (Wynn 1992; Feigenson et al. 2002). The parallel individuation system is limited to these very small numbers. For example, infants do not distinguish between 2 and 4 crackers. When presented with two boxes, one containing two crackers, and the other four crackers, infants will crawl to either box at chance. If the comparison is, instead, between 2 and 3 crackers they will reliably crawl towards the one with more crackers (Feigenson et al. 2002).

A second innate system, known as the Approximate Number System (ANS), allows us to distinguish between larger collections. The ANS allows us to distinguish between collections with, for example, 4 and 8 items (Xu 2003). Not all collections can be distinguished, however: adults are not able to reliably choose between collections with 21 and 24 items. Whether or not we are able to distinguish between collections reliably (and how well we are able to do so when we meet a given reliability threshold) depends on the ratio between the number of items: when one collection has twice as many items it is easier than when it has only one and a half times as many items (Barth et al. 2003). The lack of exact number representations in the ANS and the parallel individuation system means that there is a big discrepancy between our innate abilities and the ones we end up with after years of practice with counting and arithmetical operations.
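The ratio dependence described above can be illustrated with a small sketch. The function names and the cut-off value of 1.5 are assumptions introduced purely for illustration; the sources cited here do not supply a single threshold, and actual performance is graded rather than all-or-nothing.

```python
def size_ratio(a: int, b: int) -> float:
    """Ratio of the larger collection to the smaller one (always >= 1)."""
    if a <= 0 or b <= 0:
        raise ValueError("collection sizes must be positive")
    return max(a, b) / min(a, b)


# Hypothetical cut-off, chosen only for illustration; not an empirical estimate.
ILLUSTRATIVE_THRESHOLD = 1.5


def roughly_discriminable(a: int, b: int,
                          threshold: float = ILLUSTRATIVE_THRESHOLD) -> bool:
    """Crude stand-in for 'the ratio is large enough to tell the collections apart'."""
    return size_ratio(a, b) >= threshold


for pair in [(4, 8), (21, 24)]:
    print(pair, round(size_ratio(*pair), 3), roughly_discriminable(*pair))
```

On this simplification, 4 versus 8 has a ratio of 2, whereas 21 versus 24 has a ratio of roughly 1.14, which is why only the first comparison counts as easy even though the second pair differs by almost as many items.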

Those years of practice help us to acquire what Menary (2015) calls the Discrete Number System (DNS), though this may not be a system in the more specific sense above (there may be different cognitive processes for processing different numbers, whereas the use of ‘system’ above applies to a unified cognitive process that has been localized in the brain). The DNS encompasses our familiar arithmetical abilities, such as the ability to distinguish between collections of arbitrary sizes—paradigmatically by counting. That ability is exact, in the sense that people who have it can (in principle) distinguish between any two numbers. The DNS also includes representations for all of these different sizes, i.e. explicit representations of the natural numbers. These representations are not only helpful in designating numbers, they also play a role in algorithms for arithmetical operations (Menary 2015, p. 14).

The DNS, and its relation to numerals, has been discussed in a few places already. The work of Schlimm is notable for its focus on numerals (Schlimm 2018; Schlimm and Neth 2008), as is the study of neuronal recycling by Dehaene and Cohen (2007). On the basis of these discussions Menary (2015) has claimed that a proper understanding of the DNS implies that Cognitive Integration is true, i.e. that the Hypothesis of Extended Cognition (HEC) holds instead of the weaker Hypothesis of Embedded Cognition (HEMC).

The first, HEC, was developed by Clark and Chalmers (1998), Hutchins (1995) and Zhang and Norman (1995). It states that not all cognitive processes are wholly located in the brain. In other words, “mental and cognitive processes and states are integrated with states and processes found in the environment” (Menary 2010, p. 562). The second, HEMC, instead claims that the environment plays an important role but that cognitive processes are not (partly) constituted by the environment. So, “mental and cognitive processes and states are scaffolded by/depend upon the environment” (Menary 2010, p. 562). Menary places, for example, Sterelny (2010) in this camp. Furthermore, both of these positions are externalist positions, where externalism is characterized by the claim that “(EXT) mental processes depend intimately on environmental resources, and should be studied within the context of those resources” (Sprevak 2010, p. 362). This contrasts with internalist views of cognition, which defend “(INT) mental processes are largely self-sufficient, and can be studied largely in isolation from environmental props” (Sprevak 2010, p. 361).

According to Menary (2015) a case study of the DNS can help decide that HEC is a better hypothesis than HEMC, an issue which has been debated in the literature (Baumgartner and Wilutzky 2017; Pöyhönen 2014; Sprevak 2010). The goal of this paper is therefore to look at the DNS case study in more detail, combining literature from cognitive science on the functioning of the brain with more theoretical studies of numeral systems [such as Zhang and Norman (1995) and Schlimm and Neth (2008)]. This has not yet been done in the context where the DNS is used in an argument for HEC over HEMC, though details resulting from this combination of approaches are relevant: the small number DNS is, I argue, not a good basis for an argument for externalism, whereas the large number DNS at least decides in favour of EXT. I, however, also argue that the large number DNS does not decide between HEC and HEMC (on the basis of an inference to the best explanation).

2 The small number DNS

There is an important difference between the small and large number DNS. The properties of the numeral system are not equally important for the cognitive mechanisms that underlie the two parts of the DNS. The distinction is based on the fact that there are different cognitive mechanisms underlying the small and large number DNS, where the small number DNS functions irrespective of the internal structure of numerals (the way they are composed out of digits) but the large number DNS is heavily influenced by the structure of numerals. However, we do not yet know enough about the relevant cognitive mechanisms to pinpoint the border between the small and large number DNS. ‘Three’, for example, clearly falls under the small number DNS. The parallel individuation system is probably the underlying cognitive mechanism in most cases, as e.g. numerical comparisons for small numbers are supported by the parallel individuation system (Cheung and Le Corre, forthcoming). One could thus draw the border at the limit of that system, so around four, which Zhou and Bowern (2015) also suggest as a natural ‘low-limit’ case, based on a study of Australian languages.

A border around four doesn’t quite capture my characterization of the small number DNS as the part of the DNS which functions irrespective of the internal structure of numerals. ‘Six’ also lacks internal structure, and the DNS might rely on the (sufficiently exact) ANS for those numbers (Huber et al. 2017; Sullivan and Barner 2010). ‘Twenty-three’ and ‘one hundred’ on the other hand, have an internal structure, which influences the underlying cognitive processes that are part of the DNS. For that reason the border between small and large may be higher than four. The location of the border could also depend partly on culture, specifically on the point where numerals first have an internal structure. Note, however, that this only changes the range and not the nature of the small number DNS. In any case, the small number DNS would (if a border higher than four or five is the right one) be made up of disparate cognitive mechanisms. The system is, instead, unified by a feature relevant to the current case study: an indifference to the specifics of the notation.

Regardless of where exactly one draws the border, a good starting point for a discussion of the small number DNS is the way in which we acquire the relevant number concepts as children. As described above, initially we are able to distinguish collections with one item from those with two and three items, but not from collections with four or more items. When children are about 22 months old they start to succeed at tasks where they have to distinguish collections with one item from those with more than three items (Sarnecka and Lee 2009). Before then, comparisons of one item with any number of items above three result in behaviour that is at chance. The crucial switch at 22 months is that children acquire the ability to perform above chance at tasks where they have to choose between a collection with one item and a collection with more than three items.

While it is not yet known what cognitive mechanism underpins success at these tasks, it does seem to be a fairly general mechanism. It also underlies the recognition of a grammatically marked singular/plural distinction in language (Li et al. 2009). In fact, the above ability is developed faster when the singular/plural distinction is explicitly marked in the grammar (Sarnecka et al. 2007).

My reason for bringing up this ability is that I think that the acquisition of number concepts relies heavily on it. As I have defended in Buijsman (2017), it is possible to acquire the number concept one on the basis of the ability to distinguish collections with one item from those with more than one item. In effect, this means that children are able to determine that a collection has exactly one item. More precisely, as they have this ability before they have acquired the concept one, this can be interpreted as the ability to recognize when there is an F (e.g. a cookie) and there are no other Fs (no other cookies). The concept one can then be acquired based on that ability, as one applies in precisely those scenarios where there is an F and there are no other Fs.
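The condition that there is an F and there are no other Fs can be made explicit in first-order notation. The rendering below is offered only as an illustrative gloss, with \(F\) standing for the relevant kind (e.g. cookie); it is not claimed to be the formulation used in Buijsman (2017).

\[
\exists x\,\bigl(Fx \land \forall y\,(Fy \rightarrow y = x)\bigr)
\]

On this reading, recognizing that a collection has exactly one item amounts to recognizing that the formula holds of the Fs in view, without any number concept appearing in the formula itself.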

I think that the same mechanism drives the acquisition process of the following number concepts. Carey (2009) has a similar account that only applies to the numbers one through four, but I see no reason why the following idea can’t be applied to five, six, and so on. The idea is that the ability to distinguish between collections of one and those with more than one item can support the acquisition of number concepts through iterated use. The concept two can be acquired on the basis of situations where a collection with exactly one item is added to another collection with exactly one item. That is a slightly different proposal from that of Carey (2009), who argues that children form mental models based on the parallel individuation system and then establish one-to-one correspondences with these mental models. I am not suggesting that children construct such mental models. Instead, I suggest that they recognize the common factor to situations where the word ‘two’ is correctly applied purely on the basis of an iterated application of this ability to distinguish between one item and more than one item [see Buijsman (2017) for more details on this account of number concept acquisition].

How does our acquisition of number concepts relate to the role of the DNS as a case study for theories of cognition? Menary (2015) claims that studying the DNS will decide between different accounts of cognition. Therefore, either the way in which we acquire the DNS or the normal functioning of the DNS should form the basis of an inference to the best explanation with the conclusion that HEC is true. If the DNS is best explained only when one accepts that cultural artefacts are partly constitutive of cognitive processes, then one should require that claim either to explain the acquisition of the DNS or the normal functioning of the DNS.

Numerals, however, do not seem to play a large role in the above account of the acquisition of the DNS. Numerals do have to be mentioned, because studies with small cultures that seem to lack exact number words (Gordon 2004; Frank et al. 2008; Pica et al. 2004) suggest that the acquisition of exact number concepts is prompted by the availability of exact number words. Based on these studies some have argued that the presence of number words is a necessary condition (Núñez 2017) for the acquisition of number concepts, though this is contested (Butterworth et al. 2008). I will assume that we require number words to acquire number concepts, as that would be the most favourable outcome for Menary’s arguments.

Though number words may be a necessary precondition to acquire number concepts, the ability to distinguish between collections with one item and those with more than one item does not seem to depend on the presence of number words in the language. We, unfortunately, do not know enough about this ability to say so with certainty. Still, there is an indication that children learn to distinguish these collections regardless of any knowledge they have about numbers (Barner et al. 2004). In other words, there is reason to think that the ability which I think underlies our acquisition of number concepts is developed without any help from the presence of a numeral system. This would have to be tested in more detail though, for example by asking people from these cultures to distinguish collections with one item from those with more than one item in experimental designs similar to those presented to Western children.

If the ability to recognize situations with one F and no other Fs is independent from the presence of number words, then the small number DNS is not a good basis for an argument for HEC. Number words only act as prompters to attend to numerical aspects of collections of items, but do not figure in the further explanation of the underlying cognitive mechanisms that are developed as a result. The crucial ability that allows us to acquire these number concepts is developed independently from number words and is only helped along by (and so not dependent on) a grammatically marked singular/plural distinction. Whether it is developed independently from other cultural factors is, again, hard to say. In any case, the situation is not the clear example that could be hoped for by the proponents of HEC.

Note that this point doesn’t depend on the particular account of the data that I have summarized above. The account presented in Carey (2009) holds that the acquisition of the number concepts two, three and four is based on the parallel individuation system. As I mentioned, she claims that children construct mental models of collections with the relevant number of items on the basis of the parallel individuation system. These mental models are then the basis for our acquisition of the number concepts. Again, numerals do not figure in the explanation of the underlying cognitive mechanisms—in this case the mental models—except as prompters to initiate the development of the small number DNS.

The absence of these cultural tools in the explanations of the inner workings of the cognitive mechanisms underlying the small number DNS is not unexpected. One of the results that Menary (2015) discusses as proof of the influence of language on the underlying cognitive mechanisms is a study by Dehaene et al. (1999). They found that response times to arithmetical problems are influenced by the language in which they are presented to Russian–English bilingual speakers. Participants responded faster to exact arithmetic problems in the language in which they were given at the beginning of the experiment. If they had to switch to the other language it took them longer to solve the same arithmetic problem. No such differences were found with approximate (ANS-based) arithmetic problems, which involve arrays of dots. This result, however, doesn’t seem to hold for small numbers. Spelke (2003) reports a follow-up study that found no difference in reaction times between the initial and switched language for small exact numbers, but did replicate the difference for large exact numbers. In short, it seems that the cognitive mechanisms underlying small number DNS can be explained without reference to numerals (except as prompts for the initial development). Consequently, the small number DNS does not make for a good case study in support of EXT. Instead, the hope for a defender of HEC should rest on the large number DNS.

3 The large number DNS

As discussed, the border between the small and large number DNS is unclear, which is why I will stick to cases that clearly fall within my characterization of the large number DNS: numbers represented by multi-digit numerals. There is ample empirical evidence that features of the (Hindu-Arabic) numeral system are relevant to the cognitive processes underlying our grasp of these numbers.

As for the development of the large number DNS, it cannot be based on the parallel individuation system, because the number of items is too large. Feigenson (2011) does suggest that the parallel individuation system can be extended beyond four, but it still would not extend to a number such as 10,000. Nor is the ANS a plausible basis, as the link of the ANS with the DNS is weak for very large numbers (Huber et al. 2017; Sullivan and Barner 2010). Furthermore, Carey et al. (2017) provide evidence that children do not learn numbers between four and ten through word-to-ANS-value mappings. Finally, while the ANS is coupled to larger number concepts, there seems to be a lag between the time when the number concept is acquired and when the connection with the ANS is established (Lipton and Spelke 2005). So, there is good reason to expect that the acquisition of the large number DNS will be different, though we do not know enough to say how it happens exactly. Fortunately there is more to say about how we process larger numbers once we have acquired all the relevant number concepts. A good overview of that work with respect to Hindu-Arabic numerals is Nuerk et al. (2015), and I discuss some of the results reviewed in that paper. I also use some of the data on numeral systems in Oceanic languages, for which Bender and Beller (2017) is a good overview.

One final issue needs to be mentioned before I discuss the empirical results. I switch here from spoken numerals (discussed for the small number DNS) to written Hindu-Arabic numerals. Based on the study by Zhang and Wang (2005) one may think that this difference is important for the findings discussed below. They argue that the cognitive processes may be different when comparing a written Hindu-Arabic numeral to a remembered number (earlier presented as Hindu-Arabic numeral), as opposed to a scenario where two written Hindu-Arabic numerals are compared. Their findings are, however, contested by further empirical tests that failed to find a difference (Moeller et al. 2009, 2013). The underlying cognitive processes seem to be the same, even if the numbers are represented differently (both internally and externally). Unless new evidence surfaces to the contrary, then, the switch between number words and Hindu-Arabic numerals shouldn’t matter for the discussion of cognitive mechanisms that follows.

Until recently there were two opinions on the nature of the cognitive mechanisms thanks to which we process multi-digit numerals. One option was that we process numerals as a whole: when reading a numeral such as 143 we don’t break it up into its component parts (1, 4 and 3). The other option was that we process multi-digit numbers by breaking them up into parts and then work with the individual digits in parallel, eventually composing these parts to figure out which number is represented. On the first option it doesn’t seem to matter much what kind of numeral system one uses, since the cognitive processes would not need to break up numerals in accordance with the structure of the numeral system. In the decomposed case the way the numeral system is made up, which of course depends on one’s culture, plays an important role in explaining how we cognitively process these numbers. It is good news for HEC and HEMC, therefore, that the evidence now strongly favours the hypothesis that numerals are processed in a decomposed fashion (García-Orza and Damas 2011; Moeller et al. 2011; Nuerk et al. 2015). Furthermore, the fact that we process them in this way has clear effects on our performance at different arithmetical tasks (these effects are the evidence that has been put forward in support of the hypothesis that processing is decomposed).

One of these effects is known as the unit-decade compatibility effect (for numerals consisting of two digits). When participants are asked to decide which of two numbers is larger they respond faster for \(42 < 57\) than for \(47 < 62\). In the former, but not the latter, case there is no conflict between the overall outcome and the outcome for the individual digits. Both \(4 < 5\) and \(2 < 7\) for \(42 < 57\), whereas \(4 < 6\) but \(7 > 2\) for \(47 < 62\). This incompatibility in the second case leads to longer reaction times (Nuerk et al. 2001; Verguts and De Moor 2005). There is even an additional effect, where processing is faster if there is also a compatibility for the within-number comparison: \(34 < 79\) is easier than \(32 < 76\), because in the first case the irrelevant \(3 < 4\) and \(7 < 9\) are congruent with the relevant \(3 < 7\). In the second case the irrelevant \(3 > 2\) and \(7 > 6\) conflict with the relevant \(3 < 7\) (Wood et al. 2005). Finally, the unit-decade compatibility effect isn’t just relevant for comparison tasks. Guillaume et al. (2012) found that the compatible pairs of numbers are also added more easily and faster than incompatible pairs (25 + 48 is easier than 28 + 45). Furthermore, there is a difference in strategy execution. With compatible pairs of numbers (25 + 48) participants more often started by adding from 48 (to 68, then 73 instead of 65, 73 when starting from 25). In the case of incompatible pairs participants showed no such preference for starting with the larger number. Instead they tended to choose the number to the left of the addition sign (Guillaume et al. 2012).
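The classification of number pairs as compatible or incompatible can be stated mechanically. The following is a minimal sketch, assuming two-digit numbers whose decade digits and unit digits both differ; the function names are hypothetical and are introduced only for illustration.

```python
def split_two_digit(n: int) -> tuple:
    """Return (decade digit, unit digit) of a two-digit number."""
    if not 10 <= n <= 99:
        raise ValueError("expected a two-digit number")
    return divmod(n, 10)


def unit_decade_compatible(a: int, b: int) -> bool:
    """True if the decade comparison and the unit comparison point the same way.

    Hypothetical helper: pairs that share a decade digit or a unit digit
    are rejected, since compatibility is not defined for them here.
    """
    decade_a, unit_a = split_two_digit(a)
    decade_b, unit_b = split_two_digit(b)
    if decade_a == decade_b or unit_a == unit_b:
        raise ValueError("decade digits and unit digits must both differ")
    return (decade_a < decade_b) == (unit_a < unit_b)


for pair in [(42, 57), (47, 62), (25, 48), (28, 45)]:
    print(pair, "compatible" if unit_decade_compatible(*pair) else "incompatible")
```

The pairs are the ones used in the text: 42 and 57 (and the addends 25 and 48) come out compatible, whereas 47 and 62 (and the addends 28 and 45) come out incompatible.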

This is far from the only influence of the numeral system on arithmetic performance. As is to be expected, cases of addition where one needs to perform a carry, because the units add up to 10 or more (e.g. 25 + 47), take more time and are more prone to errors than instances where this is not necessary (Ashcraft and Stazyk 1981). Similarly, subtraction takes longer and is more likely to lead to errors when one needs to borrow (e.g. for 43 − 18) because the subtraction \(3 - 8\) goes below zero (Sandrini et al. 2003). Multiplication errors where the decade digit is correct are also more likely than errors where the decade digit is wrong. So, for \(7 \times 3\) the correct outcome is 21, and 24 is a more likely mistake than 18 (Domahs et al. 2006; Verguts and Fias 2005). It seems that possible solutions are represented in a decomposed format: decade and unit digits are processed separately, so errors on only the unit digit are more likely than errors on both digits.
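The conditions mentioned above depend only on the digits of the written numerals, which a short sketch can make explicit. The helper names are assumptions made for illustration, and the functions are meant for non-negative whole numbers only.

```python
def needs_carry(a: int, b: int) -> bool:
    """True if adding a and b requires a carry from the units position."""
    return (a % 10) + (b % 10) >= 10


def needs_borrow(a: int, b: int) -> bool:
    """True if subtracting b from a requires borrowing at the units position."""
    return (a % 10) < (b % 10)


def same_decade_digit(answer: int, correct: int) -> bool:
    """True if a two-digit answer shares its decade digit with the correct result."""
    return answer // 10 == correct // 10


print(needs_carry(25, 47))        # units 5 + 7 reach 10 or more
print(needs_borrow(43, 18))       # units 3 - 8 would go below zero
print(same_decade_digit(24, 21))  # error on the unit digit only
print(same_decade_digit(18, 21))  # error on the decade digit as well
```

For \(7 \times 3 = 21\), the mistaken answer 24 keeps the decade digit of the correct result and 18 does not, which is the contrast the error data turn on.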

One can also consider the algorithms for multiplication that Menary (2015, p. 14) discusses from this, more cognitive, perspective. He mentions two ways in which one may calculate what \(23 \times 11\) is. A first option is to start from the right, with 1 and then do the multiplication with 10:

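[Figure a: written calculation of \(23 \times 11\), starting from the units digit of 11 and then multiplying by 10]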

Another option is to proceed instead from the 10 and do the multiplication even more explicitly on a digit-by-digit basis:

[Figure b: written calculation of \(23 \times 11\), starting from the 10 and proceeding digit by digit]

Menary concludes that these algorithms display the usefulness of spatially arranging the numerals in a certain way. The cognitive science results from above suggest that a stronger conclusion may be drawn from these examples. Both algorithms, because they separate the multiplication problem into subproblems about the separate digits, fit in perfectly with the way in which we process multi-digit numerals. So, one of the reasons why we may have ended up using these algorithms and not some other algorithms (e.g. one where we repeatedly add 23 to the first number and subtract 1 from the second number) is that these are in line with the way the brain processes Hindu-Arabic numerals. Since numeral processing is decomposed the brain already has the individual digits at hand when one parses the numbers involved in the multiplication problem. We moreover generate the possible answers to the multiplication separately for the separate digits. These written algorithms make that cognitive strategy more explicit, reducing errors along the way. It is not just the spatial organisation that is relevant here; the close resemblance with the way our brain parses multi-digit numerals also plays an important role.
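The way both procedures split the problem into digit-level subproblems can be sketched as follows. This is an assumed reading of the two procedures as described in the text, since the exact layout of the original figures is not reproduced here; the function names are hypothetical and the sketch is limited to positive whole numbers.

```python
def partial_products_by_row(a: int, b: int) -> list:
    """One partial product per digit of b, starting from the units digit."""
    rows = []
    place = 1
    while b > 0:
        b, digit = divmod(b, 10)
        rows.append(a * digit * place)
        place *= 10
    return rows


def place_values(n: int) -> list:
    """Decompose a numeral into its place values, e.g. 23 -> [20, 3]."""
    digits = str(n)
    return [int(d) * 10 ** (len(digits) - i - 1) for i, d in enumerate(digits)]


def partial_products_by_digit(a: int, b: int) -> list:
    """One partial product per pair of digits, starting from the highest place of b."""
    return [pb * pa for pb in place_values(b) for pa in place_values(a)]


rows = partial_products_by_row(23, 11)
cells = partial_products_by_digit(23, 11)
print(rows, sum(rows))
print(cells, sum(cells))
```

Both decompositions sum to 253. The second simply makes every digit-by-digit subproblem explicit, which is the feature that lines up with decomposed processing.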

Our cognitive processes influence our cultural strategies for performing arithmetical calculations. These cognitive processes, in turn, are clearly influenced by cultural factors. Which numeral system is used has to make an important difference for how the brain processes numerals, and so how the brain works with larger numbers. For one thing it matters what base a numeral system has, as this determines (at least in a place-value system) which numerals are multi-digit numerals and which are not: a cultural factor may help determine the range of the small number DNS. For example, in our base-10 numeral system the cut-off point is at 10, though a fair number of languages have number words such as ‘eleven’ that do not reflect this. Other numeral systems, however, have different bases and so different numerals that count as multi-digit—though low bases are typical. The Babylonian numeral system, an example that may come to mind of a system with a high base, actually has a 10–6 cycle to reach base 60: there are different signs until 10 and then a repetition of those signs until you return to the symbol for one when you reach 60 (Høyrup 2001).
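How the base fixes which numbers need more than one place, and how a single base-60 place is itself built on a cycle of ten, can be shown in a brief sketch. It is a simplification under stated assumptions: the function names are hypothetical, and details of actual Babylonian notation (such as the treatment of empty places) are ignored.

```python
def digits_in_base(n: int, base: int) -> list:
    """Place-value digits of n in the given base, most significant first."""
    if n < 0 or base < 2:
        raise ValueError("need n >= 0 and base >= 2")
    digits = []
    while True:
        n, d = divmod(n, base)
        digits.append(d)
        if n == 0:
            break
    return digits[::-1]


def tens_and_units_signs(place: int) -> tuple:
    """Split one base-60 place (0 to 59) into (number of ten-signs, number of unit-signs)."""
    if not 0 <= place < 60:
        raise ValueError("a base-60 place holds values from 0 to 59")
    return divmod(place, 10)


print(digits_in_base(10, 10))    # two places in base 10
print(digits_in_base(10, 60))    # a single place in base 60
print(digits_in_base(75, 60))    # two places in base 60
print(tens_and_units_signs(59))  # the inner cycle of ten within one place
```

The number ten is multi-digit in base 10 but occupies a single place in base 60, while a place such as 59 is still composed of five ten-signs and nine unit-signs.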

More importantly, not all numeral systems are place-value systems. The Roman numeral system, for example, is a sign-value system where the value of a digit is determined by the kind of sign it is [and so not by its place, at least not in the original numeral system where e.g. 4 was IIII instead of the Medieval IV (Schlimm and Neth 2008)]. In this case numerals are probably still interpreted in a decomposed fashion when there are several digits, but the process of arriving at the final value will be quite different. Since place-value is not important it may well be that effects based on place-value, such as the unit-decade compatibility effect, disappear. In fact, in a series of experiments Krajcsi and Szabó (2012) found that, when participants are taught a completely new sign-value system and a new place-value system, performance on simple addition and comparison tasks is better for the sign-value system. In other words, sign-value systems are easier to learn and work with for those kinds of tasks. The same seems to be true for other early numeral systems, such as those from Ancient Egypt and the Mayas (Nickerson 1988). As Schlimm and Neth (2008) also found, one needs to remember fewer addition facts for calculations with Roman numerals. On the other hand, such calculations require more basic perceptual-motor interactions than calculations with Hindu-Arabic numerals. So, cognitive performance and the underlying cognitive processes are different for different numeral systems.
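The difference between the two ways of arriving at a numeral's value can be illustrated with a minimal sketch. It assumes purely additive Roman notation (IIII rather than IV) and does not handle subtractive forms; the function names are hypothetical and say nothing about how the brain carries out the computation.

```python
SIGN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def additive_sign_value(numeral: str) -> int:
    """Sum the values of the signs; their positions play no role.

    Assumes additive notation only, so 'IV' would wrongly come out as 6.
    """
    return sum(SIGN_VALUES[sign] for sign in numeral)


def place_value(numeral: str, base: int = 10) -> int:
    """Weight each digit by its position, reading from left to right."""
    total = 0
    for digit in numeral:
        total = total * base + int(digit)
    return total


print(additive_sign_value("XXIII"), additive_sign_value("IIIXX"))  # order is irrelevant
print(place_value("23"), place_value("32"))                        # order matters
```

Reordering the signs leaves the sign-value result unchanged, whereas swapping the digits of a place-value numeral changes the number, which is why position-based effects need not carry over to sign-value systems.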

Similar differences in performance have been observed for other numeral systems. The Oceanic language Mangarevan contains two numeral systems: a decimal system and a system that mixes decimal and binary patterns (Bender and Beller 2017). The mixed system reduces the number of addition facts that have to be remembered for calculation, and is favoured by its users over the regular decimal system (Bender and Beller 2014). One reason is that the mixed system outperforms the regular decimal system found in English in terms of compactness and regularity (Bender et al. 2015). Again, these different features of the numeral system influence cognition [see also Zhang and Norman (1995) for another comparison of cognitive differences resulting from use of different numeral systems]: they reduce the load on working memory because users need to keep track of fewer symbols and addition facts. This reduction of cognitive load may also have been an important feature of early number use, especially their use of tokens, notched tallies, etc., as Overmann (2016) argues in a review of archeological evidence.

Unfortunately little more is known about the cognitive processes that are used when working with any of these other numeral systems. For that reason, I focus primarily on what we know about Hindu-Arabic numerals in the rest of the paper. There is clear evidence that the numeral system influences the way in which we process larger numerals. The acquisition and functioning of the large number DNS thus depends on culture. The exact influences will be clearer once more extensive studies have been conducted with other numeral systems, but that there are such influences is not in question.

To return to the idea of the DNS as a case study for our theories of cognition, there is a clear difference between the small and large number DNS. The small number case, as I argued, offers little support for the claim that cultural factors are important. The large number case on the other hand is an excellent illustration of the way in which cultural factors influence, and are influenced by, cognitive processes in the brain. Numerals are important to prompt the acquisition of the small number DNS, but we do not need to reference properties of numerals in an explanation of the cognitive mechanisms underlying the small number DNS. On the other hand, the properties of numerals are extremely important for the cognitive mechanisms underlying the large number DNS, as these mechanisms function on the template provided by the numeral system which is learned. It is the large number part of the DNS that is important for the question whether Menary (2015) is right in claiming that it can help decide in favour of HEC.

4 Empirical evidence for enculturation?

Philosophers have already debated the claim that empirical evidence supports the hypothesis of extended cognition, as I mentioned in the introduction. From that debate it is clear that the issue can be divided into two parts: whether there is empirical evidence that decides between internalism (INT) and externalism (EXT) about cognition, and whether there is empirical evidence that decides between HEC and HEMC. I will not tackle these questions in full generality, as all I aim to do here is to determine whether the case study of the DNS helps on either of these issues. In my view it is of some help: it supports EXT over INT, but it fails to decide between HEC and HEMC.

One last question is how general we take these claims to be. Could we say, for example, that INT holds for the cognitive processes underpinning our grasp of small numbers and EXT for those processes underlying our grasp of large numbers? It at least seems to be an option. Since the two cognitive processes may be distinct, e.g. in terms of how numerical comparisons are evaluated, they may also be differently constituted. The philosophical debate tends to interpret these questions more generally, so I will also stick to a more general interpretation. Either INT holds for arithmetical cognition or EXT holds, which means that the large number DNS could be sufficient as case study to argue that EXT and HEC are true for arithmetical cognition.

4.1 Internalism versus externalism

Why does the above study of the DNS support EXT over INT for arithmetical cognition? The important part of the case study is, as I indicated, the large number DNS. The explanations of those cognitive processes clearly involve environmental props. One needs to appeal to features of the numeral system to explain why the brain does certain things, such as attend to the position of each decomposed digit. Furthermore, the computation of place-value/sign-value is specific to the numeral system that is in use. As I also explained, the type of numeral system that is in use impacts performance, strategy choice and strategy execution. Strategy execution is influenced by a kind of unit-decade compatibility effect, and strategy choice depends more generally on the numeral system one is used to: different strategies will be relevant for Roman numerals than for Hindu-Arabic numerals (Schlimm and Neth 2008; Zhang and Norman 1995). One such difference is the load on working memory, for example in terms of how many basic addition facts need to be remembered.

The differences in performance summarized in the previous section show that environmental/cultural props involved in arithmetical cognition play an important role in explaining how that part of cognition works. A lot of our performance at tasks with larger numbers is bound up with contingent features of the numeral system we are using. As a result we need to pay close attention to environmental resources when studying these cognitive processes. In contrast to small numbers, where it is possible to study the cognitive processes (the parallel individuation system, the way we distinguish one item from more than one item, etc.) in isolation from features of the numeral system, we cannot study the large number DNS without taking into account features of the relevant numeral system. On the formulation of INT and EXT from the introduction this means that EXT is true for arithmetical cognition: arithmetical cognition depends intimately on environmental resources and so should be studied within the context of those resources. Consequently, an externalist account like HEC or HEMC is true of arithmetical cognition. However, the case study does not decide between these two alternatives, as I argue in the next subsection.

4.2 HEC versus HEMC

The central issue on which HEC and HEMC differ is whether the cultural practices that our cognitive processes depend on partly constitute our cognitive systems or are mere causal factors influencing the (internal) cognitive processes. A case study is supposed to decide between these two positions by showing that one of them yields the better explanation of the cognitive processes in question. The argument would be an inference to the best explanation, where either HEC or HEMC offers the best explanation of (that part of) cognition (Sprevak 2010). Menary (2015) is aware of the criticism of such an argument for HEC, but maintains that HEC has the upper hand specifically because it can answer the question “Assuming that cognitive processing criss-crosses between neural space and public space, how does it do this?” (Menary 2015, p. 9). More generally, Menary sees the argument for HEC, which he calls CI, as an inference to the best explanation based on three features: the novelty and uniqueness of mathematical cognition, as well as the interactions between mathematical cognitive processes and the public numeral systems.

In more detail, Menary builds his arguments on three findings. The first, due to Lyons et al. (2012), is so-called ‘symbolic estrangement’. Symbolic estrangement is the finding that comparisons across formats (so with one number presented by a numeral and the other by an array of dots) are harder than within formats, even when the two numerals are from different numeral systems. That is in line with the studies by Dehaene et al. (1999) and Spelke (2003) that I commented on at the end of Sect. 2. As the reader may recall, it seems that this holds for large numbers but not for small numbers. Furthermore, these studies primarily support the novelty aspect: they show that the DNS extends our cognitive capacities and doesn’t just build upon the ANS.

The second finding is a study by Landy and Goldstone (2007) with college-level algebraists. They found that by altering the spacing between addition and multiplication symbols it is possible to induce errors regarding which operation takes precedence. In other words, Landy and Goldstone (2007) found that the spatial distribution of mathematical symbols has effects on performance. Again, this is in line with other findings I have discussed, such as the effect of the type of place-value system on performance effects like the unit-decade compatibility effect and the difference in performance for different numeral systems. So, this is the second part on which Menary builds his arguments. It shows some of the interactions between our cognitive processes and cultural tools.

The third and last piece of support is the idea that the acquisition of the DNS involves changes to the brain (Dehaene and Cohen 2007). The DNS goes beyond our initial capacities, so Menary argues, because it requires changes to the brain in order to acquire it. The novelty aspect of the DNS is supported by these findings. Furthermore, the uniqueness of the DNS is supported, since only humans seem to be able to go through the changes to the brain that are necessary to acquire the DNS.

Menary uses these three supports to argue for HEC over HEMC in two main passages. The first passage is rather multi-faceted:

Our cognitive capacities cannot cope with long sequences of complex symbols and operations on them. This is why we must learn strategies and methods for writing out proofs. Symbol manipulation makes a unique difference to our ability to complete mathematical tasks, and we cannot simply ignore their role. If we take the approach of CI, then mathematical cognition is constituted by these bouts of symbol manipulation, and we cannot simply shrink the system back to the brain. The case for a strongly embedded approach to mathematical cognition depends upon the novelty and uniqueness of mathematical practices and dual component transformations. (Menary 2015, p. 16)

Numerals are needed to acquire the DNS and (for large numbers) need to be mentioned in an explanation of the underlying cognitive processes. As Menary points out, we cannot ignore the role of numerals—which is why I agree that the case study supports externalism. However, I disagree that the role symbols play implies that the cognitive process cannot be shrunk back to the brain. Novelty and uniqueness hold just as much for the small number part of the DNS as for the large number part of the DNS. Without numerals, it seems, we would not develop either. The small number part of the DNS is thus also new, as it extends our cognitive capabilities beyond our innate capacities. We need to learn to work with small numbers, just as we need to learn to read. Yet the novelty and uniqueness of the small number DNS is not sufficient to necessitate acceptance of HEC. The cognitive processes underlying the small number DNS can be understood in terms that are independent from properties of public symbol systems. As discussed in Sect. 2, the parallel individuation system, the ability to distinguish between one and more than one item and the mental models based on the parallel individuation system can be understood without reference to properties of numerals. While numerals are important to prompt the development of the small number DNS, they are not needed in an explanation of the underlying cognitive processes of the small number DNS. Therefore, novelty and uniqueness alone do not support HEC. The dual component interactions (i.e. interactions going back and forth between the cognitive processes and the public numeral systems) have to provide the support for HEC instead, since of the three features Menary discusses that is the only one that is exclusive to the large number DNS.

I discussed some dual component interactions in Sect. 3, namely some of the effects of the numeral system on performance and some effects of our cognitive systems on algorithm choice. The question Menary raises is how one explains this criss-crossing without accepting the constitutivity claim. In other words, the argument is that if these cultural practices (such as algorithm choice) are not constitutive of our cognitive processes, then one cannot offer an equally good explanation of these interactions. I think that this challenge can be met. In Sect. 3 I offered an explanation of the influence of our cognitive processes on algorithm choice: our cognitive mechanisms process numerals in a decomposed fashion. That is how our cognitive mechanisms engage with public symbols. Hence, the effect of our cognitive processes on these cultural practices is that they are brought in line with the functioning of the cognitive mechanisms. Multiplication algorithms that also work in this decomposed fashion are preferred over other algorithms, because of the way the brain processes Hindu-Arabic numerals. This explanation is purely causal, yet does account for the influences going back and forth between the numeral system and our cognitive processes. In line with the general argument pushed by Sterelny (2010), one can give an equally good causal explanation of the features that are supposed to be best explained by the constitutivity claim.

There are a lot of other dual component interactions that would need to be explained. For example, one needs to explain the interactions between the features of the numeral system and our cognitive mechanisms. The choice of base determines which comparisons and calculations run into the unit-decade compatibility effect. Furthermore, the choice of the kind of numeral system (place-value or sign-value, for example) has effects on performance. I think that these interactions can be explained without reference to the constitutivity claim. The effect of the choice of base has to do with the kind of numerals that feed into the decomposed processing. If the system has base 10, then those are the numerals 0–9. If the system has base 6, the numerals are 0–5, and so on. It is a matter of fine-tuning the decomposed processing mechanism, and so it seems that at least these interactions can be explained in purely causal terms. I am for that reason optimistic that dual component interactions can be explained without accepting HEC.
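The dependence of the compatibility effect on the base can be made concrete with a small sketch. The code below is purely illustrative and is not taken from any of the cited studies: the function names `digits` and `leading_unit_compatible` are hypothetical, and the notion of compatibility is generalized, as an assumption, from decades and units in base 10 to the leading and final digit of any two-digit numeral. A pair counts as compatible when the comparison of the leading digits and the comparison of the final digits point in the same direction.

```python
def digits(n, base):
    """Return the digits of the non-negative integer n in the given base,
    most significant digit first (the decomposed form of the numeral)."""
    if n < 0 or base < 2:
        raise ValueError("n must be non-negative and base at least 2")
    if n == 0:
        return [0]
    out = []
    while n > 0:
        n, remainder = divmod(n, base)
        out.append(remainder)
    return out[::-1]


def leading_unit_compatible(a, b, base=10):
    """Classify a pair of two-digit numerals as compatible (True) or
    incompatible (False). Returns None when the leading digits or the
    final digits are equal, since the classification does not apply.
    Assumption: this generalizes the base-10 decade/unit definition."""
    da, db = digits(a, base), digits(b, base)
    if len(da) != 2 or len(db) != 2:
        raise ValueError("both numbers must have two digits in this base")
    (lead_a, unit_a), (lead_b, unit_b) = da, db
    if lead_a == lead_b or unit_a == unit_b:
        return None
    # Compatible: both digit-wise comparisons favour the same number.
    return (lead_a < lead_b) == (unit_a < unit_b)


# The same two quantities, written in two different bases.
print(digits(19, 10), digits(26, 10))       # [1, 9] [2, 6]
print(leading_unit_compatible(19, 26, 10))  # False (1 < 2 but 9 > 6)
print(digits(19, 6), digits(26, 6))         # [3, 1] [4, 2]
print(leading_unit_compatible(19, 26, 6))   # True (3 < 4 and 1 < 2)
```

On this toy definition the quantities nineteen and twenty-six form an incompatible pair in base 10 but a compatible pair in base 6. The only thing that changes is which digits feed into the digit-wise comparison, which is what one would expect if the choice of base merely fine-tunes a decomposed processing mechanism.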

Menary also claims that symbol manipulation makes a unique difference to our ability to complete mathematical tasks [cf. Landy and Goldstone (2007)], and that this is a reason to accept HEC over HEMC. It is certainly true that these studies show that the symbols we work with have an impact on our performance. However, as one can also see in the discussion section of Landy and Goldstone (2007), this doesn’t mean that the symbols themselves need to be counted as part of the cognitive processes. Their suggestion is that the brain mechanisms for evaluating mathematical expressions interact more closely with visual processes than previously thought. Consequently, incorporating the visual processes, and not the symbols, is at the moment an equally good explanation, and one put forward by the researchers in question. The first argument, with all of its facets, does not yet decide between HEC and HEMC.

The second passage in which Menary argues for HEC is more straightforward:

symbols are not simply impermanent scaffolds, they are permanent scaffolds. They become part of the architecture of cognition (and not simply through internalisation). Mastery of symbol systems results in changes to cortical circuitry, altering function and sensitivity to a new, public, representational system. However, it also results in new sensori-motor capacities for manipulating symbols in public space. (Menary 2015, pp. 16–17)

Menary’s main claim of interest is that symbols do not simply become part of the architecture of cognition through internalisation. Of course, we manipulate symbols in public space frequently and it seems reasonable to say that performance is better with than without such symbol manipulation. That is also what one might expect if HEMC is true: once the DNS is acquired, tasks are harder without the scaffolding than with the scaffolding in place. But performing these tasks remains possible without public symbol manipulation, meaning that one can view the process as one of internalisation. The findings of Landy and Goldstone (2007) can be interpreted, as I argued above, in such a way that they only show something about the internal organization of the brain. Similarly, the cognitive mechanisms for large numbers described in Sect. 3 have all been described in an internal fashion. We may often use public symbols because it is easier (and hence performance is better), but that does not yet show that we cannot (in principle) do without these external cultural practices once we have acquired the DNS.

Finally, one may wonder whether the other cultural influences discussed in Sect. 3 [i.e. the studies reviewed in Bender and Beller (2017)] can support an inference to the best explanation for HEC. First of all, there is an issue with such an argument: while we know that there are differences in performance and strategy execution depending on the numeral system, we do not know if there are also differences in the underlying cognitive mechanisms. Of course, the cognitive mechanisms are employed differently: the theoretical approach of e.g. Schlimm and Neth (2008) shows that the balance between remembered facts and perceptual-motor interactions is different for different numeral systems. This balance may also be different for people using the same numeral system, but coming from a different cultural background. Tang et al. (2006) argue that a difference in brain activation between English and Chinese speakers while solving arithmetical problems could be due to a stronger reliance on short-term memory and perceptual-motor interactions in the case of Chinese speakers. However, that doesn’t mean that the components that make up the DNS are fundamentally different. One reason to think that the components underlying the DNS are the same is that participants who had to learn new place-value and sign-value numeral systems had little difficulty doing so. They also did not experience difficulties when they had to calculate with these new systems (Krajcsi and Szabó 2012). If so, then there is nothing more to the cultural differences than the dual-component interactions discussed above, which I argued one can interpret in a merely causal manner.

Suppose, though, that there are differences: say, Hindu-Arabic numerals are processed in a decomposed fashion but Roman numerals are processed holistically. Even in that scenario those differences can, probably, be explained in causal terms rather than constitutive ones. Zhang and Wang (2005), who thought they had found such a difference in processing, give an explanation in terms of interactions between perceptual mechanisms and the brain mechanisms for evaluating mathematical expressions, similar to the explanation Landy and Goldstone (2007) propose based on their findings. So, rather than accepting HEC, one could instead propose an explanation where the differences in numeral systems causally interact with perceptual mechanisms, leading to a change in the cognitive processes that underlie the DNS.

5 Conclusion

Is the DNS a good basis for an inference to the best explanation with HEC as its conclusion? In the case of the small number DNS the answer is a clear no: in that case numerals help us (and may be required to) acquire the relevant concepts, but they do not seem to figure in an explanation of the underlying cognitive mechanisms. The large number DNS, on the other hand, functions in such a way that one needs to mention numerals in an explanation of the underlying cognitive mechanisms. The cognitive processes that make up the large number DNS are structured in a way that builds on the (internal) structure of the numeral system we use. Performance is clearly influenced by contingent features of the numeral system, and it seems that the underlying cognitive processes are at least combined in different ways depending on the kind of numeral system one uses.

This has some importance for the way in which we think about arithmetical cognition. As I have argued, the DNS case study gives us a good reason to think that internalism about arithmetical cognition in the sense of INT is false, so some version of externalism is true. The case study on its own, however, will not decide which version that is. Whether HEC or HEMC is true about arithmetical cognition does not seem to be decided by the empirical data, because the mechanisms underlying the DNS can be explained equally well in causal as in constitutive terms—as Sprevak (2010) has argued for the general case. We can make good sense of what is going on with either framework, so something else (such as a mark of the cognitive) will have to decide between these hypotheses.