Numerical distance effect size is a poor metric of approximate number system acuity
Individual differences in the ability to compare and evaluate nonsymbolic numerical magnitudes—approximate number system (ANS) acuity—are emerging as an important predictor in many research areas. Unfortunately, recent empirical studies have called into question whether a historically common ANS-acuity metric—the size of the numerical distance effect (NDE size)—is an effective measure of ANS acuity. NDE size has been shown to frequently yield divergent results from other ANS-acuity metrics. Given these concerns and the measure’s past popularity, it behooves us to question whether the use of NDE size as an ANS-acuity metric is theoretically supported. This study seeks to address this gap in the literature by using modeling to test the basic assumption underpinning use of NDE size as an ANS-acuity metric: that larger NDE size indicates poorer ANS acuity. This assumption did not hold up under test. Results demonstrate that the theoretically ideal relationship between NDE size and ANS acuity is not linear, but rather resembles an inverted J-shaped distribution, with the inflection points varying based on precise NDE task methodology. Thus, depending on specific methodology and the distribution of ANS acuity in the tested population, positive, negative, or null correlations between NDE size and ANS acuity could be predicted. Moreover, peak NDE sizes would be found for near-average ANS acuities on common NDE tasks. This indicates that NDE size has limited and inconsistent utility as an ANS-acuity metric. Past results should be interpreted on a case-by-case basis, considering both specifics of the NDE task and expected ANS acuity of the sampled population.
Keywords: Numerical distance effect; Estimation; Approximate number system; Acuity; Numerical magnitudes
People can evaluate nonsymbolic numerical magnitudes (i.e., which pack has more wolves) without counting (Taves, 1941) via what is known as the approximate number system (ANS). The ANS allows us to perceive numerical magnitudes from the world in an analog fashion, similarly to how we perceive other magnitudes, like size (Kaufman, Lord, Reese, & Volkmann, 1949). This skill, “ANS acuity,” varies: Some individuals can make faster and more accurate judgments than others (Halberda & Feigenson, 2008). Better ANS acuity has been linked to better math skills and better standardized test performance (Gilmore, McCarthy, & Spelke, 2010; Halberda, Mazzocco, & Feigenson, 2008) (also see Chen & Li, 2014, for meta-analysis and review) and may even influence judgment and decision-making in adults (Peters, Slovic, Västfjäll, & Mertz, 2008; Schley & Peters, 2014). Unfortunately, recent empirical studies call into question the effectiveness of a historically common ANS-acuity metric: the size of the numerical distance effect (NDE size; Gilmore, Attridge, & Inglis, 2011; Holloway & Ansari, 2009; Inglis & Gilmore, 2014; Maloney, Risko, Preston, Ansari, & Fugelsang, 2010; also see Sasanguie, Defever, Van den Bussche, & Reynvoet, 2011). The goal of this study was to assess the theoretical support for using NDE size as an ANS-acuity metric.
Individual differences, ANS acuity, and the NDE
There is strong evidence that people invoke ANS-based analog magnitudes when considering symbolic numbers (Dehaene, 1992; Dehaene, Bossini, & Pascal, 1993; Moyer & Landauer, 1967). Moyer and Landauer’s (1967) seminal study demonstrated that people show distance effects when making quantity judgments about symbolic magnitudes. For example, people are faster at determining that 6 is smaller than 9 than they are at determining that 7 is smaller than 8. Such effects are a classic pattern in analog magnitude comparisons (Moyer & Landauer, 1967).
Moyer and Landauer’s (1967) now classic approach of using the presence of distance effects to demonstrate that the ANS is invoked in symbolic magnitude comparisons appears to have inspired the later use of NDE size as an ANS-acuity metric. In a practice that seems to originate with Sekuler and Mierkiewicz (1977), researchers calculate NDE size by finding the savings in the speed and/or accuracy of numerical comparisons (e.g., “Which is larger?”) at larger (easier) versus smaller (harder) distances. Larger NDE size is taken to indicate poorer ANS acuity (Peters et al., 2008). Recently, several studies have questioned this measure’s ability to distinguish individual differences in ANS acuity (Gilmore et al., 2011; Holloway & Ansari, 2009; Inglis & Gilmore, 2014; Maloney et al., 2010; also see Price, Palmer, Battista, & Ansari, 2012; Sasanguie et al., 2011). Given these issues of empirical support, this manuscript seeks to address whether the use of NDE size as an ANS-acuity metric is theoretically supported.
ANS theory and NDE size
The exact nature of the ANS has yet to be completely determined, but it is well established that it obeys Weber’s law (Cordes, Gelman, Gallistel, & Whalen, 2001; Dehaene, Izard, Spelke, & Pica, 2008; Mechner, 1958; Meck & Church, 1983; Whalen, Gallistel, & Gelman, 1999). As is typically the case for magnitude perception (see Kingdom & Prins, 2010), numerical magnitudes are not perceived exactly. Rather, percepts are normally or quasinormally distributed around a (possibly biased) mean value. The ability to distinguish between two quantities is dependent on the amount of overlap between their perceived magnitude distributions. Importantly, this overlap, and thus the ease with which two values can be distinguished, is dependent upon their ratio. As a result, one can observe both distance and size effects in magnitude discriminations. It is easier to distinguish numerical quantities that are more distant from each other (6 dots [:::] vs. 12 dots [::::::]) than those that are closer together (8 dots [::::] vs. 10 dots [:::::]). Also, it is easier to distinguish numerical quantities at the same distance with smaller sized magnitudes (6 dots [:::] vs. 8 dots [::::]) than with larger magnitudes (14 dots [:::::::] vs. 16 dots [::::::::]). ANS magnitude comparisons yield standard psychophysical functions: The likelihood that an individual will successfully discriminate between two magnitudes will increase curvilinearly from chance to asymptote at or near 100% accuracy, as the ratio of the larger to the smaller value increases. Reaction times (RTs) similarly decrease as the comparison ratio increases (Whalen et al., 1999; see Kingdom & Prins, 2010, for a discussion of psychophysical functions).
ANS acuity is defined by an individual’s Weber fraction (Cordes et al., 2001; Dehaene et al., 2008; Halberda et al., 2008; Siegler & Opfer, 2003; Whalen et al., 1999). Weber’s law implies that the standard deviation (SD) of the distribution around an estimated magnitude is a constant proportion of that magnitude’s mean (M). This constant proportion is, by definition, the Weber fraction (w) of the perceiver’s ANS. This results in greater overlap between the magnitude distributions perceived from stimuli at smaller ratios (::::/:::, 1.33) than at larger ratios (::::::/:::, 2). After accounting for other biases, w determines the variability in the representation of a particular magnitude, which in turn determines the amount of overlap between any two magnitudes, which finally determines how likely it is that an individual will be able to tell two nonsymbolic magnitudes apart (ANS acuity). The smaller an individual’s w, the better the individual will be at discriminating between nonsymbolic numerical magnitudes because there is less overlap in their numerical magnitude perceptions.
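The dependence of overlap on ratio and w can be illustrated with a minimal sketch. It assumes Gaussian percepts with SD = w × M and takes "overlap" to be the area shared by the two densities (the integral of their pointwise minimum); that definition, the function names, and the numerical integration settings are illustrative assumptions, not the calculation described in the Appendix.

```python
import math

def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and SD."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def overlap(n1, n2, w, steps=20000):
    """Shared area under two percept distributions (0 = none, 1 = identical).

    Assumption: each percept is Gaussian with mean n and SD = w * n
    (scalar variability); overlap is the integral of the pointwise
    minimum of the two densities, computed with a midpoint rule.
    """
    sd1, sd2 = w * n1, w * n2
    lo = min(n1 - 8 * sd1, n2 - 8 * sd2)
    hi = max(n1 + 8 * sd1, n2 + 8 * sd2)
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        total += min(gaussian_pdf(x, n1, sd1), gaussian_pdf(x, n2, sd2))
    return total * dx

# Ratios from the text: 4 vs. 3 (1.33) and 6 vs. 3 (2), at w = .22.
overlap_small_ratio = overlap(4, 3, 0.22)
overlap_large_ratio = overlap(6, 3, 0.22)
```

Under these assumptions the 4-versus-3 comparison shares more area than the 6-versus-3 comparison, and lowering w shrinks the overlap for any fixed pair.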
It follows that the ANS’s contribution to NDE size should be a function of the specific magnitudes being compared and ANS acuity (w). Thus, we can derive the relative size of the ANS’s contribution to NDE size for any specific task and any given w by considering the resulting distributions. As long as judgments are based on ANS distributions, error rates and RTs should be functionally related to the amount of overlap in these distributions.
The goal of this work is to assess the theoretical support for using NDE size as an ANS acuity metric. Thus, the theoretically ideal NDE sizes calculated here are dependent only on the magnitude ratios of the stimuli and w. Real-world data would involve other sources of RT and error (attention to task, nondecision time, etc.), adding noise that would make this relationship less clear. However, as these factors are separate from the ANS, they are excluded from this ideal model.
Formula: The relationship of overlap and erfc with w
Calculations are based on the linear model of the ANS, which claims that means of perceived numerical magnitude distributions increase linearly with the size of the stimuli, and the distributions’ standard deviations are proportional to their means (i.e., scalar variability; Cordes et al., 2001). (Note: Magnitudes might alternatively be modeled as logarithmically spaced with constant variability, yielding similar outcomes.) Thus, the ratio of the standard deviation to the mean is constant for a given individual on a given task. This constant is the w of the individual’s ANS: their ANS acuity. Here, magnitudes are treated as Gaussian distributions around unbiased means equal to the stimulus value (M), where SD = w × M. Thus, the derived overlap in ANS distributions is a function of the stimulus ratio and the Weber fraction (w) of the ANS. Overlap calculations are described in the Appendix. Additionally, following the method used by Halberda et al. (2008), the erfc (the complementary error function) was used to determine the rate at which a given pair of magnitudes will not be distinguished. Given no other sources of error, erfc should be equal to twice the error rate of ANS-based magnitude judgments, as the observer would be presumed to choose the correct answer by chance on half of such trials. The MATLAB code used for these calculations is given in the Appendix.
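As a rough illustration of the erfc term, the sketch below uses the form commonly written for the linear ANS model, in which the difference between two Gaussian percepts has mean |n1 − n2| and SD w·√(n1² + n2²). This is an assumed reading of the approach, written in Python rather than MATLAB; the function names are hypothetical and it is not the Appendix code.

```python
import math

def ans_erfc(n1, n2, w):
    """erfc term for comparing magnitudes n1 and n2 at Weber fraction w.

    Assumed model: percepts are Gaussian with SD = w * magnitude, so their
    difference is Gaussian with SD = w * sqrt(n1**2 + n2**2).
    """
    return math.erfc(abs(n1 - n2) / (math.sqrt(2.0) * w * math.hypot(n1, n2)))

def ans_error_rate(n1, n2, w):
    """Expected error rate: half the erfc term, since chance is 50%."""
    return 0.5 * ans_erfc(n1, n2, w)

# Example: comparing 6 with 5 at an adult-like w of .22.
example_erfc = ans_erfc(6, 5, 0.22)
example_error = ans_error_rate(6, 5, 0.22)
```

Two properties follow directly from this form: the value depends only on the ratio of the two magnitudes (6 vs. 5 gives the same result as 12 vs. 10), and identical magnitudes give erfc = 1, that is, a chance-level error rate of .5.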
Overlaps and erfcs can be calculated for any stimulus ratio and w. Thus, the theoretically maximum contribution of ANS acuity to NDE size can be found for any w on any particular NDE task. However, the ranges of greatest interest are those that correspond to ws seen in humans. Consistent with prior work (e.g., Cordes et al., 2001; Whalen et al., 1999), Chesney, Bjälkebring, and Peters (2015) found mean ws of .22 (SD = .06). However, others have found mean ws of .11 in educated adults (Dehaene et al., 2008). ANS acuity also varies with age. Halberda et al. (2008) found that 14-year-olds had mean ws of .28 (SD = .10). Studies with infants have found ws of 1.0 (Xu & Spelke, 2000), while 1-year-olds show ws of less than .5 (Cantrell & Smith, 2013).
Task, NDE size, and ANS acuity
The ideal relationship between NDE size and w was modeled for two different tasks and calculation methods.
Task 1 is based on an NDE measure like that used by Peters et al. (2008). Participants are given a central comparison value (e.g., 5) and asked to indicate whether stimulus values are greater or less than that value. The stimuli follow a 2 × 2 design: Half are less (e.g., 1, 4) and half are greater (e.g., 6, 9) than the central value. Also, half are close (e.g., 4, 6) and half are far (e.g., 1, 9) from the central value. An individual’s NDE size is operationalized as the difference in accuracy and/or RT on close versus far trials.
In this paradigm, although stimuli distances are symmetrical around the central comparison value, the ratios are asymmetrical. For the stimuli greater than the central value (high), the close and far ratios are 6/5 (1.2) and 9/5 (1.8), respectively. For the stimuli less than the central value (low), the ratios are 5/4 (1.25) and 5/1 (5). Analyses used in the literature (e.g., Peters et al., 2008) classify stimuli as near and far, collapsing across these ratio differences. This is modeled here by averaging NDE sizes found for ratios above and below the central value (average).
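The Task 1 calculation can be sketched as follows: NDE size is the erfc "savings" between close and far stimuli, computed separately below (low) and above (high) the central value and then averaged. The erfc form is the assumed linear-model expression (SD = w × magnitude), and the function and parameter names are hypothetical.

```python
import math

def ans_erfc(n1, n2, w):
    """erfc term under the linear ANS model (assumed form, SD = w * magnitude)."""
    return math.erfc(abs(n1 - n2) / (math.sqrt(2.0) * w * math.hypot(n1, n2)))

def task1_nde_sizes(w, center=5, close=(4, 6), far=(1, 9)):
    """Ideal NDE sizes (close minus far erfc) for the 2 x 2 design.

    close and far are (below-center, above-center) stimulus values.
    """
    low = ans_erfc(close[0], center, w) - ans_erfc(far[0], center, w)
    high = ans_erfc(close[1], center, w) - ans_erfc(far[1], center, w)
    return {"low": low, "high": high, "average": (low + high) / 2.0}

# Ideal NDE sizes for an adult-like Weber fraction of .22.
sizes_adult = task1_nde_sizes(0.22)
```

Because the low pair spans ratios 1.25 and 5 while the high pair spans 1.2 and 1.8, the two halves of the design generally yield different savings for the same w, which is what averaging collapses over.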
An alternative method of gauging NDE size is to find the slope of the linear regression of RTs or error rates on ratio or distance, treating ratio/distance as continuous rather than dichotomous (e.g., Sekuler & Mierkiewicz, 1977). Negative slopes indicate the presence of a distance effect. Larger (i.e., more strongly negative) absolute slopes are treated as indicating larger ws (i.e., poorer ANS acuity). Theoretically ideal NDE slopes were modeled based on the comparison task developed by Chesney et al. (2015), which used ratios between 1 and 2.6 and numerical magnitudes between 10 and 30. While magnitude overlaps are dependent on ratio, NDE slope calculations have used distance as the independent variable (e.g., Sekuler & Mierkiewicz, 1977). Therefore, NDE slopes were found by regressing overlap and erfc on both the ratio and absolute distance between comparison pairs across the human range of ws.
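A minimal sketch of the slope-based measure follows. The stimulus set here is hypothetical (every pair of distinct whole numbers from 10 to 30 whose ratio is at most 2.6) and is only a stand-in for the actual comparison pairs, which are not listed in this section; the erfc form is the assumed linear-model expression, and only erfc (not overlap) is regressed.

```python
import math

def ans_erfc(n1, n2, w):
    """erfc term under the linear ANS model (assumed form, SD = w * magnitude)."""
    return math.erfc(abs(n1 - n2) / (math.sqrt(2.0) * w * math.hypot(n1, n2)))

def ols_slope(xs, ys):
    """Slope of the ordinary least-squares line of ys on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical stimulus pairs: magnitudes 10-30, ratio no greater than 2.6.
PAIRS = [(a, b) for a in range(10, 31) for b in range(a + 1, 31) if b / a <= 2.6]

def nde_slopes(w, pairs=PAIRS):
    """Return (slope of erfc on ratio, slope of erfc on absolute distance)."""
    erfcs = [ans_erfc(a, b, w) for a, b in pairs]
    ratios = [b / a for a, b in pairs]
    distances = [b - a for a, b in pairs]
    return ols_slope(ratios, erfcs), ols_slope(distances, erfcs)

slope_on_ratio, slope_on_distance = nde_slopes(0.22)
```

Calling `nde_slopes` over a grid of w values gives the ideal slope as a function of acuity for this assumed stimulus set; a different set of pairs would shift the resulting curve.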
Modeled overlaps and erfcs
Relationship between NDE size and w is J shaped
Clearly, the general presumption that larger NDE sizes correlate with larger ws does not always hold. One could expect NDE size and w to have a positive linear relationship only if the population’s ws were located between .05 and .20. Indeed, depending on the population’s w distribution, one could predict a positive, negative, or nonexistent correlation between NDE size and w. Moreover, one could not necessarily recover ws from NDE size because, again owing to the J-shaped relation, more than one w maps to the same NDE size. For example, average erfc savings of .48 map to ws of both .22 (average adult) and .55 (very poor).
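The non-monotonic mapping can be checked with a short scan over w for the Task 1 design (central value 5, close 4 and 6, far 1 and 9). This is a sketch under the assumed linear-model erfc form; the grid of ws, the sub-ranges used for the correlations, and all names are illustrative choices.

```python
import math

def ans_erfc(n1, n2, w):
    """erfc term under the linear ANS model (assumed form, SD = w * magnitude)."""
    return math.erfc(abs(n1 - n2) / (math.sqrt(2.0) * w * math.hypot(n1, n2)))

def average_nde_size(w, center=5, close=(4, 6), far=(1, 9)):
    """Average of the low and high erfc savings (close minus far)."""
    low = ans_erfc(close[0], center, w) - ans_erfc(far[0], center, w)
    high = ans_erfc(close[1], center, w) - ans_erfc(far[1], center, w)
    return (low + high) / 2.0

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Grid of Weber fractions from .05 to 1.00 in steps of .01.
WS = [round(0.05 + 0.01 * i, 2) for i in range(96)]
SIZES = [average_nde_size(w) for w in WS]
PEAK_W = WS[SIZES.index(max(SIZES))]

def correlation_in_range(lo, hi):
    """Correlation between w and ideal NDE size for lo <= w <= hi."""
    sel = [(w, s) for w, s in zip(WS, SIZES) if lo <= w <= hi]
    return pearson_r([w for w, _ in sel], [s for _, s in sel])

r_good_acuity = correlation_in_range(0.05, 0.20)  # sample of small ws
r_poor_acuity = correlation_in_range(0.50, 1.00)  # sample of large ws
```

In this sketch a sample confined to small ws yields a positive correlation and a sample confined to large ws yields a negative one, while `average_nde_size(0.22)` and `average_nde_size(0.55)` come out nearly equal, matching the two-to-one mapping described above.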
Even though numerical distance effects can indicate the involvement of the ANS in a task, NDE tasks have limited utility for measuring individual differences in ANS acuity. This model provides a novel theoretical exploration of why this is the case: one cannot a priori expect NDE size and ANS acuity to be linearly related. The J-shaped relationship between w and NDE size persists across tasks and analytical methods, although the inflection points are task specific. Small NDE sizes are expected both for individuals with particularly small and particularly large ws. As a result, even the direction of the correlation would be dependent both on the specifics of the task and on the w distribution in the sample.
For typical NDE tasks, like those above, peak NDE size is approached at ws of ~.2–.3. The location of this peak is a real concern, as several studies have found adults’ ws typically center around ~.22 (e.g., Cordes et al., 2001; Whalen et al., 1999). But the range of human ws is wide. Other studies have found mean ws of .11 in adults (Dehaene et al., 2008) and ws of 1.0 in infants (Xu & Spelke, 2000). Thus, one cannot presume the w range in a novel population will coincide with the w range in which the relationship of w and NDE size is quasilinear. This is problematic to the literature as a whole, and particularly for research attempting to draw conclusions about the nature of ANS acuity’s involvement in other cognitive tasks.
Nevertheless, if the w range within a to-be-tested population is known, one might construct a task for which the assumption of a near-linear relationship between NDE size and ANS acuity is theoretically supported. One could expect w ranges of .10 to .34 in American university students (Chesney et al., 2015; but see Dehaene et al., 2008). Over this range, NDE size found using the ratios 5 versus 1.25 would yield a strong linear relationship between ideal NDE size and w (this is illustrated by the “low” line on Fig. 3). Stimuli should be carefully chosen to avoid confounds. For example, nonsymbolic stimuli (e.g., dot sets) should be used: Symbolic number knowledge could interfere with task performance, and, indeed, there is some debate as to whether distance effects seen with symbolic number are necessarily the result of ANS involvement (Krajcsi, Lengyel, & Kojouharova, 2016). Nonnumeric properties of the stimuli (e.g., area) should be carefully controlled, as these are known to influence ANS assessments (Hurewitz, Gelman, & Schnitzer, 2006). Further, ratios should be instantiated using quantities sufficiently large to avoid interference from subitizing (a process that allows individuals to assess small quantities—typically less than 7—accurately without counting; see Chesney & Haladjian, 2011; Feigenson, Dehaene, & Spelke, 2004). However, such tasks should still be tested empirically. Other factors, such as individual differences in nondecision time and error tolerance, might mask the theoretical quasilinear relation in practice. A better course might be to assess ANS acuity using tested tasks that find w by fitting numerosity comparison performance to sigmoidal or psychophysical curves (e.g., Chesney et al., 2015; Halberda et al., 2008).
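Whether a candidate task is near-linear over a known w range can be checked before data collection. The sketch below does this for the ratios 5 and 1.25 over ws of .10 to .34, instantiated with the small values 5 vs. 1 and 5 vs. 4 purely for convenience (the erfc term depends only on ratio under the assumed linear-model form); names and grid spacing are hypothetical.

```python
import math

def ans_erfc(n1, n2, w):
    """erfc term under the linear ANS model (assumed form, SD = w * magnitude)."""
    return math.erfc(abs(n1 - n2) / (math.sqrt(2.0) * w * math.hypot(n1, n2)))

def low_nde_size(w, center=5, close=4, far=1):
    """erfc savings between the close (ratio 1.25) and far (ratio 5) stimuli."""
    return ans_erfc(close, center, w) - ans_erfc(far, center, w)

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Weber fractions from .10 to .34 in steps of .01.
WS = [0.10 + 0.01 * i for i in range(25)]
SIZES = [low_nde_size(w) for w in WS]
linearity = pearson_r(WS, SIZES)
```

A correlation near 1 together with a monotonic increase across the grid would support treating NDE size as quasilinear in w for that task and population; the same check can be rerun with other ratios or w ranges.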
It is important that ANS-acuity metrics are both empirically and theoretically supported. As shown here, the assumed linear relationship between NDE size and ANS acuity is only theoretically supported in some conditions, and researchers may not be able to discern whether these conditions have been met. Considered in combination with the questionable empirical support of NDE size as an ANS-acuity metric (Gilmore et al., 2011; Holloway & Ansari, 2009; Inglis & Gilmore, 2014; Maloney et al., 2010; also see Sasanguie et al., 2011), it is recommended that use of this metric should be avoided. Interpretation of existing NDE-size data should take into account the expected relationship between NDE size and ANS acuity for that specific methodology and typical w ranges found for age-matched and education-matched populations.
- Holloway, I. D., & Ansari, D. (2009). Mapping numerical magnitudes onto symbols: The numerical distance effect and individual differences in children’s mathematics achievement. Journal of Experimental Child Psychology, 103, 17–29. https://doi.org/10.1016/j.jecp.2008.04.001
- Kingdom, F. A. A., & Prins, N. (2010). Psychophysics: A practical introduction. London, UK: Academic Press.
- Krajcsi, A., Lengyel, G., & Kojouharova, P. (2016). The source of the symbolic numerical distance and size effects. Frontiers in Psychology, 7. https://doi.org/10.3389/fpsyg.2016.01795
- Price, G. R., Palmer, D., Battista, S., & Ansari, D. (2012). Nonsymbolic numerical magnitude comparison: Reliability and validity of different task variants and outcome measures, and their relationship to arithmetic achievement in adults. Acta Psychologica, 140, 50–57. https://doi.org/10.1016/j.actpsy.2012.02.008
- Sasanguie, D., Defever, E., Van den Bussche, E., & Reynvoet, B. (2011). The reliability of and the relation between non-symbolic numerical distance effects in comparison, same-different judgments and priming. Acta Psychologica, 136, 73–80. https://doi.org/10.1016/j.actpsy.2010.10.004
- Taves, E. H. (1941). Two mechanisms for the perception of visual numerousness. Archives of Psychology, 37(265), 1–47.