Humans are believed to be equipped with a nonverbal cognitive system for extracting the number of objects in a visual scene, the approximate number system (ANS; Feigenson et al., 2004). Previous research has indicated a relation between the acuity of the ANS and math performance in children (Schneider et al., 2017) and adults (Chen & Li, 2014). However, the results are far from consistent, and recently, several studies have not been able to find the ANS-to-math-performance link (e.g., Inglis et al., 2011). One possibility is that humans lack a dedicated system for number processing altogether (Gebuis & Reynvoet, 2012; Leibovich et al., 2017). In line with this possibility, some studies have found that when controlling dot stimuli for properties other than number (e.g., average area, convex hull), the correlation between math performance and ANS acuity disappears (Gilmore et al., 2013). Importantly, controlling for visual cues in such a way that half of the stimuli are congruent and the other half are incongruent results in strong congruency effects where participants perform close to, or even below, chance on incongruent trials (Szűcs et al., 2013).
An alternative account for the strong congruency effects is an attentional bias induced by stimulus control (ABC) that arises when extreme measures are used to control for visual, nonnumerical cues. This manipulation introduces attentional processes that dominate or distract participants from extracting numerosity. Strict stimulus control might therefore contaminate measures of ANS acuity and make them less valid. In the current study, we designed three experiments to test this possibility.
In Experiment 1, we tested the first predictions of the ABC hypothesis. Participants carried out two dot-comparison tests, representing two possible measures of ANS acuity. For the first, we introduced stimuli that were neutral regarding the preattentive dimension of object size by equating this size in the to-be-compared arrays. As would be expected in a natural setting, this allows variables such as total area, convex hull, and density of the stimulus to covary with numerosity. For the second test, we instead used a deliberate manipulation of object size, with stimuli divided into congruent and incongruent regarding this dimension, analogous to the size difference that results from rigorous stimulus control.
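The distinction between the two tests can be made concrete with a minimal sketch. It is an illustration only, not the stimulus-generation procedure used in the experiments: each array is reduced to a pair (number of dots, dot radius), and the radii, function names, and condition labels are hypothetical values chosen for the example.

```python
import math

def total_area(array):
    """Cumulative area of an array given as (n_dots, dot_radius)."""
    n_dots, radius = array
    return n_dots * math.pi * radius ** 2

def make_pair(n_left, n_right, condition, base=1.0, small=0.8, large=1.2):
    """Build two arrays for a dot-comparison trial.

    The radii (base, small, large) are illustrative assumptions.
    """
    if n_left == n_right:
        raise ValueError("arrays must differ in numerosity")
    if condition == "neutral":
        # Same dot size in both arrays: total area covaries with number.
        return (n_left, base), (n_right, base)
    left_is_more = n_left > n_right
    if condition == "congruent":
        left_gets_large = left_is_more        # more numerous array has larger dots
    elif condition == "incongruent":
        left_gets_large = not left_is_more    # less numerous array has larger dots
    else:
        raise ValueError("unknown condition: " + condition)
    r_left, r_right = (large, small) if left_gets_large else (small, large)
    return (n_left, r_left), (n_right, r_right)

def classify(left, right):
    """Label a pair as neutral, congruent, or incongruent with respect to dot size."""
    (n_l, r_l), (n_r, r_r) = left, right
    if r_l == r_r:
        return "neutral"
    more_numerous_is_larger = (n_l > n_r) == (r_l > r_r)
    return "congruent" if more_numerous_is_larger else "incongruent"
```

In the neutral case the more numerous array necessarily has the larger cumulative area, which is the natural covariation referred to above; in the size-manipulated case, dot size either agrees or conflicts with numerosity.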
While performance on the neutral test was correlated with most math-related variables, performance on the size-manipulated test was not. Within the size-manipulated test, only the incongruent (but not the congruent) subset predicted math variables. This pattern was predicted and mimics results in previous research. Furthermore, the size-manipulated test did not exhibit internal consistency. Performance on the congruent and incongruent stimuli was negatively correlated, and performance on the incongruent stimuli was uncorrelated with total performance on the task. The stimuli used in the first experiment rely on a mere manipulation of size and are not identical to those used in previous ANS research. The claim is not that this study demonstrates that standard control measures would necessarily be subject to the same effects, but the fact that such controls routinely incorporate size differences between stimuli should in itself raise concerns. The results are in line with the hypothesis that using extreme controls for visual cues in dot-comparison tests may contaminate them by introducing additional processes.
The results of Experiment 1 do not speak to what processes might cause congruency effects. Experiment 2 was designed to investigate the cause of these effects. If attention processes are behind congruency effects, these should be reduced with more time to process the stimuli and counteract initial, rapid, bottom-up attention. According to the notion that participants are not using a dedicated system for extracting numerosity, but are guided entirely by nonnumeric cues when estimating number, we should expect the opposite. We found a congruency effect that was reduced by a prolonged presentation. This result supports the idea that congruency effects can be accounted for by attentional processes, but it seems inconsistent with an account suggesting that people fundamentally extract numerosity by means of integrating visual cues.
Experiment 3 addressed the concerns of a lack of a direct measure of attention. Results showed that stimulus type had an effect on which dot array participants directed their initial attention to. They predominantly looked first towards the stimulus with the dots of larger individual size. For congruent stimuli, they consequently initially looked at the more numerous array, but for incongruent stimuli, they looked towards the less numerous array. The stimulus that participants looked at first was associated with their responses, so that when they looked at the less numerous stimulus set first, they were more likely to be wrong than when the initial gaze was towards the more numerous stimulus set.
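The association between first fixation and response described above can be expressed as a comparison of conditional error rates. The sketch below is an assumption-laden illustration, not the analysis reported for Experiment 3: the trial fields (`more_numerous`, `first_look`, `response`) are hypothetical names, and the four example trials are invented solely to show the computation.

```python
def error_rate(trials):
    """Proportion of trials where the response was not the more numerous array."""
    if not trials:
        return None
    errors = sum(t["response"] != t["more_numerous"] for t in trials)
    return errors / len(trials)

def first_look_summary(trials):
    """Split trials by whether the first fixation landed on the more or the
    less numerous array, and compute the error rate within each subset."""
    more_first = [t for t in trials if t["first_look"] == t["more_numerous"]]
    less_first = [t for t in trials if t["first_look"] != t["more_numerous"]]
    return {
        "error_after_more_first": error_rate(more_first),
        "error_after_less_first": error_rate(less_first),
    }

# Invented toy trials, for illustration only (not data from the study).
toy_trials = [
    {"more_numerous": "left",  "first_look": "left", "response": "left"},
    {"more_numerous": "right", "first_look": "left", "response": "left"},
    {"more_numerous": "right", "first_look": "left", "response": "right"},
    {"more_numerous": "left",  "first_look": "left", "response": "left"},
]
summary = first_look_summary(toy_trials)
```

A higher error rate after looking first at the less numerous array than after looking first at the more numerous array is the pattern that corresponds to the result described above.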
These results suggest that nonnumeric stimulus properties drive participants’ attention, which in turn influences their responses. It is reasonable to conclude that the short presentation times often used in dot-comparison tasks do not allow participants to correct an initial bias to respond in accordance with the first fixated array.
There is by now a large body of evidence showing that nonnumeric variables influence number estimation (e.g., Gebuis & Reynvoet, 2012; Gilmore et al., 2013; Poom et al., 2019; Szűcs et al., 2013; Tokita & Ishiguchi, 2010). Thus, a naïve notion of an approximate number system that allows us to estimate number efficiently, irrespective of the circumstances, can hardly be defended. However, the numerous demonstrations that judgments of numerosity are influenced by nonnumeric variables should not be taken as evidence that nullifies the existence of a dedicated system for numerosity processing. It is rather a general phenomenon in perception that visual experiences of basic features, such as color and motion, are influenced by a number of variables other than the activation of dedicated feature-specific detectors. For example, a basic visual task, such as color perception, depends on context (simultaneous color contrast), the spectral content of the illuminating light, which is discounted (allowing color constancy), whether the colored target is in shadow or directly illuminated (allowing brightness constancy), adaptation with subsequent color aftereffects, prior experiences with objects of known color (leading to biases in color matching tasks), and synaesthesia, where for some people input to other sensory organs is misperceived and appears as colors. These cues modify the perceptual experience and the output from the excitations of the three types of cones, which act as bandpass filters of the spectra of incoming light in the retina. However, despite the influence of such extraneous variables on color perception, there can be no doubt that humans are equipped with a system dedicated to color perception.
There are, of course, limitations to the current study. Numerosity and nonnumerical stimulus properties are necessarily coupled. We have argued that item size may be the crucial preattentive dimension, but array area and convex hull may also be involved in preattentive attraction of gaze, something that cannot be disentangled from the results in Experiment 3 since these variables covaried. In Experiments 1 and 2, there is a confound between item size and the proportion of covered area within the array, since the array area was kept constant in congruent and incongruent subsets. Still, since item size has been found to be a stimulus feature picked up preattentively, this stimulus dimension is, in our opinion, most likely the culprit in this case (Wolfe & Horowitz, 2004). It should be noted that number is also a feature that can be picked up preattentively (e.g., Castaldi et al., 2020; Cicchini et al., 2016; Ferrigno et al., 2017). Thus, it is likely that item size becomes relatively more salient than the number of objects in incongruent stimulus sets, while the two dimensions align in congruent stimulus sets.
Our results show that overt reflexive attention, defined as selectively processing one location over others by moving the eyes to a point at that location, influences decisions in size-manipulated numerosity tasks. Covert attention, defined as paying attention without moving the eyes (Posner, 1980), may nevertheless also play a role here since both types of attention behave similarly (Blair & Ristic, 2019). Most of the studies that find a link between math proficiency and ANS in adults have used spatially intermixed dot displays, whereas failures to demonstrate a link often have used spatially separated displays (see Norris & Castronovo, 2016). This could be interpreted as overt attention and gaze playing more important roles in separated displays, which therefore should be avoided. However, spatially intermixed presentation is probably no safeguard against attentional effects. Both congruency effects and the pattern of correlations where performance on incongruent items drives correlations with math have been found with intermixed presentation (Norris & Castronovo, 2016), implying that covert attention processes also play a role with the latter design. In addition, whereas space-based attention is directed to locations in the visual field, object-based attention is directed to organized chunks of visual information corresponding to an object or a group of items belonging to the same object or group in the environment (Mozer & Vecera, 2005). Potentially, both such object-based attention and covert attention processes could account for influences of preattentive features when using spatially intermixed displays.
A methodological implication of these results would seem to be that applying a sequential one-at-a-time stimulus presentation would be preferable because this presentation eliminates attentional processes. However, as shown by Lindskog et al. (2013), sequential presentation may be less predictive of math performance due to the introduction of a time-order error by which the most recently presented stimulus set appears to be more numerous (the “recent-is-more” effect; see van den Berg, Lindskog, Poom, & Winman, 2017), which further complicates the use of this method.
Researchers have repeatedly emphasized the importance of controlling for visual variables in ANS acuity number estimation tasks and have taken great measures to achieve this. This notion is echoed in a methodological review (Dietrich, Huber, & Nuerk, 2015), where it is stated that “controlling visual properties is essential to ensure that the task really measures the ability to discriminate numerosity and not the ability to discriminate visual cues” (p. 10). Our results imply that this appeal for control does not come without serious caveats. We surmise that some studies with a zealous aim to achieve this control paradoxically have ended up with stimuli that, in an almost grotesquely conspicuous way, have signalled nonnumeric visual properties such as object size, with a strong impact on judgment processes. Often, these studies, rather than “controlling for” visual properties, are better described as psychological experiments set up to determine whether number judgments are influenced by nonnumeric continuous visual properties when these are manipulated systematically. That nonnumeric variables influence number judgment is a well-established fact, albeit not entirely surprising. The message conveyed here is that maybe we should direct our effort at more interesting theoretical research problems than how to cleverly achieve stringent stimulus control and consequently having to explain congruency effects that are laboratory artifacts of this control. It is by now clear that control of nonnumeric features can create large problems when measuring ANS acuity, but it is less obvious what problem, if any, is actually solved by this procedure. If we are worried about participants relying on nonnumeric attributes, we should instead refrain from controlling for these variables altogether, because, as we have shown, such control has the opposite effect.
If a participant, when making a number estimate, is influenced by, for example, the cumulative area of the objects, just like he or she could be in a natural environment, we would not necessarily consider this a problem.
In a natural environment, certain statistical regularities between perceptual variables hold. Organisms have evolved to exploit these regularities, and they should be preserved in the laboratory, instead of being distorted by rigorous control. The fact that cumulative area commonly covaries with number is not a bug, but a feature of the environment. In accordance with Brunswik (1955), we argue for the necessity of a design that is representative of the environment and, instead of interfering with this environment, strives to retain its causal texture in stimuli that are used in the laboratory (see Dhami, Hertwig, & Hoffrage, 2004, for an extensive elaboration of this “representative design” concept). However, rigorous stimulus control is reasonable to use in investigations aimed at discovering whether infants or animals can discriminate between numerical and nonnumerical stimulus dimensions, or have any concept of numerosity at all (Feigenson et al., 2004). Thus, stimulus control may be necessary when asking the existence question “Can they do this?,” but could be counterproductive when measuring proficiency in the ability: “How well does individual X perform at this?”
To summarize, results from the experiments provide support for the idea that attentional processes can account for congruency effects in dot-comparison tasks. There are also two further implications from our findings. First, even though it is commendable that researchers want to control strictly for possible confounds in dot-comparison tasks, it is important to note that when such controls are made in an extreme way, they most likely coincidently contaminate the tasks, leaving them less valid measures of ANS acuity. Second, although much more research is needed before the debate is settled, congruency effects per se cannot be taken as evidence that humans lack a dedicated system for extracting numerosity from a visual scene.