Behavioral evidence for format-dependent processes in approximate numerosity representation
A genuinely abstract number representation is thought to be capable of representing the numerosity of any set of discrete elements, whether they are presented sequentially or simultaneously. Recent neuroimaging studies, however, have demonstrated that different areas of the intraparietal sulcus are involved in extracting numerosity from simultaneously and sequentially presented stimuli, suggesting the existence of a format-dependent numerical system. To test whether behavioral evidence exists for format-dependent numerical processing in adult humans, we measured Weber fractions for numerosity discrimination with sequential stimuli, simultaneous stimuli, and cross-format stimuli under a carefully controlled experimental procedure. Performance differed clearly between the simultaneous and sequential conditions, supporting the existence of format-dependent processes for numerosity representation. Moreover, performance on cross-format trials varied across participants, although it was always worse than in the simultaneous condition. Taken together, our findings suggest that numerical representation may involve multiple processing stages.
Keywords: Numerosity discrimination; Simultaneous presentation; Sequential presentation; Cross-format presentation
Many studies have pointed to the idea that adult humans, as well as infants and animals, possess a domain-specific system for approximate numerosity representation (e.g., Cantlon & Brannon, 2006; Feigenson, Dehaene, & Spelke, 2004; Hauser, Tsao, Garcia, & Spelke, 2003; Whalen, Gallistel, & Gelman, 1999). Furthermore, converging empirical findings from several areas of cognitive neuroscience have been taken as evidence for biologically determined mechanisms underlying this ability (e.g., Cantlon, Brannon, Carter, & Pelphrey, 2006; Nieder & Dehaene, 2009; Piazza, 2010). At the same time, several researchers have examined the processes of number representation more critically in behavioral and neuroimaging work (e.g., Cohen Kadosh, Lammertyn, & Izard, 2008; Cohen Kadosh & Walsh, 2009; Göbel, Johansen-Berg, Behrens, & Rushworth, 2004; Simon, 1999). For example, Cohen Kadosh and Walsh (2009) challenged the idea of abstract numerical representation by presenting a range of empirical data that revealed both methodological and theoretical limitations of earlier studies.
One of the claims made by the proponents of abstract numerical representation is that the processing of approximate numerosity is independent of presentation format. They have argued that abstract numerical representation could genuinely be capable of representing the numerosity of any set of discrete elements, whether of events or items and whether sequentially or simultaneously presented (Barth, Kanwisher, & Spelke, 2003; Dehaene, Dehaene-Lambertz, & Cohen, 1998; Gallistel & Gelman, 1992; Jordan & Brannon, 2006; Piazza, 2010). In line with this theory, Barth et al. demonstrated that there was no cost of comparing numerosities across versus within visual and auditory stimulus sets, or across versus within simultaneous and sequential sets. Barth et al. claimed that comparison across presentation formats or modalities is not performed using format-specific or modality-specific numerosity representations, but rather using the true, abstract numerosity representation system. Evidence for format-independent or modality-independent numerical representation ability has also been claimed in both infants (e.g., Jordan & Brannon, 2006; Kobayashi, Hiraki, & Hasegawa, 2005) and animals (Jordan, Brannon, Logothetis, & Ghazanfar, 2005).
It has, however, remained unclear whether approximate numerical representations are truly format independent. There are three main reasons to doubt the format independence of approximate numerical representation. First, recent neurophysiological and neuroimaging studies have suggested that numerical information about sequentially and simultaneously presented stimuli can be extracted via different processes. Nieder, Diester, and Tudusciuc (2006), for example, investigated how individual intraparietal sulcus (IPS) neurons in trained monkeys represent simultaneously and sequentially presented elements. They demonstrated that different populations of neurons extracted numerosity in simultaneous and in sequential presentation, suggesting segregated processing of simultaneous and sequential numerical quantity. They also found that another neuronal population represented the numerosity of a set regardless of whether it had been presented in a simultaneous or a sequential format, suggesting that the segregated information ultimately converges to form abstract quantity representations. Their behavioral data showed that numerosity discrimination was more difficult with sequentially presented than with simultaneously presented stimuli. They argued that, in contrast to the immediate, perceptual-like assessment of numerical information in a simultaneous array, sequential enumeration demands more complex coding of numerical information. To test whether such format-dependent representation of numerosity also exists in humans, Dormal, Andres, Dormal, and Pesenti (2010) examined the areas of the IPS that were activated during numerosity processing of simultaneous and sequential stimuli. They found that processing simultaneous stimuli induced bilateral activation in several areas of the IPS, whereas activation for sequential stimuli was restricted to the right hemisphere. Dormal et al. claimed that the right IPS contains both format-dependent and format-independent representations of numerosity.
Second, the processes of extracting numerical information from simultaneously and sequentially presented stimuli have been explained by two different models. Extraction of numerosity from simultaneous visual arrays has mainly been explained by the numerosity detector model (Dehaene & Changeux, 1993), whereas processing of sequentially presented stimuli has primarily been explained by the mode-control model (Meck & Church, 1983). The numerosity detector model computes numerosity in parallel. It posits that each visual item is first coded as a local Gaussian distribution of activation by topographically organized input clusters, and that items of different sizes are then normalized to a size-independent code. Verguts and Fias (2004) presented a neural model showing how such numerosity detectors could develop through unsupervised learning. The mode-control model, in contrast, computes numerosity serially. It posits that each item is encoded by an impulse from a pacemaker, which is added to an accumulator. The magnitude in the accumulator at the end of the count is then transferred to memory and forms a representation of the numerosity of the set. Gallistel and Gelman (1992) extended the mode-control model to describe a preverbal system of counting and arithmetic reasoning. Although numerosity representations derived from both models obey Weber’s law, whether and how the numerosity detector model could process sequentially presented items has not been fully explained. Likewise, it has not been explained how the mode-control model could process simultaneously presented items.
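The contrast between parallel and serial extraction can be illustrated with a deliberately simplified Python sketch. It is not an implementation of Dehaene and Changeux’s (1993) network or of Meck and Church’s (1983) model; the function names, the number of pulses per item, the memory noise level, and the detector tuning width are all hypothetical choices. The serial function registers items one at a time and stores the accumulated count with multiplicative noise; the parallel function normalizes every item to one unit of activation regardless of its size, sums the activation in a single step, and passes the total to detectors tuned on a logarithmic scale.

```python
import math
import random


def accumulator_estimate(n_items, pulses_per_item=10, memory_cv=0.15, rng=random):
    """Serial, mode-control-style sketch: items are registered one at a time."""
    accumulator = 0
    for _ in range(n_items):
        # Each item gates a fixed burst of pacemaker pulses into the accumulator.
        accumulator += pulses_per_item
    # Transfer to memory with multiplicative noise, so the SD grows with n (scalar variability).
    stored = accumulator * rng.gauss(1.0, memory_cv)
    return stored / pulses_per_item


def detector_responses(item_sizes, preferred=(1, 2, 4, 8, 16, 32, 64), width=0.3):
    """Parallel, numerosity-detector-style sketch: all items are processed at once."""
    # Size normalization: each item contributes one unit of activation, whatever its size.
    normalized = [1.0 for _size in item_sizes]
    total = sum(normalized)
    if total == 0:
        return {p: 0.0 for p in preferred}
    # Detectors tuned to log numerosity respond with Gaussian tuning curves.
    return {
        p: math.exp(-((math.log(total) - math.log(p)) ** 2) / (2 * width ** 2))
        for p in preferred
    }


rng = random.Random(1)
estimates = [accumulator_estimate(20, rng=rng) for _ in range(1000)]
responses = detector_responses([0.1, 0.3, 0.2, 0.5, 0.4, 0.2, 0.3, 0.6])
print(max(responses, key=responses.get))  # 8: the detector preferring 8 responds most
```

Because the accumulator’s memory noise is multiplicative, its estimates show scalar variability, which is one way a serial model can produce Weber-law behavior; in the parallel sketch, the log-scale tuning of the detectors plays the same role.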
Third, the empirical studies that have claimed format independence of numerosity representation may have had procedural limitations in terms of stimulus control, measurement precision, and the numbers of items tested. For example, Barth et al. (2003) used a cross-format comparison task and found that accuracy on these tasks was comparable to that on single-format (i.e., simultaneous- or sequential-format) tasks, suggesting that nonnumerical cues do not play a substantial role even in single-format tasks. The numerical contrasts in their experiments were, however, quite large. As Sophian and Chu (2008) have noted, the contrasts that Barth et al. presented corresponded to Weber fractions of .50 or greater. At this level of measurement precision, differences in performance between tasks could remain undetected. Moreover, in infant and animal studies, the number of items tested was fewer than four. With such small numbers of items, cross-format or cross-modality numerosity comparison could instead be explained by the object-file model (e.g., Simon, 1999; Trick & Pylyshyn, 1994) rather than by numerosity representation. Because it remains unclear whether the system for representing small numbers of objects is distinct from the system for representing larger numbers of objects (Ansari, Lyons, van Eimeren, & Xu, 2007), it is necessary to test whether and how the effects of presentation format differ across a variety of numerosities.
In this study, we tested whether and how numerical comparison of simultaneous, sequential, and cross-format stimuli differs behaviorally when the concerns discussed above are adequately controlled. We measured Weber fractions in discrimination tasks to assess differences in performance. Many studies have shown that both behavioral and neuronal tuning functions obey Weber’s law (i.e., discriminability depends on the ratio of the numerosities being compared) over a broad range of numerosities (e.g., Burgess & Barlow, 1983; Nieder & Merten, 2007; Piazza, Izard, Pinel, Le Bihan, & Dehaene, 2004; Tokita & Ishiguchi, 2010, 2011). Moreover, some studies have claimed that the Weber fraction can serve as an index of the ability to represent approximate numerosity (Halberda, Mazzocco, & Feigenson, 2008). We used a wide range of element numbers, including both small and large numerosities, to examine how the number of elements related to the effects of presentation format. Most importantly, we applied rigorous stimulus controls so that other properties, such as area, contour length, and presentation time, would not be confounded with the number of elements.
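Ratio dependence can be made concrete with a common textbook formalization, used here as an illustrative assumption rather than as the model fitted in this study: each numerosity is represented as its logarithm plus Gaussian noise with a hypothetical SD w, so the predicted probability of judging one set larger depends only on the ratio of the two numerosities.

```python
import math
from statistics import NormalDist


def p_second_judged_larger(n1, n2, w=0.15):
    """Probability of judging n2 more numerous than n1 under a hypothetical
    log-Gaussian model with internal noise w on each log-numerosity."""
    # The difference of two independent noisy log-numerosities has SD w * sqrt(2).
    return NormalDist().cdf(math.log(n2 / n1) / (w * math.sqrt(2)))


for n1, n2 in [(10, 12), (20, 24), (20, 22)]:
    print(n1, n2, round(p_second_judged_larger(n1, n2), 3))
```

Under this assumption, 10 versus 12 and 20 versus 24 yield the same predicted accuracy because both pairs share the ratio 1.2, whereas 20 versus 22 (ratio 1.1) is predicted to be harder.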
The present experiment
Eight adults took part in the experiment. All of them had the same amount of prior experience with numerosity comparison tasks and had normal or corrected-to-normal vision.
Two independent variables were examined in the experiment: presentation format (simultaneous condition, sequential condition, and cross-format condition) and standard number of elements (5, 10, 20, and 40). The cross-format condition had two subconditions. In Cross-Format Condition 1, the standard stimuli were in a visual array and the comparison stimuli were in a temporal sequence. In Cross-Format Condition 2, the standard stimuli were in a temporal sequence and the comparison stimuli were in a visual array. The numbers of comparison stimuli for the standards of 5, 10, 20, and 40 stimuli were 3, 4, 6, and 7; 8, 9, 11, and 12; 15, 17, 19, 21, 23, and 25; and 30, 34, 38, 42, 46, and 50 stimuli, respectively.
Trials in the simultaneous, the sequential, and the combined cross-format conditions were run in separate blocks. These three blocked conditions were presented in an order that was pseudocounterbalanced across participants. Trials with all of the standard numbers were intermixed within a block. Each condition, counting the two cross-format subconditions separately, had 400 trials (20 repetitions × 4 comparison levels for each of the standards of 5 and 10, and 20 repetitions × 6 comparison levels for each of the standards of 20 and 40), resulting in 1,600 trials in total. Each block consisted of 80 trials, for 20 blocks in total. Participants performed five blocks per experimental session and thus took 4 days to complete the experiment. Breaks of approximately 3 min were given between blocks. The order of trials was fully randomized within a block; the standard stimulus came first on half of the trials and second on the remaining trials. Participants completed 20 practice trials before the experiment began.
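The trial counts above can be checked with a short bookkeeping sketch. The constant and function names are hypothetical, the two cross-format subconditions are assumed to count as separate 400-trial conditions (which is what makes the total 1,600), and shuffling a trial list into 80-trial blocks only approximates the randomization described, in which the two cross-format subconditions shared blocks.

```python
import random
from itertools import product

# Comparison numerosities for each standard, as listed in the design.
COMPARISONS = {
    5: [3, 4, 6, 7],
    10: [8, 9, 11, 12],
    20: [15, 17, 19, 21, 23, 25],
    40: [30, 34, 38, 42, 46, 50],
}
# Assumption: the two cross-format subconditions count as separate conditions.
CONDITIONS = ("simultaneous", "sequential", "cross_format_1", "cross_format_2")
REPETITIONS = 20
TRIALS_PER_BLOCK = 80
BLOCKS_PER_SESSION = 5


def trials_for_condition(condition):
    """List every trial of one condition; all standards are included."""
    trials = []
    for standard, comparisons in COMPARISONS.items():
        for comparison, rep in product(comparisons, range(REPETITIONS)):
            trials.append({
                "condition": condition,
                "standard": standard,
                "comparison": comparison,
                # Even repetitions: standard first; odd repetitions: standard second.
                "standard_first": rep % 2 == 0,
            })
    return trials


def make_blocks(trials, rng, size=TRIALS_PER_BLOCK):
    """Shuffle so that standards are intermixed, then cut into fixed-size blocks."""
    shuffled = list(trials)
    rng.shuffle(shuffled)
    return [shuffled[i:i + size] for i in range(0, len(shuffled), size)]


all_trials = [t for c in CONDITIONS for t in trials_for_condition(c)]
n_blocks = len(all_trials) // TRIALS_PER_BLOCK
print(len(trials_for_condition("simultaneous")), len(all_trials), n_blocks, n_blocks // BLOCKS_PER_SESSION)
# 400 1600 20 4
```

For the combined cross-format blocks, the trial lists of the two subconditions would be concatenated before calling make_blocks, giving ten mixed blocks of 80 trials.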
In the simultaneous condition, a set of light-gray dots appeared in a dark-gray display region at the center of the monitor; the size of this region varied between 3.4° × 3.4° and 4° × 4°. We carefully controlled the size of the dots so that total area and contour length would not be confounded with the number of dots (see Supplementary Table A). All of the dots in a given array were the same size, but dot diameter varied from array to array, in both the standard and the comparison arrays. On half of the trials in a block, the average dot diameters were equal in the standard and comparison arrays. On the other half, the dot diameter in the comparison array was adjusted so that the summed area of the dots was not correlated with the number of elements. Thus, total dot area was not a reliable indicator of numerosity. The dots could be located anywhere within an x–y position noise grid, and the set of positions varied from array to array. We controlled the minimum interdot distance¹ and the regularity of the spatial distribution of the dots² so that the spatial arrangement of the dots was not a reliable cue for numerosity judgments.
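A minimal sketch of how such arrays might be generated is given below. It assumes rejection sampling of dot centers within a square region, with the minimum interdot distance (MinID) enforced on center-to-center distances, and it assumes that decoupling summed area from number can be achieved by equating the summed dot area of the comparison array with that of the standard. The function names, the example values (a 4° region, a MinID of 0.4°, a dot diameter of 0.2°), and the equal-area rule are illustrative assumptions; the actual dot sizes are those in Supplementary Table A, which are not reproduced here.

```python
import math
import random


def place_dots(n, region_deg, min_id_deg, dot_diam_deg, rng, max_tries=100_000):
    """Place n dot centers uniformly at random in a square region (degrees),
    rejecting candidates closer than min_id_deg (center to center) to any placed dot."""
    half = region_deg / 2 - dot_diam_deg / 2  # keep each whole dot inside the region
    centers = []
    for _ in range(max_tries):
        if len(centers) == n:
            break
        x, y = rng.uniform(-half, half), rng.uniform(-half, half)
        if all(math.hypot(x - cx, y - cy) >= min_id_deg for cx, cy in centers):
            centers.append((x, y))
    if len(centers) < n:
        raise RuntimeError("minimum interdot distance could not be satisfied")
    return centers


def diameter_for_equal_total_area(n_standard, diam_standard, n_comparison):
    """Diameter giving n_comparison dots the same summed area as the standard array."""
    # n_c * pi * (d_c / 2)**2 == n_s * pi * (d_s / 2)**2  =>  d_c = d_s * sqrt(n_s / n_c)
    return diam_standard * math.sqrt(n_standard / n_comparison)


rng = random.Random(0)
standard = place_dots(20, 4.0, 0.4, 0.2, rng)
d_comp = diameter_for_equal_total_area(20, 0.2, 25)
comparison = place_dots(25, 4.0, 0.4, d_comp, rng)
```

On equal-diameter trials, the comparison array would simply reuse the standard’s diameter instead of calling diameter_for_equal_total_area.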
In the sequential condition, a sequence of light-gray dots appeared in the same dark-gray display region as in the simultaneous condition. We carefully controlled the interstimulus interval (ISI) so that neither the total duration of a sequence nor its presentation rate would be confounded with the number of elements (see Supplementary Table B). All of the dots in a given sequence had the same duration, but this duration varied from sequence to sequence between 33 and 50 ms. On half of the trials in a block, the average ISI in both the standard and the comparison sequences was 125 ms. On the remaining half, the average ISI in the comparison sequence was adjusted so that the average total intervals of the standard and comparison sequences were approximately equal. Thus, the number of events was the only reliable cue for numerosity judgments. To make the sequences aperiodic, we added a randomly selected temporal jitter (−24, −17, −8, 0, 8, 17, or 24 ms) to each ISI so that the presentation rate would not form a rhythmic pattern. Importantly, the ISIs were chosen so that participants could not base their judgments on verbal counting or on temporal patterns. To prevent verbal counting, the longest ISI was set to less than 250 ms, because previous studies have shown that participants cannot rely on verbal or subverbal counting at such short intervals (e.g., Piazza, Mechelli, Price, & Butterworth, 2006; Tokita & Ishiguchi, 2011).
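The timing scheme can be sketched as follows, under stated assumptions. Dot durations are assumed to be whole refresh frames on the 120-Hz monitor (4 to 6 frames, about 33 to 50 ms), the base ISI is 125 ms, and the jitter values are those listed above. For duration-controlled trials, the function matched_mean_isi assumes that “total interval” means the sum of the n − 1 ISIs; this is only one reading. Under it, a 5-versus-3 comparison would need a mean ISI of 250 ms, which conflicts with the 250-ms ceiling, so the authors’ exact rule evidently differed in some detail. All names are hypothetical and the sketch is illustrative only.

```python
import random

JITTER_MS = (-24, -17, -8, 0, 8, 17, 24)   # jitter values listed in the text
FRAME_MS = 1000 / 120                      # one refresh on a 120-Hz monitor
BASE_ISI_MS = 125.0


def make_sequence(n, mean_isi_ms, rng):
    """Return (dot_duration_ms, isis_ms) for a hypothetical n-element sequence."""
    # One dot duration per sequence: 4, 5, or 6 frames (about 33, 42, or 50 ms).
    dot_duration = rng.choice((4, 5, 6)) * FRAME_MS
    # Add a random jitter to each of the n - 1 ISIs so the rhythm is aperiodic.
    isis = [mean_isi_ms + rng.choice(JITTER_MS) for _ in range(n - 1)]
    return dot_duration, isis


def matched_mean_isi(n_standard, n_comparison, standard_isi_ms=BASE_ISI_MS):
    """Mean comparison ISI that equates summed ISIs with the standard (assumed rule)."""
    return standard_isi_ms * (n_standard - 1) / (n_comparison - 1)


def total_duration(dot_duration, isis):
    """Sequence duration from the first onset to the last offset."""
    return dot_duration * (len(isis) + 1) + sum(isis)


rng = random.Random(0)
dur, isis = make_sequence(20, matched_mean_isi(20, 25), rng)
print(round(total_duration(5 * FRAME_MS, [BASE_ISI_MS] * 4)))  # 708 (5 elements, no jitter)
```

With a 5-frame dot and the 125-ms mean ISI, total_duration gives about 708 ms for 5 elements and about 6,542 ms for 40, close to the average sequence durations given in the procedure (700 and 6,500 ms).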
Weber fractions were measured using the method of constant stimuli. For each standard, the proportion of “greater” responses was plotted against the number of elements in the comparison stimulus, and a cumulative Gaussian function was fitted to these data points to obtain the psychometric function, from which the difference threshold was derived. The difference threshold was defined as the smallest change in the number of elements that yielded a correct response rate of 75%. Weber fractions were obtained by dividing the difference thresholds by the corresponding standard numbers.
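The threshold computation can be illustrated with a short sketch. It swaps in a simple probit fit (least squares on z-transformed proportions) for whatever fitting procedure the authors used; maximum-likelihood fitting is more common in practice. The threshold is taken as the distance from the 50% point of the fitted cumulative Gaussian to its 75% point, which equals about 0.674σ. Function names and the synthetic proportions are illustrative only, not data from this study.

```python
from statistics import NormalDist

Z75 = NormalDist().inv_cdf(0.75)  # about 0.674


def fit_cumulative_gaussian(comparisons, p_greater, eps=0.01):
    """Probit fit: regress z = inverse-normal(p) on comparison number; returns (mu, sigma)."""
    # Clip proportions of 0 or 1, which have no finite z-score.
    zs = [NormalDist().inv_cdf(min(max(p, eps), 1 - eps)) for p in p_greater]
    n = len(comparisons)
    mx, mz = sum(comparisons) / n, sum(zs) / n
    sxz = sum((x - mx) * (z - mz) for x, z in zip(comparisons, zs))
    sxx = sum((x - mx) ** 2 for x in comparisons)
    slope = sxz / sxx
    intercept = mz - slope * mx
    # z = (x - mu) / sigma, so slope = 1 / sigma and intercept = -mu / sigma.
    sigma = 1 / slope
    mu = -intercept * sigma
    return mu, sigma


def weber_fraction(standard, comparisons, p_greater):
    """Difference threshold (50% to 75% point) divided by the standard number."""
    _mu, sigma = fit_cumulative_gaussian(comparisons, p_greater)
    return Z75 * sigma / standard


comparisons = [15, 17, 19, 21, 23, 25]
# Synthetic, noise-free proportions from a cumulative Gaussian with mu = 20, sigma = 4.
p_greater = [NormalDist(20, 4).cdf(x) for x in comparisons]
print(round(weber_fraction(20, comparisons, p_greater), 3))  # 0.135
```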
Participants sat in a darkened room approximately 115 cm from the presentation screen. A numeric keypad was placed directly in front of them, and they responded by pressing the “1” or the “3” key. The stimulus presentation in the simultaneous condition is shown in Fig. 1a. Each trial started with a red fixation cross for 400 ms, followed by the first array. The standard and comparison arrays were shown in succession, in random order. Each array was displayed for 400 ms, and the two arrays were separated by a stimulus interval.
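Display sizes in this procedure are given in degrees of visual angle. For readers reproducing the setup, the corresponding physical size follows from the standard relation size = 2 · d · tan(θ/2); the sketch below assumes a flat screen viewed perpendicularly at the stated 115-cm distance, and the function name is hypothetical.

```python
import math


def visual_angle_to_cm(angle_deg, distance_cm=115.0):
    """Physical extent (cm) that subtends angle_deg at the given viewing distance."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)


for angle in (3.4, 4.0):
    print(angle, round(visual_angle_to_cm(angle), 2))
```

At 115 cm, the 4° × 4° display region is about 8.0 cm on a side and the 3.4° × 3.4° region about 6.8 cm.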
The stimulus presentation in the sequential condition is shown in Fig. 1b. Each trial started with a red fixation cross for 400 ms, followed by the first sequence. Pairs of sequences—standard and comparison—were shown in succession, in random order. The average durations of sequences with standard element numbers of 5, 10, 20, and 40 were 700, 1,500, 3,200, and 6,500 ms, respectively. The two sequences were separated by a stimulus interval.
The stimulus presentation in the cross-format condition is shown in Fig. 1c. In this condition, the procedure was the same as in the other conditions, except that a sequential stimulus and a simultaneous stimulus were shown in succession, in random order.
In an attempt to equalize the memory demands for shorter and longer trials, a delay was introduced in each trial between the presentation of the first and second stimuli (i.e., a stimulus interval). In the simultaneous condition, the average duration of the stimulus interval was 6,500 ms, whereas in the sequential condition, the average durations of the stimulus intervals with standard numbers of 5, 10, 20, and 40 stimuli were 5,800, 5,000, 3,300, and 500 ms, respectively. In the cross-format condition, the same delay was introduced in each trial as in the simultaneous and sequential conditions.
The participants’ task was to choose which stimulus, the first or the second, contained more elements. No feedback on the correctness of their choices was provided. At the beginning of each session, the participants were explicitly instructed to attend to the number of elements presented and to base their judgments on the numerosity they perceived, not on verbal counting.
A Macintosh G4 computer was used to generate the display and to record the data. The stimuli were presented on a color monitor with a refresh rate of 120 Hz (SONY Color Graphic Display Model GDM-F400).
To test for systematic differences between Cross-Format Conditions 1 and 2, we compared the Weber fractions in the two cross-format conditions with a t test for each participant. The two conditions did not differ significantly for any participant except S5, who showed smaller Weber fractions in Cross-Format Condition 1, in which the standard stimulus was presented as a simultaneous array. We therefore averaged the Weber fractions of the two cross-format conditions for all participants except S5 and used these averages in further analyses.
To test whether and how precision in numerosity comparison differed between the simultaneous and the sequential conditions, a 2 (presentation format: simultaneous or sequential) × 4 (standard number: 5, 10, 20, or 40) repeated measures analysis of variance (ANOVA) was conducted on the individual Weber fractions of all participants. This yielded a significant main effect of presentation format, F(1, 7) = 50.07, p < .01: Weber fractions were significantly larger in the sequential condition than in the simultaneous condition. There was also a significant interaction between presentation format and standard number, F(3, 21) = 9.87, p < .05. Simple-effects analyses revealed a significant effect of standard number in the simultaneous condition, F(3, 21) = 16.87, p < .01, with precision highest at the standard number of 5, whereas there was no effect of standard number in the sequential condition, F(3, 21) = 0.87, p > .1.
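The degrees of freedom reported above follow from the repeated measures design: with n participants and k levels of a within-subject factor, the effect has k − 1 degrees of freedom and its error term has (k − 1)(n − 1), so 8 participants and 4 standard numbers give F(3, 21), and the two-level format factor gives F(1, 7). The sketch below computes a one-way repeated measures F, the form of the simple-effects tests; it is illustrative, ignores sphericity corrections, and uses toy numbers rather than the study’s data. A p value could be obtained from an F-distribution survival function (for example, scipy.stats.f.sf), and a full two-way analysis from a dedicated routine such as statsmodels’ AnovaRM.

```python
def rm_anova_oneway(data):
    """One-way repeated measures ANOVA.

    data[s][c] is the score (e.g., a Weber fraction) of subject s in condition c.
    Returns (F, df_effect, df_error).
    """
    n = len(data)      # subjects
    k = len(data[0])   # levels of the within-subject factor
    grand = sum(sum(row) for row in data) / (n * k)
    cond_means = [sum(row[c] for row in data) / n for c in range(k)]
    subj_means = [sum(row) / k for row in data]
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    # The residual (condition x subject interaction) is the error term for a within-subject effect.
    ss_error = ss_total - ss_cond - ss_subj
    df_cond, df_error = k - 1, (k - 1) * (n - 1)
    f = (ss_cond / df_cond) / (ss_error / df_error)
    return f, df_cond, df_error


toy = [[1, 2, 3], [2, 3, 4], [3, 5, 4]]  # toy scores, not data from this study
print(rm_anova_oneway(toy))
```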
To test whether and how precision in numerosity comparison differed between the cross-format and single-format conditions (i.e., the simultaneous and sequential conditions), a 3 (presentation format: simultaneous, sequential, or cross-format) × 4 (standard number: 5, 10, 20, or 40) repeated measures ANOVA was conducted on the individual Weber fractions of 7 participants, excluding participant S5. There was a significant main effect of presentation format, F(2, 12) = 25.28, p < .01, and Bonferroni-corrected post hoc comparisons revealed that the Weber fractions in the sequential and cross-format conditions were significantly larger than those in the simultaneous condition, indicating that precision in the sequential and cross-format conditions was substantially worse than in the simultaneous condition.
We investigated whether and how precision in approximate numerosity comparisons would differ between simultaneous, sequential, and cross-format presentations of stimuli across a broad range of numerosities. The results revealed three notable differences in performance among the three presentation formats. First, the precision of numerosity comparisons was significantly higher for simultaneously presented elements than for sequentially presented elements at all standard numbers. Second, only with simultaneous presentation were the Weber fractions strikingly smaller for the standard number of 5 than for any of the other standard numbers; no such effect was observed with sequential or cross-format presentation. Third, individual differences in performance were greater for cross-format comparisons than for single-format comparisons. Taken together, our results imply the existence of format-dependent processes in the representation of numerosity for simultaneous and sequential presentation formats.
Our results are consistent with the neurophysiological and neuroimaging studies that have shown the existence of format-dependent numerosity representation (Dormal et al., 2010; Nieder et al., 2006). Together, these findings argue for independent processing of simultaneous and sequential numerical quantities, suggesting that different neuronal systems may extract numerosity from simultaneously and sequentially presented elements. The lower precision in the sequential condition is also consistent with the behavioral data from Nieder et al.’s investigation. At the same time, our results do not support the conclusions of behavioral studies that have claimed format independence in the processing of numerosity.
What, then, is the source of the difference in precision between numerosity representations of simultaneously and sequentially presented stimuli, and how does this discrepancy arise? When a visual array is presented simultaneously, numerosity can be estimated at a glance from the area, density, and/or spatial arrangement of the elements; in other words, numerosity is perceived through parallel spatial processing. In contrast, when the elements of a set are presented sequentially, they must be enumerated successively over time, and the cardinal value of the set is represented by the final accumulated quantity. This process may require memory capacity that is not needed for processing simultaneous stimuli. Thus, the processes of extracting the numerical value of a set plausibly differ between simultaneous and sequential presentation.
The Weber fractions on the cross-format trials varied considerably across our participants, although they were all higher than those on the simultaneous trials. If a convergent system integrated the information from sequential and simultaneous numerical processing to form highly abstract numerical representations, as Nieder et al. (2006) proposed, performance on the cross-format trials should lie between performance on the simultaneous and sequential trials. Our results, however, only partly supported this prediction: 2 of our 8 participants showed this pattern, but the remaining participants did not. These individual differences imply that numerical comparison across formats may depend on each participant’s strategy.
How, then, did the participants compare numerosities across formats? We offer two possibilities. First, participants may have used a one-to-one correspondence strategy, mentally pairing each sequential element with one simultaneous element until an element was left unpaired. This process might require neither format-specific number processing systems nor a convergent system, but it might impose a heavy working memory load, especially for larger numerosities such as 20 and 40. Alternatively, participants may have made a symbolic number estimate for each set as it was presented and compared the two estimates at the judgment stage of the task. In this case, both precision and accuracy would be crucial in mapping a physical numerosity onto a symbolic number word. Because our procedure did not identify which strategies participants used on the cross-format trials, these possibilities remain to be tested in future research.
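The first strategy can be written down as an idealized, noise-free procedure; the sketch below is a hypothetical illustration of the logic, not a model fitted to the data. The list of unpaired array elements stands in for what the participant would have to keep track of in working memory, which grows with the size of the array.

```python
def compare_by_correspondence(simultaneous_items, sequential_events):
    """Hypothetical one-to-one correspondence strategy for a cross-format trial.

    Each incoming sequential event is paired with a not-yet-paired element of
    the simultaneous array; whichever set has elements left over is larger.
    """
    unpaired = list(simultaneous_items)  # array elements still available for pairing
    leftover_events = 0
    for _event in sequential_events:
        if unpaired:
            unpaired.pop()               # pair this event with one array element
        else:
            leftover_events += 1         # no array element left to pair with
    if unpaired:
        return "simultaneous"
    if leftover_events:
        return "sequential"
    return "equal"


print(compare_by_correspondence(range(20), range(17)))
```

Adding noise to the pairing step (for example, occasionally losing track of which elements are already paired) would be one way to explore why such a strategy could become unreliable for standards of 20 and 40.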
Additionally, the markedly lower Weber fractions for the standard number of 5 in the simultaneous condition suggest that the processing of small and large numerosities engages different mechanisms. Although 5 lies outside the subitizing range, we speculate that the lower Weber fractions at this standard number may nonetheless reflect rapid and accurate estimation of small numerosities. Importantly, this difference in performance was not observed with sequential presentation. This result is consistent with Cordes, Gelman, Gallistel, and Whalen (2001), who, in a nonverbal counting experiment, found a constant coefficient of variation within and beyond the subitizing range, suggesting that small numbers are represented on the same continuum as large numbers and are subject to the same scalar variability.
In conclusion, this study has provided behavioral evidence for format-dependent processes in approximate numerosity representation in adult humans. Although many studies support the idea that adult humans, as well as infants and nonhuman animals, share a format-independent numerical representation system, it remains unknown how numerical information from the format-specific systems is combined at the final judgment stage. Our findings imply that the process of approximate numerosity representation is complex and involves multiple stages at the behavioral and neuronal levels.
¹ In this study, interdot distance was defined as the center-to-center distance between two dots. To control interdot distance, we set a minimum interdot distance (MinID), with two levels of MinID for the standard number of 5 and three levels for the standard numbers of 10, 20, and 40, so that the spatial configuration would not be a cue for numerosity judgments.
² To ensure that the influence of dot distribution did not differ across display sizes, we controlled the relative regularity of the dot distributions using the ratio of relative regularity, MinID/MinIDmax (Durgin, 1995), where MinIDmax, the theoretical maximum of MinID, is the square root of the display area divided by the number of dots.
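The regularity ratio defined in this note can be computed directly from a set of dot centers. The sketch below is illustrative and its function names are hypothetical: a square lattice with one dot per unit cell gives a ratio of exactly 1, and more clustered arrangements give smaller values.

```python
import math
from itertools import combinations


def min_interdot_distance(centers):
    """Smallest center-to-center distance among all pairs of dots (MinID)."""
    return min(math.dist(a, b) for a, b in combinations(centers, 2))


def relative_regularity(centers, region_area):
    """Durgin's (1995) ratio MinID / MinIDmax, with MinIDmax = sqrt(area / number of dots)."""
    min_id_max = math.sqrt(region_area / len(centers))
    return min_interdot_distance(centers) / min_id_max


lattice = [(x + 0.5, y + 0.5) for x in range(5) for y in range(5)]  # 25 dots in a 5 x 5 area
clustered = [(0, 0), (0.1, 0), (3, 3), (4, 4)]
print(relative_regularity(lattice, 25.0), relative_regularity(clustered, 25.0))
```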
This research was partially supported by a Grant-in-Aid for Scientific Research (C) (Grant 22530791) from the Japan Society for the Promotion of Science.