Substantial empirical evidence has supported the idea that humans possess innate neural mechanisms that generate approximate numerical representations (e.g., Brannon, 2006; Cantlon, Platt, & Brannon, 2009; Feigenson, Dehaene, & Spelke, 2004; Whalen, Gallistel, & Gelman, 1999). Furthermore, converging empirical findings from several areas of cognitive neuroscience argue for biologically determined mechanisms for approximate numerical representation (e.g., Cantlon, Brannon, Carter, & Pelphrey, 2006; Nieder & Dehaene, 2009; Piazza, Izard, Pinel, Bihan, & Dehaene, 2004). In line with these arguments, it has been claimed that a genuinely abstract numerical representation would be capable of representing the numerosity of any set of discrete elements independently of the processing of continuous magnitudes.

However, in practice, significant empirical evidence has shown potential interactions between the processing of numerosity and continuous magnitudes such as time and space (e.g., Casasanto & Boroditsky, 2008; Droit-Volet, Clement, & Fayol, 2003; Roitman, Brannon, Andrews, & Platt, 2007; Walsh, 2003; Xuan, Zhan, He, & Chen, 2007). In neurobiological and neuroimaging fields, there is a debate as to whether the human brain processes a shared representation for various types of magnitude, such as numerical quantities, physical size, and loudness (e.g., Kadosh, Lammertyn, & Izard, 2008).

Generally, two types of hypotheses have been presented concerning the relationship between temporal and numerosity processing: a separate system model and a shared system model.

Proponents of the separate system model assume that information is processed in dimension-specific systems for time, numerosity, and other magnitudes. This hypothesis predicts no interaction between magnitude dimensions, since the magnitude information is processed independently. A recent report by Agrillo, Ranpura, and Butterworth (2010) supported this prediction. They investigated the relationship between the estimation of time and number by using a conflict paradigm, in which observers were required to estimate either the duration of the stimuli or the number of tones presented. The results showed that estimates of duration were unaffected by the number of tones and that estimates of numerosity were unaffected by duration.

The hypothesis of a shared mechanism assumes that numerosity and temporal information may be dissociated but also share a common neural basis (e.g., Cappelletti, Freeman, & Cipolotti, 2009; Dormal, Andres, & Pesenti, 2008). This hypothesis predicts asymmetrical or symmetrical interactions between numerosity and temporal processing. In animal studies, interactions between numerosity and temporal processing have been unidirectional, such that only time influences numerosity whereas numerosity does not influence time (e.g., Breukelaar & Dalrymple-Alford, 1998; Roberts, Coughlin, & Roberts, 2000; Roitman, Andrews, Brannon, & Platt, 2003). Data from human studies have also shown asymmetrical interactions between numerosity and temporal processing, but in the direction opposite to that found in animals: numerical information has been shown to interfere with judgments of duration, whereas duration did not affect numerical judgments (Dormal, Seron, & Pesenti, 2006; Droit-Volet et al., 2003; Roitman et al., 2007). For instance, several studies have shown that when subjects performed a numerical task or judged the numerosity of dots, the temporal intervals were perceived as shorter than their physical duration (Dormal et al., 2008; Dormal et al., 2006; Droit-Volet et al., 2003; Xuan et al., 2007).

In these studies, observers were more sensitive in numerosity discrimination than in duration discrimination, as evidenced by the smaller Weber fractions for numerosity judgments than for temporal judgments. Two explanations have been suggested to account for the unidirectional influence of numerosity on time in human observers. One is that the processing of numerosity is comparatively less demanding of attention than is the processing of time (Dormal et al., 2006; Droit-Volet et al., 2003; Roitman et al., 2007). The other is that observers frequently make explicit judgments of numerosity, whereas they rarely make retrospective judgments of duration (Dormal et al., 2006).

In this study, we investigated whether and how temporal information would affect the performance of numerosity discrimination of sequential stimuli, in an attempt to clarify the relationship between temporal and numerosity processing. Our experiment had distinctive features that differed from those in previous studies. To assess the effect of temporal information, we measured both precision and accuracy in a single experiment. Although some studies have tested precision (i.e., variability of the observer’s response), few studies have measured accuracy (i.e., whether the number of events was overestimated or underestimated). Since temporal information could affect both the precision and the accuracy of performance, we considered it necessary to measure both.

If the performance of numerosity discrimination was affected by temporal information, it would provide further evidence for a shared system for temporal and numerosity processing.

Experiment

We tested the effect of temporal information on accuracy and precision in numerosity discrimination of sequential events. We manipulated the duration of the event presentation (i.e., the stimulus duration) and the total interval of the sequence (i.e., the total interval). The schematic view of the manipulation of temporal information is shown in Fig. 1. The range of the independent variables for each condition is shown in Table 1. We employed three levels of standard event numbers (i.e., standard number) to test whether and how the effect of temporal information would differ among standard numbers.

Fig. 1 Schematic presentation of the stimuli. a Comparison sequence in the control condition. b Long-duration condition. c Long-interval condition

Table 1 The range of stimulus duration, stimulus interval and total interval in each condition

We used the method of constant stimuli, in which observers decided on each trial which visual sequence, a standard sequence or a comparison sequence, had more events. To test the precision, we derived Weber fractions that indicated the variability of the observer’s numerosity comparisons. Both behavioral and neurobiological evidence has shown that the performance of numerosity comparison obeys Weber's law: Discriminability depends on the ratio of the numerosities to be compared (see, e.g., Boisvert, Abroms, & Roberts, 2003; Brannon, 2006; Cantlon et al., 2009; Cordes, Gelman, Gallistel, & Whalen, 2001; Whalen et al., 1999). The value of the Weber fraction has been estimated to range from 0.10 to 0.14, on the basis of the coefficients of variation for nonverbal counting tasks in previous studies (e.g., Cordes et al., 2001; Whalen et al., 1999). If the stimulus duration and/or the total interval affects the precision of the numerosity comparison, the Weber fraction will deviate from that in the control condition. To test the accuracy of the numerical comparison, we derived the point of subjective equality (PSE). If the stimulus duration and/or the total interval affects the numerosity judgment, dissociation between the actual event number and perceived numerosity will occur, and the PSE will deviate from the actual value.

Method

Participants

Eight observers participated in the experiment. None of the observers had prior experience with numerosity comparison of sequential events. All observers had normal or corrected-to-normal vision.

Design

Two independent variables were examined in the experiment: manipulation of temporal information (the control condition, the long-duration condition, and the long-interval condition) and standard event number (5, 10, and 20). Two sequences, a standard sequence and a comparison sequence, appeared successively in random order. The numbers of events in the comparison sequences were 3, 4, 6, and 7 for the standard number of 5; 8, 9, 11, and 12 for the standard number of 10; and 15, 17, 19, 21, 23, and 25 for the standard number of 20. Trials in the three temporal information conditions and the three standard numbers were intermixed in a block, so that the temporal information factor was manipulated within a block.

Each condition had 280 trials (20 repetitions × 4 comparison levels at each of the standard numbers 5 and 10, and 20 repetitions × 6 comparison levels at the standard number of 20), resulting in 840 trials in total. The trials were divided into 10 blocks of 84 trials each. The order of trials was completely randomized within a block. The standard sequence came first on half the trials and second on the other half. The observers were given 20 practice trials before the actual experiment began.
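The trial counts above can be checked with a short sketch that enumerates the design. This is only an illustration under stated assumptions: the text does not say how repetitions were distributed over blocks, so the sketch assumes that each block contains every condition × comparison-level cell twice, once with the standard sequence first and once with it second. The names `make_block` and `make_session` are hypothetical.

```python
import random

CONDITIONS = ("control", "long-duration", "long-interval")
# Comparison event numbers for each standard number, as listed in the Design.
COMPARISONS = {
    5: (3, 4, 6, 7),
    10: (8, 9, 11, 12),
    20: (15, 17, 19, 21, 23, 25),
}


def make_block(rng):
    """Build one randomized block of 84 trials.

    Assumption: every cell appears twice per block, once per presentation
    order (standard first / standard second).
    """
    trials = [
        {"condition": c, "standard": s, "comparison": n, "standard_first": first}
        for c in CONDITIONS
        for s, levels in COMPARISONS.items()
        for n in levels
        for first in (True, False)
    ]
    rng.shuffle(trials)  # trial order is randomized within the block
    return trials


def make_session(n_blocks=10, seed=0):
    """Build the full session as a list of blocks."""
    rng = random.Random(seed)
    return [make_block(rng) for _ in range(n_blocks)]


if __name__ == "__main__":
    session = make_session()
    total = sum(len(block) for block in session)
    print(f"{len(session)} blocks, {len(session[0])} trials each, {total} total")
```

Under this assumption, 3 conditions × 14 comparison levels × 2 presentation orders gives 84 trials per block, and 10 blocks give the 20 repetitions per cell reported above.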

Stimuli

A sequence consisted of a series of events (i.e., flashes of a light dot) on a dark gray background. The diameter of the dot was about 9 min of visual angle.

In the standard sequences, the stimulus duration in a particular sequence had the same value but varied from sequence to sequence from 33 to 50 ms so that the total presentation time was not a reliable cue to numerosity. The average stimulus interval was 142 ms at all standard numbers. In the comparison sequences, the stimulus duration and the total interval were varied in accordance with the conditions.

In the control condition, the stimulus duration and the total interval in the comparison sequences were carefully controlled so that the number of events would be the only cue for numerosity judgments. In the long-duration condition, the stimulus duration for the comparison sequences was 1.5–1.6 times longer than that for the standard sequences. Therefore, average duration varied from 58 to 83 ms. Average blank duration was manipulated as in the control condition. In the long-interval condition, the total interval of the comparison sequences was 1.5–1.6 times longer than that of the standard sequences. The stimulus duration in a particular sequence had the same value but varied from sequence to sequence from 33 to 50 ms, as in the control condition.

Stimulus intervals were carefully determined so that the observers would not make judgments on the basis of verbal counting and/or temporal patterns. To make verbal counting impossible, the longest stimulus interval was set to be less than 266 ms, since previous studies had shown that observers could not rely on verbal or subverbal counting within that duration (e.g., Piazza, Mechelli, Price, & Butterworth, 2006). To make the sequence aperiodic, we randomly added temporal jitter (8, 17, 25, or 33 ms) to the blank durations so that the temporal rate would not constitute a rhythmic pattern and a temporal pattern would not be a cue for numerosity discrimination. However, psychophysical testing revealed that participants could not reliably discriminate periodic from aperiodic sequences at the smaller jitter values; observers did not perceive the sequence as aperiodic unless the jitter was as large as 33 ms.

Measurements

The PSE and the Weber fractions were measured using the method of constant stimuli. First, the number of events in the comparison sequences was plotted on the x-axis, and the proportion of "more" responses for each comparison sequence was plotted on the y-axis. The plotted data points were fitted with a cumulative Gaussian psychometric function, from which the difference threshold was obtained. The difference threshold was defined as the smallest change in event number for which a correct response rate of 75% was achieved. The Weber fractions were obtained by dividing the difference thresholds by the standard numbers. The PSEs were obtained as the point on the psychometric function at which the standard and comparison sequences were each chosen with a probability of 50%. In this experiment, we used the standardized PSE, obtained by dividing the PSE by the standard number.
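The derivation of the PSE and the Weber fraction from a fitted psychometric function can be sketched as follows. This is a minimal illustration and not the fitting procedure used in the study: it assumes that the cumulative Gaussian is fitted by a least-squares line through probit-transformed proportions, and that the difference threshold is the distance between the 50% and 75% points of the fitted function. The function names, the clipping constant `eps`, and the example proportions are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for the probit transform


def fit_cumulative_gaussian(levels, p_more, eps=0.01):
    """Return (mu, sigma) of a cumulative Gaussian fitted to the data.

    Assumption: a least-squares line through probit-transformed proportions
    stands in for whatever fitting method the study actually used.
    """
    # Clip proportions away from 0 and 1 so the probit transform stays finite.
    z = [STD_NORMAL.inv_cdf(min(max(p, eps), 1 - eps)) for p in p_more]
    n = len(levels)
    mean_x = sum(levels) / n
    mean_z = sum(z) / n
    sxx = sum((x - mean_x) ** 2 for x in levels)
    sxz = sum((x - mean_x) * (zi - mean_z) for x, zi in zip(levels, z))
    slope = sxz / sxx
    intercept = mean_z - slope * mean_x
    # z = (x - mu) / sigma, so slope = 1 / sigma and intercept = -mu / sigma.
    sigma = 1 / slope
    mu = -intercept / slope
    return mu, sigma


def summarize(levels, p_more, standard):
    """Compute the PSE, difference threshold, and their standardized forms."""
    mu, sigma = fit_cumulative_gaussian(levels, p_more)
    # Distance from the 50% point (PSE) to the 75% point of the fitted curve.
    threshold = STD_NORMAL.inv_cdf(0.75) * sigma
    return {
        "pse": mu,
        "standardized_pse": mu / standard,
        "difference_threshold": threshold,
        "weber_fraction": threshold / standard,
    }


if __name__ == "__main__":
    # Hypothetical proportions of "comparison has more" responses.
    print(summarize([8, 9, 11, 12], [0.10, 0.30, 0.70, 0.90], standard=10))
```

With this convention, a standardized PSE above 1.0 means that the comparison sequence needs more events than the standard to be judged equal, that is, the comparison is underestimated.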

Procedure

Observers sat in a darkened room at a distance of approximately 115 cm from the presentation screen. A keypad was placed directly in front of the observers. The observers made responses by pressing the “1” or “3” key. Each trial started with a red fixation cross for 400 ms, followed by the first sequence. Two sets of sequences, a standard sequence and a comparison sequence, were shown in succession in random order. The two sequences were separated by an interval of 960 ms. The observer’s task was to answer which sequence, the first or second, contained more events. No feedback on the correctness of choices was provided. At the beginning of each session, participants were explicitly instructed to attend to the number of events presented. The observers were also told to discriminate on the basis of the numerosity they felt, and not by verbal counting.

A Macintosh G4 computer was used to generate the display and to record the data. The stimuli were presented on a color monitor at a refresh rate of 120 Hz (SONY Color Graphic Display Model GDM-F400).
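The reported timing values can be related to the 120-Hz refresh rate with a short sketch. It assumes that stimuli were timed in whole display frames, which the text does not state; under that assumption, the reported jitter values of 8, 17, 25, and 33 ms would correspond to 1 to 4 frames. The helper names are hypothetical.

```python
REFRESH_HZ = 120                # refresh rate reported for the monitor
FRAME_MS = 1000 / REFRESH_HZ    # duration of one frame, about 8.33 ms


def frames_to_ms(n_frames):
    """Duration in milliseconds of a whole number of display frames."""
    return n_frames * FRAME_MS


def nearest_frames(duration_ms):
    """Whole number of frames closest to a duration given in milliseconds."""
    return round(duration_ms / FRAME_MS)


if __name__ == "__main__":
    # Reported values: jitter (8-33 ms), stimulus duration (33-50 ms),
    # and the average stimulus interval (142 ms).
    for reported_ms in (8, 17, 25, 33, 50, 142):
        n = nearest_frames(reported_ms)
        print(f"{reported_ms} ms ~ {n} frame(s) = {frames_to_ms(n):.1f} ms")
```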

Results

The fits of data points to psychometric functions were generally good, and the Pearson product–moment correlation coefficient exceeded .9 in all cases, with the exception of 1 observer. The data for this observer were excluded, while the data of the remaining 7 observers were used for further analysis.

Figure 2 shows the Weber fractions and the standardized PSEs of individual observers in each condition. Figure 3 shows the mean Weber fraction and the mean standardized PSE in each condition as a function of standard number.

Fig. 2 The Weber fractions and the standardized points of subjective equality (PSEs) of individual observers. The x-axis presents the three conditions of time manipulation. Dotted lines in the standardized PSE panels indicate a PSE value of 1.0

Fig. 3 Mean Weber fraction (a) and mean standardized PSE (b) as a function of standard number in the experiment. Error bars represent standard deviations

A 3 (temporal information condition) × 3 (standard number) repeated measures analysis of variance (ANOVA) was conducted on the individual Weber fractions. There was a significant main effect of condition, F(2, 12) = 7.912, p < .01; a Bonferroni post hoc analysis revealed that the Weber fractions in the long-duration and the long-interval conditions were significantly larger than those in the control condition (p < .01), indicating that precision was worse in the long-duration and long-interval conditions than in the control condition. There was no significant main effect of standard number, F(2, 12) = 2.159, p > .1, suggesting that precision did not differ across standard numbers.

To test how the manipulation of temporal information affected the accuracy of numerosity comparison, a 3 (temporal information condition) × 3 (standard number) repeated measures ANOVA was conducted on the standardized PSEs. There was a significant main effect of condition, F(2, 12) = 3.056, p < .05. A Bonferroni post hoc analysis revealed that the standardized PSEs in the long-interval condition were significantly larger than those in the control and long-duration conditions (p < .01). There was no significant main effect of standard number, F(2, 12) = 1.348, p > .1, suggesting that accuracy did not differ across standard numbers.

Furthermore, we carried out one-sample t-tests to compare the mean standardized PSE at each standard number in each condition with a PSE of 1.0. In the control and long-duration conditions, there was no significant difference between the mean PSE and 1.0 at any standard number. In the long-interval condition, the mean standardized PSE was significantly larger than 1.0 at standard numbers of 5, t(6) = 3.148, p < .05; 10, t(6) = 3.325, p < .05; and 20, t(6) = 2.482, p < .05, suggesting an underestimation of the event number in the long-interval condition.
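The form of this comparison can be sketched as follows. The PSE values in the example are hypothetical and are not the data of the study; they only illustrate how a one-sample t statistic with 6 degrees of freedom (7 observers) is computed against a reference value of 1.0. The function name `one_sample_t` is an illustrative choice, and the two-tailed critical value for df = 6 is hard-coded rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

T_CRIT_DF6 = 2.447  # two-tailed critical t for df = 6 at alpha = .05


def one_sample_t(values, mu0=1.0):
    """Return (t, df) for a one-sample t-test of the mean against mu0."""
    n = len(values)
    standard_error = stdev(values) / sqrt(n)  # stdev uses the n - 1 denominator
    t = (mean(values) - mu0) / standard_error
    return t, n - 1


if __name__ == "__main__":
    # Hypothetical standardized PSEs for 7 observers (not the study's data).
    hypothetical_pses = [1.02, 1.05, 1.08, 1.03, 1.10, 1.01, 1.06]
    t, df = one_sample_t(hypothetical_pses)
    print(f"t({df}) = {t:.3f}, exceeds critical value: {abs(t) > T_CRIT_DF6}")
```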

Discussion

We investigated whether and how temporal information would affect the performance of numerosity discrimination in sequentially presented events. The results clearly showed that temporal information affected the performance of numerosity discrimination: precision deteriorated when either the event duration or the total interval was manipulated, and events presented over the longer total interval were judged as less numerous than those presented over the shorter total interval across standard numbers. This provides evidence for an influence of temporal information on numerosity processing. Thus, our results could support the existence of a shared system for these magnitude dimensions. In addition, our result is inconsistent with the claim that a genuinely abstract numerical representation would be capable of representing numerosity independently of the processing of continuous magnitudes. Further investigation is necessary to reveal the relation between the numerosity representation system and continuous magnitudes.

Two important questions arise from our results. First, how was the direction of the bias determined? The events in a sequence were judged as less numerous in the long-interval condition than in the shorter interval condition. However, this finding contradicts the idea that the magnitude of the total interval directly interferes with numerosity judgments: if it did, observers should judge the events in the longer interval as more numerous. Two possibilities could be considered. One is that as the total interval becomes longer, the number of events stored in the internal accumulator is lessened, due to memory decay over time. Another possibility is that observers might be likely to judge events presented at a high rate as more numerous than those presented at a low rate. Many studies have pointed out that the number of flashes or tones presented at a higher rate tends to be overestimated relative to the actual number (e.g., Philippi, van Erp, & Werkhoven, 2008; Romo & Salinas, 2003).

Second, in both the long-duration and long-interval conditions, precision was significantly poorer than in the control condition. By what process did precision decrease when temporal information was manipulated? One possibility is that the decrease in precision was due to the limitation of cues for numerosity judgments. In a numerosity judgment, multiple perceptual cues, such as the total presentation time, the rate of event presentation, and the temporal configuration of the event sequence, could be used. The manipulation of temporal information could reduce the number of cues available for numerosity, yielding a decrease in precision. To reveal the detailed relationship between time and numerosity processing, further investigations are necessary to answer these questions.

It should be noted that there were inconsistencies between our results and those in previous studies. Two types of behavioral evidence have typically been presented: One has demonstrated unidirectional interference between numerosity and temporal information, such that numerosity processing affects temporal processing (e.g., Droit-Volet et al., 2003; Dormal et al., 2006; Roitman et al., 2007); the other has demonstrated that estimates of duration and numerosity are mutually unaffected (Agrillo, Ranpura, & Butterworth, 2010). How can we reconcile these discrepancies with our results?

As for the possible source of the unidirectional interference, we speculate that, in the previous studies, temporal information was not salient enough for the observers to notice the temporal variation. In the present study, variation of temporal information in the long-duration and the long-interval conditions was salient enough that the observers noticed the difference.

As for the study by Agrillo et al. (2010), observers were required to estimate the duration or the number of tones by holding down or tapping the space bar for as long as or as often as they estimated. This experimental procedure offers two possible sources of results that differ from ours. First, some of the stimulus durations were much longer than 266 ms, and observers might have resorted to verbal or subverbal counting during the task. When verbal counting is possible, no interactions between numerosity and temporal processing would be predicted, because estimates of both duration and numerosity could be derived from the counting strategies. Second, in their estimation task, auditory stimuli were presented in the presentation phase, whereas tapping or pressing a space bar was required in the response phase (Agrillo et al., 2010). Since the modality specificity of temporal processing has been suggested (e.g., Ivry & Schlerf, 2008), the difference in modality between the presentation phase and the response phase could have reduced the accuracy of the measurement of the interaction between time and numerosity.

Despite the carefully controlled procedure, the present study has a significant limitation in determining the processing level at which temporal information affects judgment of numerosity. More specifically, our results did not make explicit whether the increase in the Weber fraction was due to shared systems for time and numerosity, or reflected the generic effect of distraction caused by variation in the magnitudes of any irrelevant dimensions, since variation on any irrelevant dimension would produce an increase in variability. To dissociate these levels, we need to provide evidence that only temporal manipulation would interfere with judgment of numerosity. To this end, in a future study, we need to examine whether variation of irrelevant dimensions such as stimulus size and brightness would interfere when temporal information is constant.

In conclusion, we provide evidence for an interaction of temporal information with numerosity processing. Our results support the shared system model of time and numerosity processing. To reveal the details of the relationship between these dimensions, we need to clarify the process by which interference occurs, how the direction of bias is determined, and whether and how the interactions between different magnitude dimensions change with practice.