Enumerating by pointing to locations: a new method for measuring the numerosity of visual object representations
Cite this article as: Haladjian, H.H. & Pylyshyn, Z.W. Atten Percept Psychophys (2011) 73: 303. doi:10.3758/s13414-010-0030-5
The fast and accurate enumeration of a small set of objects, called subitizing, is thought to involve a different mechanism from other numerosity judgments, such as those based on estimation. In this report, we examine the subitizing limit using a novel enumeration task that obtained the perceived locations of enumerated objects. Observers were shown brief masked displays (50, 200, and 350 ms) of 2–9 small black discs randomly placed on a gray screen and then asked to place a marker where each disc had been located. The number of these markers provided an estimate of the number of items processed. This “pointing” methodology enabled observers to accurately “enumerate” displays containing up to six items in contrast with the four-item limit typically found when using standard reporting methods (and replicated here in Experiment 2). These results suggest a different account of the limits found in most subitizing and enumeration studies.
Keywords: Attention; Spatial perception; Subitizing; Enumeration
The ability to identify and locate a number of objects in a visual scene serves many cognitive functions, such as navigating through crowded environments or playing team sports. Counting is a related ability, where accurate enumeration requires individuating discrete perceptual objects. Some theories of enumeration posit that it is achieved via a magnitude estimation mechanism. This system is observed in animals and supports the internal representations of numerosity and duration (Meck & Church, 1983). This mechanism is believed to also enable verbal counting in humans when number labels are mapped to discrete units on this continuous representation (Gallistel & Gelman, 1992).
Results from some studies, however, suggest that different mechanisms may be responsible for computing small numerosities. When enumerating sets of 1–4 items, observers make few errors with modestly increasing reaction times (RT) as set size increases. The rate of errors and RT, however, increases substantially for sets with more than four items (Trick & Pylyshyn, 1993). The term subitizing refers to this quick and accurate small-set enumeration (Kaufman, Lord, Reese, & Volkmann, 1949). This observation has led some to argue that, in addition to magnitude representations, another system can individuate and select up to four items in parallel and then numerals can be mapped to these items (Feigenson, Dehaene, & Spelke, 2004; Trick & Pylyshyn, 1994a, 1994b). The interpretation of these results, however, remains contested. Some studies attribute the difference in small- and large-set enumeration to capacity limitations of information transfer into short-term memory (Cowan, 2001; Klahr, 1973). Others posit subitizing as pattern-recognition, where familiar patterns formed by fewer items are recognized more quickly, for example, like the patterns on dice (Mandler & Shebo, 1982). Another view argues that the RT slope does not change suddenly but rather increases as a continuous function of increasing variability (Whalen, Gallistel, & Gelman, 1999). Whether or not two separate systems serve enumeration remains an open question.
In this report, we introduce a new experimental methodology designed to examine the relationship between spatial representations and the representation of sets of objects in order to characterize the mechanism that supports subitizing. Specifically, we designed an experiment that measures the accuracy of the spatial encoding of objects and indirectly provides an indication of how many objects were recalled, which serves as a measure of enumeration (see the note on terminology below). Observers were shown a brief stimulus (50, 200, or 350 ms) consisting of 2–9 small black discs randomly placed on a gray screen and immediately masked. The observers then used a mouse to “point to” where the objects had been by placing markers on a blank screen at the former location of each disc. This methodology allowed us to analyze both location accuracy and enumeration accuracy, as well as the relationship between them.
Twenty-four Rutgers University undergraduates participated in one 45-min session for course credit or payment. The experiment was programmed in MATLAB® using Psychophysics Toolbox 3.0.8 (Brainard, 1997) and run on a PC with the Windows® XP operating system. The stimuli were displayed on a 19-inch (c. 48.3 cm) color HP P1100 CRT monitor (1,280 × 1,024 pixel resolution at 70 Hz).
The test stimulus consisted of 2–9 identical black discs that appeared on a gray background (to reduce contrast and minimize after-images and phosphor decay). These discs were 35 pixels in diameter (~1° visual angle) and randomly placed on the screen with the following constraints: the edges of any two discs could be no closer than 115 pixels (~3°) and no farther than 715 pixels (~20°) from each other; additionally, discs could not appear within ~200 pixels (~5°) of the screen edges. This produced an effective viewing display of 21.1° by 16.6° (768 × 614 pixels). The minimum distance between discs was set at approximately 3° since attention requires at least 1° of visual separation for accurate discrimination (Bahcall & Kowler, 1999) and at least 2° when stimuli extend 15° into the periphery (Intriligator & Cavanagh, 2001).
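The placement constraints above can be illustrated with a short sketch. This is not the authors' MATLAB code: it is a minimal Python example that assumes disc centers are drawn uniformly from the 768 × 614 pixel effective display and that a layout is built one disc at a time, restarting if a disc cannot be placed. The function and constant names are hypothetical.

```python
import math
import random

DISC_DIAMETER = 35           # pixels (~1 deg), from the text
MIN_EDGE_GAP = 115           # pixels (~3 deg), minimum edge-to-edge spacing
MAX_EDGE_GAP = 715           # pixels (~20 deg), maximum edge-to-edge spacing
REGION_W, REGION_H = 768, 614  # effective display; sampling centers here is an assumption


def edge_gap(a, b):
    """Edge-to-edge distance between two equal discs given their centers."""
    return math.dist(a, b) - DISC_DIAMETER


def place_discs(n, rng, max_tries=1000):
    """Return n disc centers satisfying the spacing constraints (assumed procedure)."""
    while True:
        discs = []
        for _ in range(n):
            for _ in range(max_tries):
                cand = (rng.uniform(0, REGION_W), rng.uniform(0, REGION_H))
                if all(MIN_EDGE_GAP <= edge_gap(cand, d) <= MAX_EDGE_GAP for d in discs):
                    discs.append(cand)
                    break
            else:
                break  # this disc could not be placed; restart the whole layout
        if len(discs) == n:
            return discs
```

Calling `place_discs(9, random.Random(0))` yields one candidate nine-disc layout; passing a seeded generator keeps layouts reproducible.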
Additionally, a simple linear function was computed between display numerosity and response numerosity (combined durations). For small sets (2–6 items), the degree of linear fit between display numerosity and numerical responses was high: adjusted r² = 0.971 (p < 0.001) and β = 0.985. For larger sets (7–9 items), the fit was not as good: adjusted r² = 0.350 (p < 0.001) with a lower slope (β = 0.591).
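For readers unfamiliar with the statistics reported above, the following sketch shows how a least-squares slope and an adjusted r² can be computed for a single predictor. It is an illustration only, written in Python rather than the authors' software, and the function name `linear_fit` is hypothetical. Any numbers passed to it in examples are toy values, not data from the study.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of ys on xs with one predictor.

    Returns (slope, intercept, adjusted_r2).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = (sxy ** 2) / (sxx * syy)
    # Adjusted r^2 penalizes for the number of predictors (p = 1 here).
    adjusted_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)
    return slope, intercept, adjusted_r2
```

A slope near 1 means that, on average, each additional displayed disc adds about one response marker; a slope well below 1 indicates that responses increasingly undercount the display.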
Location error in the pointing task is reported as the Euclidean distance between a stimulus disc and its paired response marker, which was determined as follows. Using Delaunay triangulation (Kendall, 1989) and nearest-neighbor methods, we identified the likely associated response marker for each stimulus disc in each trial and then calculated the pixel distance between the centers of these paired discs. Some trials resulted in unpaired discs, for example, when an observer miscounted the stimulus. These trials were excluded from the location analysis (approximately 15% of total trials).
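The pairing step can be sketched in simplified form. The example below does not reproduce the Delaunay-based procedure described above; it substitutes a plain greedy nearest-neighbor matching (closest stimulus-response pair first), which is an assumption made purely for illustration. The function name `pair_markers` is hypothetical.

```python
import math


def pair_markers(stimuli, responses):
    """Greedily pair stimulus discs with response markers by distance.

    Returns (pairs, unpaired), where pairs is a list of
    (stimulus_index, response_index, distance_px) and unpaired is the
    number of discs or markers left without a partner.
    """
    # All candidate pairings, closest first.
    candidates = sorted(
        (math.dist(s, r), i, j)
        for i, s in enumerate(stimuli)
        for j, r in enumerate(responses)
    )
    used_stimuli, used_responses, pairs = set(), set(), []
    for distance, i, j in candidates:
        if i not in used_stimuli and j not in used_responses:
            used_stimuli.add(i)
            used_responses.add(j)
            pairs.append((i, j, distance))
    unpaired = abs(len(stimuli) - len(responses))
    return pairs, unpaired
```

Under this sketch, a trial with `unpaired > 0` corresponds to a miscounted display and would be dropped from the location analysis, as the text describes.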
The results of Experiment 1 suggest that the “pointing method” allows more items to be processed (and indirectly enumerated) in subitizing than typically reported (e.g., Trick & Pylyshyn, 1994b). Since subitizing experiments have used a variety of methods, there remains the possibility that the subitizing range increase found in this experiment may be due to aspects of our methodology other than pointing to recalled object locations. Therefore, Experiment 2 compared our indirect “pointing method” of inferring how many items had been processed with the more conventional method that relies on observers’ explicit report of the cardinality of the set. Other aspects of the experiment were the same as in the first experiment (e.g., the nature of the displays, the use of the mouse to report cardinality, performance measures). Since Experiment 2 involves the use of a different set of observers, we replicated our pointing method on the new subject population in order to provide a within-subjects comparison of the pointing response versus the symbolic numeral response.
Nineteen Rutgers University undergraduates participated in one 50-min session for course credit or payment. Experiment 2 consisted of two blocks. In the first (control) block, observers simply reported the number of objects by clicking on the corresponding Arabic numeral on the screen. The 20-point Helvetica font numerals (1–12) appeared on the screen equidistant from the central fixation in the form of a clock-like ring with a radius of ~3.8° (140 pixels). The cursor always appeared at the location of the fixation cross and observers clicked on the appropriate number using the mouse pointer. The second block presented the same pointing task described in Experiment 1. This order was always maintained to discourage the use of pointing strategies in the control block (there were no practice effects in Experiment 1). This experiment tested eight numerosities (2–9) and two display durations (50 and 200 ms) for a total of 16 test conditions that were administered 10 times in each block.
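The design of each block can be made concrete with a small sketch. It is written in Python for illustration and is not the authors' code. Two details are assumptions not stated in the text: that trial order within a block was randomized, and that the numeral ring was laid out like a clock face with 12 at the top. All names are hypothetical.

```python
import itertools
import math
import random

NUMEROSITIES = range(2, 10)   # 2-9 discs
DURATIONS_MS = (50, 200)      # display durations in Experiment 2
REPETITIONS = 10              # each condition administered 10 times per block
RING_RADIUS_PX = 140          # ~3.8 deg, from the text


def build_block(rng):
    """Return the 160 (numerosity, duration) trials of one block, shuffled (assumed)."""
    conditions = list(itertools.product(NUMEROSITIES, DURATIONS_MS))
    trials = conditions * REPETITIONS
    rng.shuffle(trials)
    return trials


def numeral_positions(radius=RING_RADIUS_PX):
    """Offsets (x, y) from fixation for numerals 1-12, clock-face layout (assumed).

    Screen coordinates are used, so negative y is above fixation.
    """
    positions = {}
    for k in range(1, 13):
        theta = 2 * math.pi * k / 12  # clockwise angle from the top
        positions[k] = (radius * math.sin(theta), -radius * math.cos(theta))
    return positions
```

Because every numeral sits at the same distance from the fixation cross, where the cursor starts, the mouse travel needed to respond does not depend on the number reported.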
Experiment 2 shows that the increased subitizing limit observed in Experiment 1 was not due to any incidental properties of the display or the presentation, but can be attributed to the need to respond by pointing to individual items rather than to a symbolic representation of the set’s cardinality.
In this study, we explored a novel and indirect way of determining how many briefly-presented items can be individuated and retained for further processing. Observers used the mouse to indicate locations of each item in a set of 2–9 discs that were displayed briefly and masked. By asking observers to indicate where each disc had been located, we showed that observers can attend to and recall up to six items. This capacity is in contrast to that obtained when observers only indicated how many items there were. The latter limit is generally known as the subitizing limit and has been widely reported to be around four items (Trick & Pylyshyn, 1994b).
Performance in reporting locations was also highly accurate (average error distance 2.5°), compared to the mean distance between stimulus objects (over 6°). Location accuracy, however, decreased as the number of objects increased, even for small-set displays with 2–6 objects (Figs. 4 and 6), whereas “enumeration” performance only decreased when there were more than six objects (Figs. 2, 3, and 5). These findings suggest that: (1) observers’ enumeration performance is based on items that they had individuated rather than on a strategy that uses some global property of the display (such as the total area of black discs); (2) observers are able to correctly individuate and report objects even when their report of locations was relatively impaired (i.e., the increasing location errors in the 2–6 item condition); and (3) with location responses such as those used here, observers can recall up to six individual objects.
There are several possible explanations for the larger subitizing range observed with the pointing response method. For example, the larger number of items recalled might be due to the use of motor “pointing” gestures. There is evidence that location information may be available for accurately executing motor gestures even when it is not available for verbal report, and vice-versa (Goodale & Milner, 2004). Therefore, the pointing response used in our experiment may tap into a different system of (motor) representation, which in turn leads to external markings that could be used by the symbolic counting process.
Another possible account of the difference between these two methods relies on Visual Indexing Theory. This theory (Pylyshyn, 1989, 2001, 2007) proposes a limited set of indexes that automatically pick out individual visual objects. The indexing mechanism does not itself encode object properties, nor does it provide a numerical code for the cardinality of the set of indexed items. It merely provides an indexical reference to the individual objects so that subsequent processes can operate on them. Thus, to derive the cardinality of the set of indexed objects, a subsequent stage of enumeration is required. When there are fewer objects than the indexing limit, enumeration operates over already-individuated and indexed items, rather than over the original display, so it bypasses the slowest aspect of counting (i.e., finding, individuating, and marking objects) that must be used to enumerate larger sets. This account is consistent with the finding that the pattern of performance may be affected distinctly by different factors, since some factors may affect the first (indexing) stage and others may affect the second (counting) stage. It is therefore also consistent with the observation that the “knee” or inflection of the performance curve in Figs. 2 and 3 appears to shift towards higher performance as the stimulus duration increases. This also applies to the results in Experiment 2, where the performance differences between the two reporting methodologies were more pronounced in the 50-ms displays. At such short durations, earlier processes such as individuation may be impaired (as reported by Lorinstein & Haber, 1975).
If the subitizing limit is taken to be the largest number of individuals that are retained under ideal perceptual conditions, then we see from Fig. 2 that this number is approximately six items. However, the limit may also be taken to be the largest number of items that can be retained when viewing conditions are less favorable, for example when the display duration is short. Figure 2 shows that when the display is short, the resulting subitizing limit is smaller (around five). The problem with using this as an estimate of the subitizing limit is that the reduction in the robustness is due primarily to performance at the 50-ms stimulus duration. As noted earlier, performance at 50-ms may limit the earliest individuation stage which is only one part of the subitizing process.
Perhaps an even more promising account of why enumerating by pointing may be more efficient is that it may provide a way to keep track of items that have already been counted. In some cases, this may be done by clustering already-counted objects into mnemonic groups, which may be why grouping objects into canonical patterns improves the efficiency of enumeration (Mandler & Shebo, 1982). Another way to mark the already-counted items is available when the pointing method is used, if we assume that pointing benefits from the motor representation (via the dorsal visual stream). If the objects are no longer present, as in our experiment, using this motor representation to place marks on their former locations can help keep track of already-counted objects. As long as one can associate particular objects with particular marks (located with a precision at least as accurate as the inter-object spacing), the marks placed on the screen serve as a visible record of which objects have already been counted.
Thus, there are several candidate hypotheses for how the pointing method might help to increase the span of recall or “enumerating” in subitizing experiments. These provide theoretical challenges as well as ideas for further experiments that may support one or another of these options.
Note on terminology: We measure the number of items recalled by counting the number of item-locations observers marked. We call this the number of items “enumerated” (even though the observer does not actually provide a numerical response) because the number of item-locations marked is a measure of the number of items that the observer has attended and recalled and thus in effect non-symbolically enumerated.
This research was supported by NSF IGERT #0549115 (H.H.H.) and Rutgers University institutional research funds (Z.W.P.). The authors thank Xiaotao Su (programmer), three anonymous reviewers, and Deborah Aks, Randy Gallistel, Manish Singh, and the Rutgers Visual Attention Lab for insightful discussions.