Perceiving an object accurately in peripheral vision becomes exceedingly difficult when similar objects are nearby—a phenomenon known as visual crowding (Bouma, 1970; Stuart & Burian, 1962; Townsend, Taylor, & Brown, 1971). Flanking objects appear to “squash” the crowded target (Korte, 1923, as cited in Levi, 2008), and crowds of objects form an indistinct or unidentifiable jumble of features (Pelli, 2008). The degree of crowding increases with eccentricity (Bouma, 1970) and is more pronounced in the upper visual field (UVF), relative to the lower visual field (LVF; He, Cavanagh, & Intriligator, 1996). Crowding may result from a number of mechanisms: contour integration (Flom, Weymouth, & Kahneman, 1963); lateral interactions (Bouma, 1970); insufficiently small integration fields (Pelli, Palomares, & Majaj, 2004); coarse (He et al., 1996), unfocused, or mislocalized attention (Strasburger, 2005); or averaging (pooling) of target–flanker featural information (Parkes, Lund, Angelucci, Solomon, & Morgan, 2001). Crowding might also occur independently at different stages of visual analysis (Farzin, Rivera, & Whitney, 2009; Louie, Bressler & Whitney, 2007). Whether there is any beneficial by-product of crowding remains an open question.

When multiple objects fill a scene, observers can report the average, or global, property of the set with high precision (Bulakowski, Bressler, & Whitney, 2007; Dakin & Watt, 1997; Watamaniuk & Sekuler, 1992; Williams & Sekuler, 1984). The perception of average, or ensemble, characteristics is beneficial, since it provides information about the texture and the gist of a scene. Perceiving ensemble characteristics is possible when multiple objects are densely packed or are widely spaced and easily individuated (Ariely, 2001; Chong & Treisman, 2003). Ensemble perception results from the extraction of statistical properties of low-level features (e.g., size, orientation, and motion; Ariely, 2001; Parkes et al., 2001; Watamaniuk & Sekuler, 1992) or even high-level, holistically processed objects and attributes (e.g., faces and emotions; Haberman & Whitney, 2007). It has been speculated that ensemble perception is beneficial by alleviating the need to process and represent individual items at a higher level (Ariely, 2001; Haberman, Harp, & Whitney, 2009; Haberman & Whitney, 2007, 2009). Ensemble representations have also been hypothesized to aid in texture segmentation and detection of a deviant in a scene (Cavanagh, 2001).

Few explanations for crowding have proposed that it might also produce a beneficial by-product. One model proposed by Parkes and colleagues in 2001, however, suggested that crowding arises because the visual system automatically pools, or averages, visual features over space. When observers made ensemble orientation judgments, the orientation of a crowded Gabor patch was nevertheless precisely pooled into the perceived ensemble orientation (Parkes et al., 2001). If crowding and ensemble perception share a common mechanism, changes in the degree of crowding should result in a concordant shift in ensemble perception.

In the present experiments, we examined whether the degree of ensemble pooling, defined as an observer’s perception of the average orientation information in the stimulus array, is independent of the degree of crowding. Experiment 1 measured crowding in the UVF and LVF. The strength of crowding was manipulated while holding constant other factors known to strongly modulate this effect (e.g., density and eccentricity). In Experiment 2, we directly measured ensemble perception, using the same approach and stimuli. This allowed us to test whether the degree of integration between the individual elements—the ensemble percept (Fig. 1)—would scale with the crowding effect. Although it is possible to get ensemble perception without crowding, this study addressed the question of whether crowding adds an extra benefit. If crowding facilitates ensemble perception, estimates of ensemble orientation should become more accurate in the UVF as crowding becomes stronger.

Fig. 1

Demonstration of crowding and ensemble perception. When viewed in peripheral vision, the orientation of the central target bar is difficult to identify, due to crowding by closely surrounding flankers. However, an observer is easily able to discriminate the average tilt of all the bars—ensemble perception. We measured crowding and ensemble perception in the upper and lower visual fields, using a stimulus similar to that shown in the figure to test the hypothesis that crowding facilitates ensemble perception

Experiment 1

Crowding is asymmetric in the UVF and LVF (He et al., 1996). In Experiment 1, we investigated whether a crowding paradigm might also reveal visual-field-dependent differences in the degree to which flanker orientation biases target judgments. If common mechanisms drive both processes, the biasing effect of flanker orientation should also vary across the UVF and LVF.

Method

Four observers (1 female, 3 male) from the University of California, Davis participated. All the participants were experienced psychophysical observers, and 3 were naïve as to the purpose of the experiment. All had normal or corrected-to-normal vision. The data were collected in the same sessions as data for another experiment on visually guided reaching.

Stimuli were presented on a Toshiba Regza LCD monitor with a display resolution of 1,024 × 768 pixels and a refresh rate of 60 Hz. An iMac computer running MATLAB (The MathWorks Inc., Natick, MA) and Psychophysics Toolbox (Brainard, 1997; Pelli, 1997) controlled stimulus presentation. The observers used a chinrest while viewing the monitor, which was placed 51.5 cm away. A ¼-in. sheet of Plexiglas covered the screen.

The stimulus consisted of a black central target bar surrounded by an equidistant radial array of six flanker bars of identical size and color, placed at 0°, 60°, 120°, 180°, 240°, and 300° around the target (see Fig. 1). Each bar was 4.4° long and 0.24° wide, with a rounded top 0.35° in diameter. The background luminance of the monitor was 125 cd/m², and the luminance of the bars was 0.22 cd/m². Observers wore an eye patch to ensure monocular fixation (right eye) on a small LED mounted to the left side of the monitor throughout the trial block.
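The array geometry described above can be sketched in a few lines. This is only an illustration: the function name `flanker_centers`, the choice of the target as the origin, and the convention that 0° lies along the positive x-axis are assumptions, since the text does not say how the polar angles were referenced. The example spacing of 4.3° is the smallest center-to-center spacing used in the study.

```python
import math

def flanker_centers(spacing_deg, n_flankers=6):
    """Return (x, y) flanker centers, in degrees of visual angle, on a circle
    of radius `spacing_deg` around a target at the origin (assumed layout)."""
    centers = []
    for k in range(n_flankers):
        polar = math.radians(k * 360.0 / n_flankers)  # 0, 60, ..., 300 degrees
        centers.append((spacing_deg * math.cos(polar),
                        spacing_deg * math.sin(polar)))
    return centers

# Smallest (most crowded) center-to-center spacing reported in the study.
for x, y in flanker_centers(4.3):
    print(f"{x:+.2f}, {y:+.2f}")
```

With six flankers at 60° steps, each flanker is as far from its two neighbors as it is from the target, so a single spacing value describes the whole array.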

The stimulus always appeared to the right of fixation. The center of the target was separated 18.1° horizontally and 19.7° vertically from the fixation point, and the target could be in the UVF or LVF (equidistant to fixation point). The stimulus appeared in the same location, relative to the screen, for both LVF and UVF discriminations (the fixation point was moved between two locations). The density of the array (center-to-center spacing of the flankers and central target) was varied from 4.3° (most crowded) to 10.7° (least crowded) in four steps of 1.6°. The orientation of the central target bar was set at 5°, 0°, or −5° from vertical, randomly on each trial. Flanking bars were manipulated independently of the target, with each having a random orientation within 30° of vertical in intervals of 5°. The mean, or ensemble, orientation of the six flanker bars on each trial ranged from −15° to 15° about vertical in intervals of 5°. The orientation of the flankers was randomly generated on each trial and gave no information about the orientation of the target. In each trial block, observers were presented with an equal number of all the possible target and mean flanker orientation combinations. The order of testing in the UVF and LVF was randomized for each observer. Participants made 630 total judgments (2 visual fields × 5 flanker densities × 7 ensemble orientations × 3 target orientations × 3 trials).
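The factorial design and the trial count reported above can be checked with a short enumeration. This is a sketch of the condition list only; the variable names are illustrative, and it does not model the blocking by visual field or the random draw of individual flanker orientations.

```python
from itertools import product

visual_fields = ["UVF", "LVF"]
# Five center-to-center spacings: 4.3 deg plus four steps of 1.6 deg.
spacings_deg = [round(4.3 + 1.6 * i, 1) for i in range(5)]
ensemble_means_deg = list(range(-15, 20, 5))  # -15 to 15 in 5 deg steps
target_tilts_deg = [-5, 0, 5]
repeats = 3

# One dictionary per trial, covering every combination of the factors.
trials = [
    {"field": f, "spacing": s, "ensemble_mean": m, "target": t, "repeat": r}
    for f, s, m, t, r in product(
        visual_fields, spacings_deg, ensemble_means_deg,
        target_tilts_deg, range(repeats),
    )
]

print(spacings_deg)  # [4.3, 5.9, 7.5, 9.1, 10.7]
print(len(trials))   # 630
```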

Observers triggered each trial by pressing the space bar on a keyboard. The stimulus appeared for 500 ms or until the participant removed his or her hand from the space bar to make a response. Observers made a three-alternative forced choice (3AFC) keypress corresponding to the three possible target orientations (leftward tilt, rightward tilt, or vertical).

On separate control trials, a target appeared in isolation. Otherwise, the control trials were identical in timing and procedure to the experimental trials. Each of the four control trial blocks in the UVF and LVF consisted of 15 trials (5 trials at each of the three target orientations).

Results

Figure 2 plots the group and individual observer data for crowded orientation discriminations in the LVF (solid line) and UVF (dotted line). A 2 (visual field) × 5 (density) × 7 (ensemble orientation) × 3 (target orientation) ANOVA confirmed that crowding was greater in the UVF, F(1, 3) = 15.7, p < .05, η² = .84. There were also significant main effects of density, such that less dense arrays were less crowded, F(4, 12) = 13.7, p < .01, η² = .82, and of mean ensemble orientation, where target accuracy was highest with maximum ensemble tilt, F(6, 18) = 2.7, p = .05, η² = .73. Lastly, there was a significant interaction between visual field and density, reflecting the rightward shift in the UVF discriminations apparent in Fig. 2a, F(4, 12) = 6.4, p < .01, η² = .68. To estimate the 66.7% correct threshold for the group as a whole, we used MATLAB’s “pfit” function, running 10,000 bootstrap simulations (Monte Carlo) on the 4 observers’ combined data. To improve the fit of this simulation, data from all the participants were treated as if they had come from a single observer. In the LVF, a flanker density of 8.8° was needed to reach the 67% threshold, while in the UVF, the same threshold was 10.4°. This difference was statistically significant, p < .01.
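The 66.7% criterion lies midway between chance in a 3AFC task (33.3%) and perfect performance. The sketch below illustrates how such a threshold can be read off a fitted psychometric function. It is not the fitting procedure used in the study: the logistic form, the function names, and the slope value are assumptions made for illustration, and the location parameter is simply set to the reported LVF threshold of 8.8°.

```python
import math

GUESS_3AFC = 1.0 / 3.0  # chance level with three response alternatives

def p_correct(spacing, alpha, beta, guess=GUESS_3AFC, lapse=0.0):
    """Assumed logistic psychometric function rising from `guess` to 1 - lapse."""
    core = 1.0 / (1.0 + math.exp(-(spacing - alpha) / beta))
    return guess + (1.0 - guess - lapse) * core

def spacing_at(p_target, alpha, beta, guess=GUESS_3AFC, lapse=0.0):
    """Invert p_correct: spacing at which proportion correct equals p_target."""
    core = (p_target - guess) / (1.0 - guess - lapse)
    return alpha + beta * math.log(core / (1.0 - core))

# alpha: reported LVF threshold; beta: arbitrary slope (hypothetical value).
alpha, beta = 8.8, 1.5
threshold = spacing_at(2.0 / 3.0, alpha, beta)
print(round(threshold, 2))
```

With no lapse rate, the 2/3 criterion corresponds to the midpoint of the logistic, so the threshold coincides with the location parameter `alpha` under these assumptions.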

Fig. 2

Crowding in the upper visual field (UVF) and the lower visual field (LVF) for the group (a) and individual observers (b). Plotted is orientation discrimination performance as a function of flanker density in the LVF (solid line) and UVF (dashed line). “Iso” indicates performance in the isolated target control condition. Individual observer data are plotted with closed symbols for discriminations in the LVF and with open symbols for the UVF. The dashed horizontal line represents chance performance in the three-alternative forced choice task. Error bars indicate standard errors

Control trials, where observers judged the orientation of a bar in isolation, showed that differences in crowding reported above were not due to differences in acuity or task difficulty. Discrimination performance for an isolated target was similar in the UVF and LVF: 90.7% in the LVF and 94.6% in the UVF. This difference was not significant, t(3) = 2.6, p = .08.

How the orientation of flanker bars biased the perception of the crowded target provides insight into the relationship between crowding and integration of flanker information. Figure 3 plots the correlation between target judgments and the mean orientation of the six flanker bars, for each of the five densities tested in the LVF and UVF. Target judgments were biased by the ensemble flanker orientation when the density of the stimuli increased (smaller target–flanker spacing), F = 12.6, p < .01, η² = .808. However, while crowding was more pronounced in the UVF, a 2 (visual field) × 5 (density) ANOVA revealed no difference in the degree to which ensemble information was utilized across visual fields, F(1, 3) = 0.24, n.s. When the orientation of the target bar was included in this ensemble average (six flanker bars plus target), there continued to be no difference in observers’ use of the ensemble orientation in the UVF and LVF, F(1, 3) = 0.37, n.s. Across visual fields and densities, adding the target’s orientation to the ensemble increased the bias presented in Fig. 3 in the expected positive direction by 0.03 z units, F(1, 3) = 70.2, p < .01, η² = .959, confirming that both target and average flanker orientation contributed to perceived target orientation.
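The bias measure can be illustrated as a Pearson correlation between responses and mean flanker orientation, followed by a Fisher z transform. The sketch below is a minimal version under stated assumptions: the response coding (−1 for leftward, 0 for vertical, +1 for rightward) and the trial data are hypothetical and are not taken from the experiment.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def fisher_z(r):
    """Fisher z transform, equal to 0.5 * ln((1 + r) / (1 - r))."""
    return math.atanh(r)

# Hypothetical trials: mean flanker tilt (deg) and coded target responses.
flanker_means = [-15, -10, -5, 0, 5, 10, 15, -15, 15, 0]
responses = [-1, 0, -1, 0, 0, 1, 1, 0, 1, 0]

r = pearson_r(flanker_means, responses)
z = fisher_z(r)
print(round(r, 3), round(z, 3))
```

A positive z indicates that responses followed the average flanker tilt; a negative z would indicate responses in the opposite direction.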

Fig. 3

Measure of flanker-orientation-induced bias during crowded target judgments in the upper visual field (UVF) and the lower visual field (LVF) for the group (a) and individual participants (b). Plotted are Fisher z correlations between observers’ target discriminations and the average orientation of the flanker array for the LVF (closed symbols, solid line) and UVF (dashed line, open symbols). Positive values signify that observers’ responses were biased toward the average flanker orientation. There was no significant difference between the degree of ensemble averaging between the LVF and UVF. Error bars indicate standard errors

Experiment 2

Although Experiment 1 suggested a dissociation between crowding and the bias introduced by the average flanker orientation, it did not directly tap into ensemble perception, because observers were instructed only to report on, and focus their attention toward, the central target. The goal of Experiment 2 was to directly measure ensemble perception in the UVF and LVF using the same stimuli as in Experiment 1.

Method

Four observers (1 female, 3 male) participated in this experiment. Two participants in Experiment 1 also participated in this experiment. Three were naïve as to the purpose of the experiment.

The method in Experiment 2 was identical to that in Experiment 1, with the following two exceptions. First, observers were instructed to report the average orientation of the entire ensemble of bars (six radial bars and one central bar). Second, we used a 2AFC design, instead of a 3AFC design; observers discriminated whether the entire ensemble was tilted left or right of vertical. To equate the testing conditions across Experiments 1 and 2, trials on which the ensemble orientation summed to vertical (0°) were presented but were excluded from analysis because of the low number of those trials (15 per visual field). This left 600 total judgments for Experiment 2 (20 trials × 3 trial blocks × 5 flanker densities × 2 visual fields).
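The trial counts above can be reproduced under one reading of the design, namely that the seven-bar ensemble mean equals (6 × mean flanker orientation + target orientation) / 7, using the flanker means and target tilts from Experiment 1. That formula and the variable names are assumptions for illustration; the text does not state how the ensemble mean was computed.

```python
from fractions import Fraction
from itertools import product

flanker_means_deg = range(-15, 20, 5)  # mean of the six flankers
target_tilts_deg = (-5, 0, 5)
n_blocks, n_densities, n_fields = 3, 5, 2

# Assumed mean of all seven bars for each flanker-mean / target pairing.
seven_bar_means = [
    Fraction(6 * m + t, 7)
    for m, t in product(flanker_means_deg, target_tilts_deg)
]
nonzero = [v for v in seven_bar_means if v != 0]

per_cell = len(nonzero)  # pairings kept per block, density, and field
total = per_cell * n_blocks * n_densities * n_fields
excluded_per_field = (len(seven_bar_means) - per_cell) * n_blocks * n_densities
print(per_cell, total, excluded_per_field)  # 20 600 15
```

Under this reading, only the pairing of a vertical flanker mean with a vertical target yields a vertical ensemble, which matches the 15 excluded trials per visual field and the 600 analyzed judgments.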

Results

Figure 4 presents the participants’ accuracy in discriminating the ensemble orientation in both the UVF and LVF, averaged across the five stimulus densities tested. As was expected, accuracy of ensemble orientation judgments improved with increasing degree of ensemble tilt, F(9, 27) = 56.1, p < .01, η² = .949. Importantly, a 2 (visual field) × 5 (density) ANOVA revealed that visual field had no effect on the accuracy of participants’ judgments of the average ensemble orientation, F(1, 3) = 0.31, n.s. Furthermore, there was no significant interaction between visual field and density, F(4, 12) = 1.2, n.s. To compare this ensemble effect with that of crowding in Experiment 1, we separately analyzed the middle ensemble density of 5.9°, the density that had shown the largest effect between visual fields on crowding judgments in Experiment 1. A 2 (visual field) × 10 (ensemble orientation) ANOVA showed no effect of visual field on ensemble perception at a density of 5.9°, F(1, 3) = 0.06, n.s. Lastly, the manipulation of display density in this ensemble task had little effect on ensemble judgments, F(4, 12) = 1.6, n.s.

Fig. 4

Ensemble orientation task. Plotted is the group accuracy averaged across the five ensemble densities tested. The dashed horizontal line represents chance performance in the two-alternative forced choice task. Error bars indicate standard errors

Discussion

This study compared crowded orientation discriminations with ensemble orientation judgments in the UVF and LVF. Experiment 1 showed higher spatial resolution (less crowding) in the lower visual field, replicating He et al. (1996). Interestingly, there was a strong influence of flanker orientation on participants’ reports of target orientation; however, this bias did not differ across the UVF and LVF, unlike crowding. Using the same stimuli, Experiment 2 specifically tested whether an observer’s ensemble percept (a measure of the ability to integrate orientation information across space) also shared crowding’s asymmetry across the UVF and LVF. The degree of flanker integration did not differ across the UVF and LVF. Furthermore, stimulus density differentially affected the flanker-induced bias in Experiment 1 and ensemble perception in Experiment 2. This dissociation, potentially owing to the different task strategies employed across tasks, may be taken as further evidence that judgments of crowded orientation and ensemble orientation are not the same. Although crowding and ensemble perception are dissociable processes that do not fully covary in the context of these findings, this does not imply that they are always completely separable processes (i.e., some aspects of crowding may, in fact, overlap with those of ensemble perception).

Although our second experiment confirmed that participants perceive the average orientation in a crowd of tilted features (Parkes et al., 2001), the biasing effect of the flankers on the perceived target orientation reported in Experiment 1 (Fig. 3) reveals both attractive and repulsive effects; targets were reported as being more similar to the flankers in dense displays but different from or in contrast to the flankers at larger separations. This was despite the target being crowded to some degree at all the densities tested (i.e., the accuracy at the largest target–flanker separation was lower than isolated target performance as seen in Fig. 2). Intriguingly, this is in some ways phenomenologically similar to tilt capture and contrast illusions, which may have center–surround organization (Clifford, 2002; Clifford, Wenderoth, & Spehar, 2000; Schwartz, Hsu, & Dayan, 2007). Similar findings with a rod-and-frame stimulus have also demonstrated that the orientation of a central target is either captured or repulsed by the frame, depending on its orientation (Beh, Wenderoth, & Purcell, 1971). Attempts to reconcile both crowding and the tilt illusion under a single opponency model have had some success (e.g., Solomon, Felisberti & Morgan, 2004). Our experiments suggest that future work may need to consider crowding, tilt contrast, and pooling as interacting effects.

The idea that “crowding and texture perception are opposite sides of the same coin” is an interesting hypothesis, put forth by Parkes et al. (2001), because it suggests that crowding may be, in part, beneficial. They found that observers account for the orientation of a crowded target when discriminating the tilt of an array of Gabors and suggested “that ‘crowding’ is simply the name we give to texture perception when we do not wish it to occur.” On this account, crowding would benefit the observer by making the ensemble, or average, of a set of objects easier to compute. Our results, which show that variation in crowding is not always accompanied by variation in the ensemble percept, suggest that this relationship does not always hold. Our findings suggest that crowding can degrade resolution, while not conferring much benefit to the observer.

The relationship between crowding and ensemble perception has been probed in previous research. However, the question of whether crowding adds some extra benefit has been unanswered. To answer that, we looked for covariation in crowding and ensemble perception. Livne and Sagi (2007) showed that the crowding effect is modulated by changing the configuration of flankers (smooth, interrupted, or “sun” patterned), while holding the overall orientation of the array constant. This finding is in line with that in Banks and Prinzmetal (1976), who, although not specifically testing crowding, demonstrated that changes in the flanker configuration affected reaction times in a peripheral search and detection task. Furthermore, Nandy and Tjan (2007) found that errors in recognizing a crowded target were not always predicted by ensemble pooling alone. Recently, Dakin, Bex, Cass, and Watt (2009) showed that the addition of crowding elements in an ensemble orientation task disrupts local orientation estimates, but not the absolute number of elements that can be integrated. Collectively, these results hint that the processes underlying flanker interference in crowding are more sophisticated than compulsory pooling of the flanker features. Unlike the present study, however, these studies did not test the specific hypothesis that crowding can facilitate ensemble perception.

The finding of Livne and Sagi (2007) that there is less crowding with some configurations suggests that ensembles can be computed before crowding, or that crowding can operate at different levels. In fact, both crowding and ensemble perception can occur selectively at different levels of visual analysis. Crowding operates not just on low-level features, but also on high-level object representations such as upright faces (Farzin et al., 2009; Louie et al., 2007). Crowding therefore reflects the limits of visual processing, due to both integrative and competitive interference from flanker items at multiple levels of the visual hierarchy. Ensemble perception, in turn, also occurs at multiple levels, including low-level features (Alvarez & Oliva, 2008; Ariely, 2001; Chong & Treisman, 2003; Parkes et al., 2001; Watamaniuk & Sekuler, 1992) and representations of high-level objects such as faces (Haberman & Whitney, 2007, 2009).

Crowding has a deleterious effect on many perceptual tasks: limiting letter recognition (Bouma, 1970; Yu, Cheung, Legge, & Chung, 2007) and reading speed (Chung, 2002; Pelli et al., 2007), slowing eye movements during search (Vlaskamp & Hooge, 2006), impairing the precision of grasp orientation (Bulakowski, Post, & Whitney, 2009), and driving recognition deficits in some observers with disorders of vision, including strabismic amblyopes (Klein & Levi, 1985). While it would be ideal if crowding conferred some benefit to perception in light of its cost, the experiments here suggest that crowding does not benefit our perception of ensembles. This does not, however, preclude crowding from serving some useful purpose in vision. Whether crowding results from low-level integration mechanisms, spatial imprecision, or the coarse resolution of higher level attentional processes, evidence supports the basic conclusion that crowding sets limits on spatial resolution in clutter. The present study suggests that a potential silver lining of crowding—that it facilitates ensemble perception—is also lost in the crowd.