Guided search for triple conjunctions

Nordfang, Maria; Wolfe, Jeremy M.

doi:10.3758/s13414-014-0715-2

Published: 09 July 2014

Volume 76, pages 1535–1559, (2014)

Attention, Perception, & Psychophysics

Maria Nordfang^1,2 & Jeremy M. Wolfe^2

Abstract

A key tenet of feature integration theory and of related theories such as guided search (GS) is that the binding of basic features requires attention. This would seem to predict that conjunctions of features of objects that have not been attended should not influence search. However, Found (1998) reported that an irrelevant feature (size) improved the efficiency of search for a Color × Orientation conjunction if it was correlated with the other two features across the display, as compared to the case in which size was not correlated with color and orientation features. We examined this issue with somewhat different stimuli. We used triple conjunctions of color, orientation, and shape (e.g., search for a red, vertical, oval-shaped item). This allowed us to manipulate the number of features that each distractor shared with the target (sharing) and it allowed us to vary the total number of distractor types (and, thus, the number of groups of identical items: grouping). We found that these triple conjunction searches were generally very efficient—producing very shallow Reaction Time × Set Size slopes, consistent with strong guidance by basic features. Nevertheless, both of the variables, sharing and grouping, modulated performance. These influences were not predicted by previous accounts of GS; however, both can be accommodated in a GS framework. Alternatively, it is possible, though not necessary, to see these effects as evidence for “preattentive binding” of conjunctions.

In a typical visual scene, many objects will share features with each other: The scene may include several big things, several blue things, several shiny things, and so forth. Consequently, looking for a specific object is likely to entail search for a conjunction of features (e.g., the big, blue, shiny thing). Conjunction searches have been a subject of considerable interest in the visual search literature for many years. In her original “feature integration theory” (FIT), Treisman classified conjunction searches as “serial,” as contrasted with “parallel” feature searches (Treisman & Gelade, 1980). Central evidence for this claim came from the functions relating set size (the number of items in a search display) to reaction time (RT). For salient features (e.g., red among green or big among small), the slope of the RT × Set Size function was near zero, suggesting no additional cost of added distractor items. For conjunction searches, in contrast, RT increased linearly with set size. Each additional distractor imposed a cost. The data were consistent with a serial search through the items at a rate of 20–40 items per second. It should be noted that the same data are also consistent with various versions of parallel models in which all items are processed at the same time (Townsend, 1971; Townsend & Wenger, 2004), but in which noise or capacity limitations cause a rise in RTs with set size (Palmer, 1995).

A key theoretical claim of FIT was that the features forming conjunctions could not be “bound” without the application of selective attention. However, whether or not conjunction identification required serial binding, subsequent work made it clear that conjunction search did not need to be particularly inefficient. With salient component features, conjunction searches tended to produce RT × Set Size slopes that were intermediate between the most efficient feature searches and the least efficient basic searches in which items were big enough to be identified without requiring fixation (e.g., Ts among Ls or 2s among 5s; Dick, Ullman, & Sagi, 1987; Egeth, Virzi, & Garbart, 1984; McLeod, Driver, Dienes, & Crisp, 1991; Nakayama & Silverman, 1986; Treisman & Sato, 1990; Wolfe, Cave, & Franzel, 1989). A continuum of search efficiency runs from highly efficient feature searches to inefficient searches for items defined by their spatial configurations (like Ts and Ls; Wolfe, 1998).

Guided search (GS) theory is one approach to understanding this continuum (Eckstein, 1998; Wolfe, 1994, 2007; Wolfe et al., 1989). GS preserves the central role for binding via selective attention. According to GS, relatively efficient conjunction search occurs because basic features can be used to guide attention to items that are more likely to be the target item. Thus, in a search for a red vertical item, attention can be guided to red items and to vertical items, the intersection of those two sets being an excellent place to look for red vertical items. The claim of FIT and GS is that the red vertical item is not bound and recognized until the item falls under the “spotlight” of selective attention. Various other experimental results (Driver, McLeod, & Dienes, 1992; Duncan, 1995; Enns & Rensink, 1990; Roggeveen, Kingstone, & Enns, 2004) and various other theoretical formulations have proposed that features can be bound without the need to focus selective attention on the item (McElree & Carrasco, 1999; Palmer, 1995). The “similarity” model proposed by Duncan and Humphreys (1989) put an emphasis on the role of grouping of items by similarity, including the grouping of items whose similarity was based on the binding of features without attention (Humphreys, Quinlan, & Riddoch, 1989). GS argued against such preattentive binding (Wolfe, 1992).

Found (1998) put these competing claims to an interesting test. He had participants search for tilted red lines among tilted white and vertical red lines. The critical manipulation was an irrelevant variation in a third variable, size: Items were either big or small, and the size of items was either correlated with the color and orientation of items or it was not. In the correlated case, within a trial, all items of one conjunction type had the same size and all items of the other conjunction type had the other size. For example, red vertical items might be big, whereas all white tilted items might be small; however, the specific relationship between size and the Orientation × Color conjunction varied from trial to trial. When size was uncorrelated, the size varied randomly with the Orientation × Color conjunctions within a trial, with the restriction that half of the elements in a trial were big and half were small. In both cases, the target item was equally likely to be either big or small; thus, size was uninformative of target presence. Found reasoned that GS should not care about whether or not the irrelevant size variable was tied to the task-relevant feature dimensions. If features were processed independently prior to the arrival of attention, the contributions of size would be the same in the two conditions. However, the results showed that the size-correlated case was more efficient. Found argued that the size-correlated case had two groups of items (e.g., big red vertical and small white tilted), whereas the size-uncorrelated case had four. Thus, the displays with more and smaller groups looked “noisier” and were somewhat harder to search through. Found considered this to be consistent with a similarity theory in which “preattentive vision delivers bound sets of features that relate to the same segmented object” (Found, 1998, p. 1123), and not consistent with GS, which would not deliver such preattentive bindings. Proulx (2007) expanded on these considerations and found that salient, task-irrelevant singleton features influence search efficiency. This led Proulx to propose that both GS and similarity theory understate the role of bottom-up saliency in conjunction searches.

Good evidence suggests that feature conjunctions can influence behavior even when the conjunction items receive little or no attention. For example, Mordkoff, Yantis, and Egeth (1990) had observers look for red X targets in displays with other items that could be red or Xs, but not both. In displays of two or six items, the critical comparison was between trials with one or two red Xs: RTs were faster with two red Xs (a redundancy gain; see also Pashler, 1987). Importantly for the argument, the RTs were faster than would be predicted if each conjunction needed to be processed separately (Mordkoff et al., 1990). Mordkoff et al. argued that, in a redundant display, both red Xs can be processed as conjunctions of red and X at the same time.

Converging evidence for this sort of preattentive processing of conjunctions has come from Mordkoff and Halterman’s (2008) “correlated flankers task.” In the standard flanker task, observers might be shown groups of three letters and told to hit the left key if the middle letter was an A and the right key if the middle letter was a B (B. A. Eriksen & Eriksen, 1974; C. W. Eriksen & Hoffman, 1973). The standard finding is that it takes a little longer to respond if the flanking letters are incongruent with the central letter (BAB and ABA) than if the flankers are congruent (AAA, BBB). In Mordkoff and Halterman’s version of the task, the target was a color–shape conjunction (e.g., a red square), and the flankers were other conjunctions that could be correlated with the target. Thus, blue diamond flankers might be correlated with the red square, though blue and diamond by themselves were not. These conjunctive flankers had an effect on RTs to the target, indicating that the combination of blue and diamond had been registered.

There is a long-running debate about the source of the flanker effect. The original hypothesis was that the flanker effect was evidence that the flanker letters were processed without attention, because attention was directed to the central letter. Later work questioned the assumption that one could completely deny attention to the flankers. For instance, Lavie and Tsal (1994) argued that, if the central task was not very demanding, some attentional resources would spill over to process the flankers. Kyllingsbæk, Sy, and Giesbrecht (2011) demonstrated that this load effect on the flanker task can also be explained by a parallel model with limited processing capacity and limited visual working memory. Regardless of one’s position on this continuing debate (see, e.g., Lavie & Torralbo, 2010; Tsal & Benoni, 2010), results like those of Mordkoff and Halterman (2008) do indicate that, under some circumstances, the conjoint appearance of basic features in an object can be processed with little or no attention.

Krummenacher and colleagues found evidence for coactivation of multiple features in visual search tasks (Krummenacher, Grubert, & Müller, 2010; Krummenacher, Müller, & Heller, 2001, 2002). As in the Mordkoff work, conjunctions of color and shape produced RTs that were too fast to be explained if the two features were not being combined in some manner. Their “dimension-weighting” solution to this problem was a modification of GS.

In this article, we use higher-order conjunctions to revisit this issue of preattentive processing of the combinations of basic features. By “higher-order conjunctions,” we mean targets that are defined by more than two features. In the real world, most objects in a complex environment would need to be defined by multiple features. Moreover, as will be seen, higher-order conjunctions give us other tools with which to address the questions of conjunctive target feature guidance and preattentive effects of feature conjunctions separately. Earlier work with triple conjunctions has provided evidence for an ability to guide attention on the basis of multiple dimensions (Dehaene, 1989; Quinlan & Humphreys, 1987). Consistent with either GS or similarity theory, it is easier to find a triple conjunction if distractors share just one feature with the target than if they share two (Wolfe et al., 1989). Typically, some features seem to guide more effectively than others, with color being a frequent winner (Williams & Reingold, 2001).

The basic puzzle

Figure 1 illustrates the basic challenge to models like GS posed by Found’s (1998) work. The target in each case is a horizontal red rectangle. This is a triple conjunction task because some distractor items are red, some are horizontal, and some are rectangles; no single feature is adequate to do the task. In each case, one third of the items have each of the target properties. That is, both examples contain one third red items, one third horizontal items, and one third rectangles. A standard model with separate representations for each dimension would see no preattentive differences between the two conditions. The difference between the conditions lies in the combinations of the features. On the left, every combination of the three values of the three feature dimensions is present, leading to a display with a target and 26 distractor types. On the right, only three of the 26 distractor types are used. However, the distributions of the individual features are the same in both displays; each feature is represented equally often. It is probably intuitively clear that the three-distractor case is easier than the 26-distractor case. Experiment 1a tested this intuition and showed that it is supported by data.

Experiment 1a

In seven experiments, we examined the guidance of attention in visual search for targets defined by three or six features. We looked for and found evidence that cannot be explained by guidance by representations of independent stimulus attributes, and we considered whether these findings might require a mechanism of preattentive binding. In Experiment 1a, we provided empirical support for the impression that triple conjunctions are easier to find when there are fewer types of distractor items.

Method

Participants

Thirteen paid volunteers (nine men, four women) participated in the experiment. Age information was available for 12 of the 13 participants; for these participants, the age range was 19 to 47. The participants had normal or corrected-to-normal 20/25 vision, no history of eye or muscular disorders, and no color vision deficits when tested on Ishihara’s tests for color blindness (Ishihara, 1987). All participants gave informed consent prior to participation. One participant was excluded from the data analysis due to excessive miss rates. The miss rates of this participant exceeded the mean miss rate across all other participants by over two standard deviations.

Apparatus

The stimuli were presented on Apple Macintosh OS X 10.5.8 computers. The experiments were run using the Psychophysics Toolbox in MATLAB 7.5.0 (R2007b). Each computer was connected to a 20-in. CRT screen, and the screen resolution was 1,280 × 960 pixels with a refresh rate of 85 Hz. Participants freely viewed the screen at a distance of approximately 60 cm, and responses were collected using a standard U.S. Apple keyboard.

Stimuli

The stimulus set consisted of elements that took one of three values in each of the three feature dimensions of color, shape, and orientation. A stimulus element could be red (RGB: 200, 0, 0), green (RGB: 0, 170, 45), or blue (RGB: 0, 230, 230); vertical (0º), oblique (45º), or horizontal (90º); and rectangular, oval, or jagged. Thus, 27 types of feature conjunctions were possible (see Fig. 2 for the basic stimulus set).

In Experiment 1a, all participants searched for the same target: a red, vertical rectangle. Four distractor sets were used. In the first distractor set, all of the possible conjunction types, excluding the target, made up the set (as in the first display in Fig. 1). We call this the 26-conjunction (26D) set. Distractor Sets 2 and 3 each consisted of three conjunction types. These two conditions differed in how many features each distractor type shared with the target. In one of the three-distractor conditions, the distractors were red vertical ovals (sharing two features with the target), blue horizontal rectangles (one shared feature), and green oblique zigzag shapes (no shared features). This condition will be designated 3D(012). In the other three-distractor condition, the distractors were red oblique ovals, green vertical zigzag shapes, and blue horizontal rectangles. Each distractor shared one feature with the target; hence, this condition is designated 3D(1). The fourth and last distractor set in Experiment 1a was a 5D set and consisted of a red, vertical zigzag shape (sharing two features); a red vertical oval (also sharing two); a green oblique rectangle (sharing one); a blue, horizontal zigzag shape (sharing none); and a blue horizontal oval (sharing none). In the 26D and 3D sets, the proportions of basic features remained the same: one third of the items having each color, each orientation, and each shape. In the 5D set, there were fewer representations of green, rectangular, and oblique than of the other features. Importantly, in all conditions, the distractors shared one feature with the target, on average.
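The sharing counts described above can be checked by enumeration. The following Python sketch is purely illustrative and is not the authors' MATLAB code: the names `TARGET`, `SETS`, `shared`, and `mean_shared` are hypothetical, and "zigzag" stands in for the jagged shape.

```python
from itertools import product

COLORS = ("red", "green", "blue")
ORIENTATIONS = ("vertical", "oblique", "horizontal")
SHAPES = ("rectangle", "oval", "zigzag")

# Target of Experiment 1a, as described in the text.
TARGET = ("red", "vertical", "rectangle")


def shared(item, target=TARGET):
    """Count the feature values an item has in common with the target."""
    return sum(a == b for a, b in zip(item, target))


def mean_shared(items):
    """Mean number of target features shared per distractor type."""
    return sum(shared(i) for i in items) / len(items)


# Distractor sets as listed in the text (one entry per conjunction type).
SETS = {
    "26D": [c for c in product(COLORS, ORIENTATIONS, SHAPES) if c != TARGET],
    "3D(012)": [
        ("red", "vertical", "oval"),
        ("blue", "horizontal", "rectangle"),
        ("green", "oblique", "zigzag"),
    ],
    "3D(1)": [
        ("red", "oblique", "oval"),
        ("green", "vertical", "zigzag"),
        ("blue", "horizontal", "rectangle"),
    ],
    "5D": [
        ("red", "vertical", "zigzag"),
        ("red", "vertical", "oval"),
        ("green", "oblique", "rectangle"),
        ("blue", "horizontal", "zigzag"),
        ("blue", "horizontal", "oval"),
    ],
}

for name, items in SETS.items():
    print(name, sorted(shared(i) for i in items), round(mean_shared(items), 3))
```

Under these definitions, the three small sets average exactly one shared feature per distractor type, whereas the 26D set averages 24/26 (about 0.92), so "one feature on average" holds exactly for the small sets and approximately for the 26D set.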

The display set size was 27 on half of the trials and 54 on the other half. These set sizes were picked so that all 26 distractors plus a target could be presented on a single trial. When distractor sets were subsets of the full set, distractors were repeated in a display. Equal (or almost equal) numbers of each distractor were presented on each trial. When the number of distractors did not divide evenly into the set size (in the 5D condition), the required additional distractors were drawn at random without replacement from the current distractor set.

The stimuli were presented on a white background (RGB: 255, 255, 255) in an 8 × 8 matrix, with a diameter of 950 pixels and centered on the screen. The stimulus elements were randomly presented in the 64 tiles of the matrix. Each element was placed in the center of a randomly chosen unoccupied tile and jittered a few pixels in order to avoid the alignment of elements.
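The display construction described above (near-equal numbers of each distractor type, random unoccupied tiles, small jitter) might be sketched as follows. This is an assumption-laden illustration, not the authors' implementation: the function names are hypothetical, the jitter of ±5 pixels is a guess at "a few pixels," the tile size is taken as 950/8 pixels, and the handling of the target item is omitted.

```python
import random


def build_distractors(types, n, rng):
    """Return n distractors with equal (or almost equal) counts per type.

    Any remainder is drawn at random without replacement from the set.
    """
    base, extra = divmod(n, len(types))
    items = [t for t in types for _ in range(base)]
    items += rng.sample(types, extra)
    return items


def place_items(items, rng, grid=8, field_px=950, jitter_px=5):
    """Assign each item to the center of a random unoccupied tile, plus jitter.

    jitter_px is an assumed value; the text says only "a few pixels".
    """
    tile = field_px / grid
    cells = rng.sample(range(grid * grid), len(items))  # no tile used twice
    placed = []
    for item, cell in zip(items, cells):
        row, col = divmod(cell, grid)
        x = (col + 0.5) * tile + rng.randint(-jitter_px, jitter_px)
        y = (row + 0.5) * tile + rng.randint(-jitter_px, jitter_px)
        placed.append((item, x, y))
    return placed


rng = random.Random(1)
five_types = ["d1", "d2", "d3", "d4", "d5"]  # placeholder labels for a 5D set
display = place_items(build_distractors(five_types, 27, rng), rng)
print(len(display), display[0])
```

Sampling tile indices without replacement guarantees that no two elements share a tile, and keeping the jitter well below half the tile width keeps every element inside its own tile.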

Procedure

Participants were instructed to look for the target, defined by three target features (i.e., the red vertical rectangle), and to respond as quickly and accurately as possible as to whether the target element was present or absent. The target remained the same across the whole experiment. Responses were made by pressing the predetermined “present” or “absent” key on the keyboard. The two response keys were marked by a red and a blue sticker on top of the A key and the L key, respectively. Participants were instructed to place each of their index fingers on top of each of the two keys. Targets were present on half of the trials.

Each trial followed the same sequence of events. First, the description of the three target features appeared in the center of the screen for 500 ms, accompanied by a warning beep. This was followed by a stimulus display that remained present on screen until the participant responded. After the response, a screen showing the trial number, accuracy feedback, and RT for that trial was displayed for 500 ms. If an error response was made, three error beeps would sound, concurrent with the presentation of the feedback screen. After the feedback, the next trial was initiated after a 1,000-ms delay.

Participants started the experiment by completing 10–30 practice trials, followed by 900 experimental trials in which all display types were intermixed pseudorandomly.

Data analysis

The RT data were trimmed by removing “outlier” trials with RTs more than three standard deviations greater than the mean for that participant. Trials with RTs below 200 ms were also removed from the analysis.
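A minimal sketch of this trimming rule is given below, in Python rather than the authors' MATLAB. It assumes that the mean and the (sample) standard deviation are computed over all of a participant's trials before any removal; the text does not specify this, nor whether the statistics were computed per condition. The function name `trim_rts` is hypothetical.

```python
from statistics import mean, stdev


def trim_rts(rts_ms, floor_ms=200.0, n_sd=3.0):
    """Return the RTs kept for one participant.

    Drops RTs below floor_ms and RTs more than n_sd standard deviations
    above that participant's mean (assumed: statistics over all trials).
    """
    cutoff = mean(rts_ms) + n_sd * stdev(rts_ms)
    return [rt for rt in rts_ms if floor_ms <= rt <= cutoff]


# Hypothetical data: twenty ordinary trials, one very slow, one anticipatory.
example = [500.0] * 20 + [5000.0, 150.0]
print(len(example), "->", len(trim_rts(example)))
```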

The RT data and accuracy data were examined separately through repeated measures analyses of variance (ANOVAs) with the factors Distractor Condition and Set Size. In the following analyses, Greenhouse–Geisser-corrected p values are reported where Mauchly’s test revealed that sphericity could not be assumed. The analyses were carried out separately for target-present and target-absent trials. For the RTs, we were particularly interested in whether the distractor sets significantly influenced search efficiency. Hence, when the general RT ANOVA revealed a significant interaction between set size and distractor condition, the relevant distractor conditions were compared by post-hoc ANOVAs or Student’s t tests. Post-hoc p values were Bonferroni–Holm corrected. For the error data, our primary interest was to ensure that speed–accuracy trade-offs were not contributing substantially to any RT differences for the various distractor conditions. Therefore, when the error rate ANOVA revealed a significant effect of distractor set, the error rates were investigated further. For all ANOVAs, generalized eta squared (ges) is reported for effect sizes.
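For readers unfamiliar with the Bonferroni–Holm correction, the standard step-down adjustment can be sketched as follows. This is a generic Python illustration with a hypothetical function name and made-up p values; it is not taken from the authors' analysis scripts.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p values, in the original order.

    The smallest p is multiplied by m, the next by m - 1, and so on;
    adjusted values are capped at 1 and forced to be non-decreasing.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Made-up p values for three post-hoc comparisons.
print(holm_adjust([0.01, 0.04, 0.03]))
```

A comparison is then declared significant when its adjusted p value falls below the chosen alpha level.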

Results and discussion

Using the outlier procedure described above, 2.1 % of the trials were removed from further analysis. Mean RTs are shown in Fig. 3. First, they confirm that triple conjunction searches are very efficient when the target shares an average of one feature with the distractors. Note that all target-present slopes are less than 5 ms/item. Second, the results show reliable differences between the conditions, even though the feature maps should be equivalent in three of the four conditions (the 5D condition had slightly fewer green, oblique, and rectangular items).
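With only two set sizes, the RT × Set Size slope reduces to a simple difference quotient. The Python sketch below shows the arithmetic; the function name is hypothetical and the RT values are invented for illustration, not data from the experiment.

```python
def search_slope(rt_small_ms, rt_large_ms, n_small=27, n_large=54):
    """Slope of the RT x Set Size function in ms/item for two set sizes."""
    return (rt_large_ms - rt_small_ms) / (n_large - n_small)


# Invented mean RTs: 600 ms at set size 27 and 681 ms at set size 54.
print(search_slope(600.0, 681.0), "ms/item")
```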

Reaction times

The RT ANOVAs revealed that the effects of distractor condition and the interaction between distractor condition and set size were significant, for both target-present and target-absent trials (see Table 1). In general, RTs increased and search efficiency decreased when the number of conjunction types in the distractor sets increased. For the target-present trials, the two 3D–26D slope comparisons were significant, as was the 5D–3D(1) comparison. For the target-absent trials, all slope comparisons except for the 3D(1)–3D(012) comparison were significant. The results thus indicate that the searches were more efficient when fewer conjunction types were present, and that this pattern was more pronounced for the target-absent trials.

Table 1 Reaction time analysis for Experiment 1a

Error rates

Out of the 5,602 target-present trials that were not removed by the outlier procedure, 255 errors occurred, in addition to 115 error trials out of the 5,337 target-absent trials. Investigations of the error rates revealed no significant effects for the target-present trials. For the target-absent trials, we found a significant main effect of distractor set (see Table 2); however, none of the separate distractor-type comparisons revealed any significant effects. Numerically, the error rates followed the pattern suggested by the RTs, with higher error rates for the 26D condition (5.6 % errors), intermediate for the 5D condition (1.7 %), and lowest for the 3D conditions (<0.1 %). The error rate analyses thus did not suggest a speed–accuracy trade-off.

Table 2 Error rate analyses of variance for Experiments 1a, 1b, 2, 3, 6, and 7

Experiment 1b: Replication

The results of Experiment 1a clearly indicated that the efficiency of search cannot be explained entirely by the activity in individual feature maps or their linear sum. If that were the case, there should have been no difference between the 26D and 3D searches. Even though all of these searches were very efficient, the 3D searches were easier than the 26D case.

In Experiment 1a, all participants searched for the same red vertical rectangle. Moreover, replication is good practice. Accordingly, Experiment 1b was a replication of Experiment 1a with modest modifications. The 5D condition was dropped, and the target conjunction varied between participants.