Implicit object naming in visual search: Evidence from phonological competition
During visual search, people are distracted by objects that visually resemble search targets; search is impaired when targets and distractors share overlapping features. In this study, we examined whether a nonvisual form of similarity, overlapping object names, can also affect search performance. In three experiments, people searched for images of real-world objects (e.g., a beetle) among items whose names either all shared the same phonological onset (/bi/) or were phonologically varied. Participants searched for either one or three potential targets per trial, with search targets designated either visually or verbally. We examined standard visual search (Experiments 1 and 3) and a self-paced serial search task wherein participants manually rejected each distractor (Experiment 2). We hypothesized that people would maintain visual templates when searching for single targets, but would rely more on object names when searching for multiple items and when targets were verbally cued. This reliance on target names would make performance susceptible to interference from similar-sounding distractors. Experiments 1 and 2 showed the predicted interference effect in conditions with high memory load and verbal cues. In Experiment 3, eye-movement results showed that phonological interference resulted from small increases in dwell time to all distractors. The results suggest that distractor names are implicitly activated during search, slowing attention disengagement when targets and distractors share similar names.
Keywords: Visual search · Phonological competitors · Eye movements · Multiple-target search
Imagine that you are cooking dinner and have set your ingredients and utensils on the counter, within easy reach. With a quick glance, it would be trivially easy to spot carrots on the counter among onions, potatoes, and garlic. Finding an onion, however, would likely require more careful discrimination: Background items create interference when they visually resemble search targets (Duncan & Humphreys, 1989; Treisman & Gelade, 1980; Wolfe, 2007; Wolfe, Cave, & Franzel, 1989). In this study, we asked whether such interference might occur when targets and distractors share similar names, in the absence of visual similarity. For example, is searching for a beaker more difficult when the background items also have names with /bi/ onsets (e.g., beast, bean, and bee), relative to backgrounds with heterogeneous names? In three experiments, we investigated the conditions under which such phonological interference might occur.
Previous research has shown that nonvisual attributes of search items can influence the deployment of attention. For example, Moores, Laiti, and Chelazzi (2003) found that when distractor items were semantically related to targets, they attracted attention during visual search (see also Godwin, Hout, & Menneer, 2014). In their experiment, participants were given verbal target cues before the brief presentation of search arrays. In post-search recall tests, participants better remembered distractors that were semantically related to target items (e.g., a lock if the target was a key) relative to unrelated distractors, and eye movements revealed more first fixations to target-related items, relative to unrelated items. Similarly, Huettig and Altmann (2005) found that semantic competitors can capture attention, using the visual world paradigm. Here, participants were instructed to freely view displays containing four items while listening to sentences containing a target word. People were more likely to fixate items related to the target word (such as a trumpet, given the target word piano), relative to unrelated distractors. Yee and Sedivy (2006) found that this semantic attention capture extends to distractors that are semantically related to phonological-onset competitors of the spoken target word. For instance, if the spoken target was log, participants tended to fixate a key because log phonologically overlaps with lock (see also Dahan & Tanenhaus, 2005; Huettig & Altmann, 2004; Yee, Overton, & Thompson-Schill, 2009).
Several previous studies have also shown that phonological competition between targets and distractors can impair search efficiency in the absence of semantic competition. Using the visual world paradigm, Allopenna, Magnuson, and Tanenhaus (1998) presented participants with displays that contained distractors whose names partially overlapped with spoken targets, as either phonological onset or rhyme competitors (e.g., beetle or speaker, respectively, for the target word beaker). Eye-movement analyses showed that participants fixated onset and rhyme competitors in close alignment with phonological target segments as the spoken target unfolded in time (see also Huettig & McQueen, 2007; Righi, Blumstein, Mertus, & Worden, 2010). Meyer, Belke, Telling, and Humphreys (2007) obtained similar results with homophone competitors. Participants were initially familiarized with image names, then, in each experimental trial, were given visual target cues and made speeded target present–absent judgments to search displays. Search response times (RTs) were slower when homophone competitors were present (e.g., an image of the animal bat, given the search target baseball bat), and eye-movement data showed more frequent first fixations and longer fixations to homophone competitors relative to unrelated distractors. These results suggest that phonological competitors disrupt both visual guidance and decision-making processes—that is, distractor rejection. Görges, Oppermann, Jescheniak, and Schriefers (2013) found similar phonological competition effects in a paradigm identical to that from Meyer et al. (2007), except that participants were presented with either visual or verbal target cues before search. Although target detection was faster overall with visual cues, the degrees of phonological interference were equivalent in both conditions. 
Furthermore, this verbal interference only occurred when participants were familiarized with the names of the stimuli before the experiment, suggesting that people use both visual and verbal target information to varying degrees when searching, depending on the salience of these object properties. Taken together, these results suggest that participants may implicitly activate phonological dimensions of visual stimuli, at least under certain conditions.
Regarding the conditions that encourage such phonological activation, research has suggested that implicit object naming may increase as a function of memory demands. Zelinsky and Murphy (2000) presented participants with two different tasks, a recognition task with high memory demand, and a visual search task with low memory demand. In the recognition task, participants were initially shown multiple common objects or faces whose names contained one or several syllables. Afterward, they had to judge whether individually presented objects were old or new. Analyses of eye movements during encoding revealed more fixations and longer gaze durations on objects whose names contained multiple syllables, compared to single-syllable items. A subsequent analysis showed a high correlation between the time required to vocalize object names and gaze duration on those objects, suggesting that participants were implicitly naming stimuli during encoding. No effect of syllable length was found in visual search, which had comparatively smaller encoding requirements (i.e., searching for a single target).
In this study, we examined two factors that we considered likely to modulate degrees of phonological coding of search targets. First, we manipulated the number of potential targets per trial, increasing memory demand for multiple-target search (Hout & Goldinger, 2010, 2012, 2015; Zelinsky & Murphy, 2000). Second, we presented search targets either as images or as verbal labels. By manipulating both of these factors in three experiments, we examined whether the combination of memory demands (e.g., Zelinsky & Murphy, 2000) and verbal labels (e.g., Görges et al., 2013) encouraged implicit naming of search targets and thereby elicited phonological interference from background, distractor items. In contrast to previous visual search studies that have examined phonological competition with simple four-item displays (e.g., Görges et al., 2013; Meyer et al., 2007), we tested using a more traditional search task (in Experiments 1 and 3), with many items scattered in unpredictable positions around the display. This paradigm enabled us to further examine whether phonological competitors inhibit object identification and visual guidance processes when people perform challenging visual search.
Across experiments, participants were cued to search for either one or three potential target objects per trial, one of which could appear among distractor objects in a subsequent search array. In the critical condition, target and distractor objects’ names shared a phonological onset (e.g., beaker, beast, and beanie). This was compared to several control conditions (as described below). We also varied whether participants were cued using target images or names, anticipating that name cues would reduce immediate visual matching and would make target-object names more active in working memory (WM). Without having exact target representations in visual WM, we expected participants to process distractors to a greater degree, in order to reject them. Such prolonged processing would theoretically allow more time for object names to become activated, potentially creating phonological interference. However, the memory requirements of single-target search are low, relative to simultaneously searching for three potential targets. With only one target in mind, people may generate visual representations from verbal cues, using them as templates to guide search (Schmidt & Zelinsky, 2009), thereby minimizing potential phonological interference. Taking these manipulations together, we expected evidence for phonological competition to mainly emerge in the condition combining verbal cues and multiple-target search.
Experiment 1 involved standard visual search, with participants searching spatial arrays of objects for targets, confirming their presence or absence. Along the way, viewed distractors must be rejected as nontargets. Although search RTs provide a fairly coarse measure of distractor rejection time, they can indicate whether global differences arise from phonological competition. To more precisely examine distractor rejection, Experiment 2 presented each search item serially (self-paced RSVP), requiring a series of overt distractor rejections per trial. In Experiment 3, we again used standard visual search and recorded eye movements to better assess distractor processing without requiring overt decisions to each.
We examined phonological competition effects in a standard visual search task. Participants searched for one (low load) or three (high load) targets among pseudorandomly distributed distractor objects. Participants were cued with target images in Experiment 1a and target names in Experiment 1b.
Twenty and 22 undergraduate students from Arizona State University participated in Experiments 1a and 1b, respectively, for partial course credit. All participants were native English speakers and self-reported normal or corrected-to-normal vision. All procedures were approved by the Arizona State University Institutional Review Board.
Stimuli for all experiments
The stimuli were images of real-world objects, converted to grayscale and resized to 2.9° × 2.9° visual angle (centered) from a viewing distance of 60 cm. Stimuli were sampled from a list of 23 phonological competitors and 23 control items (see Table 1). The list of phonological competitors contained items whose names shared a /bi/ CV onset (e.g., beaker, beast, beanie) whereas the control list contained items with varied onsets (e.g., snail, pretzel, turtle). Target and distractor items were randomly sampled from these lists per trial, appropriate to the competition or control condition for that given trial.
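As a sketch, the per-trial sampling procedure described above might look like the following Python, where the item lists are abbreviated stand-ins for the full 23-item competitor and control sets in Table 1:

```python
import random

# Abbreviated, illustrative stand-ins for the full 23-item lists in Table 1.
COMPETITORS = ["beaker", "beast", "beanie", "beetle", "bee", "bean"]
CONTROLS = ["snail", "pretzel", "turtle", "anchor", "candle", "whistle"]

def sample_trial(condition, n_targets, set_size, rng=random):
    """Randomly sample target and distractor items for one trial from the
    list appropriate to the competition or control condition, with no
    repeated items within a trial."""
    pool = COMPETITORS if condition == "competition" else CONTROLS
    items = rng.sample(pool, n_targets + set_size)
    return items[:n_targets], items[n_targets:]

# Example: a high-load competition trial (small set size for illustration).
targets, distractors = sample_trial("competition", n_targets=3, set_size=3)
```

With the full 23-item lists, a high-load trial at the largest set size (3 targets + 20 distractors) would exhaust the pool exactly.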
All data were collected on up to 12 computers simultaneously, with identical hardware and software profiles: Dell Optiplex 380 PCs (3.06 GHz, 3.21 GB RAM) running Windows XP, with displays driven by Intel G41 Express chipsets at 1366 × 768 resolution on Dell E1912H 18.5-in. monitors with a 60 Hz refresh rate. All stimuli were presented using E-Prime 1.2 software (Schneider, Eschman, & Zuccolotto, 2002).
Before the visual search task, participants were familiarized with the names of all stimuli to reduce name ambiguity. During the familiarization task, one item was centrally presented per trial, along with one plausible name and one foil (i.e., a concrete noun unrelated to the object). Participants chose the correct name for each object by pressing f or j on the keyboard. The locations of the plausible and foil names were randomized, and participants received accuracy feedback in every trial. A minimum accuracy of 85 % was required to proceed to the main experiment; no participants fell below this criterion.
Average search accuracy in Experiment 1a was 93 %. Three participants were excluded from analysis in Experiment 1b, two because of technical malfunctions and one because of missing data (i.e., no correct trials in one condition). Average accuracy for the remaining 19 participants in Experiment 1b was 90 %. All data were analyzed using repeated-measures ANOVAs and two-tailed t-tests for pairwise comparisons. Where applicable, multivariate tests are reported to account for violations of the sphericity assumption (Keppel & Wickens, 2004). Several analyses were conducted per experiment, first including all variables, then separately examining target-absent and target-present trials. Given that people are roughly twice as fast to terminate search in target-present trials (Treisman & Gelade, 1980), target-absent trials allow for greater inspection of distractor items, and distractor interference in these trials was our primary interest (see also Görges et al., 2013; Moores et al., 2003).
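For concreteness, the pairwise comparisons reported below (two-tailed paired t-tests with Cohen's d) can be sketched as follows. The d formula here (mean difference divided by the SD of the per-participant differences) is one common within-subject convention; the text does not specify which formula was used, so this is an assumption:

```python
import math
from statistics import mean, stdev

def paired_t_and_d(control_rts, competition_rts):
    """Paired comparison of per-participant condition means.
    Returns the t statistic and Cohen's d, where d is computed from the
    distribution of difference scores (an assumed convention)."""
    diffs = [b - a for a, b in zip(control_rts, competition_rts)]
    d = mean(diffs) / stdev(diffs)
    t = d * math.sqrt(len(diffs))  # paired t: mean(diff) / (SD(diff)/sqrt(n))
    return t, d
```

Degrees of freedom for such a test are n − 1, matching the t(18) and t(19) values reported for 19 and 20 retained participants.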
Experiment 1a: Image cues
Again, because our main interest concerned distractor rejection, we separately examined target-present and target-absent trials, anticipating stronger effects in target-absent trials (because they entail exhaustive search). In target-present trials, there were again main effects of Load, Competition, and Set Size. As the upper right panel of Fig. 3 shows, search RTs were slower in competition trials than control trials (1,956 ms and 1,786 ms, respectively). There was a marginal Load × Competition interaction (p = .077) in the same direction as in the overall ANOVA. In the target-absent trials, there were again main effects of Load, Competition, and Set Size. The Competition effect reflected slower RTs in competition trials (3,806 ms), relative to control trials (3,694 ms). The key Load × Competition interaction was again marginal (p = .067). Taking these results together, Experiment 1a provided weak evidence for the predicted result, as competition effects emerged when participants performed multiple-target search. We expected these effects to be stronger in Experiment 1b, which involved verbal target cues.
Experiment 1b: Verbal cues
The results for Experiment 1b are shown in the lower panels of Fig. 3 and in Appendix A (Table A2). In the full analysis, there were main effects of Load, Target Presence, and Set Size, again all in their typical directions. We also observed a main effect of Competition, with slower RTs in competition trials relative to control trials. The finding of key interest was the Load × Competition interaction, which verified that RTs were slowest in multiple-target, competition trials: In low-load trials, no competition effect was observed (with mean RTs of 2,256 ms and 2,257 ms in control and competition trials, respectively), t(18) = 0.02, p = .988. But in high-load trials, mean RTs were faster in control trials (4,282 ms) relative to competition trials (4,787 ms), t(18) = 4.37, p < .001, d = 1.00. A further Load × Competition × Target Presence interaction suggested that the Load × Competition interaction mainly emerged in target-absent trials.
In the target-present trials, there were main effects of Load and Set Size, and a marginal effect of Competition. The critical Load × Competition interaction was not reliable (see lower right panel of Fig. 3). In the target-absent trials, there were main effects of Load, Set Size, and Competition. The key Load × Competition interaction showed that during multiple-target search, RTs were slowed by phonological competitors. Under low target load, mean RTs to control and competition trials were 2,942 ms and 2,916 ms, respectively, t(18) = 0.39, p = .701. Under high load, these values diverged, with mean RTs of 5,479 ms and 6,214 ms, t(18) = 6.04, p < .001, d = 1.39 (see lower left panel of Fig. 3). The overall pattern in Fig. 3 suggests stronger competition effects in Experiment 1b relative to Experiment 1a. This was tested in a combined ANOVA, conducted specifically to assess the potential Experiment × Load × Competition interaction, but this was not reliable, F(1, 37) = 1.44, p = .238, nor was the Experiment × Competition interaction, F(1, 37) = 2.39, p = .131.
In Experiment 1, participants searched for objects, either in trials wherein target and distractor names were heterogeneous or in trials wherein all depicted objects shared the phonological onset /bi/. When people searched for singular targets, RTs were unaffected by phonological similarity among the target and distractors. However, when searching for multiple potential targets, people were slower in competition trials relative to control trials. This pattern was especially evident when targets were specified verbally rather than visually. Note, however, that distractor object names were never task relevant. Instead, they appeared to be activated automatically, impacting search when processing demands were high.
The lack of phonological competition observed when processing demands were low (i.e., in single-target search) stands in contrast to Meyer et al. (2007) and Görges et al. (2013), who found phonological competition effects when people searched for single items. Task differences offer one potential explanation for the discrepant findings: Meyer et al. (2007) and Görges et al. (2013) presented smaller displays with four items in predictable locations, whereas our displays contained 12, 16, or 20 randomly distributed items. Indeed, the greater complexity of the current task was evident in slower overall search RTs (e.g., 1,865 ms in Experiment 1a in single-target search compared to 645 ms in the comparable condition of Görges et al., 2013, and ~800 ms in Meyer et al., 2007). The greater complexity in our displays could have limited the attention that participants devoted to individual items, reducing phonological activation and therefore interference in single-target search. We examine this point further with eye tracking in Experiment 3.
The discrepant findings may also reflect differences between respective stimulus sets. Our phonological competitors included items that shared only phonological onsets, whereas Meyer et al. (2007) and Görges et al. (2013) used homophone and rhyme competitors, respectively, that shared greater phonological similarity to one another. Our stimuli only shared a /bi/ onset, whereas those from previous studies shared salient rhymes as well. The greater phonological overlap between target and competitor names in these prior experiments likely elicited greater interference in single-target search that we failed to observe with weaker phonological competitors (Grosjean, 1980; Tyler, 1984).
Although phonological competition largely did not emerge in single-target search in Experiment 1, this interference was strong when people searched for multiple targets. What might explain the Experiment 1 results? To offer a potential account, we must adopt two relatively straightforward assumptions and one novel hypothesis. First, we assume that when a person views an image of some object (e.g., a dog), the experience is not entirely visual. Instead, the object name also becomes partially active in memory, along with associated knowledge (e.g., Huettig & Altmann, 2005; Yee & Sedivy, 2006). These implicit dimensions may not reach a level of conscious awareness, but we assume they receive some degree of activation. Second, we assume that multiple-target search, combined with verbal target cues, encourages verbal coding to maintain target identities during search (Baddeley & Hitch, 1974). With these assumptions in place, we hypothesize that visual search involves “resonance seeking” between targets in WM and objects in the search array. When target representations are mainly visual, attention will mainly be drawn to visually viable candidates. But, when verbal labels are more prominent in WM, attention will also be drawn to phonologically viable candidates.
Even in this simple framework, the competition effect in Experiment 1 might reveal disruption of attentional guidance, perceptual matching, or both. With respect to guidance, Menneer et al. (2012) found that visual guidance is hindered in multiple-target search (see Hout & Goldinger, 2015, for similar findings). When searching for multiple potential targets, people both fixate more target-dissimilar items and fewer target-similar items, relative to searching for singular targets. People are also more likely to revisit previously inspected objects during multiple-target search. Therefore, when searching under high WM load, the capacity to effectively guide attention is reduced. In Experiment 1, this reduced attentional guidance would suggest that people considered more distractors in the high-load condition, relative to the low-load condition. When more distractors are viewed, it may increase the likelihood of individual object names becoming active and causing interference when they overlap with the targets held in memory. Consequently, this interference might generally inhibit the effective deployment of attention, resulting in more fixations to nontargets when phonological competitors are present. Alternatively (or additionally), given greater phonological resonance between targets in WM and fixated objects, it may take more processing for people to reject distractor objects and disengage attention. In Experiment 1, this mechanism would suggest that fixated items would hold attention longer and perhaps that target identification would be slower. To simplify interpretation, we removed the need for attentional guidance in Experiment 2.
In Experiment 2, we used the same stimuli as in Experiment 1 (arranged in the same critical and control conditions). The task was changed, however, to a self-paced serial search procedure that required participants to verify whether each viewed object was a target or a distractor. Each trial consisted of a stream of distractors, with targets embedded in 75 % of trials, as explained below. As before, Experiment 2a presented visual target cues, and Experiment 2b presented verbal target cues.
Stimuli and design
All variables from Experiment 1 were included in Experiment 2, although “set size” now referred to the number of items serially presented (12, 16, or 20). Rather than search RT, the key dependent variable was mean distractor rejection times from correct trials. The stimuli and apparatus were identical to those used in Experiment 1.
Target-present trials were considered valid for RT analyses only if the participant correctly detected the target (i.e., a hit). In every trial wherein participants reached the end of the stream without pressing enter, they were prompted to indicate whether they had seen a target. In this manner, participants could correct for simple motor errors (see Fleck & Mitroff, 2007), although target-present trials with late target detection were considered incorrect in the analyses. Target-absent trials were scored as correct only if the participant rejected all distractors and subsequently verified that no target was present. Depending upon circumstances, participants were given different feedback at the end of a trial. For correct rejections and hits, they received “correct” feedback for 500 ms (see Fig. 4). If they pressed the enter key incorrectly, the message “False alarm!” was shown for 2,000 ms. If they failed to detect a target, the message “You missed!” was shown for 2,000 ms. The message “Remember to press ENTER when you see a target!” was shown for late, correct target verification, and “There was actually no target present” was shown for late false alarms, each also for 2,000 ms.
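The feedback contingencies above amount to a simple lookup from trial outcome to message and display duration. In this sketch the outcome labels are illustrative names, not identifiers from the experiment software:

```python
# Feedback messages and durations (ms) from the serial search task
# described above. Outcome keys are illustrative, not the actual
# E-Prime condition names.
FEEDBACK = {
    "hit": ("correct", 500),
    "correct_rejection": ("correct", 500),
    "false_alarm": ("False alarm!", 2000),
    "miss": ("You missed!", 2000),
    "late_hit": ("Remember to press ENTER when you see a target!", 2000),
    "late_false_alarm": ("There was actually no target present", 2000),
}

def feedback_for(outcome):
    """Return the (message, duration_ms) pair for a trial outcome."""
    return FEEDBACK[outcome]
```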
Although Experiment 1 used a 50/50 division of target-present and target-absent trials, a pilot experiment revealed low hit rates in the self-paced serial-presentation task. This appeared to reflect the disengagement of attention when so many items required the space-bar response. To improve performance, we increased target prevalence from 50 % to 75 %, with targets present in 144 out of 192 trials. Participants completed eight initial practice trials in each experiment.
When we examined the full design (including the variables Load, Competition, Target Presence, and Set Size), seven participants had missing data, with no correct trials in at least one condition. Preliminary ANOVAs were conducted, excluding these participants, showing that Set Size did not interact with the critical Competition variable. We therefore collapsed across values of Set Size, which yielded enough correct trials to retain these participants. The final analyses therefore included the variables Load (low, high), Competition (competition, control), and Target Presence (absent, present). One participant was excluded from analysis in Experiment 2a due to slow distractor rejection times (>2.5 SDs above the mean). Average accuracy for the remaining 22 participants was 79 %. Two participants were excluded from analysis in Experiment 2b—one for falling asleep during the task and one for having slow distractor rejection times (>2.5 SDs above the mean). Accuracy for the remaining 20 participants was 83 %. Because our dependent measure was average distractor rejection time, an additional 4 % of trials were excluded from analysis in both experiments because targets occurred as the first item in the search stream.
Experiment 2a: Image cues
In target-present trials (Fig. 5, upper right panel), there were main effects of Load and Competition, and another Load × Competition interaction. In the low-load condition, mean distractor rejection times in the control and competition trials were 414 ms and 416 ms, respectively, t(21) = 0.12, p = .903. In the high-load condition, these values diverged, with mean RTs of 593 ms and 642 ms, t(21) = 3.71, p = .001, d = .79. In the target-absent trials (Fig. 5, upper left panel), there were again main effects of Load and Competition, with overall faster distractor rejection in control trials (487 ms) than in competition trials (503 ms). The Load × Competition interaction was marginal, with a numerical trend in the same direction as was observed in target-present trials. As in Experiment 1, although trials with visual target cues showed phonological competition effects under high WM load, these effects were fairly small.
Experiment 2b: Verbal cues
The results from Experiment 2b are shown in the lower panels of Fig. 5, with target-absent trials on the left and target-present trials on the right, and in Appendix A (Table A4). As shown, the competition effects under high load were far stronger, relative to Experiment 2a. In the full ANOVA, there were main effects of Load and Competition, but Target Presence was not significant. The critical Load × Competition interaction was significant, as indicated by the diverging lines in the lower panels of Fig. 5. In the low-load condition, mean distractor rejection times in control and competition trials were 453 ms and 457 ms, respectively, t(19) = 0.64, p = .528. In the high-load condition, these values diverged, with mean rejection times of 719 ms and 916 ms, respectively, t(19) = 6.94, p < .001, d = 1.55.
In the target-present trials (Fig. 5, lower right panel), there were main effects of Load and Competition, and a Load × Competition interaction. With respect to the interaction, in the low-load condition, mean distractor rejection times in control and competition trials were 456 ms and 456 ms, respectively, t(19) = 0.05, p = .962. In the high-load condition, these values increased and diverged, with means of 736 and 927 ms, t(19) = 5.26, p < .001, d = 1.18. In the target-absent trials (Fig. 5, lower left panel), there were again main effects of Load and Competition. The Load × Competition interaction was again significant, with strong competition effects emerging under high load. In the low-load condition, mean distractor rejection times in control and competition trials were 449 ms and 459 ms, respectively, t(19) = 1.04, p = .311. In the high-load condition, these values again increased and diverged, with means of 702 ms and 904 ms, t(19) = 6.06, p < .001, d = 1.36. Again, the overall pattern of Fig. 5 suggests stronger competition effects in Experiment 2b relative to Experiment 2a. This was tested in a combined ANOVA, conducted specifically to assess the potential Experiment × Load × Competition interaction, which was reliable, F(1, 40) = 26.82, p < .001, ηp2 = .40, as was the Experiment × Competition interaction, F(1, 40) = 25.73, p < .001, ηp2 = .39.
In Experiment 2, we administered a serial search task in which participants had to manually reject distractors and confirm targets in streams of centrally presented items. This procedure allowed us to assess distractor rejection processes, independent from the guidance of attention. Despite the different procedure, the results were similar to those observed in Experiment 1: When looking for multiple targets, participants were slower to reject distractors that shared a phonological onset with the potential targets held in memory. This phonological interference was again stronger when participants were given verbal target cues. Overall, this pattern suggests that distractor rejection is detrimentally affected by phonological competition. More specifically, it suggests that when distractor objects have similar names to target objects, it takes people slightly longer to reject those distractors, even though object names are irrelevant to the task. To simultaneously assess both attention guidance and distractor rejection, we next conducted a replication of Experiment 1 while recording participants’ eye movements.
Our results thus far have shown clear effects of phonological competition, evidenced by slower overall search (Experiment 1) and slower distractor rejection (Experiment 2) in the presence of phonological competitors. Although Experiment 2 showed that target–distractor name similarity interfered with distractor rejection, several questions remain about generalizing this interference to standard visual search. For instance, we have yet to show whether distractor rejection, attentional guidance, or a combination of both processes is hindered by phonological similarity. In Experiment 3, we returned to the standard search paradigm, coupled with eye-tracking measures that would allow for precise decomposition of search times into guidance and decision-making processes. A preliminary treatment of these data was presented by Walenchok, Hout, and Goldinger (2013). The following results expand upon these initial analyses by including more detailed eye-tracking analyses (the number of items fixated and refixated, and decision times) as well as separate analyses for target-present and target-absent trials.
Experiments 3a and 3b included 23 participants each. All were Arizona State University students who met the same criteria used for Experiments 1 and 2. Before beginning the experiment, all participants were screened to ensure that their eyes could be reliably tracked.
Stimuli and design
Data were collected on a Dell Optiplex 755 dual-core PC at 2.66 and 1.97 GHz, with 3.25 GB RAM, running Windows XP. Stimuli were presented at 1280 × 1024 resolution on a NEC MultiSync 2111SB CRT monitor (20-in. viewable) at a 75 Hz refresh rate. The display was controlled by an ATI Radeon HD 2400 XT video card. Eye movements were recorded using an Eyelink 1000 desktop mount system (SR Research Ltd., Mississauga, Ontario, Canada). Temporal resolution was 500 Hz, and spatial resolution was 0.01°. An algorithm used by the Eyelink system automatically partitions ocular motion into saccades, fixations, and blinks; a saccade is defined when eye velocity exceeds 30°/s and acceleration exceeds 8,000°/s². The left eye was recorded, but viewing was binocular, and head movements were limited using a chin rest. All stimuli were presented with E-Prime 1.2 software (Schneider et al., 2002) and all eye-tracking data were formatted using Data Viewer software from SR Research.
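The velocity and acceleration thresholds above can be illustrated with a minimal sketch. This is not SR Research's actual parser: the 1-D position trace, sample-rate handling, and function names are assumptions for demonstration only.

```python
# Illustrative sketch of threshold-based saccade detection (NOT the
# Eyelink parser). A sample is labeled a saccade when velocity exceeds
# 30 deg/s and acceleration exceeds 8,000 deg/s^2, as described above.

VELOCITY_THRESHOLD = 30.0    # deg/s
ACCEL_THRESHOLD = 8000.0     # deg/s^2
SAMPLE_RATE_HZ = 500         # the Eyelink temporal resolution used here

def classify_samples(positions_deg):
    """Label each gaze sample 'saccade' or 'fixation'.

    positions_deg: gaze positions in degrees of visual angle
    (1-D for simplicity; a real tracker uses 2-D x/y traces).
    """
    dt = 1.0 / SAMPLE_RATE_HZ
    labels = []
    prev_velocity = 0.0
    for i in range(len(positions_deg)):
        if i == 0:
            labels.append("fixation")
            continue
        velocity = abs(positions_deg[i] - positions_deg[i - 1]) / dt
        accel = abs(velocity - prev_velocity) / dt
        is_saccade = velocity > VELOCITY_THRESHOLD and accel > ACCEL_THRESHOLD
        labels.append("saccade" if is_saccade else "fixation")
        prev_velocity = velocity
    return labels
```

A real parser additionally smooths the velocity signal and extends saccade labels through the constant-velocity portion of each movement; this sketch only shows how the two thresholds gate the classification.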
The procedure was identical to Experiment 1, including the stimulus familiarization task, with the addition of eye-tracking procedures: Each participant underwent a 9-point calibration to ensure camera accuracy at the beginning of the experiment, and again at the midpoint of the experiment (after the 2-minute break). The maximum acceptable average calibration error (for all nine points) was <1.0°, and the maximum error for any single point was <1.5°. Drift corrections were conducted as needed, if the tracking accuracy decreased.
Three participants’ data were excluded from analysis in Experiment 3a—one due to poor eye-tracking precision, one due to slow RTs (>2.5 SDs), and one due to missing data. Average accuracy for the remaining 20 participants in Experiment 3a was 94 %. Four participants were excluded from analysis in Experiment 3b—one due to poor eye tracking precision and three due to missing data; average accuracy for the remaining 19 participants was 91 %. In addition to search RTs, several eye-movement measures were analyzed, including measures derived from fixation frequency, fixation duration, and finally, perceptual decision-making duration. These included (1) a count of distinct distractors fixated per trial (prior to fixating the target, in target-present trials); (2) a count of distractor refixations per trial, before target fixation; (3) the mean duration of distractor fixations per trial, before target fixation; (4) the summed duration of distractor fixations per trial, before target fixation; and (5) decision times in target-present trials. The decision time measure denotes the delay between a participant first fixating a target and subsequently pressing the spacebar. A “fixation” was defined as a participant’s gaze landing within the border of an invisible area of interest (AOI) immediately surrounding a given object in the display; the end of this fixation was defined as the participant’s gaze exiting the AOI. Small, corrective saccades within an AOI were treated as part of single-object fixations. Eye-movement analyses were conducted separately for target-present and target-absent trials.1
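As a concrete illustration, the five trial-level measures defined above could be derived from a trial's ordered fixation sequence roughly as follows. The data layout and function names are hypothetical, not the authors' actual pipeline; note that, as in the analyses, the duration measures count only first fixations to distractors made before the target is fixated.

```python
# Hypothetical sketch: derive the five eye-movement measures from an
# ordered list of per-object fixations on one trial.

def trial_measures(fixations, target_id, response_time_ms):
    """fixations: ordered (object_id, duration_ms, onset_ms) tuples;
    target_id: the target's id (None on target-absent trials)."""
    seen = set()
    pre_target = []          # distractor fixations before the target fixation
    decision_time = None
    for obj, dur, onset in fixations:
        if obj == target_id:
            # (5) decision time: first target fixation -> spacebar press
            decision_time = response_time_ms - onset
            break
        pre_target.append((obj, dur, obj in seen))
        seen.add(obj)
    distinct = len({obj for obj, _, _ in pre_target})                 # (1)
    refixations = sum(1 for _, _, revisit in pre_target if revisit)   # (2)
    first_durs = [dur for _, dur, revisit in pre_target if not revisit]
    mean_dur = sum(first_durs) / len(first_durs) if first_durs else 0.0  # (3)
    summed_dur = sum(first_durs)                                      # (4)
    return distinct, refixations, mean_dur, summed_dur, decision_time
```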
Search RTs, Experiment 3a
Search RTs, Experiment 3b
Correct search RTs for Experiment 3b are shown in the lower panels of Fig. 6, again with target-absent trials on the left and target-present trials on the right. In the full ANOVA, there were reliable main effects of Load, Competition, Target Presence, and Set Size. There were several reliable interactions (see Appendix A, Table A6), but the Load × Competition interaction was marginal. As shown in the lower panels of Fig. 6, the overall RTs in control and competition trials were 3,017 ms and 3,313 ms, respectively. In target-present trials, there were main effects of Load, Competition, and Set Size, although the Load × Competition interaction was not reliable. Again, competition effects were equivalent across low and high target load, with RTs of 2,149 ms and 2,399 ms in control and competition trials, respectively. In the target-absent trials, there were again main effects of Load, Competition, and Set Size. The Load × Competition interaction was not reliable, again demonstrating equivalent competition effects under low and high target load (3,886 ms and 4,227 ms in control and competition trials, respectively).
Overall, the behavioral results from Experiment 3 suggest stronger competition effects in Experiment 3b than in 3a. We tested this with a combined ANOVA: The Experiment × Load × Competition interaction was not significant, F(1, 37) = 0.17, p = .686, but the Experiment × Competition interaction was, F(1, 37) = 6.54, p = .015, ηp2 = .15.
Taken together, the search RTs replicated the previously observed phonological competition effect, although in Experiment 3b it emerged only as a main effect, rather than an interaction with Load. Having replicated the effect, our principal goal was to understand its basis by examining eye movements. Our approach was fairly exhaustive, as we sought to determine which aspect of eye movements best explained the competition effect. As might be imagined, nearly every eye-movement measure replicated the major effects observed in search RTs, such as Load and Set Size. When participants take longer to terminate search, they naturally make more fixations. In the interest of brevity, we describe all measures that we investigated, but only show figures and report statistical analyses (Appendix A) for those measures that showed meaningful competition effects.
Number of fixated distractors
We first examined the number of distractors that were fixated per trial before either finding a target or terminating search. Recall that one hypothesis states that phonological competition may affect attentional guidance, such that participants might examine more irrelevant distractors when they all have similar names. If this were correct, we would expect competition to increase distractor fixations.
Experiment 3a, image cues:
In the target-present trials of Experiment 3a, we observed main effects of Load and Set Size, but no effect of Competition or Load × Competition interaction. In the target-absent trials, we observed main effects of Load and Set Size and a Load × Set Size interaction, but again, no Competition effect or Load × Competition interaction.
Experiment 3b, verbal cues:
In the target-present trials of Experiment 3b, there were again main effects of Load and Set Size, but no main effect of Competition or Load × Competition interaction. There was a Load × Competition × Set Size interaction, F(2, 17) = 7.17, p = .006, ηp2 = .46; however, any difference in the number of fixated distractors was very small: Under low target load, the maximum difference between competition and control trials at any given set size was .79 distractor fixations. Under high target load, this maximum difference was 1.36 fixations. In the target-absent trials, we observed main effects of Load and Set Size and a Load × Set Size interaction, but no Competition effect or any interaction involving Competition. Taking Experiments 3a and 3b together, there was little evidence that phonological competition had any appreciable effect on the number of fixated distractors.
Distractor refixations per trial
We next examined the number of times that distractors were fixated and then revisited per trial (prior to finding a target in target-present trials), again guided by the hypothesis that phonological competition may affect attentional guidance. If this hypothesis were correct, we might expect to find more refixations in trials with competition, relative to control conditions.
Experiment 3a, image cues:
In the target-present trials of Experiment 3a, there was an effect of Load and a marginal effect of Competition, F(1, 19) = 3.54, p = .075, ηp2 = .16, but no Load × Competition interaction. In the target-absent trials, we observed main effects of Load and Set Size, but no main effect of Competition. We did, however, observe a Load × Set Size and a Load × Competition interaction, F(1, 19) = 5.73, p = .027, ηp2 = .23, although any difference in the mean count of distractor refixations was very small: Under low load, there were 1.54 and 1.29 refixations in control and competition conditions, respectively, t(19) = 2.61, p = .017, d = .58. Under high load, there were 5.67 and 5.84 refixations in the control and competition conditions, respectively, and this difference was not significant, t(19) = 0.97, p = .344.
Experiment 3b, verbal cues:
In the target-present trials of Experiment 3b, there were main effects of Load and Competition, F(1, 18) = 19.42, p < .001, ηp2 = .52, and a Load × Competition interaction, F(1, 18) = 9.77, p = .006, ηp2 = .35. Again, however, the effect was very small: Under low load, there were .42 and .56 refixations in control and competition conditions, respectively, t(18) = 1.62, p = .123. Under high load, there were 1.62 and 2.44 refixations, respectively, t(18) = 4.07, p = .001, d = .93. In the target-absent trials of Experiment 3b, there were main effects of Load, Set Size, and Competition, F(1, 18) = 4.65, p = .045, ηp2 = .21. There was also a Load × Set Size interaction and a marginal Competition × Set Size interaction, F(2, 17) = 2.81, p = .088, ηp2 = .25. Despite the main effect of Competition, however, the difference between control and competition trials was again negligible, with 4.48 and 4.98 refixations, respectively. Considering Experiments 3a and 3b together, there were no indications that phonological competition appreciably affected the likelihood of revisiting previously fixated distractors.
Mean distractor fixation durations
Experiment 3a, image cues:
In the target-present trials of Experiment 3a, there was a main effect of Load, but there were no other reliable main effects. The Load × Competition interaction was not significant, but the Load × Competition × Set Size interaction was reliable. In the target-absent trials, there were main effects of Load and Competition, but not Set Size. There was a reliable Load × Competition interaction, and no other interactions were reliable. As shown in the upper left panel of Fig. 7, in low-load trials, mean distractor fixation durations were equivalent in control and competition conditions (184 ms and 183 ms, respectively), t(19) = 0.54, p = .597. In high-load trials, however, distractor fixation times were slightly longer in competition trials, relative to control trials (218 ms and 210 ms, respectively), t(19) = 2.53, p = .020, d = .57.
Experiment 3b, verbal cues:
In target-present trials of Experiment 3b, there was a reliable main effect of Load and a marginal effect of Competition, but the Load × Competition interaction was not significant. In the target-absent trials, there was a marginal main effect of Load and a marginal Load × Set Size interaction, but the Load × Competition interaction was again not significant. However, we did observe a reliable Competition effect: As shown in the lower-left panel of Fig. 7, distractor fixations were longer in trials with phonological competition, relative to control trials (215 ms and 205 ms, respectively). Thus, in Experiments 3a and 3b, we found that people examined distractors slightly longer when phonological overlap existed between targets and distractors.
Summed distractor fixation durations
Given the observation that individual distractors were fixated longer in competition trials, relative to control trials, we next verified that summed distractor fixations contributed to the competition effects found in overall search RTs. As with the mean fixation duration measure, these summed durations reflect only first fixations to distractors (excluding refixations) and only distractors fixated pre-target in target-present trials.
Because the results were straightforward, we report only the key effects of interest. In Experiment 3a, there were no main effects of Competition, but we found a marginal Load × Competition interaction in target-present trials, F(1, 19) = 3.25, p = .087, ηp2 = .15, and the interaction was significant in target-absent trials, F(1, 19) = 8.01, p = .011, ηp2 = .30: In target-absent trials, under low load, these durations in control and competition trials did not reliably differ (1,540 ms and 1,516 ms, respectively), t(19) = 0.80, p = .435. Under high load, the difference was marginal (2,482 ms and 2,562 ms, respectively), t(19) = 2.06, p = .054, d = .46.
In Experiment 3b, there was no effect of Competition in target-present trials, but there was a marginal Load × Competition interaction, F(1, 18) = 3.22, p = .090, ηp2 = .16. In target-absent trials, there was no Load × Competition interaction, but there was a main effect of Competition, F(1, 18) = 11.76, p = .003, ηp2 = .40, with longer summed fixation durations in the competition (2,260 ms) compared to the control condition (2,131 ms). In all cases, the summed fixation duration results closely mirrored search RTs, with weak competition effects with image target cues in Experiment 3a and stronger effects with verbal cues in Experiment 3b.
Experiment 3 was conducted to replicate and extend Experiment 1, with added eye-movement data. As in Experiment 1, search RTs revealed phonological competition effects, particularly when targets were specified via verbal cues. To better understand the phonological competition effect, we examined eye-movement behavior, including (1) counts of fixated distractors, (2) distractor refixations, (3) average distractor fixation durations, (4) total distractor fixation durations, and (5) decision times in target-present trials. These analyses revealed negligible differences in the measures related to attentional guidance: Patterns of fixating and revisiting distractor objects were nearly identical in competition and control trials.
Figure: Eye-movement results in Experiment 3b (verbal target cues). Panels show (A) distractors fixated, (B) mean distractor fixation duration (ms), A × B (ms), decision time (ms), and search time (ms), for competition and control conditions under low and high target load.
These findings comport with those of Meyer et al. (2007), who also found that people take longer to disengage attention from competitors, relative to unrelated distractors. The difference in gaze durations to control and critical items in Meyer et al. (2007) was 14 ms (138 ms and 152 ms, respectively), comparable to the ~10-ms interference effect that we observed in Experiment 3. However, unlike the previous findings of Meyer et al. (2007; also Görges et al., 2013), we found no differences in search guidance attributable to phonological competition. As previously mentioned, one possible reason for this discrepancy is task differences: Meyer et al. (2007) and Görges et al. (2013) presented simple displays with four items in fixed locations. This consistent spatial arrangement potentially enabled participants to covertly attend to objects and retrieve their names prior to fixation, leading to errors in guidance—that is, more first fixations to distractors sharing verbal similarity with the target. In contrast, objects in our displays were widely (and randomly) distributed. The unpredictable and wide distribution of our displays, combined with the speeded nature of the task, likely prevented activation of distractor names via covert attention and the resulting interference in guidance. Furthermore, the fixation and refixation counts that we used to estimate guidance differed from the first fixation probabilities used in previous studies. The first fixation measure was an appropriate index of search guidance in prior studies, which presented competitors and control items in the same displays. This measure would have been inappropriate for our displays, which presented either all competitors or all control objects. In Experiment 3, global measures of guidance (total fixations and refixations) were more fitting, as they reflected overall attentional guidance in the presence of competitor versus non-competitor objects.
Clearly, these measures showed no appreciable changes in global guidance due to phonological interference. In contrast, fixation times showed consistent increases for phonologically similar versus control distractors, indicating that subtle delays in distractor rejection were the likely source of the phonological competition effect.
In summary, the eye-tracking results in Experiment 3 showed that phonological competition generally slows down object processing, rather than changing search patterns. When competition is present, there are increases in distractor rejection times and in target appreciation times, leading to a substantial cumulative effect in search RTs. Both effects indicate that perceptual decisions are impaired when target and distractor names are similar and salient.
In three experiments, we investigated the impact of phonological representations in an ostensibly visual task. A well-known finding in visual search is that background items impair search when they resemble search targets (Duncan & Humphreys, 1989; Treisman & Gelade, 1980; Wolfe et al., 1989). Our goal was to determine whether similar interference occurs when distractors’ names share phonological overlap with search targets. The answer was yes, but only when the task was challenging. When looking for single targets, people were relatively unaffected by phonological competition, but the effect emerged when three potential targets were simultaneously considered. Also, when people were provided with visual target templates, they were relatively unaffected by phonological competition, but the effect increased when targets were specified using verbal labels. A similar pattern was found in the decision-making process of distractor rejection (Experiment 2). Eye-movement analyses (Experiment 3) showed that phonological interference slowed perceptual decisions for both targets and distractors.
As noted earlier, our results suggest two theoretical points. First, as specified in many theories (e.g., Duncan & Humphreys, 1989; Wolfe, 2007), visual search involves “resonance-seeking” between targets held in WM and objects in the environment (see also Hout, Walenchok, Goldinger, & Wolfe, 2015). When a person has a search target in mind, attention is drawn toward objects that resemble the target, and perceptual processes evaluate those objects as potential matches. Second, object images are high-dimensional stimuli: Although visual features are clearly primary, objects also have “hidden” dimensions, such as names and various semantic features. This study suggests that when simple visual matching is precluded, object names play a role in perceptual evaluation. This was especially true when search processes were challenged by requiring people to search for multiple potential targets simultaneously.
Why does phonological competition increase during multiple-target search? We suggest several potential accounts, none of which are mutually exclusive. The first relates to processing time: Multiple-target search is cognitively demanding, relative to single-target search. With each distractor fixated, one must compare this distractor to several targets held in memory. It is easy to conceive of object perception as a cascaded process (McClelland, 1979), wherein visual features are available first, followed by activation of names and other information. By such a view, object names would become more prominent as a natural side-effect of longer fixations in multiple-target search. This account nicely accords with the present finding that object names had little effect on visual guidance, but selectively affected item processing times (although, as mentioned, prior studies have shown that covert activation of object names can affect guidance; Görges et al., 2013; Meyer et al., 2007).
A second, related hypothesis is that people struggle to ignore irrelevant information when WM is taxed. For example, Hout and Goldinger (2010, 2012) found that observers are more likely to incidentally learn distractor objects when searching for multiple targets. Their findings suggest that, under WM load, people are less able to block out distracting information, such as object names in the present study. Increased memory load can be conceptualized as attention directed inward (Cowan, 2005; Kiyonaga & Egner, 2013; Oberauer, 2009), reducing capacity to direct attention outward. For example, Conway, Cowan, and Bunting (2001) found increased susceptibility to the cocktail party effect (Moray, 1959) for participants with low WM spans (Daneman & Carpenter, 1980) relative to high-span participants. In our experiments, rather than classify individuals according to span, we selectively reduced WM capacity by imposing loads.
In this study, we also found that specifying targets via verbal labels increased distractor processing. The minimal phonological interference we observed with visual cues suggests that people are able to maintain detailed target templates in memory during search (Luck & Vogel, 1997; Vogel, Woodman, & Luck, 2001). More important, they can likely reject distractors using superficial visual analysis. However, when targets are verbally specified, every distractor object requires more careful analysis—an object cannot be rejected until it has been identified. When all target and distractor names share a common onset, there are repeated opportunities for distractors to send false-positive signals to the perceptual system. Indeed, Huettig and McQueen (2007) found that various object attributes (e.g., visual, semantic, phonological) can become more or less activated depending on task characteristics. Using the visual world paradigm, they instructed participants to freely view displays containing four objects while listening to spoken sentences. Critically, the sentences contained a specific target word that shared visual, semantic, or phonological information with three of the objects in the display. In one experiment, they found that the probability of fixating each type of competitor was cascaded, with fixations to phonological competitors occurring before fixations to shape and semantic competitors. Although the present experiments did not involve spoken targets, using verbal target cues increased the salience of distractor names, creating interference. One possible mechanism for this interference is that verbal cues may require people to rehearse the target names, particularly when there are multiple potential targets. The resulting inner speech may have functioned like the spoken targets in Huettig and McQueen (2007), resulting in competition when the names of “spoken” targets and competing objects overlapped (see also Görges et al., 2013).
The phonological interference observed in this study is reminiscent of cohort competition that can occur during speech perception (Grosjean, 1980; Tyler, 1984). Spoken word perception often requires listeners to resolve ambiguity because many words share similar phonetic onsets. Competition among words particularly arises when early word segments are shared by many potential candidate words in the lexicon (the “cohort”). For instance, when hearing the word beaker, correct selection is unlikely during early perception of the utterance because there are numerous lexical candidates that also contain an initial /bi/ phoneme (e.g., bean, beast, beanie). According to prominent models of speech perception (e.g., COHORT; Marslen-Wilson, 1987, Marslen-Wilson & Welsh, 1978; TRACE; McClelland & Elman, 1986), the unfolding speech signal is disambiguated in time as the candidate pool progressively shrinks, until the correct word is determined—at the moment the /k/ in beaker is perceived, the pool shrinks to fewer words (e.g., become, beak, beaker) until beaker is finally selected (in conjunction with syntactic, contextual, and other constraints).
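The candidate-pool narrowing described above can be sketched with a toy example. The mini-lexicon and the use of letters as stand-ins for phonemes are simplifications for illustration, not an implementation of COHORT or TRACE.

```python
# Toy illustration of cohort narrowing: as each segment of "beaker"
# arrives, the candidate pool is filtered to words sharing the prefix.
# Letters stand in for phonemes; the mini-lexicon is hypothetical.

LEXICON = ["beaker", "beast", "bean", "beanie", "beak", "become", "bee", "onion"]

def cohort_over_time(spoken_word, lexicon=LEXICON):
    """Return the candidate pool after each successive segment."""
    pools = []
    for i in range(1, len(spoken_word) + 1):
        prefix = spoken_word[:i]
        pools.append([w for w in lexicon if w.startswith(prefix)])
    return pools
```

Early segments leave many onset-sharing candidates in the pool; later segments (e.g., the "k" of beaker) progressively prune it until only the target word remains, mirroring the disambiguation-over-time account in the text.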
Supporting this cohort-like account within the visual domain, Allopenna et al. (1998) found that fixations simulated using one such model (TRACE; McClelland & Elman, 1986) closely matched data from their visual world paradigm experiment: Both participant and simulated fixations to cohort competitor objects increased as phonemes of spoken target and competitor names overlapped in time, with decreasing competitor fixations as the ongoing speech signal narrowed the pool of potential targets. In the present experiments, similar interference was evident in the process of perceptual evaluation, rather than guidance: Cohort overlap elicited longer fixations before attentional disengagement and the eyes moving off to consider another object on-screen. We suggest that, because the object names were all cohort competitors for the target-object names (in this case, all starting with /bi/), it created momentary resonances, subtle hesitations (~10 ms) per object that accumulated within trials, as name overlap incrementally increased the viability of every distractor object as a potential target.
In conclusion, our findings show that when people search for a single object, they are relatively immune to potential phonological competition from distractors. When looking for multiple simultaneous objects, however, the names of background objects have greater capacity to interfere with search, especially when targets are specified verbally. This interference does not reflect diminished search guidance, but a reduced ability to reject and disengage attention from background distractors. Taken together, the results suggest that when people look for things, they engage in implicit naming as they inspect various objects. Such name activation is likely useful in the vast majority of situations, but can cause interference when too many potential targets “sound like” the intended target.
Data filtering algorithms employed by the Data Viewer software required exclusion of a certain proportion of trials. Specifically, because of the minor spatial eye-tracking error inherent in such experiments, all fixations that fell outside the AOI of any object were placed within the AOI of the object nearest to fixation. However, because the first fixation of each trial began in the center of the display (outside of any object’s AOI), this fixation was also placed within the nearest object’s AOI. Therefore, we excluded from analysis any target-present trial in which the target was the first item fixated, because one cannot definitively attribute such fixations to participants or the algorithm (i.e., decision-time analyses could have reflected these “false fixations”). Target-present trials in which the target was not fixated were also excluded from all analyses. In target-present trials, 10 % of trials were excluded in Experiment 3a because of data filtering, for a total of 20 % of target-present trials excluded (including incorrect trials). Three percent of target-absent trials were excluded from analyses in Experiment 3a because of incorrect responses. In Experiment 3b, 9 % of target-present trials were excluded due to eye data filtering, for a total of 21 % of target-present trials excluded (including incorrect trials). Six percent of target-absent trials were excluded from Experiment 3b due to incorrect responses.
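The reassignment and exclusion logic described in this note can be sketched as follows. The Data Viewer internals are not public, so the nearest-center distance metric and all names here are assumptions for illustration.

```python
# Hypothetical sketch of nearest-AOI reassignment and the trial-exclusion
# rules described above (not the Data Viewer implementation).

import math

def assign_to_nearest_aoi(fix_xy, aoi_centers):
    """Snap an out-of-AOI fixation to the AOI whose center is closest."""
    return min(aoi_centers, key=lambda obj: math.dist(fix_xy, aoi_centers[obj]))

def exclude_trial(fixation_objects, target_id, target_present):
    """Exclude target-present trials in which the target was the first
    object 'fixated' (possibly a snapped central start fixation) or was
    never fixated at all."""
    if not target_present:
        return False
    if not fixation_objects or fixation_objects[0] == target_id:
        return True   # first "fixation" may be the reassigned start point
    return target_id not in fixation_objects
```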
This work was supported by NIH Grant R01 HD075800-02 to SDG. We thank Jeffrey Beirow, Kayla Block, Feng Min Chen, Raul Garcia, James Harkins, Ga Young Kim, Jorin Larsen, Jenalee Remy, and Gia Veloria for assistance in data collection. We also thank Tamaryn Menneer and one anonymous reviewer for their helpful comments.
- Huettig, F., & Altmann, G. T. M. (2004). The online processing of ambiguous and unambiguous words in context: Evidence from head-mounted eye-tracking. In M. Carreiras & C. Clifton (Eds.), The on-line study of sentence comprehension: Eyetracking, ERP and beyond (pp. 187–207). New York: Psychology Press.
- Keppel, G., & Wickens, T. D. (2004). Design and analysis: A researcher’s handbook (4th ed.). Upper Saddle River: Prentice Hall.
- Schneider, W., Eschman, A., & Zuccolotto, A. (2002). E-Prime user’s guide. Pittsburgh: Psychology Software Tools.