Implicit object naming in visual search: Evidence from phonological competition

Walenchok, Stephen C.; Hout, Michael C.; Goldinger, Stephen D.

doi:10.3758/s13414-016-1184-6

Published: 16 August 2016

Volume 78, pages 2633–2654, (2016)

Attention, Perception, & Psychophysics

Stephen C. Walenchok¹,
Michael C. Hout² &
Stephen D. Goldinger¹

Abstract

During visual search, people are distracted by objects that visually resemble search targets; search is impaired when targets and distractors share overlapping features. In this study, we examined whether a nonvisual form of similarity, overlapping object names, can also affect search performance. In three experiments, people searched for images of real-world objects (e.g., a beetle) among items whose names either all shared the same phonological onset (/bi/), or were phonologically varied. Participants either searched for 1 or 3 potential targets per trial, with search targets designated either visually or verbally. We examined standard visual search (Experiments 1 and 3) and a self-paced serial search task wherein participants manually rejected each distractor (Experiment 2). We hypothesized that people would maintain visual templates when searching for single targets, but would rely more on object names when searching for multiple items and when targets were verbally cued. This reliance on target names would make performance susceptible to interference from similar-sounding distractors. Experiments 1 and 2 showed the predicted interference effect in conditions with high memory load and verbal cues. In Experiment 3, eye-movement results showed that phonological interference resulted from small increases in dwell time to all distractors. The results suggest that distractor names are implicitly activated during search, slowing attention disengagement when targets and distractors share similar names.

Imagine that you are cooking dinner and have set your ingredients and utensils on the counter, within easy reach. With a quick glance, it would be trivially easy to spot carrots on the counter among onions, potatoes, and garlic. Finding an onion, however, would likely require more careful discrimination: Background items create interference when they visually resemble search targets (Duncan & Humphreys, 1989; Treisman & Gelade, 1980; Wolfe, 2007; Wolfe, Cave, & Franzel, 1989). In this study, we asked whether such interference might occur when targets and distractors share similar names, in the absence of visual similarity. For example, is searching for a beaker more difficult when the background items also have names with /bi/ onsets (e.g., beast, bean, and bee), relative to backgrounds with heterogeneous names? In three experiments, we investigated the conditions under which such phonological interference might occur.

Previous research has shown that nonvisual attributes of search items can influence the deployment of attention. For example, Moores, Laiti, and Chelazzi (2003) found that when distractor items were semantically related to targets, they attracted attention during visual search (see also Godwin, Hout, & Menneer, 2014). In their experiment, participants were given verbal target cues before the brief presentation of search arrays. In post-search recall tests, participants better remembered distractors that were semantically related to target items (e.g., a lock if the target was a key) relative to unrelated distractors, and eye movements revealed more first fixations to target-related items, relative to unrelated items. Similarly, Huettig and Altmann (2005) found that semantic competitors can capture attention, using the visual world paradigm. Here, participants were instructed to freely view displays containing four items while listening to sentences containing a target word. People were more likely to fixate items related to the target word (such as a trumpet, given the target word piano), relative to unrelated distractors. Yee and Sedivy (2006) found that this semantic attention capture extends to distractors that are semantically related to phonological-onset competitors of the spoken target word. For instance, if the spoken target was log, participants tended to fixate a key because log phonologically overlaps with lock (see also Dahan & Tanenhaus, 2005; Huettig & Altmann, 2004; Yee, Overton, & Thompson-Schill, 2009).

Several previous studies have also shown that phonological competition between targets and distractors can impair search efficiency in the absence of semantic competition. Using the visual world paradigm, Allopenna, Magnuson, and Tanenhaus (1998) presented participants with displays that contained distractors whose names partially overlapped with spoken targets, as either phonological onset or rhyme competitors (e.g., beetle or speaker, respectively, for the target word beaker). Eye-movement analyses showed that participants fixated onset and rhyme competitors in close alignment with phonological target segments as the spoken target unfolded in time (see also Huettig & McQueen, 2007; Righi, Blumstein, Mertus, & Worden, 2010). Meyer, Belke, Telling, and Humphreys (2007) obtained similar results with homophone competitors. Participants were initially familiarized with image names, then, in each experimental trial, were given visual target cues and made speeded target present–absent judgments to search displays. Search response times (RTs) were slower when homophone competitors were present (e.g., an image of the animal bat, given the search target baseball bat), and eye-movement data showed more frequent first fixations and longer fixations to homophone competitors relative to unrelated distractors. These results suggest that phonological competitors disrupt both visual guidance and decision-making processes, that is, distractor rejection. Görges, Oppermann, Jescheniak, and Schriefers (2013) found similar phonological competition effects in a paradigm identical to that from Meyer et al. (2007), except that participants were presented with either visual or verbal target cues before search. Although target detection was faster overall with visual cues, the degrees of phonological interference were equivalent in both conditions. Furthermore, this verbal interference only occurred when participants were familiarized with the names of the stimuli before the experiment, suggesting that people use both visual and verbal target information to varying degrees when searching, depending on the salience of these object properties. Taken together, these results suggest that participants may implicitly activate phonological dimensions of visual stimuli, at least under certain conditions.

Regarding the conditions that encourage such phonological activation, research has suggested that implicit object naming may increase as a function of memory demands. Zelinsky and Murphy (2000) presented participants with two different tasks, a recognition task with high memory demand, and a visual search task with low memory demand. In the recognition task, participants were initially shown multiple common objects or faces whose names contained one or several syllables. Afterward, they had to judge whether singularly presented objects were old or new. Analyses of eye-movements during encoding revealed more fixations and longer gaze durations on objects whose names contained multiple syllables, compared to single-syllable items. A subsequent analysis showed a high correlation between the time required to vocalize object names and gaze duration on those objects, suggesting that participants were implicitly naming stimuli during encoding. No effect of syllable length was found in visual search, which had comparatively smaller encoding requirements (i.e., searching for a single target).

In this study, we examined two factors that we considered likely to modulate degrees of phonological coding of search targets. First, we manipulated the number of potential targets per trial, increasing memory demand for multiple-target search (Hout & Goldinger, 2010, 2012, 2015; Zelinsky & Murphy, 2000). Second, we presented search targets either as images or as verbal labels. By manipulating both of these factors in three experiments, we examined whether the combination of memory demands (e.g., Zelinsky & Murphy, 2000) and verbal labels (e.g., Görges et al., 2013) encouraged implicit naming of search targets and thereby elicited phonological interference from background, distractor items. In contrast to previous visual search studies that have examined phonological competition with simple four-item displays (e.g., Görges et al., 2013; Meyer et al., 2007), we tested using a more traditional search task (in Experiments 1 and 3), with many items scattered in unpredictable positions around the display. This paradigm enabled us to further examine whether phonological competitors inhibit object identification and visual guidance processes when people perform challenging visual search.

Across experiments, participants were cued to search for either one or three potential target objects per trial, one of which could appear among distractor objects in a subsequent search array. In the critical condition, target and distractor objects’ names shared a phonological onset (e.g., beaker, beast, and beanie). This was compared to several control conditions (as described below). We also varied whether participants were cued using target images or names, anticipating that name cues would reduce immediate visual matching and would make target-object names more active in working memory (WM). Without having exact target representations in visual WM, we expected participants to process distractors to a greater degree, in order to reject them. Such prolonged processing would theoretically allow more time for object names to become activated, potentially creating phonological interference. However, the memory requirements of single-target search are low, relative to simultaneously searching for three potential targets. With only one target in mind, people may generate visual representations from verbal cues, using them as templates to guide search (Schmidt & Zelinsky, 2009), thereby minimizing potential phonological interference. Taking these manipulations together, we expected evidence for phonological competition to mainly emerge in the condition combining verbal cues and multiple-target search.

Experiment 1 involved standard visual search, with participants searching spatial arrays of objects for targets, confirming their presence or absence. Along the way, viewed distractors must be rejected as nontargets. Although search RTs provide a fairly coarse measure of distractor rejection time, they can indicate whether global differences arise from phonological competition. To more precisely examine distractor rejection, Experiment 2 presented each search item serially (self-paced RSVP), requiring a series of overt distractor rejections per trial. In Experiment 3, we again used standard visual search and recorded eye movements to better assess distractor processing without requiring overt decisions to each.

Experiment 1

We examined phonological competition effects in a standard visual search task. Participants searched for one (low load) or three (high load) targets among pseudorandomly distributed distractor objects. Participants were cued with target images in Experiment 1a and target names in Experiment 1b.

Method

Participants

Twenty and 22 undergraduate students from Arizona State University participated in Experiments 1a and 1b, respectively, for partial course credit. All participants were native English speakers and self-reported normal or corrected-to-normal vision. All procedures were approved by the Arizona State University Institutional Review Board.

Design

All variables in each experiment were manipulated within-subjects, including Target Presence (present or absent), Set Size (12, 16, or 20 objects per search display), Load (one or three potential targets), and Competition (phonological competition or control). In all experiments, the variable Competition included four levels. First was the critical phonological competition condition, wherein potential targets and all distractors shared an initial /bi/ consonant-vowel (CV) onset in a trial (see Table 1). We also included three separate control conditions, with targets and distractors combined in all possible ways: (1) /bi/ targets with varied distractors, (2) varied targets with /bi/ distractors, and (3) varied targets and distractors. Each participant was presented with an equal number of trials within each combination of conditions. For example, there were four trials total in which a target was present at Set Size 12 with a single potential target among a background of phonological competitors. We included all three control conditions to ensure that any effects truly reflected competition between targets and distractors in the critical condition rather than idiosyncratic properties of any particular items. Across experiments, the control conditions produced nearly identical results, often appearing as overlapping lines when plotted. Separate ANOVAs were conducted for only the control conditions: In the few instances wherein results in these conditions reliably differed, they produced no systematic pattern across experiments (full results from each control condition in all experiments are provided in Appendix B). For clarity, we collapsed across the separate control conditions for the main analyses; this averaging had no impact on the reported results. The key dependent variable was search RT in correct trials.
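The factorial structure described above can be checked with a short enumeration. This is only an illustrative sketch: the condition labels and the function name are hypothetical, and the four trials per cell are taken from the example given in the Design text. Under those assumptions, the fully crossed design yields 48 cells and 192 trials, which matches the number of experimental trials reported in the Procedure.

```python
from itertools import product

# Factor levels as described in the Design section (labels are hypothetical).
TARGET_PRESENCE = ("present", "absent")
SET_SIZE = (12, 16, 20)
LOAD = (1, 3)
COMPETITION = (
    "bi_targets_bi_distractors",          # critical phonological competition
    "bi_targets_varied_distractors",      # control 1
    "varied_targets_bi_distractors",      # control 2
    "varied_targets_varied_distractors",  # control 3
)
TRIALS_PER_CELL = 4  # taken from the worked example in the text


def build_trial_list():
    """Return one tuple per trial for the fully crossed within-subjects design."""
    cells = product(TARGET_PRESENCE, SET_SIZE, LOAD, COMPETITION)
    return [cell for cell in cells for _ in range(TRIALS_PER_CELL)]


trials = build_trial_list()
print(len(set(trials)), "cells,", len(trials), "trials")
```

In an actual session the list would be shuffled, since low- and high-load trials were randomly intermixed.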

Table 1 Stimuli for all experiments

Stimuli

The stimuli were images of real-world objects, converted to grayscale and resized to 2.9° × 2.9° visual angle (centered) from a viewing distance of 60 cm. Stimuli were sampled from a list of 23 phonological competitors and 23 control items (see Table 1). The list of phonological competitors contained items whose names shared a /bi/ CV onset (e.g., beaker, beast, beanie) whereas the control list contained items with varied onsets (e.g., snail, pretzel, turtle). Target and distractor items were randomly sampled from these lists per trial, appropriate to the competition or control condition for that given trial.
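For readers who want the physical stimulus size, the standard visual-angle relation, size = 2 × distance × tan(angle / 2), can be applied to the values above. The sketch below is a convenience calculation, not part of the reported method. The pixel conversion additionally assumes square pixels and that the viewable diagonal equals the nominal 18.5 in. at 1366 × 768 given in the Apparatus section; the function names are hypothetical.

```python
import math


def visual_angle_to_cm(angle_deg, distance_cm):
    """Physical extent that subtends angle_deg at distance_cm (centered stimulus)."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)


def pixels_per_cm(diagonal_in, res_x, res_y):
    """Pixel density, assuming square pixels and a viewable diagonal of diagonal_in."""
    diagonal_px = math.hypot(res_x, res_y)
    return diagonal_px / (diagonal_in * 2.54)


size_cm = visual_angle_to_cm(2.9, 60)
size_px = size_cm * pixels_per_cm(18.5, 1366, 768)
print(f"{size_cm:.2f} cm per side, roughly {size_px:.0f} px")
```

Under these assumptions each image is roughly 3 cm on a side, on the order of 100 pixels.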

Apparatus

All data were collected on up to 12 computers simultaneously, with identical hardware and software profiles, consisting of Dell Optiplex 380 PCs at 3.06 GHz and 3.21 GB RAM, in 1366 × 768 resolution on Dell E1912H 18.5-in. monitors at a 60 Hz refresh rate, with the display controlled by an Intel G41 Express chipset, each running on Windows XP. All stimuli were presented using E-Prime 1.2 software (Schneider, Eschman, & Zuccolotto, 2002).

Procedure

Before the visual search task, participants were familiarized with the names of all stimuli to reduce name ambiguity. During the familiarization task, one item was centrally presented per trial, along with one plausible name and one foil (i.e., a concrete noun unrelated to the object). Participants chose the correct name for each object by pressing f or j on the keyboard. The locations of the plausible and foil names were randomized, and participants received accuracy feedback in every trial. A minimum accuracy of 85 % was required to proceed to the main experiment; no participants fell below this criterion.

The search task included eight practice trials and 192 experimental trials, with a 2-minute break midway through the experiment. Figure 1 shows a schematic progression of trial events. In each trial, participants were cued with the possible target(s), followed by a 500-ms fixation cross, and then the search display. A search array algorithm was used to create displays with pseudorandom organization (Hout & Goldinger, 2010, 2012, 2015): Each display quadrant was divided into nine equal-sized (but invisible) cells. Within each quadrant, objects were randomly placed within these cells, with the constraint that equal numbers of objects occupied each quadrant. Each object’s location was then randomly jittered within each cell, giving the appearance of truly random organization. Participants were instructed to determine target presence or absence as quickly and as accurately as possible, terminating search by pressing the space bar. They were then shown a subsequent screen in which they confirmed target presence or absence by pressing the f or j keys, respectively, and were given accuracy feedback. Correct feedback consisted of a green check mark presented for 1,000 ms, and incorrect feedback consisted of a red X presented for 2,000 ms. In low-load trials, a single target was initially presented, whereas in high-load trials, three potential targets were initially presented (only one could appear in the search display, and instructions made participants aware of this). Low- and high-load trials were randomly intermixed. Image target cues were used in Experiment 1a, and name cues were used in Experiment 1b (see Fig. 2).
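The array-generation procedure described above can be sketched as follows. This is a minimal illustration, not the authors' E-Prime implementation: the function name, the 100-pixel item size, the 3 × 3 cell layout within each quadrant, and the rule that jitter keeps each item fully inside its cell are all assumptions. The display dimensions are taken from the Apparatus section.

```python
import random


def make_search_array(set_size, width=1366, height=768, item_px=100, rng=None):
    """Return (x, y) item centers with equal numbers of items per quadrant.

    Each quadrant is split into a 3 x 3 grid of invisible cells (assumed
    layout); items occupy distinct cells and are jittered within them.
    """
    if set_size % 4 != 0:
        raise ValueError("set_size must be divisible by 4 (one share per quadrant)")
    rng = rng or random.Random()
    per_quadrant = set_size // 4
    quad_w, quad_h = width / 2, height / 2
    cell_w, cell_h = quad_w / 3, quad_h / 3
    half = item_px / 2  # hypothetical item size; keeps items inside their cells
    positions = []
    for qx in (0, 1):
        for qy in (0, 1):
            # Choose distinct cells within this quadrant.
            for cell in rng.sample(range(9), per_quadrant):
                col, row = cell % 3, cell // 3
                left = qx * quad_w + col * cell_w
                top = qy * quad_h + row * cell_h
                # Jitter the center within the cell.
                x = rng.uniform(left + half, left + cell_w - half)
                y = rng.uniform(top + half, top + cell_h - half)
                positions.append((x, y))
    return positions


array = make_search_array(16, rng=random.Random(1))
print(len(array), "items placed")
```

Set sizes of 12, 16, and 20 place three, four, and five items in each quadrant, so the equal-occupancy constraint is always satisfiable with nine cells per quadrant.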

Results

Average search accuracy in Experiment 1a was 93 %. Three participants were excluded from analysis in Experiment 1b, two because of technical malfunctions and one because of missing data (i.e., no correct trials in one condition). Average accuracy for the remaining 19 participants in Experiment 1b was 90 %. All data were analyzed using repeated-measures ANOVAs and two-tailed t-tests for pairwise comparisons. Where applicable, multivariate tests are reported to account for violations of the sphericity assumption (Keppel & Wickens, 2004). Several analyses were conducted per experiment, first including all variables, then separately examining target-absent and target-present trials. Given that people are roughly twice as fast to terminate search in target-present trials (Treisman & Gelade, 1980), target-absent trials allow for greater inspection of distractor items, and distractor interference in these trials was our primary interest (see also Görges et al., 2013; Moores et al., 2003).

Experiment 1a: Image cues

The results for Experiment 1a are shown in the upper panels of Fig. 3. For clarity, the ANOVA results are provided in Appendix A (see Table A1), and we focus on key findings in the main text. In the full analysis, there were main effects of Load, Target Presence, and Set Size, all in their typical directions: Participants were slower when searching for multiple targets, and when set sizes were larger, and when targets were absent. (Because these basic findings are well-established in visual search, we mention them only briefly in all remaining analyses.) We also observed a main effect of Competition, shown in Fig. 3 by slightly separated lines, with slower responses in competition trials, relative to control. The finding of key interest was the Load × Competition interaction: In low-load trials, no competition effect was observed (with mean RTs of 1,865 ms and 1,865 ms in control and competition trials, respectively), t(19) < 0.01, p = .999. But, in high-load trials, mean RTs were faster in control trials (3,615 ms), relative to competition trials (3,898 ms), t(19) = 2.98, p = .008, d = .67.

Again, because our main interest concerned distractor rejection, we separately examined target-present and target-absent trials, anticipating stronger effects in target-absent trials (because they entail exhaustive search). In target-present trials, there were again main effects of Load, Competition, and Set Size. As the upper right panel of Fig. 3 shows, search RTs were slower in competition trials than control trials (1,956 ms and 1,786 ms, respectively). There was a marginal Load × Competition interaction (p = .077) in the same direction as in the overall ANOVA. In the target-absent trials, there were again main effects of Load, Competition, and Set Size. The Competition effect reflected slower RTs in competition trials (3,806 ms), relative to control trials (3,694 ms). The key Load × Competition interaction was again marginal (p = .067). Taking these results together, Experiment 1a provided weak evidence for the predicted result, as competition effects emerged when participants performed multiple-target search. We expected these effects to be stronger in Experiment 1b, which involved verbal target cues.

Experiment 1b: Verbal cues

The results for Experiment 1b are shown in the lower panels of Fig. 3 and in Appendix A (Table A2). In the full analysis, there were main effects of Load, Target Presence, and Set Size, again, all in their typical directions. We also observed a main effect of Competition, with slower RTs in competition trials relative to control trials. The finding of key interest was the Load × Competition interaction: In low-load trials, no competition effect was observed (with mean RTs of 2,256 ms and 2,257 ms in control and competition trials, respectively), t(18) = 0.02, p = .988. But in high-load trials, mean RTs were faster in control trials (4,282 ms) relative to competition trials (4,787 ms), t(18) = 4.37, p < .001, d = 1.00. There were two key interactions. The Load × Competition interaction verified that RTs were slowest in multiple-target, competition trials. The Load × Competition × Target Presence interaction suggests that this Load × Competition interaction mainly emerged in target-absent trials.

In the target-present trials, there were main effects of Load and Set Size, and a marginal effect of Competition. The critical Load × Competition interaction was not reliable (see lower right panel of Fig. 3). In the target-absent trials, there were main effects of Load, Set Size, and Competition. The key Load × Competition interaction showed that during multiple-target search, RTs were slowed by phonological competitors. Under low target load, mean RTs to control and competition trials were 2,942 ms and 2,916 ms, respectively, t(18) = 0.39, p = .701. Under high load, these values diverged, with mean RTs of 5,479 ms and 6,214 ms, t(18) = 6.04, p < .001, d = 1.39 (see lower left panel of Fig. 3). The overall pattern in Fig. 3 suggests stronger competition effects in Experiment 1b relative to Experiment 1a. This was tested in a combined ANOVA, conducted specifically to assess the potential Experiment × Load × Competition interaction, but this was not reliable, F(1, 37) = 1.44, p = .238, nor was the Experiment × Competition interaction, F(1, 37) = 2.39, p = .131.
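The text does not state how the effect sizes were computed. As a consistency check only, the reported values of d for the high-load comparisons in Experiments 1a and 1b agree with the common paired-samples conversion d = t / √n, where n = df + 1. The sketch below applies that conversion; treating it as the formula actually used is an assumption, and the function name is hypothetical.

```python
import math


def d_from_paired_t(t, df):
    """Cohen's d for a paired t-test via d = t / sqrt(n), with n = df + 1."""
    n = df + 1
    return t / math.sqrt(n)


# (t, df, reported d) for the high-load comparisons in Experiments 1a and 1b.
reported = [(2.98, 19, 0.67), (4.37, 18, 1.00), (6.04, 18, 1.39)]
for t, df, d in reported:
    print(f"t({df}) = {t:.2f}: computed d = {d_from_paired_t(t, df):.2f}, reported d = {d:.2f}")
```

All three computed values round to the reported ones, which suggests, but does not confirm, that this conversion was used.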

Discussion

In Experiment 1, participants searched for objects, either in trials wherein target and distractor names were heterogeneous or in trials wherein all depicted objects shared the phonological onset /bi/. When people searched for singular targets, RTs were unaffected by phonological similarity among the target and distractors. However, when searching for multiple potential targets, people were slower in competition trials relative to control trials. This pattern was especially evident when targets were specified verbally rather than visually. Note, however, that distractor object names were never task relevant. Instead, they appeared to be activated automatically, impacting search when processing demands were high.

The lack of phonological competition observed when processing demands were low (i.e., in single-target search) stands in contrast to Meyer et al. (2007) and Görges et al. (2013), who found phonological competition effects when people searched for single items. Task differences offer one potential explanation for the discrepant findings: Meyer et al. (2007) and Görges et al. (2013) presented smaller displays with four items in predictable locations, whereas our displays contained 12, 16, or 20 randomly distributed items. Indeed, the greater complexity of the current task was evident in slower overall search RTs (e.g., 1,865 ms in Experiment 1a in single-target search compared to 645 ms in the comparable condition of Görges et al., 2013, and ~800 ms in Meyer et al., 2007). The greater complexity in our displays could have limited the attention that participants devoted to individual items, reducing phonological activation and therefore interference in single-target search. We examine this point further with eye tracking in Experiment 3.

The discrepant findings may also reflect differences between the respective stimulus sets. Our phonological competitors shared only a /bi/ onset, whereas Meyer et al. (2007) and Görges et al. (2013) used homophone and rhyme competitors, respectively, whose names overlapped to a greater degree. This greater phonological overlap between target and competitor names likely elicited interference in single-target search that our weaker phonological competitors did not produce (Grosjean, 1980; Tyler, 1984).

Although phonological competition largely did not emerge in single-target search in Experiment 1, this interference was strong when people searched for multiple targets. What might explain the Experiment 1 results? To offer a potential account, we must adopt two relatively straightforward assumptions and one novel hypothesis. First, we assume that when a person views an image of some object (e.g., a dog), the experience is not entirely visual. Instead, the object name also becomes partially active in memory, along with associated knowledge (e.g., Huettig & Altmann, 2005; Yee & Sedivy, 2006). These implicit dimensions may not reach a level of conscious awareness, but we assume they receive some degree of activation. Second, we assume that multiple-target search, combined with verbal target cues, encourages verbal coding to maintain target identities during search (Baddeley & Hitch, 1974). With these assumptions in place, we hypothesize that visual search involves “resonance seeking” between targets in WM and objects in the search array. When target representations are mainly visual, attention will mainly be drawn to visually viable candidates. But, when verbal labels are more prominent in WM, attention will also be drawn to phonologically viable candidates.

Even in this simple framework, the competition effect in Experiment 1 might reveal disruption of attentional guidance, perceptual matching, or both. With respect to guidance, Menneer et al. (2012) found that visual guidance is hindered in multiple-target search (see Hout & Goldinger, 2015, for similar findings). When searching for multiple potential targets, people both fixate more target-dissimilar items and fewer target-similar items, relative to searching for singular targets. People are also more likely to revisit previously inspected objects during multiple-target search. Therefore, when searching under high WM load, the capacity to effectively guide attention is reduced. In Experiment 1, this reduced attentional guidance would suggest that people considered more distractors in the high-load condition, relative to the low-load condition. When more distractors are viewed, it may increase the likelihood of individual object names becoming active and causing interference when they overlap with the targets held in memory. Consequently, this interference might generally inhibit the effective deployment of attention, resulting in more fixations to nontargets when phonological competitors are present. Alternatively (or additionally), given greater phonological resonance between targets in WM and fixated objects, it may take more processing for people to reject distractor objects and disengage attention. In Experiment 1, this mechanism would suggest that fixated items would hold attention longer and perhaps that target identification would be slower. To simplify interpretation, we removed the need for attentional guidance in Experiment 2.

Experiment 2

In Experiment 2, we used the same stimuli as in Experiment 1 (arranged in the same critical and control conditions). The task was changed, however, to a self-paced serial search procedure that required participants to verify whether each viewed object was a target or a distractor. Each trial consisted of a stream of distractors, with targets embedded in 75 % of trials, as explained below. As before, Experiment 2a presented visual target cues, and Experiment 2b presented verbal target cues.