Visual attention shift to printed words during spoken word recognition in Chinese: The role of phonological information

Shen, Wei; Qu, Qingqing; Tong, Xiuhong

doi:10.3758/s13421-018-0790-z

Published: 25 January 2018

Volume 46, pages 642–654, (2018)

Memory & Cognition


Wei Shen^{1,2,3}, Qingqing Qu^{4,5} & Xiuhong Tong^{6}


Abstract

The aim of this study was to investigate the extent to which phonological information mediates the visual attention shift to printed Chinese words in spoken word recognition by using an eye-movement technique with a printed-word paradigm. In this paradigm, participants are visually presented with four printed words on a computer screen, which include a target word, a phonological competitor, and two distractors. Participants are then required to select the target word using a computer mouse, and their eye movements are recorded. In Experiment 1, phonological information was manipulated at the level of full phonological overlap; in Experiment 2, it was manipulated at the level of partial phonological overlap; and in Experiment 3, the phonological competitors were manipulated to share either full overlap or partial overlap with targets, so that the two conditions could be compared directly. Results of the three experiments showed that phonological competitor effects were observed in both the full-overlap and partial-overlap conditions. That is, phonological competitors attracted more fixations than distractors, which suggests that phonological information mediates the visual attention shift during spoken word recognition. More importantly, we found that the mediating role of phonological information varies as a function of the phonological similarity between target words and phonological competitors.


Research shows that listeners tend to spontaneously shift their visual attention to referred objects while viewing a scene during spoken word recognition (Cooper, 1974; Tanenhaus, Spivey-Knowlton, Eberhard, & Sedivy, 1995). This suggests that spoken language processing affects visual attention. One question of theoretical interest is whether this effect arises in Chinese, a written language that represents phonological information at the syllabic level. The aim of the present study is to examine the extent to which phonological information influences the shift of visual attention to printed Chinese during spoken word recognition in a visual-world paradigm.

The visual-world paradigm is widely used to study spoken word recognition (Tanenhaus et al., 1995). In this paradigm, participants are required to view an array of pictures while listening to spoken target words. The movements of their eyes as they gaze upon pictures are recorded simultaneously. The relationship between visual attention and language processing can be examined by measuring the fixation probability on the corresponding visual objects. Allopenna, Magnuson, and Tanenhaus (1998) showed that participants produced more eye fixations on phonologically related pictures (e.g., “speaker” and “beetle”) than unrelated pictures when they heard a target (e.g., “beaker”). This was defined as the phonological competitor effect. The phonological competitor effect supports some spoken word recognition models, such as TRACE and cohort models, in which acoustic signals are assumed to be mapped continuously onto mental lexical representations, and candidates that share overlapping syllables with spoken target words are activated during spoken word recognition (Marslen-Wilson & Welsh, 1978; McClelland & Elman, 1986).

Moreover, studies conducted using the visual-world paradigm demonstrate that information processing in the visual and spoken modalities interacts and affects eye-movement behavior, such that overt attention to visual objects can be driven by different kinds of information (Dahan, Magnuson, Tanenhaus, & Hogan, 2001; Huettig & McQueen, 2007; McMurray, Tanenhaus, & Aslin, 2002). For example, Huettig and McQueen suggested that spoken word recognition is a cascaded process in which overt shifts of visual attention to visual objects can be driven by phonological, shape, and semantic information, and in which this information is activated in a chronological sequence as spoken words unfold over time. Huettig, Quinlan, McDonald, and Altmann (2006) further argued that lexical representations associated with spoken words and visual objects are activated concurrently and that the overlap of lexical representations (e.g., phonological representations) between the visual and auditory modalities causes visual attention to shift to visual referents.

Related studies use a variant of the visual-world paradigm in which pictures are replaced with printed words (i.e., the printed-word paradigm). Such studies show that phonologically related printed words also attract more fixations than distractors (Huettig & McQueen, 2007; McQueen & Viebahn, 2007). However, the nature of the phonological competitor effects observed in the printed-word paradigm remains controversial. Researchers continue to debate which information drives the fixations on phonological competitors: phonological information or other information (e.g., orthographic information). Two contrasting hypotheses have been proposed to explain such competitor effects: the phonological and the orthographic hypotheses (Huettig & McQueen, 2007; Salverda & Tanenhaus, 2010). The phonological hypothesis assumes that it is the representational overlap established at the phonological level that directs the shift of visual attention to printed words (Allopenna et al., 1998; Huettig & McQueen, 2007; McQueen & Viebahn, 2007; Weber, Melinger, & Lara, 2007). For example, Huettig and McQueen (2007) reported that the time course of different types of information processing (i.e., semantic, phonological, and shape information) varies during spoken word recognition. All competitors were found to attract more fixations than the unrelated distractors in the visual-world paradigm (i.e., when participants were presented with pictures). However, when pictures were replaced with printed words, only the phonological competitor effect was seen, wherein the phonological competitor attracted more looks than distractors. The authors argued that only phonological information was relevant in searching for printed words.

In contrast to the phonological hypothesis, the orthographic hypothesis argues that the representational overlap established at the orthographic level mediates the shift of visual attention to printed words (Salverda & Tanenhaus, 2010). Salverda and Tanenhaus (2010) conducted two experiments with a printed-word version of the visual-world paradigm to examine whether the additional fixations on phonological competitors were driven by phonological or orthographic information. In this paradigm, they presented participants with a visual display of four printed words: a target word, a phonological competitor, and two distractors. Participants were required to follow a spoken instruction (e.g., please click on the word “bead”). In Experiment 1, they manipulated the phonological overlap between a target (e.g., “bead”) and the competitors to be high (e.g., “bean”/bin/–“bead”/bid/) or low (e.g., “bear”/bɛr/–“bead”/bid/) while controlling the orthographic overlap across the two conditions (e.g., the word pairs “bead”–“bean” and “bead”–“bear” had the same orthographic overlap, namely, the initial three letters “b-e-a”). The authors found no difference in the proportion of fixations between high and low phonological overlap competitors. This finding provided some evidence against the phonological hypothesis by showing that phonological information did not mediate the visual attention shift during spoken word recognition.

In their Experiment 2, they further investigated the phonological competitor effect by manipulating the orthographic overlap between the target words (e.g., “bead”) and the phonological competitors to be high (e.g., “bear,” which shared three letters “b-e-a” with the target word “bead”) or low (e.g., “bare,” which shared only one letter “b” with the target word “bead”) while holding the phonological overlap between targets and phonological competitors constant (e.g., the target word “bead”/bid/ had the same phonological overlap with the competitors “bear” /bεr/ and “bare” /bεr/). Results of Experiment 2 showed that competitors with high orthographic overlap attracted more fixations than those with low orthographic overlap. The authors interpreted this finding as showing that orthographic representations, rather than phonological representations, that are coactivated between spoken words and visual referents (e.g., printed words) mediate the shift of visual attention to printed words.

However, the findings of Salverda and Tanenhaus (2010) cannot be considered robust evidence against the phonological hypothesis. It is well known that, in alphabetic languages, graphemes are strongly linked to phonemes through grapheme–phoneme correspondence rules. Thus, the sounds in words and the letters used to represent those sounds are relatively difficult to separate. As a result, the roles of phonology and orthography in spoken word recognition are difficult to distinguish in alphabetic languages, which is a natural consequence of using an alphabet (Leck, Weekes, & Chen, 1995; Weekes, Chen, & Lin, 1998; Zhang, Chen, Weekes, & Yang, 2009; Zhang & Weekes, 2009). Therefore, Salverda and Tanenhaus’s (2010) study could not completely disentangle phonological from orthographic information. It is likely that the contribution of phonological information in their study was simply smaller than that of orthographic information, which led to the nonsignificant effect of phonological information.

Chinese is a widely known logographic language, which differs noticeably from alphabetic languages. Chinese words map directly onto meaning units rather than phoneme units, and the regular or quasiregular letter–sound correspondences found in alphabetic languages are absent in Chinese (Tan & Perfetti, 1998). The link between orthography and phonology is therefore relatively weak in Chinese, which makes Chinese a valuable window for examining the two hypotheses mentioned above. Moreover, the role of phonological information in spoken word recognition in Chinese has also been a topic of debate. Some studies have shown that phonological information was activated during word recognition, whereas other studies have found no phonological competition effect, showing instead that word recognition could proceed directly from orthography to semantics without phonological information (Chen, d’Arcais, & Cheung, 1995; Hoosain, 1991). Further examination of the role of phonological information in Chinese spoken word recognition could greatly contribute to distinguishing between the phonological and orthographic hypotheses.

The purpose of this research is twofold. First, we aimed to revisit the role of phonological information in spoken word recognition using Chinese. We investigated whether phonological information at full phonological overlap could mediate the shift of visual attention to printed Chinese words. Shen, Qu, and Li (2016) found that both semantic and phonological information could guide the visual attention shift to printed words. However, the role of phonological information was not as stable as that of semantic information in that study. Shen et al. (2016) found that the phonological competitor effect was significant only in the short-preview condition (i.e., the visual display was previewed for 200 ms) in Experiment 2 but was not significant with another set of experimental materials in the same condition in Experiment 1. Another study, conducted by Meng (2014) using a similar experimental design, found no phonological competitor effect when the phonological competitor shared the initial phonological information with target words. Therefore, the role of phonological information in spoken word recognition should be re-examined.

The second aim of this study was to examine whether the fixations directed to printed words are sensitive to the degree of phonological overlap between targets and competitors. Most spoken word recognition models suggest that spoken word recognition is an incremental process (Marslen-Wilson, 1987; McClelland & Elman, 1986). On this view, word candidates whose phonological information matches the acoustic input are activated as the spoken word gradually unfolds. For instance, words such as “beep”, “beaker”, and “beetle” would be activated upon hearing the segment “/bi/” (Allopenna et al., 1998). It is therefore possible that the phonological information of word candidates that are only partially similar to the spoken target words is also activated and directs the visual attention shift to printed words. In addition, prior studies have shown that fixations on visual objects are proportional to the semantic similarity between spoken words and pictures (Huettig & Altmann, 2005; Huettig et al., 2006). For example, Huettig et al. (2006) found that the semantic similarity score between targets and competitors could predict participants’ fixation behavior in the visual display using a visual-world paradigm. A smaller semantic competitor effect was observed even when targets were less semantically similar to the competitor (e.g., “coat”–“slipper”). If the visual shift to printed words is also sensitive to phonological information, we speculate that phonological similarity might play the same role as semantic similarity did in Huettig et al.’s (2006) study. That is, we expected that the fixation proportions on printed words would vary as a function of the degree of phonological similarity shared between targets and competitors.

The eye-tracking technique with the printed-word paradigm was adopted to investigate the above two research questions. The eye-tracking technique has at least two advantages over the traditional experimental tasks used in prior spoken word recognition studies (e.g., lexical decision and priming tasks). First, the eye-tracking technique has greater ecological validity than the traditional tasks. For example, traditional tasks, such as the lexical decision task, require explicit responses to target stimuli, which may induce different processing strategies. By contrast, eye-movement recording can occur implicitly and in a natural environment. Second, the eye-movement technique can be used to reveal the online time course of spoken target word recognition, given that it provides accurate, time-locked, and continuous records of eye-movement activity. In traditional tasks (e.g., the lexical decision task), reaction time and accuracy are used as dependent measures, which can only reflect cognitive processing at a certain processing stage.

Moreover, we improved the printed-word paradigm used in prior studies in two respects. First, we presented all spoken target words without sentence context. In prior studies, all spoken target words were embedded in different neutral-context sentences, in which listeners might have to listen to a preceding context before hearing target words. In the preceding context, some words may share the same phonological information with the spoken target words, which may interfere with or weaken the phonological competitor effect. We speculate that the phonological interference of context words might have decreased the robustness of the phonological competitor effects observed in prior studies. Second, we presented target referents in the visual display in this study. In some cases, the visual referents of spoken words would appear in the view field. With the display of target words, the target referent would attract the most fixations relative to other referents, which provides a more stringent setting in which to examine whether the phonological competitor effect survives.

To summarize, three experiments in this study were designed to examine the phonological competitor effect in Chinese spoken word recognition. Experiment 1 aimed to examine whether the phonological competitor effect in Shen et al.’s (2016) study could be observed in a more stringent situation. In this experiment, we manipulated the relationship between target words and competitors such that the competitors were phonologically related^{Footnote 1} but orthographically unrelated to the targets. According to the orthographic hypothesis, if orthographic information is critical in mediating shifts in visual attention, then competitors that are related only phonologically would not attract more fixations than distractors; namely, no phonological competitor effect should be observed. Otherwise, a significant phonological competitor effect would be observed, which would support the phonological hypothesis. In Experiment 2, we manipulated the phonological competitor to have partial phonological overlap with target words in order to explore whether partial phonological similarity between spoken words and phonological competitors could also mediate the shift in visual attention to printed words. Competitors with partial phonological overlap with targets were hypothesized to attract more fixations than distractors. In Experiment 3, the degree of phonological overlap between phonological competitors and targets was manipulated to be full or partial for each given target word to further investigate whether visual shifts to printed words are sensitive to phonological information. If the phonological competitor effect varied as a function of phonological similarity, then an interaction between the degree of overlap and the competitor type was expected.

Experiment 1

Method

Participants

A total of 30 undergraduate students (22 females and 8 males) ages 19 to 25 years (M = 21.20 years) were recruited from a university in Hangzhou. Each student was paid 10 Yuan (approximately 1.47 U.S. dollars) to participate in this study. All participants were native Chinese speakers. They had normal or corrected-to-normal vision and were right-handed. All were unaware of the purpose of the experiment. This study was approved by the Ethics Committee of the university.

Materials and design

A total of 48 words were selected as target words. Each printed-word display consisted of four printed words: a target word, a phonological competitor that shared the same syllable and tone of the first constituent with the target word, and two unrelated distractors that were neither semantically nor phonologically related to the target word (for an example, see Fig. 1). The word frequency and number of strokes across the four printed word conditions were matched (all Fs < 1; see Table 1). Phonological competitors were selected to be neither semantically nor orthographically related^{Footnote 2} to the target words. However, the frequency of characters and number of strokes were unmatched across conditions due to the difficulty in material selection. The positions of the four printed words were counterbalanced across trials. Moreover, 48 filler trials, each consisting of four words, were added to the experimental list to prevent participants from becoming aware of the relationships between phonological competitors and target words in the critical trials. All spoken targets were recorded in a natural voice by a female native Chinese speaker. Spoken targets were presented to participants through headphones.
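The text does not specify how word positions were counterbalanced. As one possible illustration only, the Python sketch below rotates the four word roles through four screen positions so that, over 48 trials, each role occupies each position equally often. The position labels, the role names, and the cyclic-rotation scheme are assumptions made for this example, not details reported for the experiment.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical labels; the actual screen layout is shown in Fig. 1 of the article.
ROLES: List[str] = ["target", "competitor", "distractor1", "distractor2"]
POSITIONS: List[str] = ["top-left", "top-right", "bottom-left", "bottom-right"]


def layout_for_trial(trial_index: int) -> Dict[str, str]:
    """Assign each word role to a position by cyclic rotation (assumed scheme)."""
    shift = trial_index % len(POSITIONS)
    return {
        role: POSITIONS[(i + shift) % len(POSITIONS)]
        for i, role in enumerate(ROLES)
    }


# Over 48 critical trials, count how often the target lands in each position.
target_counts = Counter(layout_for_trial(t)["target"] for t in range(48))
```

In practice the rotated layouts would also be shuffled across trials so that position is not predictable from trial order.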

Table 1 Lexical properties of experimental materials in Experiment 1

Apparatus

Eye movements were recorded using an EyeLink 1000 tracker sampling at a rate of 1000 Hz (SR Research, Mississauga, Ontario, Canada). Experimental materials were presented on a 21-inch CRT monitor with a resolution of 1024 × 768 pixels and a refresh rate of 85 Hz. Participants were seated in a chair 58 cm away from the monitor and were asked to place their chins on a chinrest in order to minimize head movement during the experiment. All printed words were shown in size 30 Song font in black (RGB: 0, 0, 0) against a white background (RGB: 255, 255, 255). Each character subtended a visual angle of approximately 1.4°. Although viewing was binocular, eye movement data were collected only from the right eye.
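The relation between viewing distance, stimulus size, and visual angle can be made concrete with a short calculation. The Python sketch below uses the standard formula, angle = 2 · atan(size / (2 · distance)), with the 58 cm viewing distance and 1.4° character size reported above; the function names are illustrative and not part of the original study.

```python
import math


def size_from_visual_angle(angle_deg: float, distance_cm: float) -> float:
    """Physical size (cm) of a stimulus that subtends angle_deg at distance_cm."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Visual angle (degrees) subtended by a stimulus of size_cm at distance_cm."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))


# A character subtending 1.4 degrees at 58 cm is roughly 1.4 cm wide on screen.
char_size_cm = size_from_visual_angle(1.4, 58.0)
```

At this viewing distance one centimeter on the screen corresponds to roughly one degree of visual angle, which is why the two numbers are nearly equal.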

Procedure

After participants entered the lab, they were briefly introduced to the eye tracker. A nine-point calibration was then conducted at the beginning of the experiment, which participants completed by looking at nine white dots presented randomly on the computer screen. The validation error was smaller than 1°. A drift check was performed at the beginning of each trial. Afterwards, a blank screen was presented for 500 ms before the words were displayed. The words were displayed for 200 ms before the targets were spoken. Participants were instructed to click the spoken target words on the screen with a mouse. Seven practice trials were initially conducted to familiarize participants with the experimental procedure. They then performed 48 experimental trials and 48 filler trials. The trials were presented in random order. The test lasted approximately 15 min.

Results and discussion

We excluded trials with incorrect responses (0.6%) from the analysis. A fixation was counted as falling on a printed word if it fell within a square of 8 × 8 cm around the center of that word. Figure 2 shows the proportion of fixations to printed words (targets, phonological competitors, and distractors^{Footnote 3}) from 200 ms before the onset of spoken target words to 1,500 ms after the onset of spoken target words (calculated in 100-ms time windows).
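The fixation criterion and the plotting windows described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' analysis code: it assumes fixation coordinates have already been converted to centimeters relative to the screen center, and the word-center coordinates in `CENTERS` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

AOI_SIDE_CM = 8.0  # side of the square region around each word, as reported

# Hypothetical word centers (cm, relative to screen center); not from the article.
CENTERS: Dict[str, Tuple[float, float]] = {
    "target": (-10.0, 7.0),
    "competitor": (10.0, 7.0),
    "distractor1": (-10.0, -7.0),
    "distractor2": (10.0, -7.0),
}


def classify_fixation(
    x_cm: float, y_cm: float, word_centers: Dict[str, Tuple[float, float]]
) -> Optional[str]:
    """Return the word whose 8 x 8 cm square contains the fixation, else None."""
    half = AOI_SIDE_CM / 2.0
    for role, (cx, cy) in word_centers.items():
        if abs(x_cm - cx) <= half and abs(y_cm - cy) <= half:
            return role
    return None


def time_bins(
    start_ms: int = -200, end_ms: int = 1500, width_ms: int = 100
) -> List[Tuple[int, int]]:
    """Consecutive (start, end) windows relative to spoken-word onset."""
    return [(t, t + width_ms) for t in range(start_ms, end_ms, width_ms)]
```

With the default arguments, `time_bins()` yields 17 windows covering the range from 200 ms before to 1,500 ms after word onset.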

As shown in Fig. 2, fixations on phonological competitors started to diverge from distractors at around 200 ms after the target word was presented. To test whether the difference between phonological competitors and distractors reached statistical significance, we employed logit mixed models to analyze fixation proportion data^{Footnote 4} (Ferreira, Foucart, & Engelhardt, 2013; Jaeger, 2008). One advantage of the logit mixed model over traditional analyses (e.g., t test or ANOVA) is that it can evaluate different types of random effects in one model (i.e., random effects of participants and items), which is helpful in decreasing Type I error rates. Moreover, this model is also better at handling situations in which sphericity and homoscedasticity are violated (Cunnings, 2012). Following the procedure of Ferreira et al. (2013), we defined the dependent variable as whether or not a printed word received a fixation in a specific time window; hence, the value “0” means that no fixation was made to the printed word, whereas the value “1” means that a fixation was made to the printed word. Two types of time-window analyses were conducted: (1) a global analysis covering the window from 200 ms^{Footnote 5} to 600 ms after the onset of spoken target words, which was time-locked to the duration of the first character (the average duration is approximately 350 ms) of the spoken target word; and (2) an analysis of four 100-ms bins spanning 200 ms to 600 ms after the onset of spoken target words.
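The binary coding of the dependent variable can be illustrated as follows. The Python sketch below codes a word as 1 in a window if any fixation on it overlaps that window; the overlap criterion, the tuple layout, and the example trial are assumptions made for illustration, since the text does not state whether fixation onset or any overlap was used.

```python
from typing import List, Tuple

# (word role, fixation start in ms, fixation end in ms), relative to word onset
Fixation = Tuple[str, int, int]

GLOBAL_WINDOW: Tuple[int, int] = (200, 600)
BINS: List[Tuple[int, int]] = [(t, t + 100) for t in range(200, 600, 100)]


def fixated(fixations: List[Fixation], role: str, start_ms: int, end_ms: int) -> int:
    """Return 1 if any fixation on `role` overlaps [start_ms, end_ms), else 0."""
    return int(
        any(r == role and s < end_ms and e > start_ms for r, s, e in fixations)
    )


# A made-up trial: distractor, then competitor, then target.
trial: List[Fixation] = [
    ("distractor1", 0, 250),
    ("competitor", 310, 480),
    ("target", 520, 900),
]

competitor_global = fixated(trial, "competitor", *GLOBAL_WINDOW)
competitor_by_bin = [fixated(trial, "competitor", s, e) for s, e in BINS]
```

Each trial therefore contributes one 0/1 value per word and window, which is the form of response a logit mixed model expects.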

A base model that included random intercepts for participants and items was initially created for statistical analysis. The model was then extended step by step by adding a fixed factor (i.e., competitor type^{Footnote 6}) and by-participant random slopes for competitor type. The anova function in R (Version 3.3.3; R Core Team, 2016) was used to identify which additions significantly improved model fit. In the models, one of the distractors was randomly assigned as a baseline, and phonological competitors were compared with that distractor. In each analysis, random intercepts for participants and items and by-participant random slopes for competitor type were entered as random effects (Barr, Levy, Scheepers, & Tily, 2013). In addition, competitor type was entered in each model as a fixed effect. To fit the models, we used the glmer function from the lme4 package (Version 1.1-7; Bates, Maechler, Bolker, & Walker, 2014) in R. Regression coefficients (b), standard errors (SE), Z values, and corrected p values are reported in Table 3. R², the coefficient of determination, is used as a measure of effect size (see also Nakagawa & Schielzeth, 2013). It was calculated from the correlation between the fitted and observed values via the r2.corr.mer function developed by Jarrett Byrnes in the R environment.
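The logic of the stepwise model comparison can be sketched without fitting any models. The Python example below, which uses only the standard library, computes a likelihood-ratio statistic from two log-likelihoods, converts a chi-square statistic into a p value, and computes R² as the squared correlation between fitted and observed values. It is a simplified illustration under the assumption that the comparisons are ordinary likelihood-ratio tests; the function names are hypothetical and this is not the R code used in the study.

```python
import math
from typing import Sequence


def likelihood_ratio_stat(loglik_reduced: float, loglik_full: float) -> float:
    """Chi-square statistic comparing two nested models."""
    return 2.0 * (loglik_full - loglik_reduced)


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed forms for df = 1 and df = 2, then a standard recurrence in df.
    if df % 2 == 0:
        q, k = math.exp(-half), 2
    else:
        q, k = math.erfc(math.sqrt(half)), 1
    while k < df:
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q


def r2_corr(fitted: Sequence[float], observed: Sequence[float]) -> float:
    """Squared Pearson correlation between fitted and observed values."""
    n = len(fitted)
    mf, mo = sum(fitted) / n, sum(observed) / n
    cov = sum((f - mf) * (o - mo) for f, o in zip(fitted, observed))
    var_f = sum((f - mf) ** 2 for f in fitted)
    var_o = sum((o - mo) ** 2 for o in observed)
    return cov * cov / (var_f * var_o)
```

For instance, `chi2_sf(37.25, 2)` is far below .001 and `chi2_sf(0.29, 5)` is above .99, in line with the statistics reported for the global analysis.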

Global analysis results showed that the model was significantly improved by the inclusion of competitor type, χ²(2) = 37.25, p < .001. However, the addition of by-participant slopes for competitor type failed to significantly improve the model fit, χ²(5) = 0.29, p = .99. The contrast test revealed that phonological competitors attracted significantly more fixations than distractors (b = 0.42, SE = 0.08, Z = 5.43, p < .001, R² = .07).
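To help read the logit coefficient, the Python sketch below converts b into an odds ratio and checks that the reported Z value is compatible with b / SE once rounding to two decimals is taken into account. The rounding explanation is an assumption made for this illustration, and the function names are hypothetical.

```python
import math
from typing import Tuple


def odds_ratio(b: float) -> float:
    """Odds ratio implied by a logit-scale coefficient."""
    return math.exp(b)


def wald_z_range(b: float, se: float, half_unit: float = 0.005) -> Tuple[float, float]:
    """Range of Z = b / SE compatible with b and SE rounded to two decimals.

    Assumes b > 0 and se > half_unit.
    """
    lowest = (b - half_unit) / (se + half_unit)
    highest = (b + half_unit) / (se - half_unit)
    return lowest, highest


def two_sided_p(z: float) -> float:
    """Two-sided p value for a standard-normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


global_or = odds_ratio(0.42)          # odds of fixating competitor vs. distractor
z_low, z_high = wald_z_range(0.42, 0.08)
```

A coefficient of b = 0.42 corresponds to odds of fixation about 1.5 times higher for the phonological competitor than for the baseline distractor.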

We also performed a time-course analysis of four 100-ms bins from 200 ms to 600 ms after the onset of spoken target words (see Table 2). We used Bonferroni correction to address the issue of multiple comparisons. The corrected p values are displayed in Table 2. The inclusion of competitor type or of by-participant random slopes for competitor type did not improve model fit for the time window of 200–300 ms (p > .68). However, the model was significantly improved by competitor type for the other three time windows (χ²s > 10.19, ps < .006). The phonological competitor diverged significantly from the distractors at 300–400 ms after the spoken word onset (b = 0.28, SE = 0.09, Z = 3.18, p = .003, R² = .05). In addition, the time windows of 400–500 ms (b = 0.61, SE = 0.09, Z = 6.87, p < .001, R² = .03) and 500–600 ms (b = 0.94, SE = 0.13, Z = 7.39, p < .001, R² = .07) showed similar data patterns, wherein the phonological competitor attracted more fixations than distractors. These results suggest that the phonological information of spoken target words was activated at an early processing stage during spoken word recognition (at around 300 ms from the onset of the first constituent of the spoken target word in the present study).
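The Bonferroni correction mentioned above multiplies each raw p value by the number of comparisons and caps the result at 1. The Python sketch below shows the computation under the assumption that the correction was applied across the four 100-ms bins; the raw p values in the example are made up for illustration and are not taken from the study.

```python
from typing import List, Sequence


def bonferroni(p_values: Sequence[float]) -> List[float]:
    """Bonferroni-corrected p values: p * m, capped at 1, for m comparisons."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw p values for the four bins (200-300, ..., 500-600 ms).
raw_p = [0.5, 0.001, 0.0001, 0.3]
corrected_p = bonferroni(raw_p)
```

An equivalent view is that each raw p value is compared with .05 / 4 = .0125 rather than with .05.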

Table 2 Logit mixed model analysis for competitor type in Experiment 1

In summary, results of Experiment 1 strongly suggest that phonological information does mediate shifts of visual attention to Chinese printed words, independently of orthographic information, even when participants perform a relatively stringent task. That is, the phonological hypothesis is supported by the findings of Experiment 1, which highlight that phonological information plays an important role in mediating the shift in visual attention to printed words during spoken word recognition in Chinese.

Experiment 2

Experiment 2 aimed to determine whether partially activated phonological information during spoken word recognition could also mediate the shift of visual attention to printed words by manipulating the overlap between spoken target words and phonological competitors such that a partial phonological overlap exists. If visual attention was sensitive to the phonological information, even with the partial phonological overlap between spoken target words and phonological competitors, then we would still observe a phonological competitor effect.