Learning to read and write might alter the way people process spoken language (Frith, 1998). This claim receives support from studies reporting that spoken word processing is affected by orthographic variables. In a seminal article, Seidenberg and Tanenhaus (1979) showed that rhyme judgments were made faster for word pairs that were spelt similarly (e.g., tie/pie) than for word pairs spelt differently (e.g., tie/rye). Even though this specific finding was not replicated when critical pairs were embedded in a large number of fillers (Damian & Bowers, 2010), orthographic effects have now been documented in a wide range of tasks, such as phoneme monitoring (e.g., Frauenfelder, Segui, & Dijkstra, 1990) and primed auditory lexical decision (e.g., Chéreau, Gaskell, & Dumay, 2007). For instance, Jakimik, Cole, and Rudnicky (1985) found that in a lexical decision task with spoken words, priming emerged only when primes and targets shared both orthographic and phonological codes, not when they overlapped only phonologically or only orthographically. Effects of orthography can also be demonstrated by varying the orthographic properties of the spoken target words themselves. Ziegler and Ferrand (1998) found that the orthographic inconsistency (i.e., multiple ways to spell a pronunciation) of spoken French words affected lexical decisions; specifically, responses were faster for consistent words, whose rhyme can be spelt in only one way, than for inconsistent words, whose rhyme can be spelt in several ways. This finding has subsequently been replicated across different languages (English: Miller & Swick, 2003; Portuguese: Ventura, Morais, Pattamadilok, & Kolinsky, 2004; French: Pattamadilok, Morais, Ventura, & Kolinsky, 2007), and not only in normal readers but also in alexic patients (e.g., Miller & Swick, 2003). Furthermore, orthographic neighborhood density (the number of words that can be formed by substituting a single letter within a word) also affects spoken word processing, with faster lexical decisions for words with many orthographic neighbors than for words with few neighbors (Ziegler, Muneaux, & Grainger, 2003), and the magnitude of the neighborhood effect is modulated by orthographic experience and proficiency. Finally, orthographic effects have emerged even in tasks that require neither metaphonological analysis nor lexical decision, such as judging whether a spoken word belongs to a particular semantic category (e.g., Pattamadilok, Perre, Dufau, & Ziegler, 2009; see below for a more detailed outline) or detecting a noise burst in a non-linguistic task (e.g., Perre, Bertrand, & Ziegler, 2011).

Orthographic effects in spoken word recognition can also be explored via electroencephalography (EEG), and such effects appear to emerge quite rapidly, in a relatively early time window starting around 300 ms after stimulus onset (e.g., Pattamadilok, Perre, Dufau, & Ziegler, 2009; Pattamadilok, Morais, Colin, & Kolinsky, 2014). For instance, Pattamadilok et al. (2009) reported an EEG study in which participants pressed a key when a spoken word belonged to a semantic category and withheld their response otherwise. On critical “no-go” trials, words were either orthographically consistent or inconsistent. ERP results revealed a consistency effect whose onset was time-locked to the position (first vs. second syllable) of the inconsistency in the spoken word. This time course suggests that, in spoken word processing, orthography is activated at quite an early stage, rather than at a later decisional or postlexical stage. Overall, a large number of studies adopting various experimental manipulations and paradigms have reported an influence of orthographic information on spoken word recognition. However, it continues to be debated whether such effects are truly obligatory or rather reflect strategic adaptations to a particular task environment (Cutler & Davis, 2012), and whether they extend to conversational speech (Mitterer & Reinisch, 2015).

The studies reviewed above were conducted with speakers of languages with alphabetic orthographic systems. Because spelling and sound map onto each other systematically in alphabetic languages, an orthographic influence on speech perception is perhaps not particularly surprising. By contrast, in languages with non-alphabetic scripts, the mapping between spelling and sound is largely arbitrary, and here it is less clear whether orthography is activated in spoken language processing. For instance, the Chinese writing system maps its basic units, written characters, onto spoken syllables. The majority (85 %) of Chinese characters consist of a semantic radical, which provides information about the meaning of the character, and a phonetic radical, which provides probabilistic information about its pronunciation (Zhu, 1988); however, the pronunciation of a character is oftentimes entirely arbitrary, with phonetic components providing a valid pronunciation in only 38 % of the characters in which they appear (Zhou, 1978). Hence, the relation between the orthography and phonology of Chinese words is very weak, and Chinese writing does not reflect segmental structure. Another feature of Chinese is the pervasive homophony of characters. Around 5000 commonly used characters in modern-day Chinese map onto about 400 distinct syllables (disregarding tone; Language and Teaching Institute of Beijing Linguistic College, 1986). As a result, a single spoken syllable maps onto, on average, about 11 characters. Therefore, a spoken word/syllable corresponding to a single character is typically ambiguous with regard to its meaning (especially for characters with many homophones), and contextual information (for spoken language) or orthography (for written text) is needed to resolve the homophony and identify the intended character and its meaning.

These cross-linguistic variations between languages with alphabetic and non-alphabetic orthographic systems raise the question of whether, in non-alphabetic languages, orthography is activated during the processing of spoken words to the same extent as appears to be the case in alphabetic languages. One could hypothesize that in non-alphabetic languages such as Chinese, the large dissociation between orthography and phonology renders it less likely that spoken word recognition involves cross-talk with orthographic representations. Conversely, one could argue that orthographic effects should be more pronounced in Chinese than in alphabetic languages, because orthography helps to identify meaning and to resolve the ambiguity caused by extensive homophony. A further motivation for investigating orthographic effects in non-alphabetic languages is that, because orthographic and phonological codes are largely dissociated from one another, effects of spelling can be disentangled from effects of sound much better than in alphabetic languages. To the best of our knowledge, only two studies have investigated the role of orthography in spoken Chinese word recognition. Zou, Desroches, Liu, et al. (2012) manipulated orthographic and phonological overlap between prime-target pairs in an auditory lexical decision task. Based on their finding that orthographic similarity modulated ERP amplitudes, Zou et al. (2012) argued that orthographic information is activated during Chinese spoken word recognition. However, a possible criticism of the lexical decision task as a probe of orthographic activation is that it is susceptible to strategic influences, a concern that has been voiced in previous studies conducted with alphabetic languages (e.g., Cutler, Treiman, & Van Ooijen, 1998; Pattamadilok, Perre, Dufau, & Ziegler, 2009). In metalinguistic judgment or lexical decision tasks, participants might strategically generate an orthographic image of the spoken word to facilitate phonological judgments or decisions about the lexical status of the word.

To rule out the possibility that strategies influence the results, it is therefore advisable to further examine the issue in Chinese spoken word recognition with a task that involves neither explicit phonological analysis nor lexical decisions. In a recent study, Chen, Chao, Chang, et al. (2016) investigated effects of orthographic consistency and homophone density on Chinese spoken word recognition via a task in which participants judged whether or not a spoken word denoted an animal. Event-related potential (ERP) results showed that orthographic consistency, assumed to index orthographic variation at the radical level, modulated the amplitude of the N400, whereas homophone density, taken to index orthographic variation at the character level, modulated the late positive component (LPC). Whereas ERP responses were modulated by both manipulations, reaction times were unaffected by orthographic consistency. However, the exact locus of these effects remains somewhat controversial. Homophone density was defined as the number of characters sharing the same sound, but words with high homophone density not only activate multiple orthographic codes; they are also associated with multiple meanings. The homophone density effect could therefore be attributed to competition between multiple orthographic codes or, alternatively, between multiple semantic codes. In support of the latter possibility, Wang, Li, Ning, and Zhang (2012) used EEG to examine the neural correlates of the homophone density effect and argued that it reflects competition among a homophone’s multiple meanings rather than among its multiple spellings. If so, the homophone density effect would provide limited insight into orthographic access.

In the current study, we used a semantic relatedness judgment task performed on spoken word pairs to investigate whether orthographic information affects the processing of spoken Chinese words. Semantic judgment tasks are commonly used in the literature on visual (rather than spoken) word processing (e.g., Van Orden, 1987; Tan & Perfetti, 1999), and they are generally assumed to be strategy-free because they require listeners to retrieve the meaning of words without explicitly analyzing form representations (see Pattamadilok et al., 2009, for a review). Participants were presented sequentially with two spoken words (“prime” and “target”) and judged whether or not the two words were semantically related. Pairs were either semantically related, orthographically related, or unrelated; hence, the expected responses were “yes” for semantically related pairs and “no” for orthographically related and unrelated pairs. We expected responses to be faster for semantically related word pairs than for unrelated ones, because negative responses typically involve more complex decisions than “yes” responses (e.g., Gomez, Ratcliff, & Perea, 2007; Wu & Thierry, 2010). The central issue was whether, on semantically unrelated pairs, response latencies would be modulated by orthographic relatedness. If orthographic codes are activated during spoken word processing, a conflict should arise when making a semantic judgment on orthographically related, compared to unrelated, trials (in the former, orthographic overlap suggests a “yes” response whereas the correct response is “no”), and latencies on orthographically related trials should therefore be longer than on unrelated trials. Such a finding would clearly support the claim that spoken word recognition is constrained by orthography.

Method

Participants

Thirty native speakers of Mandarin Chinese (16 female; mean age 22.6 years, range 20–28 years) participated and were compensated for their time. All participants reported normal hearing, normal or corrected-to-normal vision, and no history of neurological or language problems.

Materials and design

Stimuli consisted of 105 disyllabic word quartets, each comprising one target and three prime words. In each set, the target was paired with a prime that was either (1) semantically related (but unrelated in word form; e.g., 枕头, pillow, /zhen3tou2/ - 被子, quilt, /bei4zi/), (2) orthographically related (but phonologically and semantically unrelated; the prime shared one radical with the target, see Footnote 1; e.g., 破裂, break, /po4lie4/ - 被子, quilt, /bei4zi/), or (3) unrelated in phonology, orthography, and meaning (e.g., 酸奶, yogurt, /suan1nai3/ - 被子, quilt, /bei4zi/). Note that there was no phonological overlap between the primes and the target in any of the conditions. The full set of stimuli is available online at http://eyemind.psych.ac.cn/enpublication.html. Prime words were matched across conditions on first-character frequency, word frequency, number of strokes of the first character, and number of strokes of the whole word (F values < 1, p values > .299); the mean frequency of the target words was 19.24 per million, as determined by the Chinese Linguistic Data Consortium (2003) norms. From these materials, three lists were created such that each participant was presented with 35 trials in each of the three conditions, and no word was repeated within a list. The three types of prime-target combinations were distributed across three blocks, so each participant received three blocks of 35 trials, for a total of 105 trials.
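Purely as an illustration, the sketch below shows one way to implement this rotation of the three prime types across three lists (a Latin square). The item names and data structure are hypothetical; the actual stimulus lists are available at the URL given above.

```python
# Minimal sketch of a Latin-square list construction (hypothetical item
# names; not the actual materials). Each quartet holds one target plus
# its three primes, and conditions rotate across the three lists.
CONDITIONS = ["semantic", "orthographic", "unrelated"]

def build_lists(quartets, n_lists=3):
    """Assign each target a different priming condition in each list,
    yielding 35 trials per condition per list (105 items / 3)."""
    lists = [[] for _ in range(n_lists)]
    for i, item in enumerate(quartets):
        for l in range(n_lists):
            cond = CONDITIONS[(i + l) % n_lists]
            lists[l].append({"target": item["target"],
                             "prime": item[cond],
                             "condition": cond})
    return lists

quartets = [{"target": f"t{i}", "semantic": f"s{i}",
             "orthographic": f"o{i}", "unrelated": f"u{i}"}
            for i in range(105)]
lists = build_lists(quartets)
# Sanity check: every list contains 35 trials of each condition.
assert all(sum(t["condition"] == c for t in lst) == 35
           for lst in lists for c in CONDITIONS)
```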

Stimuli were recorded by a female native speaker of Chinese at a sampling rate of 44 kHz. The mean duration of primes was 812 ms (SD = 78) for semantic primes, 814 ms (SD = 88) for orthographic primes, and 804 ms (SD = 76) for unrelated primes; the mean duration of targets was 810 ms (SD = 74). There were no significant differences in duration across the three types of prime words (p values > .363).

Procedure

Stimuli were presented using E-Prime 1.1 software (Psychology Software Tools, Pittsburgh, PA). Participants were first instructed as to the nature of the task and then received six practice trials, on three of which prime and target were semantically related. After the practice, three experimental blocks of 35 trials each were presented. The stimuli were presented in random order through headphones. There were short breaks between blocks, and the next block started once participants indicated that they were ready to continue. On each trial, participants saw a fixation cross (500 ms), then heard the spoken prime word, followed by an interstimulus interval (ISI) of 100 ms and the spoken target word. The intertrial interval was 1000 ms. Participants were asked to decide as quickly as possible whether or not the two words they had heard were semantically related, pressing the key “f” if they were semantically related and “j” otherwise. Response latencies were measured from the onset of the target word to the participants’ response. The experimental session lasted approximately 20 min.
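For illustration, the trial structure can be summarized in the following minimal Python sketch. The experiment itself was run in E-Prime 1.1; the stub functions here (show_fixation, play_sound, wait_for_key) are hypothetical placeholders and do not reflect any E-Prime API.

```python
import time

# Hypothetical stand-ins for the presentation routines (illustration only).
def show_fixation(seconds):
    time.sleep(seconds)

def play_sound(wav):
    time.sleep(wav["duration_s"])   # block until the sound file ends

def wait_for_key(keys):
    return keys[0], time.time()     # stub: pretend "f" was pressed now

def run_trial(prime_wav, target_wav):
    show_fixation(0.5)              # fixation cross, 500 ms
    play_sound(prime_wav)           # spoken prime word
    time.sleep(0.1)                 # interstimulus interval, 100 ms
    target_onset = time.time()      # RT is measured from target onset
    play_sound(target_wav)          # spoken target word (in the real
                                    # experiment, responses could be given
                                    # while the target was still playing)
    key, t = wait_for_key(("f", "j"))  # "f" = related, "j" = unrelated
    time.sleep(1.0)                 # intertrial interval, 1000 ms
    return key, (t - target_onset) * 1000.0

key, rt_ms = run_trial({"duration_s": 0.81}, {"duration_s": 0.81})
```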

At the end of the experiment, the experimenter asked participants whether they had noticed any associations between the two words they heard, other than the obvious semantic relatedness on some trials. None of the participants reported having noticed orthographic overlap between words.

Results

Trials with incorrect responses (5.9 %) and trials with responses faster than 200 ms or slower than 2000 ms (2.1 %) were excluded from the response time analysis. As shown in Table 1, response latencies exhibited a substantial facilitatory effect of semantic relatedness relative to the two semantically unrelated conditions, and a substantial inhibitory effect of orthographic relatedness relative to the unrelated condition. Analyses of variance (ANOVAs) conducted on latencies, with prime type as a within-participant and within-item variable, showed a significant effect of prime type, F1(2, 58) = 38.27, p < .001; F2(2, 208) = 43.34, p < .001. Planned comparisons revealed that responses to semantically related word pairs were significantly (102 ms) faster than to unrelated word pairs, t1(29) = 3.13, p = .004; t2(104) = 2.89, p = .005. Critically, responses to orthographically related pairs were significantly (42 ms) slower than to unrelated pairs, t1(29) = –5.10, p < .001; t2(104) = –6.30, p < .001.
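For concreteness, the following minimal sketch (in Python, using pandas, statsmodels, and scipy) illustrates how such a trimming-plus-ANOVA pipeline can be implemented. The file name trials.csv, its column names, and the condition labels are hypothetical assumptions; the original analysis was not published as code.

```python
# Sketch of the RT analysis, assuming a hypothetical long-format file
# trials.csv with columns: subject, item, prime_type, rt, correct.
import pandas as pd
from scipy import stats
from statsmodels.stats.anova import AnovaRM

df = pd.read_csv("trials.csv")
df = df[df["correct"] == 1]                    # drop error trials
df = df[df["rt"].between(200, 2000)]           # trim RT outliers

# F1: by-participant repeated-measures ANOVA on condition mean RTs.
by_subj = (df.groupby(["subject", "prime_type"])["rt"]
             .mean().reset_index())
print(AnovaRM(by_subj, depvar="rt", subject="subject",
              within=["prime_type"]).fit())

# F2: by-item analysis, with item as the random unit instead of subject.
by_item = (df.groupby(["item", "prime_type"])["rt"]
             .mean().reset_index())
print(AnovaRM(by_item, depvar="rt", subject="item",
              within=["prime_type"]).fit())

# Planned comparisons (by participants), e.g. orthographic vs. unrelated.
wide = by_subj.pivot(index="subject", columns="prime_type", values="rt")
print(stats.ttest_rel(wide["semantic"], wide["unrelated"]))
print(stats.ttest_rel(wide["orthographic"], wide["unrelated"]))
```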

Table 1. Examples and response latencies (in milliseconds; error percentages in brackets)

Parallel analyses conducted on the errors showed that prime type affected error rates significantly by participants, F1(2, 58) = 3.41, p = .040, but not by items, F2(2, 208) = 1.37, p = .257. The effect was attributable to an increase (2.8 %) in error rates for semantically related pairs compared with the unrelated condition, which was significant only in the participant analysis, t1(29) = 2.90, p = .007; t2(104) = 1.53, p = .129. By contrast, error rates were unaffected by orthographic relatedness, t values < 1.08, p values > .291.

Discussion

Previous research has demonstrated that orthographic information plays a role in spoken word processing; however, virtually all such studies have been conducted in alphabetic languages. The present study therefore investigated potential orthographic effects on spoken word recognition in Chinese, a language with a deep orthography. Our results showed that when Chinese speakers made semantic relatedness judgments on spoken word pairs, orthographic overlap between primes and targets induced an inhibitory effect on response latencies. This finding constitutes clear evidence that orthographic information affects spoken word recognition even in a language in which the relationship between orthography and phonology is largely arbitrary. It hence converges with the conclusions of the only other studies on this topic in a non-alphabetic language (Zou et al., 2012; Chen et al., 2016).

To determine whether there was a difference in semantic relatedness between orthographically related and unrelated word pairs, we collected semantic relatedness ratings on a seven-point Likert scale (1 = “not related at all,” 7 = “closely related”) for all word pairs from a group of 24 native Chinese participants. The mean ratings were 5.23, 1.32, and 1.35 for semantically related, orthographically related, and unrelated word pairs, respectively. The difference between semantically related word pairs and the other two conditions was highly significant (p values < .001), but, critically, there was no difference between the orthographically related and the unrelated condition (p > .275). Thus, the stimuli were semantically well matched across orthographically related and unrelated word pairs, and it is unlikely that this factor contributed to the observed effect.

In the present study, we found that when the prime and target were unrelated in meaning but shared orthographic information, the effect on semantic judgment latencies was inhibitory. We speculate that the presence of orthographic overlap between word pairs erroneously favors a “yes” response, thus creating a conflict between incompatible responses that results in longer response latencies. Similar inhibitory influences of involuntarily activated codes have been demonstrated in related fields. For instance, Thierry and Wu (2004) showed that when Chinese-English bilinguals performed semantic relatedness judgments on visually presented English word pairs, overlap in Chinese orthographic properties slowed responses and increased error rates. Colomé (2001) investigated whether bilinguals unconsciously activate their native language when operating in their second language: Catalan-Spanish individuals performed a manual phoneme monitoring task on Catalan names of objects, and on the critical “no” trials, latencies to reject the target phoneme were longer when the phoneme was present in the Spanish translation equivalent of the object name than when it was not. Just as in our own experiment, a conflict on negative responses, induced by the unconscious activation of a positive response, delayed responding. In these and other cases, involuntary activation of a non-matching code has a detrimental effect on performance. The fact that, in our study, participants accessed orthographic codes even though these codes interfered with task performance suggests that orthographic codes were not accessed via conscious strategies, but rather automatically as the spoken words unfolded.

Another way to provide evidence for non-strategic access to orthographic information is to show that the orthographic effect did not increase over time. To this aim, we analyzed the orthographic relatedness effect across blocks, via repeated-measures ANOVAs conducted on RTs with orthographic relatedness (orthographically related vs. unrelated) and block (1–3) as factors. The results showed a significant main effect of orthographic relatedness, F values > 7.99, p values < .01, and no effect of block, F values < 2.14, p values > .127. Critically, the interaction between the two factors was not significant, F1 = 2.80, p = .069; F2 = 1.99, p = .142, with the orthographic effect present in all three blocks and numerically strongest in the first block. Such a pattern is not compatible with a strategic account, and further suggests that orthographic access in our study was automatic (see Footnote 2).
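A corresponding sketch of this block analysis, under the same hypothetical data layout assumed in the analysis sketch above (with an added block column):

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Hypothetical trials.csv as before, plus a "block" column (1-3).
df = pd.read_csv("trials.csv")
df = df[(df["correct"] == 1) & df["rt"].between(200, 2000)]

# Only the two "no"-response conditions enter this analysis.
no_trials = df[df["prime_type"].isin(["orthographic", "unrelated"])]
cells = (no_trials.groupby(["subject", "prime_type", "block"])["rt"]
                  .mean().reset_index())

# Two-way repeated-measures ANOVA (F1): relatedness x block; the by-item
# (F2) analysis proceeds analogously with item as the random unit.
print(AnovaRM(cells, depvar="rt", subject="subject",
              within=["prime_type", "block"]).fit())
```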

The present study not only speaks to the general architecture of the spoken word recognition system, but is also relevant to how Chinese spoken words are recognized. Chinese is generally characterized as a logographic writing system with a deep orthography, and the correspondence between orthography and phonology in Chinese is more arbitrary than in alphabetic languages such as English. The assembled route from orthography to phonology generally assumed for alphabetic languages (Paap & Noel, 1991) is therefore unavailable in Chinese. For these reasons, it has long been suspected that visual recognition of Chinese characters may be unaffected by their sound properties. However, a growing number of studies suggest that this is not the case (e.g., Spinks, Liu, Perfetti, & Tan, 2000; Tan & Perfetti, 1998; Ziegler, Tan, Perry, & Montant, 2000), and that phonology indeed affects Chinese visual word recognition. In the present study, we investigated the complementary issue and demonstrated that orthography constrains Chinese spoken word recognition. As mentioned in the Introduction, spoken Chinese is characterized by pervasive homophony, and a single spoken syllable/character is often semantically ambiguous when presented in isolation. This ambiguity can be resolved either by context or by accessing orthographic form; for this reason, orthographic information may be even more important in Chinese than in other languages. In other words, even though orthographic access had a detrimental effect on performance in the primary task of the current study, it may well be helpful in language processing in more natural settings. Further investigations are required to directly compare orthographic effects between Chinese and alphabetic languages.