Introduction

Spoken word recognition is typically framed in terms of two processes: the activation of the word-form and the subsequent access of its meaning. However, the relative timing of these processes is uncertain. They could occur simultaneously, with activation of meaning building in parallel with word-form activation, or sequentially, with activation of meaning beginning only after word-form activation reaches a threshold. The current study contrasts the former continuous cascade approach with the latter modular approach.

Early theories of word recognition viewed lexical and semantic access as autonomous processes, with activation of the word-form completing before accessing meaning (Forster, 1981; Tanenhaus, Carlson, & Seidenberg, 1985). This grew out of theories of modularity, suggesting that language operates as a distinct module from other cognitive systems and that the levels of language are informationally encapsulated. In this view, word recognition is distinct from and precedes the access of meaning.

Marslen-Wilson’s (1990) cross-modal priming work provides evidence against encapsulation. He showed that words with phonological cohort competitors exhibited less semantic priming than words without competitors. These findings suggest that the degree of semantic activation depends on the degree of lexical activation of the target (which is reduced by phonological competitors). More broadly, this predicts that the number of competitors determines the strength of semantic access: words from denser phonological neighborhoods will more poorly activate semantic representations (Gaskell & Marslen-Wilson, 1997, 1999).

Zwitserlood (1989), using a gating paradigm, found that semantic associates of the words “captain” and “captive” were simultaneously activated after hearing /kæpt/, even when sentence context strongly favored one completion. This finding suggests parallel access of semantic interpretations during lexical access (Gaskell & Marslen-Wilson, 1997, 1999). Listeners do not need to hear the entire word (or even up to the uniqueness point) to begin accessing meaning, as strict autonomous perspectives predict. Instead, the meanings of both words are activated prior to disambiguation.

These studies imply that lexical competition need not be resolved before accessing meaning, and variations in activation due to competition cascade to semantic activation. While interactive activation models of spoken word recognition (e.g., TRACE, McClelland & Elman, 1986) do not attempt to model semantics, the framework underlying these models suggests cascading activation. In this view, semantic activation is a continuous function of the degree of lexical activation. All words access their semantic networks as soon as lexical access begins, with more active word forms yielding greater semantic activation. Thus, words with greater competition (e.g., words in dense neighborhoods) will show decreased word-form activation, and thereby decreased semantic activation and decreased priming, compared to words with less competition (e.g., words in sparse neighborhoods).

The distributed cohort model (DCM: Gaskell & Marslen-Wilson, 1997, 1999) captures these notions explicitly. This model uses activation across distributed semantic and phonological units to signal lexical access rather than activation of localist lexical nodes. In this model, words with greater phonological competition do not activate their semantic representations as effectively because the overlapping word-forms result in overlapping semantic representations. Thus, both word-form and semantic activation (which are not separate in the DCM) should decrease with more active cohorts, producing continuous differences in semantic activation between words in dense and sparse neighborhoods.

As a result of these empirical findings and the theoretical approaches they led to, the autonomous view has been largely abandoned in word recognition, and even many of the authors of studies favoring autonomous processing have since advanced interactive theories (e.g., Joanisse & Seidenberg, 1998; Tanenhaus, Spivey-Knowlton, Eberhard, & Sedivy, 1995). However, autonomous theories are still influential in other areas of language, like speech production (Levelt, Roelofs, & Meyer, 1999). Moreover, while the Marslen-Wilson (1990) and Zwitserlood (1989) findings are suggestive of continuous cascade models, such effects can be explained without continuous interaction between lexical access and semantics. Zwitserlood’s (1989) gating task used word fragments that stopped before enough information was heard to identify the target. This cessation of auditory information could cause an encapsulated word-form recognition system to pass on whatever information it currently has. What appears to be semantic activation during the early portions of the word (e.g., capt in captain) could be an artifact of unnaturally truncated word segments.

Similarly, Marslen-Wilson’s (1990) results could be accounted for by an autonomous model if all words activate their semantic networks with equal strength, but this activation is delayed until activation of the prime word reaches some threshold. While the magnitude of priming would be the same for all words, the delay in when the target word with a competitor reaches its threshold would result in later onset for semantic priming. Because reaction time in semantic priming tasks measures the time taken to reach the end-state of processing, not the time course of processing, this possibility cannot be ruled out.

What is needed is a way to measure the ongoing amount of semantic priming. We employed the visual world paradigm (VWP) to measure semantic priming as a function of phonological neighborhood density (Tanenhaus et al., 1995). This paradigm offers a continuous estimate of the degree of lexical activation and has shown effects of phonological competition (Allopenna, Magnuson, & Tanenhaus, 1998; Magnuson, Dixon, Tanenhaus, & Aslin, 2007; Magnuson, Tanenhaus, Aslin, & Dahan, 2003) and semantic priming (Huettig & Altmann, 2005; Yee & Sedivy, 2006; Yee, Blumstein, & Sedivy, 2008). Thus, the VWP should reveal whether such differences are continuous, reflecting dynamic interaction of lexical and semantic processing; whether the magnitude of semantic activation varies as a function of lexical density; and whether semantic activation is delayed by phonological competition.

Considering the time course of word-form recognition, a number of studies have examined the effect of phonological density on lexical access in the VWP (Magnuson et al., 2003, 2007; Sweeney, Blumstein, & Apfelbaum, 2009). They find that low density (LD) words are fixated earlier than high density (HD) words and often exhibit more fixations than HD words (Fig. 1). This suggests that decreased phonological competition leads to faster activation growth and increased total activation. These findings give rise to multiple predictions for the time course of semantic priming. Based on Marslen-Wilson’s (1990) finding of decreased priming for words sharing onsets, the patterns of semantic activation should differ between density conditions. Thus, some difference in semantic priming as a function of phonological competition is likely.

Fig. 1

Time course of target fixation for low density and high density words

Yet importantly, how these conditions differ depends on the link between word recognition and semantic access (see Fig. 2). An autonomous account of the Marslen-Wilson (1990) results predicts that semantic activation is delayed until a threshold of word-form activation has been reached, whereas the magnitude of semantic activation increases identically for all words after this point. Figure 2a presents such a modular pattern of results – HD items access their semantic network later than LD items because of the delay in reaching threshold (as shown in Fig. 1); however, both sets of words ultimately show similar priming.

Fig. 2

a The predicted magnitude of priming given a hypothesis in which competition delays semantic access, but does not affect magnitude of activation. b The predicted magnitude of priming if semantic access is continuously dependent on degree of lexical activation. c The predicted magnitude of priming if semantic access is both delayed and dependent on degree of lexical activation. d The predicted magnitude of priming if semantic activation always begins at the same time, but peak magnitude of activation is dependent on degree of lexical activation

Alternatively, Fig. 2b presents a continuous cascade account in which semantic access begins at the onset of lexical activation, but the degree of semantic activation is continuously dependent on lexical activation levels. LD words are activated more quickly and ultimately reach higher levels of activation, which has two consequences for semantic activation. First, since semantic activation is continuously coupled to levels of lexical activation, semantic activation will grow more quickly for words in LD neighborhoods. Second, this semantic activation should reach a higher peak level of activation.

Figure 2c combines the two approaches, with the timing of semantic activation derived from the modular model and the ultimate peak magnitude of activation derived from a cascade model. Here, semantic access does not begin until a recognition threshold is reached, but thereafter, the magnitude of semantic access depends on the continuous level of lexical activation, which in turn varies with the number of words in the neighborhood. As a result, HD words begin semantic access later than LD words and show a decreased peak magnitude of semantic access.

Finally, Fig. 2d presents a model in which the timing of semantic access is not affected by activation at the lexical level, but the peak activation is. In this model, semantic activation begins after a constant delay across words, regardless of their lexical access profiles, and it increases with the same speed. However, the peak magnitude of semantic activation is dependent on degree of lexical access. This is subtly different from the prediction in 2b, in which the timing of semantic activation is affected by word-form competition. In this hypothesis, activation of the semantic networks of LD words reaches a higher peak; however, the speed at which semantic access occurs is independent of phonological competition. While both this model and 2b predict a greater peak magnitude for LD words, the primary difference is in whether the speed at which activation grows is affected by competition. While this hypothesis is logically possible, it lacks strong theoretical support as there are no approaches to lexical access suggesting a fixed delay before semantic networks are accessed.
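The four accounts in Fig. 2 can be caricatured in a few lines of code. The sketch below is a toy illustration only, not a model fit to any data: the growth functions, the threshold, the fixed delay, and every numeric value are hypothetical choices made so that the qualitative contrasts (onset, growth rate, and peak) are visible.

```python
import math

# All parameter values below are hypothetical, chosen only for illustration.
LD = {"rate": 0.008, "peak": 1.0}   # sparse neighborhood: faster, higher word-form activation
HD = {"rate": 0.005, "peak": 0.7}   # dense neighborhood: slower, lower word-form activation
THRESHOLD = 0.3      # word-form activation required before semantic access (accounts a, c)
FIXED_DELAY = 150.0  # constant onset delay in ms (account d)
SPEED = 0.002        # shared semantic growth rate per ms (accounts a, d)


def lexical(t, word):
    """Toy word-form activation: saturating growth toward the word's peak."""
    if t <= 0:
        return 0.0
    return word["peak"] * (1.0 - math.exp(-word["rate"] * t))


def threshold_time(word):
    """Time (ms) at which lexical() first reaches THRESHOLD."""
    return -math.log(1.0 - THRESHOLD / word["peak"]) / word["rate"]


def ramp(t, onset, ceiling):
    """Linear growth at SPEED starting at onset, capped at ceiling."""
    return max(0.0, min(ceiling, SPEED * (t - onset)))


def semantic(t, word, account):
    """Semantic activation at time t (ms) under one of the accounts in Fig. 2."""
    if account == "a":   # delayed onset, identical growth and ceiling for all words
        return ramp(t, threshold_time(word), 1.0)
    if account == "b":   # continuous cascade: tracks lexical activation from word onset
        return lexical(t, word)
    if account == "c":   # delayed onset, then tracks lexical activation
        return lexical(t, word) if t >= threshold_time(word) else 0.0
    if account == "d":   # fixed onset and speed, ceiling set by the lexical peak
        return ramp(t, FIXED_DELAY, word["peak"])
    raise ValueError(f"unknown account: {account}")
```

Under these assumptions, account a yields the same late value for both word types but a later onset for HD words, whereas accounts b and d yield a lower late value for HD words and differ from each other only in whether early growth depends on density.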

Each of these accounts predicts slower reaction times for HD words in semantic priming lexical decision tasks (e.g., Marslen-Wilson, 1990) as each possesses periods when activation for the semantically related items in LD conditions exceeds that in HD conditions. The VWP can help disentangle these predictions by measuring semantic priming over time.

Methods

Design

Neighborhood density (Luce & Pisoni, 1998) was used to manipulate phonological competition. Target stimuli were 72 monosyllabic words, divided evenly into two non-overlapping sets of lexical density distributions. Frequency-weighted neighborhood density was measured from phonological transcriptions of the Kucera and Francis (1967) database (the sum of the log frequencies of all neighbors formed by a one-phoneme change; Luce & Pisoni, 1998). LD words had a density between 0 and 42 (M = 19.47), and HD words were between 58 and 197 (M = 97.12). The distributions also differed significantly by raw number of neighbors (t(36) = 1.99, p < .0001). LD words had between 1 and 30 neighbors (only one word had more than 19 neighbors; M = 9.25), and HD words had between 13 and 35 neighbors (M = 23.97). The words did not differ on average frequency (LD = 29.83, HD = 38.11; t < 1).
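As an illustration of the density measure, the sketch below sums the log frequencies of a word's neighbors. Several details are assumptions rather than facts from the procedure above: a "one-phoneme change" is taken to mean a single substitution, deletion, or addition; the logarithm is taken to be base 10; and the toy lexicon, its transcriptions, and its frequency counts are invented.

```python
import math


def is_neighbor(a, b):
    """True if phoneme tuples a and b differ by one substitution, deletion, or addition."""
    if a == b:
        return False
    if len(a) == len(b):
        return sum(x != y for x, y in zip(a, b)) == 1
    if abs(len(a) - len(b)) != 1:
        return False
    short, long_ = (a, b) if len(a) < len(b) else (b, a)
    # Deleting exactly one phoneme from the longer form must yield the shorter one.
    return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))


def weighted_density(target, lexicon):
    """Sum of log10 frequencies over all neighbors of target.

    lexicon maps phoneme tuples to positive frequency counts.
    The log base is an assumption; the text specifies only "log frequencies".
    """
    return sum(math.log10(freq)
               for word, freq in lexicon.items()
               if freq > 0 and is_neighbor(target, word))


# Hypothetical mini-lexicon (transcriptions and counts are made up).
toy_lexicon = {
    ("b", "ae", "t"): 10,    # "bat": substitution neighbor of "cat"
    ("k", "ae", "p"): 100,   # "cap": substitution neighbor
    ("ae", "t"): 1000,       # "at": deletion neighbor
    ("d", "aw", "g"): 50,    # "dog": not a neighbor
}
cat_density = weighted_density(("k", "ae", "t"), toy_lexicon)
```

In this toy lexicon the target has three neighbors, so its raw neighbor count is 3 while its frequency-weighted density is the sum of their three log frequencies.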

For each target item, a semantically related word was selected based on association strength values of the South Florida Free Association Norms (Nelson, McEvoy, & Schreiber, 1998). There was no difference in association strength between the semantically related and target words between density conditions (t < 1), nor was there a difference in the frequency (t(70) = 1.52; p = .13) or number of syllables of the semantically related word (t < 1). An additional 144 items were chosen as unrelated fillers. Filler items were semantically and phonologically unrelated to the target and semantically related stimuli.

Previous work suggests that conceptually similar words show greater priming than associatively related items (Huettig & Altmann, 2005). It was important to ensure that our effects could not result from differences in type of semantic relations between words across the HD and LD conditions, as it is possible that priming of associatively related words and conceptually related words may unfold via different time courses. Thus, we analyzed the similarity of the words using the WordNet::Similarity (Pedersen, Patwardhan, & Michelizzi, 2004) software, which measures similarity based on the number of edges between senses of words (Fellbaum, 1998). There was no difference in semantic similarity between the semantically related words in the HD and LD conditions (t < 1).

Two types of control trials were randomly interwoven with experimental trials: 36 trials contained pairs of semantically related items, but with neither as the target, and an additional 36 did not contain any semantically-related items.

Stimuli

Visual stimuli were color illustrations of objects taken from a commercial Clipart database. Images were initially selected by committee to be prototypical representations of their lexical targets. All images of targets and semantic associates were then presented to 10 naïve participants, who identified these items in a free response task. Only images appropriately named by at least seven of the ten participants were included.

A male native speaker of American English recorded the auditory stimuli. Words did not differ in overall length between density conditions (LD = 372 ms; HD = 360 ms; t(40) < 1).

Subjects

Sixteen participants from the University of Iowa participated in exchange for partial course credit. None had participated in the free response picture norming test. All were native speakers of American English, had no history of speech or neurological disorders, and reported normal vision and hearing.

Procedure

Participants were tested in a single 1-h session. Eye movements were tracked using a head-mounted SR Research EyeLink II. At the beginning of the experiment, the eye-tracker was calibrated using the standard nine-point calibration procedure. Drift correction was performed every 36 trials to account for shifts in head position or tracker movement. Fixations were monitored at 250 Hz and automatically parsed into saccades, fixations, and blinks using the default EyeLink parameters. Adjacent saccades and fixations were combined into a “look,” starting at saccade onset, and ending at the fixation offset (McMurray, Tanenhaus, & Aslin, 2002).
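The merging of saccades and fixations into looks can be sketched as follows. The event format and the function name are hypothetical, and the treatment of blinks (a blink discards any saccade not yet followed by a fixation) is an assumption made for this illustration rather than a detail of the procedure.

```python
def merge_into_looks(events):
    """Combine each saccade with the fixation that follows it into one look.

    events: chronologically ordered (kind, start_ms, end_ms) tuples, where
    kind is "saccade", "fixation", or "blink" (hypothetical format).
    Returns a list of (look_start_ms, look_end_ms) tuples.
    """
    looks = []
    saccade_start = None
    for kind, start, end in events:
        if kind == "saccade":
            # Remember the onset of the most recent saccade.
            saccade_start = start
        elif kind == "fixation":
            # A look runs from saccade onset (if any) to fixation offset.
            look_start = start if saccade_start is None else saccade_start
            looks.append((look_start, end))
            saccade_start = None
        else:
            # Assumption: a blink breaks the link between a saccade and a later fixation.
            saccade_start = None
    return looks


example_events = [
    ("saccade", 100, 140),
    ("fixation", 140, 400),
    ("blink", 400, 450),
    ("fixation", 450, 600),
]
example_looks = merge_into_looks(example_events)
```

Each look would then be assigned to whichever picture the fixation landed on before fixation proportions are computed.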

Each trial presented four images, one in each corner of the screen, and a blue fixation dot at the center of the screen. This dot turned red after 500 ms, at which point the participant clicked it to initiate the trial. Once clicked, the dot disappeared, and an auditory stimulus identified the target. Participants then clicked the named picture.

Critical trials contained a target (e.g., ‘horse’), an item semantically related to this target (e.g., ‘saddle’), and two unrelated items (see Appendix). Placement of the target, competitor, and unrelated foils on the screen was randomized across trials. Unrelated items were counterbalanced between subjects, such that unrelated items in high density trials for one half of subjects were then used as low density filler items for the other half of subjects. Thus, effects of density could not arise from differential salience of unrelated items in one condition.

Participants completed four practice trials prior to the experiment. The items in the practice trials were all semantically and phonologically unrelated.

Results

One subject received only 139 trials due to a computer error. All others completed 144 trials. Trials in which the subject did not select the target were excluded. Across all subjects, this totaled 33 trials (of 1,149 experimental trials, 2.9%).

The proportion of trials in which each picture type (target, semantically related, and unrelated) was fixated was computed every 4 ms for each density condition. Since it takes approximately 200 ms to program a saccade (Matin, Shao, & Boff, 1993), and the auditory target was preceded by 100 ms of silence, only fixations initiated 300 ms after stimulus onset were examined. Analysis ended 1,000 ms after trial onset (average length of the auditory stimuli was 529 ms). This was selected because looks to the target declined after this window, indicating completed lexical access.
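The binning described above can be sketched as follows, assuming each trial is stored as a list of looks tagged with the picture type being fixated. The data layout, the labels, and the function name are hypothetical; only the 4-ms step and the 300 to 1,000 ms window come from the analysis as described.

```python
PICTURE_TYPES = ("target", "related", "unrelated1", "unrelated2")  # hypothetical labels


def fixation_proportions(trials, start_ms=300, end_ms=1000, step_ms=4):
    """Proportion of trials with a look on each picture type at each time step.

    trials: list of trials; each trial is a list of (start_ms, end_ms, picture_type)
    looks, with times measured from trial onset.
    Returns (times, props), where props[picture_type][i] is the proportion at times[i].
    """
    if not trials:
        raise ValueError("at least one trial is required")
    times = list(range(start_ms, end_ms + 1, step_ms))
    props = {kind: [] for kind in PICTURE_TYPES}
    for t in times:
        for kind in PICTURE_TYPES:
            # Count trials in which some look on this picture type covers time t.
            n = sum(
                any(s <= t < e and obj == kind for s, e, obj in trial)
                for trial in trials
            )
            props[kind].append(n / len(trials))
    return times, props


# Two invented trials for illustration.
toy_trials = [
    [(250, 500, "related"), (500, 1200, "target")],
    [(320, 1200, "target")],
]
toy_times, toy_props = fixation_proportions(toy_trials)
```

Running this separately on the LD and HD trials would give one set of curves per density condition.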

Throughout the time course of processing, semantically related words were fixated more than unrelated words for both density conditions (see Fig. 3). There were also fewer looks to the semantically related item in the HD than LD condition, suggesting a reduction in the magnitude of semantic priming in this condition.

Fig. 3

The proportion of trials on which subjects were fixating the target, semantically related item and unrelated pictures as a function of time. a Trials in which the stimulus was a low density word. b High density trials

To examine the time course of semantic priming, the magnitude of priming (MoP) was computed by subtracting the proportion of looks to the average of the two unrelated items from the proportion of looks to the semantically related item. This parallels the MoP measure used in traditional semantic priming paradigms. Logistic functions were fit to MoP as a function of time (see McMurray, Samelson, Lee, & Tomblin, 2010). Logistic functions are defined by four values: the minimum, maximum (upper asymptote of MoP), slope (how rapidly the function approaches its asymptote), and crossover point (the midpoint of the rising portion of the curve – this gives a measure of the timing of MoP). In terms of the hypotheses outlined in Fig. 2, the factors of primary interest are the maximum, which will reveal if the MoP differs between conditions (predicted by cascade accounts); the crossover, which will reveal if activation of the semantic associate is delayed for one condition (predicted by autonomous accounts); and the slope, which will reveal if one condition more rapidly approaches its peak MoP.
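To make the measure and the curve concrete, the sketch below computes MoP from fixation proportions and evaluates a four-parameter logistic. The exact parameterization is an assumption: here the slope parameter is defined as the derivative of the curve at the crossover point, which is one common choice, and the function names are hypothetical.

```python
import math


def magnitude_of_priming(related, unrelated1, unrelated2):
    """MoP at each time step: looks to the related item minus the mean of the two unrelated items."""
    return [r - (u1 + u2) / 2.0 for r, u1, u2 in zip(related, unrelated1, unrelated2)]


def logistic4(t, minimum, maximum, slope, crossover):
    """Four-parameter logistic evaluated at time t.

    Assumed parameterization: slope is the derivative at the crossover, so the
    curve passes through the midpoint of minimum and maximum at t == crossover.
    Requires maximum != minimum.
    """
    height = maximum - minimum
    return minimum + height / (1.0 + math.exp(4.0 * slope * (crossover - t) / height))


# Invented proportions at three time steps.
toy_mop = magnitude_of_priming([0.10, 0.20, 0.30], [0.05, 0.10, 0.10], [0.05, 0.10, 0.20])
```

With this parameterization, a lower maximum lowers the plateau of the curve, a later crossover shifts the whole rise rightward in time, and a smaller slope flattens the rise, which maps onto the three contrasts among the accounts in Fig. 2.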

Slope differences may be difficult to disentangle as a shallower slope can arise from averaging across subjects with steep but variable boundaries, or from a genuinely shallower slope. Moreover, our analysis relied on a jackknife procedure (see below) that makes differentiating these sources difficult. However, the non-autonomous accounts (Fig. 2b, d) predict a lower maximum MoP for the HD condition, whereas the autonomous accounts (Fig. 2a, c) predict a later crossover point for HD items, as they hold that phonological competition delays semantic access. Thus, these two factors alone may be sufficient.

Because each subject completed very few trials per condition (36), it was difficult to obtain accurate fits for individual subjects (see Footnote 1). Thus, we adapted the jackknife method of Miller, Patterson, and Ulrich (1998; see McMurray, Clayards, Tanenhaus, & Aslin, 2008, for a VWP application) by first computing the average MoP in the entire dataset, excluding a single subject. We then fit the logistic function to these data and extracted the slope, crossover, and asymptotes. This was repeated, excluding each subject in turn. The slope, crossover, and asymptotes can be compared across conditions using standard t statistics, with conservative adjusted error terms.
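The leave-one-out logic can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code: the function names are hypothetical, and the error term shown is the usual jackknife standard error, sqrt((n - 1)/n * sum of squared deviations), which may differ in detail from the adjustment actually applied. In practice, each leave-one-out curve would be passed to a logistic fit, and the fitted parameters would be the estimates compared.

```python
import math


def leave_one_out_means(curves):
    """Average curve with each subject excluded in turn.

    curves: one equal-length list of MoP values per subject (at least two subjects).
    Returns one averaged curve per excluded subject, in subject order.
    """
    n = len(curves)
    totals = [sum(column) for column in zip(*curves)]
    return [
        [(total - curve[i]) / (n - 1) for i, total in enumerate(totals)]
        for curve in curves
    ]


def jackknife_t(estimates_a, estimates_b):
    """t statistic for a condition difference computed from leave-one-out estimates.

    estimates_a, estimates_b: one parameter estimate per leave-one-out sample
    (e.g., the fitted maximum in each condition). Leave-one-out estimates vary
    far less than individual estimates, so the error term is inflated by the
    factor (n - 1) / n inside the square root instead of being divided by n.
    Raises ZeroDivisionError if all differences are identical.
    """
    n = len(estimates_a)
    diffs = [a - b for a, b in zip(estimates_a, estimates_b)]
    mean_diff = sum(diffs) / n
    se = math.sqrt((n - 1) / n * sum((d - mean_diff) ** 2 for d in diffs))
    return mean_diff / se


# Invented data: three subjects, two time steps.
toy_curves = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
toy_loo = leave_one_out_means(toy_curves)
```

The resulting statistic would be evaluated against a t distribution with n - 1 degrees of freedom, which matches the 15 degrees of freedom reported for the 16 subjects.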

Figure 4 shows the MoP data plotted along with the mean logistic fits. Curve fits were excellent (mean R² = .97; range: .93 to .99). Both conditions showed significant priming, as indicated by a maximum MoP greater than 0 (HD: t1(15) = 5.4, p < .01; t2(35) = 20.1, p < .01; LD: t1(15) = 7.2, p < .01; t2(35) = 31.2, p < .01). However, the LD condition attained a higher overall magnitude of priming than the HD condition across time (t1(15) = 4.3, p < .01; t2(70) = 2.9, p < .01). The crossover point did not differ significantly (t1 < 1, t2 < 1), though there was a slight trend toward later activation for the high density items (HD: M = 510, SD = 125; LD: M = 479, SD = 28). Similarly, the slope did not differ significantly (t1(15) = 1.08, p = .30; t2(70) = 1.29, p = .20), though differences trended in the direction predicted by the continuous cascade account, with steeper activation increases in the LD condition (HD: M = .00022, SD = .00033; LD: M = .00081, SD = .00036). Overall, the primary effect of neighborhood density was on the magnitude of semantic priming, whereas the timing and rate of semantic activation were less affected.

Fig. 4

Time course of the magnitude of the semantic priming effect in each density condition. Mean logistic functions are overlaid on the raw data

Discussion

This study demonstrated that the size or density of a word’s phonological neighborhood affects the magnitude of semantic access. HD target stimuli elicited fewer looks to pictures that were semantically related than did LD stimuli. The timing of these looks did not differ between conditions, yet LD words showed a greater peak level of activation. Thus, phonological competition affects the strength of activation of semantically related items across time during word recognition. These findings support earlier work (Marslen-Wilson, 1990; Zwitserlood, 1989) and provide evidence for cascading activation between phonological and semantic processes during lexical access.

Considering our results in light of the predictions presented in Fig. 2, at least two can be ruled out. Our data cannot be accounted for by autonomous accounts predicting that phonological competition delays the onset of semantic activation (Fig. 2a), but does not affect the magnitude, as the peak MoP differed, whereas the timing did not. The combination of delay in onset of activation and decreased peak activation (Fig. 2c) also fails to account for the similar timing of activation between conditions.

This leaves the accounts predicting similar timing across conditions, but different maximum levels of priming (Fig. 2b and d). The non-significant slope difference seems to support the predictions of Fig. 2d that semantic access always begins at the same time, whereas the peak MoP reflects the degree of lexical activation. However, this account may be untenable since it predicts that the onset of priming is delayed a fixed amount for all words, and the most reasonable time point for such a delay would either be at the end of the word or at the uniqueness point. Yet, as seen in Fig. 4, semantic priming begins very quickly after the onset of the word, well before the entire word can be processed (given the 200-ms oculomotor delay) or the uniqueness point is reached. The pattern of findings shown in Fig. 4 further suggests that rather than a fixed delay, semantic access appears to begin almost simultaneously with lexical access. Additionally, it is not clear why, in the account of Fig. 2d, the ultimate MoP is a function of phonological competition, but the timing of semantic access is entirely unrelated to phonological competition. No extant theoretical accounts predict this pattern of findings for lexical access as a function of lexical competition. The lack of a sound theoretical backing for this account suggests that the non-significant slope effect may be an issue of statistical power rather than an argument against the cascade approach (as previously discussed, the jackknife analysis may have obscured true slope effects). Given the theoretical backing for the hypothesis of activation continuously cascading from lexical to semantic levels and the trend toward different slopes between the conditions, this hypothesis as exemplified in Fig. 2b appears the most tenable; even if the slope effect proves illusory, some form of cascading activation appears necessary to explain the peak MoP results.

These results challenge autonomous models in which gradations in activation affect only the current level of processing and have limited influence on later stages (e.g., Levelt et al., 1999). They are consistent with interactive activation approaches to word recognition (McClelland & Elman, 1986) in which activation levels continuously influence other stages of processing. They are especially consistent with the predicted interaction between activation levels and semantic access in DCM (Gaskell & Marslen-Wilson, 1997, 1999). Differences in the degree of phonological competition affect the activation of a target item and have a cascading effect on semantic activation, with words in dense neighborhoods showing decreased semantic priming.

However, while our results are largely consistent with DCM, one area shows discrepancy. DCM employs distributed lexical/semantic representations that create limitations on the number of candidates that can be effectively evaluated. As such, increasing the number of coactive candidates weakens semantic activation, and with a large enough cohort, semantic activation disappears. When lexical information is ambiguous between two words, activation across these nodes is averaged between the words’ semantic representations. If many lexical candidates with different semantic representations are active, the activation in semantic space becomes uninformative due to this averaging. While the current study showed decreased priming for HD words, in accordance with the predicted decrease in activation in DCM, even our HD items yielded significant semantic priming. This suggests that either the ceiling for competition is beyond the mean of 24 competitors in the HD condition, or the mechanisms of competition in DCM are not appropriately structured to explain semantic activation in dense neighborhoods. A sparser (or even localist) representation may be required.

In sum, this study demonstrates that phonological competition has cascading effects on semantic access, such that words with high degrees of phonological competition show decreased access to their semantic networks compared to words from sparser neighborhoods. These findings add to the body of evidence arguing for real-time interactions between lexical processes and other domains of cognition and perception (e.g., Dahan & Tanenhaus, 2004; Levy, Bicknell, Slattery, & Rayner, 2009; Magnuson, McMurray, Tanenhaus, & Aslin, 2003; Revill, Tanenhaus, & Aslin, 2008).