Semantic richness effects in lexical decision: The role of feedback

Yap, Melvin J.; Lim, Gail Y.; Pexman, Penny M.

doi:10.3758/s13421-015-0536-0

Published: 09 July 2015

Volume 43, pages 1148–1167, (2015)

Memory & Cognition

Abstract

Across lexical processing tasks, it is well established that words with richer semantic representations are recognized faster. This suggests that the lexical system has access to meaning before a word is fully identified, and is consistent with a theoretical framework based on interactive and cascaded processing. Specifically, semantic richness effects are argued to be produced by feedback from semantic representations to lower-level representations. The present study explores the extent to which richness effects are mediated by feedback from lexical- to letter-level representations. In two lexical decision experiments, we examined the joint effects of stimulus quality and four semantic richness dimensions (imageability, number of features, semantic neighborhood density, semantic diversity). With the exception of semantic diversity, robust additive effects of stimulus quality and richness were observed for the targeted dimensions. Our results suggest that semantic feedback does not typically reach earlier levels of representation in lexical decision, and further reinforce the idea that task context modulates the processing dynamics of early word recognition processes.

Introduction

Across a number of lexical processing paradigms, including perceptual identification, lexical decision (i.e., classifying letter strings as words or nonwords such as flirp), speeded pronunciation (i.e., reading letter strings aloud), and semantic categorization (e.g., classifying words as animate or inanimate), it is well established that semantically rich words, which are associated with relatively more semantic information, are recognized faster (Pexman, Hargreaves, Siakaluk, Bodner, & Pope, 2008; Yap, Pexman, Wellsby, Hargreaves, & Huff, 2012). Importantly, the richness of a word’s semantic representation is not a unitary construct and can be reflected by a number of dimensions, including the number of semantic features associated with its referent (McRae, Cree, Seidenberg, & McNorgan, 2005), its semantic neighborhood density (Shaoul & Westbury, 2010), its number of senses (Hoffman, Lambon Ralph, & Rogers, 2013; Miller, 1990), the number of distinct first associates elicited by the word in free association (Nelson, McEvoy, & Schreiber, 1998), imageability, the extent to which the word evokes mental imagery (Cortese & Fugett, 2004), body-object interaction, the extent to which a human body can interact with the word’s referent (Siakaluk, Pexman, Aguilera, Owen, & Sears, 2008), sensory experience ratings, the extent to which a word evokes a sensory or perceptual experience (Juhasz & Yap, 2013), and emotional valence (i.e., whether a word is positive, negative, or neutral; Yap & Seow, 2014).

These findings collectively converge on the idea that the lexical system has access to meaning before a word is fully identified (Balota, 1990). While the mere existence of meaning-based influences on visual word recognition is no longer contentious, the processes and mechanisms underlying these influences remain poorly understood (for reviews, see Balota, Ferraro, & Connor, 1991; Pexman, 2012). For example, the role of word meaning is minimal in theories of lexical access (Larsen, Mercer, Balota, & Strube, 2008), and this is reflected in how computational models of word recognition have generally not implemented semantics (but see Harm & Seidenberg, 2004, for a notable exception).

Richness effects through semantic feedback

An influential theoretical framework used to explain semantic richness effects is based on the interactive activation and competition (IAC) model of letter perception (McClelland & Rumelhart, 1981). The IAC model was originally proposed to explain the word superiority effect, which refers to the counterintuitive finding that letters are identified more accurately when embedded in words, compared to when presented in isolation (Reicher, 1969; Wheeler, 1970). As can be seen in Fig. 1, the IAC model has three levels of representation (features, letters, and words), and is both interactive (i.e., activation can flow bidirectionally between levels) and cascaded (i.e., as soon as processing at a level begins, it sends activation to the next level). Cascaded processing (McClelland, 1979) contrasts sharply with thresholded processing (Sternberg, 1969), in which a later process begins only after an earlier process is completed.

As nodes at the word level receive activation, they begin to provide feedback to position-specific letter nodes (e.g., ‹c› receives feedback activation from cat). In sum, the additional top-down influence of word- on letter-level representations accounts for the word superiority effect. Using an embellished interactive activation model which incorporates meaning-level representations (see Fig. 2), Balota (1990; see also Balota et al., 1991) suggested that semantic influences on word recognition can be similarly accommodated by feedback from semantic-level to lexical-level (i.e., word-level) representations. Specifically, semantically richer words (e.g., high-imageability words or words with many semantic features) generate more semantic-level activity, resulting in stronger feedback to lexical-level units. If we assume that lexical decision and speeded pronunciation responses are respectively driven by lexical-level orthographic and phonological activity, the semantic feedback received by lexical-level units will consequently speed up lexical decision and pronunciation times (Hino & Lupker, 1996; Pexman, Lupker, & Hino, 2002). Feedback activation from phonological to orthographic representations has also been invoked as an explanation for the homophone effect, which refers to the finding that words like maid and made (i.e., words with multiple spellings but a common pronunciation) produce longer lexical decision latencies than control words (Pexman, Lupker, & Jared, 2001; Rubenstein, Lewis, & Rubenstein, 1971). According to Pexman et al. (2001), presenting a homophone (e.g., maid) activates its phonology (i.e., /meɪd/) which, via feedback, activates the homophone’s mate made. Competition between the two orthographic representations (i.e., maid and made) delays responses to homophones.

While the feedback activation account is predicated on the idea that lexical-level activity drives responses on word recognition tasks, there exist competing theoretical accounts which can accommodate semantic richness effects in lexical decision without requiring semantics-to-orthography feedback. For example, according to Borowsky and Besner’s (1993) multistage activation model, lexical decisions are primarily based on activity within the semantic system; such a framework yields semantic effects without feedback. However, Pexman and Lupker (1999) have argued that certain empirical findings are difficult to reconcile with this perspective. Specifically, if we assume that lexical decisions are driven by semantic-level activity, it is unclear how a common process can simultaneously explain effects of homophony (i.e., slower responses for homophones) and number of senses (i.e., faster responses for words with many senses) in lexical decision. For example, suppose the delayed responses for homophones (e.g., maid) are due to their activating multiple semantic representations (i.e., those for made and maid) which subsequently compete with each other, thereby prolonging semantic settling times. If this view is correct, then words with many senses (e.g., bank), which map onto multiple semantic representations, should also elicit slower responses. However, when the effects of number of senses and homophony were examined simultaneously within the same lexical decision experiment, response times (RTs) were slower for homophones but faster for words with many senses (Pexman & Lupker, 1999). The feedback account explains these findings in a principled and unified manner. Specifically, feedback from phonological to orthographic representations underlies the homophone effect, while feedback from semantic to orthographic representations underlies the number of senses effect.

The present study

In summary, the available evidence is consistent with the idea that feedback activation between different levels of representation in the lexical system is necessary for accommodating both semantic richness and homophone effects in the word recognition literature. While researchers have explored feedback from semantic- to lexical-level representations (Pexman et al., 2002), and from phonological to orthographic representations (Pexman et al., 2001), the role of word-to-letter feedback has received less attention. As described earlier, the classic explanation for the word superiority effect is based on the top-down influence of word- on letter-level representations (McClelland & Rumelhart, 1981). As a result, the architectural assumption of word-to-letter feedback is a fundamental aspect of influential word recognition models, including the dual-route cascaded (DRC) model (Coltheart, Rastle, Perry, Langdon, & Ziegler, 2001), the multiple read-out model (Grainger & Jacobs, 1996), the bimodal interactive activation framework (Grainger, Muneaux, Farioli, & Ziegler, 2005), and the CDP+ and CDP++ models (Perry, Ziegler, & Zorzi, 2007; Perry, Ziegler, & Zorzi, 2010).

More pertinently, the interaction between semantic priming and target degradation has been explained using semantic feedback to letter-level representations by way of lexical-level representations. For example, in lexical decision, words are recognized more quickly when preceded by a semantically related word (e.g., doctor – NURSE) than by an unrelated control (e.g., porter – NURSE); this is known as the semantic priming effect. A robust finding in the semantic priming literature is that semantic priming effects are larger when targets are visually degraded, compared to when they are presented clearly (Balota, Yap, Cortese, & Watson, 2008; Meyer, Schvaneveldt, & Ruddy, 1975). Using an interactive activation framework (Stolz & Besner, 1996; 1998) much like the one depicted in Fig. 2, McNamara (2005) suggested that this interaction arises because the presentation of a prime word (e.g., doctor) activates the semantic representations of related concepts (e.g., nurse, medicine, sick), and these related concepts, through feedback pathways, will then preactivate their respective lexical- and letter-level representations (see also Brown, Stolz, & Besner, 2006). As a consequence of this compensatory feedback, targets preceded by related, compared to unrelated, primes will be disrupted to a lesser extent by visual degradation, thereby yielding the overadditive priming × stimulus quality interaction.

Despite the pervasiveness of the assumption that meaning-level information reaches the letter level, this assumption has not, to our knowledge, been empirically tested. In two experiments, we explore the role of word-to-letter feedback in mediating semantic richness effects, by studying the joint effects of stimulus quality (clear vs. degraded) with four theoretically important richness dimensions (E1: imageability & number of features; E2: semantic neighborhood density & number of senses). Assuming that semantic richness effects reflect partially activated letter-level representations, the predictions are straightforward. Specifically, in addition to the main effects of stimulus quality and richness, one should observe an overadditive interaction wherein the effects of stimulus degradation are smaller for words which are semantically richer.
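The contrast between the additive and overadditive patterns can be made concrete with a small sketch. The cell means below are hypothetical values chosen for illustration, not data from the present experiments, and the function name `interaction_contrast` is likewise an assumption.

```python
def interaction_contrast(clear_rich, clear_poor, degraded_rich, degraded_poor):
    """Degradation effect for semantically poor words minus that for rich words (ms)."""
    effect_rich = degraded_rich - clear_rich
    effect_poor = degraded_poor - clear_poor
    return effect_poor - effect_rich

# Hypothetical cell means in ms (NOT data from the study).
additive = interaction_contrast(clear_rich=550, clear_poor=580,
                                degraded_rich=630, degraded_poor=660)
overadditive = interaction_contrast(clear_rich=550, clear_poor=580,
                                    degraded_rich=620, degraded_poor=680)

print(additive)      # 0: degradation costs 80 ms regardless of richness
print(overadditive)  # 30: degradation costs 100 ms for poor words but 70 ms for rich words
```

A contrast near zero corresponds to additive effects, whereas a reliably positive contrast corresponds to the overadditive interaction predicted if semantic feedback reaches letter-level representations.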

In order to characterize observed effects in a more fine-grained manner, the data are examined both at the level of mean RTs and at the level of RT distributional characteristics. Analyzing the influence of factors on mean RTs alone has been shown to be inadequate and indeed sometimes misleading (see Balota & Yap, 2011, for a review). For example, Heathcote, Popiel, and Mewhort (1991) examined color-naming RTs to congruent (e.g., RED displayed in red) and neutral (e.g., XXX displayed in red) Stroop stimuli, and found no difference in mean RTs. However, when they analyzed the effect of variables on different portions of the RT distributions, they found a facilitatory effect of congruency (i.e., congruent faster than neutral) on the modal portion of the RT distribution but an inhibitory effect (i.e., congruent slower than neutral) in the slow tail of the distribution. These opposing effects cancelled each other out, thereby producing a spurious null effect in means.

In the present study, empirical RT distributions are fitted to the theoretical ex-Gaussian function, which is a convolution of a normal and exponential distribution. This yields three parameter estimates: μ and σ (mean and standard deviation of the normal distribution) and τ (mean of the exponential distribution). Ex-Gaussian analysis allows us to evaluate the extent to which an effect is reflected by distributional shifting (μ) and/or an increase in the tail of the distribution (τ). These analyses are complemented by quantile plots, which provide a graphic representation of distributional effects. These distributional analyses will fulfill two important objectives. First, our results will help shed more light on the impact of semantic richness on RT distributions. For example, Yap and Seow (2014) reported that emotional valence effects in lexical decision (i.e., slower responses to neutral words, relative to positive and negative words) reflected both distributional shifting and an increase in the tail of the distribution. These results are difficult to reconcile with the view that valence effects in lexical decision are fully attributable to early, preconscious processes (cf. Kousta, Vinson, & Vigliocco, 2009); relatively automatic effects (e.g., masked repetition or semantic priming) are typically mediated exclusively by distributional shifting (Balota et al., 2008; Gomez, Perea, & Ratcliff, 2013). Instead, the findings are more consistent with the idea that positive and negative words, which are semantically richer, elicit stronger semantic feedback to word-level representations, thereby making lexical decision less attentionally demanding (Balota & Chumbley, 1984) for such words. It is unclear if other semantic richness effects (e.g., imageability, number of features, semantic neighborhood density, number of senses) are similarly mediated by distributional shifting and changes in the slow tail.
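As a rough illustration of the ex-Gaussian function described above, the following sketch evaluates its density and checks numerically that the mean of the distribution is approximately μ + τ. The parameter values (470, 40, 120 ms) and the helper names are assumptions made for illustration only; this is not the QMLE fitting procedure used in the study.

```python
import math

def exgauss_pdf(x, mu, sigma, tau):
    """Density of a Normal(mu, sigma) convolved with an Exponential with mean tau."""
    z = (x - mu) / sigma - sigma / tau
    # Standard normal CDF expressed through the complementary error function.
    phi = 0.5 * math.erfc(-z / math.sqrt(2.0))
    return (1.0 / tau) * math.exp((mu - x) / tau + sigma ** 2 / (2.0 * tau ** 2)) * phi

def moments(mu, sigma, tau, lo=0.0, hi=3000.0, step=1.0):
    """Approximate the total probability and the mean by a simple Riemann sum."""
    n = int((hi - lo) / step)
    xs = [lo + i * step for i in range(n + 1)]
    ps = [exgauss_pdf(x, mu, sigma, tau) for x in xs]
    total = sum(ps) * step
    mean = sum(x * p for x, p in zip(xs, ps)) * step
    return total, mean

# Hypothetical parameter values in ms (NOT estimates from the study).
total, mean_rt = moments(mu=470.0, sigma=40.0, tau=120.0)
print(total)    # should be close to 1
print(mean_rt)  # should be close to mu + tau
```

Under this parameterization, an effect confined to μ shifts the whole distribution, whereas an effect on τ stretches the slow tail while leaving the leading edge largely unchanged.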

More importantly, there is compelling evidence that semantic richness effects do not tap a single undifferentiated dimension, but instead reflect distinct theoretical frameworks (Pexman, Siakaluk, & Yap, 2014). Consistent with this, intriguing between-task dissociations have been reported in the literature. For example, semantic neighborhood density facilitates lexical decision performance, but has no effect on semantic classification performance (Yap et al., 2012). Likewise, while words with more senses (i.e., more ambiguous) enjoy a processing advantage in lexical decision, the effect of ambiguity is less clear in tasks which place an emphasis on semantic activation, such as semantic categorization or semantic relatedness (i.e., are these two words related?). Specifically, there is in some cases an ambiguity disadvantage in semantic relatedness (Hoffman & Woollams, 2015; Pexman, Hino, & Lupker, 2004; Piercey & Joordens, 2000) while ambiguity effects are either inhibitory or null in semantic categorization (Hino, Lupker, & Pexman, 2002). By ascertaining how stimulus quality and semantic variables modulate the shape, rather than just the mean, of distributions, one may find dissociations that are apparent only at the level of distributional characteristics.

Experiment 1

Method

Participants

Forty undergraduates (31 females) from the National University of Singapore participated for partial course credit. The participants’ first language was English, and they had normal or corrected-to-normal vision.

Design

Two 2 × 2 designs were incorporated within the same experiment, with non-overlapping items used to examine the effects of each variable. Specifically, we examined Stimulus Quality (clear or degraded) × Imageability (high or low) and Stimulus Quality × Number of Features (high or low). All variables were manipulated within-participants and the dependent variables were RTs and accuracy rates.

Stimuli

A total of 240 words (see Appendix for a full list of stimuli) were selected, with 120 words (60 high and 60 low) each for imageability and number of features. Imageability ratings were based on the norms collected by Cortese and Fugett (2004) and Schock, Cortese, and Khanna (2012). Number of feature values were taken from McRae et al. (2005). Word sets in each of the experimental conditions were matched on number of letters, number of syllables, orthographic neighborhood size, log-transformed subtitle-based contextual diversity (Brysbaert & New, 2009), and relevant semantic variables (see Table 1 for descriptive statistics). In addition, 240 nonwords (120 for each semantic richness dimension) were generated using the multilingual pseudoword generator, Wuggy (Keuleers & Brysbaert, 2010). These nonwords were matched to their yoked controls on number of syllables and number of letters, as well as subsyllabic structure and transition frequencies.

Table 1 Descriptive statistics for the word and nonword stimuli used in Experiment 1

Procedure

PC-compatible computers running E-prime software (Schneider, Eschman, & Zuccolotto, 2001) were used for stimulus presentation and data collection. Participants were individually tested in sound-attenuated cubicles, and positioned approximately 60 cm from the computer screen. Participants were instructed to decide whether the letter string presented formed a word or nonword by making the appropriate button press (slash key for words and Z key for nonwords). Participants were encouraged to respond quickly but not at the expense of accuracy. There were 20 practice trials, followed by six experimental blocks of 80 trials each, with breaks between blocks. The order in which stimuli were presented was randomized anew for each participant. Stimuli were presented in uppercase 14-point Courier New, and each trial comprised the following order of events: (a) a fixation point (+) at the center of the monitor for 400 ms, (b) a blank screen for 400 ms, and (c) the target. The target remained on the screen for 4,000 ms or until a response was made. If a response was incorrect, a 170-ms tone was presented simultaneously with the word “Incorrect” displayed slightly below the fixation point for 450 ms. Half the targets were degraded by rapidly alternating letter strings with a randomly generated mask of the same length. For example, the mask @$#&% was presented for 14 ms, followed by a five-letter target word for 28 ms; the two rapidly alternated until a response was detected. Mask patterns were consistent within a trial, and were generated from random permutations of the following symbols: &@?!$*%#?. Across participants, targets were counterbalanced across degraded and clear conditions. This degradation method has been used in a number of studies (Balota et al., 2008; Thomas, Neely, & O’Connor, 2012; Yap & Balota, 2007; Yap, Tse, & Balota, 2009) and has been shown to yield qualitatively similar effects to contrast reduction (O’Malley, Reynolds, & Besner, 2007).
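The degradation manipulation can be sketched as follows. This is a minimal illustration under stated assumptions rather than the E-Prime script used in the experiment: the function names, the fixed number of cycles (the actual alternation continued until a response), and the handling of targets longer than the symbol set are all hypothetical.

```python
import random

MASK_SYMBOLS = "&@?!$*%#?"  # symbol set as listed in the Procedure

def make_mask(target, rng):
    """Build a mask of the target's length from a random permutation of the symbols."""
    if len(target) > len(MASK_SYMBOLS):
        # Assumption: the source does not say how longer targets were handled.
        raise ValueError("target is longer than the symbol set")
    symbols = list(MASK_SYMBOLS)
    rng.shuffle(symbols)
    return "".join(symbols[:len(target)])

def degraded_schedule(target, rng, mask_ms=14, target_ms=28, cycles=3):
    """Return (stimulus, duration in ms) frames; the same mask is reused within a trial."""
    mask = make_mask(target, rng)
    frames = []
    for _ in range(cycles):
        frames.append((mask, mask_ms))
        frames.append((target.upper(), target_ms))
    return frames

frames = degraded_schedule("nurse", random.Random(0))
for stimulus, duration in frames:
    print(stimulus, duration)
```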

Results and discussion

Response errors (8.3 % across all conditions) were first excluded from the analyses. Responses faster than 200 ms or slower than 3,000 ms were then eliminated before a mean and standard deviation (SD) were computed for each participant. RTs beyond 2.5 SDs from each participant’s mean were excluded, removing a further 2.6 % of the responses. Estimates for ex-Gaussian parameters (μ, σ, τ) were obtained using the quantile maximum likelihood estimation (QMLE) procedure in the QMPE program (Version 2.18; Cousineau, Brown, & Heathcote, 2004). QMLE has the benefit of providing unbiased parameter estimates and is particularly effective when fitting small samples (Heathcote & Brown, 2004). All fits converged successfully within 400 iterations. The mean RTs, accuracy rates, and ex-Gaussian parameters are presented in Table 2. All effects were analyzed with two-way ANOVAs.
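The two-step trimming procedure can be sketched as below for a single participant's correct RTs. The function name, the use of the sample (n − 1) standard deviation, and the example RTs are assumptions for illustration; the source does not specify these details.

```python
from statistics import mean, stdev

def trim_rts(rts, lo=200, hi=3000, criterion=2.5):
    """Apply absolute cutoffs, then drop RTs beyond `criterion` SDs of the participant's mean."""
    kept = [rt for rt in rts if lo <= rt <= hi]
    m, sd = mean(kept), stdev(kept)  # assumption: sample SD with n - 1 in the denominator
    return [rt for rt in kept if abs(rt - m) <= criterion * sd]

# Hypothetical correct RTs (ms) for one participant, NOT data from the study.
raw = [150, 520, 560, 590, 610, 640, 580, 600, 570, 630, 1900, 3500]
trimmed = trim_rts(raw)
print(trimmed)  # 150 and 3500 fail the absolute cutoffs; 1900 exceeds 2.5 SDs
```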

Table 2 Mean response times (RTs) and accuracy rates as a function of imageability/number of features and stimulus quality

Imageability

For RTs, the main effect of Imageability was significant by participants, F_p(1, 39) = 25.89, p < .001, MSE = 1015.34, η_p² = .40, but not by items, p = .14; RTs were faster for high-imageability words (M = 596 ms) than for low-imageability words (M = 621 ms). The main effect of Stimulus Quality was significant by participants, F_p(1, 39) = 119.96, p < .001, MSE = 2178.28, η_p² = .75, and by items, F_i(1, 118) = 194.87, p < .001, MSE = 2116.70, η_p² = .62; RTs were faster for clear words (M = 568 ms) than for degraded words (M = 649 ms). The Stimulus Quality × Imageability interaction was not significant by participants or by items, Fs < 1. In order to establish the robustness of the non-significant by-participants interaction in RTs (see Gomez & Perea, 2014), we used the package BayesFactor (Morey, Rouder, & Jamil, 2015) to compute Bayes factors (BFs) for the various alternative hypotheses in our design (see Rouder, Morey, Speckman, & Province, 2012) against the null hypothesis that there are no differences across conditions. For example, a BF of 10 means that there is 10:1 evidence in favor of the specific alternative hypothesis being tested. The additive (i.e., two main effects) model was preferred over all other models, BF = 3.97 × 10²³, compared to the model with the interaction, BF = 8.77 × 10²². Put another way, the data were 4.53 (i.e., 3.97 × 10²³ / 8.77 × 10²²) times more likely to occur under the additive model, compared to the interactive model. Turning to accuracy rates, the main effect of Imageability was not significant by participants or by items, Fs < 1. The main effect of Stimulus Quality was significant by participants, F_p(1, 39) = 11.94, p = .001, MSE = .002, η_p² = .23, and by items, F_i(1, 118) = 8.42, p = .004, MSE = .005, η_p² = .07; accuracy rates were higher for clear words (M = .93) than for degraded words (M = .90). The Stimulus Quality × Imageability interaction was not significant by participants or by items, Fs < 1.
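Two of the quantities reported above can be recovered by simple arithmetic, sketched here. Partial eta squared is computed from the F ratio and its degrees of freedom using the standard identity η_p² = (F × df_effect) / (F × df_effect + df_error), and the relative preference for the additive model is the ratio of the two Bayes factors. The function name is an assumption; the numerical inputs are taken from the text.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared implied by an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# By-participants RT effects for Imageability and Stimulus Quality.
eta_imageability = partial_eta_squared(25.89, 1, 39)
eta_quality = partial_eta_squared(119.96, 1, 39)

# Ratio of the additive-model BF to the interactive-model BF.
bf_ratio = 3.97e23 / 8.77e22

print(round(eta_imageability, 2))  # 0.4
print(round(eta_quality, 2))       # 0.75
print(round(bf_ratio, 2))          # 4.53
```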

We now turn to the ex-Gaussian parameters. For μ, the main effect of Imageability was significant, F_p(1, 39) = 8.71, p = .005, MSE = 1056.95, η_p² = .18; μ was greater for low-imageability words (M = 488 ms) than for high-imageability words (M = 473 ms). The main effect of Stimulus Quality was significant, F_p(1, 39) = 54.00, p < .001, MSE = 1626.22, η_p² = .58; μ was greater for degraded words (M = 504 ms) than for clear words (M = 457 ms). The Stimulus Quality × Imageability interaction was not significant, F < 1. For σ, none of the effects were significant. Finally, for τ, the main effects of Imageability, F_p(1, 39) = 3.46, p = .071, MSE = 2540.14, η_p² = .08, and Stimulus Quality, F_p(1, 39) = 10.01, p = .003, MSE = 4603.63, η_p² = .20, were significant or approached significance; τ was greater for less imageable words (M = 137 ms) than for more imageable words (M = 122 ms), and τ was greater for degraded words (M = 147 ms) than for clear words (M = 113 ms). The Stimulus Quality × Imageability interaction was not significant, F < 1.

To illustrate these effects graphically, the mean quantiles (.1, .3, .5, .7, .9) for the different experimental conditions are plotted in Fig. 3. Theoretical quantiles are calculated by line search along the numerically approximated cumulative distribution function (see Cousineau et al., 2004, for more information). In the top two panels of the figure, the empirical quantiles are represented by data points and error bars, while the theoretical quantiles for the best-fitting ex-Gaussian distribution are represented by lines. The bottom panel of the figure represents imageability effects as a function of stimulus quality. In general, the empirical data were well captured by the ex-Gaussian parameters; empirical and theoretical quantiles did not diverge by more than one standard error.
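One way to obtain mean quantiles of this kind is to compute the quantiles separately for each participant and then average across participants. The sketch below assumes a linearly interpolated quantile definition, which is only one of several common conventions and may differ from the one used by QMPE; the function names and the RTs are hypothetical.

```python
def quantile(sorted_rts, p):
    """Linearly interpolated quantile of an ascending list (one common definition)."""
    pos = p * (len(sorted_rts) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_rts) - 1)
    frac = pos - lower
    return sorted_rts[lower] + frac * (sorted_rts[upper] - sorted_rts[lower])

def mean_quantiles(rts_by_participant, probs=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Compute each participant's quantiles, then average them across participants."""
    per_person = [[quantile(sorted(rts), p) for p in probs] for rts in rts_by_participant]
    return [sum(col) / len(col) for col in zip(*per_person)]

# Hypothetical RTs (ms) for two participants in one condition, NOT data from the study.
participant_a = list(range(400, 901, 50))
participant_b = list(range(500, 1001, 50))
averaged = mean_quantiles([participant_a, participant_b])
print(averaged)
```

Averaging quantiles rather than pooling raw RTs preserves the typical shape of the individual distributions, which is why this kind of plot complements the ex-Gaussian parameter estimates.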

Number of features

The main effect of Number of Features was significant by participants, F_p(1, 39) = 32.72, p < .001, MSE = 1080.01, η_p² = .46, and by items, F_i(1, 118) = 10.06, p = .002, MSE = 6212.87, η_p² = .08; RTs were faster for words with more features (M = 575 ms) than for words with fewer features (M = 605 ms). The main effect of Stimulus Quality was significant by participants, F_p(1, 39) = 161.88, p < .001, MSE = 1650.19, η_p² = .81, and by items, F_i(1, 118) = 189.99, p < .001, MSE = 2142.63, η_p² = .62; RTs were faster for clear words (M = 549 ms) than for degraded words (M = 631 ms). The Stimulus Quality × Number of Features interaction was not significant by participants or by items, Fs < 1. For the by-participants data, the additive model was preferred over all other models; the data were 4.28 times more likely to occur under the additive model, BF = 4.83 × 10²⁷, compared to the model with the interaction, BF = 1.13 × 10²⁷. Turning to accuracy rates, the main effect of Number of Features was significant by participants, F_p(1, 39) = 27.77, p < .001, MSE = .001, η_p² = .42, and by items, F_i(1, 118) = 6.23, p = .014, MSE = .008, η_p² = .05; accuracy rates were higher for words with more features (M = .96) than for words with fewer features (M = .93). The main effect of Stimulus Quality was significant by participants, F_p(1, 39) = 7.70, p = .008, MSE = .001, η_p² = .16, and by items, F_i(1, 118) = 5.34, p = .023, MSE = .003, η_p² = .04; accuracy rates were higher for clear words (M = .96) than for degraded words (M = .94). The Stimulus Quality × Number of Features interaction approached significance by participants, p = .06, and was significant by items, F_i(1, 118) = 4.30, p = .04, MSE = .003, η_p² = .04; the degradation effect was larger for words with fewer features than for words with more features (see Footnote 1).

Turning to the ex-Gaussian parameters, for μ, the main effect of Number of Features was significant, F_p(1, 39) = 5.70, p = .022, MSE = 1107.18, η_p² = .13; μ was greater for words with fewer features (M = 477 ms) than for words with more features (M = 464 ms). The main effect of Stimulus Quality was significant, F_p(1, 39) = 67.58, p < .001, MSE = 1271.07, η_p² = .63; μ was greater for degraded words (M = 494 ms) than for clear words (M = 447 ms). The Stimulus Quality × Number of Features interaction was not significant, F < 1. For σ, none of the effects were significant, ps > .21. Finally, for τ, both the main effects of Number of Features, F_p(1, 39) = 6.07, p = .018, MSE = 1906.65, η_p² = .13, and Stimulus Quality, F_p(1, 39) = 20.45, p < .001, MSE = 2491.02, η_p² = .34, were significant. τ was greater for words with fewer features (M = 129 ms) than for words with more features (M = 112 ms), and τ was greater for degraded words (M = 138 ms) than for clear words (M = 102 ms). The Stimulus Quality × Number of Features interaction was not significant, F < 1. These effects are graphically represented in Fig. 4.
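Because the mean of an ex-Gaussian distribution equals μ + τ, the reported parameter estimates can be compared against the reported mean RTs as a rough consistency check. The sketch below uses only values quoted in the Results text; the dictionary layout and function name are assumptions, and small discrepancies are expected because the parameters and the means were estimated separately.

```python
# (mu, tau, reported mean RT) in ms, copied from the Results text above.
reported = {
    "high imageability": (473, 122, 596),
    "low imageability": (488, 137, 621),
    "more features": (464, 112, 575),
    "fewer features": (477, 129, 605),
}

def implied_mean(mu, tau):
    """Mean of an ex-Gaussian distribution: mu + tau."""
    return mu + tau

gaps = {}
for label, (mu, tau, mean_rt) in reported.items():
    gaps[label] = implied_mean(mu, tau) - mean_rt
    print(f"{label}: mu + tau = {implied_mean(mu, tau)} ms, reported mean = {mean_rt} ms")
```

The same values show how each richness effect divides between the two parameters: for imageability, μ and τ each differ by 15 ms across conditions, and for number of features, μ differs by 13 ms and τ by 17 ms.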

Summary

In Experiment 1, reliable additive effects of stimulus quality and semantic richness were observed on RTs. That is, responses were faster for clear words and for semantically richer words, whether semantic richness reflected imageability or number of features, but there was no hint of an interaction for either dimension. The RT distributional analyses further revealed that the effects of imageability and number of features were mediated by a combination of distributional shifting and an increase in the tail of the distribution, replicating the pattern observed by Yap and Seow (2014) for emotional valence. Importantly, the interaction between stimulus quality and semantic richness was not significant for any ex-Gaussian parameter for either imageability or number of features, confirming that the mean-level additive effects generalize to the distributional characteristics. Given the theoretical importance of this pattern, Experiment 2 was designed to establish if these results were replicable when one examines two additional semantic richness dimensions, semantic neighborhood density and number of senses.