Introduction

During language comprehension, readers and listeners do not attend equally to all aspects of sentential input. Rather, they devote their attentional resources to the most relevant and important elements. Generally, emotional information obtains prioritized processing due to its intrinsic significance; for example, potentially threatening or rewarding stimuli are biologically relevant to species survival (Lang, Bradley, & Cuthbert, 1997). Aside from the intrinsic salience of the information, various linguistic devices can be used to highlight certain information in the language input. These devices constitute the markers of information structure (IS), which refers to the way in which elements of sentences are packaged (Halliday, 1967; Jackendoff, 2002). Although it has been well established that both emotional salience and IS modulate attentional resources, to date no studies have investigated their interaction during language comprehension. Before illustrating why an interaction is plausible, we will briefly summarize studies concerning their roles in language processing.

Emotional affect has been conceptualized along two dimensions: valence and arousal (Russell, 1980). While valence describes the extent of pleasure or sadness, arousal illustrates the extent of calmness or excitation. These two dimensions are often correlated in real-world experience. For instance, valenced stimuli tend to be high arousal, and stimuli with higher intensity tend to amplify valence. Here, we focused on the role of emotional salience, which involves both valence and arousal during language comprehension. Due to their high temporal resolution, event-related potentials (ERPs) are an excellent tool for examining the stages of language processing that are modulated by different variables. So far, most ERP studies on emotion and language have employed isolated words as stimuli. Several ERP components have been associated with the processing of emotional words (for a review, see Citron, 2012). First, larger P1 and N1 amplitudes have been reported for emotional words than for neutral words. Following the early P1 and N1 components, a P2 difference between words with different valence values (larger P2 for emotional or positive words than for neutral words) has also been reported (Herbert, Kissler, Junghöfer, Peyk, & Rockstroh, 2006; Kanske & Kotz, 2007; Ortigue et al., 2004; Schapkin, Gusev, & Kuhl, 2000). Such early effects suggest that the emotional features of words can be identified very rapidly. Another ERP component that relates to emotion processing is an early posterior negativity (EPN), peaking between 200 and 300 ms, with an occipito-temporal scalp distribution. Most studies found larger EPN amplitudes for emotional than for neutral words (for a review, see Kissler, Assadollahi, & Herbert, 2006). This component has been associated with effortless initial stages of attention orientation, driven by the high arousal level of emotional information. 
The last frequently reported component that is modulated by emotional words is a late positive complex (LPC), peaking between 500 and 800 ms, with a centro-parietal distribution. Its amplitude has been found to vary across emotional salience (for a review, see Citron, 2012). The LPC presumably reflects a less automatic evaluation of the emotional valence. The variability in the ERP effects can be accounted for by the ways of presenting stimuli (subliminal vs. supraliminal; lateralized vs. central presentation), the control of other linguistic aspects (such as word length, frequency, and concreteness), the task at hand (lexical decision, grammatical judgment, semantic priming, or emotional Stroop task), and the mental state (healthy vs. highly anxious) of subjects.

As has been said, all of the studies above used single words as stimulus materials. However, semantic meaning in language is usually conveyed by sentences or texts. To date, only a few ERP studies have investigated how emotional words are integrated into the sentence context. A well-established ERP component, the N400, is often taken as a signature of semantic integration of words into context (Kutas & Federmeier, 2011). It is therefore also of interest for studying the integration of emotional words into their sentence context. The N400 is a negative-going shift, peaking around 400 ms after stimulus onset, with a centro-parietal maximum. It has been related to the semantic expectation for the incoming words or the ease of integration of words into the preceding sentence context, with more expected or more easily integrated words eliciting smaller N400 amplitudes (Kutas & Federmeier, 2011). It has also been associated with attention allocation, with more attended words eliciting a larger N400 (Holt, Lynn, & Kuperberg, 2009; Li, Hagoort, & Yang, 2008).

In a neutral sentence context, Holt, Lynn, and Kuperberg (2009) found larger N400s for both positive and negative words, as compared with neutral words, indicating distinct neural representations between emotional words and neutral context or additional attention directed to emotional words for detailed semantic analysis. In contrast, De Pascalis, Arwari, D'Antuono, and Cacace (2009) reported larger N400 amplitudes for negative words relative to neutral and positive words, which might indicate facilitated semantic integration of positive words into the sentence context. However, no N400 effect between negative and neutral words was revealed in the study of Bayer, Sommer, and Schacht (2010). In addition, larger LPC amplitudes were generated by the negative words (Bayer et al., 2010; Holt et al., 2009), which might reflect a reevaluation of the negative words. Surprisingly, none of these studies found any of the early emotional effects that have been prominent in single word processing. In contrast, León, Díaz, de Vega, and Hernández (2010) manipulated emotional consistency: Emotional words were either consistent or inconsistent with preceding emotional episodes. They found larger N1/P2 and larger N400 for the inconsistent emotional words than for the consistent emotional words, indicating early ERP responses to the emotional meaning in sentence context. Furthermore, Leuthold, Filik, Murphy, and Mackenzie (2011), as well as Moreno and Vázquez (2011), reported a larger N400 and a subsequent post-N400 frontal positivity for the inconsistent emotional words. Overall, the ERP studies on the processing of emotional words present a complicated picture.

It has been proposed that the amygdala plays an important role in processing visual emotion stimuli through a top-down modulation on the visual cortex (Vuilleumier, 2005). This was supported by the findings that amygdala activity correlates with enhanced responses to emotional stimuli in the visual cortex (Anderson & Phelps, 2001; Isenberg et al., 1999; Morris et al., 1998; Rudrauf et al., 2008). In particular, an fMRI study showed that amygdala lesions can diminish the enhanced visual activations for emotional stimuli and that the severity of amygdala lesions is inversely correlated with the activation of the visual cortex (Vuilleumier, Richardson, Armony, Driver, & Dolan, 2004). Moreover, Rotshtein et al. (2010) identified that amygdala damage has a crucial impact on an early P1 component (~100–150 ms). The amygdala–cortical connection is reminiscent of the visual attention mechanism through which a frontal-parietal attention network modulates neural activities in the visual cortex (Foxe & Simpson, 2002; Kastner & Ungerleider, 2000).

The early ERP effects to emotional words indicate increased processing for these words, as compared with neutral words. In addition to emotional stimuli, IS is known to modulate language processing (Wang, Bastiaansen, Yang, & Hagoort, 2011, 2012; Wang, Hagoort, & Yang, 2009). IS refers to the way of information packaging in relation to its relevance in a given situation (Jackendoff, 2002). For example, in the question–answer pair “What kind of evaluation did the principal give to the teacher? The principal gave a general evaluation to the teacher,” general is the requested information in the question context, and thus it is the focus (in italic) of the answer sentence. Aside from question context, several other approaches can be used to mark information as the focus, such as syntactic constructions like it–cleft sentences (“It was a general evaluation that the principal gave to the teacher”) and accentuation in spoken language (“The principal gave a GENERAL evaluation to the teacher.” The accented word is marked in capitals). It has been suggested that linguistically focused elements receive more attention and are processed more elaborately than nonfocused elements (Birch & Rayner, 1997; Cutler & Fodor, 1979). So far, ERP studies on IS have mostly investigated brain responses to violations of IS markings, such as the mismatch between contextually marked focus and syntactically marked focus (Bornkessel, Schlesewsky, & Friederici, 2003; Cowles, Kluender, Kutas, & Polinsky, 2007; Stolterfoht, Friederici, Alter, & Steube, 2007) and the mismatch between contextually marked focus and pitch accent (Hruska & Alter, 2004; Ito & Garnsey, 2004; Johnson, Breen, Clifton, & Morris, 2003; Li et al., 2008; Magne et al., 2005; Schumacher & Baumann, 2010; Toepel, Pannekamp, & Alter, 2007). These studies suggest that comprehenders actively and rapidly make use of IS cues during online sentence processing.
In order to further investigate the role of IS during language processing, we compared the N400 and P600 effects in response to semantic and syntactic anomalies between focus and nonfocus conditions (Wang et al., 2011, 2012; Wang et al., 2009). We found larger ERP effects for focused than for nonfocused conditions, suggesting that focused information was processed more elaborately as a result of more processing resources. Evidence that IS modulations are based on the operation of a domain-general attention mechanism has been found in a recent fMRI study. Here, we observed that IS activates a frontal-parietal attention network, with larger activations for focused than for nonfocused information (Kristensen, Wang, Petersson, & Hagoort, 2012).

Altogether, both emotional salience and IS modulate the depth of language processing, thereby further affecting the semantic integration of the words into context. Although no one has directly investigated the interaction between these two variables, existing studies seem to imply bidirectional influences between them. Some studies concern the automaticity of emotional meaning processing, involving both the early attention orientation to emotional stimuli and the late reevaluation of emotional valence. Inconsistent findings were reported on the automaticity of the early attention orientation, as indicated by the presence (Frühholz, Jellinghaus, & Herrmann, 2011; Hinojosa, Méndez-Bértolo, & Pozo, 2010) or the absence (Bernat, Bunce, & Shevrin, 2001; Kissler, Herbert, Winkler, & Junghofer, 2009) of task modulations (deep semantic analysis vs. shallow structure processing) on early ERP responses, whereas the later reevaluation of emotional valence was generally found to require explicit attention to the semantic feature of the words (Fischler & Bradley, 2006; Hinojosa et al., 2010; Kissler et al., 2009; Schacht & Sommer, 2009). Therefore, it remains an open question to what extent the processing of emotional words requires additional processing resources, especially when they need to be integrated into sentence context. Since IS can be used to direct attention toward certain information, it provides us with a useful tool to further study the automaticity of emotional information during extraction and interpretation of meanings. On the other hand, it has been shown that the modulation of IS on language comprehension relies on the saliency of processed information, as reflected by similar P600 effects between focused and nonfocused information in response to syntactic violations when the violations were very salient (Wang et al., 2012). An interesting question is whether emotional saliency interacts with IS during language comprehension. 
Since the neural pathways of emotional salience and IS in modulating language processing appear to differ from each other (amygdala vs. frontal-parietal network), one may wonder whether any direct interactions between emotional salience and IS could be observed—whether the amygdala activated by emotional words interacts with the frontal-parietal network triggered by IS.

In the present study, using the ERP technique, our aim was to examine how IS and emotional salience together modulate the integration of emotional words into their sentence context. More specifically, at which moment does IS interact (if at all) with the emotional salience of the critical words during sentence processing? In our study, we manipulated IS by using question–answer pairs, such that a critical word in the answer sentence was either in focus or in nonfocus position. In addition, the emotional salience of the focused or nonfocused constituent was manipulated, such that the critical word had a negative, neutral, or positive valence. We compared the ERP responses to the critical words in the different conditions. Mixed findings have been reported regarding the early effect (such as P1, N1, P2, and EPN), so it remained an open question whether the early effects for the emotional words would be found in the present study and whether they would be modulated by IS. If emotional words capture attention automatically, an early ERP effect might be observed. Furthermore, if this automatic attention captured by emotional words triggers the attentional network independently of that triggered by IS, no interaction would be expected between IS and emotional salience in the early ERP response. As for the later ERP components, it has been shown that IS modulates the semantic integration of neutral words into context, as indicated by a larger N400 for nonfocused than for focused information when both of them are congruent with the context (Wang et al., 2009). However, it has also been shown that the IS modulation depends on the information salience (Wang et al., 2012). Given the functional significance and the saliency of emotional words, our prediction was that the IS modulation on the semantic integration (i.e., the N400 effect) would depend on the emotional salience of the words. 
More specifically, we expected that the integration of neutral words would be more difficult for the nonfocus condition with limited processing resources, resulting in a larger N400 for the nonfocused than for the focused neutral words. In contrast, emotional words attract processing resources regardless of their information status (being focus or nonfocus), so processing resources are fully available for emotional words even when they are in nonfocus position. Therefore, we expected similar N400s between focused and nonfocused emotional words.

Method

Subjects

Twenty-nine university students (mean age 20 years, range 18–26; 5 males) served as paid volunteers. They were all right-handed native speakers of Dutch with normal or corrected-to-normal vision. None of them had dyslexia or any neurological impairment. They signed a written consent form according to the Declaration of Helsinki. Their levels of trait anxiety were measured by the Spielberger State and Trait Anxiety Assessment Inventory (STAI; Spielberger, Gorsuch, Lushene, Vagg, & Jacobs, 1983), while their empathy was assessed by the empathy questionnaire (Davis, 1983). They filled in the questionnaires before the actual ERP experiment. The data of 5 subjects (all females) were excluded because of poor signal-to-noise ratio. The final set of subjects therefore consisted of 24 subjects (mean age 20 years, range 18–24; 5 males).

Stimulus

Stimulus construction

We selected 1188 adjectives from the Dutch CELEX corpus (Baayen, Piepenbrock, & van Rijn, 1993). These adjectives were assigned to 12 lists (99 words per list) and were rated by 132 subjects in an online manner (11 subjects for each list, mean age 22.6 years, 107 females, 114 right-handers). In order to have a better assessment of the emotional salience of these words, the valence, arousal, concreteness, imageability, and dominance (i.e., controllability of words, ranging from submissive to dominant) of these words were rated on 9-point Likert scales (9 indicates the most positive, most arousing, most concrete, most imageable, and most dominant). During the rating, the raters rated each word on all scales before moving to the next word. Then the rating score of each word was calculated by averaging the ratings from 11 subjects in each of the five domains. Therefore, each word had five scores, which were measures of valence, arousal, concreteness, imageability, and dominance. On the basis of the valence rating scores, words with a score smaller than 3.8 were taken as negative words (N = 321), words with a score larger than 6.4 were defined as positive words (N = 309), and the rest of the words were taken as neutral words (N = 558).
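The categorization rule described above can be sketched as follows. This is only an illustration: the function names and the example ratings are hypothetical, and the treatment of scores exactly equal to 3.8 or 6.4 (assigned to the neutral category here) is an assumption that follows from the wording "smaller than" and "larger than."

```python
def mean_rating(ratings):
    """Average the ratings that the raters gave to one word on one scale."""
    return sum(ratings) / len(ratings)


def classify_valence(score, low=3.8, high=6.4):
    """Map a mean valence score (9-point scale) onto a valence category.

    Scores exactly at a threshold are treated as neutral (assumption).
    """
    if score < low:
        return "negative"
    if score > high:
        return "positive"
    return "neutral"


# Hypothetical valence ratings from 11 raters for three made-up words.
example_ratings = {
    "word_a": [2, 3, 2, 1, 3, 2, 2, 3, 1, 2, 3],
    "word_b": [5, 5, 4, 6, 5, 5, 4, 5, 6, 5, 5],
    "word_c": [8, 7, 8, 9, 7, 8, 8, 7, 9, 8, 7],
}

categories = {
    word: classify_valence(mean_rating(ratings))
    for word, ratings in example_ratings.items()
}
print(categories)
```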

Then we pooled sets of three words on the basis of the following criteria. First, the three words in each set should belong to different valence categories (negative, neutral, and positive, respectively). Second, the valence difference between neutral and positive words should not differ largely from that between negative and neutral words (the difference of the difference in rating scores is equal to or smaller than 1). Third, the words should be matched on length and frequency. Fourth, it should be possible to put these three words in the same sentence context, which comprises an object constituent with the adjectives (critical words, CWs) serving as modifiers (see the answer sentences in Table 1 for examples). In the end, we constructed 235 triplets of sentences, with the triplets of each sentence differing only in the CWs. The number of words in the resulting sentences was between 6 and 12. There were always at least two words before and after the CWs in each sentence.

Table 1 Examples of one item

Finally, question contexts were constructed for all the sentences. In each question–answer pair, the question established a context that projected a focus position in the answer sentence. In the what-kind-of question context, the CW was placed in focus position, whereas in the who question context, the CW was placed in nonfocus position (see the questions in Table 1 for examples).

Stimuli pretests

In order to match the cloze probability of the CWs in their question–answer pair contexts across the three conditions, we asked 10 subjects to complete the question–answer pairs presented up until the CW. Note that we tested the question–answer pairs only in the focus condition (what-kind-of question context), since it was very likely that the cloze scores of the CWs in the nonfocus condition would be zero, because no one would introduce a new adjective in the nonfocus position (Jackendoff, 2002). The CWs in the 235 triplets of sentences showed equally low cloze probability, F(2, 18) = 1.22, p = .32, \( \eta_p^2=.12 \) (mean ± SD = 6.0% ± 3.6%, 8.5% ± 3.5%, and 6.4% ± 5.0%, respectively, for the negative, neutral, and positive words).

In order to ensure that all the question–answer pairs were plausible and that the valence of the sentences had no interaction with the valence of the CWs, another 30 subjects were instructed to rate the plausibility and the valence of the question–answer pairs on 9-point Likert scales (9 indicates the most plausible and the most positive). On the basis of the plausibility scores, we selected 228 triplets of sentences that were highly plausible.

Table 2 presents the results of the measurements on both CWs and full question–answer pairs of the 228 triplets. The three types of CWs differed significantly with respect to the mean affective valence (negative < neutral < positive), as well as the arousal level (positive > neutral, negative > neutral). The valence differences between negative and neutral words were not significantly different from the valence differences between positive and neutral words, t(227) = 1.048, p = .296. We also tried to match the CWs in different conditions on concreteness, imageability, dominance, length, and frequency. However, the neutral words showed slightly lower imageability than did the emotional words; the dominance also differed among the three conditions: positive > negative > neutral. Besides, positive words were slightly longer than neutral and negative words. Since the emotional words were usually more dominant and more imageable than the neutral words, we treated these differences as covariates of emotional salience. In addition, given the subtle difference in word length (less than one character), we did not think that it would greatly affect the ERP responses. All the CWs were relatively unpredictable in the question–answer pairs (less than 10%), but all the question–answer pairs were rated as plausible (above 6 on a 9-point scale).

Table 2 Results of pretest ratings for the critical words and sentences

Overall, two factors were independently manipulated: context (focus, nonfocus) and emotional salience (neutral, positive, negative), which created six conditions: focus/neutral, focus/positive, focus/negative, nonfocus/neutral, nonfocus/positive, and nonfocus/negative. There were 228 experimental items (i.e., question–answer pairs), with each item consisting of six conditions. The six conditions of each experimental item were assigned to six lists using a Latin square design. Consequently, no subject encountered (different conditions of) the same item more than once. Each of the six lists consisted of 228 items—that is, 38 items in each condition. In the whole set of stimuli, no critical words were repeated. In order to cover up the experimental manipulations, we also constructed 60 question–answer pairs as filler items. Among these filler items, 30 items contained an adjective modifying the subjects in the answer sentences, while the other 30 items contained two adjectives modifying both the subjects and the objects. In these fillers, each answer sentence was preceded by a question either asking about the objects or inquiring about the subjects. These fillers were assigned to two lists in a similar way as the experimental stimuli. In the end, the 6 experimental lists were combined with the 2 filler lists, resulting in 12 lists. Each list contained 288 question–answer pairs (228 experimental items and 60 filler items).
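The Latin square assignment of conditions to lists can be illustrated with a short sketch. The rotation scheme used here (item i receives condition (i + k) mod 6 in list k) is an assumption chosen for illustration; the text does not state which rotation was used, and the identifiers are hypothetical.

```python
from collections import Counter
from itertools import product

CONTEXTS = ["focus", "nonfocus"]
SALIENCE = ["neutral", "positive", "negative"]
# Six conditions, e.g. "focus/neutral", "nonfocus/negative".
CONDITIONS = [f"{c}/{s}" for c, s in product(CONTEXTS, SALIENCE)]
N_ITEMS = 228


def build_lists(n_items=N_ITEMS, conditions=CONDITIONS):
    """Rotate conditions over items so that every list contains each item once."""
    n_cond = len(conditions)
    lists = []
    for list_idx in range(n_cond):
        # Assumed rotation: item i gets condition (i + list_idx) mod n_cond.
        lists.append(
            [(item, conditions[(item + list_idx) % n_cond]) for item in range(n_items)]
        )
    return lists


lists = build_lists()

# Each list should hold 228 items with 38 items per condition.
for lst in lists:
    counts = Counter(cond for _, cond in lst)
    assert len(lst) == N_ITEMS
    assert all(counts[c] == 38 for c in CONDITIONS)
```

Because 228 is divisible by 6, the rotation yields exactly 38 items per condition in every list, and each item appears in all six conditions across the six lists.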

Procedure

Subjects were seated in a comfortable chair in front of a computer screen at approximately 80-cm distance. The stimuli were presented in white color on a black background, with a font size of 27 for the whole questions and of 30 for the words in answers. A trial started with a fixation cross (duration 3,000 ms) in the center of the screen, followed by a question that was presented as a whole sentence for 2,500 ms. After a 500-ms black screen, the answer was presented word by word. Each word appeared for 300 ms, with an interstimulus interval of 300 ms. The last word ended with a period. After 300 ms of the presentation of the last word, the next trial began. Subjects were told not to move or blink when individual words appeared, but they were encouraged to blink during the presentation of the fixation cross. There was no additional task other than to read for comprehension.

Subjects read 288 question–answer pairs in a pseudorandom order. No more than three items of the same condition were presented in succession. The 288 items in one list were divided into 12 blocks (24 trials per block), with each block lasting about 5 min. Between blocks, there was a short break, after which subjects could start the next block by pressing a button. The whole experiment took about 2 h, including subject preparation, instructions, and a short practice consisting of 12 items.

Electroencephalogram (EEG) recording and preprocessing

The EEG was recorded in an electromagnetically shielded cabin, with 60 surface active electrodes (Acticap, Brain Products, Herrsching, Germany) placed in an equidistant montage. The left mastoid electrode served as the reference, and a forehead electrode served as the ground. Vertical and horizontal eye movements were monitored by three electrodes placed in the cap and one electrode placed below the left eye. All electrode impedances were kept below 20 kΩ during the experiment, which is well below what is recommended for active electrodes. EEG data were digitized at a rate of 500 Hz, with a 100-Hz high cutoff filter and a 0.016-Hz low cutoff filter (half-power cutoffs).

Brain Vision Analyzer software 1.05 (Brain Products) was used to preprocess the raw EEG data. The EEG data were rereferenced offline to the average of both mastoids, and band-pass filtered at 0.5–30 Hz (48-dB/oct slope, half-power cutoff). Then the data were segmented from 150 ms before to 1,200 ms after the onset of the critical words, with baseline correction from 150 to 0 ms preceding word onset. After that, a semiautomatic artifact rejection procedure was applied. On average, 4% of all trials were rejected, with rejections being equally distributed across the six conditions. Finally, trials were averaged in each condition for each subject, and this average was used for further statistical analysis.
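The segmentation and baseline correction steps can be sketched for a single channel as follows. This is not the Brain Vision Analyzer procedure itself, only a minimal illustration under stated assumptions: the signal is a plain list of samples at 500 Hz, the epoch is half-open (the sample at exactly 1,200 ms is excluded), and all names are hypothetical.

```python
SAMPLING_RATE_HZ = 500
PRE_MS, POST_MS = 150, 1200  # epoch limits relative to critical-word onset


def ms_to_samples(ms, rate_hz=SAMPLING_RATE_HZ):
    """Convert a duration in milliseconds into a number of samples."""
    return int(round(ms * rate_hz / 1000))


def extract_epoch(signal, onset_sample):
    """Cut one epoch around a word onset and subtract the pre-onset mean."""
    n_pre = ms_to_samples(PRE_MS)
    n_post = ms_to_samples(POST_MS)
    start, stop = onset_sample - n_pre, onset_sample + n_post
    if start < 0 or stop > len(signal):
        raise ValueError("epoch extends beyond the recording")
    epoch = signal[start:stop]
    # Baseline: mean of the 150 ms preceding word onset.
    baseline = sum(epoch[:n_pre]) / n_pre
    return [value - baseline for value in epoch]


# Toy signal: a constant 5 µV offset with a 2 µV step at word onset.
onset = 100
signal = [5.0] * onset + [7.0] * 700
epoch = extract_epoch(signal, onset)
print(len(epoch), epoch[0], epoch[-1])
```

At 500 Hz, the 150-ms baseline corresponds to 75 samples and the 1,200-ms poststimulus interval to 600 samples, so each epoch in this sketch holds 675 samples.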

ERP data analysis

All analyses were conducted on the mean amplitude values computed for each subject and each condition, within each of the selected time windows (see the Results section). The selected electrodes are indicated in Fig. 1, with the midline and lateral electrodes being subjected to separate repeated measures ANOVAs. For the midline electrodes, anteriority (anterior, central, posterior), context (focus, nonfocus), and emotional salience (negative, neutral, positive) were taken as within-subjects factors. For the lateral electrodes, hemisphere (left, right) was an additional within-subjects factor. Overall ANOVAs were followed up with simple effects ANOVAs when there was any interaction with our critical manipulations (emotional salience and context). When the degrees of freedom in the numerator were larger than one, Greenhouse–Geisser correction was applied. In these cases, we report the original degrees of freedom with corrected p values.

Fig. 1

Electrode layout on the scalp. Nine representative electrodes (indicated by numbers) were chosen for displaying grand averaged waveforms in the remainder of the figures. The electrodes selected for statistical analysis are grouped into six regions for the lateral electrodes: left-anterior, left-central, left-posterior, right-anterior, right-central, and right-posterior. For the midline electrodes, three regions were defined: midline-anterior, midline-central, and midline-posterior

Results

The grand average waveforms elicited by different conditions at nine representative electrodes (45/43/41, 59/30/28, 13/11/9, encircled in Fig. 1) are presented in Fig. 2 (collapsed across focus and nonfocus conditions) and Fig. 3 (collapsed across the three emotional salience conditions), showing the emotional salience and context effect, respectively. In view of the effects, four time windows were selected for the statistical analysis: (1) an early negativity between 90 and 200 ms; (2) the P2 time window, 200–300 ms; (3) the standard N400 in the time window of 300–500 ms; (4) a late positivity between 500 and 700 ms. Only the (marginally) significant effects containing the critical manipulations (emotional salience and context) are reported.
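The mean amplitude measure for the four time windows can be sketched as follows. The window limits are taken from the text; the half-open treatment of the window boundaries, the 675-sample epoch starting at −150 ms, and all identifiers are assumptions made for illustration.

```python
SAMPLING_RATE_HZ = 500
EPOCH_START_MS = -150  # first sample of the epoch relative to word onset

TIME_WINDOWS_MS = {
    "early negativity": (90, 200),
    "P2": (200, 300),
    "N400": (300, 500),
    "late positivity": (500, 700),
}


def window_mean(erp, start_ms, end_ms):
    """Mean amplitude over samples with start_ms <= latency < end_ms."""
    step_ms = 1000 / SAMPLING_RATE_HZ
    first = int(round((start_ms - EPOCH_START_MS) / step_ms))
    last = int(round((end_ms - EPOCH_START_MS) / step_ms))
    segment = erp[first:last]
    return sum(segment) / len(segment)


# Toy averaged waveform whose amplitude equals its latency in ms,
# which makes the window means easy to verify by hand.
erp = [EPOCH_START_MS + 2 * i for i in range(675)]

means = {name: window_mean(erp, *limits) for name, limits in TIME_WINDOWS_MS.items()}
print(means)
```

In an actual analysis, one such value would be computed per subject, condition, and electrode region before entering the repeated measures ANOVAs.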

Fig. 2

Emotional valence effects. a Grand averaged waveforms evoked by the critical words as a function of emotional salience at nine representative electrodes. The black lines represent the neutral conditions, while the color lines stand for the emotional conditions, with the green and red lines representing the negative and positive conditions, respectively. Waveforms are time-locked to the onset of the critical words. Note that the waveforms were smoothed using a 10-Hz low-pass filter for illustrative purposes only. b, c, d Topographies showing the average voltage differences for the different contrasts, for the indicated time intervals

Fig. 3

Context effects. a Grand averaged waveforms evoked by the critical words as a function of context at nine representative electrodes. The thick lines represent the focus conditions, while the thin lines represent the nonfocus conditions. Waveforms are time-locked to the onset of the critical words. Note that the waveforms were smoothed using a 10-Hz low-pass filter for illustrative purposes only. b Topographies showing the average voltage differences for the contrast of focus versus nonfocus, for the indicated time intervals

The early negative effect between 90 and 200 ms

The neutral words elicited smaller early negativities than did the emotional words for both the lateral electrodes [main effect of emotional salience, F(2, 46) = 5.38, p = .013, \( \eta_p^2=.19 \); pair-wise contrasts: negative vs. neutral, t(23) = −2.495, p = .02; positive vs. neutral, t(23) = −3.146, p = .005; negative vs. positive: t(23) = −0.437, p = .666] and the midline electrodes [main effect of emotional salience: F(2, 46) = 4.51, p = .025, \( \eta_p^2=.16 \); pair-wise contrasts: negative vs. neutral, t(23) = −2.135, p = .044; positive vs. neutral, t(23) = −2.964, p = .007; negative vs. positive, t(23) = −.079, p = .938]. In addition, we observed (marginally) significant interactions between context and anteriority, F(2, 46) = 5.60, p = .019, \( \eta_p^2 =.20 \), and F(2, 46) = 3.036, p = .080, \( \eta_p^2=.12 \), respectively, for the lateral and midline electrodes. Further simple effect tests revealed that the nonfocused information elicited a larger negativity than did the focused information in the posterior region, F(1, 23) = 5.11, p = .034, \( \eta_p^2 =.18 \), and F(1, 23) = 3.60, p = .07, \( \eta_p^2=.14 \), respectively, for the lateral and midline posterior electrodes.

Overall, the emotional words yielded a larger negativity than did the neutral words over most scalp regions (as shown in the scalp topographies in Fig. 2b, c), while the nonfocused information produced larger negativities than did the focused information in the posterior regions (including the midline electrodes, as shown in the scalp topography in Fig. 3b).
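As a consistency check on the reported effect sizes, partial eta squared can be recovered from an F value and its degrees of freedom through the standard identity \( \eta_p^2 = F \cdot df_{effect} / (F \cdot df_{effect} + df_{error}) \). The sketch below applies this identity to the two main effects of emotional salience reported above; whether the effect sizes were originally computed this way is not stated in the text, and the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Main effect of emotional salience, lateral electrodes: F(2, 46) = 5.38
print(round(partial_eta_squared(5.38, 2, 46), 2))  # 0.19

# Main effect of emotional salience, midline electrodes: F(2, 46) = 4.51
print(round(partial_eta_squared(4.51, 2, 46), 2))  # 0.16
```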

The P2 effect in the time window of 200–300 ms

The statistical analysis of the P2 component revealed a significant interaction between emotional salience and anteriority for the lateral electrodes, F(4, 92) = 4.09, p = .018, \( \eta_p^2=.15 \). Although visual inspection showed smaller P2s for the negative and neutral words than for the positive words in the posterior region, further simple effect tests revealed that this effect was significant only for the negative words [F(2, 46) = 3.83, p = .031, \( \eta_p^2=.14 \); pair-wise contrasts: negative vs. neutral, t(23) = −2.012, p = .056; negative vs. positive: t(23) = −2.893, p = .008; positive vs. neutral: t(23) = 0.678, p = .504]. In addition, there were interactions between context and anteriority for both the lateral, F(2, 46) = 6.84, p = .011, \( \eta_p^2= .23 \), and the midline, F(2, 46) = 3.80, p = .05, \( \eta_p^2 = .14 \), electrodes, with a larger P2 for the nonfocused information than for the focused information in the lateral anterior region, F(1, 23) = 5.03, p = .035, \( \eta_p^2= .18 \), and in the midline anterior, F(1, 23) = 5.19, p = .032, \( \eta_p^2= .18 \), and central, F(1, 23) = 5.60, p = .027, \( \eta_p^2 = .20 \), regions.

Overall, the negative words yielded smaller P2 amplitudes than did the positive words in the bilateral posterior region (as shown in the scalp topographies in Fig. 2b, d), and the nonfocused information elicited a larger P2 than did the focused information in the anterior region, as well as the anterior–central midline region (as shown in the scalp topography in Fig. 3b).

The N400 effect in the time window of 300–500 ms

Although visual inspection showed the largest N400 for the negative words and the smallest N400 for the positive words, statistical analysis revealed a significant N400 effect only for the positive words, both for the lateral electrodes [main effect of emotional salience, F(2, 46) = 3.71, p = .034, \( \eta_p^2 =.14 \); pair-wise contrasts: negative vs. positive, t(23) = −2.819, p = .01; neutral vs. positive, t(23) = −2.034, p = .054; negative vs. neutral: t(23) = −0.542, p = .593] and for the midline electrodes [main effect of emotional salience, F(2, 46) = 3.43, p = .043, \( \eta_p^2=.13 \); pair-wise contrasts: negative vs. positive, t(23) = −2.763, p = .011; neutral vs. positive, t(23) = −1.929, p = .066; negative vs. neutral, t(23) = −0.499, p = .622]. We also found a marginally significant interaction between emotional salience and anteriority for the lateral electrodes, F(2, 46) = 2.60, p = .08, \( \eta_p^2 = .10 \), indicating that the smaller N400s for the positive words were most prominent over the central, F(2, 46) = 3.36, p = .046, \( \eta_p^2=.13 \), and posterior, F(2, 46) = 7.19, p = .002, \( \eta_p^2=.24 \), regions. These results are also indicated in the scalp topographies in Fig. 2b, d.

In addition, there were significant interactions between context and anteriority for both the lateral and midline electrodes, F(2, 46) = 14.80, p < .001, \( \eta_p^2=.39 \); F(2, 46) = 11.54, p < .001, \( \eta_p^2=.33 \). Further simple effect tests revealed that the focused information elicited a larger N400 than did the nonfocused information over anterior and central regions for both the lateral region [F(1, 23) = 10.89, p = .003, \( \eta_p^2=.32 \), and F(1, 23) = 5.48, p = .028, \( \eta_p^2=.19 \), respectively, for the anterior and central regions] and the midline region [F(1, 23) = 10.10, p = .004, \( \eta_p^2=.31 \), and F(1, 23) = 10.20, p = .004, \( \eta_p^2=.31 \), respectively, for the anterior and central regions]. For the lateral electrodes, there was also an interaction between context and hemisphere, F(1, 23) = 10.36, p = .004, \( \eta_p^2=.31 \), indicating that the context effect was mainly distributed in the left hemisphere, F(1, 23) = 6.51, p = .018, \( \eta_p^2=.22 \). The negative effect for the focused, in comparison with the nonfocused, condition (or a positive effect for the nonfocused relative to the focused condition) is shown in the scalp topography in Fig. 3b, suggesting that the focused words elicited larger N400s than did the nonfocused words over these regions, regardless of emotional salience.

Interestingly, we also found a significant four-way interaction of emotional salience, context, hemisphere, and anteriority, F(4, 92) = 3.36, p = .028, \( \eta_p^2=.13 \). Hence, we performed three ANOVAs on the anterior, central, and posterior regions separately, with emotional salience, context, and hemisphere serving as within-subjects factors. A significant interaction among these three factors was found only in the posterior region, F(2, 46) = 4.24, p = .021, \( \eta_p^2=.16 \). Two-way ANOVAs with the factors of emotional salience and context were then conducted on the left and right hemispheres separately. The results showed an interaction between emotional salience and context only in the right hemisphere, F(2, 46) = 4.02, p = .025, \( \eta_p^2=.15 \). Therefore, for the right posterior electrodes, we further tested the context effect in the three emotional salience conditions. We found that the contextual modulation existed only for the neutral words, F(1, 23) = 6.92, p = .015, \( \eta_p^2=.23 \), with a larger N400 for the nonfocused than for the focused information. No modulation of context was found for either the negative, F(1, 23) = 0.73, p = .403, \( \eta_p^2=.03 \), or the positive, F(1, 23) = 0.35, p = .558, \( \eta_p^2=.02 \), words within the right posterior region. Figure 4a presents the waveforms elicited by the focused and nonfocused words in the three emotional salience conditions at a representative right posterior electrode. Figure 4b shows the scalp topographies of the effects in the N400 time window between each of the two conditions.
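The final step of this decomposition, the simple effect of context within one emotional salience condition, is a two-level within-subjects test, for which F(1, n − 1) equals the square of the paired t statistic. The sketch below illustrates that step with NumPy. All amplitudes are simulated and hypothetical, the sample size of 24 is inferred from the reported error degrees of freedom, and the function names are illustrative rather than the authors' own code.

```python
import numpy as np


def paired_t(a, b):
    """Paired-samples t statistic for two within-subject conditions."""
    d = np.asarray(a, float) - np.asarray(b, float)
    return d.mean() / (d.std(ddof=1) / np.sqrt(d.size))


def rm_anova_two_levels(a, b):
    """One-way repeated-measures ANOVA for a two-level factor.

    Returns (F, df_effect, df_error).
    """
    data = np.column_stack([a, b]).astype(float)  # shape: (subjects, 2)
    n, k = data.shape
    grand = data.mean()
    ss_cond = n * ((data.mean(axis=0) - grand) ** 2).sum()
    ss_subj = k * ((data.mean(axis=1) - grand) ** 2).sum()
    ss_total = ((data - grand) ** 2).sum()
    ss_error = ss_total - ss_cond - ss_subj  # condition-by-subject residual
    df_cond, df_error = k - 1, (k - 1) * (n - 1)
    return (ss_cond / df_cond) / (ss_error / df_error), df_cond, df_error


# Hypothetical per-subject mean amplitudes (microvolts) in the 300-500 ms
# window at right posterior electrodes, neutral words only.
rng = np.random.default_rng(0)
n_subjects = 24  # assumption, inferred from df_error = 23
focused = rng.normal(-2.0, 1.0, n_subjects)
nonfocused = focused + rng.normal(-0.5, 1.0, n_subjects)

f_value, df1, df2 = rm_anova_two_levels(focused, nonfocused)
t_value = paired_t(focused, nonfocused)
print(f"F({df1}, {df2}) = {f_value:.2f}; t squared = {t_value ** 2:.2f}")
```

The two printed values coincide, which is why simple effects of context are reported as F(1, 23) while the three-level emotional salience contrasts are followed up with t(23).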

Fig. 4

The interaction between emotional salience and context. a Grand averaged waveforms evoked by the critical words as a function of context in the negative, neutral, and positive conditions. The black lines represent the neutral conditions, while the colored lines stand for the emotional conditions, with the green and red lines representing the negative and positive conditions, respectively. The thick lines represent the focus conditions, while the thin lines represent the nonfocus conditions. Waveforms are time-locked to the onset of the critical words. Note that the waveforms were smoothed using a 10-Hz low-pass filter for illustrative purposes only. b Topographies showing the average voltage differences for the contrast of focus versus nonfocus in the time window of 300–500 ms, separately for the negative, neutral, and positive conditions.

The late positivity between 500 and 700 ms

Statistical analysis of the late positivity showed no significant effect of emotional salience or context.

The correlation between subjects’ traits and ERP effects

The STAI questionnaire measured the subjects’ anxiety level on a 1–4 scale. We found that all the subjects had a low anxiety level (mean ± SD = 1.92 ± 0.35). Meanwhile, the results of the empathy questionnaire showed a high level of empathy for all the subjects (mean ± SD = 2.29 ± 0.35 on a 0–4 scale). The ERP effects in the different time windows, computed as the difference between each pair of experimental conditions, were quantified at the electrode where the largest effect was observed. The obtained ERP effects were then subjected to correlation analyses with the trait scores. We found no significant correlation between any of the ERP effects and the trait scores.
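The correlation procedure described above can be sketched as follows: an ERP effect is computed per subject as a difference between two conditions at one electrode and then correlated with a trait score. This is a minimal illustration under stated assumptions. The data are simulated, the variable names are hypothetical, and a Pearson correlation is assumed because the text does not specify the coefficient used.

```python
import numpy as np


def pearson_r_and_t(x, y):
    """Pearson correlation with its t statistic and degrees of freedom (n - 2)."""
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    r = np.corrcoef(x, y)[0, 1]
    n = x.size
    t = r * np.sqrt((n - 2) / (1 - r ** 2))
    return r, t, n - 2


rng = np.random.default_rng(1)
n_subjects = 24  # assumption, inferred from the reported t(23) contrasts

# Hypothetical mean amplitudes (microvolts) at the electrode with the largest effect.
amp_negative = rng.normal(-3.0, 1.0, n_subjects)
amp_neutral = rng.normal(-2.0, 1.0, n_subjects)
erp_effect = amp_negative - amp_neutral  # per-subject difference wave amplitude

# Hypothetical trait scores on a 1-4 scale, centered on the reported anxiety mean.
trait_score = np.clip(rng.normal(1.92, 0.35, n_subjects), 1.0, 4.0)

r, t, df = pearson_r_and_t(erp_effect, trait_score)
print(f"r({df}) = {r:.2f}, t = {t:.2f}")
```

Because one such test is run per ERP effect and trait measure, a correction for multiple comparisons would normally be considered when interpreting any single coefficient.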

Discussion

The aim of this study was to examine the independent roles of words’ emotional salience and IS status, as well as their interactions during sentence comprehension. We found that emotional salience produced both early and late ERP effects, including larger negativities (90–200 ms) for emotional, as compared with neutral, words over the whole scalp, larger P2s for positive words than for negative words over posterior regions (excluding the midline), and larger N400s for negative and neutral words, relative to positive words, over the bilateral anterior–central, left posterior regions, as well as the whole midline. Similarly, IS had both early and late influences on the processing of emotional words in context. Relative to focused words, nonfocused words elicited a larger early negativity over posterior regions (including midline), a larger P2 over bilateral anterior and anterior–central midline regions, and a smaller N400 over the left anterior and central regions (including the anterior–central midline). Interestingly, an interaction between emotional salience and IS was observed in the N400 component over the right posterior scalp, showing that IS modulated the N400 only for neutral words (larger N400 for nonfocused than for focused words), but not for emotional words (equally large N400 amplitudes for focused and nonfocused information) in this region. We discuss the results in more detail below.

Emotional salience produced both early and late ERP effects

The emotional (both positive and negative) words elicited larger negativities than did the neutral words in the time window of 90–200 ms. The early effect covers both the N1 and N170 components and, thus, precedes the previously reported EPN effect, which is usually observed 200–300 ms after word onset. The N1 effect has been reported in previous studies on emotional word processing (Bernat et al., 2001; Hofmann, Kuchinke, Tamm, Võ, & Jacobs, 2009; Scott, O'Donnell, Leuthold, & Sereno, 2009; van Hooff, Dietz, Sharma, & Bowman, 2008). The N170 component is a negative deflection peaking around 170 ms, with an occipito-temporal scalp distribution. It is typically related to face perception (Bentin, Allison, Puce, Perez, & McCarthy, 1996; Rossion & Jacques, 2008), but it is also associated with word recognition (for a review, see Dien, 2009). The amplitude of N170 was found to be sensitive to both emotional salience (Blau, Maurer, Tottenham, & McCandliss, 2007; Montalan et al., 2008) and attention (Aranda, Madrid, Tudela, & Ruz, 2010). We took the effect on the N1 and N170 components as one single sustained negativity, because of its topographic stability throughout the whole time window of 90–200 ms. The early negative effect in the time window of 90–200 ms likely reflects enhanced perceptual processing induced by the saliency of the emotional words. The emotional salience of the words might have triggered a quick and coarse analysis of the words’ form, before detailed semantic processing takes place (Bernat et al., 2001; Hofmann et al., 2009; Scott et al., 2009; van Hooff et al., 2008). One might argue that the early negative effects were due to the difference in word length. However, this is unlikely for two reasons. First, there was, on average, only a half-character difference between the positive words and the negative and neutral words, so we do not think that this subtle difference substantially affected the ERP responses. Second, while the positive words were slightly longer than the negative and neutral words, we found larger negativities both for the positive and for the negative words, as compared with the neutral words. Therefore, the difference between negative and neutral words cannot be explained by the word length difference.

Following the early negative effect, a smaller P2 was elicited by the negative words, as compared with the positive words, over bilateral posterior regions. The difference was quite weak, so we need to be cautious in interpreting the effect. Given that the positive and negative words elicited distinct P2 amplitudes, the P2 component might reflect a general evaluation of emotional valence (Herbert et al., 2006; Schapkin et al., 2000).

A late emotional salience effect was found in the N400 component, with positive words eliciting a reduced N400 relative to the negative and neutral words over the bilateral anterior–central, left posterior regions as well as the whole midline. The smaller N400 elicited by positive words is consistent with the study by De Pascalis et al. (2009). It indicates facilitated semantic integration of pleasant, as compared with unpleasant or neutral, input. This facilitation is probably due to the fact that healthy subjects have a natural bias toward pleasant information and, as a result, less effort is required to integrate positive words into context (Kanske & Kotz, 2007). However, Holt et al. (2009) reported larger N400s for emotional than for neutral words. In addition, they found that the N400 effects to negative versus neutral words were smaller for subjects with more trait anxiety, suggesting that the negative words were easier to integrate for subjects with a consistent (i.e., negative) mental state. Although different ERP effects were observed for the emotional words, both studies suggest that the predominant trait affects the integration of emotional words into sentence context. Further studies are required in order to clarify to what extent the trait influences the ERP responses to emotional word processing.

It should be noted that the pretest showed a significantly lower plausibility for the negative words, as compared with the neutral and positive words. This might partly explain the larger N400 amplitude for the negative words observed in our ERP study, since words with lower plausibility generally elicit larger N400s (Kutas & Federmeier, 2011). However, the observed N400 effect could not be entirely explained by the plausibility difference, since the pattern of the N400s across the three conditions was inconsistent with the pattern of the plausibility differences. While the negative words were less plausible than the neutral and positive words in the sentence context, the negative and neutral words elicited larger N400s than did the positive words. Given the plausibility difference, the N400 to the negative words should have been larger than that of the neutral words, and the N400 to the positive words should have been equal to that of the neutral words. Therefore, the observed N400 effects were not entirely due to the plausibility difference.

In contrast with other studies in which LPC effects were reported, we did not find any emotional modulation on this component. A possible reason for this is that we did not employ a secondary task in our study. The LPC has been found to be sensitive to task demands, with LPC effects being observed most prominently when explicit semantic analysis was required (Fischler & Bradley, 2006; Schacht & Sommer, 2009). Therefore, the absence of an LPC effect in our study is in line with the idea that the LPC effect reflects a reevaluation of emotional valence (Herbert et al., 2006; Hinojosa et al., 2010; Holt et al., 2009; Schacht & Sommer, 2009).

IS modulates word processing at both early and late stages

The focused and nonfocused words elicited distinctive ERP responses starting from 90 ms over the posterior regions (including the midline). The early IS effect has also been reported elsewhere (Johnson et al., 2003; Magne et al., 2005). In general, such early effects are more likely related to perceptual processing than to semantic analysis (Cohen et al., 2000). Here, we tentatively take the early effect as an indication of a mismatch between expected word form and actual bottom-up input. In the question–answer pairs, subjects had a strong expectation that given information would be presented in the nonfocus position. Then the unexpected new information conveyed at the CW position in the answer sentence of the nonfocus condition brought a mismatch between expectation and actual input, resulting in such an early negative effect. This interpretation is compatible with other findings showing early contextual influences on word processing (Penolazzi, Hauk, & Pulvermüller, 2007; van den Brink, Brown, & Hagoort, 2001; for a review on the influence of context on word processing, see Van Berkum, in press).

In addition to the early negativity, the CW in the nonfocus condition elicited a larger P2 than did the CW in the focus condition in the bilateral anterior and anterior–central midline regions. This effect can be explained by different sentence constraints between the focus and nonfocus conditions. It has been shown that words in strongly constraining contexts elicit larger P2s than do those in less predictive contexts, regardless of the actually presented words (Federmeier, Mai, & Kutas, 2005; Wlotko & Federmeier, 2007). In our study, the contextual constraint in the nonfocus condition was very strong, since only given information would be expected in that context. For example, in the context “Who gave an evaluation to the teacher? The principal gave an . . . ,” people would tend to complete the answer sentence using the given information, “evaluation.” In contrast, the focus condition had a less strongly constraining context, since the subjects could fill in the focused position using any adjective. Consequently, the nonfocused words in the more constraining context elicited larger P2s than did the focused words in the less constraining context.

In the N400 time window, the focused information yielded larger negativities than did the nonfocused information over the left anterior and central regions (including the midline). The distribution of this effect differs from the classical N400 effect that shows a central–posterior distribution (Kutas & Federmeier, 2011). An anterior N400 effect has been reported during reference establishment (Van Berkum, Koornneef, Otten, & Nieuwland, 2007), indicating controlled processing or increased working memory. Here, we tentatively take the anterior–central effect in the N400 time window observed in our study as an indication of more attentional resources allocated to the focused than to the nonfocused information. It has been shown that the frontal cortex is greatly activated when substantial attention is involved in cognitive operations (Corbetta & Shulman, 2002). Although the scalp distribution of EEG is not as revealing about the locus of the underlying brain activity as MEG and fMRI, the functional relevance of IS allows us to speculate that the negative effect in the left anterior–central regions is associated with attention engagement, with larger anterior–central negativities in the N400 time window indicating more attentional resources. The absence of an interaction between context and emotional salience in the anterior–central region in the N400 time window seems to suggest that IS modulates the attention allocation regardless of the emotional salience of words. Note that the IS-modulated processing resources (as reflected by the larger negativity for the focused than for the nonfocused information over the left anterior–central regions) might differ from the attention that is captured by the emotional salience, which was reflected by the early negative effect for the emotional, relative to neutral, words.

Interaction between emotional salience and IS in the N400 time window over the right posterior region

In addition to main effects of emotional salience and IS, an interaction between them was observed in the N400 time window in the right posterior region. In this region, nonfocused neutral words elicited larger N400s than did focused neutral words, while the emotional words showed no difference between the focus and nonfocus conditions. For the neutral words, the larger N400 elicited by nonfocused information might indicate greater effort in integrating the new information with limited attentional resources. For emotional words, the absence of an N400 difference in the right posterior region may be due to the saliency of emotional information: Emotional words likely capture attention even if they receive only a little attention through the role of IS in the question–answer pair. That is to say, emotional salience can override IS modulations. Note that we associated the effects in the N400 time window observed over the anterior–central regions and the right posterior regions with different cognitive processes. Whereas the effect observed over the left anterior–central regions was related to the amount of attention allocated to the words of different conditions, the effect found over the right posterior regions was linked to the semantic expectation for the incoming words or the ease of integration of words into the preceding sentence context. However, without further information on the spatial localization of the effects, it is difficult to determine whether the two effects belong to the same ERP component. Another possibility is that the right-posterior effect in the N400 time window might reflect the competition for limited resources between emotional salience and IS, with only the emotional information and focused information receiving sufficient attentional resources. No matter which interpretation is taken, the results clearly indicate a late interaction between emotional salience and IS during language comprehension.

In conclusion, emotional salience and IS exert varying influences on language comprehension at different stages. The interaction between IS and emotional salience provides evidence for attention–emotion interactions at a later stage of processing, while the absence of interaction in the early time window suggests that the processing of emotional information is highly automatic, independent of context.