Language is an important tool in conveying emotion. Emotion can either be expressed straightforwardly with emotional words (e.g., “happy,” “sad,” “beautiful,” “dirty,” “snake,” “diamond”) or be derived from descriptions of events or behaviors (e.g., “I cut my finger when I cooked dinner.”). Unification is an important aspect of language processing. It refers to the integration of words into higher-level representations beyond the meaning of the single words. This can occur at multiple levels, including phonological, semantic, and syntactic levels (Hagoort, 2005). In this study, we examined the time course of deriving emotion from the semantic unification of nonemotional words. We were particularly interested in the relative timing between emotional response and the unification process.

Numerous studies have employed event-related potentials (ERPs) to study the timing of emotional processing in emotional words (for a review, see Citron, 2012). Some early ERP effects (before 300 ms) have been taken as evidence for rapid emotional processing. For instance, the P1/N1 has been associated with enhanced perceptual analysis of emotional words (Kissler & Herbert, 2013; Sass et al., 2010; Wang, Zhu, Bastiaansen, Hagoort, & Yang, 2013b; Zhang et al., 2014; but see Bayer, Sommer, & Schacht, 2012; Briesemeister, Kuchinke, & Jacobs, 2014; Fritsch & Kuchinke, 2013; Hinojosa, Méndez-Bértolo, & Pozo, 2012; Scott, O’Donnell, Leuthold, & Sereno, 2009), whereas the P2 (Ding, Wang, & Yang, 2015, 2016; Herbert, Kissler, Junghöfer, Peyk, & Rockstroh, 2006; Wang & Bastiaansen, 2014) and the early posterior negativity (EPN; Herbert, Junghofer, & Kissler, 2008; Kissler, Herbert, Winkler, & Junghofer, 2009; Schacht & Sommer, 2009) might reflect automatic attention allocation to emotional words. In addition to the early components, a decreased N400 over the anterior region has been taken to reflect facilitated semantic processing (Kanske & Kotz, 2007; Trauer, Kotz, & Müller, 2015), whereas the late positive potential (LPP, peaking between 500 and 800 ms) has been related to the evaluation of emotional valence (Carretie et al., 2008; Fischler & Bradley, 2006; Hinojosa, Méndez-Bértolo, & Pozo, 2010; Kanske & Kotz, 2007). Overall, the emotional aspects of words can be rapidly identified, even before access to the lexical–semantic features of words, although the evidence has been mixed (Citron, 2012).

Some recent studies have also examined the integration of emotional words into sentence or discourse contexts. One ERP component, the N400, has been robustly related to the semantic aspect of language processing. The N400 has been related to the retrieval of lexico-semantic information (Lau, Phillips, & Poeppel, 2008) and the integration of words into a broader context (Hagoort & van Berkum, 2007), as well as to a dynamic interaction between these processes. It is a negativity that peaks around 400 ms after stimulus onset, with a right-lateralized centro-parietal maximum distribution (Kutas & Hillyard, 1980). The N400 amplitude indicates the ease of semantic processing, which is sensitive to the probability of the words in relation to previous contexts (Kutas & Federmeier, 2011; Lau et al., 2008). In addition, the emotionality of words has affected the N400 amplitude even when the words did not violate the semantic expectation of the sentence context. For example, in a neutral sentence context, a smaller N400 was found for neutral than for positive and negative words (Holt, Lynn, & Kuperberg, 2009), and for positive than for neutral and negative adjectives (Martín-Loeches et al., 2012). In an emotional sentence context, a smaller N400 was found for negative than for positive and neutral words (Moreno & Rivera, 2013; Moreno & Vázquez, 2011). Moreover, emotional salience has been found to override detailed semantic analysis, as reflected by a reduced N400 effect in response to expectation violation for emotional as compared to neutral words (Delaney-Busch & Kuperberg, 2013; Moreno & Rivera, 2013; Moreno & Vázquez, 2011; Parkes, Perry, & Goodin, 2016; Wang, Bastiaansen, & Yang, 2015). In addition, the emotional words elicited similar N400 amplitudes regardless of whether or not they had been put in focus by the preceding sentence context, whereas neutral words elicited a smaller N400 when focused than when nonfocused (Wang, Bastiaansen, Yang, & Hagoort, 2013a). 
These results have further demonstrated prioritized emotional processing.

This privileged emotional processing might be supported by a subcortical route, which exerts influence on the cortical route via a feedback loop (LeDoux, 2000). Another explanation for such rapid emotional processing could be the conditioned association between word form and emotional connotation, because repeated experiences with a particular emotional word could strengthen the conditioned associations to its lexical representation (Kuchinke, Krause, Fritsch, & Briesemeister, 2014).

In addition to word–emotional connotation associations, emotion can also be derived from descriptions of events or behaviors. For instance, the word “finger” in the sentence “I cut my finger when I cooked dinner” could trigger emotional responses even though the word itself is not emotional. Here the emotion is conveyed implicitly through the ideational meaning instead of the lexical items in a sentence, which has been termed implied emotion (Lai, Willems, & Hagoort, 2015; Schwarz-Friesel, 2015). Such implied emotion can only be derived from the computation of language inputs as a whole. It is highly interesting to test whether the emotional processing of implied emotion differs from that of lexical items with affective connotations, because no direct “word form–emotional connotation” association exists in the implied emotion. This allows us to test the interaction between semantic unification and emotion processing, which has theoretical implications for how emotional meaning could be derived from the language input. So far, to our knowledge, only one fMRI study has examined the neural mechanisms underlying the processing of implied emotion (Lai et al., 2015). That study found that implied emotion in sentences activated emotion-related areas and led to increased activation in language-related areas, suggesting that implied emotion could be the result of unification operations. However, little is known regarding the relative timing between the emotional response and the unification process.

The study of the time course of processing implied emotion could enhance our understanding of the emotion–cognition interaction. Theories of emotion in relation to cognition can be classified as constructivist and appraisal theories, which differ in whether emotions are constructed or elicited (Gendron & Barrett, 2009). The constructivist theories assume that emotion is an outcome of conceptualization that is thought to be supported by language use (Lindquist, Barrett, Bliss-Moreau, & Russell, 2006; Lindquist & Gendron, 2013). The constructivist theories characterize emotions in terms of the situations they signify. Emotion is thus constructed as a feature of the situation by means of conceptualization (Clore & Ortony, 2013; Lindquist, 2013; Wilson-Mendenhall, Barrett, Simmons, & Barsalou, 2011). In contrast, appraisal theories assume that there is a mental process of appraisal between the situation and emotion. The vast majority of appraisal theorists hold that appraisal has a causal role in the elicitation of emotion and view appraisal as a specific mechanism that is itself distinct from the emotion and not typically thought to be a linguistic process per se (Ellsworth & Scherer, 2003; Roseman & Smith, 2001).

In this study, we measured the ERP responses to critical words that rendered the whole sentence either emotional or not emotional (e.g., “I cut my finger when I cooked dinner.” vs. “I cut my carrot when I cooked dinner.” The critical words are in boldface.). We were interested in the relative timing between the emotional response and semantic unification. To determine the time window that is associated with semantic processing, we also included a semantically incongruent condition that increased the difficulty of the unification process (e.g., “I cut my water when I cooked dinner,” with the semantically incongruent word in boldface). All the words that constituted the sentences were emotionally neutral. We expected to find an N400 effect in response to the semantic incongruence. Three scenarios have been envisaged regarding the time course of deriving implied emotion: (1) Constructivist theories assume that emotion is constructed as a feature of the situation (Clore & Ortony, 2013; Lindquist, 2013; Wilson-Mendenhall et al., 2011). In the context of the present study, the generated emotion was constituted by the high-level representation of the whole sentence. If this were the case, the emotional response would co-occur with the semantic processing. This would lead to an emotional effect in the same time window as the semantic processing (i.e., an emotional effect in the N400 time window). (2) Causal appraisal theories assume that appraisal plays a causal role in translating the situation to an emotion (Ellsworth & Scherer, 2003; Roseman & Smith, 2001). According to appraisal theories, an emotional response is the outcome of appraisal of the stimuli or situations. In the present study, the situation had to be constructed on the basis of integration of emotionally neutral words, which occurs in the N400 time window. On the basis of the resulting representation of multiword utterances, appraisal would be carried out in order to induce an emotional response. 
If this were the case, the emotional response should follow the unification process, and thus would be manifested by a late effect (i.e., the LPP effect). (3) According to the affective primacy hypothesis, affective and semantic processing may occur in parallel, and the affective route is typically faster than the semantic route (Arnold, 1960; Zajonc, 1980, 2000). Previous studies on emotional words have suggested that emotional saliency can override detailed lexical analysis of emotional words, which might be supported by a subcortical route for lexical processing or a conditioning association between emotion and lexical form. If emotional salience triggers emotional responses on the basis of partially available information (i.e., before full semantic analysis of the critical words), the emotion generated from the linguistic description would elicit early ERP responses (such as N1/P1, P2, or EPN effects).

Method

Participants

We recruited 25 university students (mean age: 22 years old; range: 19–25 years old; 13 males, 12 females) to participate in the electroencephalography (EEG) experiment. They were all right-handed native Chinese speakers, with normal or corrected-to-normal vision. None of them had dyslexia or any neurological impairment. They signed a written consent form before the experiment. Since emotional trait and state anxiety have been shown to affect emotional processing (e.g., Bar-Haim, Lamy, & Glickman, 2005; Larson, 2017; Rutherford, MacLeod, & Campbell, 2004), we measured the participants’ levels of trait and state anxiety using the Spielberger State–Trait Anxiety Assessment Inventory (STAI; Spielberger, Gorsuch, Lushene, Vagg, & Jacobs, 1983) prior to the EEG experiment. We also measured the participants’ levels of empathy with the interpersonal reactivity index (IRI; Davis, 1980). The data of one female participant were excluded due to slow drifts. Therefore, the data for 24 participants entered the final analysis.

Stimuli

We constructed 168 triplets of sentences, with the sentences in one triplet containing identical context except for the critical words (CWs). The CWs were all emotionally neutral words. All sentences were syntactically correct. The integration of the CWs with the sentential contexts made the sentences emotionally neutral and semantically congruent (neutral-congruent condition), emotionally negative and semantically congruent (negative-congruent condition), or emotionally neutral and semantically incongruent (neutral-incongruent condition). The sentences contained 6–12 words, and the CWs never appeared in the first, second, or final position in a sentence (see Table 1 for examples of the sentences).

Table 1 Some examples of the sentence triplets

We pretested the emotionality (i.e., emotional valence and arousal), concreteness, and cloze probability of the CWs, as well as the emotionality and plausibility of the whole sentences with participants who did not participate in the ERP experiment. First, the emotional valence, arousal, and concreteness of the CWs were rated on 9-point Likert scales (9 indicates the most positive, the most arousing, and the most concrete) by 16 participants. We kept 145 sentence triplets whose CWs’ valence ratings were between 3.8 and 6.2, to make sure that the CWs were all emotionally neutral. Then the semantic plausibility of these sentences was tested with 18 new participants on a 9-point Likert scale (with 9 indicating most plausible). The 145 sentence triplets were divided into three lists using a Latin square design (six participants to each list). No participant read the same sentence triplets more than once, and all three sentences were read across three lists. We selected 115 sentence triplets whose plausibility ratings were above 5 in the congruent conditions (including both the negative-congruent and neutral-congruent conditions) and at the same time below 5 in the incongruent condition. After that, we recruited 24 new participants to rate the emotional valence and arousal of the semantically congruent sentences (including the negative-congruent and neutral-congruent conditions) of the 115 selected triplets on 9-point Likert scales (with 9 indicating most positive and most arousing). Note that since it was difficult to integrate the incongruent words into the contexts, we did not test the emotionality of the incongruent sentences. Additionally, the participants were instructed to identify the word that made the sentence emotionally negative if they had rated a sentence to be negative. We kept 95 sentence triplets, whose valence ratings were below 4 in the negative-congruent condition and between 4 and 6 in the neutral-congruent condition. 
Additionally, we eliminated three triplets because the CWs in their emotionally negative sentences were not identified as the words that made the sentences negative. Next, we measured the cloze probabilities of the CWs in the three conditions of the 92 sentence triplets. Another 20 participants were asked to complete the sentences that were presented up to where the critical words would appear. The percentage of the participants who filled in the CWs was calculated for each item. Finally, we collected the log-frequencies for the CWs based on a Chinese corpus developed by Cai and Brysbaert (2010). We also calculated the numbers of strokes of the CWs, to quantify the visual complexity of the words. We discarded two triplets in order to match the CWs on their emotional valence, arousal, concreteness, word frequency, and number of strokes across the three conditions, as well as the CWs’ cloze probabilities and the whole sentences’ plausibilities between the two congruent conditions (i.e., negative-congruent vs. neutral-congruent). The final stimulus set contained 90 sentence triplets.

For the final set of stimuli, the negative-congruent sentences were rated to be more negative [valence rating: mean (SD) = 3.18 (0.63)] than the neutral-congruent [valence rating: mean (SD) = 4.90 (0.35)] sentences, t(89) = –24.45, p < .001. Also, the negative-congruent sentences were more arousing [arousal rating: mean (SD) = 6.08 (0.92)] than the neutral-congruent [arousal rating: mean (SD) = 3.16 (0.83)] sentences, t(89) = 23.40, p < .001. Moreover, the CWs were identified as the words that rendered the sentences negative by 82% of the overall participants for the negative-congruent sentences. Repeated measures analyses of variance (ANOVAs) of the plausibility of the sentences revealed higher plausibility for the two congruent conditions [negative-congruent: mean (SD) = 6.91 (0.86); neutral-congruent: mean (SD) = 7.00 (0.87)] than for the neutral-incongruent condition [mean (SD) = 2.07 (1.05)], F(2, 178) = 935.49, p < .001, η2 = .913, whereas the two congruent conditions (negative-congruent vs. neutral-congruent) were rated as being equally plausible, t(89) = –0.881, p = .381. In addition, the CWs in the negative-congruent and neutral-congruent conditions showed equally low cloze probabilities, t(89) = 0.51, p = .609: means (SDs) = 3% (0.07) and 3% (0.09), respectively. Moreover, the CWs in the three conditions were matched in emotional valence, arousal, concreteness, frequency, and number of strokes (all p values > .1; see Table 2 for the rating values). For the ERP experiment, the three conditions among the 90 triplets were distributed across three lists according to a Latin square procedure, with each list containing equal numbers of items (30 items) per condition. In addition to the experimental stimuli, we also constructed 30 fillers that were rated to be emotionally positive, even though no particular word in the sentences was emotional (e.g., “The new technology developed by college students doubled the income of farmers.”). 
These filler sentences were rated by the same group of 24 participants who rated the emotionality of the semantically congruent experimental stimuli. We found that the positive emotion could only be derived after reading the whole sentences. This was done to make sure that participants were not biased to predict or perceive the sentences as being negative after reading emotional sentences that only conveyed negative meaning. Therefore, there were 120 sentences in each experimental list, and the three lists were equally distributed across the 24 participants.
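
The Latin square distribution described above can be illustrated with a short sketch. It is not the authors' actual script: the function name, the zero-based item indices, and the rotation rule are assumptions chosen only to show how 90 triplets can yield three lists with 30 items per condition, with every triplet appearing once per list and in a different condition in each list.

```python
# Hypothetical sketch of a Latin square assignment (names are illustrative).
CONDITIONS = ("neutral-congruent", "negative-congruent", "neutral-incongruent")

def latin_square_lists(n_triplets=90, conditions=CONDITIONS):
    """Return one list of (triplet index, condition) pairs per experimental list."""
    n_lists = len(conditions)
    lists = [[] for _ in range(n_lists)]
    for item in range(n_triplets):
        for list_index in range(n_lists):
            # Rotate the condition by one step from list to list, so that each
            # triplet is seen in a different condition in each list.
            condition = conditions[(item + list_index) % n_lists]
            lists[list_index].append((item, condition))
    return lists

lists = latin_square_lists()
for list_index, items in enumerate(lists):
    counts = {c: sum(1 for _, cond in items if cond == c) for c in CONDITIONS}
    print(list_index, counts)
```

With 90 triplets and three conditions, the rotation gives exactly 30 items per condition in every list; the 30 positive fillers would then be appended to each list to reach 120 sentences.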

Table 2 Rating results for the critical words

Procedure

Participants were seated comfortably in front of a computer screen. The sentences were presented one word at a time (400 ms, interstimulus interval = 300 ms). The words were shown in white font centered on a black background, and the compound two- and three-character words subtended visual angles of 4.58° and 6.87°, respectively. A trial started with a 1,000-ms fixation cross in the center of the screen. After presentation of a sentence’s final word, there was a 2,000-ms black screen, which was followed by the next trial. Participants were told that there would be a comprehension test after the whole experiment, so they needed to read and comprehend the sentences carefully. They were told not to move or blink during the presentation of words, but to blink during the after-sentence black screen or the fixation cross period.

Participants read the 120 sentences in a pseudorandom order. No more than three sentences in the same condition were presented in succession. The 120 sentences were evenly divided into three blocks, with each block lasting about 4–5 min. Between two blocks, there was a 2- to 3-min break. The whole experiment took about one and a half hours, including the participants’ preparation, instructions, and a short practice run of ten trials (which were not included in the formal ERP experiment).
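
One simple way to obtain a pseudorandom order that respects the constraint of no more than three same-condition sentences in a row is rejection sampling: shuffle, check the longest run, and reshuffle if needed. The sketch below is only an illustration under assumptions; the source does not say how the order was generated, and treating the positive fillers as a fourth condition for the run-length check is also an assumption.

```python
import random

def max_run(labels):
    """Length of the longest run of identical consecutive labels."""
    longest = run = 0
    previous = None
    for label in labels:
        run = run + 1 if label == previous else 1
        previous = label
        longest = max(longest, run)
    return longest

def pseudorandom_order(trials, max_same=3, seed=None, max_tries=10000):
    """Shuffle (sentence id, condition) pairs until no condition repeats too often."""
    rng = random.Random(seed)
    order = list(trials)
    for _ in range(max_tries):
        rng.shuffle(order)
        if max_run([condition for _, condition in order]) <= max_same:
            return order
    raise RuntimeError("no valid order found within max_tries shuffles")

# Hypothetical list: 30 sentences in each of three conditions plus 30 fillers.
labels = ("neutral-congruent", "negative-congruent",
          "neutral-incongruent", "positive-filler")
trials = [(i, labels[i % 4]) for i in range(120)]
order = pseudorandom_order(trials, seed=1)
print(max_run([condition for _, condition in order]))
```

With four equally frequent labels, a valid order is usually found within a handful of shuffles, so the retry limit is only a safeguard.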

EEG recording and preprocessing

The data were recorded with a 64-channel NeuroScan system (10–20 system). The left mastoid electrode served as the reference, and an electrode placed between the Fz and FPz electrodes served as the ground. Vertical (VEOG) and horizontal (HEOG) eye movements were monitored through four electrodes placed around the orbital region (bipolar montage). All electrode impedances were kept below 5 kΩ during the experiment. Recording was done with a band-pass filter of 0.05–200 Hz and a sampling rate of 1000 Hz.

The data were analyzed using the Fieldtrip software package, an open-source Matlab toolbox (Oostenveld, Fries, Maris, & Schoffelen, 2011). The EEG data were re-referenced offline to the average of both mastoids, followed by a low-pass filter of 100 Hz. Next, we segmented the data from 500 ms before to 1,500 ms after the onset of the words. Trials contaminated with muscle artifacts were identified and removed using a semiautomatic routine. After that, we performed independent component analysis (ICA; Bell & Sejnowski, 1995; Jung et al., 2000) on the data and removed ICA components associated with the eye-movement activities from the EEG signals. These ICA components were identified by comparing them with the EOG recordings. On average, 97.56% of trials were kept, with equal numbers of trials for the three conditions [F(2, 46) = 0.274, p = .762]. In the end, the ERPs were calculated by averaging over trials in each condition for each electrode and each participant.
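
Three of the steps above (re-referencing, segmentation, and trial averaging) reduce to simple arithmetic, sketched below in plain Python. This is not the Fieldtrip pipeline that was actually used: the function names are hypothetical, signals are plain lists of samples at the 1000-Hz sampling rate, and filtering, artifact rejection, and ICA are left out. Because the data were recorded against the left mastoid, re-referencing to the average of both mastoids amounts to subtracting half of the right-mastoid signal from every channel.

```python
def rereference_to_average_mastoids(channel, right_mastoid):
    """Both inputs are recorded against the left mastoid (M1).

    channel - (M1 + M2) / 2  ==  (channel - M1) - 0.5 * (M2 - M1)
    """
    return [x - 0.5 * m for x, m in zip(channel, right_mastoid)]

def epoch(signal, onset, srate_hz=1000, pre_ms=500, post_ms=1500):
    """Cut the samples from pre_ms before to post_ms after a word onset.

    onset is a sample index; the end point is exclusive (half-open window).
    """
    start = onset - pre_ms * srate_hz // 1000
    stop = onset + post_ms * srate_hz // 1000
    return signal[start:stop]

def average_trials(trials):
    """Average equally long epochs sample by sample to obtain an ERP."""
    n = len(trials)
    return [sum(samples) / n for samples in zip(*trials)]

# Toy usage with a synthetic ramp signal and two hypothetical word onsets.
signal = [float(i) for i in range(5000)]
epochs = [epoch(signal, onset) for onset in (1000, 3000)]
erp = average_trials(epochs)
print(len(erp))
```

In the real analysis these operations are applied per electrode, per condition, and per participant.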

Statistical analysis

The ERP differences between conditions were statistically evaluated in Fieldtrip (Oostenveld et al., 2011) by cluster-based random permutation tests over all electrodes (Maris & Oostenveld, 2007). The use of such statistical analysis does not require the arbitrary preselection and grouping of electrodes. Since we had a strong a priori hypothesis on the ERP components, we took the averaged amplitudes within predefined time windows into the permutation test, to increase the signal-to-noise ratio of the data. On the basis of earlier studies (Kanske & Kotz, 2007; Wang & Bastiaansen, 2014; Wang, Bastiaansen, et al., 2013a; Wang, Zhu, et al., 2013b) and visual inspection of the ERPs elicited (see Fig. 1A), the mean amplitudes of four ERP components were averaged within four time windows: the N100 (80–120 ms), P200 (150–250 ms), N400 (300–600 ms), and late positivity (600–1,000 ms). We conducted two contrasts: negative-congruent versus neutral-congruent and neutral-incongruent versus neutral-congruent. Neutral-congruent served as the control condition, so that the comparison of neutral-incongruent versus neutral-congruent conditions could be used to determine the time window associated with semantic processing (i.e., the N400 effect), whereas the comparison of negative-congruent versus neutral-congruent could be used to test the time course of deriving implied emotion. First, for the data sample at each electrode, we computed the mean difference between the two conditions. On the basis of the distribution of the difference values obtained for all data samples, we thresholded the observed values with the 95th percentile of this distribution, and the sums of these data samples constituted the cluster candidates. Next, we randomly reassigned the conditions among participants 1,000 times, to build a permutation distribution. 
For each permutation, the cluster candidate with the highest sum of the difference values was added to the permutation distribution of cluster statistics. Finally, the actually observed cluster-level summed values were compared against the permutation distribution, and the clusters falling in the highest or lowest 2.5th percentile were considered significant.
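
The logic of the cluster-based permutation test on window-averaged amplitudes can be sketched in a few dozen lines. The sketch is a simplified illustration, not the Fieldtrip implementation, and it rests on several assumptions: the input is one condition difference per participant and electrode, electrodes are thresholded on a paired t value (2.069 is the two-sided .05 critical value for 23 degrees of freedom), the neighbour map is hypothetical, condition labels are permuted by flipping the sign of each participant's differences, and a single two-sided test on the largest absolute cluster sum replaces the separate positive and negative tails.

```python
import math
import random

def t_values(diffs):
    """One-sample t value per electrode; diffs[p][e] = condition A minus B."""
    n = len(diffs)
    out = []
    for column in zip(*diffs):
        mean = sum(column) / n
        var = sum((x - mean) ** 2 for x in column) / (n - 1)
        out.append(mean / math.sqrt(var / n) if var > 0 else 0.0)
    return out

def max_cluster_sum(tvals, neighbours, threshold):
    """Largest absolute sum of t values over neighbouring same-sign electrodes."""
    best, seen = 0.0, set()
    for start, t in enumerate(tvals):
        if start in seen or abs(t) <= threshold:
            continue
        sign = 1 if t > 0 else -1
        stack, total = [start], 0.0
        seen.add(start)
        while stack:  # grow the cluster through the neighbour map
            e = stack.pop()
            total += tvals[e]
            for nb in neighbours[e]:
                if nb not in seen and sign * tvals[nb] > threshold:
                    seen.add(nb)
                    stack.append(nb)
        best = max(best, abs(total))
    return best

def cluster_permutation_p(diffs, neighbours, threshold=2.069, n_perm=1000, seed=0):
    """Return the observed cluster statistic and its Monte Carlo p value."""
    rng = random.Random(seed)
    observed = max_cluster_sum(t_values(diffs), neighbours, threshold)
    exceed = 0
    for _ in range(n_perm):
        flipped = []
        for row in diffs:
            s = rng.choice((1, -1))  # swap the two condition labels for this participant
            flipped.append([s * x for x in row])
        if max_cluster_sum(t_values(flipped), neighbours, threshold) >= observed:
            exceed += 1
    # The add-one correction keeps the estimated p value above zero.
    return observed, (exceed + 1) / (n_perm + 1)
```

Called with a 24 × 64 table of amplitude differences and a neighbour map for the 64-channel montage, the function returns the observed cluster statistic and its permutation p value. Only the largest cluster is tested here, which is enough to show why the approach controls for multiple comparisons across electrodes.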

Fig. 1

ERP effects between conditions. (A) Grand average ERP waveforms evoked by the CWs in three conditions. As compared to the words in the neutral-congruent condition, the words in the neutral-incongruent condition elicited a larger negative amplitude in the N400 time window (300–600 ms), whereas the words in the negative-congruent condition elicited a smaller negative amplitude in the N400 time window and a larger positive amplitude in the late time window (600–1,000 ms). (B) Topographic distributions of the observed effects. The electrodes that showed significant differences between the conditions are highlighted with asterisks.

Results

Figure 1A displays the grand average ERP waveforms evoked by CWs in the three conditions. As compared to the neutral-congruent words, the neutral-incongruent words elicited larger negative amplitudes in the 300- to 600-ms time window (Tmaxsum = 52.355, p < .001) over central-posterior regions (electrodes: F4, F6, F8, FC3, FC1, FCz, FC2, FC4, FC6, FT8, Cz, C2, C4, C6, CP1, CPz, CP2, CP4, CP6, TP8, P3, P1, Pz, P2, P4, P6, P8, PO5, PO3, POz, PO4, PO6, PO8, Oz, O2), with a right-hemisphere dominance, whereas the negative-congruent words elicited smaller negative amplitudes in the 300- to 600-ms time window (Tmaxsum = 3.949, p = .041) and larger positive amplitudes in the 600- to 1,000-ms time window (Tmaxsum = 11.878, p = .023) over the left hemisphere (electrodes: FC3, C5, C3, C1, CP5, CP3, CP1). See Fig. 1B for the topographic distributions of the observed effects. Note that we also conducted a cluster-based permutation test across both channels and time points without averaging the data in the time window of interest. While the negative cluster for the neutral-incongruent versus neutral-congruent contrast was still significant in the 300- to 600-ms time window (from 280 to 586 ms; Tmaxsum = 6,758, p = .004), the positive cluster for the negative-congruent versus neutral-congruent contrast was found only in the 600- to 1,000-ms time window (from 778 to 836 ms; Tmaxsum = 1,211, p = .042). However, the lack of an ERP effect in the 300- to 600-ms time window for the negative-congruent versus neutral-congruent contrast in this analysis could have been due to reduced power: an emotional effect that is relatively short-lasting or discontinuous in time is harder to detect when individual time points are tested (see the uncorrected t values for all channels and time points in the supplementary figure). Moreover, it is worth noting that the statistical analysis could hardly tell us “when” the effect began or ended (Maris & Oostenveld, 2007). 
The statistical analysis at best could tell us “whether” there was any difference in the time points or time windows being tested. The significance level at each time point depended on both the true underlying effect and the signal-to-noise level. Given that we obtained a significant cluster for the negative-congruent versus neutral-congruent contrast when testing the averaged amplitudes in the predefined time window (i.e., 300–600 ms; Tmaxsum = 3.949, p = .041), we believe that the emotional effect elicited by the negative-congruent words was already present in the N400 time window, although it was not as robust as in the later time window. No significant N1 or P2 effect was observed for any of the comparisons.

The STAI questionnaire measured the participants’ state and trait anxiety with 20 questions, each based on a 4-point Likert scale (1–4). We found that all the participants had a low anxiety level: means (SDs) = 24.33 (9.06) and 29.83 (7.88), respectively, for state and trait anxiety. The emotional ERP effects in the two time windows were subjected to a correlation analysis with the trait measurements. No significant correlation was found between the STAI measures and the ERP effects (all p values > .1). The IRI measured the participants’ empathy level with four 7-item subscales on a 4-point Likert scale (1–4). Each subscale taps a separate aspect of empathy (Davis, 1980). The perspective-taking (PT) scale measures the ability or tendency to see things from the perspective or point of view of others, and the fantasy scale (FS) measures the tendency to imaginatively transpose oneself into fictional situations. These two scales represent two components of cognitive empathy. The means (SDs) were 21.38 (4.14) and 22.00 (3.66), respectively, for the PT and FS scales. The PT scores correlated with the emotional effect within both the 300- to 600-ms (R = .432, p = .035) and the 600- to 1,000-ms (R = .414, p = .044) time windows, whereas the FS scores correlated with the emotional effect within the 300- to 600-ms time window (R = .417, p = .042). In addition, the empathic concern (EC) scale assesses the tendency to experience feelings of sympathy and compassion for unfortunate others, and the personal distress (PD) scale taps the tendency to experience distress and discomfort in response to extreme distress in others. These two scales represent affective empathy. The means (SDs) were 21.13 (3.03) and 20.08 (4.20), respectively, for the EC and PD scales. No significant correlation was found between these two scales and the observed emotional ERP effects.
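
As a worked check on the reported correlations, a correlation coefficient r based on n participants can be converted to a t value with n − 2 degrees of freedom, t = r·√(n − 2) / √(1 − r²). The sketch below assumes Pearson correlations over the 24 participants who entered the final analysis (the source does not name the correlation type); 2.074 is the two-sided .05 critical t value for 22 degrees of freedom.

```python
import math

def t_from_r(r, n):
    """t statistic (n - 2 degrees of freedom) for a correlation coefficient r."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

N_PARTICIPANTS = 24
T_CRITICAL_DF22 = 2.074  # two-sided alpha = .05, df = 22

reported = {
    "PT, 300-600 ms": 0.432,
    "PT, 600-1,000 ms": 0.414,
    "FS, 300-600 ms": 0.417,
}
for label, r in reported.items():
    t = t_from_r(r, N_PARTICIPANTS)
    print(label, round(t, 2), t > T_CRITICAL_DF22)
```

All three t values fall slightly above the critical value, in line with the reported p values just below .05.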

Discussion

In this study we examined the relative timing between emotional response and semantic unification. An emotionally neutral word in a sentential context rendered the whole sentence emotionally neutral and semantically congruent, emotionally negative and semantically congruent, or emotionally neutral but semantically incongruent. We found that the words in the negative-congruent condition elicited reduced negative amplitude between 300 and 600 ms and increased positive amplitude between 600 and 1,000 ms, relative to those in the neutral-congruent condition. Meanwhile, the words in the neutral-incongruent condition elicited increased negative amplitudes between 300 and 600 ms, as compared to those in the neutral-congruent condition. The overlapping time windows between the emotional response and semantic processing suggest that the construction of emotional meaning operates concurrently with semantic unification. We discuss the results in more detail below.

The neutral-incongruent words elicited larger negative amplitudes than did neutral-congruent words in the 300- to 600-ms time window over a central-posterior region, with right-hemisphere dominance. This effect is consistent with the classical N400 effect in response to semantic anomalies (Kutas & Federmeier, 2011), suggesting that semantic processing occurred between 300 and 600 ms (Lau et al., 2008; Pylkkänen & Marantz, 2003). Determination of the N400 time window allowed us to directly compare the timing between semantic unification and emotional processing.

As compared to the words in the neutral-congruent condition, the critical words that rendered the sentences emotional (i.e., negative-congruent) elicited reduced negative amplitudes in the 300- to 600-ms time window, and increased positive amplitudes between 600 and 1,000 ms. These effects were most robust over the left hemisphere. So far, only a few studies have reported N400 differences between emotional and neutral words when they did not violate semantic expectations of the context, with varying topographic distributions and directionality. For instance, as compared to emotionally neutral words, emotionally negative words have been found to elicit a smaller anterior negativity (Kanske & Kotz, 2007; Trauer et al., 2015), a smaller posterior negativity (Moreno & Rivera, 2013), or a larger posterior negativity (Holt et al., 2009) in the N400 time window. These findings have been interpreted as reflecting either facilitated semantic processing or increased attentional resources for emotional as compared to neutral words. Our pretests showed that the sentences in the two congruent conditions were equally plausible and that their critical words were equally unexpected. Also, the critical words in the two conditions were controlled for various lexical characteristics, including concreteness, word frequency, and visual complexity. In addition, the effect in the 300- to 600-ms time window was mainly distributed over the left temporal–parietal region, which was different from the classical N400 distribution. Therefore, the difference we observed in the 300- to 600-ms time window between the neutral-congruent and negative-congruent conditions could not be explained by differences in semantic unification difficulties.

Following the effect in the 300- to 600-ms time window, the words in the negative-congruent condition elicited larger positive amplitudes than did those in the neutral-congruent condition. Given the similar difference patterns in amplitude (i.e., more positive amplitudes for the emotional condition) and topographic distributions (i.e., over left temporal–parietal areas) between the two components, we are inclined to interpret them as one positive effect that started around 300 ms and lasted until 1,000 ms. Previous studies related to emotional processing have repeatedly reported a positive effect in response to emotional words, both in isolation (for a review, see Citron, 2012) and in a sentence context (Bayer, Sommer, & Schacht, 2010; Holt et al., 2009; Moreno & Rivera, 2013). Such positive effects may vary in latency and distribution, probably due to differences in lexical–semantic factors (such as word frequency, concreteness, grammatical features, etc.), task demands (emotion-related or not), or individual differences (for reviews, see Citron, 2012; Okon-Singer, Lichtenstein-Vidne, & Cohen, 2013). Regardless, larger positive amplitudes in emotional than in neutral conditions have been proposed to reflect deeper/prolonged emotional evaluation, due to the motivational relevance of the stimuli (Citron, 2012; Hinojosa, Carretié, Valcárcel, Méndez-Bértolo, & Pozo, 2009; Holt et al., 2009; Kissler, Assadollahi, & Herbert, 2006; Wang, Zhu, et al., 2013b). Interestingly, we found that the emotional effect (negative-congruent vs. neutral-congruent) significantly correlated with cognitive empathy but not with affective empathy at the participant level. Specifically, participants’ ability or tendency to see things from others’ perspective (as measured by the perspective-taking subscale) correlated with the emotional effect within the 300- to 1,000-ms time window. 
This correlation is consistent with the finding that increased empathetic perspective taking was associated with less pleasant and more arousing ratings of sad sentences (Pinheiro, Dias, Pedrosa, & Soares, 2016). In addition, participants’ tendency to imaginatively transpose themselves into fictional situations (as measured by the fantasy subscale) correlated with the emotional effect within the 300- to 600-ms time window. Individual variation on the fantasy subscale has been related to mentalizing and inference-making during sentence comprehension (Li, Jiang, Yu, & Zhou, 2014). Therefore, the correlation between fantasy scores and the emotional effect in the 300- to 600-ms time window indicates that the derivation of implicit emotion might have been mediated by construction of a situation model. Note that these correlations should be interpreted with caution, because multiple correlations were tested (i.e., eight correlations resulting from the empathy measures of four subscales and the emotional ERP effect within two time windows). Therefore, further studies will be required in order to confirm these findings. Also, it remains unclear whether emotional response involves the recognition and/or the experience of emotion. It should also be noted that since the negative-congruent and neutral-congruent sentences differed in both emotional valence and arousal, the observed ERP effect could be attributed to either aspect of emotionality in the present study. It will be interesting to test in future studies whether sentences that convey positive meaning would produce emotional responses similar to those elicited by our negative sentences.
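
The caveat about testing eight correlations (four empathy subscales by two time windows) can be made concrete with a short sketch of a familywise error correction. This is an illustration only, not a procedure reported for this study: the choice of the Bonferroni and Holm corrections, the function names, and the p values below are all assumptions introduced for the example, and the p values are made up rather than taken from the data.

```python
# Illustrative sketch only: the correction methods, function names, and
# p values are assumptions for this example, not results from the study.

def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def holm_reject(p_values, alpha=0.05):
    """Holm step-down procedure.

    Returns a list of booleans (in the original order of p_values) that is
    True where the null hypothesis is rejected at familywise level alpha.
    """
    m = len(p_values)
    # Indices of the p values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p value (rank = k - 1) is compared with
        # alpha / (m - k + 1); testing stops at the first failure.
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break
    return reject


# Four empathy subscales crossed with two ERP time windows.
N_SUBSCALES = 4
N_WINDOWS = 2
n_tests = N_SUBSCALES * N_WINDOWS

threshold = bonferroni_alpha(0.05, n_tests)

# Hypothetical (made-up) uncorrected p values for the eight correlations.
example_p = [0.004, 0.020, 0.030, 0.200, 0.450, 0.600, 0.700, 0.900]
example_reject = holm_reject(example_p, alpha=0.05)

print(f"Bonferroni per-test threshold for {n_tests} tests: {threshold}")
print("Holm decisions:", example_reject)
```

With eight tests, a nominal alpha of .05 corresponds to a Bonferroni per-test threshold of .00625, so a correlation that is significant only at the uncorrected level would not survive this kind of correction.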

The concurrent engagement of emotional response and semantic unification suggests that implied emotion might emerge as part of the high-level representation of language inputs. A constructionist view of emotional perception suggests that language plays a constitutive role in emotion perception (Barrett, Mesquita, & Gendron, 2011; Lindquist & Gendron, 2013), and that language even shapes emotional perception (Gendron, Lindquist, Barsalou, & Barrett, 2012). For instance, it has been found that manipulating language, such as through verbal interference and verbal labeling, affected the categorical perception of emotional faces (for a review, see Fugate, 2013). One version of a simulation model (Barsalou, 2009) emphasized the role of the emotion system in conceptual representation and online multimodal situated conceptualization. Our study of implied emotion in language is in line with the view that the implied emotion elicited by an event lies in the language system, and that implied emotion based on the integration of linguistic information is not independent or an aftereffect of language comprehension. Instead, implied emotion exists as a part of the high-level representation from language and arises from semantic unification. Therefore, emotions, together with other conceptual interpretations, constitute a unified, meaningful representation of the language inputs (Alvandi, 2016; Hassin, Aviezer, & Bentin, 2013; Willems, Clevis, & Hagoort, 2011; Wilson-Mendenhall et al., 2011).

The concurrence of emotional response and semantic processing does not support the causal appraisal model of emotion in the field of language (Ellsworth & Scherer, 2003; Roseman & Smith, 2001). If an appraisal process is necessary to associate a situation and its emotion, the emotional response can only be elicited after the situation model has been established. A strong version of the simulation model suggests that the situation model is built on experiential (perception and action) simulations of the described situation, as provided by language cues (Zwaan, 2004). If this were the case, the emotional response would have followed the N400 effect, after a complete mental representation of the event had been established. However, the emotional response co-occurred with semantic unification. Therefore, the derivation of emotion from language descriptions is not dependent on the appraisal of the situation model as a result of the unification process. Nevertheless, it should be noted that, although the appraisal and constructivist theories are at odds with each other in some respects, they are not mutually exclusive. “Noncausal” appraisal models see appraisal as a useful description of a situation instead of the cause of emotion, thus merging with constructivist accounts (Clore & Ortony, 2013; Ortony, Clore, & Collins, 1988).

However, unlike emotional words, the implied emotion did not produce any early ERP effect preceding that for semantic processing. Since lexical information can only be accessed after 200 ms, the early ERP effect observed for processing emotional words might be explained by the activation of subcortical pathways before detailed lexical analysis (LeDoux, 2000). Such subcortical pathways might be established on the basis of repeated association between emotion and a particular word form during the acquisition of the word’s meaning (Kuchinke et al., 2014). As for implied emotional processing, this form of emotional meaning can only be derived after the words’ meanings have been accessed and further integrated into the sentence context. This requires recruitment of the cortical language network, as supported by a previous fMRI study showing that the emotional network involved in implied emotion was intricately related to the network for semantic processing in language (Lai et al., 2015). Using the ERP technique, our results provide further evidence for a dependence on semantic unification during implied emotional processing. Therefore, representations of emotion draw on the mental representation of situations as described by the language inputs (Clore & Ortony, 2013; Lindquist, 2013; Wilson-Mendenhall et al., 2011).

Conclusions

We measured the ERP responses to critical neutral words that made a whole sentence either affectively negative or neutral and either semantically congruent or incongruent. We found that both the emotional and semantic effects started at about the same time, around 300 ms, but that they showed different topographical distributions and durations. The finding of concurrent emotional response and semantic unification suggests that emotional meaning, like other semantic features, is incrementally incorporated into ongoing sentence processing, generating an ERP response that is related to its higher motivational relevance. Our finding is consistent with predictions from a constructivist view of emotion and is in disagreement with predictions from some (causal) appraisal theories.

Author note

This work was supported by awards from the National Natural Science Foundation of China to Y.Y. [Grant No. 31070989] and L.W. [Grant No. 31200849], as well as from the Excellent Young Researcher Foundation of the Institute of Psychology, Chinese Academy of Sciences, to L.W. [Grant No. Y4CX152008]. The raw EEG data and program code used in this study are available by request from the authors.