Introduction

Human faces are complex visual stimuli that carry different kinds of information about a person. As well as indications about age, sex and identity, faces display expressions that are associated with discrete emotions such as fear, anger and sadness (Ekman, 1992). These emotional expressions are highly correlated with distinct subjective experiences, are judged as discrete categories, and are essential for interpersonal and social functioning because they allow rapid assessment of a person’s motivational state (Matsumoto, Keltner, Shiota, O’Sullivan, & Frank, 2008).

The structural complexity of facial stimuli and the importance of a rapid extraction of facial information for communication require a complex and partially specialized neurocognitive structure for face processing. Stage models of face processing distinguish between structural encoding of the immediate perceptual input and later stages that match the stimulus with memory of the person’s identity and emotional expressions (Bruce & Young, 1986). According to Haxby et al. (2002, 2007) and Haxby and Gobbini (2011), emotional face processing involves a network including the occipital face area (OFA; Puce, Allison, Asgari, Gore, & McCarthy, 1996), the fusiform face area (FFA; Kanwisher, McDermott, & Chun, 1997) and the posterior superior temporal sulcus (Hoffman & Haxby, 2000). At the same time, the emotional expression carried by a face affects the visual processing of this stimulus at various stages. Such affective modulation of face perception has been observed with behavioral paradigms (Calvo & Esteves, 2005; Li, Zinbarg, Boehm, & Paller, 2008) as well as functional magnetic resonance imaging (fMRI; Fusar-Poli et al., 2009; Sabatinelli et al., 2011; Vuilleumier & Pourtois, 2007) and event-related potential (ERP) studies (Eimer & Holmes, 2007; Schacht & Sommer, 2009).

Although distinct types of facial expressions and emotional categories are highly correlated and allow a good prediction of a person’s subjective experience, this association is not always unambiguous. Contextual features of a person in a specific situation provide information that may complement or contradict the facial expression (Wieser & Brosch, 2012). Recent research has shown that such contextual information is not processed independently from the facial stimulus, but affects the visual processing of the facial expression (Barrett, Mesquita, & Gendron, 2011; Righart & de Gelder, 2006). This effect has been demonstrated for contextual features that are derived from the face itself (e.g., eye gaze: Adams & Kleck, 2003, 2005; Artuso, Palladino, & Ricciardelli, 2012), but also from surrounding stimuli such as the expression of the body (Aviezer, Bentin, Dudarev, & Hassin, 2011; Meeren, van Heijnsbergen, & de Gelder, 2005) and even external stimuli (e.g., social situations: Kim et al., 2004; Schwarz, Wieser, Gerdes, Mühlberger, & Pauli, 2012).

The impact of context information on the processing of facial information depends on the amount of emotional information that can be derived from a face. Contextual information is particularly powerful when facial expressions are either ambiguous (Calvo, Marrero, & Beltrán, 2013; Kim et al., 2004; Neta, Kelley, & Whalen, 2013; Neta & Whalen, 2010) or not related to any specific emotion (neutral faces: Schwarz, Wieser, Gerdes, Mühlberger, & Pauli, 2012). For example, Kim et al. (2004) presented surprised faces as ambiguous stimuli that were paired with verbal context information of either positive or negative valence. Surprised faces preceded by negative contextual sentences elicited greater ventral amygdala activation than surprised faces cued by positive sentences. In a social conditioning study, Davis et al. (2010) paired neutral faces with a series of affective social unconditioned stimuli (e.g., affective sounds or sentences representing a positive or negative outcome). Inherently neutral faces that had been conditioned with positive and negative context cues subsequently provoked higher activations in the lateral ventral amygdala. In another experiment, by Schwarz et al. (2012), neutral faces were preceded by contextual sentences comprising positive or negative evaluations. To activate self-referential processing, some of these sentences were addressed directly to the participant. As predicted, the contextual information biased the subsequent rating of the faces and modulated neural processing. Notably, faces preceded by self-referential information were associated with increased activity both in structures associated with face processing (right fusiform gyrus) and in structures associated with self-referential processing (medial prefrontal areas). This influence of self-reference on the perception of stimuli had already been demonstrated in earlier experiments covering different methodological approaches.
Herbert, Herbert, and Pauli (2011), for example, found augmented amygdala activations for pleasant words when these words were accompanied by a self-possessive pronoun, while negative words led to enhanced cortical processing for both self- and other-relevant conditions. Another example is a study by Fields and Kuperberg (2012), who also suggest an interactive influence of self-relevance and emotion.

Next to behavioral experiments and fMRI studies, electroencephalographic (EEG) examinations have studied context effects on affective face processing with the advantage of high temporal resolution. There is considerable evidence that ERPs associated with stimulus processing are affected by emotional information. For example, Righart and de Gelder (2006) showed that the N170, a prominent component that is reliably triggered by face stimuli, is modulated by the context surrounding the face. In addition, the N170 seems to be most pronounced when a facial expression is consistent with the emotional information provided by the background scenery of a picture (Righart & de Gelder, 2008). The N170 has been associated with early-stage facial picture processing (Eimer, 2000, 2011; Rossion & Jacques, 2012; Rossion et al., 2000) and linked to the structural coding process in the model proposed by Bruce and Young (1986). The modulation of the N170 through the interaction of context information and facial expression contradicts this assumption and indicates that the processing of contextual information and facial expression is not limited to later stages but operates at all phases of face processing. However, the alternative explanation that the N170 modulation was caused not by the emotional information but by confounding physical features of the contextual background pictures or facial expressions cannot be ruled out on the basis of this study. To overcome this limitation, other experiments provided the contextual emotional information independently of the visual stimulation that elicited the ERP. In a study by Diéguez-Risco, Aguado, Albert, and Hinojosa (2013), participants were instructed to look at pictures of angry or happy faces, which were preceded by an emotionally matching (e.g., a sentence describing a happy situation paired with a happy face) or contradicting contextual sentence (e.g., a happy situation paired with an angry face).
However, the findings from this study did not confirm a context modulation of the N170, as the context effect was limited to a later component, the so-called late positive potential (LPP). The LPP typically starts approximately 300 ms after stimulus onset and lasts up to several seconds post-stimulus (Hajcak, Dunning, & Foti, 2009). It has been associated with emotional processing of faces and other stimuli in a motivated attention framework (Frenkel & Bar-Haim, 2011; Hajcak et al., 2009; H. T. Schupp, Öhman, et al., 2004), and in particular with decoding and appraisal of affective meaning (H. T. Schupp, Flaisch, Stockburger, & Junghöfer, 2006; Wessing, Rehbein, Postert, Fürniss, & Junghöfer, 2013).

In a further study on affective context effects, Wieser and colleagues (2014) investigated the influence of verbal descriptions of the target person on the processing of neutral faces. The descriptions had different affective valences (negative, neutral, and positive) and were either directed to the participant (self-relevant; e.g., “She thinks your comment is stupid”) or to another person (other-relevant; e.g., “She thinks his comment is stupid”). In this study, self-reference, but not valence, affected the LPP, while an earlier component, the early posterior negativity (EPN), which the authors identified 220–300 ms post-stimulus in two temporo-occipital clusters, was affected by valence (Wieser et al., 2014). The EPN has generally been associated with enhanced emotional processing (Hajcak, MacNamara, & Olvet, 2010). Wieser and colleagues, to our knowledge, conducted one of the first studies to examine the interaction of self-reference and affective context variables in the processing of neutral faces using ERP analyses, allowing inferences about the specific stages at which these context variables influence neutral face processing. Further investigation of the different stages of face processing could clarify when context information given in advance is integrated and whether self-reference plays a crucial part in this process.

So far, studies on the emotional modulation of visual stimuli have distinguished either between emotional and non-emotional or between positive, negative and neutral stimuli. This categorization of emotional valence is a simplified classification of the potential emotional meaning of stimuli that does not represent the complexity of emotional perception. We assume that at least one further discrimination of negative stimuli is important, namely the differentiation between stimuli that carry a social versus a physical threat. Theories of social motivation (Baumeister & Leary, 1995) propose that social threats, i.e., threats of rejection and degradation, may be as stressful as physical threats that indicate a risk to life and limb. Indeed, indications of social exclusion cause intense emotional, physiological and behavioral stress responses (MacDonald & Leary, 2005). While some authors point out the convergence between social and physical threat reactions in thought, emotion and behavior (MacDonald & Leary, 2005), others challenge this view by arguing for a specificity of social stress reactions (for a discussion see Corr, 2005; MacDonald & Leary, 2005). A meta-analysis of experimental stress studies even suggested that the social component is one of the main drivers of the amplified release of stress hormones such as cortisol (Dickerson, Gruenewald, & Kemeny, 2004). The study of the shared and distinct mechanisms underlying the reactions to social and physical threats is still in its infancy, and only a few researchers have directly compared reactions to these two kinds of threat. A recent example is the study by Wabnitz and colleagues (2012), who found differences in an early ERP component (socially threatening words elicited an augmented P100 amplitude compared to positive and physically threatening words) in a passive-viewing paradigm. To our knowledge, there is no study that used pictures or faces to compare reactions to social and physical threats.
All in all, there is reason to further examine the differential processing of socially and physically threatening stimuli. While there certainly is an overlap between these two types of threat (e.g., physical threat could also lead to the belief of social exclusion), there is some evidence that some processing steps might be specific to each type of threat.

Since facial expressions are of substantial importance for interpersonal interactions (e.g., Morel, Ponz, Mercier, Vuilleumier, & George, 2009; Wieser & Brosch, 2012) we predict that the social component is especially important for face processing and that the affective modulation of facial perception does not occur with negative information per se, but preferentially in relation to social threats. To test this assumption, contextual sentences that differed in three classes of emotions (physically threatening, socially threatening, neutral) were presented. The emotional information was presented as written statements preceding faces with neutral expressions. As a second factor, we varied the self-referential aspect of the context sentences by directing these statements to the participant or to other persons. By randomly pairing sentences and faces in each trial of the experiment, the focus of interest is on the one-time and short-term influence of context information about a person rather than on learning effects. The affective context modulation effect has been repeatedly demonstrated for the LPP (Foti & Hajcak, 2008; Macnamara, Foti, & Hajcak, 2009; Macnamara, Ochsner, & Hajcak, 2011; Wieser et al., 2014) and, in a similar study by Wieser et al. (2014), also for the EPN. In addition, we have theoretical reason to assume that socially threatening information may be particularly relevant, since it leads to distinct activations compared to other emotional stimuli (Wabnitz et al., 2012). Following these previous findings and assumptions, we predicted a significant modulation of the EPN and LPP. We assumed a general enhancement of the components by: (1) threatening context information and (2) self-relevant context information. 
Furthermore, we predicted that socially threatening context information would lead to the most pronounced amplitudes and assumed an interaction between self-reference and valence, with self-reference intensifying the processing of threatening information. Since there is no direct evidence for a modulation of the N170 by emotional context information, this component was analyzed in an exploratory manner.

Method

Subjects

The sample comprised 35 healthy individuals who were recruited through university campus bulletin boards. Six participants had to be excluded from the analyses because of technical errors during data recording. The mean age of the remaining 29 participants was 27.52 years (SD = 5.38, range 21–44; 21 females). All subjects were undergraduate students who received either course credit or financial compensation for participation and had normal or corrected-to-normal vision. To ensure that participants had no current or lifetime history of axis I DSM-IV psychiatric disorders (American Psychiatric Association, 2000), a structured clinical interview (the German version of the Mini-International Neuropsychiatric Interview; M.I.N.I.; Ackenheil et al., 1999; Lecrubier et al., 1997; Sheehan et al., 1998) was administered prior to the experiment. No other inclusion or exclusion criteria were applied. All subjects gave written informed consent before the experiment, and the experimental procedure was approved by the ethics committee of Bielefeld University. The study conformed to the Declaration of Helsinki.

Stimulus material

Photographs of the faces of 22 different Caucasian individuals (11 females) were taken from the Radboud Faces Database (RaFD; Langner et al., 2010). The RaFD offers a variety of facial expressions representing different emotions, as well as neutral expressions, all presented from different camera angles. For the present study, only neutral faces with frontal orientation were included. The pictures were resized to a resolution of 371 × 556 pixels. There was no further standardization of contrast or luminance, since the randomized assignment of faces to conditions rules out systematic influences of these features.

The context stimuli consisted of 18 sentences from three emotional categories (physically threatening, socially threatening, and neutral), each formulated in both a self-referred and an other-referred version (e.g., “He wants to hit you” vs. “He wants to hit someone”). Physically threatening sentences described situations in which an aggressor intends to commit actual physical violence or utters violent threats (e.g., “Er will dir die Fresse polieren”, translated as “He wants to smash your face in”). Socially threatening sentences focused on intimidation or the impending loss of social belonging or rank (e.g., “Sie findet dich abstoßend”, translated as “She finds you abhorrent”). Neutral sentences described non-judgmental or non-threatening behaviors or situations (e.g., “Er sitzt neben dir”, translated as “He is sitting next to you”). All sentences had previously been rated, as part of a larger sentence set, for valence and arousal using the German version of the Self-Assessment Manikin (SAM; Bradley & Lang, 1994) by university students who received either course credit or a financial bonus for their participation. The SAM is a non-verbal, pictorial assessment technique that measures valence, arousal, and dominance and is available in 5-, 7-, and 9-point versions. In the current experiment, the valence and arousal scales were used in the 5-point version. The six physically threatening and six socially threatening sentences with the highest valence and arousal ratings and the six neutral sentences with the lowest arousal ratings and the most mid-scale valence ratings were selected. The self-referred and other-referred versions of the emotionally valent sentences differed significantly from each other on both scales (p < .01), whereas the neutral sentences differed only on the arousal scale (socially threatening self- vs. other-referred: arousal t(10) = 9.53, p < .001, valence t(10) = 10.47, p < .001; physically threatening self- vs. other-referred: arousal t(10) = 7.43, p < .001, valence t(10) = 5.25, p < .001; neutral self- vs. other-referred: arousal t(10) = 4.95, p < .01, valence t(10) = −1.13, p = .29). Between emotional categories, the physically and socially threatening sentences did not differ from each other but diverged significantly on both scales from the neutral sentences (physically vs. socially threatening: arousal t(22) = −0.99, p = .33, valence t(18.89) = −0.53, p = .60; neutral vs. physically threatening: arousal t(13.96) = 11.50, p < .001, valence t(14.08) = 11.70, p < .001; neutral vs. socially threatening: arousal t(12.79) = 7.92, p < .001, valence t(12.35) = 7.35, p < .001). The number of characters per sentence did not differ significantly between categories, F(2, 15) = 0.26, p > .05. Detailed stimulus statistics are displayed in Table 1; the complete (translated) stimulus set is provided in the Appendix.
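The fractional degrees of freedom reported above (e.g., t(13.96)) indicate Welch-corrected independent-samples t-tests. As an illustration, the statistic and its Welch–Satterthwaite degrees of freedom can be computed as follows; the rating values below are hypothetical placeholders, not the actual pilot data.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t-test statistic and degrees of freedom (unequal variances)."""
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)   # sample variances
    se2 = va / na + vb / nb
    t = (mean(a) - mean(b)) / math.sqrt(se2)
    # Welch-Satterthwaite approximation -> fractional df such as t(12.79)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

# Illustrative placeholder arousal ratings on the 1-5 SAM scale
neutral = [1.4, 1.5, 1.3, 1.6, 1.4, 1.5, 1.2, 1.5, 1.4, 1.3, 1.6, 1.4]
social  = [3.8, 4.1, 3.6, 4.0, 3.9, 4.2, 3.7, 4.0, 3.8, 4.1, 3.9, 3.7]

t, df = welch_t(neutral, social)
print(f"t({df:.2f}) = {t:.2f}")
```

With real per-participant rating means in place of the placeholders, this reproduces the reported comparisons.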

Table 1 Mean valence and arousal ratings for the three different sentence classes

Stimulus presentation

The experiment was presented with the software package Inquisit 4.0.3 (Millisecond Software, Seattle, WA, USA) on a 19-in. TFT monitor (60-Hz refresh rate) located approximately 60 cm in front of the participant. Participants were asked to focus their attention on the center of the screen and to passively view the displayed pairs of sentences and faces. The paradigm was an adaptation of the paradigms used by Kim et al. (2004) and Schwarz et al. (2012). For a schematic example of an experimental trial, see Fig. 1. Each sentence type (self-referred/physically threatening, self-referred/socially threatening, self-referred/neutral, other-referred/physically threatening, other-referred/socially threatening, other-referred/neutral) was pseudo-randomly paired with one of the 22 faces, and each sentence was automatically worded in a gender-appropriate fashion (e.g., female face – female personal pronoun). Since the experimental setting itself did not seem to provoke any distractions that might interfere with focusing on the screen, we did not include a fixation cross in the design. To ensure that differences in the ERPs were caused only by the influence of the sentences, no face-sentence pair was fixed. The experiment consisted of six blocks of 48 randomly chosen trials each, yielding a total of 288 trials. Each trial started with the presentation of a sentence for 2900 ms (inter-stimulus interval randomized between 900 and 1500 ms), followed by a face presented for 500 ms. The intertrial interval consisted of an empty gray background and varied randomly between 1900 and 2600 ms. Between blocks, there was a short break of approximately 2 min. Thirty minutes after the experiment, the subjects were asked to complete an unannounced recognition task as a measure of attention: they had to determine which faces in a list had actually been part of the experiment.
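The trial-list construction described above can be sketched as follows. This is a hypothetical reconstruction: the exact balancing of conditions within blocks and the face identifiers are assumptions, not details taken from the original Inquisit script.

```python
import random

# 2 reference levels x 3 valence levels = 6 conditions; 6 blocks x 48 trials = 288
REFERENCES = ["self", "other"]
VALENCES = ["physical", "social", "neutral"]
# 22 face identities, 11 female and 11 male (identifiers are hypothetical)
FACES = [("face_%02d" % i, "female" if i < 11 else "male") for i in range(22)]

def build_trials(n_blocks=6, trials_per_block=48, seed=1):
    rng = random.Random(seed)
    conditions = [(r, v) for r in REFERENCES for v in VALENCES]
    trials = []
    for _ in range(n_blocks):
        # each block: every condition equally often, in shuffled order
        block = conditions * (trials_per_block // len(conditions))
        rng.shuffle(block)
        for ref, val in block:
            # pseudo-random face assignment; pronoun matched to face gender
            face_id, gender = rng.choice(FACES)
            pronoun = "She" if gender == "female" else "He"
            trials.append({"ref": ref, "valence": val,
                           "face": face_id, "pronoun": pronoun})
    return trials

trials = build_trials()
print(len(trials))  # 288
```

Because faces are drawn anew on every trial, no face-sentence pairing is fixed, mirroring the design rationale above.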

Fig. 1

Schematic representation of an experimental trial

Electrophysiologic recordings

EEG was recorded using a BioSemi Active-Two system with 128 electrodes and a sampling rate of 512 Hz. Impedance was kept at 20 kΩ. Vertical and horizontal electro-oculograms (VEOG and HEOG) were recorded for later offline correction of artefacts related to eye movements and blinks. Data acquisition was done with ActiView 7.03. After recording, data were further processed with SPM8 (Ashburner, Barnes, & Chen, 2012). Processing included re-referencing to the average of all electrodes, segmentation (200 ms pre- to 1000 ms post-stimulus), filtering (high-pass = 0.166 Hz, low-pass = 30 Hz), baseline correction (−100 ms to 0 ms pre-stimulus) and artefact correction (a threshold method to identify blinks and movements). In accordance with the SPM8 (Statistical Parametric Mapping) guidelines (Ashburner et al., 2012), a threshold of 150 microvolts was applied. The robust averaging technique in SPM was used to compute average EEG epochs for each stimulus category separately for each participant. According to Litvak, Mattout, Kiebel, et al. (2011), this method down-weights artefactual outliers in the data; its advantage is that the influence of artefactual data on the averaged data can be reduced to close to zero without having to reject whole trials. Relevant ERP components for further analyses were defined on the basis of similar earlier studies working with facial expressions (Eimer, 2000; Schacht & Sommer, 2009; H. T. Schupp, Öhman, et al., 2004) and visual inspection of the data. The following components were chosen for further analyses: the N170 as a component of early visual processing (Eimer, 2000), the EPN as an indicator of enhanced emotional processing (Schacht & Sommer, 2009), and the LPP as a component of later, motivated visual processing (H. T. Schupp, Öhman, et al., 2004).
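The epoch-level steps of this pipeline (baseline correction, threshold-based artefact rejection, robust averaging) can be illustrated with the following sketch on synthetic single-channel data. This is not SPM8's actual implementation: filtering and re-referencing are omitted, and the MAD-based weighting function is a simple stand-in for SPM's robust-averaging scheme.

```python
import numpy as np

def preprocess_epochs(epochs, times, baseline=(-0.1, 0.0), threshold=150.0):
    """epochs: (n_trials, n_samples) single-channel data in microvolts.
    times: (n_samples,) time axis in seconds relative to stimulus onset."""
    # baseline correction: subtract the mean of the -100..0 ms window
    bl = (times >= baseline[0]) & (times <= baseline[1])
    epochs = epochs - epochs[:, bl].mean(axis=1, keepdims=True)
    # threshold-based artefact rejection (150 microvolts, as in the text)
    keep = np.abs(epochs).max(axis=1) <= threshold
    return epochs[keep]

def robust_average(epochs, c=3.0):
    """Down-weight samples far from the median (an illustration of the idea
    behind robust averaging, not SPM's actual weighting function)."""
    med = np.median(epochs, axis=0)
    mad = np.median(np.abs(epochs - med), axis=0) + 1e-12
    w = np.clip(c * mad / (np.abs(epochs - med) + 1e-12), None, 1.0)
    return (w * epochs).sum(axis=0) / w.sum(axis=0)

# synthetic demo: 20 trials, -200..1000 ms at 512 Hz, one blink-like outlier
rng = np.random.default_rng(0)
times = np.arange(-0.2, 1.0, 1 / 512)
epochs = rng.normal(0.0, 5.0, (20, times.size))
epochs[3, times > 0.1] += 400.0  # simulated blink exceeding the threshold
clean = preprocess_epochs(epochs, times)
erp = robust_average(clean)
```

The contaminated trial is rejected by the threshold, while the weighting keeps the remaining trials' influence on the average graded by how typical each sample is.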

Statistical analysis

EEG scalp data were statistically analyzed with EMEGS (http://www.emegs.org/, Peyk, De Cesarei, & Junghöfer, 2011). For the statistical analyses, 2 (self-reference: self-related vs. other-related) × 3 (valence: socially threatening, neutral, physically threatening) repeated-measures ANOVAs were computed to investigate main and interaction effects of these factors in the time windows and electrode clusters of interest. Effect sizes were calculated for all interaction effects (Cohen, 1988). Time windows were segmented to detect early differences on the N170 (140–170 ms) and EPN components (220–350 ms) (see H. T. Schupp, Öhman, et al., 2004; Wieser et al., 2014; Wieser, Pauli, Reicherts, & Mühlberger, 2010). As visual inspection of the electrode waveforms indicated an off-response to the disappearance of the faces after 500 ms, lasting up to 670 ms, we subdivided the LPP into an early (400–500 ms) (see Adolph, Meister, & Pause, 2013; H. T. Schupp, Junghöfer, Weike, & Hamm, 2004) and a late section (700–1000 ms) (see H. Schupp et al., 2004).
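For readers who want to reproduce this analysis structure outside EMEGS, the F-ratios for a 2 × 3 fully within-subject design can be computed as sketched below. This is a generic textbook implementation (sphericity correction omitted), not the EMEGS code, applied here to simulated mean amplitudes.

```python
import numpy as np

def rm_anova_2x3(data):
    """Two-way repeated-measures ANOVA for a (subjects x 2 x 3) array,
    e.g. self-reference (2) x valence (3) mean amplitudes per subject.
    Returns {effect: (F, df1, df2)} for both main effects and the interaction."""
    n, a, b = data.shape
    gm = data.mean()
    s = data.mean(axis=(1, 2))       # subject means
    A = data.mean(axis=(0, 2))       # factor-A level means
    B = data.mean(axis=(0, 1))       # factor-B level means
    AB = data.mean(axis=0)           # cell means (a x b)
    SA = data.mean(axis=2)           # subject x A means
    SB = data.mean(axis=1)           # subject x B means

    ss_a = n * b * ((A - gm) ** 2).sum()
    ss_axs = b * ((SA - s[:, None] - A[None, :] + gm) ** 2).sum()
    ss_b = n * a * ((B - gm) ** 2).sum()
    ss_bxs = a * ((SB - s[:, None] - B[None, :] + gm) ** 2).sum()
    ss_ab = n * ((AB - A[:, None] - B[None, :] + gm) ** 2).sum()
    resid = (data - SA[:, :, None] - SB[:, None, :] - AB[None, :, :]
             + s[:, None, None] + A[None, :, None] + B[None, None, :] - gm)
    ss_abxs = (resid ** 2).sum()

    out = {}
    for name, ss, df1, ss_err, df2 in [
            ("A", ss_a, a - 1, ss_axs, (a - 1) * (n - 1)),
            ("B", ss_b, b - 1, ss_bxs, (b - 1) * (n - 1)),
            ("AxB", ss_ab, (a - 1) * (b - 1), ss_abxs,
             (a - 1) * (b - 1) * (n - 1))]:
        # each within-subject effect is tested against its own
        # effect-by-subject interaction error term
        out[name] = (((ss / df1) / (ss_err / df2)), df1, df2)
    return out

# simulated data: 29 subjects with a built-in main effect of factor A
rng = np.random.default_rng(42)
data = rng.normal(0.0, 0.5, (29, 2, 3))
data[:, 1, :] += 1.0
res = rm_anova_2x3(data)
F, df1, df2 = res["A"]
print(f"F({df1}, {df2}) = {F:.2f}")
```

With 29 subjects the degrees of freedom come out as F(1, 28) and F(2, 56), matching the values reported in the Results section.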

For the N170 and the EPN time windows, two symmetrical occipital clusters of 11 electrodes around P7 and P8 (see H. T. Schupp, Öhman, et al., 2004; Wieser et al., 2014, 2010) were examined (left: I1, OI1, O1, PO9, PO9h, PO7, P9, P9h, P7, TP9h, TP7; right: I2, OI2, O2, PO10, PO10h, PO8, P10, P10h, P8, TP10h, TP8; see Fig. 2). Late positive potential topographies have been found to vary, with some authors reporting more parietal and others more fronto-central distributions, or even both in one study (Kissler, Herbert, Winkler, & Junghofer, 2009; Schindler, Wegrzyn, Steppacher, & Kissler, 2014). Since the present data revealed differences at both the fronto-central and parietal sites (see Figs. 4 and 5), two electrode groups of interest were analyzed for this component. For the LPP time windows a large fronto-central cluster of 33 electrodes (F1, Fz, F2, FFC3, FFC1, FFCz, FFC2, FFC4, FC3, FC1, FCz, FC2, FC4, FCC3, FCC3h, FCC1, FCC1h, FCCz, FCC2h, FCC2, FCC4h, FCC4, C5h, C3, C3h, C1, C1h, Cz, C2h, C2, C4h, C4, C6h) and a large centro-parietal cluster of 27 electrodes were investigated (CCP3, CCP3h, CCP1, CCP1h, CCPz, CCP2h, CCP2, CCP4h, CCP4, CP3, CPz, CP4, CPP3, CPP3h, CPP1, CPP1h, CPPz, CPP2h, CPP2, CPP4h, CPP4, P3, P1, Pz, P2, P4, PPO1, PPOz, PPO2).

Fig. 2

Selected electrode clusters for all time windows. Selected electrodes are highlighted by color

Behavioral data

Thirty minutes after the experiment, the participants were asked to fill out a recognition test containing the faces of 47 different persons. The number of correctly remembered faces was used as a measure of attention during the experiment. An outlier analysis was conducted with IBM SPSS Statistics 22.

Results

Behavioral data

The outlier analysis revealed no values deviating by more than one standard deviation. It was therefore concluded that, overall, the participants paid comparable attention to the task. The mean number of correctly remembered faces (out of 22 possible hits) was 16.86 (SD = 3.14).

Event-related brain potentials (ERPs)

N170

There was a significant main effect of laterality for the face-evoked ERPs (F (1, 28) = 5.23, p < .05, partial η2 = .16), reflecting a larger negative deflection of the N170 over the right electrode cluster. All other main and interaction effects were non-significant (all ps > .20).

Early posterior negativity (EPN)

In the EPN time window, a main effect of self-reference was found (F (1, 28) = 5.26, p < .05, partial η2 = .16): self-related faces evoked a larger EPN than other-related faces (see Fig. 3). Again, a main effect of laterality was found (F (1, 28) = 9.00, p < .01, partial η2 = .24), with a larger positivity over the right electrode cluster. All other main and interaction effects were non-significant (all ps > .20).

Fig. 3

Early posterior negativity (EPN) main effect of self-reference. (a) Difference in topographies between self-related and other-related faces: Blue indicates more negativity and red more positivity for the self-related context. (b) Selected electrode P8 displaying the time course for both context manipulations over right-parietal sites

Early late positive potential (LPP)

In the early LPP time window (400–500 ms), the processing of neutral faces was modulated at trend level by the self-reference and valence of the preceding sentences over fronto-central regions, while over the parietal cluster an interaction between self-reference and valence was found.

For the fronto-central cluster, trends toward main effects of self-reference (F (1, 28) = 3.29, p = .08, partial η2 = .11) and valence (F (2, 56) = 2.49, p = .09, partial η2 = .08) were found, while there was no interaction between the two (F (1.67, 46.65) = 0.10, p = .87, partial η2 < .01).

Centro-parietally, there was a significant interaction between self-reference and valence (F (2, 56) = 3.19, p < .05, partial η2 = .10). Post-hoc comparisons showed a trend toward a larger positivity for self-related compared to other-related neutral sentences (p = .08), but no significant differences for socially threatening (p = .14) or physically threatening (p = .16) sentences. Centro-parietally, no main effects of self-reference (F (1, 28) = 0.75, p = .39, partial η2 = .03) or valence (F (2, 56) = 0.02, p = .98, partial η2 < .01) were found.

Late LPP

For the late stage of the LPP (700–1000 ms), main effects of self-reference and valence of the preceding sentences were found fronto-centrally, while again an interaction between both factors occurred over centro-parietal locations.

Over fronto-central locations, a significant main effect for self-reference occurred (F (1, 28) = 10.62, p < .01, partial η2 = .28), which was due to a larger LPP for self-related faces compared to other-related faces (see Fig. 4). Furthermore, there was a main effect of valence (F (2, 56) = 3.32, p < .05, partial η2 = .11). Post-hoc comparisons showed a larger LPP for faces after socially threatening sentences compared to physically threatening (p < .05) and neutral sentences (p < .05), the latter two not differing from each other (p = .85). There was again no interaction fronto-centrally (F (2, 56) = 0.12, p = .89, partial η2 < .01).

Fig. 4

Fronto-central main effects of self-reference and valence in the late late positive potential (LPP) (700–1000 ms). (a) Left: Difference in topographies between self-related and other-related faces. Blue indicates more negativity and red more positivity for the self-related context. Right: Selected electrode FCz displaying the time course over fronto-central sites. (b) Left: Difference in topographies between socially threatening, neutral, and physically threatening sentences. Blue indicates more negativity and red more positivity for the respective difference. Right: Selected electrode FCz displaying the time course for all conditions over fronto-central sites

For the centro-parietal cluster, however, a significant interaction between self-reference and valence was found (F (2, 56) = 5.88, p < .01, partial η2 = .17; see Fig. 5). Similar to the early LPP time window, a significantly larger positivity was found for self-related compared to other-related neutral sentences (p < .05). The self-reference conditions did not differ for socially threatening (p = .07) or physically threatening (p = .45) sentences. Finally, no main effects of self-reference (F (1, 28) = 0.75, p = .39, partial η2 = .03) or valence (F (2, 56) = 0.02, p = .98, partial η2 < .01) were found.

Fig. 5

Centro-parietal interaction between self-reference and valence in the late late positive potential (LPP) (700–1000 ms). (a) Difference in topographies between self-related and other-related faces for the comparison of neutral and socially threatening sentences and neutral and physically threatening sentences. Blue indicates more negativity and red more positivity for the respective difference. (b) Mean amplitudes in microvolt for the centro-parietal sensor cluster in the late LPP time window

Discussion

In the present study we found that the cortical processing of inherently neutral faces depends on the context in which the face is presented. Associating a face with socially threatening information caused a strong modulation of the ERPs starting at about 700 ms after stimulus onset. In accordance with our expectations and with previous findings, we found strong modulations depending on the self-referential formulation of the context information at both early (EPN) and later (LPP) stages of face processing.

The affective modulation revealed in this study was restricted to the LPP and could be observed from 700 ms onward. The LPP is related to attentional processes that depend on the motivational significance of the stimulus (Hajcak, Weinberg, & Foti, 2012). While previous studies showed that the emotional expression of faces modulates cortical processing (Kolassa & Miltner, 2006; Li et al., 2008; Schacht & Sommer, 2009), the current study established emotional valence through second-hand information, i.e., descriptive affective sentences. In line with the recent findings of Wieser et al. (2014), who used a similar paradigm, we confirmed that affective context information is sufficient to modulate the cortical correlates of face processing. Since the stimuli associated with different emotions were physically identical, both studies demonstrate that face perception depends on the emotional valence associated with a face and that this effect cannot be attributed merely to differences in structural encoding caused by confounding physical features. The observed modulations are most likely the result of top-down regulation by a higher-order alerting system. Insights from recent fMRI studies (Sabatinelli et al., 2011; Schwarz et al., 2012) suggest differences in the activation of prefrontal and fusiform gyrus areas as indicators of a top-down modulation of social and face perception. In addition, Liu et al. (2012), using simultaneous EEG-fMRI recordings, showed that the LPP is generated and influenced by a wide brain network, including cortical and subcortical structures typically associated with visual and emotional processing, and that the modulation of the LPP is valence specific.

Beyond effects related to the mere arousal or negativity of the stimulus, we confirmed that the processing of faces is particularly sensitive to social context information. In the later LPP time frame (700–1000 ms), post-hoc analyses revealed augmented amplitudes for socially threatening context stimuli in comparison with both physically threatening and neutral stimuli, while amplitudes for physically threatening and neutral stimuli did not differ from each other. Hence, our results indicate that inherently neutral faces associated with socially threatening information are preferentially processed relative to those in neutral or physically threatening contexts. This context-based affective influence is a rather new finding; previous findings were either restricted to other types of visual stimuli (pictures: MacNamara et al., 2011) or to affective differences in the facial expressions themselves (Duval, Moser, Huppert, & Simons, 2013). As neither arousal nor valence ratings of the context sentences differed between physically and socially threatening sentences, the observed differences in the LPP to faces seem to be specifically related to social affective information rather than to arousing or negative stimuli in general, regardless of their meaning. This finding fits the assumption that faces are of paramount importance for social communication (Wieser & Brosch, 2012). In this context, it may be suggested that the processing of ambiguous (i.e., neutral) faces is modulated most notably by socially relevant context stimuli rather than by other types of context stimuli.

We also found an interaction between self-relevance and emotional valence: neutral context stimuli evoked a larger late positivity than affective context stimuli when they were couched in a self-relevant manner. Similar findings have been reported by Fields and Kuperberg (2012), who found that self-relevant neutral sentences lead to a comparably higher late positive potential. Following their line of argument, a self-relevant context increases subjects' motivation to disambiguate the valence of the neutral face. Faces presented with neutral context information remain more ambiguous than faces paired with emotionally valent context stimuli and thus demand additional processing. This is in line with the assumption of Hirsh and Inzlicht (2008), who argued that processing resources are oriented towards stimuli of ambiguous valence in order to fully evaluate their motivational relevance.

Our findings are in line with Wieser et al. (2014) insofar as both studies found a large modulation of the LPP by the self-reference of the stimulus. In addition to this result, we also found an affective modulation of the LPP by the socially threatening stimuli. Regarding the EPN, we confirmed a modulation by self-reference; however, the affective modulation of the EPN reported by Wieser et al. was not replicated in our study. A possible explanation for these partly overlapping, partly differing findings lies in the differences between the two paradigms. While we paired faces with sentences randomly, Wieser et al. paired every face with a specific emotional category and repeatedly presented the face within that affective context. This repetition represents a learning process and may have led to a more intense conjunction of face and emotion, an aspect our paradigm did not involve. It also seems important to consider that the stimuli used for the ERP analyses were neutral faces that lack inherent affective valence and only acquired a proposed valence through prior experience. MacNamara et al. (2011) suggested that context information may need some time to modulate information processing; this could be especially crucial for context information that is presented only once and could explain the differing findings at the EPN stage. Our results specifically indicate an enhancing impact of self-reference, with self-referent context cues leading to an augmented negativity. A possible interpretation is that self-referent context information leads to more intense early (EPN) as well as later (LPP) processing. It is also possible that the social stimuli used in our experiment had a higher potential to modulate the LPP than the sentences used by Wieser et al.: the socially threatening sentences used in our study may have been perceived as highly self-referent even when not directly addressed to the subject, while other negative sentences, such as those used by Wieser et al., may need an augmentation through self-reference.

Contrary to a previous study (Diéguez-Risco et al., 2013), we did not find any effect on the N170. This component reflects early visual processing, and our strict exclusion of confounds with physical features may have prevented the detection of differences at these early analytic stages (Eimer & Holmes, 2002; Vuilleumier & Pourtois, 2007). Other studies have likewise found that the N170 and earlier components are unaffected by emotional faces (Schacht & Sommer, 2009; H. T. Schupp, Öhman, et al., 2004).

This study is limited by several factors. The passive viewing paradigm implemented here is useful in that it provides a standardized, economical way of measuring the influence of context information on the processing of visual stimuli. However, it does not generate an atmosphere fully comparable to real social situations, and it is unclear how well the self-referential variations were actually perceived by the participants. To gain both internal and external validity, it would be helpful to create paradigms that involve realistic experimental situations and feedback mechanisms ensuring that the targeted stimulus-related variations are valid. While the present study prevents influences of structural differences between faces that appear in only one stimulus category, it remains unclear whether there are between-subject factors relevant to the research question. For example, it would be highly interesting to know to what extent factors such as previous social experiences or psychopathology, especially social anxiety or other anxiety disorders, influence the impact of (negative) context information on the processing of social situations. Future studies should include at least two groups, for example a group of persons with social anxiety disorder (SAD) and a control group. In line with the results reported by Wieser et al. (2014), the affective ratings of the context sentences indicate that self-reference was confounded with the perceived arousal and valence of the stimuli. With regard to valence effects, the ERP patterns cannot be attributed to differences in the affective ratings of the sentences: as there were no differences in valence or arousal ratings between socially and physically threatening sentences, the ERP data suggest a more pronounced activation for faces placed in a socially threatening context.
The observed differences in arousal ratings between self- and other-relevant context stimuli are, however, reflected in main and interaction effects in the ERP data. It could therefore be argued that these effects are not related to self- or other-relevance per se, but to differences in the inherent arousal of the stimuli. It must be emphasized, however, that the only part of the sentences that differed between self- and other-relevant sentences is the part that directs the sentence to the participant or to another person (you/someone). It is thus highly unlikely that these sentences differ in anything other than self-relevance. Accordingly, the differences in arousal are themselves induced by self-relevance, and the associated ERP modulations ultimately trace back to it. The distinct patterns of results in our study suggest that the observed effects represent more than mere differences in the arousal and valence of the initial context stimuli: variations in self-relevance lead to distinct processing when used as social context information. However, based on our data we cannot determine to what extent the effects of self-reference are mediated by arousal. Another limitation is the lack of affective ratings for the supposedly neutral faces. We do not know if and how the actual evaluation of the faces changed during the experiment due to the contextual manipulation. The results of Wieser et al. (2014) suggest an influence of repeatedly presented affective context information of a specific valence on neutral faces; it would be interesting to know how the random pairing implemented in our experiment affected the affective value of the different neutral faces. To address this issue, it would be helpful to collect affective ratings at different time points during the experiment. A further limitation is the possible overlap between physically and socially threatening context cues. While physically threatening information may often also be considered a threat to social integrity, socially threatening context cues are not automatically physically threatening and may therefore initiate different processing, as our results suggest. To further control this possible confound, an additional rating of subjective feelings of threat in both domains would be beneficial.

Taken together, the results of the current study are largely consistent with other recent publications and underline the influence of affective second-hand information on the visual processing of inherently neutral faces. The clinical implications are limited, however, since social situations are rarely purely neutral and devoid of affective visual information. Future research should therefore examine the influence of second-hand information on affectively biased stimuli, such as emotional faces, to determine whether there are boosting or debilitating effects on visual processing, especially in interaction with self-reference.