Facial expressions are by far the most frequently used stimuli in human emotion perception research. Over decades, a large body of evidence has been published showing that emotion perception is not based on facial information alone (Hunt 1941). Indeed, in our natural world, a face is usually encountered not as an isolated object but as an integrated part of a whole body. The face and the body both contribute to conveying the emotional state of the individual. Meeren et al. (2005) show that observers judging a facial expression (fear or anger) are strongly influenced by emotional body language; an enhancement of the occipital P1 component as early as 115 ms after stimulus presentation onset points to the existence of a rapid neural mechanism sensitive to the agreement between simultaneously presented facial and bodily emotional expressions. Aviezer et al. (2008a) positioned prototypical pictures of disgust faces on torsos conveying different emotions. Their results showed that placing a face in a context induced striking changes in the recognition of emotional categories from the facial expressions to the extent where the “original” basic expression was lost when positioned on an emotionally incongruent torso (for the interested reader see Aviezer et al. 2008b). Knowledge of the social situation (Carroll and Russell 1996), body postures (Meeren et al. 2005; Van den Stock et al. 2007; Aviezer et al. 2008a), voices (de Gelder and Vroomen 2000; Van den Stock et al. 2007), scenes (Righart and de Gelder 2006, 2008a, b), linguistic labels (Barrett et al. 2007), or other emotional faces (Russell and Fehr 1987) all influence emotion perception.

Research on context effects has a long tradition in object but not in face recognition. Because of repetitive co-occurrence of objects or co-occurrence of a given object in a specific context, our brains generate expectations (Bar and Ullman 1996; Biederman et al. 1974). A context can facilitate object detection and recognition (Biederman et al. 1982; Boyce and Pollatsek 1992; Boyce et al. 1989; Palmer 1975), even when glimpsed briefly and even when the background can be ignored (Davenport and Potter 2004). Joubert et al. (2008) observed that context incongruence induced a drop of correct hits and an increase in reaction times, affecting even early behavioral responses. They conclude that object and context must be processed in parallel with continuous interactions, possibly through feed-forward co-activation of populations of visual neurons selective to diagnostic features. Facilitation would be induced by the customary co-activation of “congruent” populations of neurons, whereas interference would take place when conflictual populations of neurons fire simultaneously. Bar (2004) proposes a model in which interactions between context and objects take place in the inferior temporal cortex.

In line with the evolutionary significance of the information, the effects of the emotional gist of a scene may occur at an early level and it has been suggested that the rapid extraction of the gist of a scene may be based on low spatial frequency coding (Oliva and Schyns 1997). We previously showed scene context congruency effects on the perception of facial expressions (Righart and de Gelder 2006, 2008a, b). They were seen when participants explicitly categorized the emotional expression of the face (Righart and de Gelder 2008a) but also when they focussed on its orientation (Righart and de Gelder 2006). This indicates that affective gist congruency reflects an early and mandatory process and suggests a perceptual basis. Our EEG studies support this view: the presence of a fearful expression in a fearful context enhanced the face-sensitive N170 amplitude when compared to a face in a neutral context. This effect was absent for contexts-only, indicating that it resulted from the combination of a fearful face in a fearful context (Righart and de Gelder 2006). Righart and de Gelder (2008a) replicated this finding by briefly (200 ms) presenting fearful faces in fearful versus happy scenes.

Similar context effects have already been found for bodies. Using point-light displays, Thornton and Vuong (2004) have shown that the perceived action of a walker depends upon actions of nearby “to-be-ignored” walkers. The task-irrelevant figures could not be ignored and were processed unconsciously to a level where they influenced behavior. Another point-light study demonstrates that the recognition of a person’s emotional state depends upon another person’s presence (Clarke et al. 2005).

If indeed we recognize a person’s emotional behavior in relation to that of the social group, it is important to focus on the specific aspects of group behavior. Group behavior may be considered at different levels, of which three are relevant for understanding the visual process at stake: (1) the relative group size, (2) the dynamic motor and action aspects of the group and (3) the affective significance of the group’s activity (Argyle 1988). Context effects may take place along all three dimensions and therefore require appropriate control conditions. First, group size is not considered as a variable in our study as the different group scenes used all have similar group sizes. The second and third aspects, relating to action and affect, were the focus of our recent brain imaging studies (de Gelder et al. 2004; Grèzes et al. 2007; Kret et al. submitted; Pichon et al. 2008; see de Gelder et al. 2010 for an overview).

Here we investigated whether briefly viewed information from a task-irrelevant social scene influences how observers categorize the emotional body expression of the central figure. For this purpose, we selected scenes that represent a group of people engaged in an intense action that was either neutral or affectively laden. By contrasting the affective meaning and keeping the action representation similar, we manipulated specifically the affective dimension of the social scenes. Our main interests were threefold. First, we aimed to investigate the influence of a congruent versus incongruent scene on body expression recognition. We expected enhanced performance in the congruent conditions. Second, we were interested in disambiguating the contribution from the emotion versus the action component. Our hypothesis was that similarity along the emotion dimension of the social scene, rather than along the action dimension, influences recognition of the target body expression. If so, bodily expressions may be recognized faster in an emotionally congruent than in a neutral action scene, indicating that the effect derives from target-scene emotional congruency. Third, we aimed to investigate the contribution of facial expressions visible in the scene to the recognition of emotional body expressions. Based on previous studies that report strong mutual influence of face and body expressions, we expected the strongest context effects when scenic facial expressions were visible.

In Experiment 1, these predictions were tested by presenting fearful and happy bodies in fearful, happy, neutral and scrambled contexts. In Experiment 2, we compared happy with angry body expressions. Experiment 3 was similar to Experiment 1, but the faces in the background were blurred to ascertain that possible effects were a result of the body expressions visible in the scene and not of bystanders’ facial expressions. In Experiment 4, faces in the scenes were blurred and the same design as Experiment 2 was kept. We used naturalistic color photographs as color has been shown to improve object and scene recognition (Oliva and Schyns 2000; Wurm et al. 1993). Based on previous results (Righart and de Gelder 2006, 2008a, b) and on studies of rapid scene recognition (Bar et al. 2006; Thorpe and Fabre-Thorpe 2002; Maljkovic and Martini 2005), we used short presentation times.

Angry, fearful, sad and happy expressions are the emotions that are most often used in emotion research (de Gelder 2006). Here we specifically wanted to contrast a positive versus negative body expression in a positive/welcoming versus negative/threatening social scene that one wants to avoid. As an opponent of the positive, happy emotion, we could choose among angry, disgusted, fearful or sad stimuli. We did not opt for sad bodies and scenes since these contain less action than happy bodies and scenes, whereas angry and fearful stimuli contain comparable action intensity. Whereas disgust can be expressed very clearly via the face, the body expressions are more ambiguous and resemble fearful expressions (de Gelder 2006). We did not include anger and fear in one design since that would result in twice as many negative versus positive emotions. With the current design, we had an equal amount of positive and negative body expressions in each experiment.

The influence of a social context on the perception of an emotional body expression

Experiment 1. Fearful and happy bodies in a social emotional context including faces

Method

Images of emotional body postures were briefly presented in scenes showing intense group activities with either neutral or emotional valence. Participants rapidly categorized the target body expression.

Participants: Twenty-four students of Tilburg University (7 men; mean age: 20 years, range 17–25 years old) with no neurological or psychiatric history and normal or corrected-to-normal vision participated in the study. The experimental procedures were in accordance with the Helsinki Declaration and approved by Tilburg University.

Apparatus, Design and Procedure: We briefly describe the construction and validation of the target body stimuli. A total of 38 male and 46 female amateur actors were recruited. Prior to the photography session, they were instructed with a standardized procedure and received payment. As part of the instructions, the actors were familiarized with a typical scenario corresponding to each emotion; the fearful scenario was an encounter with an aggressive dog and the happy scenario was an encounter with a friend. A total of 869 body stimuli (consisting of fearful, happy, angry, sad, disgusted and neutral instrumental actions) were included in the validation study and were shown to 120 participants. Stimuli were presented for 4 s with an inter-stimulus interval of 7 s. Participants were instructed to categorize the emotion displayed by circling the correct answer on an answer sheet. Eight happy and fearful body images, correctly recognized on average for 91% (standard deviation, SD 10), were included in the experiment.

Scenes were selected from the Internet. We took care to make them gender balanced (for example, the neutral condition included a soccer field with male players; therefore, we also included a female hockey team playing). In a separate validation study, we measured affective gist recognition by presenting each image twice for 100 ms in random order. Fearful (people running away from danger), happy (people dancing at a party) and neutral scenes (people involved in sports) were correctly recognized at rates of 87, 97 and 92%, respectively. Scenes showing bodies involved in neutral actions served as baseline.

We also validated the stimuli as they were used in the experiments described in this paper. The selected bodies were pasted on fearful, angry, happy and neutral scenes. These were presented with unlimited duration to 24 participants who had to categorize the emotion of the middle target body. The mean (M) recognition rates and SDs were as follows: angry bodies in angry scenes (M = 91%, SD 8), angry bodies in happy scenes (M = 89%, SD 11), angry bodies in neutral scenes (M = 91%, SD 9), fearful bodies in fearful scenes (M = 97%, SD 6), fearful bodies in happy scenes (M = 98%, SD 4), fearful bodies in neutral scenes (M = 96%, SD 8), happy bodies in happy scenes (M = 75%, SD 21), happy bodies in fearful scenes (M = 73%, SD 23), happy bodies in angry scenes (M = 75%, SD 22) and happy bodies in neutral scenes (M = 77%, SD 21). Body expressions were not better or worse recognized in a congruent versus incongruent or in a congruent versus neutral scene when stimuli were presented with unlimited duration. See Fig. 1 for stimulus examples.

Fig. 1

a Upper left: a man who is joyfully surprised greets an old friend. Strangely, this man is in the middle of a fight. b Upper right: a man is threatening another person who also wants to join the fight. c Middle left: the woman in the foreground is frightened by something, but the other people in the scene do not experience the situation as threatening and are still enjoying the party, as can be read from their body language. The incongruence makes recognition of the emotion of the foreground figure difficult. d Middle right: the girl in the foreground is happily welcoming a new visitor/a friend at the party. Her emotion matches the social situation and the emotion of the other people. e/f Bottom: a man (left) and a woman (right) who are frightened by something. The people in the scene are involved in sports. Body expressions are easier to recognize when congruent with the social scene

Mosaic squared scrambles (38 × 28) were created using MATLAB, containing identical luminance, color and contrast as the originals. For each scene category, eight similar scenes were included. There were two emotions (fearful and happy) shown by eight actors (half male), four context categories (fearful, happy, neutral, scrambles) and eight different scenes (versions) per context category, yielding eight conditions of 64 stimuli each (512 stimuli). Stimuli were arranged in two equivalent blocks to allow participants a 2-min break in between. Each block thus contained 256 randomized trials. In order to have the scrambles equally represented in the experiment and not to fatigue the participant with superfluous trials, we included 64 of the 384 scrambled counterparts of all unscrambled stimuli. Since we controlled for handedness by counterbalancing two versions of the experiment across the participants (in version 1, fear was button no. 1 on the response box and happy button no. 2; in version 2 the inverse), we were able to select different scrambles per version and use as many different ones as possible. Initially, we meant to use the scrambled condition as a non-emotion, non-action condition. However, across all experiments, bodies were significantly better recognized in these scrambled contexts than in unscrambled contexts (possibly a pop-out effect, or due to the semantic information content of the background) (t (175) ≤ 2.58, P ≤ .01), similar to the results for faces in scrambled context observed in Righart and de Gelder (2006). Therefore, the neutral condition, which unlike the scrambled scenes still contains action (without emotion), was considered a more viable baseline (see also Sommer et al. 2008).
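The stimulus count given above follows from fully crossing the design factors. The following is a minimal sketch of that arithmetic only; the actor and scene identifiers are placeholders introduced for illustration and are not the labels used in the experiment.

```python
from collections import Counter
from itertools import product

# Design factors as described in the text; identifiers are placeholders.
body_emotions = ["fear", "happy"]
contexts = ["fear", "happy", "neutral", "scrambled"]
actors = [f"actor{i}" for i in range(1, 9)]   # eight actors (hypothetical IDs)
scene_versions = list(range(1, 9))            # eight scenes per context category

# Fully crossed stimulus set: emotion x context x actor x scene version.
stimuli = list(product(body_emotions, contexts, actors, scene_versions))

# Count stimuli per (body emotion, context) condition.
per_condition = Counter((body, ctx) for body, ctx, _, _ in stimuli)

print(len(stimuli))                  # 512 stimuli in total
print(len(per_condition))            # 8 conditions
print(set(per_condition.values()))   # {64} stimuli in each condition
print(len(stimuli) // 2)             # 256 trials per block
```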

Participants were seated at a table in a dimly lit, soundproof booth. Distance to the computer screen was 60 cm. Stimuli were presented on a PC screen with a 60 Hz refresh rate and subtended 19.9° of visual angle vertically and 30.8° horizontally. Instructions were given verbally and via an instruction screen. Participants were given a two-alternative forced choice task over two emotions and were instructed to focus on the main figure in the middle of the screen, to categorize its emotion as accurately and rapidly as possible, to respond with their right index and middle finger and not to change the position of their fingers during the experiment. A trial started with a white fixation cross on a gray screen (300 ms), followed by a stimulus (100 ms) and then a gray screen shown until button press (with a maximum duration of 8 s).
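For readers who wish to reproduce the viewing conditions, the reported visual angles and durations can be converted into on-screen size and frame counts. The sketch below uses the standard relation size = 2 · d · tan(θ/2) and assumes a centered stimulus viewed perpendicularly; the function names are ours, and the resulting sizes (roughly 21 cm by 33 cm) are derived values, not figures reported in the study.

```python
import math

def size_from_visual_angle(angle_deg, distance_cm):
    """Physical extent (cm) subtending angle_deg at a viewing distance of distance_cm."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)

def frames_for(duration_ms, refresh_hz):
    """Number of screen refreshes covering duration_ms at refresh_hz."""
    return duration_ms * refresh_hz / 1000

VIEWING_DISTANCE_CM = 60   # from the text
REFRESH_HZ = 60            # from the text

height_cm = size_from_visual_angle(19.9, VIEWING_DISTANCE_CM)
width_cm = size_from_visual_angle(30.8, VIEWING_DISTANCE_CM)
print(f"stimulus is about {height_cm:.1f} cm high and {width_cm:.1f} cm wide")

# Fixation (300 ms) and stimulus (100 ms) expressed in frames at 60 Hz.
print(frames_for(300, REFRESH_HZ), frames_for(100, REFRESH_HZ))
```

At 60 Hz, both durations correspond to a whole number of frames (18 and 6), so they can be presented without rounding.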

Results

Trials with reaction times (RT) below 200 ms or above 2,000 ms were discarded from the analysis, leading to 2.1% outliers. Trials were also excluded from the RT analyses if the response was incorrect. Main and interaction effects of scene and body emotion for mean accuracy (ACC) and RT were tested in a 2 × 3 repeated measures analysis of variance (ANOVA) with two within-participant variables, “body emotion” (fear and happy) and “context emotion” (fear, happy, neutral). T-test planned comparisons were used to test our hypothesis of congruency effects and to compare the perception of an emotional body expression in a congruent emotional scene with a scene that contains action but is emotionally neutral. We expected that a congruent social scene would not only enhance recognition, it would also speed up recognition when compared to an incongruent and also when compared to a neutral scene. All statistical information can be found in Table 1.
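The trial exclusion rules and the congruency coding described above can be sketched as follows. This is an illustrative outline under our own assumptions, not the analysis script used in the study: the trial format (dictionaries with `body`, `context`, `rt_ms`, `correct`) and the example trials are made up, and trials at exactly 200 or 2,000 ms are treated as valid.

```python
from collections import defaultdict
from statistics import mean

RT_MIN_MS, RT_MAX_MS = 200, 2000  # trimming window from the text

def congruency(body, context):
    """Label a (body emotion, context emotion) cell."""
    if context == "neutral":
        return "neutral"
    return "congruent" if body == context else "incongruent"

def summarize(trials):
    """Return per-cell accuracy and mean correct RT, plus the outlier rate."""
    # Step 1: discard trials outside the RT window from all analyses.
    kept = [t for t in trials if RT_MIN_MS <= t["rt_ms"] <= RT_MAX_MS]
    cells = defaultdict(list)
    for t in kept:
        cells[(t["body"], t["context"])].append(t)
    summary = {}
    for cell, cell_trials in cells.items():
        # Step 2: RT is averaged over correct responses only.
        correct_rts = [t["rt_ms"] for t in cell_trials if t["correct"]]
        summary[cell] = {
            "congruency": congruency(*cell),
            "acc": mean(1.0 if t["correct"] else 0.0 for t in cell_trials),
            "rt_ms": mean(correct_rts) if correct_rts else None,
        }
    outlier_rate = 1 - len(kept) / len(trials)
    return summary, outlier_rate

# Made-up trials for illustration only.
trials = [
    {"body": "fear", "context": "fear", "rt_ms": 600, "correct": True},
    {"body": "fear", "context": "fear", "rt_ms": 150, "correct": True},   # too fast
    {"body": "fear", "context": "happy", "rt_ms": 700, "correct": False},
    {"body": "fear", "context": "happy", "rt_ms": 800, "correct": True},
]
summary, outlier_rate = summarize(trials)
print(summary[("fear", "happy")], outlier_rate)
```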

Table 1  

Accuracy: There was a main effect of context emotion [F(2, 46) = 5.39, P < .01, ηp² = .19] and of body emotion [F(1, 23) = 4.30, P < .05, ηp² = .16]. Bonferroni corrected pairwise comparisons revealed that bodies were better recognized in a neutral context than in a happy context (P < .05). Happy bodies were better recognized than fearful bodies (P < .01). An interaction effect was observed for body emotion × context emotion [F(2, 46) = 5.39, P < .01, ηp² = .19].
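The partial eta squared values reported alongside each F test can be recovered from the F value and its degrees of freedom, since partial eta squared equals SS_effect / (SS_effect + SS_error) and F is the ratio of the corresponding mean squares. The short sketch below applies this identity to the two main effects above; the function name is ours.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# Main effect of context emotion, F(2, 46) = 5.39
print(round(partial_eta_squared(5.39, 2, 46), 2))   # 0.19

# Main effect of body emotion, F(1, 23) = 4.30
print(round(partial_eta_squared(4.30, 1, 23), 2))   # 0.16
```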

Bodies, irrespective of the specific emotion, were more accurately recognized in a congruent context versus incongruent context and in a neutral context versus in an incongruent context. There was a trend toward significance for enhanced body recognition in a congruent context versus in a neutral context. Fearful body expressions were more accurately recognized in a fearful context than in a happy context or than in a neutral context. Happy bodies were better recognized in a happy versus in a fearful context but not versus in a neutral context.

Reaction time: An interaction was found for body emotion × context emotion [F(2, 46) = 3.83, P < .05, ηp² = .14].

Bodies were recognized faster in a congruent context than in an incongruent one. There was a trend toward significance for faster recognition of body expressions in a congruent versus neutral context, but there was no difference between a neutral and an incongruent context. Responses to a fearful body were faster in a fearful context than in a happy or neutral context. For a happy body, RTs in a happy context did not differ significantly from those in a fearful context or in a neutral context.

Discussion

Target body expressions were more accurately recognized in congruent social scenes than in incongruent or baseline social scenes. Recognizing a fearful expression was more accurate in a context of people fleeing from danger and a happy body expression was best recognized in a context consisting of people dancing at a party. An emotionally congruent scene possibly speeds up the recognition process of fearful body expressions. Although fearful bodies were recognized faster in a fearful context, we cannot conclude that happy bodies were also recognized faster in the congruent condition. Importantly, the ACC congruency effects did not reflect speed-accuracy trade-offs.

Experiment 2. Angry and happy bodies in a social emotional context including faces

Our goal was to measure whether the observed effects of Experiment 1 would generalize to angry expressions. Therefore, we replicated the previous experiment with angry rather than fearful bodily expressions in an angry (congruent), happy (incongruent) or neutral context.

Method

Participants: A new group of 22 students participated (5 men; range 18–25 years old, Mean: 21 years old).

Apparatus, Design and Procedure: Apparatus, design and procedure were identical to the former experiment with the difference of angry rather than fearful stimuli. Angry scenes (people on strike) were 88% correctly recognized as was measured subsequent to the main experiment in a separate validation of the scenes without foreground bodies. Eight angry body images that were on average correctly recognized for 92% (SD 10) replaced the fearful body images that were used in the previous experiment.

Results

ACC and RT were calculated after exclusion of two participants due to recognition at chance level and extremely fast RTs. In addition, 2.5% of the trials fell outside the range of 200–2,000 ms and were treated as outliers. Trials were also excluded from the RT analyses if the response was incorrect.

Accuracy: A main effect was found of body emotion [F(1, 21) = 6.31, P < .05, ηp² = .23]. Bonferroni corrected pairwise comparisons revealed that happy bodies were better recognized than angry bodies (M = 86, SD 17 vs. M = 75, SD 16) (P < .05). There was no main effect of scene emotion. There was a significant interaction between body emotion and context emotion [F(2, 42) = 17.41, P < .001, ηp² = .45].

Bodies were better recognized in a congruent versus in an incongruent and versus in a neutral context and also in a neutral versus in an incongruent context. Angry bodies in an angry context were better recognized than in a happy context or in a neutral context. Happy bodies were better recognized in a happy context than in an angry context. Happy bodies were not better recognized in a happy than in a neutral context, although a trend toward significance was observed.

Reaction time: There was a main effect for body emotion [F(1, 21) = 4.32, P < .05, ηp² = .17]. Bonferroni corrected pairwise comparisons revealed that angry bodies were recognized faster than happy bodies (M = 649, SD 157 vs. M = 687, SD 210) (P = .05). An interaction effect between body emotion and context emotion was found [F(2, 42) = 3.36, P < .05, ηp² = .14]. Bodies were recognized faster in a congruent versus in an incongruent context and in a neutral context versus in an incongruent context. Target body expressions were not recognized faster in a congruent context versus in a neutral context. Angry bodies were recognized faster in an angry versus in a happy context but not versus in a neutral context. There was a trend toward significance for happy bodies in a happy versus in an angry context, but not when compared to a neutral context.

Discussion

The congruency effect we found in Experiment 1 was also present for angry expressions. Angry bodies were more accurately and faster recognized in an angry context. Happy body expressions were better recognized in a happy context. We cannot rule out, however, the possible confounding influence of the presence of faces in the scenes.

The role of facial expressions in interactions between emotion of the context and body expression

Experiment 3. Fearful and happy bodies in a social emotional context with blurred faces

We repeated the first experiment but blurred the faces that were visible in the scenes, since the facial expressions may have confounded the results obtained in Experiment 1. Blurring the faces allowed us to measure the influence of pure bodily expressions of bystanders on the recognition of one individual’s body expression.

Method

Participants: A new group of 22 students participated (6 men; range 18–41 years old, Mean: 22 years old).

Apparatus, Design and Procedure: The single difference from Experiment 1 was the blurring of faces in the scenes.

Results

The trials that fell out of the pre-defined range that was used in the former experiments were considered as outliers (2.5%). Trials were also excluded from RT analyses if the response was incorrect.

Accuracy: An interaction effect was observed for body emotion × context emotion [F(2, 42) = 5.14, P < .01, ηp² = .20].

Bodies were better recognized in a congruent context versus in an incongruent context and in a congruent context versus in a neutral context. Moreover, bodies were better recognized in a neutral context versus in an incongruent context. Recognition of fearful bodies was better in a fearful context than in a happy context or than in a neutral context. Happy bodies were better recognized in a happy context than in a fearful context but not than in a neutral context, although a trend was observed.

Reaction time: A trend toward significance was found for the interaction body emotion × context emotion [F(2, 42) = 2.41, P < .1, ηp² = .11].

Compared to baseline, recognition of body expressions was facilitated by a congruent context, and recognition was slower in an incongruent than in a congruent context. There was no difference between a neutral and an incongruent context. RTs in the congruent fear condition were shorter than in the incongruent happy context and were also shorter than in the neutral context. Happy bodies were not recognized faster in a happy context than in a fearful context or in a neutral context.

Discussion

The enhanced recognition ACC of body postures in congruent versus incongruent scenes found in Experiment 1 was also obtained when faces in the scenes were blurred. Happy bodies were best recognized in a happy context, fearful bodies in a fearful context. RTs for fearful bodies in the fearful context were shorter than for fearful bodies in the happy or neutral scenes, but there was only a trend toward significance for the interaction body emotion × context emotion. The results indicate that the emotional congruency effect that we found in Experiment 1 cannot be attributed to the presence of facial expressions. The presence of people expressing emotions via bodily postures influences the perception of body emotion from the target figure. Our next question is whether the same conclusion can be drawn for angry expressions.

Experiment 4. Angry and happy bodies in a social emotional context with blurred faces

Method

Participants: A new group of twenty students participated (7 men; range 19–30 years old, Mean: 22 years old).

Apparatus, Design and Procedure: Apparatus, design and procedure were identical to the former experiment that used blurred faces with the sole difference of angry instead of fearful stimuli.

Results

Due to recognition at chance level, three participants were excluded from analysis. Moreover, 2.4% of the trials were excluded since they fell out of the pre-defined range that was maintained across all experiments. Trials were also excluded from the RT analyses if the response was incorrect.

Accuracy: A main effect was found of context emotion [F(2, 38) = 4.57, P < .05, ηp² = .19]. Bonferroni corrected pairwise comparisons revealed that bodies were better recognized in a happy context than in an angry context (P < .05). There was an interaction effect between body and context emotion [F(2, 38) = 5.22, P < .01, ηp² = .22].

Bodies were better recognized in a congruent context versus in a neutral or incongruent context and in a neutral context versus in an incongruent context. Angry bodies in an angry context were better recognized than in a happy context but not than in a neutral context. Happy bodies in a happy context were more accurately recognized than in an angry context or than in a neutral context.

Reaction time: There were no main or interaction effects. In the perspective of the former experiments, we had clear expectations about a possible facilitating influence of a congruent scene and therefore, we conducted planned comparison paired samples t-tests. These revealed that RTs in the congruent anger condition were shorter than in the happy context but not than in the neutral context. Happy bodies were not recognized faster in happy contexts when compared to in angry or neutral contexts.

Discussion

After having blurred the faces, all effects remained in the ACC data and for the angry but not happy body expressions in the RT data. The results were not related to speed-accuracy trade-offs. See Fig. 2.

Fig. 2

a Recognition was more accurate and faster for a fearful body in a fearful versus happy or neutral context. A happy body was better recognized in a happy versus fearful context. b Angry bodies in an angry context were recognized more accurately and faster than in a happy context, and happy bodies were better recognized in a happy versus angry (or neutral for ACC data) context. c Congruency effects in the ACC data for both body emotions were observed. Moreover, fearful bodies were better and faster recognized in a fearful versus neutral and happy context. d Angry bodies in an angry context were more accurately and faster recognized than in a happy context. Happy bodies in a happy context were better recognized versus in an angry or neutral context. In sum, individual body expressions were best recognized in emotionally congruent social scenes

The presence of facial expressions

Difference scores of congruent and incongruent conditions were calculated and independent sample t-tests conducted to compare the context congruency effect in the experiments where faces were still visible versus where they were not.

Angry body expressions: The congruency effects in the ACC data of angry body expressions were stronger in the experiments where facial expressions were visible: ACC anger non-blurred, congruent minus incongruent (Mdifference = .13, SD .10) versus anger blurred, congruent minus incongruent (Mdifference = .04, SD .12) (t(40) = 2.61, P < .05, d = .80). Subsequent t-tests revealed the origin of this effect; angry body expressions were better recognized in a happy context where faces were blurred than where they were visible (M = 77%, SD 18 vs. M = 70%, SD 17) (t(40) = 1.38, P = .09, d = .43). This was not due to the specific presence of happy facial expressions since a similar effect was observed for the presence of neutral facial expressions (blurred vs. non-blurred: M = 72%, SD 13 vs. M = 80%, SD 15) (t(40) = 1.89, P < .05, d = .58). The presence of angry facial expressions did not influence ACC or RT.

Fearful body expressions: A trend numerically consistent with that for angry body expressions was observed, but it was not significant.

Happy body expressions: The congruency effect in the ACC data of happy bodies (in happy or angry context) was larger when facial expressions were invisible (Mdifference = −.09, SD .25 vs. Mdifference = .11, SD .16) (t (40) = 2.97, P < .01, d = .93). Subsequent t-tests did not reveal any differences. There were no differences in the RTs.
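The between-experiment comparisons above rest on difference scores (congruent minus incongruent) compared with independent-samples t-tests. The sketch below recomputes the t statistic and Cohen's d for the angry-body ACC comparison from the rounded summary statistics in the text. It is an approximation under stated assumptions: group sizes of 22 and 20 are taken from the Participants sections (consistent with the reported 40 degrees of freedom), equal variances are assumed, and the function names are ours. Because the inputs are rounded, the output is close to, but not identical with, the reported t(40) = 2.61 and d = .80.

```python
import math

def pooled_variance(sd1, n1, sd2, n2):
    """Pooled variance of two independent groups (equal-variance assumption)."""
    return ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)

def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Standardized mean difference using the pooled standard deviation."""
    return (m1 - m2) / math.sqrt(pooled_variance(sd1, n1, sd2, n2))

def independent_t(m1, sd1, n1, m2, sd2, n2):
    """Student's t and degrees of freedom from summary statistics."""
    se = math.sqrt(pooled_variance(sd1, n1, sd2, n2) * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, n1 + n2 - 2

# Angry bodies, ACC difference scores (congruent minus incongruent).
# Faces visible: M = .13, SD = .10; faces blurred: M = .04, SD = .12.
# Group sizes are an assumption (see lead-in).
visible = (0.13, 0.10, 22)
blurred = (0.04, 0.12, 20)

d = cohens_d(*visible, *blurred)
t, df = independent_t(*visible, *blurred)
print(f"t({df}) = {t:.2f}, d = {d:.2f}")
```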

In conclusion, the influence of facial expressions in the scene was dependent on the specific emotional expression. The presence of facial expressions was not the crucial factor toward a congruency effect. In all four experiments, the body expressions in the scene influenced how the target body expression was perceived.

General discussion

The aim of this study was to investigate the influence of social contexts on the recognition of a single emotional body expression. The effects of congruency on emotional body perception were investigated using manipulated photographs containing a foreground figure that displayed either the same emotion as the people in the background or a different one. In the first experiment, participants categorized the emotion of the actor (fear or happy) as fast as possible. In the second experiment, the task was similar, but we used different emotions (anger and happy). The third experiment was similar to Experiment 1, but the faces of people in the scene were blurred to ascertain that the obtained effects were specifically related to the congruence of the body expressions seen in the background. In the fourth experiment, angry and happy expressions were used and faces in the scenes were blurred. Finally, the effect of the presence of facial expressions was tested by comparing reaction times and accuracy rates of all conditions of Experiment 1 with 3 and 2 with 4.

In the human emotion literature, there is thus far no answer to the question of whether our recognition of an individual’s emotional body language is influenced by bodily expressions of other individuals as perceived in a naturalistic scene. It is known that an emotional scene influences the perception of facial expressions. Body expressions, in contrast to facial expressions, represent and implement emotion, but in addition also direct action. Therefore, we go beyond our prior studies by investigating the role of the social emotional scene representing actions from other people on the perception of an individual’s emotional body expression. During our life span, we are confronted more often with situations where people express similar (group) emotions and therefore it is likely that we are quicker and better in reacting to less ambiguous situations since this has survival value. Therefore, we predicted an enhanced recognition of body expressions in an emotionally congruent versus incongruent and neutral social action scene, especially when facial expressions in the scenes were visible.

Indeed, fearful, angry and happy body expressions were recognized more accurately in congruent social emotional scenes. The presence of facial expressions contributed significantly to the congruency effect in the comparison of Experiment 4 with Experiment 2. This was merely due to increased incongruence for an angry body in a happy or neutral context with facial expressions visible. The presence of facial expressions neither increased ACC in the congruent conditions nor specifically speeded up processing. However, different participants took part in the different experiments. For the interested reader, see Meeren et al. (2005), Van den Stock et al. (2007) and Aviezer et al. (2008a, b).

The actions we see going on in the background may automatically trigger action representation. We hypothesized that when the actions seen in the background have an emotional significance similar to that of the central character, recognition of the target would speed up. In Experiments 1 and 3, we indeed found that fearful bodies were recognized faster in a fearful than in a happy or neutral context. Furthermore, the presence of other angry people in the scene shortened the observer’s RT compared with when the angry body expression was perceived in a happy context (Experiment 2). After the faces were blurred in Experiment 4, the interaction between body and scene emotion in the RTs was lost, but a congruency effect for angry expressions was still present. All predicted effects were present in the ACC data. We found the strongest congruency effects in the RT data for the fearful body-scene compounds. Although we expected to find them as clearly for the other emotions, this might not be a strange result after all, considering how fast mass panic can spread through a crowd, in contrast to observers’ ambivalent behavior in an aggressive situation (fight, help or flight?) or the time it takes to “warm up/drink in” at a party.

Our study is in line with earlier studies of scene congruency effects in object recognition. For example, if there is a high probability that a certain context surrounds a visual object, the processing of that object is facilitated, whereas unexpected contexts tend to inhibit it (Palmer 1975; Ganis and Kutas 2003; Davenport and Potter 2004; but see also Hollingworth and Henderson 1998). Studies of scene recognition and context effects show that scenes can be processed and scene gist recognized very rapidly (Thorpe and Fabre-Thorpe 2002; Maljkovic and Martini 2005; Bar et al. 2006; Joubert et al. 2007). ERPs recorded from the visual cortices demonstrate differences between emotional and neutral scenes as early as 250 ms from stimulus onset (Junghöfer et al. 2001). In a recent study, Joubert et al. (2008) investigated the time-course of animal/context interactions in a rapid go/no-go categorization task. They conclude that the congruence facilitation is induced by the customary co-activation of “congruent” populations of neurons, whereas interference takes place when conflicting populations of neurons fire simultaneously.

Unlike these object recognition studies, however, our study includes the factor ‘emotion’. Emotions are intimately linked to action preparation. The results of the current study are in line with the few experimental studies that currently exist on the influence of emotional scenes on the perception of faces and bodies, and they imply that the facilitating effect of context congruence reflects a mandatory process with an early perceptual basis (Righart and de Gelder 2006).

It is possible that the congruency effect occurs at the response level. Especially when the stimulus is more ambiguous, participants may attend to the context. However, there is not much time for that, and it goes against the task instructions. A presentation time of 100 ms (although not masked) is too short to make saccades from the fixation point. Biederman et al. (1982) and Davenport and Potter (2004) have shown that 100 ms is sufficient for pop-out effects between background and foreground object to be expected. In the current experiment, the foreground figure was pasted onto the background and, although we tried to make the scene as naturalistic as possible, some pop-out effect of the foreground figure may have come through. However, if so, this would hold equally for all conditions, since all the foreground bodies were combined with all scenes. Moreover, when stimuli were presented with unlimited presentation duration and the scenes were processed consciously, body expressions were recognized neither better nor worse in a congruent than in an incongruent or neutral scene, which also speaks against a response conflict.

Finally, processes other than those measured here may contribute to the observed effects. For example, the tendency to automatically mimic and synchronize facial expressions, vocalizations, postures and movements with those of another person and, consequently, to converge emotionally may play a role (de Gelder et al. 2004; Hatfield et al. 1994). The same brain areas are involved when subjects experience disgust (Wicker et al. 2003) or pain (Jackson et al. 2005) as when they observe someone else experiencing these emotions. Such a process may contribute to observers’ ability to rapidly perceive ambiguity between a person’s body language and its social emotional context. This incongruity may create a conflict in the emotional contagion processes triggered by the target figure and help to explain the slower and less accurate reactions of the observer. This explanation needs further testing using EMG measurements.