Journal of Abnormal Child Psychology, Volume 38, Issue 5, pp 587–599

In the Eye of the Beholder: Eye-tracking Assessment of Social Information Processing in Aggressive Behavior

  • Tako A. Horsley
  • Bram Orobio de Castro
  • Menno Van der Schoot
Open Access Article

DOI: 10.1007/s10802-009-9361-x

Cite this article as:
Horsley, T.A., de Castro, B.O. & Van der Schoot, M. J Abnorm Child Psychol (2010) 38: 587. doi:10.1007/s10802-009-9361-x


According to social information processing theories, aggressive children are hypersensitive to cues of hostility and threat in other people’s behavior. However, even though there is ample evidence that aggressive children over-interpret others’ behaviors as hostile, it is unclear whether this hostile attribution tendency actually results from overattending to hostile and threatening cues. Since encoding is posited to consist of rapid automatic processes, it is hard to assess with the self-report measures that have been used so far. Therefore, we used a novel approach to investigate visual encoding of social information. The eye movements of thirty 10–13 year old children with lower levels and thirty children with higher levels of aggressive behavior were monitored in real time with an eye tracker, as the children viewed ten different cartoon series of ambiguous provocation situations. In addition, participants answered questions concerning encoding and interpretation. Aggressive children did not attend more to hostile cues, nor did they attend less to non-hostile cues than non-aggressive children. On the contrary, aggressive children looked longer at non-hostile cues, but nonetheless attributed more hostile intent than their non-aggressive peers. These findings contradict the traditional bottom-up processing hypothesis that aggressive behavior is related to a failure to attend to non-hostile cues. The findings seem best explained by top-down information processing, in which aggressive children’s pre-existing hostile intent schemata (1) direct attention towards schema-inconsistent non-hostile cues, (2) prevent further processing and recall of such schema-inconsistent information, and (3) lead to hostile intent attribution and aggressive responding, disregarding the schema-inconsistent non-hostile information.


Aggressive behavior, Social cognition, Information processing, Social information processing

Social information processing (SIP) plays important roles in the development of social and aggressive behavior. Aggressive behavior is associated with and predicted by specific social information-processing patterns and interventions targeting these patterns are relatively effective (Dodge et al. 2006). Social information-processing models (SIP; Crick and Dodge 1994; Dodge 1986; Lemerise and Arsenio 2000) propose that, to react appropriately to social situations, social information has to be encoded accurately, the encoded information has to be represented correctly, an interaction goal needs to be specified, adaptive emotions need to arise and to be regulated, response alternatives have to be generated, these response alternatives have to be evaluated, and the selected response has to be enacted.

Numerous studies have shown that aggressive behavior is associated with deviations in interaction goals, emotion regulation, response generation, evaluation, and enactment (Crick and Dodge 1994; Dodge 2006; Dodge and Pettit 2003; Fontaine 2008). Concerning representation of intent, a meta-analytic review demonstrated a robust relation between hostile intent attribution and aggressive behavior. However, this review also indicated that assessment of hostile intent attribution has so far been confounded with encoding, because no clear distinction in measurement has been made between which information is attended to (encoding) and how this information is consequently represented (representation) (de Castro et al. 2002).

Surprisingly little is known about encoding, the very first step in social information processing models. During the encoding stage, the most relevant social cues in a given situation are attended to and encoded for further processing. Each situation consists of an overwhelming amount of potentially relevant information. To handle information efficiently, encoding needs to be selective, fast, and automatic. Individuals may differ in the selection of information they attend to, the duration of fixation on specific information once it is attended to, and their accuracy of encoding information that is attended to. Crick and Dodge (1994) hypothesized that aggressive children pay relatively more attention to hostile than to non-hostile cues. This bias towards hostile cues would enhance the likelihood to interpret social situations as hostile and consequently increase the probability of an aggressive response. Thus, in theory, individual differences in encoding may affect all further processing, so individual differences in all following processing steps may be partly due to individual differences in encoding. For example, the well-established tendency of aggressive children to over-attribute hostile intent may be partly due to problems with encoding intention cues (de Castro et al. 2002).

Empirical findings concerning encoding in SIP are rather unclear. For over twenty years it has been noted that encoding is a crucial factor in information processing that has proven particularly hard to study (see for example Crick and Dodge 1994; de Castro et al. 2002; Dodge 1986; Gottman 1986; Lemerise and Arsenio 2000). There is a considerable discrepancy between the nature of encoding processes and the methods used to study them. By definition, encoding proceeds fast, automatically, and unconsciously. In contrast, assessment of social information processing has so far primarily relied on reflective, self-report assessments. Even though numerous studies have studied relations between such self-reported indirect indicators of encoding and aggressive behavior in children, no research to date has directly studied encoding of social information in relation with aggressive behavior, as far as we know.

The aim of the present study was to test hypotheses concerning relations between encoding and aggressive behavior by means of direct assessment of encoding with eye-tracking methodology. The direct information concerning visual attention allocation obtained with this method was related to measures of hostile intent attribution and aggressive behavior.

The hypothesized hypervigilance to hostile and threatening cues by aggressive children has been substantiated with two arguments. A cue-based ‘bottom-up’ hypothesis has been put forth by Crick and Dodge (1994). They proposed that aggressive children pay more attention to hostile than to non-hostile cues. This bias towards hostile cues enhances the likelihood of interpreting social situations as hostile, and consequently increases the chances of aggressive behavior. Thus, according to this hypothesis, encoding of cues (bottom) leads to a cognitive representation of hostile intent (top). Alternatively, a schema-based ‘top-down’ hypothesis has been proposed, suggesting that pre-existing schemata of hostile intent lead aggressive children to allocate attention to hostile cues, which then confirm this expectation (Lochman and Lenhart 1995).

Recent insights in perception psychology do, however, suggest a contrary hypothesis that we would like to call the ‘schema inconsistency’ hypothesis. This hypothesis posits that hostile intent schemata will indeed direct attention, but will direct attention towards schema-inconsistent information, that is, away from the expected hostile cues, towards schema-inconsistent non-hostile cues. Whether these cues are then interpreted adequately may depend on their ambiguity and the strength of pre-existing schemata. This hypothesis is derived from perception psychology, where it is generally acknowledged that attention is mainly given to novel, unexpected cues, whereas little attention is devoted to schema-consistent information. An increase in attention for interpretation-mismatching cues (in this case non-hostile cues, being inconsistent with a hostile schema) compared to interpretation-matching cues (in this case hostile cues) has been shown for scene perception as well as for reading comprehension (Henderson et al. 1999; Rinck et al. 2003). The longer attention allocation to inconsistent information probably reflects an attempt to verify unexpected information in light of an already existing interpretation of the situation. If information is inconsistent with the interpretation, then encoding takes more time due to the increase in processing difficulty (Davenport 2007; Zwaan and Radvansky 1998).

The differences between the traditional hypervigilance hypothesis and our schema inconsistency hypothesis have direct implications for our understanding of—and cognitive behavioral intervention in—SIP. If the hypervigilance hypothesis were correct, attending to non-hostile cues would be an important focus for research, assessment, and—possibly—intervention, as is common practice to date. However, if the schema-inconsistency hypothesis were true, attention allocation would not be the problem, since non-hostile cues are already attended to due to their unexpectedness. In that case the issue would rather be interpretation of these non-hostile cues than their perception.

It is unclear which of these hypotheses concerning encoding in SIP is most tenable. Our current knowledge of encoding in SIP is surprisingly limited, because it is entirely based on indirect assessments of encoding. Four indirect indicators of encoding processes have been studied, including social information recall, manipulation, preference, and reaction times. Concerning recall, retrospective self-reports on encoding of social vignettes have been used in a large number of studies. In these studies, participants were first presented with standardized vignettes and then asked to recall information they had seen or heard. This approach has indicated deficits in the memory for non-hostile cues, suggesting that aggressive children pay less attention to non-hostile than to hostile cues (cf. Dodge and Price 1994). However, these studies did not directly investigate encoding biases reflected by the allocation of attention to specific social cues, but relied on measures that incorporate processing derivatives far beyond encoding: Participants can only report their own representation of what they consciously recall to have seen, what they are able to describe verbally, in the amount of detail they spontaneously produce. Thereby, encoding is confounded with representation and limited to global descriptions that are accessible to recall and verbal description.

A second approach to the study of encoding in SIP has been to experimentally manipulate social stimuli and to measure the effects of these manipulations on representation and responses (Dodge et al. 1984). If experimental manipulations of social cues have different effects on representation and responding in aggressive than in non-aggressive children, it is concluded that these children must differ in their encoding of the manipulated social cues. This approach does allow for strong conclusions regarding the effects of specific social information on aggressive behavior. However, whether this effect is indeed due to encoding remains an open question. It is possible that the established effects were due to differences in representation or goal setting. Whether relations between social cues and aggressive responses were mediated by encoding problems cannot be demonstrated with these methods, because the mediating variable was inferred, not measured.

A third indirect approach to the assessment of encoding has been analysis of preference for specific kinds of social information. Dodge and Newman (1981) asked children to play a detective task in which they were free to make use of as many cards with social cues as they wished. In this game, aggressive children preferred to use less information to decide whether a particular child acted with hostile intent. van Goozen et al. (2002) used a procedure in which the relative preference was measured for hostile pictures in comparison to other kinds of pictures (e.g. neutral, positive). First, the children were shown all pictures. Thereafter, the children viewed the pictures they selected for as long as they wanted. Aggressive children were found to have a lower preference for non-hostile stimuli rather than a higher preference for hostile information. Studies using this approach have been very informative in showing the kinds of information aggressive children prefer to look at, but they do not directly demonstrate which information is encoded spontaneously by these children.

Fourth, encoding has been studied indirectly by using reaction time measures to indicate attention allocation (van Goozen et al. 2002; Gouze 1987; Schippell et al. 2003). In one task, children watched puppet shows depicting either hostile or non-hostile events. Participants were asked to switch off, as fast as possible, a light that was occasionally switched on during the shows. The time it took them to switch the light off was considered indicative of the difficulty in shifting attention away from the puppet shows. In a second task, children watched hostile and non-hostile cartoons. The children were required to play a game during the presentation of the cartoons. The duration and frequency of gazes towards the cartoons were indicative of the attention-grabbing potential of the cartoons. Results of both studies implied that aggressive children have more difficulty shifting their attention away from hostile cues and are more distracted by them (Gouze 1987). Although the studies above investigated encoding in a more direct and restricted fashion, one could still argue that the measures in these studies include representation and therefore do not only reflect biases in encoding.

In a groundbreaking study, Schippell et al. (2003) operationalized encoding as attention allocation in a probe detection task. The aim of the task was to provide an indication of attention allocation on the basis of stimuli competing for attention, just as social cues would compete for attention during a social interaction. Word pairs were briefly presented at the center of a computer screen. One word was related to threat (hostile cue) and the other had a neutral meaning. Next, a dot (probe) was presented at the location of either the threat or the neutral word. When the probe appeared, children had to press a button as quickly as possible. Surprisingly, results indicated that aggressive behavior was related to longer response times for threat words, suggesting that aggressive children pay less attention to hostile cues. Moreover, lessened attention to threat cues was related to hostile intent attribution, which in turn was related to aggressiveness. Schippell et al. tentatively concluded that their findings support the original hypersensitivity hypothesis, and that a cue encoding bias causes hostile intent attribution, even though the encoding bias they found was exactly opposite to the expected effect.

How can this seemingly paradoxical pattern of findings be explained? Schippell et al. (2003) suggest that, although the probe detection task is a more refined measure of attention allocation than self-report, a limitation is that it looks at attention allocation at only one moment in time. They suggest that the attention bias away from threat words they found could occur after a preceding attention bias towards hostile cues others have found. They do indicate that this tentative explanation is highly speculative, since attention allocation was not measured directly, and only a single ‘snapshot’ of the encoding process was assessed with the dot-probe task. To test this explanation, a continuous measure of attention allocation would be needed.

It seems to us that the Schippell findings are better in line with the schema-inconsistency hypothesis: Stronger hostile intent expectations may have predicted more attention to nonthreatening words because they did not fit expectations. In line with this alternative view are findings from a recent study with adults (Wilkowski et al. 2007). This study used eye tracking to measure actual looking behavior to provide a precise time course of visual attention allocation (Rayner 1998; Henderson 2003). Results showed that adults high on trait anger (which is related to aggression) allocated their attention longer to non-hostile than to hostile cues appearing in visual scenes, much like the bias towards nonthreatening words by aggressive children in the Schippell et al. study. This tentative explanation of Schippell’s findings has, however, not been tested with children yet.

Overall, studies of encoding have so far yielded inconsistent results. Indirect indicators of encoding suggest that aggressive behavior in children is related to less recall of socially relevant cues, less sensitivity to experimentally manipulated variations in social stimuli, preference for hostile over non-hostile information, and longer reaction times for threat words. Interpretation of these findings is complicated by the indirect nature of the assessments of encoding used. Encoding is generally confounded with representation, because the actual process of encoding social stimuli is not measured directly, but rather inferred from the measures used. The majority of studies provide indirect support for Crick and Dodge’s (1994) hypothesis that aggressive behavior is related to hypervigilance to threat cues. Yet, the one study that most directly measured attention allocation found the exact opposite (Schippell et al. 2003). It appears that, to further our understanding of encoding in aggressive behavior, continuous direct assessment in real time of the exact social information children attend to when processing aggression-relevant social information is needed.

The aim of the present study was to test hypotheses concerning relations between encoding, intent attribution, and aggressive behavior by means of continuous direct assessment of encoding with eye-tracking methodology. To this end, participants were presented with series of cartoons concerning real-life ambiguous provocation situations. Cues concerning hostile vs. non-hostile intent were systematically varied over cartoon series. As participants looked at these cartoons, their eye movements were recorded in real time by means of an eye tracker. In addition to the continuous eye tracking assessment, after each series the children answered questions concerning encoding (i.e. recall of social cues) and interpretation (i.e. hostile intent attribution).

This method enabled us to test the hypervigilance and the schema-inconsistency hypotheses mentioned above. The hypervigilance hypothesis predicts hostile intent attribution and aggressive behavior to be related to longer viewing times for hostile cues. In contrast, the schema-inconsistency hypothesis predicts hostile intent attribution and aggressive behavior to be related to shorter viewing times for hostile cues and longer viewing times for non-hostile cues. This effect was expected to be particularly clear when a mismatch was induced between hostile expectations and non-hostile cues. According to the schema-inconsistency hypothesis, if the first cues presented suggest hostile intent, subsequent presentation of hostility-inconsistent cues should be particularly schema-inconsistent for aggressive children, and therefore evoke particularly long viewing times.



A mixed 2 × 3 × 3 design was chosen with the quasi-experimental between subjects factor group (two levels: low-aggressive / highly aggressive), and the randomized within subjects factors behavioral cue (three levels: hostile / ambiguous / non-hostile) and emotion display cue (three levels: mean / neutral / sad).


Sixty children ranging in age from 10.6 to 13.7 years participated in the study. These participants were a random sample from an elementary school in a low to middle SES neighborhood in a large city in the Netherlands. All participating children gave informed assent, and their parents or caretakers gave informed consent. At the end of the testing period the children received a gift coupon of 5 euro. Exclusion criteria were any known psychiatric disorder or an IQ estimate below 80. The age range and exclusion criteria were chosen to maximize comparability with existing empirical findings on SIP and aggressive behavior (de Castro et al. 2002).

As part of a larger study, participants, parents, and teachers were asked to fill in several questionnaires related to behavior problems, social competence, and aggressive behavior. In addition, peer nominations of social status and behavior within the classroom were solicited. Complete teacher and child reports were available for all 60 children. Child and teacher reports correlated r = 0.49, p < 0.001. Because all aggressive behavior variables were skewed, a median split on an aggregate variable of child and teacher reports on the SDQ (see measures) was made to create two groups of 30 children each: a low aggressive group (Lo-A) and a highly aggressive group (Hi-A). The Lo-A group consisted of 17 boys and 13 girls. The Hi-A group consisted of 16 boys and 14 girls.
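The grouping step can be illustrated with a minimal Python sketch. It assumes that the aggregate variable is the mean of standardized (z-scored) child and teacher scores and that children exactly at the median are assigned to the Lo-A group; neither detail is stated in the text, and the function names and example scores are hypothetical.

```python
from statistics import mean, median, pstdev

def z_scores(values):
    """Standardize values to mean 0 and SD 1 (population SD)."""
    m, sd = mean(values), pstdev(values)
    return [(v - m) / sd for v in values]

def median_split(child_scores, teacher_scores):
    """Average child and teacher z-scores, then split at the median.

    Assumption: aggregates above the median are labelled "Hi-A",
    all others (including ties at the median) "Lo-A".
    """
    pairs = zip(z_scores(child_scores), z_scores(teacher_scores))
    aggregate = [(c + t) / 2 for c, t in pairs]
    cut = median(aggregate)
    return ["Hi-A" if a > cut else "Lo-A" for a in aggregate]

# Hypothetical conduct-problem scores for six children
child = [0, 1, 2, 5, 3, 1]
teacher = [0, 0, 3, 6, 2, 1]
groups = median_split(child, teacher)
```

With an even number of children and no ties, this procedure yields two groups of equal size, as in the 30/30 split reported here.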

Descriptive statistics for the two groups are shown in Table 1. The two groups differed markedly on all indices of behavior problems according to teachers, peers, and self-reports, d’s > 0.8. Dutch norms for the Strengths and Difficulties Questionnaire (see below) were used to provide an indication of the severity of behavior problems in the two groups. In the non-aggressive group, none of the children met criteria for aggressive behavior problems according to teacher or self-reports. In contrast, of the 30 children in the highly aggressive group, respectively 14 and 16 children received scores in the clinical range for conduct problems from teachers and self-reports. In addition, respectively 4 and 7 children received subclinical scores from these informants. Groups did not differ in intelligence, age, or parental SES.
Table 1

Descriptive Child Characteristics by Group and Informant

Rows (the cell values did not survive in this copy): Age (years); Conduct problems teacher report (SDQ); Conduct problems self-report (SDQ); Aggregate Child+Teacher Aggression (z-score); Peer nominations (aggression); Peer nominations (prosocial); Peer nominations (liked)

* = p < 0.05, ** = p < 0.01, *** = p < 0.001, corrected for multiple comparisons


Children participated in two sessions, conducted on separate days. The first session included the eye tracking and social information processing assessments described below. In the second session participants completed measures concerning their own and their peers’ behavior (see measures).

In the first session, participants were presented with ten series of three consecutive realistic cartoons each, concerning real-life ambiguous provocation situations, as is common practice in SIP research (see de Castro et al. 2002, Appendix A). As participants looked at these cartoons, their eye movements were recorded in real time by means of an eye tracker. Cues concerning hostile vs. non-hostile intent were systematically varied over cartoon series. These cues consisted of combinations of a behavioral and an emotional intent cue. The behavioral indicator signaled hostile, ambiguous, or accidental intent. The emotion display indicator was either mean or apologetic (sad), in line with findings concerning the importance of such emotion displays for subsequent interpretation (e.g. Keane and Parrish 1992; Lemerise et al. 2005). For each series of three cartoons, participants were asked to pretend to be one of the two children depicted in the first picture. In the second picture the other child behaved in a seemingly hostile, non-hostile, or ambiguous manner. In the third picture the child experienced a harmful (negative) event and the other child displayed a neutral or a behavior-congruent emotion expression (i.e. a mean look after hostile behavior or a sad look after non-hostile behavior). In addition to the continuous eye tracking assessment, after each series the children answered questions concerning encoding (i.e. recall of social cues) and interpretation (i.e. hostile intent attribution).

Before the experiment proper started, the eye-tracker was adjusted and calibrated using a 5-point calibration grid presented on the computer screen. Participants were told that they would view illustrations and would be asked questions about them. In addition, the children were instructed to pretend to be one (pointed out with an arrow) of the two same-gender children depicted in the first picture. After this picture a dot in the middle of the screen appeared and the children had to focus on this dot for at least 300 milliseconds. Subsequently, the second picture appeared above this dot and after 4 seconds the third picture appeared below the second picture and both pictures remained visible for another 6 seconds. A one-point calibration was performed before the presentation of the first picture of each illustration to correct for possible drifts in gaze position.

After each cartoon series, the experimenter asked a standardized set of questions (see de Castro et al. 2005) to assess recall of social cues (encoding) and attribution of hostile intent (interpretation). Children completed the eye tracking experiment in a silent room and the questionnaires in the classroom at school. The experimenter was always present. Testing took approximately 1 h for the experimental session and half an hour for the questionnaire session.


Eye movements were measured with the EYELINK II eye tracker (SR Research Ltd., Toronto, Ontario, Canada) using pupil locations as well as corneal reflections. The video-based eye tracker consists of two high-speed cameras with built-in infrared illuminators mounted on a headband, weighing 420 grams in total. The cameras were placed approximately 4–6 centimeters away from the eyes and sampled gaze position at 250 Hz. Typically, the gaze position resolution is less than 0.05° and the average gaze position error is less than 0.5°. Head position relative to a 21 inch diagonal computer screen at a distance of 100 cm was monitored by a camera integrated into the headband. This setup allowed for head movement (position and rotation) compensation during eye movement recordings and provided accurate gaze positions without the need for a bite bar.


Participants viewed black-and-white illustrations of hypothetical real-life situations while their eye movements were measured. The illustrations were based on typical peer provocation vignettes (e.g. Dodge et al. 1984; Lemerise et al. 2005; De Castro et al. 2005; Matthys et al. 2001). A professional illustrator created these illustrations with Corel Painter IX software. All cartoons were piloted with focus groups of children and clinical staff working with children with disruptive behavior problems in special education, and adjusted according to their suggestions. Each illustration had five gender-specific (boy or girl) versions (ambiguous-neutral, non-hostile-neutral, non-hostile-sad, hostile-neutral, or hostile-mean), each consisting of three pictures (see Fig. 1 for an example).
Fig. 1

Example of the different versions of each cartoon series

The first picture was equal across versions and depicted two children within a certain social setting. In the second picture one of them (for half of the illustrations the left child and for the other half the right child) behaved in a non-hostile, hostile, or ambiguous manner. The third picture contained a harmful outcome together with a neutral or behavior-congruent emotion display by the peer (i.e. a sad facial expression after non-hostile behavior or a mean facial expression after hostile behavior). For example (see Fig. 1), the first picture depicted two boys sitting together painting. In the second picture the right boy reaches for a tube without touching the left boy (ambiguous), bumps against the elbow of the left boy while he cleans his brush (non-hostile), or pulls the brush of the left boy (hostile). In the third picture the painting of the left boy has been ruined (harmful event) and the right boy displays a neutral face (this emotion can follow ambiguous, non-hostile as well as hostile behavior), a sad face (this emotion is only displayed after non-hostile behavior), or a mean face (this emotion is only displayed after hostile behavior). Note that behavior cues are always varied in the second picture, while emotion displays are only varied in the third picture of each series. Thereby, the effects of behavior cues could be studied independently from effects of emotion displays.

Thus, behavior and emotion (facial expression) were systematically varied keeping all other visual features (e.g. objects, hair, clothes, surroundings) identical across the versions of each illustration. The children were presented with 10 different illustrations, two of each version. The illustrations were arranged in five material sets which were evenly distributed across the children. The versions of each of the illustrations were counterbalanced across the sets and the order in which the illustrations were presented in each set was pseudo-randomized.
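One way to obtain such a counterbalanced arrangement is a cyclic (Latin-square style) assignment, sketched below in Python. This is an illustration under stated assumptions, not the procedure actually used: the cyclic rule, the seeded shuffle standing in for the pseudo-randomization, and the names `build_material_sets` and `material_sets` are all hypothetical.

```python
import random
from collections import Counter

VERSIONS = ["ambiguous-neutral", "non-hostile-neutral", "non-hostile-sad",
            "hostile-neutral", "hostile-mean"]
N_ILLUSTRATIONS = 10

def build_material_sets(seed=0):
    """Return five material sets, each a list of (illustration, version) pairs.

    Assumption: illustration i appears in version (i + s) mod 5 in set s,
    so every set holds two illustrations of each version and every
    illustration appears in each version exactly once across the sets.
    """
    rng = random.Random(seed)  # fixed seed keeps the presentation orders reproducible
    sets = []
    for s in range(len(VERSIONS)):
        trials = [(i, VERSIONS[(i + s) % len(VERSIONS)])
                  for i in range(N_ILLUSTRATIONS)]
        rng.shuffle(trials)  # simple stand-in for the pseudo-randomized order
        sets.append(trials)
    return sets

material_sets = build_material_sets()
```

A plain shuffle does not constrain the order (for example, it allows the same version twice in a row); any such constraints would have to be added separately.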


Behavior Problems

Strengths and Difficulties Questionnaire

Teachers and children completed the Dutch version of the Strengths and Difficulties Questionnaire (SDQ; Goodman 2001; Dutch version van Widenfelt et al. 2003). The SDQ consists of 25 items for all versions (child, parent, and teacher) assessing emotional, conduct, peer-relation, and attention problems in addition to pro-social behavior. The items were selected based on the DSM-IV (American Psychiatric Association 1994) and ICD-10 (World Health Organization 1994) classifications of childhood psychopathology. Questions ask about behavior in the past 6 months and responses are based on a 3-point Likert scale indicating how much each item applies to the child (0 = not true, 1 = somewhat true, 2 = certainly true). The total score is the sum of the item scores excluding the score on pro-social behavior and reflects the extent of behavioral difficulties. Psychometric properties of the Dutch teacher SDQ are generally good and of the child SDQ acceptable (van Widenfelt et al. 2003). In the present study the internal consistency (Cronbach’s alpha) was acceptable for the teacher (0.74) but quite low for the child (0.51) version.
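The scoring rule described above can be written out as a short Python sketch. It assumes five items per subscale (the standard SDQ layout) and that any reverse-keyed items have already been recoded to the 0 to 2 scale; the subscale labels follow the wording of this section, and the function name and example responses are hypothetical.

```python
DIFFICULTY_SCALES = ("emotional", "conduct", "attention", "peer")

def sdq_scores(responses):
    """Compute subscale sums and the total difficulties score.

    responses: dict mapping subscale name -> list of five item scores,
    each 0 (not true), 1 (somewhat true) or 2 (certainly true).
    The pro-social subscale is scored but excluded from the total.
    """
    for name, items in responses.items():
        if len(items) != 5 or any(score not in (0, 1, 2) for score in items):
            raise ValueError(f"invalid item scores for subscale {name!r}")
    subscales = {name: sum(items) for name, items in responses.items()}
    total = sum(subscales[name] for name in DIFFICULTY_SCALES)
    return subscales, total

# Hypothetical responses for one child
example = {
    "emotional": [0, 1, 0, 0, 2],
    "conduct":   [2, 1, 1, 0, 2],
    "attention": [1, 1, 2, 0, 0],
    "peer":      [0, 0, 1, 0, 0],
    "prosocial": [2, 2, 1, 2, 1],
}
subscales, total = sdq_scores(example)
```

Under these assumptions each subscale ranges from 0 to 10 and the total score from 0 to 40.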

Sociometric nominations

All children in participating classrooms nominated their classmates regarding the occurrence of several social behaviors. For the present study, the peer nominations included four items concerning aggressive behaviors (e.g. “Who hits or kicks others?”), four items on prosocial behavior (e.g. “Who is nice to others?”), and four items on social preference (e.g. “Who do you like best?”). Children were allowed to nominate as many classmates as they wanted in response to each question. The peer-nominated aggression procedure is based on peer nominations described elsewhere, which have been shown to be internally consistent (Guerra et al. 2003). This was also the case in our sample, Cronbach’s alpha = 0.93.
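For reference, the internal consistency coefficient reported for these scales, Cronbach's alpha, can be computed as in the following Python sketch. It uses the standard formula, alpha = k / (k − 1) × (1 − sum of item variances / variance of total scores); the function name and the data layout are hypothetical choices for illustration.

```python
from statistics import pvariance

def cronbach_alpha(items):
    """Cronbach's alpha for a set of items.

    items: list of k lists, one per item, each holding the scores of the
    same respondents in the same order.
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    item_variance = sum(pvariance(column) for column in items)
    return k / (k - 1) * (1 - item_variance / pvariance(totals))

# Two perfectly consistent items give alpha = 1; two unrelated items give alpha = 0
consistent = cronbach_alpha([[1, 2, 3], [1, 2, 3]])
unrelated = cronbach_alpha([[0, 1, 0, 1], [0, 0, 1, 1]])
```

Population and sample variances give the same alpha, provided the same estimator is used for the items and for the totals.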


To take full advantage of the continuous eye-movement data, two eye-tracking indices of encoding were calculated: first-pass time and look-back time (cf. Wilkowski et al. 2007). First-pass time was defined as the summed duration of all eye fixations on a behavior cue in the second picture, when looking at it for the first time and before the eyes fixated somewhere else. Look-back time was defined as the sum of all eye fixation durations on the behavior cue in the second picture after seeing the harmful event and the emotion cue in the third picture. Thus, look-back time primarily concerned verification or reconsideration of the intent cues seen in the second picture. In the reading comprehension literature the first-pass time is mainly related to lower-level automatic encoding (bottom-up) processes, whereas the look-back time involves higher-order strategic verification (top-down) processes (Cook and Meyers 2004).

Eye fixations while watching the cartoons were used to calculate first-pass time and look-back time as follows. Individual eye fixations on the pictures shorter than 50 ms or longer than 1500 ms were discarded (<3% of all fixations). The remaining eye fixations on a predefined region of interest were selected for analysis. This region of interest was defined as the smallest square area (typically 75 × 75 pixels) that could encompass the hostile or non-hostile behavior cue in the 2nd picture (512 × 384 pixels). The first-pass time was the sum of all fixation durations in the region before the eyes first left that region in any direction. The look-back time was the sum of all fixation durations in the region after looking at the 3rd picture.
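The two indices can be sketched in a few lines of code. This is an illustrative reading of the definitions above, not the analysis software used in the study: the `Fixation` and `Roi` types, the assumption that fixations arrive in temporal order with a picture label, and the choice to discard out-of-range fixations before anything else are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Fixation:
    picture: int     # 1, 2, or 3: which cartoon picture was fixated
    x: float         # horizontal position within that picture, in pixels
    y: float         # vertical position within that picture, in pixels
    duration: float  # fixation duration in ms


@dataclass(frozen=True)
class Roi:
    left: float
    top: float
    size: float = 75.0  # side of the square region, in pixels

    def contains(self, f: Fixation) -> bool:
        # The region of interest lies in the 2nd picture only.
        return (f.picture == 2
                and self.left <= f.x < self.left + self.size
                and self.top <= f.y < self.top + self.size)


def looking_times(fixations: List[Fixation], roi: Roi) -> Tuple[float, float]:
    """Return (first_pass_ms, look_back_ms) for one trial."""
    # Discard fixations shorter than 50 ms or longer than 1500 ms.
    valid = [f for f in fixations if 50 <= f.duration <= 1500]

    # First pass: fixations in the region from first entry until the eyes leave it.
    first_pass = 0.0
    entered = False
    for f in valid:
        if roi.contains(f):
            entered = True
            first_pass += f.duration
        elif entered:
            break

    # Look back: fixations in the region after the 3rd picture was first fixated.
    look_back = 0.0
    seen_third = False
    for f in valid:
        if f.picture == 3:
            seen_third = True
        elif seen_third and roi.contains(f):
            look_back += f.duration
    return first_pass, look_back


# Made-up trial: two first-pass fixations, then two look-backs after picture 3.
trial = [
    Fixation(1, 100, 100, 200),
    Fixation(2, 210, 160, 250),
    Fixation(2, 230, 170, 180),
    Fixation(2, 400, 300, 220),
    Fixation(2, 215, 165, 30),    # too short, discarded
    Fixation(3, 250, 200, 300),
    Fixation(2, 220, 180, 260),
    Fixation(3, 260, 210, 200),
    Fixation(2, 240, 200, 1600),  # too long, discarded
    Fixation(2, 205, 155, 150),
]
print(looking_times(trial, Roi(left=200, top=150)))
```

In this made-up trial the first pass covers the two consecutive fixations inside the region, and the look-back time covers the two valid returns to the region after the third picture was seen.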

We examined the first-pass and look-back time within the region of interest encompassing either the non-hostile or the hostile behavior cues. Figure 2 shows an example (for the ‘painting’ illustration), in which the square with the black border is the region of interest. Figure 2 also displays a ‘fixation map’ for both pictures (non-hostile vs. hostile behavior) across the participants who viewed these pictures. The fixation map makes it easy to visually identify the areas of the picture that were most frequently looked at during presentation. In the grey-scaled pictures, dark grey represents areas that were rarely or never looked at, and the whiter the area, the more often it was looked at. The fixation map was computed by applying a Gaussian for every single fixation, adding weight to that area of the picture. The Gaussian was centered on the fixation location and had a standard deviation of 1 degree of visual angle. The height of the Gaussian was the same for every fixation.
Fig. 2

Example of a fixation map for looking times at non-hostile and hostile behavior including the region of interest (black bordered square)
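The fixation map computation described above, one equal-height Gaussian per fixation, can be sketched as follows. The function name and the `sigma_px` parameter are assumptions: converting 1 degree of visual angle to pixels depends on screen size and viewing distance, which are not reported in this section, so the value used in the example is purely illustrative.

```python
import math
from typing import List, Sequence, Tuple


def fixation_map(fixations: Sequence[Tuple[float, float]],
                 width: int, height: int,
                 sigma_px: float) -> List[List[float]]:
    """Sum one equal-height Gaussian per fixation; returns rows of pixel weights."""
    grid = [[0.0] * width for _ in range(height)]
    two_var = 2.0 * sigma_px ** 2
    for fx, fy in fixations:
        for row in range(height):
            dy2 = (row - fy) ** 2
            for col in range(width):
                # Every fixation contributes a Gaussian with peak height 1.
                grid[row][col] += math.exp(-((col - fx) ** 2 + dy2) / two_var)
    return grid


# Made-up example on a small 40 x 30 pixel picture with two fixations.
demo = fixation_map([(10, 10), (30, 20)], width=40, height=30, sigma_px=5.0)
print(round(demo[10][10], 4), round(demo[0][0], 4))
```

Rescaling the resulting weights to grey levels, with larger weights drawn whiter, would give a picture of the kind described for Fig. 2.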

Self-reported Social Information Processing

To assess self-reports of encoding, participants were asked ‘What happened in the story?’ and ‘Did anything else happen in the story?’ The number of relevant information units mentioned was then counted. Each illustration contained six sources of relevant information (their own and the other child’s behavior in the 1st picture, the other child’s behavior in the 2nd picture, the harmful event in the 3rd picture, and their own and the other child’s emotion in the 3rd picture; for example, see Fig. 1). If a child recalled all relevant information, an initial score of 6 could be obtained for each illustration. However, 1 point was subtracted from the score if the child did not recall the information in the correct order (i.e. information from the 1st picture, then the 2nd picture, and finally the 3rd picture) and/or if the child added information that was not present in the illustration. Two illustrations were presented for each version. The scores obtained on the two illustrations of the same version were summed to create a final encoding score for each version (score range = −2 to 12).
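The scoring rule for self-reported encoding can be written out as a short sketch. The function names are hypothetical, and the code assumes that a single point is deducted when either or both of the two conditions apply, which is the reading consistent with the reported range of −2 to 12.

```python
from typing import Sequence, Tuple


def illustration_score(units_recalled: int,
                       correct_order: bool,
                       added_information: bool) -> int:
    """Recalled units (0 to 6), minus 1 for wrong order and/or added information."""
    if not 0 <= units_recalled <= 6:
        raise ValueError("each illustration contains six information units")
    # Assumption: one point is deducted once, even if both conditions apply.
    penalty = 1 if (not correct_order or added_information) else 0
    return units_recalled - penalty


def version_encoding_score(illustrations: Sequence[Tuple[int, bool, bool]]) -> int:
    """Sum the scores of the two illustrations presented for one version."""
    if len(illustrations) != 2:
        raise ValueError("two illustrations were presented for each version")
    return sum(illustration_score(*ill) for ill in illustrations)


# Made-up child: full recall on one illustration, partial and disordered on the other.
print(version_encoding_score([(6, True, False), (4, False, False)]))
```

The example yields 6 + (4 − 1) = 9 points for that version.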

To assess hostile intent attribution, participants were asked five questions following each cartoon series: (1) ‘Did the other child want the harmful event to happen?’, (2) ‘Why do you think (s)he did that?’, (3) ‘Would you blame the other child for what happened?’, (4) ‘Should the other child be punished for what happened?’, and (5) ‘How would the other child be feeling when such a harmful event occurs?’. Affirmative answers to the first, third and fourth questions received one point each; all other answers to these questions received zero points. Answers to the second question were scored one point if they were hostile (s/he did it on purpose, s/he was being mean, s/he thought it was very funny, or s/he wanted to bully) or zero if they were neutral/benign (I can’t tell, I don’t know, it was an accident, it was my own fault, s/he couldn’t do anything about it, s/he couldn’t prevent it from happening, or s/he was trying to be nice / to help). Answers to the fifth question were given one point if the child responded with glad, happy, or satisfied. All other responses (e.g. sad, guilty, sorry, regretful, nothing) were given zero points. A total hostile intent attribution score was then calculated by summing the number of points for all five questions over all vignettes for each child (score range = 0 to 10). Twenty percent of all answers were scored by a second coder. The coders were blind to the status of the children. All kappas for inter-rater reliability were above 0.8.
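A minimal sketch of this scoring scheme is given below. The category labels, the `VignetteAnswers` type, and the function names are hypothetical stand-ins for the coders' judgments. The example sums over two vignettes because the reported range of 0 to 10 corresponds to two vignettes at five points each; that grouping is our reading and is not stated explicitly in the text.

```python
from dataclasses import dataclass
from typing import Iterable

# Hypothetical coder categories for questions 2 and 5.
HOSTILE_REASONS = {"on purpose", "being mean", "thought it was funny", "wanted to bully"}
POSITIVE_FEELINGS = {"glad", "happy", "satisfied"}


@dataclass(frozen=True)
class VignetteAnswers:
    wanted_harm: bool   # Q1: did the other child want the harmful event to happen?
    reason: str         # Q2: coded reason for the other child's behavior
    blame: bool         # Q3: would you blame the other child?
    punish: bool        # Q4: should the other child be punished?
    other_feeling: str  # Q5: coded feeling attributed to the other child


def vignette_points(a: VignetteAnswers) -> int:
    """One point per hostile answer, so 0 to 5 points per vignette."""
    return (int(a.wanted_harm)
            + int(a.reason in HOSTILE_REASONS)
            + int(a.blame)
            + int(a.punish)
            + int(a.other_feeling in POSITIVE_FEELINGS))


def hostile_attribution_score(vignettes: Iterable[VignetteAnswers]) -> int:
    """Sum the points over the vignettes entering one score."""
    return sum(vignette_points(a) for a in vignettes)


# Made-up answers to two vignettes.
answers = [
    VignetteAnswers(True, "on purpose", True, False, "happy"),
    VignetteAnswers(False, "accident", False, False, "sorry"),
]
print(hostile_attribution_score(answers))
```

Here the first vignette contributes four points and the second none.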


To test whether aggressive children devoted relatively more attention to hostile cues or to schema-inconsistent non-hostile cues, we first analyzed the eye-movement data. We tested group differences in looking times and their dependence on behavior cues and emotion displays. To this end, first-pass and look-back times were subjected to mixed model analyses (SPSS) with behavior (non-hostile vs. hostile), emotion (neutral vs. congruent), and group (Lo-A vs. Hi-A) as fixed factors.

For first-pass time, no group main effect was found. A significant effect of behavior cues was found, F(1, 411) = 6.06, p < 0.02, indicating that children in both groups looked longer at the non-hostile cues (M = 497.56, SD = 31.61) than at the hostile cues (M = 422.20, SD = 31.48) when looking at them for the first time.

For look-back time we found main effects of behavior (F(1, 289) = 6.96, p < 0.01) and emotion cues (F(1, 289) = 5.25, p < 0.03). After seeing the harmful event in the 3rd picture, children looked back longer at non-hostile than at hostile cues in the 2nd picture. Moreover, children looked back the longest at the non-hostile cue after seeing the neutral emotion, whereas they looked back the shortest at the hostile cue after seeing the outcome-congruent mean look. This is consistent with the idea that the inconsistent pattern of a negative outcome paired with a non-hostile intention and a neutral emotion requires more processing time than the consistent pattern of a negative outcome with a hostile intention cue and a mean emotion display.

Interestingly, this pattern was qualified by a significant behavior × emotion × group interaction (F(1, 289) = 3.91, p < 0.05). This interaction is shown in Fig. 3. The Hi-A group looked back the longest at the non-hostile cue, especially after seeing the neutral emotion. This group also looked back the shortest at hostile cues regardless of the emotion. Conversely, the Lo-A group looked almost equally long at the non-hostile cue after seeing both neutral and congruent emotions, and at the hostile cue after seeing the neutral emotion. Thus, as predicted by the schema inconsistency hypothesis, both groups looked back longer when more schema-inconsistent information was present: the Hi-A group looked back longer at more non-hostile (aggressive schema-inconsistent) information, whereas the Lo-A group looked back longer at more hostile (non-aggressive schema-inconsistent) information.
Fig. 3

Group differences in look-back time on the behavior cue for the different behavior and emotion display cues

Next, group differences in self-report measures of encoding were tested. To this end, a univariate analysis was performed with cue recall as dependent variable, group (Lo-A vs. Hi-A) as between-subjects factor, and behavior cue (non-hostile vs. hostile) and emotion display (neutral vs. congruent) as within-subjects factors. Only a trend towards an interaction effect was obtained for cue recall (F(3, 57) = 3.61, p < 0.07). This interaction indicated a trend towards slightly worse recall for especially non-hostile cues in the Hi-A group compared to the Lo-A group. No other effects were found for self-reported encoding.

Group differences in self-report measures of hostile intent attribution were also tested with a univariate analysis. Hostile intent attribution served as the dependent variable, group (Lo-A vs. Hi-A) as between-subjects factor, and behavior cue (non-hostile vs. hostile) and emotion display (neutral vs. congruent) as within-subjects factors. The Hi-A group attributed significantly more hostile intent than the Lo-A group, irrespective of behavior and emotion display cues, F(1, 58) = 4.69, p < 0.04. As expected, the experimental manipulations of behavior cues and emotion displays affected intent attribution, F(1, 58) = 29.47, p < 0.001. The cartoon versions containing hostile behavior were interpreted as most hostile. In addition, the interpretation was even more hostile if hostile behavior was followed by the congruent emotion (mean look) and even less hostile if non-hostile behavior was followed by the congruent emotion (sad look).

Finally, we examined the relations between encoding and hostile intent attribution. To this end, correlations were computed between both the eye-tracking and the self-reported encoding variables on the one hand and hostile intent attribution on the other hand. The eye-movement variable first-pass time was related to less self-reported cue recall, r = −0.32, p = 0.017, indicating that children who initially looked less at the relevant cues in the first two cartoons recalled fewer cues. Less accurate cue recall in the non-hostile and ambiguous conditions was related to hostile intent attribution, r = 0.45, p < 0.001. No correlations of look-back time with cue recall or hostile intent attribution were found.


According to social information processing theories, aggressive behavior is partly caused by hypersensitivity to cues of hostility and threat in other people’s behavior and overly hostile intent attribution. The present study provides clear evidence that aggression is indeed related to atypical encoding of visual information and hostile intent attribution in provocation scenarios. However, encoding problems in aggressive behavior appear quite different from those traditionally hypothesized. As predicted by our schema-inconsistency hypothesis, aggressive behavior was related to hostility schema-based processing, rather than to heightened attention to hostile cues or lessened attention to non-hostile cues. Aggressive children did not attend more to hostile cues, nor attend less to non-hostile cues than non-aggressive children. On the contrary, aggressive children looked longer at non-hostile cues, but nonetheless recalled less non-hostile information and attributed more hostile intent than their non-aggressive peers.

To our knowledge, this is the first study to directly examine children’s visual attention to social cues in ambiguous provocation situations. Since encoding is posited to consist of rapid processes, it is hard to assess with traditional self-report measures. Therefore, we used a novel approach to investigate visual encoding of social information. Thanks to the experimental design of the study, we were able to distinguish between encoding of different kinds of cues (hostile / non-hostile, behavior / emotion display) and between encoding before the negative outcome in the vignette (first-pass time) and after the negative outcome (look-back time).

The present finding of more deployment of visual attention to non-hostile cues by aggressive children may at first sight seem at odds with the findings of less recall of non-hostile cues and hostile intent attribution in this and a number of other studies. We do, however, believe that taken together all these findings are in line with top-down, hostile-intent schema driven processing of social stimuli by aggressive children. Fortunately, the experimental design of our study makes it possible to tentatively describe this process in detail. What may actually happen in real-time as ambiguous provocation scenarios unfold?

Recall that each scenario consisted of three cartoons, where the second cartoon contained behavior cues, and the third cartoon showed emotion display cues and the negative outcome. Upon presentation of the second cartoon, that is, before the negative outcome, first-pass looking behavior was recorded. At this point in time no group differences were found, indicating that aggressive children do not a priori deploy their visual attention differently than their peers. Thus, at first-pass time, there was no indication of hypersensitivity to hostile or threat cues.

Upon presentation of the negative outcome, however, group differences emerged. As soon as the third cartoon with the negative outcome was presented, we started recording look-back times, indicating to what extent children looked back at behavior cues in the second cartoon. According to traditional hypersensitivity hypotheses, aggressive children should have kept looking at hostile cues and the negative outcome. This was clearly not the case. We believe something quite different happens at this point. Possibly, the negative outcome triggers different schemata in the two groups of children and these schemata consequently direct further processing differently.

In aggressive children, activation of a ‘hostile intent’ schema would explain both the further eye-movement findings and the self-report findings for this group. Schema-based perception would lead to fast processing of schema-congruent hostile cues, and to longer processing times for schema-incongruent non-hostile information. This is exactly the pattern we found for this group. Indeed, processing time increased with the amount of schema-inconsistent information present. Furthermore, schema-based perception would make further processing of schema-inconsistent information more difficult. It would be harder to remember, and harder to take into account in interpreting the situation. These processes seem to be reflected in our finding that aggressive children recalled marginally fewer non-hostile cues, even after having looked at them longer. They are also in line with the finding that hostile intent attributions in line with the schema were made, completely disregarding the schema-inconsistent non-hostile information these children had clearly looked back at.

In contrast, in non-aggressive children, the negative outcome presented in the third cartoon may have activated less rigid schemata (or multiple competing schemata), leaving room for them to attend to and represent all available cues in a balanced manner.

This schema-driven account of our findings is clearly highly speculative. The present findings are in agreement with this account, and schema-driven processing may account for findings of other studies on encoding by aggressive children reviewed in the introduction section, where recall of schema-inconsistent information was consistently found to be poor, and an attentional bias towards non-hostile words was also found (Schippell et al. 2003). Nonetheless, important processes like the ‘hostile intent’ schema were not directly measured, but can only be inferred from the hostile intent attributions made by the aggressive children. Even though it is quite common practice in cognitive psychology to infer such implicit cognitive processes from observables, more direct measurement of the presumed schema would clearly be preferable.

This first attempt to directly assess visual attention in encoding of social information by means of eye tracking has important limitations. First and foremost, participants are likely influenced by the procedure of being attached to the apparatus and the knowledge that others will be able to see where they look. We have the impression that participants quickly became used to this, but this is of course difficult to establish. One reassuring indication that participants were not distracted by the eye-tracking procedure is that the pattern of findings on self-report measures of social information processing was comparable with findings in the SIP literature. Second, participants were confronted with cartoon-based vignettes of social provocation, not with actual ambiguous provocation by a peer. Although this manner of stimulus presentation is common practice in SIP research, general concerns about its ecological validity also apply to the present study (cf. de Castro 2004; Hubbard et al. 2001, 2004). A major advantage of the use of such standardized material, though, is that it allows very specific manipulations of social cues and response patterns. The specificity of the present findings to specific kinds of cues at specific points in time underscores the importance of standardization and experimental manipulation of specific stimulus characteristics for research into social information processing.

The manipulations of stimulus characteristics in this first eye-tracking study of social information processing were rather extreme. We could not know beforehand how sensitive our eye-tracking measures would be to relatively subtle variations in social cues and therefore decided to start with relatively extreme cues, such as the very malignant face and the large smudge on the participant’s painting shown in Fig. 1. In real life, social cues tend to be much more subtle and ambiguous. Now that we know that eye-tracking is a suitable method to assess deployment of visual attention to social cues in children, studying attention to more subtle variations in social cues may be very informative. Specifically, to increase our understanding of ‘top-down’ processing of social cues, it will be most informative to first activate hostile schemata in aggressive children and then provide schema-inconsistent benign information. By systematically varying the nature and extent of schema-inconsistent information needed to override hostility schemata, we may ultimately discover through which cues we may best help children overcome unwarranted hostile attribution tendencies.

For use with eye-tracking assessment, visual stimulus characteristics turned out to be very influential. Possibly, the difference between non-hostile and hostile cues for first-pass time is determined by encoding complexity. The fixations clustered on informative regions of the pictures, as can be seen in the fixation map (see Fig. 2). The initial fixations are needed to identify the cues and to perceive their visual details (Henderson 2003). The increase in first-pass time may be due to the fact that non-hostile cues had a more complex visual configuration (for example, see Fig. 2; the distribution of fixations was more widespread for the non-hostile relative to the hostile cue). In addition, scene semantics may have played a role (Henderson 2003). Typically, shorter first-pass times have been found for more consistent as opposed to less consistent scenes. Hostile cues are clear-cut in their message, whereas the meaning of non-hostile cues is less obvious. For instance, interpreting behavior like ‘kicking someone’ as non-hostile is more improbable than interpreting behavior like ‘bumping into someone’ as hostile. In other words, kicking cannot be accidental but bumping can be on purpose. Thus, seemingly minor differences in visual qualities of stimuli between conditions may have had effects on comparisons between stimulus conditions. Note, though, that such effects cannot have influenced the group differences we established, since both groups were presented with the exact same stimuli.

The ten illustrations we used in this study had five different versions. Ideally, more trials are preferred to increase the reliability of the measures, especially in case of missing values, which are not uncommon in eye movement registrations. However, in this kind of child study a balance has to be found between the ideal testing time (as many trials as possible without unwanted habituation effects) and the amount of time a child can sustain attention and motivation. Considering the relatively small group effects, future research with eye movement registrations might benefit from an increase in the number of trials by using multiple test sessions.

Participants in the present study were all non-referred children, attending regular education, and not participating in any form of treatment. We did not include children with severe behavior problems, which may explain the relatively small differences between groups. Yet, the present findings do not necessarily generalize to children with clinically severe behavior problems. Children with more severe behavior problems may show not simply a more pronounced version of the encoding patterns of the non-referred children in this study, but a qualitatively different eye fixation pattern, because the etiology of their behavior problems may be quite different (consider, for example, ADHD). We therefore strongly recommend follow-up studies including participants with different kinds of behavior problems. Ideally, such studies should test theory-based hypotheses concerning differences in etiology and maintenance of information processing patterns and aggressive behavior. Interesting avenues for research in this regard seem to be distinctions between reactive and proactive aggression (e.g. Vitaro et al. 2006), relations with limited cognitive capacities in children with mild intellectual disabilities or ADHD (van Nieuwenhuijzen et al. 2009), and relations with aversive life experiences known to be involved in the formation of social schemata (Thomas et al. 2006).

In closing, we would like to emphasize that, despite attending to non-hostile information, children with higher levels of aggression still made more hostile interpretations. Apparently, non-hostile cues are attended to by aggressive children, but not understood or believed. The present findings thereby suggest that training aggressive children to attend to non-hostile information may not be very effective by itself, and that it may be wiser to challenge the very schemata responsible for the selective interpretation of encoded information.

Open Access

This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

Copyright information

© The Author(s) 2009

Authors and Affiliations

  • Tako A. Horsley (1)
  • Bram Orobio de Castro (1)
  • Menno Van der Schoot (1)
  1. Utrecht University, Utrecht, The Netherlands