Bias-based bullying is a serious public health issue that negatively affects the well-being of children with socially devalued identities, characteristics, and attributes (Earnshaw et al., 2018). Bias-based bullying describes bullying behavior in an intergroup context (i.e., involving ingroup and outgroup members) in which someone is bullied because of his or her membership in a particular group (Palmer & Abbott, 2018). Unlike interpersonal bullying, bias-based bullying is driven by social-cognitive factors related to stigma and therefore requires different strategies to address it (Earnshaw et al., 2018).

Bias-based bullying is particularly prevalent among school-age children and young adolescents (Menesini & Salmivalli, 2017). Although its exact prevalence is difficult to establish, a large-scale study in Scotland found that almost one in four secondary-school pupils were aware of other pupils in their school experiencing bias-based bullying (Lough Dennel & Logan, 2015). Racist bullying (i.e., bullying based on religion or ethnicity) and weight-based bullying are among the most common types of bias-based bullying worldwide (UNESCO, 2019). Victims are at increased risk of serious psychosocial and psychological problems (Arens & Visser, 2020; Thompson et al., 2020). Unfortunately, there is a dearth of interventions that specifically seek to reduce these types of bullying (Earnshaw et al., 2018).

Developing an Intervention Against Bias-Based Bullying

To address bias-based bullying, multi-component, school-wide interventions are needed that address multiple factors (e.g., school climate, prejudice, diversity) and/or target different groups of people (e.g., peers, teachers, parents; Earnshaw et al., 2018). Focusing on the peer context is particularly important, as bystanders are often present when bullying occurs (Espelage, 2014; Salmivalli, 2010). Bystanders can reinforce bullying by joining in or passively watching, or they can contribute to an anti-bullying ethos by actively intervening in the bullying situation, supporting the student(s) being bullied, or reporting the situation (Saarento & Salmivalli, 2015).

While most children find bullying immoral or wrong and believe one should intervene, only a small number actually do so (Hawkins et al., 2001; Salmivalli & Voeten, 2016; Thornberg et al., 2015). Reasons include being unaware of their role as bystanders, a lack of knowledge of how to support bullied students, or fear of retaliation or social exclusion for helping (Howard et al., 2014; Huitsing et al., 2014; Mulvey et al., 2016; Salmivalli, 2014). This has led various anti-bullying programs to focus on encouraging bystanders to adopt behaviors that support victimized students (Salmivalli et al., 2021). A meta-analysis by Polanin et al. (2012) demonstrated that such programs are effective in encouraging positive bystander intervention, with no significant difference in effect size (standardized mean difference; SMD) between experimental (SMD = 0.21) and quasi-experimental designs (SMD = 0.17). Polanin et al. further showed that interventions promoting positive bystander intervention are effective for both primary (SMD = 0.18) and high school students (SMD = 0.43), with high school students benefiting more. However, Torgal et al. (2023) argue that these programs should be better tailored to children's developmental phase to optimize effectiveness.

The role of peers in reducing bullying is complex, and research on the most effective components of active bystander programs and the prerequisites for their effectiveness is still in its infancy (Salmivalli et al., 2021). Nonetheless, from both a theoretical and an empirical standpoint, it is recommended to enhance bystanders’ awareness of their role in bullying situations, as well as their empathic understanding and self-efficacy to support victimized students (Deng et al., 2021; Menesini & Salmivalli, 2017; Salmivalli et al., 2021). Examples of successful interventions that have targeted these determinants include the KiVa and NoTrap! programs (Menesini & Salmivalli, 2017; Palladino et al., 2016; Saarento & Salmivalli, 2015). Moreover, several models provide useful insight into the determinants that interventions aimed at encouraging bystander intervention in bullying situations should address. DeSmet et al. (2014) developed a framework, combining the Integrative Model and Social Cognitive Theory, for understanding the determinants of bystander intervention in (cyber)bullying situations. In short, bystander intervention is determined by the intention to intervene, which in turn is influenced by attitudes, expected social norms, and self-efficacy concerning helping. Attitudes include attitudes towards those bullied (such as stereotypes and prejudiced beliefs) as well as moral disengagement attitudes (see also Bayram Özdemir et al., 2015; Thornberg & Jungert, 2013). Through such moral disengagement attitudes, bystanders ‘can avoid self-condemnation when the behavior is not in accordance with moral values’ (DeSmet et al., 2014, p. 208; Thornberg et al., 2015; Thornberg & Jungert, 2013).

In bias-based bullying situations, however, interventions with an interpersonal approach are not sufficient, as they ignore the complex processes of intergroup relations (Earnshaw et al., 2018; Killen et al., 2013). The Developmental Intergroup Approach (DIA) therefore provides useful insights into the determinants of bystander intervention specifically in bias-based bullying situations (Palmer & Abbott, 2018). According to this model, intergroup processes (e.g., group membership, group norms, intergroup contact) influence bystander intervention in bias-based bullying situations (Abbott & Cameron, 2014; Palmer et al., 2015). Regarding group membership, studies have shown that helping behavior increases when the victimized student is an ingroup rather than an outgroup member (Nesdale et al., 2013; Palmer et al., 2015). Regarding intergroup norms, research suggests that children are more likely to endorse bias-based behaviors when a specific ingroup norm for doing so exists (Palmer & Abbott, 2018). Finally, intergroup contact has been found to improve behaviors towards outgroup members (Cameron & Abbott, 2017; Turner & Cameron, 2016). For example, befriending an outgroup member increases the willingness to help outgroup victims (Palmer et al., 2017). The DIA further highlights that these intergroup processes become increasingly influential with age (Palmer & Abbott, 2018). Thus, even when an adolescent ingroup member finds the bullying of an outgroup member wrong, fear of group-based repercussions for challenging (exclusive) group norms may keep them from intervening (Palmer & Abbott, 2018).
Therefore, intervening at a pre-adolescent age is recommended, as children of that age are less sensitive to the influence of ingroup norms in dealing with intergroup exclusion and are more easily influenced by school-based anti-bullying programs (Palmer et al., 2015; Salmivalli et al., 2021; Yeager et al., 2015).

Another important but overlooked aspect of school ecology that can help reduce bullying is teachers’ and educators’ behaviors and attitudes (Yoon & Bauman, 2014). In their meta-analysis of anti-bullying programs, Ttofi and Farrington (2011) found that teacher training was one of the most important components associated with a decrease in bullying. Teachers have a critical role in nurturing a safe and inclusive environment and preventing stigmatization and oppression (Scandurra et al., 2017). This role is not confined to intervening directly when a bias-motivated bullying episode occurs but extends to developing an inclusive climate within the school (Smith & Lander, 2023). However, research suggests that some students hesitate to tell teachers about their experiences of bias-based bullying because they believe teachers will do nothing (Sapouna et al., 2023). Even worse, in some cases, teachers are personally involved in their own students’ victimization (Sapouna et al., 2023). Pearce (2014) notes that this reluctance to engage with issues of bias and discrimination sometimes arises from an acute sense of the complexity involved in talking about bias. Teachers can become anxious about discussing racism in class, for example, out of fear of causing offence or ‘getting it wrong’ (Smith & Lander, 2023). To cope with this anxiety, they sometimes avoid talking about these issues altogether. Therefore, there is a clear need for greater awareness of bias-based bullying and for training in how to discuss sensitive topics in class (Smith & Lander, 2023).

The Current Study

The current study is part of the European GATE-BULL project (https://www.ou.nl/en/web/gate-bull/). GATE-BULL stands for ‘Games Approach to TEach children about discriminatory BULLying’; the project aims to address bias-based bullying by promoting positive bystander behavior in pre-adolescent children (i.e., ages 9–13). Learning safe, positive bystander behaviors and establishing positive peer norms in childhood are important intervention goals that can contribute to improved health outcomes for all children and young people, and for the adults who care for them (Priest et al., 2021; Trent et al., 2019).

To achieve these goals, we developed a 4-week school-based intervention program consisting of online teacher training videos, a serious game, and a series of classroom-based lesson plans. Based on interpersonal and intergroup models of bystander behavior in (bias-based) bullying situations, the program aims to change important determinants of bystander behavior: intergroup anxiety and attitudes, moral disengagement, peer norms, and intention and self-efficacy to intervene (DeSmet et al., 2014; Palmer & Abbott, 2018). A serious games approach was chosen because it has been suggested to be a safe and highly motivating method for raising awareness, creating empathy, and teaching new strategies for addressing serious issues such as bullying (Calvo-Morata et al., 2020; Nocentini et al., 2015).

The intervention specifically targeted weight-based and racist bullying, as these are frequent forms of bias-based bullying, yet very few interventions to date have been developed to address these stigmas (Earnshaw et al., 2018). Additionally, most interventions to date address a single stigma in isolation rather than multiple stigmas simultaneously (Earnshaw et al., 2018). To fill these gaps, the current study investigated whether the GATE-BULL pilot program could positively influence determinants of bystander responses in weight-based and racist bullying situations. More specifically, the study evaluated the effects of the program on intention and self-efficacy to intervene, intergroup attitudes and anxiety towards outgroup members and overweight children, perceptions of prosocial peer norms, and moral disengagement beliefs. The intervention was evaluated in three countries (the Netherlands, Scotland, and Greece). Interventions against bias-based bullying had not previously been evaluated cross-culturally (Earnshaw et al., 2018).

Methods

The GATE-BULL pilot program was evaluated in a non-randomized controlled trial comparing an intervention group with a control group. The trial was approved by the ethics committee responsible for each participating institution: the Research Ethics Committee of the Open University of the Netherlands (U2019/03268/HVM); the Ethics Committee of the School of Media, Culture and Society of the University of the West of Scotland (4873/6709/260519); and the Board of the Department of Education and Social Work of the University of Patras, at its meeting of April 16, 2018, under Law 4485/4AUG2017.

Intervention

The GATE-BULL program consisted of three components: (1) an online teacher training course; (2) the Playground Heroes videogame; and (3) a series of classroom-based lesson plans. All resources are available online at https://www.ou.nl/en/web/gate-bull/resources in English, Dutch, Slovak, and Greek.

Online Teacher Training Course

The online teacher training course covered three main parts: (1) identifying, (2) preventing, and (3) responding to bias-based bullying. The first part explained the definition of bias-based bullying and the impact it can have on children and young people. The second part discussed actions schools and teachers can undertake to prevent bias-based bullying. The final part provided practical advice on how to effectively respond to instances of bias-based bullying.

Playground Heroes Videogame

The Playground Heroes videogame aimed to give children an opportunity to practice positive bystander responses to incidents of bias-based bullying in a safe, virtual environment. When playing the game, each child was assigned an avatar bearing their own name who, together with seven other characters, took part in a competition for the best school playground in their country. Across the scenarios, players witnessed bias-based bullying situations (targeting ethnicity, religion, and weight status) that emerged during teamwork and were asked to help resolve them. When bullying situations were resolved, a more positive teamwork environment developed and players gained better resources for building the best playground and, in doing so, winning the competition. During the game, children were also presented with positive peer role models who acted as active bystanders to resolve bias-based bullying situations. The videogame consisted of three sessions, each with its own theme: (1) decreasing moral disengagement and developing empathy and a sense of responsibility towards children who are bullied because of prejudice, (2) enhancing critical skills for evaluating peer norms and peer pressure, and (3) improving self-efficacy regarding strategies for effectively discouraging bias-based bullying. The videogame was supplemented with a series of classroom-based lesson plans that provided an opportunity for structured discussion of students’ interactions in the videogame.

Lesson Plans

Four lesson plans were developed to accompany the videogame. The first three lessons corresponded to the three themes of the videogame and were partially adapted from Garrity (2004); the final lesson was a general anti-prejudice lesson (Dráľ et al., 2011). The first lesson focused on decreasing moral disengagement and increasing empathy by letting pupils reflect on the feelings of the bullied avatars in the game. The second lesson focused on peer norms and autonomy by letting pupils reflect on adhering to or rejecting prevailing group norms. The third lesson focused on self-efficacy by increasing pupils’ knowledge of how to intervene in bias-based bullying situations safely and effectively. The fourth lesson focused on prejudice and invited students to reflect on social exclusion, inequality, and discrimination as possible results of different life circumstances and social barriers.

Participants and Procedure

The intervention was aimed at primary school children aged 9–13. On the rare occasions that a class included a student younger than nine, the student was still allowed to participate. When the study was initially conceived, the intention was to recruit only schools in which at least 25% of the students belonged to an ethnic minority group. However, this did not prove possible in two of the countries (Scotland and Greece), where gaining consent from schools to participate proved very difficult. For this reason, it was eventually decided that all primary schools, regardless of their ethnic composition, were eligible to participate in the intervention. To participate, a school had to have Windows™-based computers, as the game ran on Windows™ only.

The procedure varied slightly across countries, partly reflecting the ethical standards that applied in each country. In general, representatives from various schools throughout each country were asked to take part in the study. Once approval from the school was obtained, teachers distributed the information letters and informed consent/assent forms. Active parental consent was required in all countries. In the Netherlands, children aged 12 and older also had to sign an informed consent form. In Scotland, all children provided written informed assent.

In Scotland, 584 children (grades 6–8 of the Scottish educational system) from eight schools were eligible to participate. Of these, 298 provided informed consent, of whom 238 filled in the baseline measurement (response rate = 40.7%), and 235 filled in both the baseline and post-intervention measurement (dropout = 1.3%). Mean age for the sample at baseline was 10.94 years (SD = 0.86). In Greece, all eligible children (n = 173; grades 5–6 of the Greek educational system) from three schools provided informed consent and filled in the baseline measurement (response rate = 100%). In total, 159 filled in both the baseline and post-intervention measurement (dropout = 8.1%). Mean age for the sample at baseline was 10.97 years (SD = 0.56). In the Netherlands, 258 children (grades 6–8 of the Dutch educational system) from six schools were eligible to participate. Of these, 183 provided informed consent, of whom 167 filled in the baseline questionnaire (response rate = 64.7%), and 152 filled in both the baseline and post-intervention measurement (dropout = 9.0%). Mean age for the sample at baseline was 10.64 years (SD = 0.86). Thus, in total, 546 children filled in both the baseline and post-intervention measurement. In most cases, children who did not participate after their informed consent form had been signed were absent from school or otherwise unavailable at the time of data collection.
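The response and dropout figures above are simple ratios of the reported counts. The sketch below recomputes them, assuming that the response rate means baseline completers divided by eligible children and that dropout means the share of baseline completers missing at post-test (our reading of the text; the dictionary and function names are illustrative). It also sums the completers across countries. Small rounding differences are expected (for example, 238/584 = 40.75% for Scotland).

```python
# Counts as reported in the text for each country.
# eligible: children eligible to participate
# baseline: children who completed the baseline questionnaire
# both: children who completed both baseline and post-intervention measurements
COUNTS = {
    "Scotland": {"eligible": 584, "baseline": 238, "both": 235},
    "Greece": {"eligible": 173, "baseline": 173, "both": 159},
    "Netherlands": {"eligible": 258, "baseline": 167, "both": 152},
}


def participation_rates(eligible: int, baseline: int, both: int) -> tuple[float, float]:
    """Return (response rate, dropout) as percentages.

    Response rate = baseline completers / eligible children.
    Dropout = baseline completers lost before the post-test / baseline completers.
    """
    response = 100 * baseline / eligible
    dropout = 100 * (baseline - both) / baseline
    return response, dropout


for country, c in COUNTS.items():
    response, dropout = participation_rates(**c)
    print(f"{country}: response = {response:.2f}%, dropout = {dropout:.2f}%")

total_completers = sum(c["both"] for c in COUNTS.values())
print("Completed both measurements:", total_completers)
```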

Each country assigned schools to the intervention or control group through matching, taking into account, first, the number of participants available per school, to create equally sized groups, and second (for the Netherlands only), the ratio of minority to majority group children, to create groups of similar composition. In some instances, group assignment was based on school availability, that is, when a school was willing to participate but could only participate in the control group due to time constraints, or when a school had no access to Windows™ computers and therefore needed to be placed in the control group. In these cases, we always checked that the assignment did not lead to unequal group composition.

Teachers in the intervention group were instructed on how to implement the program through face-to-face meetings, mail, or phone. First, teachers were asked to watch the online teacher training, which took about 30 min to complete. The program then ran over four consecutive weeks. At the start of the first week, the first questionnaire was administered through either an online (Scotland and the Netherlands) or a paper (Greece) survey. Subsequently, students played the first game session and took part in the corresponding lesson. The second and third game sessions and their corresponding lessons each followed one week after the previous session. A gaming session took 15 min to complete and the corresponding lesson 45 min, totaling 60 min. Finally, a classroom-based lesson on prejudice was held one week after the third session, which took approximately 45–60 min to complete. The game was played individually or in small groups (depending on hardware availability), whereas the lessons were conducted with the whole class. In some instances, practical issues (e.g., computer hardware problems) prevented playing the game in small groups, and the game was instead played with the whole class on the interactive whiteboard (digiboard). One week after the final lesson, the second questionnaire was administered. The control group completed the questionnaires at the same interval as the intervention group and received access to the intervention materials after finishing the second questionnaire.

Measures

Demographic Characteristics

Demographics included age, gender, ethnicity (“White”, “Roma”, “Black”, “Asian”, “Mixed”, and “Other”), religious affiliation (“Christian”, “Muslim”, “Hindu”, “Buddhist”, “Jewish”, “None”, and “Other”), and perceived weight status. Minority/majority group status was derived from ethnicity and religious affiliation: children who indicated “White” as ethnicity and “Christian” or “None” as religious affiliation were coded as majority group (0), and all others as minority group (1). Perceived weight status was measured by asking children ‘Do you see yourself as:’, with response options 1 = ‘Very skinny’, 2 = ‘Somewhat skinny’, 3 = ‘Average weight’, 4 = ‘Somewhat overweight’, and 5 = ‘Very overweight’.
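The group status rule can be written as a small function. This is a minimal sketch: the function and constant names are hypothetical, and the text does not say how missing ethnicity or religion answers were handled, so that case is not modeled here.

```python
# Hypothetical constants mirroring the coding rule described in the text.
MAJORITY_ETHNICITIES = {"White"}
MAJORITY_RELIGIONS = {"Christian", "None"}

# Response options for perceived weight status.
WEIGHT_STATUS_LABELS = {
    1: "Very skinny",
    2: "Somewhat skinny",
    3: "Average weight",
    4: "Somewhat overweight",
    5: "Very overweight",
}


def group_status(ethnicity: str, religion: str) -> int:
    """Return 0 for majority group members and 1 for minority group members."""
    if ethnicity in MAJORITY_ETHNICITIES and religion in MAJORITY_RELIGIONS:
        return 0
    return 1


for eth, rel in [("White", "Christian"), ("White", "Muslim"), ("Mixed", "None")]:
    print(eth, rel, "->", group_status(eth, rel))
```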

Dependent Variables

Intention to Intervene

Children were asked how often they would intend to intervene in bias-based bullying situations: “How often would you like to step in when other children say or do something that might hurt children…”. The item was adapted from Wernick et al. (2013). This question targeted four groups: children “of same ethnicity, skin color and religion as you”, “of different ethnicity or skin color than you”, “of different religion than you”, and “who are overweight”. Questions could be answered on a 5-point scale ranging from ‘1’ ‘Never’ to ‘5’ ‘Very often’.

The single item referring to children “who are overweight” was used as a measure of intention to intervene in weight-based bullying situations. To create a measure of intention to intervene in racist bullying situations, the following procedure was used. For majority group members, the items asking about children “of different ethnicity or skin color than you” and “of different religion than you” reflected intention to intervene in racist bullying situations. These two items were combined into a single scale, with one missing item allowed (Cronbach’s αScotland = 0.97, αGreece = 0.93, αNetherlands = 0.95). For minority group members, the single item asking about children “of same ethnicity, skin color and religion as you” reflected intention to intervene in racist bullying situations. The outcome intention to intervene in racist bullying situations thus consisted of two items for majority group members and one item for minority group members.
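The racist-bullying outcome is scored differently depending on group status. A minimal sketch of that scoring, assuming that the two-item scale is the mean of the answered items (the text does not state whether a mean or a sum was used) and that “one missing item allowed” means a score is still computed from a single answered item; all names are hypothetical.

```python
from statistics import mean
from typing import Optional

Score = Optional[float]  # None marks a missing answer


def scale_score(items: list[Score], max_missing: int) -> Score:
    """Mean of the answered items, or None if too many items are missing."""
    answered = [x for x in items if x is not None]
    if not answered or len(items) - len(answered) > max_missing:
        return None
    return mean(answered)


def racist_intention(status: int, same_group: Score,
                     diff_ethnicity: Score, diff_religion: Score) -> Score:
    """Hypothetical scorer for intention to intervene in racist bullying.

    status: 0 = majority, 1 = minority (coding from the Demographics section).
    """
    if status == 0:
        # Majority: items about children of a different ethnicity/skin color or religion
        return scale_score([diff_ethnicity, diff_religion], max_missing=1)
    # Minority: the single item about children of the same ethnicity, skin color and religion
    return same_group


print(racist_intention(0, same_group=5, diff_ethnicity=4, diff_religion=2))
print(racist_intention(0, same_group=5, diff_ethnicity=None, diff_religion=2))
print(racist_intention(1, same_group=4, diff_ethnicity=1, diff_religion=1))
```

The same structure would apply to self-efficacy and past-experience measures, which the text says were built the same way.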

Self-Efficacy to Intervene

Children were asked about their self-efficacy to intervene in bias-based bullying situations by asking: “How confident are you that you could successfully step in when other children say or do something that may hurt children…”. The item was adapted from Wernick et al. (2013). This question was targeted at the same four groups: children “of same ethnicity, skin color and religion as you”, “of different ethnicity or skin color than you”, “of different religion than you”, and “who are overweight”. Questions could be answered on a 5-point scale ranging from ‘1’ ‘Very unconfident’ to ‘5’ ‘Very confident’. The single item referring to children “who are overweight” was used as a measure of self-efficacy to intervene in weight-based bullying situations. The procedure for creating a measure of self-efficacy to intervene in racist bullying situations (Cronbach’s αScotland = 0.95, αGreece = 0.90, αNetherlands = 0.93) was the same as described for intention to intervene in racist bullying situations.

Moral Disengagement

Moral disengagement in bullying was measured with six items on a 5-point scale ranging from ‘1’ ‘Strongly disagree’ to ‘5’ ‘Strongly agree’ adapted from a scale by Thornberg and Jungert (2013). An example item is ‘It’s okay to bully someone who you don’t like’. Reliability was considered good for the Dutch and Scottish data, and acceptable for the Greek data (Cronbach’s αNetherlands = 0.81, αScotland = 0.81, αGreece = 0.63). Two missing items were allowed.
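The reliability coefficients reported throughout this section are Cronbach’s α, defined for k items as α = k/(k − 1) · (1 − sum of item variances / variance of the total score). A minimal sketch for complete cases; how the authors handled missing items when computing α is not stated, so listwise deletion is assumed here and the function name is illustrative.

```python
from statistics import variance


def cronbach_alpha(rows: list[list[float]]) -> float:
    """Cronbach's alpha for complete cases.

    rows: one list of item scores per respondent, all of equal length k >= 2.
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Sample variance of each item across respondents
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of each respondent's total score
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Two items that move together perfectly give alpha = 1.
print(cronbach_alpha([[1, 1], [2, 2], [3, 3], [4, 4]]))
```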

Intergroup Anxiety

The intergroup anxiety scale was adapted from Stephan and Stephan (1985), as reported in Stephan (2014). Children were asked: “If you spoke or did something with [name target group] children, how much would you feel [name emotion]?”. Each of three emotions (i.e., comfortable, anxious, threatened) was asked as a separate item, answered on a scale ranging from ‘1’ ‘Not at all’ to ‘5’ ‘To a very large degree’. These questions targeted five characteristics (‘White’, ‘Black’, ‘Muslim’, ‘Roma’, and ‘overweight’). Reliability analyses showed that the item ‘comfortable’ consistently shared low covariance with the other two items; it was therefore removed from the analyses. Reliability of the remaining two items for each target group was considered good for the Dutch and Scottish data and acceptable for the Greek data (range Cronbach’s αNetherlands = 0.86–0.96, αScotland = 0.74–0.86, αGreece = 0.60–0.72). One missing item was allowed.

Intergroup Attitudes

Intergroup attitudes were measured by asking children about their readiness for social contact with different groups (adapted from Berger et al., 2015; Teichman et al., 2016). Children were shown five pictures of different children, each with a different characteristic (i.e., White, Roma, Muslim, Black, and overweight). Pictures were tailored to gender (i.e., boys were shown pictures of boys and girls pictures of girls). For each picture, children were asked “How happy would you be…” (1) “to play with”; (2) “invite to your house”; and (3) “visit his/her house …”, followed by the name of the child in the picture and his or her characteristic. Questions were answered on a 5-point scale ranging from ‘1’ ‘Very unhappy’ to ‘5’ ‘Very happy’. Reliability was considered good (range Cronbach’s αScotland = 0.85–0.91, αGreece = 0.82–0.91, αNetherlands = 0.86–0.91). One missing item was allowed.

Peer Norms

Children were asked four questions, answered on a 5-point scale ranging from ‘1’ ‘None’ to ‘5’ ‘All’, about how many of their friends would (dis)approve of particular bullying or helping behaviors in bias-based bullying situations. The items were adapted from DeSmet et al. (2018). Two of the four items did not load adequately with the other items, a pattern also observed by DeSmet et al. (2018), which resulted in low reliability. Therefore, perceived peer norms were based on the two items asking: “Among your friends, how many…” (1) “would approve comforting a child…” and (2) “would defend a child…” “…who has been picked on offline/online because of their ethnicity, skin color, religion, weight?” (Cronbach’s αScotland = 0.85, αGreece = 0.81, αNetherlands = 0.78). One missing item was allowed.

Control Variables

Intention and self-efficacy to intervene might be influenced by how often children have witnessed bias-based bullying and by their earlier intervening behavior (Wernick et al., 2013). Therefore, we controlled for these variables in the models with intention to intervene and self-efficacy to intervene as outcomes. We measured past experience of witnessing bias-based bullying by asking “How often in the last year have you seen other children say or do something that might hurt children…” and past intervening experience by asking “How often in the last year have you stepped in when other children said or did something that might hurt children…”. Both questions targeted four groups: children “…of same ethnicity, skin color and religion as you”, “…of different ethnicity or skin color than you”, “…of different religion than you”, and “…who are overweight”. Questions could be answered on a 5-point scale ranging from ‘1’ ‘Never’ to ‘5’ ‘Very often’. The single items referring to children “who are overweight” were used as measures of past experience of witnessing and past intervening experience regarding weight-based bullying situations. The procedure for creating measures of past experience of witnessing racist bullying situations (Cronbach’s αScotland = 0.76, αGreece = 0.83, αNetherlands = 0.76) and past intervening experience in racist bullying situations (Cronbach’s αScotland = 0.94, αGreece = 0.82, αNetherlands = 0.88) was the same as described for intention to intervene in racist bullying situations.

Statistical Analyses

Analyses were conducted using R 4.1.2. Baseline differences between participants in the intervention and control conditions were tested using independent t-tests and chi-squared tests. Intervention effectiveness was tested using multiple regression analyses (MRA). Multilevel analyses were not required, as preliminary analyses showed that the intraclass correlation coefficient was very small (0–1%; 6% on a single occasion). A sample size calculation showed that a sample of 177 was required to test intervention effectiveness (f² = 0.08; α = 0.05; β = 0.20, i.e., power = 0.80; npredictors = 6). The models were corrected for (a) the baseline value of the outcome, (b) gender, (c) majority/minority group status (Dutch models only), and (d) baseline differences between the intervention and control groups. In addition, the models with intention to intervene and self-efficacy to intervene as outcomes were corrected for past experience of witnessing bias-based bullying and past intervening experience, as suggested by Wernick et al. (2013). Furthermore, the models with intention and self-efficacy to intervene in weight-based bullying situations as outcomes were corrected for perceived weight status. To show how the results changed with the inclusion of covariates, the uncorrected models, which were corrected only for the baseline value of the outcome (Twisk, 2006), are reported in Supplementary Material 1. Cohen’s d was calculated to express the size of the intervention effects, with values of 0.20, 0.50, and 0.80 indicating small, medium, and large effects, respectively (Cohen, 1992). Unless otherwise indicated, Cohen’s d was based on the difference between the intervention and control groups at the post-intervention measurement.
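The required sample size of 177 comes from a power analysis for multiple regression. The sketch below shows one way to run such a calculation with SciPy’s noncentral F distribution, assuming the test is the overall F test of R² in a fixed-effects model with six predictors and noncentrality λ = f²·N (to our understanding, the convention used by G*Power). The text does not state which software or test variant was used, so the result of this sketch may differ slightly from 177; the function names are illustrative.

```python
from scipy import stats


def regression_power(n: int, f2: float, n_predictors: int, alpha: float = 0.05) -> float:
    """Power of the overall F test (R^2 > 0) in a fixed-effects multiple regression.

    Assumes noncentrality lambda = f2 * n.
    """
    df1 = n_predictors
    df2 = n - n_predictors - 1
    f_crit = stats.f.ppf(1 - alpha, df1, df2)              # critical F under H0
    return float(stats.ncf.sf(f_crit, df1, df2, f2 * n))   # P(reject H0 | H1)


def required_n(f2: float, n_predictors: int, alpha: float = 0.05, power: float = 0.80) -> int:
    """Smallest total sample size whose power reaches the target."""
    n = n_predictors + 2  # smallest n with at least one residual degree of freedom
    while regression_power(n, f2, n_predictors, alpha) < power:
        n += 1
    return n


# Inputs as stated in the text: f^2 = 0.08, alpha = .05, power = .80, six predictors.
print(required_n(f2=0.08, n_predictors=6))
```

A linear search is used for clarity; power grows with n, so the first n that reaches the target is the minimum.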

To evaluate whether intervention effectiveness differed between minority and majority group members, moderation analyses including the interaction condition (0 = control; 1 = intervention) × group status (0 = majority; 1 = minority) were conducted (Dutch data only). No moderation analyses were conducted for intention and self-efficacy to intervene in weight-based bullying situations, since minority/majority status was not expected to influence these outcomes. Because tests of interaction terms have less statistical power, the significance level for interaction terms was set to p < .10 (Twisk, 2006). When an interaction term was significant, the condition coefficient indicated the intervention effect for majority group members (the reference category, coded 0). By recoding group status (0 = minority, 1 = majority), the condition coefficient indicated the intervention effect for minority group members.
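The recoding step works because, in a model with an interaction term, the condition coefficient is the intervention effect in the reference category of group status (the category coded 0). The sketch below illustrates this with NumPy least squares on hypothetical, noise-balanced data (not study data), omitting the study’s covariates for brevity; the function and variable names are illustrative.

```python
import numpy as np


def condition_effect(condition: np.ndarray, status: np.ndarray, y: np.ndarray) -> float:
    """Coefficient of `condition` in y ~ 1 + condition + status + condition:status (OLS)."""
    X = np.column_stack([np.ones_like(y), condition, status, condition * status])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(coef[1])


# Hypothetical cell means keyed by (condition, status), status coded 0 = majority, 1 = minority.
# True intervention effects: 3.5 - 3.0 = 0.5 (majority) and 3.3 - 3.2 = 0.1 (minority).
cell_means = {(0, 0): 3.0, (1, 0): 3.5, (0, 1): 3.2, (1, 1): 3.3}

condition_list, status_list, y_list = [], [], []
for (c, s), m in cell_means.items():
    for noise in (-0.5, 0.0, 0.5):  # symmetric noise keeps each cell mean exact
        condition_list.append(c)
        status_list.append(s)
        y_list.append(m + noise)

condition = np.array(condition_list, dtype=float)
status = np.array(status_list, dtype=float)
y = np.array(y_list, dtype=float)

# Coding 0 = majority: the condition coefficient is the effect for majority members.
effect_majority = condition_effect(condition, status, y)
# Recoded 0 = minority: the condition coefficient is the effect for minority members.
effect_minority = condition_effect(condition, 1 - status, y)
print(round(effect_majority, 3), round(effect_minority, 3))
```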

Results

Table 1 shows an overview of the baseline characteristics of the participants per country. In all countries, boys and girls were approximately equally represented. The Scottish and Greek samples consisted of majority group members only, whereas in the Dutch sample majority and minority group members were approximately equally represented within both the intervention and the control group.

Table 1 Means and SDs of baseline characteristics of intervention and control group

Concerning bullying experiences, children in all three countries on average reported that they had never or rarely witnessed or intervened in racist bullying situations, but they reported fairly high self-efficacy to intervene in such situations in the future. With respect to weight-based bullying, children across countries on average reported having rarely to sometimes witnessed or intervened in such situations, and again reported fairly high self-efficacy to intervene should such situations arise in the future. Average moral disengagement was low. On average, children reported that about half of their friends would approve of comforting a bullied child or would actively intervene in bias-based bullying situations. In general, children reported feeling not at all to a little anxious towards children of various groups. In Greece, however, children on average appeared to feel slightly more anxious towards Muslim and Roma children than towards other groups. Finally, children on average seemed relatively positive towards children of all groups, although attitude scores towards White children were more positive than towards other groups, particularly in the Greek sample.

Intervention Effects in Scotland

Analyses of baseline differences between the intervention and control groups (see Table 1) show that in Scotland, children in the intervention group on average reported having witnessed weight-based bullying more frequently, and reported lower intergroup anxiety towards White and Muslim children, than the control group. The analyses were therefore corrected for these baseline differences.
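Correcting for baseline differences here means adding the variables on which the groups differed as covariates to the regression model. The Python sketch below illustrates why this matters, assuming a single outcome, one covariate and plain OLS (the study's multivariate models are not reproduced); the toy data and the helper `condition_coef` are hypothetical. When the intervention group starts higher on a variable that also predicts the outcome, the unadjusted condition coefficient picks up that baseline gap, whereas the adjusted coefficient does not.

```python
import numpy as np

def condition_coef(y, condition, covariates=None):
    """Return the OLS coefficient of `condition` in y ~ 1 + condition (+ covariates)."""
    cols = [np.ones_like(condition), condition]
    if covariates is not None:
        cols.extend(covariates)
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

# Hypothetical toy data (not study data): the intervention group starts one
# point higher on a baseline variable, but the intervention itself has no effect.
condition = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
baseline = np.array([1, 2, 3, 4, 2, 3, 4, 5], dtype=float)
y = 2.0 + 0.0 * condition + 0.5 * baseline  # post-test depends only on baseline

unadjusted = condition_coef(y, condition)            # absorbs the baseline gap
adjusted = condition_coef(y, condition, [baseline])  # baseline held constant
print(f"unadjusted: {unadjusted:.2f}, adjusted: {adjusted:.2f}")
```

In this constructed example the unadjusted coefficient is 0.5 (the one-point baseline gap times the 0.5 slope) and the adjusted coefficient is zero. The same mechanism can work in either direction, making an apparent effect appear or disappear once baseline differences are taken into account.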

Table 2 shows the results of the corrected intervention effect analyses for the Scottish sample. Only one small effect was found, for intention to intervene in weight-based bullying situations (B = 0.35, SE B = 0.16, p = .036, d = 0.25), with children in the intervention group (M = 3.54, SD = 1.35) scoring higher than children in the control group (M = 3.18, SD = 1.48). No significant differences were found for any other outcome.
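The reported effect size can be approximately reproduced from the group means and SDs. The Python sketch below assumes Cohen's d with a pooled SD that weights both groups equally, since group sizes and the exact formula are not given in this section; the function name `cohens_d` is illustrative.

```python
import math

def cohens_d(m1, sd1, m2, sd2):
    """Cohen's d using a pooled SD that weights both groups equally.

    Equal-n pooling is an assumption here; the study's exact formula
    and group sizes are not reported in this section.
    """
    pooled_sd = math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return (m1 - m2) / pooled_sd

# Post-intervention means and SDs as reported in the text
d_scotland = cohens_d(3.54, 1.35, 3.18, 1.48)  # Scotland: intention to intervene, weight-based
d_peer_nl = cohens_d(3.45, 1.24, 2.95, 1.27)   # Netherlands: peer norm (uncorrected)
print(round(d_scotland, 2), round(d_peer_nl, 2))
```

With these inputs the sketch gives 0.25 for the Scottish intention outcome and 0.40 for the uncorrected Dutch peer norm, matching the reported values. Weighting the pooled variance by group size, as is common when groups differ in size, may give slightly different values for other comparisons.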

Table 2 Scottish results of the corrected multivariate regression model of the effect of the intervention on determinants of bias-based bullying (n = 235)

Intervention Effects in Greece

Analyses of baseline differences between the intervention and control groups (see Table 1) show that in Greece, children in the intervention group on average were slightly younger, had more frequently witnessed and intervened in racist-based bullying situations, had more frequently intervened in weight-based bullying situations, and had a more positive attitude towards White children than the control group. The analyses were therefore corrected for these baseline differences.

Table 3 shows the results of the corrected intervention effect analyses for the Greek sample. While the uncorrected analyses showed a significant but negligible effect for attitude towards White children (B = -0.22, SE B = 0.11, p = .044, d = 0.06; Supplementary Material 1), the corrected analyses showed no significant between-group post-intervention difference (B = 0.02, SE B = 0.10, p = .835). The uncorrected effect might be explained by the significant baseline difference in attitudes towards White children (t(118.22) = -3.31, p = .001, d = 0.54). Over the intervention period, attitudes towards White children slightly increased in the control group and decreased in the intervention group, resulting in no significant post-intervention difference.

Table 3 Greek results of the corrected multivariate regression model of the effect of the intervention on determinants of bias-based bullying (n = 158)

For all other outcomes, no significant differences were found.

Intervention Effects in the Netherlands

Analyses of baseline differences between the intervention and control group (see Table 1) show that in the Netherlands children in the intervention group on average were older and reported a lower score on moral disengagement. Therefore, the analyses were corrected for these baseline differences.

Table 4 shows the results of the corrected intervention effect analyses for the Dutch sample. The uncorrected analyses showed a small intervention effect for self-efficacy to intervene in racist bullying situations (B = 0.34, SE B = 0.17, p = .049, d = 0.33; Supplementary Material 1), with the intervention group (M = 3.63, SD = 0.98) reporting higher self-efficacy to intervene than the control group (M = 3.26, SD = 1.35). However, the corrected model showed no significant between-group post-intervention difference (B = 0.21, SE B = 0.18, p = .239). In the corrected model, both moral disengagement (B = -0.39, SE B = 0.14, p = .008) and gender (B = -0.39, SE B = 0.17, p = .022) were significant predictors of self-efficacy to intervene in racist bullying situations, which likely accounts for the non-significant intervention effect (see Supplementary Material 2). Furthermore, the uncorrected analyses showed a small to medium intervention effect on peer norm (B = 0.36, SE B = 0.18, p = .046, d = 0.40; Supplementary Material 1), with the intervention group (M = 3.45, SD = 1.24) reporting a more positive peer norm than the control group (M = 2.95, SD = 1.27). In the corrected model, this effect was no longer significant (B = 0.30, SE B = 0.19, p = .115).

Table 4 Dutch results of the corrected multivariate regression model of the effect of the intervention on determinants of bias-based bullying (n = 152)

Results of Moderation Analyses

Finally, we investigated whether group status moderated the intervention effects. First, the uncorrected analyses showed a medium intervention effect on self-efficacy to intervene in racist bullying situations for minority group members (B = 0.61, SE B = 0.24, 95% CI [0.13; 1.10], p = .013, d = 0.69), with minority group members in the intervention group reporting higher self-efficacy to intervene (M = 3.85, SD = 0.97) than minority group members in the control group (M = 3.10, SD = 1.32). No effect was found for majority group members (B = 0.06, SE B = 0.24, 95% CI [-0.41; 0.53], p = .802, d = 0.00). In the corrected model, the intervention effect for minority group members was marginally significant (B = 0.49, SE B = 0.26, 95% CI [-0.02; 1.00], p = .061). The uncorrected effect for minority group members might be explained by the baseline difference in self-efficacy to intervene among minority group members (t(59.74) = -2.43, p = .018, d = 0.56), or by moral disengagement (B = -0.37, SE B = 0.14, p = .012) and gender (B = -0.41, SE B = 0.17, p = .019) being significant predictors in the corrected model (see Supplementary Material 3).

Second, the uncorrected analyses showed a small to medium intervention effect on attitude towards White children for minority group members (B = 0.32, SE B = 0.16, 95% CI [0.00; 0.64], p = .047, d = 0.43), with minority group members in the intervention group feeling more positive towards White children (M = 3.80, SD = 0.84) than minority group members in the control group (M = 3.41, SD = 1.04). No effect was found for majority group members (B = -0.12, SE B = 0.16, 95% CI [-0.44; 0.20], p = .456, d = 0.26). The intervention effect for minority group members did not reach significance in the corrected model (B = 0.30, SE B = 0.17, 95% CI [-0.03; 0.64], p = .079), which could be explained by gender (B = -0.22, SE B = 0.12, p = .049) being a significant predictor in the corrected model (see Supplementary Material 4).

Third, the uncorrected analyses showed a marginal intervention effect on anxiety towards Muslim children for majority group members (B = -0.52, SE B = 0.26, 95% CI [-1.04; 0.00], p = .052, d = 0.47), with majority group members in the intervention group reporting less anxiety towards Muslim children (M = 1.55, SD = 0.78) than majority group members in the control group (M = 2.03, SD = 1.27). No effect was found for minority group members (B = 0.27, SE B = 0.26, 95% CI [-0.24; 0.79], p = .302, d = 0.13). This result did not change in the corrected model (B = -0.53, SE B = 0.27, 95% CI [-1.06; 0.00], p = .051).

Discussion

The purpose of this study was to examine the effectiveness of a school-based intervention specifically designed to address racist and weight-based bullying. The intervention was evaluated in three European countries. Overall, the data indicated an increase in intention to intervene in weight-based bullying situations in Scotland and a marginal reduction in intergroup anxiety among majority group children with respect to Muslim children in the Netherlands. The program did not show any positive effects in Greece. While the program thus showed limited evidence of effectiveness, several important lessons have been learned for future development of interventions targeting bias-based bullying.

Looking at the findings in more detail, the program was found to increase the intention to intervene only in Scotland and only in relation to weight-based bullying. That is, after the intervention, children in the intervention group in Scotland reported being more willing to step in when overweight children are being bullied. It is, however, difficult to explain how this effect emerged, as we were not able to conduct a process evaluation within this research project. Taking the socio-cultural context into consideration, one explanation might be that weight-based bullying is a more salient issue within the Scottish sample, as childhood obesity is a particularly acute problem in Scotland (Bradshaw & Hinchliffe, 2018; Wills et al., 2006). The intervention might thus have prompted more in-depth group discussion of experiencing or witnessing weight-based bullying, and led the group to prioritize tackling it.

No other positive effects were found on intention to intervene, self-efficacy to intervene, or moral disengagement in any of the participating countries. Baseline levels on these outcomes were already fairly positive, leaving little room for improvement (Thornberg et al., 2015, 2017). These high baseline levels might also reflect social desirability bias (Wachs et al., 2020; Wang et al., 2016). Concerning intention, while children might have indicated that they want to step in during bias-based bullying situations, it was not possible in this study to measure actual behavior. Future evaluations should therefore include measures that are less susceptible to social desirability bias (McKeague et al., 2015).

When analyzing changes in intergroup anxiety and attitudes, children overall reported feeling only slightly anxious towards children of minority groups at baseline, again leaving little room for improvement. In addition, while children seemed to be relatively positive towards children of all groups, attitude scores towards White children were more positive than those towards other groups. While this pattern might reflect ingroup favoritism towards the majority group (Palmer & Abbott, 2018), the differences in attitudes and anxiety towards other groups appeared too small for the intervention to elicit change. The results did indicate that the intervention might have the potential to reduce intergroup anxiety towards Muslim children in the Netherlands. This is a relevant finding, as this group is highly marginalized in the Netherlands (Thijs & Verkuyten, 2016). In the Dutch sample, being in an ethnically mixed classroom setting might have facilitated the discussion of racist-based bullying (Palmer et al., 2017). Since we do not have information on the ethnic diversity of the school or community context, it is difficult to determine whether such intergroup processes might have occurred in Scotland and Greece.

Concerning perceptions of a positive peer norm, the Dutch sample showed a shift in peer norms, but this failed to reach significance in the final model. No effects were found in Scotland and Greece. This may be due to measurement issues. First, two of the items used to measure peer norms did not load sufficiently on the factor and were removed from the scale (DeSmet et al., 2018). Second, peer norms towards racist and weight-based bullying were combined in a single item, as the aim was to measure a general shift in peer norms towards bias-based bullying. There was also a pragmatic reason for this decision, as it made the questionnaire less time-consuming for children. In hindsight, it would have been more informative to distinguish between racist and weight-based bullying.

The lack of consistent findings across countries is in line with the results of a recent meta-analysis that found significant variation in the results of anti-bullying programs, such as KiVa or the NoTrap! program, that were tested and evaluated in different countries (Gaffney et al., 2019). According to this meta-analysis, differences in findings could reflect cultural and societal differences between participants in different countries. In addition, previous research has indicated that bullying manifests differently in different countries (Smith et al., 2016). For further development of the GATE-BULL intervention, it is therefore suggested that the program be tailored to the specific experiences and/or behaviors of children and young people in each country.

The absence of significant findings might also be explained by the fact that the intervention was designed to address both racist and weight-based bullying. While the primary aim was to address the most prevalent forms of bias-based bullying in a single intervention (UNESCO, 2019), such an approach might be too generic, as it does not take into account the different correlates of stigma experienced by different minority groups (Dovidio et al., 2017). For example, to effectively address racist-based bullying, children should be knowledgeable about the cultural and historical background of this stigma (Campbell, 2015). In addition, tackling weight-based bullying should address beliefs about the controllability of becoming overweight (Talumaa et al., 2022). It is therefore recommended that anti-bullying interventions be adapted to specific stigmatizing conditions.

The study has several limitations. Some were practical, arising from the time frame available for this international study. First, implementation data were not consistently collected across countries, which hindered the interpretation of differences in results between countries. In addition, since we were not able to conduct a process evaluation within the available time, it is difficult to analyze why certain effects did or did not emerge. Second, the significant baseline differences between the intervention and control groups suggest that a larger trial is needed in which schools are fully randomized, to minimize the chance of baseline differences between the intervention and control groups. Third, we did not measure the impact of the intervention on teacher responses to bias-based bullying, as this was not the primary goal of this study. Fourth, we measured only intention to intervene rather than actual intervening behavior, as it was not possible to measure long-term behavioral change.

Furthermore, there were some methodological limitations. First, some of the measures did not perform well in all countries, so further work is needed to refine and test appropriate measures. This includes ensuring that measures are age-appropriate, as some teachers indicated that the questionnaire was too long and difficult for some students to understand. Second, more extensive use of qualitative methods could help offset some of the limitations of quantitative measures. In the Netherlands, short evaluation interviews were conducted with teachers, which provided valuable insight into the perceived relevance of the program within the curriculum, observed effects within the classroom, and implementation challenges. For example, one teacher reported that the game and classroom activities encouraged a minority group child to talk about experiences of racism and their impact on the child's wellbeing, which made the other children more aware of what it means to be discriminated against. Such an experience is difficult to capture in quantitative data, and a mixed-methods approach is therefore suggested for future intervention evaluations. Third, as previously mentioned, the peer norm scale did not differentiate between racist and weight-based bullying, which should be addressed in future research. Finally, the recruitment rate might have affected the external validity of the results.

Implementation limitations of the program that may have influenced intervention effectiveness include differing levels of engagement with the video game. In some schools, due to a shortage of computers and other technical problems, students were not able to play the game individually as originally intended. An important limitation was that, while more schools are using iPads or Chromebooks for education, the game ran only on the Windows™ operating system. For future implementation, the game should be made accessible on all devices. Furthermore, students reported that the video game challenges were too easy and that the best way to resolve the bias-based bullying situations was obvious. Although these issues may have affected engagement with this component of the intervention, it is difficult to gauge to what extent they influenced the intervention effects, as the main aim of the video game was to provide a warm-up for the structured discussion that followed as part of the lesson plans. It is also important to note that although the program was, to some extent, multi-level and multi-component, it did not have the scope to address issues such as parent and community involvement, school policies, and reporting and monitoring arrangements (Priest et al., 2021).

Key strengths of the program include its grounding in a multi-disciplinary, theory- and evidence-based approach that integrated theoretical perspectives and evidence from the anti-bullying and prejudice reduction literatures. The program was also consistent with recent recommendations that anti-prejudice interventions include all students, not just those directly targeted, and be multi-component, including elements such as an anti-prejudice and anti-bullying curriculum and teacher training (Priest et al., 2021). Given the very small number of interventions targeting bias-based bullying, particularly weight- and racist-based bullying, work in this area should continue. The lessons from this intervention suggest that future interventions against bias-based bullying should (a) select more nuanced outcome measures that minimize social desirability bias as much as possible, (b) be tailored to a specific form of bias and challenge specific cultural beliefs, (c) adopt both quantitative and qualitative evaluation measures, (d) involve both majority and minority groups, (e) include parent, teacher and community involvement, and (f) ensure that digital materials are available on multiple platforms (e.g., laptops, tablets). This will help build a stronger evidence base for interventions targeting bias-based bullying, helping to ensure that the human rights of students with devalued identities or attributes are respected and that their wellbeing does not suffer as a result of bullying motivated by prejudice.