1 Introduction

Hate speech, understood here as an intentionally harmful and derogatory expression about people (directly or vicariously) based on assigned group characteristics (e.g., ethnicity, sexual orientation, religious affiliation), is widespread among young people (Castellanos et al., 2023; Kansok-Dusche et al., 2023) and thus also affects schools. In contrast to bullying, which by definition targets an individual person, the term hate speech implies a devaluation related to social group categories and does not necessarily require a personalized victim (Kansok-Dusche et al., 2023; Smith et al., 2013). There is growing empirical evidence that hate speech has negative consequences for the targeted individuals (Näsi et al., 2015; Stahel & Baier, 2023; Wachs et al., 2022) and for society in general (Bilewicz & Soral, 2020). Therefore, schools and teachers are generally encouraged to moderate hate speech. We know, from research on related phenomena such as bullying, that teacher behavior in these situations plays an important role and that teachers’ interventions can help minimize problem behaviors (Fischer et al., 2022; Wachs, Bilz et al., 2019; Yoon & Bauman, 2014). To date, there is very limited evidence on how teachers deal with incidents of hate speech in schools (Krause et al., 2023). Against this background, the aim of this binational study was to present a newly developed instrument to assess teacher interventions in hate speech in schools, to describe which strategies teachers use, and to analyze which individual and contextual variables are associated with the use of these strategies.

1.1 Hate speech in schools

The phenomenon of hate speech has been the subject of increased empirical research in recent years. Nevertheless, there have only been a few studies, mainly from Western countries, on hate speech in school-aged children and young people (e.g., Lehman, 2020; Wachs et al., 2021). A recent systematic review identified 18 publications that addressed the prevalence of hate speech (online and offline) in children and adolescents aged 5 to 21 years. In these studies, frequency rates for the last 12 months ranged from 26 to 68% for witnessing hate speech, 7–18% for being a victim of hate speech, and 5–32% for perpetrating hate speech (Kansok-Dusche et al., 2023). This wide range is due to differences in definitions, survey instruments, and samples. Nevertheless, these results showed that many children and adolescents have experienced hate speech, potentially at a developmental stage in which they are particularly vulnerable to negative consequences. Exposure to hate speech can be associated with lower well-being (Stahel & Baier, 2023; UK Safer Internet Centre, 2016), reduced levels of social trust (Näsi et al., 2015), and political radicalization (Bilewicz & Soral, 2020).

Because of these psychological consequences and the high prevalence of hate speech among school-aged children and adolescents, it is also a challenge for schools. Schools represent a suitable location to address hate speech incidents and to prevent future hate speech. There is also preliminary evidence that school contextual characteristics are associated with the occurrence of hate speech (Krause et al., 2023; Wachs et al., 2023). Hence, just as with the related phenomenon of bullying (Hong & Espelage, 2012), a socio-ecological perspective on hate speech seems appropriate, and teachers represent a critical element of a school’s social ecology. Through teaching and interacting with students, teachers shape the school and classroom climate and the attitudes of students (Yoon & Barton, 2008). It is therefore reasonable to assume that teachers’ handling of hate-speech incidents in schools may have an impact on student behavior and could possibly help to moderate hate speech.

1.2 Teachers’ interventions in hate-speech incidents in schools

The few publications available to date that looked at teacher interventions for hate speech in schools represent either theoretical reflections on the appropriateness of arts education and moral imagination (Arneback, 2014; Jääskeläinen, 2020), studies on how to address online hate speech or hate postings (Blaya, 2019; Strohmeier & Gradinger, 2021), or qualitative studies with small samples (Krause et al., 2023).

Further publications are available regarding the broader question of how teachers can address prejudice and racism in schools (Lynch et al., 2017). One example is affirmative action pedagogy (Boler, 2004). This approach is about how marginalized groups can have a voice in the classroom, “even at the small cost of limiting dominant voices” (p. 4). However, these approaches have a prescriptive rather than a descriptive character (an exception to this is the study by Arneback & Jämte, 2022) and are of limited help in answering the question of what teachers actually do to address the specific phenomenon of hate speech in schools.

A review of intervention programs to combat cyberhate primarily identified legislative and technological intervention strategies; the educational programs cited are preventive in nature, are almost exclusively aimed at promoting media literacy, and have rarely been evaluated (Blaya, 2019). A review of German-language anti-hate-speech programs also flagged that these programs focus exclusively on online hate speech and suggested that more evaluation is needed (Seemann-Herz et al., 2022). In their study, Strohmeier and Gradinger (2021) examined how teachers deal with the specific phenomenon of online hate postings. A total of 130 Austrian teachers reported their reactions to a hypothetical “hate posting” incident using a newly developed questionnaire. Six dimensions were identified using exploratory factor analysis: alerting other colleagues, victim-oriented rehabilitation, alerting the perpetrators’ parents, authority-based sanctions, seeking help from external professionals, and ignoring. In their qualitative study, Krause et al. (2023) interviewed 46 German teachers and students on teachers’ interventions in hate speech in schools. They identified eight intervention strategies, some of which overlap with those of Strohmeier and Gradinger (2021): punishment, involving the police, involving the parents, involving colleagues, mediation, and working with the whole class. However, others were found only in this study: teaching-related strategies and out-of-school projects and trainings. These differences could be due to the different target variables (hate speech vs. hate postings) and different methodological approaches.

Research on how teachers react to bullying is much more advanced. Here, there are widely used survey instruments for teachers’ interventions (e.g., the Handling Bullying Questionnaire; Bauman et al., 2008), there is evidence about what teachers do in bullying situations (e.g., Burger et al., 2015), which strategies are more effective (e.g., Wachs, Bilz et al., 2019), and which competencies teachers should have to intervene successfully. For example, a consistent finding of several studies is that higher task-specific teacher self-efficacy is associated with teachers ignoring bullying less frequently and a more intensive use of intervention strategies (e.g., Fischer & Bilz, 2019; Fischer et al., 2021). Self-efficacy is the belief that one possesses the required capabilities to accomplish a given task successfully (Bandura, 1994). Teacher self-efficacy has been linked to many student outcomes and is considered an important component of teacher competence (Zee & Koomen, 2016).

Empirical evidence shows that hate speech can occur outside and within sequential bullying processes (Kansok-Dusche et al., 2023). Regarding participation roles, moderate correlations have been found between victimization through traditional bullying and online hate speech (Blaya et al., 2022) as well as between the perpetration of cyberbullying and the perpetration of online hate speech (Wachs, Wright et al., 2019). Despite this moderate empirical overlap between hate speech and bullying, they are distinct phenomena and findings from one field cannot be applied unconditionally to the other. Specific research findings on teachers’ responses in the context of hate speech could be an important prerequisite for the development of prevention and intervention measures and for teacher training.

1.3 The present study

Since there are no quantitative research findings on teachers’ responses to hate speech in schools, the first aim of this binational study was to develop a reliable and valid survey instrument. Such an instrument can be a valuable tool for further research or in evaluating teacher training interventions. For this purpose, the instrument’s factor structure and measurement invariance were examined. The second aim was to describe teacher interventions to hate speech by considering associations with teachers’ gender, professional experience, and country of residence. Finally, the specific hate-speech-related self-efficacy level and its relationship with teacher interventions were examined. Due to the lack of existing knowledge, the study was mainly exploratory in nature. Nevertheless, since dealing with hate speech does not play a major role in teacher training and school-based prevention work in German-speaking countries, we assumed that teachers may not be adequately prepared to deal with this phenomenon. Therefore, we expected teachers to have low self-efficacy expectations when dealing with hate speech. Given the results of bullying research on the subject, we expected a positive association between teachers’ task-specific self-efficacy and the frequency with which they intervene in incidents of hate speech.

2 Methods

2.1 Participants

The sample consisted of 486 German and Swiss teachers. Of those, 230 teachers (47%) worked at schools in the German-speaking part of Switzerland and 256 teachers (53%) worked at schools in Germany (in the federal states of Berlin and Brandenburg). The teachers were on average M = 43.0 years old (SD = 11.4 years). The majority reported a female gender (n = 282, 58%), n = 199 reported a male gender (41%), only one indicated a diverse gender (0.2%), and four did not report their gender (0.8%). Less than a quarter (n = 102, 21%) reported having an immigrant background. This standard measure of ethnic minority status in German-speaking countries is defined as either oneself or at least one parent being born outside of the country in question. For comparison, in 2021, 27% of the population in Germany (Statistisches Bundesamt, 2023) and 39% of the population in Switzerland (Federal Statistical Office Switzerland, 2023) had an immigrant background. The teachers had worked on average for 15.7 years in their profession (SD = 12 years).

2.2 Procedure

2.2.1 Scale development procedure

As a starting point, we took the German version of the Handling Bullying Questionnaire (HBQ, Burger et al., 2015) with its five dimensions (work with bully, work with victim, enlist adults, ignore, authority-based interventions). We removed items that captured bullying-specific interventions (e.g., “I would help the bully achieve greater self-esteem so that he or she would no longer want to bully anyone.”), and changed the wording of the items by replacing “bullying” or “bully” with “hate speech” or “person who carried out hate speech,” respectively. We then conducted 46 qualitative interviews with students and teachers, asking them: “In what ways do you know to respond to hate speech?” Prior to asking this question, a definition of hate speech was presented to the participants to increase the validity of their responses (Krause et al., 2023): “We define hate speech as the public, deliberate and derogatory insulting of a social group. Sometimes an individual person is also insulted because they belong to a certain group. Hate speech can happen at school, but it can also take place online.” Some of the reported new hate-speech-specific intervention strategies could easily be assigned to the existing dimensions of the HBQ (e.g., “I would work together with external partners [for instance, the anti-discrimination department, organizations etc.]”). The following three strategies could not be assigned and were used to form the new dimension “Teaching-oriented strategies”: “During class, I would educate students, challenge prejudices, and refute misinformation,” “During class, I would teach students about the consequences of the discrimination and exclusion of minorities,” and “During class, I would lead a discussion on the fine line between stirring up hatred and free speech.” Eight HBQ bullying-related items were deleted, two items were merged into one item, and seven new items were added. This led to the first version of the Hate-Speech Interventions Scale for Teachers (HIST) with 20 items and six subscales (see Table 1), which was used for the survey with the present sample. As in the HBQ, a hypothetical scenario (here: a hate speech scenario) was used, to which the questions on the intervention strategies refer.

2.2.2 Data collection and sampling procedure

The research was approved by the University of Potsdam Ethics Committee (UP65/2018). Teachers participated in this research on an anonymous and voluntary basis. In Germany, an acquisition pool of sample schools was created using a stratified and randomized probability-proportional-to-size scheme (Yates & Grundy, 1953). The stratification characteristics were federal state (Berlin and Brandenburg) and type of school (grammar secondary school [Gymnasium], mixed secondary schools with elements of both academic and non-academic schools [Gesamtschule or Integrierte Sekundarschule], non-academic-track secondary school [Oberschule] or school for special education [Förderschule]). In Switzerland, the acquisition pool of sample schools was designed using a contrastive sampling scheme that was based on high/low migrant background and on rural/urban geography. From the acquisition pool, a total of 100 schools (Germany: n = 76; Switzerland: n = 24) received phone calls and emails to inform them that their schools had been randomly selected to participate in this research. Due to the coronavirus pandemic, a significant proportion of German schools declined to participate (e.g., because of high regional infection rates or resource constraints). In total, 40 schools (Germany: n = 18; Switzerland: n = 22) agreed to participate (participation rate at the school level: 40% in total; 24% for Germany; 92% for Switzerland). In total, 1,621 teachers from these schools (Germany: n = 1,149; Switzerland: n = 472) were invited to participate. In Germany, teachers completed a paper-and-pencil survey between October 2020 and June 2021. In Switzerland, data were gathered between December 2020 and April 2021; teachers received an access code to an online questionnaire via email, which they subsequently completed.

The final sample for the data analysis consisted of 486 teachers (participation rate at the teacher level: 30% in total; 22% for Germany, 49% for Switzerland). In Germany, the teachers (n = 256) were distributed across different school types (grammar secondary school [Gymnasium]: n = 88, 34.4%; mixed secondary schools with elements of both academic and non-academic schools [Gesamtschule or Integrierte Sekundarschule]: n = 114, 44.5%; non-academic secondary school [Oberschule]: n = 40, 15.6%; school for special education [Förderschule]: n = 14, 5.5%). Swiss teachers (n = 230) were distributed among the country-specific school types (separated model: n = 101, 43.9%; cooperative model: n = 58, 25.2%; integrated model: n = 69, 30%; missing: n = 2, 0.9%). Participation at the school level ranged from a minimum of three teachers per school to a maximum of 29 teachers per school.

2.3 Measures

2.3.1 Teachers’ interventions in hate speech

The following introduction was given to the participants: “Please imagine the following situation: In your classroom, students make publicly offensive statements about the skin color, origins, sexual orientation, religious affiliation, or gender of a group of people (e.g., people born in other countries, Muslims, homosexuals, women, trans people).” The teachers were then asked the question: “Have you ever experienced a situation like this?” (response options: “yes” / “no”). Then the teachers were asked: “What would you do or what have you done in this situation?” They were then invited to respond to 20 items (see Table 1) using a five-point response scale: 0 = “I would definitely not do that (0% of cases)”, 1 = “I probably would not do that (25% of cases)”, 2 = “I would do that every now and then (50% of cases)”, 3 = “I probably would do that (75% of cases)”, 4 = “I would definitely do that (100% of cases)”.

2.3.2 Hate-speech-related self-efficacy

We assessed a task-specific form of self-efficacy related to the intervention in hate speech, understanding it as the teacher’s individual expectation that they were capable of handling these incidents successfully. For this, we adapted the bullying-specific self-efficacy scale (Fischer & Bilz, 2019) by replacing “bullying” with “hate speech,” removing one bullying-specific item, and adding four new items (“I am confident that I can recognize the boundary between free speech and hate speech,” “I know how to support students who have been the target of hate speech,” “If there is hate speech at my school, I know who I need to turn to for help,” “When it comes to hate speech from students, I feel confident that I can effectively counter it and take a stand.”) These additions were based on the idea that hate speech, more so than bullying, requires teachers to deal more intensively with the content of the derogatory comments. Answers were given on a four-point Likert scale: 0 = “not at all true”, 1 = “hardly true”, 2 = “moderately true”, 3 = “completely true”. In the CFA of a one-factor solution, the RMSEA and the SRMR suggested an adequate fit, whereas the Chi-square test and the CFI spoke against an adequate model fit (χ2 [20] = 77.88, p <.001; RMSEA = 0.078 [90% confidence interval CI: 0.060; 0.097; p (RMSEA ≤ 0.05) = 0.006]; CFI = 0.937; SRMR = 0.042). A model with two factors (factor 1: recognizing hate speech, 4 items from the bullying-specific self-efficacy scale; factor 2: intervening in hate speech, 4 newly developed items) was tested but did not reach a better fit and showed a slightly increased RMSEA (χ2 [19] = 76.97, p <.001; RMSEA = 0.080 [90% confidence interval CI: 0.062; 0.099; p (RMSEA ≤ 0.05) = 0.004]; CFI = 0.937; SRMR = 0.042). As two common indices suggested an adequate fit and the two-factor model offered no improvement, the one-factor solution was chosen. Internal consistency (composite reliability) of the 8-item scale in the present sample was CR = 0.808.

2.3.3 Control variables

Teachers were asked for their gender (male, female, gender diverse) and their professional experience in schools in number of years (two groups based on median split: 0 to 11 years vs. 12 to 45 years). The country of residence was assigned based on the country in which the data was collected.

2.4 Data analyses

2.4.1 Power analysis

An a priori power analysis conducted with G*Power (Faul et al., 2007) indicated that, to detect small to medium correlational effect sizes (ρ = 0.20), the present study needed a sample of at least 193 participants (α = 0.05, power = 0.80). Taking the hierarchical structure of the sample (teachers in schools) and the non-response rate into consideration, the resulting minimum sample size was N = 296 teachers (Teerenstra et al., 2010). Accordingly, the present sample size (N = 486) was sufficient to investigate the hypotheses.
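The sample size for the correlational effect can be approximated with the Fisher z transformation. The following Python sketch is an illustration only and not the G*Power procedure itself: G*Power works with the exact distribution of the correlation coefficient, so the approximation can differ from the reported 193 by about one participant. The function name is hypothetical, a two-sided test is assumed, and the adjustment for clustering and non-response (Teerenstra et al., 2010) is not covered.

```python
import math
from statistics import NormalDist


def n_for_correlation(rho, alpha=0.05, power=0.80):
    """Approximate N needed to detect a correlation rho (two-sided test).

    Uses the Fisher z approximation: N = ((z_alpha + z_power) / atanh(rho))^2 + 3.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_power = NormalDist().inv_cdf(power)          # quantile for desired power
    fisher_z = math.atanh(rho)                     # Fisher z of the effect size
    return math.ceil(((z_alpha + z_power) / fisher_z) ** 2 + 3)


# Effect size, alpha, and power as stated in the text
n_required = n_for_correlation(0.20, alpha=0.05, power=0.80)
print(n_required)
```

The result should land close to the value reported from G*Power; smaller effect sizes or higher power increase the required N.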

2.4.2 Exploratory and confirmatory factor analyses

After an initial descriptive analysis of the items, the sample was randomly divided into two equally sized subsamples. The first subsample was used to examine the factor structure with an exploratory factor analysis (EFA; maximum likelihood estimation, Promax rotation) using the statistics program SPSS 28. To prevent the occurrence of random factors, a parallel analysis (O'Connor, 2000) was conducted for a randomly generated data matrix of the same size. The result of the EFA then underwent confirmatory testing using the data from Subsample 2. The Chi-square test and RMSEA, CFI, and SRMR were used as fit indices (interpretation according to Schermelleh-Engel et al., 2003: χ2/df between 0 and 2 for a good fit, between 2 and 3 for an acceptable fit; RMSEA ≤ 0.05 for a good fit, 0.05 to 0.08 for an adequate fit, 0.08 to 0.10 for a mediocre fit, > 0.10 not acceptable; left boundary of the CI < 0.05 for a close fit; CFI: >0.97 for a good fit, > 0.95 for an acceptable fit; SRMR: <0.05 for a good fit, < 0.10 for an acceptable fit). Confirmatory factor analysis (CFA) was performed using Mplus 8.7 with robust maximum likelihood estimation (MLR) and type = complex to consider clustering.
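To make the interpretation rules concrete, the following Python sketch applies the cut-offs listed above to a set of fit indices. It is an illustrative helper, not part of the original analysis: the function name and the label wording are assumptions, the RMSEA band is read as 0.05 to 0.08 for an adequate fit, and the confidence-interval criterion for close fit is left out. The example values are those reported for the one-factor self-efficacy model in Section 2.3.2.

```python
def classify_fit(chi2, df, rmsea, cfi, srmr):
    """Return a verbal label per fit index, following the cut-offs quoted above."""
    ratio = chi2 / df
    if ratio <= 2:
        ratio_label = "good"
    elif ratio <= 3:
        ratio_label = "acceptable"
    else:
        ratio_label = "not acceptable"

    if rmsea <= 0.05:
        rmsea_label = "good"
    elif rmsea <= 0.08:
        rmsea_label = "adequate"
    elif rmsea <= 0.10:
        rmsea_label = "mediocre"
    else:
        rmsea_label = "not acceptable"

    if cfi > 0.97:
        cfi_label = "good"
    elif cfi > 0.95:
        cfi_label = "acceptable"
    else:
        cfi_label = "not acceptable"

    if srmr < 0.05:
        srmr_label = "good"
    elif srmr < 0.10:
        srmr_label = "acceptable"
    else:
        srmr_label = "not acceptable"

    return {
        "chi2/df": ratio_label,
        "RMSEA": rmsea_label,
        "CFI": cfi_label,
        "SRMR": srmr_label,
    }


# One-factor self-efficacy model as reported in Section 2.3.2
labels = classify_fit(chi2=77.88, df=20, rmsea=0.078, cfi=0.937, srmr=0.042)
print(labels)
```

For these values, RMSEA and SRMR fall in an acceptable band while χ2/df and CFI do not, which mirrors the mixed picture described for that scale.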

2.4.3 Measurement invariance testing

Measurement invariance of the scale was tested for gender, professional experience, and country in which the data was collected. To do so, a series of multi-group confirmatory factor analyses was calculated in the total sample in Mplus version 8.7 (configural [free factor loadings and item intercepts, factor variances and means fixed for model specification], metric [equal factor loadings, free factor variances in second group], scalar [equal item intercepts, free factor means], strict [equal item residual variances, free factor means]). Cases with missing data were excluded from the analyses (listwise deletion). The estimator used was a maximum likelihood estimator with robust standard errors, which is suitable for violations of the normal distribution assumption at the item level (MLMV was chosen; Maydeu-Olivares, 2017). Chi-square tests between the models (difference test) as well as RMSEA, CFI and SRMR were used to check the model fit. The indices were assessed according to Schermelleh-Engel et al. (2003) and Chen (2007; ∆CFI < −0.010, ∆RMSEA > 0.015, ∆SRMR > 0.015). Unlike in the CFA, it was not possible to consider the clustering of data at the school level in the measurement invariance test (the results would have been unreliable because some schools contributed only small samples).
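The change-in-fit criteria can be expressed as a simple comparison between two nested models. The Python sketch below is a hypothetical helper that only flags which of the quoted thresholds are exceeded when moving from a less to a more constrained model; the fit values are invented for illustration, and how the flags are combined with the chi-square difference test is left to the analyst.

```python
def invariance_flags(less_constrained, more_constrained):
    """Flag change-in-fit criteria that signal a deterioration in fit.

    Both arguments are dicts with the keys "CFI", "RMSEA", and "SRMR".
    A value of True means the quoted threshold was exceeded.
    """
    d_cfi = more_constrained["CFI"] - less_constrained["CFI"]
    d_rmsea = more_constrained["RMSEA"] - less_constrained["RMSEA"]
    d_srmr = more_constrained["SRMR"] - less_constrained["SRMR"]
    return {
        "CFI": d_cfi < -0.010,     # CFI dropped by more than 0.010
        "RMSEA": d_rmsea > 0.015,  # RMSEA rose by more than 0.015
        "SRMR": d_srmr > 0.015,    # SRMR rose by more than 0.015
    }


# Hypothetical fit values, not taken from Table 2
configural = {"CFI": 0.960, "RMSEA": 0.050, "SRMR": 0.050}
metric = {"CFI": 0.955, "RMSEA": 0.052, "SRMR": 0.055}

flags = invariance_flags(configural, metric)
print(flags)
```

With these invented values, none of the three criteria is exceeded, so the more constrained model would not be rejected on the basis of the change in fit.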

3 Results

3.1 Item analysis, EFA, and CFA

The mean values and standard deviations of the 20 items can be found in Table 1. Item analysis revealed extremely low item difficulties for items 11, 12, 13, and 14, and extremely high item difficulties for items 15 and 16. These item difficulties fall below the critical threshold of 0.8 or above the critical threshold of 3.2, respectively (representing values of 0.2 and 0.8, respectively, when the possible range of values is standardized to 0 to 1; Lienert & Raatz, 1994). These items are derived from the two HBQ scales ignore and authority-based interventions, which, in the original version, already exhibited problems in terms of item difficulty and internal consistency (Burger et al., 2015). For this reason, the seven items of these two dimensions were not included in the following analyses.
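The screening rule can be illustrated with a short Python sketch: each item mean on the 0 to 4 response scale is divided by the scale maximum, and values below 0.2 or above 0.8 are flagged. The function name and the item means are hypothetical and serve only to show the arithmetic; the actual means are given in Table 1.

```python
def flag_item_difficulty(item_means, scale_max=4, lower=0.2, upper=0.8):
    """Standardize item means to the 0-1 range and flag extreme values."""
    flags = {}
    for item, mean in item_means.items():
        difficulty = mean / scale_max  # e.g., 0.8 / 4 = 0.2 and 3.2 / 4 = 0.8
        if difficulty < lower:
            flags[item] = "below lower threshold"
        elif difficulty > upper:
            flags[item] = "above upper threshold"
        else:
            flags[item] = "within range"
    return flags


# Hypothetical item means on the 0-4 response scale
example_means = {"item_a": 0.45, "item_b": 2.60, "item_c": 3.55}
result = flag_item_difficulty(example_means)
print(result)
```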

To explore the factor structure of the remaining 13 items by means of EFA, subsample 1 (n = 243) was used. The parallel analysis indicated the adequacy of a three-factor solution, because the eigenvalue of the fourth factor was smaller than the corresponding criterion value of the comparison random matrix. The scree plot and the Kaiser criterion (eigenvalue > 1) also suggested a three-factor solution. Table 1 lists the factor loadings of the rotated pattern matrix of this three-factor-solution (Maximum likelihood, Promax rotation). The first factor (“Working with those directly involved”) explained 30.7% of the variance and received high loadings from items 1 to 5, which originally stem from the two HBQ scales work with the bully and work with the victim. The new items 18 to 20 capturing “Teaching-oriented strategies” had high loadings on the second factor (explained variance: 7.6%), and the adapted items 7 to 10 from the HBQ scale enlist adults showed high loadings on the third factor (“Collaboration with others,” explained variance: 8.0%). Because Item 6 had only a small loading on the first factor and, in addition, had a very low communality of 0.15, it was not considered for the following analyses.

Table 1 Results of the item analysis and the EFA of the Hate-Speech Interventions Scale for Teachers (HIST)

The result of the EFA was tested in the second subsample (n = 243) using CFA. The 3-factor solution obtained in the EFA showed a good global model fit in the CFA (χ2 [51] = 77.16, p =.011; RMSEA = 0.048 [90% CI: 0.024; 0.068; p (RMSEA ≤ 0.05) = 0.549]; CFI = 0.970; SRMR = 0.047). Figure 1 shows the standardized factor loadings, intercepts, residual variances, and the correlations of the latent factors. The matrix of the standardized covariance residuals (z-scores) is presented in the Online Resource. The matrix points to single problems concerning item 20 (which loads on factor 2, Teaching-oriented strategies) and item 7 and item 10 (which load on factor 3, Collaboration with others).

Fig. 1 Results of the Confirmatory Factor Analysis of the Hate Speech Interventions Scale for Teachers in Subsample 2 (N = 226) (standardized values)

Composite reliabilities of the three factors were above 0.7, indicating a reliable measure (factor 1, Working with those directly involved: CR = 0.815; factor 2, Teaching-oriented strategies: CR = 0.829; factor 3, Collaboration with others: CR = 0.767). The Average Variance Extracted (AVE) was acceptable for factor 2 (Teaching-oriented strategies: AVE = 0.619) but, for the other two factors, slightly below the level of 0.5 that is usually considered necessary for convergent validity (factor 1, Working with those directly involved: AVE = 0.477; factor 3, Collaboration with others: AVE = 0.454). However, as the AVEs for factor 1 and factor 3 are close to 0.5 and the CRs for both factors are above the threshold of 0.7, we consider the measure generally reliable and valid. This is especially the case as the AVE is considered a more conservative measure than CR (e.g., Fornell & Larcker, 1981) and as the factor structure had previously been examined in an EFA.
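For readers who want to retrace how CR and AVE relate to the factor loadings, the following Python sketch implements the usual Fornell and Larcker (1981) formulas for standardized loadings. It assumes uncorrelated residuals and a residual variance of 1 − λ² per item; the function names and the loadings are hypothetical and are not the values shown in Figure 1.

```python
def composite_reliability(loadings):
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of residual variances)."""
    sum_loadings_sq = sum(loadings) ** 2
    residual_var = sum(1 - lam ** 2 for lam in loadings)  # standardized solution
    return sum_loadings_sq / (sum_loadings_sq + residual_var)


def average_variance_extracted(loadings):
    """AVE = mean of the squared standardized loadings."""
    return sum(lam ** 2 for lam in loadings) / len(loadings)


# Hypothetical standardized loadings for a five-item factor
example_loadings = [0.70, 0.65, 0.72, 0.68, 0.70]

cr = composite_reliability(example_loadings)
ave = average_variance_extracted(example_loadings)
print(round(cr, 3), round(ave, 3))
```

The example shows how a factor can exceed the CR threshold of 0.7 while its AVE stays just below 0.5, because CR grows with the number of items whereas AVE depends only on the average squared loading.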

3.2 Measurement invariance testing

The results of the measurement invariance test are presented in Table 2. The results showed partial metric, partial scalar, and partial strict measurement invariance with respect to gender. Each of these could be achieved when the equality restrictions were removed for one of the four items of the factor “Collaboration with others” (item 9). The model comparison between the configural and the partial metric model was with p =.054 only slightly above the established significance threshold of p =.05. However, for all other indices (RMSEA, CFI, SRMR, χ2/df) there was also no significant deterioration in the partial metric compared to the configural model (Δ ≤ 0.007) so that in combination with the reported chi-square difference test it was assumed that the corresponding higher level of measurement invariance was reached.

Regarding professional experience, complete metric measurement invariance was achieved. Partial scalar and partial strict measurement invariance were each achieved when the equality restrictions were removed for two of the five items of the factor “Working with those directly involved” (items 3 and 5).

With respect to the country of residence, complete metric but no scalar measurement invariance could be achieved. The problems were consistently related to the factor “Collaboration with others.” For this reason, the measurement invariance regarding the country of residence was additionally checked for only the factors “Working with those directly involved” and “Teaching-oriented strategies.” For these two factors, complete metric and partial scalar measurement invariance was found with respect to the country, which could be achieved in each case when the equality restrictions were removed for two items (one item of each factor; items 1 and 4). However, strict measurement invariance was not achieved. Due to the lower number of items, no further restrictions were removed to potentially achieve partial strict measurement invariance.

For all models, the fit indices indicated at least an acceptable fit. The only exception to this was the CFI value, which was often below 0.95, indicating an unacceptable fit. However, since CFI values are often smaller for maximum likelihood estimation methods (Shi & Maydeu-Olivares, 2020) and all other indices indicated at least an acceptable fit, each with different underlying methods for estimating goodness of fit (Lai & Green, 2016; Schermelleh-Engel et al., 2003), a sufficient model fit was assumed.

Overall, the results of the measurement invariance test showed that mean comparisons with respect to gender and professional experience are acceptable for all three subscales. Regarding country of residence, mean comparisons are only possible for two of the three subscales and not for “Collaboration with others.”

Table 2 Results of the measurement invariance testing of the Hate-Speech Interventions Scale for Teachers (HIST) across gender, professional experience, and country of residence

3.3 Group comparisons and correlations

The analysis of the means of the three subscales showed that teachers would most likely work with the students who were directly involved (M = 3.28, SD = 0.68), followed by using teaching-oriented strategies (M = 3.04, SD = 0.83), and collaborating with others (M = 2.28, SD = 0.79). When asked whether they had ever experienced the situation described in the hypothetical scenario, 60.5% of teachers answered yes, and 39.5% answered no. Teachers who had never experienced such a hate speech incident were significantly more likely to collaborate with others (M = 2.41, SD = 0.77) than teachers with this experience (M = 2.20, SD = 0.79; t[470] = 2.86, p <.01). There were no significant differences between the two groups for “Working with those directly involved” and “Teaching-oriented strategies”.

The subgroup comparisons regarding gender, professional experience, and country of residence are presented in Table 3. The analyses revealed that, when confronted with hate speech incidents, female teachers were significantly more likely to collaborate with others (M = 2.40, SD = 0.74) than male teachers (M = 2.12, SD = 0.83), and professionally experienced teachers were more likely to work with the students directly involved (M = 3.36, SD = 0.66) than less professionally experienced teachers (M = 3.20, SD = 0.70). The effect sizes of these group differences were small. All other comparisons were not significant.

The overall mean for hate-speech-related self-efficacy was M = 2.17 (SD = 0.43), indicating a mean response between the categories “moderately true” and “completely true.” Significantly higher hate-speech-related self-efficacy was reported by male (M = 2.22, SD = 0.38) compared to female teachers (M = 2.13, SD = 0.46), more professionally experienced (M = 2.22, SD = 0.43) compared to less professionally experienced teachers (M = 2.12, SD = 0.42), and Swiss (M = 2.24, SD = 0.41) compared to German teachers (M = 2.10, SD = 0.44). The effect sizes of these group differences were small (Table 3).

Table 3 Univariate comparisons of the three HIST scales and hate-speech-related self-efficacy regarding gender, professional experience, and country of residence

The intercorrelations of the HIST scales were in the medium range for the correlation between “Working with those directly involved” and “Collaboration” (r =.43, p <.001) and for “Teaching-oriented strategies” and “Collaboration” (r =.44, p <.001); only the correlation between “Working with those directly involved” and “Teaching-oriented strategies” was in the high range with r =.56 (p <.001).

Multiple linear regression analyses were performed to examine the associations between hate-speech-related self-efficacy and the HIST subscales. Gender, professional experience, and country of residence were included as control variables (Table 4). The results revealed that all three subscales were significantly and positively associated with hate-speech-related self-efficacy. The associations with the subscales “Working with those directly involved” (β = 0.36, p <.001) and “Teaching-oriented strategies” (β = 0.34, p <.001) were in the medium range, whereas the association with the subscale “Collaboration” (β = 0.13, p <.01) was in the low range.

Table 4 Regression coefficients of hate-speech-related self-efficacy and control variables on HIST scales

4 Discussion

The starting point for this study was, on the one hand, the growing significance of hate speech in schools and, on the other hand, the lack of empirical findings on teachers’ intervention strategies. Against this background, we developed a new survey instrument, which we tested psychometrically. In addition, we analyzed the characteristics of three different intervention strategies, taking into account differences in gender, professional experience, and country of residence. Finally, we examined the correlations between the three intervention strategies and hate-speech-related self-efficacy.

4.1 Factor structure of the Hate-Speech Interventions Scale for Teachers (HIST)

This study demonstrates that teachers’ intervention strategies in response to hate speech incidents can be validly and reliably captured by HIST with three dimensions: “Working with those directly involved,” “Teaching-oriented strategies,” and “Collaboration with others.” The first dimension combines the two adapted scales “Work with bully/victim” from the HBQ, which is used in bullying research. Since, by definition, hate speech, unlike bullying, does not necessarily involve identifiable victims who are present, it is not possible to empirically separate intervention strategies based on whether they target perpetrators or victims. The identification of victim-related strategies in Strohmeier and Gradinger’s (2021) study of teacher interventions for hate postings could be related to the hypothetical scenario used in that study, which describes an incident in which an individual student is targeted. This is thus closer to the phenomenon of bullying than to hate speech, in which, as mentioned, there is not always an obvious binary of perpetrator and victim.

The second dimension captures teaching-oriented strategies in which teachers challenge prejudice or address the boundary between free speech and hate speech. These strategies appear to be specific to dealing with hate speech and have not previously appeared in research on bullying (Bauman et al., 2008; Burger et al., 2015), but have been mentioned in approaches for dealing with racism and prejudice in schools (Lynch et al., 2017). In contrast, the third dimension, which addresses collaboration with others (parents, external partners, colleagues), is used in bullying research and is also described by Strohmeier and Gradinger (2021) in their study on hate postings.

Authority-based strategies and ignoring the problem, which are strategies known from bullying research, cannot be confirmed as distinguishable dimensions for interventions in incidents of hate speech. The HBQ has already shown problems with internal consistency for these two scales (Burger et al., 2015). In this study, it is not possible to reliably capture the strategy of ignoring the problem. Although agreement rates on these items are encouragingly low in our study, capturing the tendencies of individual teachers to ignore hate speech in schools can be of great practical value. For this purpose, we suggest additionally using item no. 12 (“I would not take the incident very seriously”) separately from the three dimensions of HIST, as the agreement rate was somewhat higher for this item.

In contrast, almost all teachers agreed with the statements that described authority-based interventions. The issue here for items 15 and 16, in particular, was probably that the items did not succeed in capturing the authoritarian nature of these interventions (e.g., punishing, not explaining sanctions) that we had intended them to incorporate. These items focus more on stopping hate speech quickly, meaning that many teachers may, presumably, have agreed with these statements, without it necessarily revealing an authoritarian tendency in their behaviors. For further development of the instrument with regard to this aspect, it could be helpful to take established instruments of parenting-style research as a starting point.

The testing of the measurement invariance shows that the three HIST subscales can be used to compare the mean values of female and male teachers, teachers with little vs. a lot of professional experience, and teachers from Germany and Switzerland (though, in the case of the latter, not for the subscale “Collaborating with others”).

4.2 Teachers’ self-reported intervention strategies in hate-speech incidents

If they imagine a case of hate speech in their classroom, most teachers would direct their interventions toward the students directly involved. Teaching-oriented strategies ranked second, and collaboration with parents, colleagues, or external partners ranked last. These findings are not consistent with the few previous findings on teacher responses to hate postings, where teachers were most likely to involve other colleagues, followed by victim-oriented strategies, involving parents, authority-based sanctions, and lastly seeking help from external professionals (Strohmeier & Gradinger, 2021). Only two dimensions (“Working with those directly involved” and “Collaboration with others”) can be compared to research on teacher interventions for bullying; when compared to teachers’ self-reports in these areas, we find poor agreement (Bauman et al., 2008; Burger et al., 2015) but greater agreement when students report on their teachers’ interventions (Wachs, Bilz et al., 2019). The lower preference for involving others and the preference for teaching-oriented strategies found in this study may indicate that teachers attribute different causal factors to hate speech compared to bullying. While bullying may be understood more as a social phenomenon that can only be countered through joint efforts, the causes of hate speech are more likely to be seen in personal aspects like knowledge deficits and prejudices. Also, because hate speech is more closely linked to social inequalities, teachers may be more inclined to address these incidents in their teaching. However, because teachers’ causal attributions for hate speech and bullying were not explored in this study, these considerations remain speculative.

To qualify the findings on group differences by gender, professional experience, and country of residence, it is necessary to refer to findings on related phenomena, since no findings are available on hate speech. The finding that female teachers are more likely to involve others in their interventions aligns with Bauman et al.’s (2008) findings on U.S. teachers’ interventions for bullying. Burger et al. (2015), on the other hand, could not confirm the same with Austrian teachers. There were also no gender differences in how teachers involve others when dealing with hate postings (Strohmeier & Gradinger, 2021). The role of professional experience has only been examined in the context of bullying interventions. As was the case in this study, Burger et al. (2015) reported a stronger focus on interventions with those directly involved among more professionally experienced Austrian teachers, whereas these differences did not exist among U.S. teachers (Bauman et al., 2008). However, it should be noted that the differences found in the present study are relatively small in magnitude.

The correlation analyses show a close relationship between the two HIST strategies “Working with those directly involved” and “Teaching-oriented strategies.” In contrast, “Collaborating with others” seems to be less strongly correlated with these strategies. One possible explanation for this could be that collaborating with colleagues, parents, and external partners on hate-speech incidents requires more specific competencies that not all teachers possess. This is consistent with the finding that these strategies are used comparatively seldom.

4.3 Associations with hate-speech-related self-efficacy

The hypothesis that teachers would have comparatively low self-efficacy when it comes to dealing with hate speech cannot be confirmed. Teachers feel confident in recognizing and countering hate speech and supporting those affected. This is particularly true for male teachers, for teachers with more professional experience, and for Swiss teachers compared with German teachers. One reason could be that, although dealing with hate speech is not yet a prominent part of teacher training, teachers may experience it in practice (Kansok-Dusche et al., 2023) and thus develop confidence in their competencies in dealing with it. Another reason for the high level of self-efficacy could be that teachers can identify links between the content of their professional training and the competencies needed to deal with hate speech. For example, the standards for teacher training that are binding for Germany list some relevant competencies for prospective teachers (e.g., “Teachers teach values and norms, an attitude of appreciation and recognition of diversity, and support self-determined, reflected judgment and action by students,” KMK, 2019, p. 10).

As expected, teachers with higher self-efficacy expectations use all intervention strategies for hate speech more often than teachers with lower self-efficacy. This association is stronger for the two strategies “Working with those directly involved” and “Teaching-oriented strategies,” and only very weak for the strategy “Collaboration with others.” The first two strategies may be more likely to be composed of practices that teachers use in other contexts, while more specific skills are needed to collaborate with others on hate speech incidents (especially when external partners are involved). As such, confidence in one’s own competencies in this area may not be sufficient. In the area of bullying, it is particularly the collaboration between colleagues at all levels that is an important part of prevention, for example in the whole-school approach of Dan Olweus (Olweus & Limber, 2010). Also, the link between teacher interventions and self-efficacy has already been demonstrated in many studies on bullying (Bradshaw et al., 2007; Fischer & Bilz, 2019; Williford & Depaolis, 2016). Wachs et al. (2023) also confirm the importance of self-efficacy for students’ use of counter-speech in hate-speech incidents in school, stressing the importance of further investigation into hate-speech-related self-efficacy, both in students and teachers.

4.4 Limitations, strengths, and further directions

Despite the new findings in the current study, several limitations should be taken into account. Firstly, the data used here are based on teachers’ self-reports and thus may be susceptible to self-referential bias. Future studies should also consider student reports of their teachers’ intervention behaviors, as is already common in bullying research (Bilz & Fischer, 2020; Wachs, Bilz et al., 2019). Secondly, teachers’ intervention behavior is assessed using a hypothetical scenario. Although this scenario encompasses various forms of hate speech, it may not represent the full range of hate speech incidents that teachers may encounter in practice. Given that 60% of teachers stated that they had already experienced such a situation, many are likely to have based their responses on actual experiences with hate speech. The remaining 40%, however, could only express behavioral intentions. Even though our analyses point to only small differences between the two groups, this must be considered a methodological limitation. Studies using teacher reports of real hate-speech incidents could potentially come to different conclusions and would also be able to consider more situational and contextual determinants of teacher interventions. Another limitation is the low response rate at the school level and the teacher level, especially in the German subsample. Therefore, further caution is needed when generalizing the study results. In addition, using a four-point response scale when assessing hate-speech-related self-efficacy could be associated with lower psychometric precision compared to a five- or six-point response scale (Simms et al., 2019). The items and item structure of HIST have been thoroughly analyzed in exploratory and confirmatory factor analyses based on two sub-samples. The standardized covariance residuals point to problems with three items (items 7, 10, 20). In addition, the AVE values for factors 1 and 3 are too low. As most standardized covariance residuals and the CR for all three factors point to a reliable and valid measure, no further changes have been made to the item structure that had been identified in the EFA. However, the described problems suggest that the measure could be improved by reformulating particular items and adding further items. Finally, the fact that strategies that are authority-based or involve ignoring what is going on cannot be confirmed as part of teacher responses to hate speech in this study may have been due to the wording of the corresponding items. The challenge for future studies would be to find better ways of describing these issues that do not result in such extreme item difficulties. The examination of contextual variables could also be a worthwhile objective of future studies in this area. The differences found between German and Swiss teachers point in this direction. It can be assumed that school- and class-specific factors are also related to how teachers deal with hate speech. In addition to variable-oriented data analysis, person-oriented analysis strategies such as latent profile analysis could provide interesting insights into the intervention behavior of specific groups of teachers in future studies. A strength of this study is the rigorous psychometric testing of a new instrument in two different countries and the demonstration that the HIST can be used in a reliable and valid way to assess teacher interventions for hate speech in schools. Nevertheless, it would be desirable to establish further evidence of validity.

4.5 Practical implications

The finding that most teachers would use a wide range of intervention strategies in response to hate speech and report high confidence in their own abilities to intervene in hate speech is an encouraging result and shows that teachers and schools should play an important role in reducing hate speech. Teachers’ real-life experiences in using strategies, such as transferring knowledge, challenging prejudices, and countering misinformation, could be used to develop new evidence-based prevention programs. The relationship found between the use of these strategies and teacher self-efficacy could also motivate efforts to increase teachers’ confidence in their own competencies. Bullying research revealed that teacher self-efficacy can be increased through training (Byers et al., 2011; Crooks et al., 2017; Newman-Carlson & Horne, 2004). The comparatively low use of collaborative strategies and the weaker association with self-efficacy may indicate that teachers need more support in this specific area. The setup of support networks with external partners or established forms of staff cooperation within the school could be helpful in this regard.

4.6 Conclusions

This study examines teachers’ intervention strategies in response to hate speech in school. We present and test HIST as a newly developed survey instrument that can be used to reliably and accurately measure intervention strategies along three dimensions. Data from teachers in two different countries show that teachers focus their interventions primarily on those directly involved in hate speech and that they frequently use teaching-oriented strategies. In comparison, external partners or colleagues are rarely involved in interventions. These results, and the association found with self-efficacy, indicate that schools should play an important role in moderating hate speech and that increasing teachers’ confidence in their own ability to address this phenomenon could be an important element of teacher training.