1 Introduction

The basic tenet of inclusive education is to promote the individual academic, social, and personal development of every student to the best of their abilities within the same classroom. Social development can be fostered by allowing students with disabilities to learn from and make friends with other students in the regular school setting. Social participation in a school context means that students interact with peers, are accepted by peers, have friendships, and feel socially integrated (Koster et al., 2009). However, research consistently shows that this is one of the main challenges within inclusive education. Mainstreaming alone does not necessarily lead to social participation: students with disabilities often have fewer interactions with their peers, are rejected more often, have fewer friends, and tend to feel less socially integrated than students without disabilities (Bossaert et al., 2015; Koster et al., 2010; Petry, 2018; Pijl & Frostad, 2010; Schwab, 2015). This is not solely due to individual factors (e.g. the weaker social skills of students with social-emotional disabilities). Contextual social factors play an equally important role, such as attitudes and behaviours of peers and teachers (Mikami & Normand, 2015). Negative peer attitudes toward students with disabilities form a significant obstacle (de Boer et al., 2012). In their interviews, Nowicki et al. (2014) identified perceived differences as the overarching theme in primary school students’ reasoning on why peers with disabilities were more likely to be socially excluded. Differences in ability, behaviours and physical appearance were frequently mentioned, with many statements displaying negative stereotypes and prejudice. Therefore, to improve the social participation of students with disabilities, intervention studies often aim to strengthen positive peer attitudes. 
The present study is the first to examine the effects of a novel, low-threshold, curriculum-based eight-week intervention programme designed to foster the appreciation of human diversity. Furthermore, it is the first intervention study to examine impacts not only on explicit but also on implicit student attitudes toward peers with disabilities.

1.1 Interventions to improve attitudes toward peers with disabilities

Based on the contact hypothesis (Allport, 1954), one promising approach to improving intergroup attitudes is the promotion of positive interactions with peers with disabilities. However, disability interventions can also foster positive attitudes in other ways. Interventions are often differentiated according to whether they use direct contact with peers or people with disabilities, indirect contact (e.g. using multi-media interactions), or provide information or conduct classroom activities (e.g. Armstrong et al., 2017; Chae et al., 2019; Lindsay & Edwards, 2013; McManus et al., 2021). There is good evidence for the effectiveness of interventions that promote contact and cooperation (such as joint activities in sports, theatre, or other projects) and cooperative learning in inclusive classrooms (Armstrong et al., 2017; Chae et al., 2019; McManus et al., 2021). However, non-contact interventions such as curriculum- or multi-media-based interventions have also been shown to be effective (Chae et al., 2019; Lindsay & Edwards, 2013). Indeed, combined contact- and information-based approaches—or multicomponent approaches in general—have stronger effects on attitudes than single-component or contact-based approaches alone (Lindsay & Edwards, 2013; McManus et al., 2021).

Despite this seemingly solid evidence (with large effect sizes in some meta-analyses, see Chae et al., 2019), important questions remain regarding the nature of the changes in children’s attitudes following interventions, how long changes in attitudes last and their relationship with actual behaviour (Flórez García et al., 2009; McManus et al., 2021).

1.2 Explicit and implicit attitudes

The success of many intervention studies is assessed by asking students about their beliefs, feelings or intentions to interact with peers with disabilities. Questionnaires often target different components of attitudes (i.e. cognitive, affective or behavioural aspects; Eagly & Chaiken, 1993), but before reporting on these, students need to be able to accurately access their feelings or predict their own behaviour. However, they may not be aware of their own biases or discriminatory behaviours, and explicit self-report measures only capture the conscious aspects of their attitudes. Additionally, explicit measurements of attitudes may suffer from social desirability bias (Greenwald & Banaji, 1995). Studies have shown that explicit and implicit attitudes diverge with age when dealing with socially sensitive subjects and prejudice and that explicit and implicit attitudes independently predict behaviour among children and adolescents (Phipps et al., 2019). Children may learn to withhold expressions of socially undesirable attitudes as they grow older (e.g. Rutland et al., 2005). Implicit assessments of attitudes, such as the Implicit Association Test (IAT; Greenwald et al., 1998), try to overcome this bias by measuring more associative and automatic evaluations of attitude objects (Gawronski & Bodenhausen, 2006). They mainly reflect the affective component of attitudes because the association strength between attitude objects and positive or negative valence categories is measured (Fazio, 2007). According to the associative-propositional evaluation (APE) model (Gawronski & Bodenhausen, 2006), such automatic affective reactions depend on processes of pattern activation in associative memory (e.g., an image of a person with Down syndrome might activate automatic associations such as “childish” or “incapable”), which precede explicit attitude judgements based on propositional, inferential processes (“I like/dislike people with Down syndrome”).
Regarding such automatic associations toward disability, large-scale studies and reviews suggest strong negative implicit bias (i.e., a strong association of disability with negative valence categories) among adults, with only weak correlations with explicit attitudes (Charlesworth & Banaji, 2019; Wilson & Scior, 2014). In general, it has been demonstrated that implicit biases exist in children as young as four (e.g. Cvencek et al., 2011) and that self-reported attitudes, but not implicit attitudes, become considerably less biased with age (e.g. Baron & Banaji, 2006).

Implicit attitudes have long been thought to be relatively inflexible due to the repeated learning process of social categories. However, manifestations of implicit attitudes have been shown to be affected by personal and contextual aspects, such as motives, goals, and internal and external situational cues (Blair, 2002; Wittenbrink et al., 2001). For example, attempting to suppress a stereotype or imagining counterstereotypic events or group members has a significant effect on the subsequent display of automatic stereotypes (Blair, 2002). In contrast, changing implicit attitudes for the long term seems much more difficult, at least according to studies using brief experimental interventions (Forscher et al., 2017; Kurdi & Banaji, 2022). However, very few studies have investigated long-term changes in implicit attitudes following interventions administered for more than a single session. Extended interventions might be more successful through repeated activation of counterstereotypic information and better training in suppressing stereotypic thinking. Further, because implicit attitudes adapt to and are shaped by the social environment (Dasgupta, 2013), interventions are probably most effective when aimed at changing the environment (e.g., by changing social norms). Also, studies suggest that change in attitudes might be easier during childhood than later in development because of shorter exposure to cultural bias. For example, Neto et al. (2016) found reductions in implicit racial bias for up to two years after a long-term musical class intervention (18 sessions of 60 min of cross-cultural music education over half a year), while other studies found changes in implicit attitudes in children but not in adults (Gonzalez et al., 2021). In contrast, Žeželj et al. (2015) assessed attitudes in Serbian school classes with or without included Roma children and found better explicit, but not better implicit, attitudes toward these children in inclusive school classes.

While most research on implicit attitudes in children has been conducted in the domain of racial attitudes, there are surprisingly few studies regarding children’s implicit disability attitudes. O’Driscoll et al. (2012) studied differences in children and adolescents’ explicit and implicit attitudes toward peers with attention deficit hyperactivity disorder (ADHD) and depression. They found that explicit attitudes toward peers with ADHD were generally more negative. Implicit attitudes showed differential patterns: male adolescents gave more negative evaluations of peers with depression than younger males and adolescent females. A similar study assessing explicit and implicit attitudes toward peers with autism spectrum disorder again revealed negative explicit and implicit attitudes. Whereas explicit attitudes improved with age, the authors found no such effect on implicit attitudes (Aubé et al., 2021). These findings highlight important differences in implicit and explicit attitudes toward peers with disabilities. But to our knowledge, no previous disability-related intervention study has examined changes in implicit attitudes. As argued by McManus et al. (2021), considering implicit attitude measurements in combination with explicit attitudes may be an especially valuable means of predicting and explaining findings related to disability interventions. Therefore, the present study aimed to assess the impact of a low-threshold, curriculum-based intervention programme on students’ explicit and implicit attitudes toward peers with disabilities.

2 Methods

2.1 Sample

The study protocol was reviewed and approved by the local institutional ethics committee. A convenience sample of 24 primary school classes (3rd to 6th grade) from 11 different schools was recruited for the study, with 12 recruited to participate in the experimental group and 12 recruited as controls. All classes were regular, non-inclusive school classes, i.e., none included students with more severe physical, social-emotional or intellectual disabilities or students receiving a special education plan. Control group classes were selected to match the experimental group classes, but the two groups nonetheless differed slightly regarding their social backgrounds (see below). Students’ parents were provided information about the study beforehand, and informed consent was obtained for 440 of the 482 students (= 91.3%) to participate in the study. At the beginning of all test sessions, students were informed verbally that they were participating in an experiment, about the content of the upcoming tasks, that participation was voluntary, and that they could withdraw at any time if they did not enjoy the experiment. The respective teachers supplied books and drawing materials as alternative activities for non-participating students and for participating students who had to wait during the experiment. None of the students chose to withdraw from participation. After excluding students who did not complete both the pre- and post-intervention tests (e.g., due to sickness), the final analytical sample consisted of N = 384 students (nexp = 195 students, ncont = 191 students; mean age M = 10.6, SD = 1.22 years; 52% male).

2.2 Intervention

The intervention was based on existing teaching material, called Prinzip Vielfalt in German or the Diversity Principle in English (Meyer et al., 2015). The material comprises a printed booklet, a freely available web platform with teaching resources and a game app, which is available for free for iOS and Android. Prinzip Vielfalt aims to enhance children’s appreciation of diversity by developing four core topics: opinions (discussing the value of diversity), knowledge (identifying and overcoming stereotypes and prejudices), skills (learning to work cooperatively with all peers) and application (mastering tasks together) (Meyer et al., 2015). Accordingly, Prinzip Vielfalt is theoretically driven, using elements hypothesised to positively influence the cognitive, affective and behavioural components of attitudes (Eagly & Chaiken, 1993) and promoting notions of common humanity and inter-group similarities in feelings, needs and interests (Allport, 1954). Prinzip Vielfalt was specifically designed as self-explanatory teaching material to promote an appreciation of human diversity. To test its effectiveness, we used Prinzip Vielfalt to create an intervention with eight detailed, standardised lesson plans. The intervention is based on learning objectives formulated in the Swiss national curriculum (“Lehrplan 21”; develop a constructive, open-minded attitude toward human diversity). It uses teaching materials and instructions that teachers can easily adopt, which should increase the likelihood of implementation. During these eight lessons, students worked with and reflected upon the game app (where they learned to cooperatively solve tasks and riddles with four characters who had individual strengths and disabilities), learned about the automatic “inner images” (stereotypes) that we have of other people and how to challenge them, used role-play and imagined interactions to practise how to communicate with or include peers, and solved cooperative group tasks (see Table 1).
The intervention was conducted by class teachers and lasted eight weeks, with one 45-minute lesson per week. Before applying the intervention, teachers in the experimental group were taught about Prinzip Vielfalt’s aims, learned about the core ideas of cooperative learning and familiarised themselves with the teaching materials, lesson plans and goals of each of the eight standardised lessons. The briefing for teachers lasted four hours. Teachers in the control group were advised to teach their standard programme. They were promised the intervention material free of charge after the second (i.e., post-test) test session.

Table 1 Intervention content

2.3 Procedure

Students’ explicit and implicit attitudes toward peers with disabilities were assessed in two test sessions, each lasting 45 min. In the experimental group, these tests occurred one to three days before the intervention started and one to three days after the intervention finished. Students in the control group were tested at an interval of eight weeks but without the intervention. During the test sessions, school classes were split into halves. One half went into the computer room with two study team members to assess implicit attitudes toward peers with disabilities (Disability IAT). Students were seated individually in front of computers. One study team member gave detailed instructions about the test, and the other assisted students if necessary. The other half of the class stayed in the classroom and completed a pen-and-paper questionnaire, monitored by a third study team member, to assess explicit attitudes toward peers with disabilities. After 20 min, the two halves switched rooms to take the other test.

2.4 Measures

2.4.1 Student questionnaire: explicit attitude measurement and control variables

Students filled out a questionnaire giving information about their sex, age, the country in which they and their parents were born, and their parents’ professions. Students’ attitudes toward peers with disabilities were assessed using an adapted short version of the Chedoke–McMaster Attitudes Towards Children with Handicaps scale (CATCH). The full-length CATCH scale was originally designed to measure the different components of attitude (i.e. cognitive, affective and behavioural) but has been shown to capture mainly a one-dimensional construct, which is adequately represented by its short version (Bossaert & Petry, 2013). We used the adaptation made by Schwab (2015), which incorporates four descriptions (vignettes) of children with disabilities (physical disability, learning disability, intellectual disability and ADHD) and six statements that students rated on a five-point Likert scale ranging from strongly disagree (scored −2) to strongly agree (scored +2). An example vignette was “Alex is new in town and attends the same school as you. Alex cannot walk. Alex is in a wheelchair” (physical disability). An example statement was “I would be happy if Alex lived next to me”. A fifth vignette involving a reference child without a disability was created to assess relative attitude judgements toward children with disabilities (i.e. to calculate an explicit bias score). Each vignette and its respective statements were read aloud one by one to the students by a study team member, and the students made their judgements. The questionnaires existed in two forms (female and male student vignettes) and were gender-matched, i.e., the names in the vignettes were adjusted to match the gender of the student.

For each child, the responses to the six statements were averaged within each of the five vignettes. The internal consistency of the six statements was good for all vignettes (Cronbach’s alphas across the pre-test and post-test ranged from α = 0.89 to 0.94). To make the results on explicit and implicit attitudes more comparable (as the IAT uses a relative rather than absolute attitude measure), a relative explicit bias score was calculated (see Hofmann et al., 2005; Rae & Olson, 2017): the average score for the four disability vignettes was calculated (pre-test and post-test α = 0.83 and 0.84, respectively), and the score for the reference vignette without a disability was subtracted from this. Thus, negative values in the explicit bias score indicated an explicit preference for peers without disabilities, and positive values indicated an explicit preference for peers with disabilities.
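The scoring rule described above can be illustrated with a minimal Python sketch. The vignette keys, the function names and the ratings are hypothetical and chosen only for illustration; the sketch assumes that the six statement ratings are first averaged within each vignette.

```python
from statistics import mean

# Hypothetical keys for the four disability vignettes and the reference vignette
DISABILITY_VIGNETTES = ("physical", "learning", "intellectual", "adhd")
REFERENCE_VIGNETTE = "reference"


def vignette_score(ratings):
    """Mean of the statement ratings (each from -2 to +2) for one vignette."""
    return mean(ratings)


def explicit_bias(responses):
    """Mean score of the four disability vignettes minus the reference vignette score."""
    disability = mean(vignette_score(responses[v]) for v in DISABILITY_VIGNETTES)
    reference = vignette_score(responses[REFERENCE_VIGNETTE])
    return disability - reference


# Made-up ratings for one hypothetical student (six statements per vignette)
student = {
    "physical": [1, 1, 0, 1, 1, 0],
    "learning": [0, 1, 0, 0, 1, 0],
    "intellectual": [0, 0, 0, 1, 0, 0],
    "adhd": [0, 0, -1, 0, 0, 0],
    "reference": [2, 1, 1, 2, 1, 1],
}
score = explicit_bias(student)
```

In this made-up case the reference child is rated more favourably than the four children with disabilities, so the subtraction yields a value below zero.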

The following control variables were included in our models to adjust for potential differences between the experimental and control groups: student sex (dummy coded, 1 = male), age (in years), immigrant background (dummy coded, 1 = the student or at least one parent was born in another country) and the parents’ highest International Socio-Economic Index of occupational status (HISEI), estimated from students’ information on their parents’ professions (Ganzeboom, 2010).

2.4.2 Implicit attitude measure: the disability IAT

A Disability IAT for children was created specifically for this study. It followed the traditional IAT procedure’s rationale (Greenwald et al., 1998). The IAT assesses implicit biases by measuring the strength of associations between concepts. The strength of these associations is measured by requiring the same or different responses to the categorization of stimuli belonging to certain concepts (e.g., disability and non-disability related pictures, positive or negative words). If concepts are strongly associated and categorization requires the same response, responses are usually effortless. However, if incongruent concepts require the same response, cognitive interference occurs, leading to slower response times and potential errors (Greenwald et al., 1998). The Disability IAT was programmed using the free, web-based PsyToolkit software, version 2.3.6 (Stoet, 2010, 2017), and used pictures and words. The pictures comprised eight coloured drawings of children: two boys and two girls with disabilities (e.g. boy in a wheelchair, girl with Down syndrome) and two boys and two girls without disabilities. The pictures were used to represent the two categories of “typical” and “different” (to draw attention to how the children were categorised, not the disability). The category “different” was illustrated with a boy in a wheelchair, a boy with cerebral palsy, a blind girl, and a girl with Down syndrome. Before starting the IAT, all the children in the drawings were introduced to the students with a name and a category (e.g. “This is Tom. Tom is a quite typical boy” for a child without a disability, or “This is Mia. Mia is a bit different from the other children: she cannot see, she is blind”, for a child with a disability).
A similar procedure was used to introduce the words to the students that represented negative (enemy, stench, betrayal or pain) and positive (flower, friend, present or party) content (these words were validated in previous studies using an IAT with a German-speaking sample, e.g. Lüke & Grosche, 2017). Because of the limited attention span of younger children, trial numbers of the IAT were reduced compared to the traditional IAT by 20% (Cvencek et al., 2011). The Disability IAT started with two practice blocks (one for categorising drawings of children as “typical” or “different” by pressing the left or right button, and one for categorising the positive and negative words, using 16 trials for each block). This was followed by two blocks (24 trials each) for categorising pictures and words together, with “typical” and “positive” sharing one response button and “different” and “negative” sharing another. This was followed by a practice block for categorising the drawings (32 trials). Two final blocks (of 24 trials each) followed, where the categories were mixed, with “different” and “positive” sharing one response button and “typical” and “negative” sharing another. Students were instructed to react as fast as possible without making errors.

Implicit preferences for children in the no-disability group (“typical”) over those in the disability group (“different”) were calculated using the algorithm proposed by Greenwald et al. (2003): Response latencies in error trials were replaced by the block’s mean response latency for correct trials, with an additional penalty of 600 milliseconds. Subjects with more than 10% of trials with latencies below 300 milliseconds were excluded from analysis (one case or 0.2%). Trials with missing responses were excluded (if a student did not respond within a time window of 10 s, the next trial started automatically). Next, mean latencies were calculated for each block. To calculate the standardised differences in response latencies across the critical blocks, the mean latencies from block 3 were subtracted from the mean latencies from block 6 and then divided by the pooled standard deviation of the response latencies for these two blocks. This procedure was repeated for blocks 4 and 7. The two resulting values were averaged to produce the implicit bias score (the so-called “D score”; Blanton et al., 2015; Greenwald et al., 2003). Because responses are expected to be slower in blocks 6 and 7 when “typical” is more strongly associated with “positive”, positive values indicated an implicit preference for “typical” children and negative values indicated a preference for “different” children.
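The scoring steps described above can be sketched in Python as follows. This is an illustrative sketch, not the published analysis script (which is available via the OSF link in Sect. 2.5). The data layout, function names and example latencies are hypothetical. Two further assumptions are made: the “pooled” standard deviation is computed over all penalised latencies of the two blocks taken together, and the share of fast trials is computed over the blocks supplied.

```python
from statistics import mean, stdev

ERROR_PENALTY_MS = 600   # penalty added to the block mean for error trials
FAST_CUTOFF_MS = 300     # latencies below this count as too fast
MAX_FAST_SHARE = 0.10    # exclusion threshold for the share of fast trials


def clean_block(trials):
    """trials: list of (latency_ms or None, correct). Returns penalised latencies."""
    # Drop trials without a response (timeouts are stored as None here)
    answered = [(lat, ok) for lat, ok in trials if lat is not None]
    correct_mean = mean(lat for lat, ok in answered if ok)
    # Error trials get the block's mean correct latency plus the penalty
    return [lat if ok else correct_mean + ERROR_PENALTY_MS for lat, ok in answered]


def too_fast(blocks):
    """True if more than 10% of answered trials are faster than 300 ms."""
    lats = [lat for trials in blocks.values() for lat, _ in trials if lat is not None]
    fast = sum(lat < FAST_CUTOFF_MS for lat in lats)
    return fast / len(lats) > MAX_FAST_SHARE


def d_score(blocks):
    """blocks: dict mapping block number (3, 4, 6, 7) to a list of trials."""
    if too_fast(blocks):
        return None  # participant excluded
    cleaned = {b: clean_block(t) for b, t in blocks.items()}
    parts = []
    for first, second in ((3, 6), (4, 7)):
        sd = stdev(cleaned[first] + cleaned[second])  # assumption: SD over both blocks
        parts.append((mean(cleaned[second]) - mean(cleaned[first])) / sd)
    return mean(parts)


# Made-up trials for one hypothetical student
blocks = {
    3: [(800, True), (900, True), (1000, True), (850, False)],
    4: [(820, True), (880, True), (940, True), (None, False)],
    6: [(1100, True), (1200, True), (1300, True), (1250, True)],
    7: [(1150, True), (1050, True), (1250, True), (1350, True)],
}
d = d_score(blocks)
```

With these made-up latencies, responses are slower in blocks 6 and 7 than in blocks 3 and 4, so the subtraction order described above yields a value above zero.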

To assess the internal consistency of the Disability IAT, trials in blocks were split into four sub-blocks each. The aforementioned procedure for calculating the D score was then applied to each sub-block, and the internal consistency of the four D scores was assessed by calculating Cronbach’s Alpha (Williams & Steele, 2016). The internal consistency was adequate for the pre-test (α = 0.76), but slightly lower in the post-test (α = 0.66), a common finding when using IATs (Williams & Steele, 2016).
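As an illustration of this reliability check, the following Python sketch computes Cronbach’s alpha from four sub-block D scores per student. The function name and the example values are hypothetical; the standard formula based on item variances and the variance of the sum score is used.

```python
from statistics import variance


def cronbach_alpha(items):
    """items: list of k lists, each holding one score per participant."""
    k = len(items)
    # Sum score per participant across the k items (here: sub-block D scores)
    totals = [sum(vals) for vals in zip(*items)]
    item_var = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


# Made-up D scores of five hypothetical students in four sub-blocks
sub_block_d = [
    [0.40, 0.10, 0.65, -0.20, 0.30],
    [0.35, 0.25, 0.50, -0.05, 0.45],
    [0.55, 0.00, 0.70, 0.10, 0.20],
    [0.30, 0.15, 0.45, -0.10, 0.50],
]
alpha = cronbach_alpha(sub_block_d)
```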

2.5 Data analysis

Changes in students’ implicit and explicit attitudes were assessed using hierarchical linear regression models to address data clustering (students in school classes) and to predict post-test preference scores in the experimental group relative to the control group while adjusting for pre-test preference scores (i.e. baseline adjustment) and control variables. Prior to data analyses, predictor variables were centred at the grand mean (Enders & Tofighi, 2007). Further, assumptions for the use of hierarchical linear regression models (linearity, homoscedasticity of residual error variance, normally distributed residuals, and normal distribution of random effects) were checked.
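One way to write down such a model is given below. The notation is introduced here for illustration only, and the sketch assumes a random intercept for school classes without random slopes, which the text does not specify. Student i is nested in class j, and x_1 to x_4 stand for the four grand-mean-centred control variables (sex, age, immigrant background, HISEI).

```latex
\begin{align}
  \text{post}_{ij} &= \gamma_{00}
      + \gamma_{01}\,\text{group}_{j}
      + \gamma_{10}\,\text{pre}_{ij}
      + \sum_{k=1}^{4} \beta_{k}\, x_{kij}
      + u_{0j} + e_{ij}, \\
  u_{0j} &\sim \mathcal{N}(0, \tau_{00}), \qquad
  e_{ij} \sim \mathcal{N}(0, \sigma^{2}), \qquad
  \text{ICC} = \frac{\tau_{00}}{\tau_{00} + \sigma^{2}}
\end{align}
```

In this notation, the coefficient of the group variable captures the baseline-adjusted difference in post-test scores between the experimental and control classes.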

Among students who participated in both the pre- and post-test measures, a significant amount of missing data (> 5%) was observed only for the socio-economic index (HISEI, 8.3% missing data), which was considered missing at random. Therefore, listwise case deletion was considered appropriate. However, analyses were also run without the HISEI on the total sample. Because this did not change results in any meaningful way, only results of the main analyses are reported.

The raw data, the data preparation and analysis scripts, and a codebook of the prepared data set have been published on the Open Science Framework (OSF) platform and can be retrieved under the following link: https://osf.io/y3zdf/.

3 Results

3.1 Descriptive results

Students performed well on the IAT with general response accuracy above 90%. The average response latency across the students for the first block with congruent trials was 1062 milliseconds, with 95% of all students having mean response latencies between 740 and 1576 milliseconds. Response accuracy and response times for individual blocks are shown in Table 2.

Table 2 IAT response accuracy and response times

The pre-test assessment showed no significant differences in implicit or explicit attitudes between the experimental and control groups (all p > .60; Table 3).

Table 3 Explicit and implicit attitudes: descriptive results

Nonetheless, in the pre-test, students in the experimental group were on average younger (Mexp= 10.32, SDexp = 1.17; Mcon= 10.84, SDcon = 1.22), were more likely to have an immigrant background (Mexp= 49%; Mcon= 37%) and had a lower socioeconomic status (Mexp= 43.3, SDexp = 16.6; Mcon= 50.9, SDcon = 18.0) than students in the control group (t-tests and chi-square tests all p < .05). There was no difference regarding sex (Mexp= 50% male; Mcon= 53% male). These variables were included in the subsequent regression models to adjust for the two groups’ sociodemographic differences.

The correlation between explicit and implicit bias was low and non-significant (r = .03, p > .05; for correlations of independent and dependent variables see Table 4).

Table 4 Correlation matrix of independent and dependent variables

3.2 Results of the intervention

Initially, students’ explicit attitudes toward children with disabilities ranged from neutral (child with ADHD) to slightly positive (child with a physical disability). However, these were negatively biased relative to the child without a disability, as indicated by the negative explicit bias score (Table 3). Although there remained a clear discrimination in attitudes, there was a significantly greater decrease in the experimental group’s explicit bias after the intervention compared to the control group (Table 5). Compared to a model without the group factor, the ICC was reduced from 0.056 to 0.039, meaning that 1.7% of the variance of post-test attitudes was explained by the intervention. This translates to a Cohen’s d of 0.26, which can be considered a small effect. Among the control variables, only age had a significant influence on changes in students’ explicit attitudes: older students showed a greater decrease in explicit bias between the pre- and post-tests. This effect was not group-specific, as including an additional interaction term (group × age) did not yield a significant interaction (p = .41). Hence, this effect seemed to be an age-specific post-test effect unrelated to the intervention.
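The arithmetic behind the reported effect size can be reproduced with a short Python sketch. The text does not state which conversion was used; the sketch assumes the common conversion from a correlation r to Cohen’s d, d = 2r / sqrt(1 − r²), with r² set to the explained share of variance. The function and variable names are hypothetical.

```python
from math import sqrt


def r2_to_cohens_d(r2):
    """Convert a share of explained variance into Cohen's d (assumed conversion)."""
    r = sqrt(r2)
    return 2 * r / sqrt(1 - r2)


icc_without_group = 0.056   # ICC of the model without the group factor
icc_with_group = 0.039      # ICC of the model including the group factor
explained = icc_without_group - icc_with_group   # 0.017, i.e. 1.7% of the variance
d = r2_to_cohens_d(explained)                    # about 0.26
```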

Table 5 Prediction of explicit and implicit post-test bias scores

As with explicit attitudes, implicit attitudes showed a strong negative bias toward children with disabilities (Table 3); however, there was no significant change in negative bias after the intervention (Table 5).

Because it was observed that explicit bias was slightly higher in students that took the explicit test after the implicit test (r = .16 and r = .15, p < .05, pre- and post-test), both models were additionally calculated controlling for test order, but this factor had no influence on the observed intervention effects.

4 Discussion

Students with disabilities face a higher risk of social exclusion than their typically developing peers, and negative peer attitudes may play an important role in this. The positive impact of using different intervention strategies to improve children’s attitudes toward their peers with disabilities is well documented (Armstrong et al., 2017; Chae et al., 2019; Lindsay & Edwards, 2013; McManus et al., 2021). However, many of these studies relied on self-reported measures of students’ explicit attitudes, assessed short-term effects only and failed to assess links to actual behaviour (Flórez García et al., 2009; McManus et al., 2021). They thus provided limited insights into the nature of attitude changes and intervention efficacy. Therefore, the present study is the first to explore whether a curriculum-based intervention to foster the appreciation of human diversity improves not only students’ explicit but also their implicit attitudes toward peers with disabilities.

4.1 Changes in explicit but not in implicit attitudes

Students displayed both negative explicit and implicit bias regarding peers with disabilities before the intervention. After the intervention, students reported less negative bias toward their peers with disabilities than the control group. However, this was only true for explicit bias, as implicit bias did not change.

In contrast to explicit measures of attitudes, implicit measures capture automatic and spontaneous evaluations. These spontaneous evaluations are relatively stable, the result of long-term socialisation experiences (Gawronski & Bodenhausen, 2006) and are hard to change using brief interventions (Lai et al., 2016). The absence of any substantial change in implicit attitudes after the intervention indicated that peers with disabilities were still spontaneously negatively evaluated, even if self-reported attitudes became more positive. Gawronski and Bodenhausen (2006) explain this type of asymmetric change in explicit and implicit attitudes by their associative-propositional evaluation (APE) model: the intervention probably influenced the propositional validation process (i.e., whether automatic affective associations are found to be subjectively valid or not according to processes of propositional reasoning), but not so much the automatic associations themselves. This means the intervention mainly changed how students thought about their peers with disabilities. During the intervention, students learned about stereotypes, reflected upon human diversity, took different perspectives and learned to work cooperatively. This may have had a more substantial effect on the cognitive component of attitude (e.g. “People with disabilities have similar feelings, needs and interests to me” or “Excluding others is unfair”) but less so on the (automatic) affective component (e.g. “People with disabilities are nice and likeable”). Thus, improvements in students’ self-reported attitudes may have reflected their heightened awareness of stereotypes and prejudices—and their more deliberate efforts to overcome these—while spontaneous feelings remained unchanged and predominantly negative.

Nevertheless, less stereotypical or more positive explicit attitudes could lead to a greater willingness to make contact with peers with disabilities and to more positive feelings in the longer term. For example, greater efforts to overcome stereotypical attitudes could lead to more inclusive behaviour, heightening the chances for positive contact, which in turn strengthens trust and sympathy and reduces intergroup prejudice (Pettigrew & Tropp, 2006). However, interventions that mainly affect the deliberate, cognitive components of attitudes may also fail to have lasting effects if there are no opportunities for positive interactions. Attitudes and behaviours are considered to work bi-directionally (Holland et al., 2002); if no positive contact occurs and negative feelings remain, students will probably quickly fall back to stereotypical thinking. This could explain why studies investigating the longer-term effects of interventions often fail to find any significant effects on attitudes (e.g. Godeau et al., 2010).

Thus, although it is encouraging to find that students’ attitudes toward peers with disabilities can be positively affected by a low-threshold, curriculum-based intervention, questions remain about how much such changes reflect “real” attitude changes among children, how long such effects might last and how much they impact students’ willingness to include their peers with disabilities in social activities. The fact that implicit attitudes were unaffected by the intervention indicated that affective-evaluative associations remained negative. Changes in implicit attitudes probably need more time to bear fruit, with stronger or repeated interventions and increased positive contact. In natural settings (e.g. observational studies with a minority group and a majority group of students in a high school), studies have demonstrated that over longer periods, implicit attitudes are influenced by both the quantity and quality of contact (Shook & Fazio, 2008; Vezzali et al., 2023). Considering these aspects, demonstrations of robust changes in implicit attitudes after interventions could be a sufficient (although not necessary) condition for inducing successful changes in attitudes and behaviours.

4.2 Limitations

The present study was one of the few to have assessed children’s implicit attitudes toward peers with disabilities and the first to assess changes in those attitudes after a curriculum-based intervention programme. It nevertheless had some limitations. First, allocations to experimental or control groups were not completely random, as participating classes were drawn from a convenience sample. Although analyses were adjusted for differences in students’ background variables, this aspect may limit the generalisability of the intervention’s effects. Second, although, at the group level, implicit attitudes remained remarkably stable in both the intervention and control groups, pre- and post-test correlations of implicit bias scores were low (r = .23, p < .001) compared to explicit bias scores (r = .62, p < .001), making it difficult to detect smaller, intra-individual intervention effects. This finding is similar to those of other studies (e.g., Rae & Olson, 2017, who assessed implicit racial attitudes in children across a period of 1 month and reported a test-retest reliability of r = .25). Low pre-test to post-test reliability is a common critique of implicit association tests, and although there are techniques for improving this (e.g. using more test trials or administering multiple IATs; Greenwald et al., 2022), they were not feasible in this study. Nevertheless, post-hoc analyses of achieved power indicated that test power to detect even a small intervention effect (f = 0.10) was high (88%) due to the sample size. Third, a limitation of the IAT is that it measures only the relative strength of associations between concepts, not absolute measures of attitudes. In this study, explicit peer attitudes were calculated to reflect a similar relative bias (i.e., preference of one group over the other).
While this is not a limitation per se, it is important to note that these bias scores do not necessarily reflect negative attitudes (both could theoretically be positive, as indicated by the absolute values of the explicit attitude measure in Table 3). Therefore, future studies could also incorporate variants of other implicit measures, such as the Single-Target Implicit Association Test (ST-IAT; Bluemke & Friese, 2008) to assess absolute changes in attitudes. Fourth, although efforts were made to maximise comparability between explicit and implicit attitude measures, the disability categories of the IAT could not be precisely matched to the vignettes of the CATCH due to challenges in depicting all disabilities equally well (e.g., ADHD). Fifth, implementation or treatment fidelity was not systematically assessed in this study, although this is an important aspect to ensure that the intervention was carried out as intended. Similarly, potentially moderating variables such as teacher attitudes toward students with disabilities or students’ contact with peers with disabilities were not assessed; these could have a significant influence on intervention effects and could be investigated further in future studies. Finally, the present study did not assess longer-term intervention effects on attitudes or on actual student behaviour. These aspects should be incorporated into future studies because they form the broader impact goals of intervention studies and would reveal the development and patterns of interactions between explicit attitudes, implicit attitudes and actual student behaviour.

4.3 Conclusions

Although the present study found positive effects following a low-threshold, curriculum-based intervention to change students’ explicit attitudes toward their peers with disabilities, no changes were found in their implicit attitudes. Unlike most other evaluations of intervention studies on children’s attitudes, the present study widened the focus and considered both explicit and implicit attitudes. This was important, as it enabled a better understanding of how propositional and associative processes were influenced. Additionally, further consideration of measures of implicit attitudes could provide insights into which forms of intervention (e.g. direct interaction, indirect interaction, information-based approaches or a combination) are the most effective for promoting positive student attitudes and for improving the acceptance and social participation of peers with disabilities in inclusive education settings. Demonstrating changes in both explicit and implicit attitudes would significantly increase the robustness of arguments in favour of intervention effects. Finally, as implicit attitudes tap into different domains than self-reported attitudes, are less prone to social desirability bias and predict behaviours going beyond explicit attitudes, the concurrent use of explicit and implicit attitude measurement instruments could be complementary in predicting the longevity of intervention effects and the effects on behaviour themselves (i.e. the social inclusion of peers with disabilities).