1 Introduction

Since the adoption of the Convention on the Rights of Persons with Disabilities (United Nations, 2006), many countries have undertaken to implement an inclusive school system. Although inclusion affects all students, the political commitment to inclusion has, in particular, brought students with special educational needs into mainstream classrooms (Savolainen et al., 2022). However, under the policy of inclusion, the Dilemma of Difference (Norwich, 2008, 2014) remains a persistent problem. This term describes the controversy that legal protection, entitlements, special accommodations and special education services are invariably attached to an official diagnosis of special educational needs — a label that comes with the risk of stigmatisation. The failure to issue a diagnosis poses the risk that the students will not receive the required teaching (Hornby, 2015). The stigmatisation of students with special educational needs can manifest itself in different ways. Shifrer et al. (2013) and Shifrer (2016), for example, demonstrate that students who are labelled with a learning disability (LD) enrol in less demanding high school courses than similarly performing students without the LD label. They argue that this outcome is partially caused by lower teacher expectations that are rooted in stigmatisation processes. Our study also focuses on teachers’ performance expectations in the context of the LD label. Specifically, we aim to examine whether the expectations of regular and special education teachers concerning low-performing students decrease in the presence of the LD label.

Teachers’ expectations are defined as the “inferences that teachers make about the future behaviour or academic achievement based on what they know about these students now” (Good, 1989, p. 32). They are important because they can influence students’ academic performance. Research spanning several decades offers evidence regarding the existence of the self-fulfilling prophecy in the school context (e.g. de Boer et al., 2018; Gentrup et al., 2020; Papageorge et al., 2020; Ready & Wright, 2011; Rubie-Davies, 2007, 2010). Meta-analyses have shown small (Jussim & Harber, 2005) to moderate (Hattie, 2009) effects of teachers’ expectations on students’ performance.

Wang et al. (2018) propose three fundamental mediating mechanisms that explain how teachers’ expectations influence students’ performance. Firstly, teachers behave differently based on their expectations; secondly, students perceive teachers’ expectations through differential teacher behaviour and treatment; and thirdly, students’ socio-psychological factors mediate the effects of teachers’ expectations on students’ academic achievement, in the sense that teachers’ expectations may influence and alter students’ academic beliefs and motivation.

Several studies have provided evidence for the first mechanism, showing that teachers treat students differently depending on whether they hold high or low expectations for them (e.g. Chen et al., 2011; de Boer et al., 2010; Ready & Chu, 2015). For example, de Boer et al. (2010) showed that negatively biased teachers’ expectations have a negative influence on students’ performance, while positively biased expectations have a positive influence. Evidence for the second mechanism has been found in studies indicating that students are already able to perceive differential treatment by teachers at a young age (e.g. Babad et al., 1991; Babad & Taylor, 1992; Brattesani et al., 1984; Weinstein et al., 1982). For example, Babad et al. (1991) demonstrated that after just ten seconds of observing or hearing a teacher, students in grade four and above were able to detect whether the teacher was talking about or to a high- or low-performing student.

Studies about socio-psychological factors, the third mechanism, have examined the influence of teachers’ expectations on students’ beliefs and motivations, indicating that among others, the self-concept (e.g. Benner & Mistry, 2007; Chen et al., 2011; Friedrich et al., 2015), self-expectation (Kuklinski & Weinstein, 2001; Shifrer, 2013), self-efficacy (Chen, 2006; Karwowski et al., 2015; Tyler & Boelter, 2008), and motivation (Gilbert et al., 2014; Woolley et al., 2010) of a student can be influenced by teachers’ expectations. For example, Chen et al. (2011) showed that Taiwanese primary teachers give more positive oral feedback to students for whom they hold high-performance expectations than those for whom they hold lower expectations. The effect was found to exist for both academic and non-academic feedback. The authors also demonstrate that Taiwanese primary students who perceive more positive academic feedback have a higher self-concept, particularly a higher academic self-concept.

2 Theoretical and empirical background

2.1 Teachers’ expectations – a matter of the social group?

Acknowledging the importance of teachers’ expectations, the question arises as to how they are formed. While there is evidence that teachers generally predict a student’s ability and performance on the basis of their previous academic performance (Hoge & Coladarci, 1989), other variables also influence teachers’ performance expectations. One of these other variables is the group a student is (believed to be) part of. Many studies give evidence that teachers’ expectations can be systematically biased for different groups of students and are lower for minority ethnicities and students with lower socioeconomic status (e.g. Lorenz et al., 2016; Sorhagen, 2013; Tenenbaum & Ruck, 2007; Tobisch & Dresel, 2017; van den Bergh et al., 2010). Jussim and Harber (2005) summarise that “students from privileged socio-demographic backgrounds may consistently benefit from high teacher expectations, whereas those from culturally stigmatised backgrounds may be consistently undermined by low teacher expectations” (p. 147). For example, Papageorge et al. (2020) showed that teachers are overly optimistic that students will be able to complete a four-year college degree. However, this over-optimism of Caucasian teachers was significantly smaller for students of colour in comparison with Caucasian students.

Another group of students for whom teachers have lower expectations are students diagnosed with special learning needs. This group actually carries a threefold risk for lower teachers’ expectations, as it is often low-achieving, comes from families with lower socio-economic status (Shifrer et al., 2011), and has a migration background (Gabel et al., 2009; Pit-Ten Cate & Glock, 2018; Sanatullova-Allison & Robinson-Young, 2016). The explicit naming of the diagnosis appears to play a special role in the context of teachers’ expectations. For example, Thelen et al. (2003) investigated the influence of special education labels using vignettes with students described as having a learning disability, mild mental retardation (sic!), emotional disturbance, and students without any label. The authors found that when the fictitious students were labelled, teachers’ behavioural and academic expectations were lower than they were for the non-labelled counterpart. In the following section, we summarise the results of studies that have examined the influence of the LD label on teachers’ performance expectations.

2.2 Learning disability labels and teachers’ academic performance expectations

Several experimental and representative longitudinal studies have investigated the effects of learning disability labels on teachers’ performance expectations. Starting with the experimental studies using vignettes for fictitious students with or without LD, Clark (1997) examined primary teachers’ attributions and ratings of academic failure for fictitious male students. The results showed that teachers were more likely to expect future academic failure if the students were labelled. This effect was particularly strong when the student was described as highly capable and putting a high degree of effort into their work. Woodcock and colleagues used adapted versions of Clark’s (1997) vignettes in a series of studies. They found a negative effect of the LD label on the expectations of students’ future academic failure in pre-service primary teachers in Australia (Woodcock & Vialle, 2011), trainee primary teachers in Canada (Woodcock, 2014), and pre-service secondary school teachers in Australia (Woodcock & Vialle, 2010). The effect was also found for the label Specific learning disability in trainee teachers in Australia (Woodcock, 2013) and for the label Specific learning difficulties in pre-service teachers (Woodcock & Vialle, 2016) and secondary teachers in the UK (Woodcock & Hitches, 2017).

Fewer studies have looked at the role of the students’ gender in this issue. In Tournakis’ (2003) study, the outcomes varied depending on whether no explanation was given for a delay in reading of two years below grade level or whether learning difficulties or a learning disability were mentioned as the cause. Among the students labelled with delayed reading, teachers predicted less academic success only for female students and not for male students. The author assumes that because more boys than girls are labelled as learning disabled, the teachers might judge the girls’ reading problems as more serious. A negative effect of the LD label was also found among university teachers for fictitious 18-year-old students (Minner & Prater, 1984). These teachers had the highest academic expectations for a non-labelled, positively described student (regarding grades, attendance and extracurricular activities), followed by the non-labelled, negatively described student, followed by the positively described LD student. The lowest expectations were held for the negatively described LD student.

In addition to the experimental studies using vignettes, further evidence of the negative influences of the LD label can be found in longitudinal studies. Shifrer (2013), who used data from the Education Longitudinal Study of 2002 (USA), found that among adolescents who are similar in terms of their social and academic background, early high school achievements, attitudes and behaviours, the LD label significantly lowered teachers’ graduation expectations. Teachers were 82% less likely to expect the labelled adolescents to achieve a bachelor’s degree or higher. Furthermore, the label lowered the expectations of both their parents and themselves. Similar results were obtained from the Canadian National Longitudinal Survey of Children and Youth data, showing that teachers have lower expectations of students with LD status achieving the highest educational level than they do for non-labelled students with a similar academic performance (Whitley, 2010).

2.3 Expectations of regular school teachers in comparison with special education teachers

The above-mentioned studies primarily investigated the expectations of regular education teachers (RETs). The results and assumptions concerning the performance expectations of special education teachers (SETs) are fewer and rather ambivalent. On the one hand, two studies indicate that SETs have higher expectations and attribute fewer achievement problems in school to the learner (Vlachou et al., 2014; Woolfson et al., 2007). On the other hand, many SETs have so much patience for their students with disabilities that it “often translates into slowed-down instruction, lowered expectations for performance, fewer opportunities to respond, and fewer in-class and homework assignments” (Heward, 2003, p. 94). Two qualitative studies support these assumptions: Arabsolghar and Elkins (2000) observed that 31 SETs have relatively low expectations regarding the memory skills of their students with intellectual disabilities. Moscardini (2015) interviewed twelve teachers in special primary schools for children with moderate learning difficulties and found that most teachers had low expectations regarding their students. These results therefore support the assumption that SETs generally have lower performance expectations than RETs, and that the LD label lowers the academic performance expectation of SETs even more compared with RETs. Research to test this assumption is lacking, however.

2.4 Stereotypes and attribution processes as possible causes

Both stereotypes and attribution processes may help to explain why the LD label lowers teachers’ expectations. As a rule, labelling and stereotypes are closely linked. Link and Phelan (2001) distinguish between four components that lead to the stigmatisation of the labelled individuals. Firstly, people distinguish between humans and label the differences. Secondly, the labelled individuals are linked to undesirable characteristics – the negative stereotypes. Thirdly, the label facilitates placement in clear categories between them and us, and fourthly, the labelled individuals experience a loss of status and discrimination, which leads to unequal outcomes between the labelled and non-labelled individuals.

Focusing on the second component, social categorisation and the stereotype content model give further insights into the connections between labelling and stereotypes. Stereotypes are viewed as the cognitive component of attitudes (Eagly & Chaiken, 1993). As a part of attitudes, they can elicit positive or negative expectations and judgements (Krischler et al., 2018), and are therefore relevant in the context of teachers’ expectations.

But how are stereotypes activated? The continuum model of impression formation describes impression formation as a gradual succession with different processing depths on a continuum from category-based to individuating processes (Fiske, 1993; Fiske et al., 2018; Fiske & Neuberg, 1990). According to the model, immediate and automatic reactions are category-based and can be distinguished from individuating processes that occur under certain circumstances (Fiske, 1993; Fiske et al., 2018). It is assumed that category-based processes are used when resources are scarce, e.g. under time pressure or when cognitive capacities are low (Dijksterhuis, 2010; Macrae, Stangor, & Milne, 1994). In addition, attention allows the perceiver to accumulate more information, which consequently leads to the confirmation of the initial category, to subcategorisation, or to the individuation of the target person (Fiske, 1993; Fiske et al., 2018).

One explanation for the negative effects of the LD label on teachers’ expectations is that the label facilitates social categorisation. Specifically, it may function as an inhibitor for impression formation beyond the initial social categorisation. As teachers are aware that an LD diagnosis requires a diagnostic process, they possibly assume that other professionals have already carried out a thorough investigation which led to the diagnosis. This means that they might not see the need to pay more attention to the individual characteristics of the labelled student.

The above-mentioned problems regarding cognitive capacities may be relevant as well. Teaching demands both cognitive and emotional resources (Simbula, 2010) and is a stressful profession (Harmsen et al., 2018; Johnson et al., 2005; Kyriacou, 2001). Among other things, stress is caused by the considerable demands of the job, students’ misbehaviour, poor working conditions, role conflicts and poor relationships at work (Harmsen et al., 2018). As stress is known to have negative effects on cognitive functioning (Boals & Banks, 2012, 2020; Klein & Boals, 2001), teachers in such situations may lack the cognitive capacities for further individuation processes. Indeed, stereotypes have been found to save cognitive energy and free up resources for other tasks (Macrae et al., 1994). Thus, they are more likely to be activated when the cognitive load is high (Gilbert & Hixon, 1991; Griffiths et al., 1993). However, even though social categorisation helps free resources and is a functional process (Park & Judd, 2005), it is also problematic as it can lead to stereotype activation (Dijksterhuis, 2010), which is particularly controversial when the content of the stereotype is harmful.

The stereotype content model (Cuddy et al., 2008; Fiske et al., 2002) proposes that two dimensions, warmth and competence, form people’s impressions of others, and that each trait can be perceived as low or high. It is therefore possible to distinguish between four different types of prejudice: contemptuous prejudice when both warmth and competence are perceived as low (e.g. welfare recipients), admiration when both dimensions are high (e.g. in-group members), envious prejudice when competence is perceived as high but warmth is perceived as low (e.g. rich people), and paternalistic prejudice when competence is perceived as low but warmth is perceived as high (e.g. persons with a disability). Each combination is associated with different emotions (e.g. pity and sympathy for paternalistic prejudice). These emotions mediate the stereotype-to-behaviour connection (Cuddy et al., 2007).

Krischler et al. (2018) and Krischler and Pit-Ten Cate (2019), who examined the stereotype content model for students with learning difficulties, showed that teachers perceive this group of students as low in competence and neutral in warmth. They used a vignette to describe a fictitious student with learning difficulties, but did not explicitly mention a diagnosed learning disability. It is possible that the presence of the LD label as an official diagnosis causes the competence to be perceived as even lower, which would therefore lead to lower performance expectations. Groups who are perceived as warm but incompetent elicit pity and sympathy (Eckes, 2002; Fiske et al., 2002; Glick & Fiske, 2001).

This is in line with findings from attribution research, which indicate that pity is usually directed at people with stigmata for which they are not responsible (Cuddy et al., 2005; Weiner et al., 1982). Attribution theory (e.g. Weiner, 1979, 2010) explains how individuals perceive the cause of other individuals’ success or failure. When a cause is assessed, the combination of the three dimensions of causality (locus, stability, and controllability) leads to distinct responses by the attributing person. When students fail in school and their teachers attribute their failure to a lack of effort, a controllable cause, the student can be held responsible for the negative outcome. This usually evokes the teacher’s anger. However, if the cause of a performance failure is perceived as a lack of aptitude, the student cannot be held responsible for the incident, leading to a feeling of sympathy. These two different emotional responses lead to different behaviours. Condemnation or reprimand, among others, are typical responses to anger, while sympathy evokes behaviour such as help, and not reprimand (Weiner, 2000).

There is evidence that the LD label encourages an internal, stable and uncontrollable attribution of causes to learning difficulties. Experimental studies comparing the emotional response of teachers towards fictitious students who had failed a test indicate different feelings toward LD students compared with their non-labelled counterparts (Woodcock & Vialle, 2010, 2011). Generally, teachers give more positive feedback and have higher levels of sympathy for students described as having low abilities compared with students who have high abilities. However, the LD label also influences the emotional response: the least frustration, greatest sympathy and most positive feedback are given to students of low ability who were described as showing high effort and were labelled as learning disabled. Clark (1997), who demonstrated that teachers generally reward boys with an LD more than their non-labelled counterparts and feel less anger and more pity after they fail a test, suggests that the reason for this might be a “norm to be kind” (Weiner, 1986, p. 146) to individuals with disabilities. Following Weiner’s (2000) attribution theory, the emotional response and the given feedback support the hypothesis that a learning disability is viewed as an uncontrollable, internal and stable cause for failure. If a teacher’s response to a learning disability is more pity and sympathy and less anger or frustration, it indicates that an LD is considered internal, stable and uncontrollable. The logical consequence of this attribution is lower expectations for the further academic performance of students who are diagnosed as learning disabled.

2.5 The term learning disability in the international context

To summarise, teachers lower their performance expectations for LD-labelled students, and SETs might be especially prone to these adverse effects. Stereotype activation as well as internal and stable attributions of an LD can explain the adverse effects. However, these label effects have, until now, mostly been studied in English-speaking countries. We aim to explore whether the label effects are specific to English-speaking countries or if they are attached to learning disability labels in general.

In order to do so, it should be noted that the meaning of the same diagnostic terms in special education may vary from country to country, as the classification systems in each country are socially and historically determined (Powell, 2010; Sideridis, 2007). Therefore, from an international perspective, the meaning of an LD can differ strongly between cultures, countries, and school systems. It ranges from a specific disorder characterised by a specific academic problem despite at least average intelligence, to a condition that could also be described as a mild intellectual disability (Grünke & Cavendish, 2016). Even within English-speaking countries, the definitions of an LD and a specific learning disability (SpLD) differ. In the United Kingdom (UK), for example, an LD includes the presence of three core criteria: a significant impairment of intellectual functioning (IQ < 70), a significant impairment of adaptive/social functioning, and an onset during the formative years, before adulthood (Gates & Mafuba, 2016). However, these criteria are very similar to those that define an intellectual disability (ID) in the USA (American Association on Intellectual and Developmental Disabilities, 2022).

Our study examines the effect of the German label Lernbehinderung (LB), which literally translates to learning disability and is a legal school term to describe students who receive special education support in the category of learning (Gold, 2018). Even though there is no generally accepted definition, the following criteria are usually applied: the deficits (a) cover at least two school years, (b) affect several subjects, especially mathematics and language arts, (c) persist for several years, (d) are not the result of insufficient learning opportunities, but related to deficits in basic cognitive abilities, and (e) are not caused by a sensory impairment (Grünke & Grosche, 2014).

Students who have school subject-specific deficits are diagnosed with specific learning disorders such as dyslexia or dyscalculia (American Psychiatric Association, 2022, DSM-5). However, in the German school system, these categories do not qualify for special education services, but fall under the responsibility of regular education.

2.6 Research questions and hypotheses

We aim to answer three research questions. Firstly, what are the LB label’s positive and negative effects on prospective teachers’ academic performance expectations for students with learning difficulties? On the basis of the previous findings about the influence of the LD label on teachers’ expectations, we hypothesise that teachers have lower performance expectations for a fictitious student when the student is labelled as learning disabled and that these lower expectations are expressed in lower performance requirements:

H1a: In the short term, teachers assign fewer and easier tasks to the labelled student.

H1b: In the long run, teachers foster less academic performance, expect a lower graduation level, and recommend a special school more frequently for the labelled student.

Secondly, we aim to explore the role of the professional group regarding the performance expectations for students with learning difficulties. Based on the assumptions of Heward (2003) and the results of qualitative studies, we hypothesise that SETs have lower expectations for the fictitious student than RETs:

H2a: In the short term, SETs assign fewer and easier tasks to the student than RETs.

H2b: In the long run, SETs foster less academic performance, expect a lower graduation level, and recommend a special school more frequently than RETs.

Thirdly, we aim to explore whether the LB label has a differential effect on prospective SETs versus RETs. We hypothesise an interaction between the experimental condition and the study programme. Specifically, we believe that the LD label lowers the expectation of SETs even more compared with RETs, resulting in the lowest expectations from SETs for the labelled student:

H3a: In the short term, SETs assign the fewest and easiest tasks to the labelled student.

H3b: In the long run, SETs pursue the lowest fostering of academic performance, have the lowest expected graduation level and give the most recommendations for a special school for the labelled student.

To summarise, we use a classical experimental approach that examines the main effect of the LB label (question 1), the main effect of the professional group (question 2), and the interaction of these two variables (question 3) on teachers’ expectations.
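To make the logic of the three questions concrete, the following minimal sketch shows how two main effects and an interaction can be read off the cell means of a 2 × 2 design. The cell means are hypothetical values chosen for illustration, not results of this study, and the unweighted averaging assumes equally sized cells.

```python
# Hypothetical cell means on an arbitrary expectation scale (NOT study results).
cell_means = {
    ("RET", "no_label"): 3.0,
    ("RET", "label"): 2.5,
    ("SET", "no_label"): 2.5,
    ("SET", "label"): 1.5,
}

def label_effect(group):
    """Difference labelled minus non-labelled within one professional group."""
    return cell_means[(group, "label")] - cell_means[(group, "no_label")]

def group_mean(group):
    """Unweighted mean of both label conditions for one professional group."""
    return (cell_means[(group, "label")] + cell_means[(group, "no_label")]) / 2

# Question 1: main effect of the label (averaged over both groups).
main_effect_label = (label_effect("RET") + label_effect("SET")) / 2

# Question 2: main effect of the professional group (SETs minus RETs).
main_effect_group = group_mean("SET") - group_mean("RET")

# Question 3: interaction, i.e. how much larger the label effect is for SETs.
interaction = label_effect("SET") - label_effect("RET")

print(main_effect_label, main_effect_group, interaction)
```

With these illustrative numbers, all three contrasts are negative, which is the pattern that H1 to H3 would predict.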

3 Method

3.1 Participants

We conducted an experimental online study in the spring of 2021. The participants were undergraduates at German universities qualifying to become either RETs or SETs. The participants were recruited in two ways: first, lecturers from various universities who work in teacher education were asked to promote the study to their undergraduates; second, the study was advertised in various online groups for prospective teachers. Of the 552 participants who initially started, 352 (63.9%) completed the study. Of the participants who did not finish the study, 79.5% stopped relatively early, before the first dependent variable was assessed. Of the remaining 352 participants, 72 participants (20.5%) were excluded because they had not read the vignette carefully and gave wrong answers to one or more of the four comprehension questions. These questions are presented in Sect. 3.3.2. Two participants (1.3%) were identified as multivariate outliers (Mahalanobis distance, df = 23, critical chi-square value of 49.73 at α = .001). After the examination of these cases, one participant was excluded due to a conspicuous response pattern (i.e. choosing the same answer for most of the questions and a combination of a special school as the track recommendation and the highest academic track for the expected graduation level; see Sect. 3.3.2 for the explanation of these variables).
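As an illustration of the outlier criterion, the sketch below checks that a cutoff of 49.73 leaves an upper-tail probability of about .001 under a chi-square distribution with 23 degrees of freedom, and flags squared Mahalanobis distances above that cutoff. The function names and the example distances are hypothetical; the chi-square CDF is computed with a standard series expansion rather than with whatever software was used in the study.

```python
import math

def chi2_cdf(x, df):
    """Chi-square CDF via the series for the regularised lower incomplete gamma."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    # First series term z^a * e^(-z) / Gamma(a + 1), computed in log space.
    term = math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
    total, n = term, 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (a + n)
        total += term
    return total

def flag_outliers(squared_distances, critical_value):
    """Indices of cases whose squared Mahalanobis distance exceeds the cutoff."""
    return [i for i, d2 in enumerate(squared_distances) if d2 > critical_value]

CRITICAL = 49.73  # cutoff reported in the text for df = 23, alpha = .001
tail_probability = 1.0 - chi2_cdf(CRITICAL, 23)

# Hypothetical squared distances for illustration only (NOT study data).
example_d2 = [12.4, 51.0, 30.2, 49.9]
flagged = flag_outliers(example_d2, CRITICAL)

print(round(tail_probability, 4), flagged)
```

Flagging is only the first step here: as described above, flagged cases were inspected individually before any exclusion.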

It had been planned to exclude participants who had not passed the manipulation check (see Sect. 3.3.3): at the end of the study, the participants were asked to remember the description they had read at the beginning and to answer whether it had been explicitly mentioned that the student has an LB. However, more than half of the EG participants (56.2%) incorrectly answered that the label had not been mentioned in the description. Rather than excluding these participants, we decided to split the EG into two groups: while EG1 included the participants who had read the vignette with the LB label but answered incorrectly that the label had not been present, EG2 included the participants who correctly recalled the LB label in the vignette. This procedure only excluded three participants (1.1%) in the CG who answered incorrectly that the LB label had been present in the vignette. This approach differs from the traditional procedure of excluding participants who failed the manipulation check. Because our manipulation (see Sect. 3.3.1) was very subtle, it may be presumed that some participants would not remember whether the LB label was used in the vignette or not, especially after some time had passed since reading the vignette. By keeping these participants in the study, we were able to examine whether the LB label has an effect even when it is not consciously remembered. Moreover, had we excluded these participants, the results comparing the CG and EG2 would have been the same, so the decision to incorporate a second EG increases the amount of information that can be drawn from the study.
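The exclusion steps described in this section can be tallied as follows. The counts are taken from the text; the variable names are ours and purely illustrative.

```python
# Counts as reported in the text.
completed = 352                 # participants who completed the study
failed_comprehension = 72       # one or more comprehension questions wrong
excluded_outliers = 1           # conspicuous response pattern
excluded_cg_recall = 3          # CG participants who wrongly recalled a label

after_comprehension = completed - failed_comprehension
after_outliers = after_comprehension - excluded_outliers
final_sample = after_outliers - excluded_cg_recall

# Shares relative to the sample at the respective step, in percent.
share_comprehension = round(100 * failed_comprehension / completed, 1)
share_cg_recall = round(100 * excluded_cg_recall / after_outliers, 1)

print(after_comprehension, after_outliers, final_sample)
print(share_comprehension, share_cg_recall)
```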

Table 1 shows the composition and demographic information of the final sample. It included 276 participants (85.1% women, 11.6% men, 3.3% diverse/not specified), of which 114 (41.3%) were in the CG, 92 (33.3%) in EG1 and 71 (25.7%) in EG2. The distribution of prospective RETs and SETs did not differ between the three experimental groups (χ² = 0.02, df = 2, p = .99, Cramér’s V = .01). The average age of the participants was 22.6 years (SD = 3.8), and they had completed 3.8 (SD = 3.0) semesters in their current programme. Prospective RETs (M = 2.8, SD = 6.5) had significantly more practical experience in non-inclusive regular schools than prospective SETs (M = 1.2, SD = 3.6), and SETs (M = 7.3, SD = 13.9) had more experience in special schools than RETs (M = 0.1, SD = 0.6). Inferential statistics can be found in an electronic supplement.
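For readers who want to see how the reported effect size relates to the test statistic, the following sketch computes Pearson’s chi-square for a contingency table and Cramér’s V from a chi-square value. Applied to the reported χ² = 0.02 with N = 276 in a 2 × 3 table, it reproduces V ≈ .01. The cell counts of the actual table are not given in this section, so the example table below is hypothetical.

```python
import math

def pearson_chi2(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2

def cramers_v(chi2, n, n_rows, n_cols):
    """Cramér's V for an n_rows x n_cols table with n observations."""
    return math.sqrt(chi2 / (n * min(n_rows - 1, n_cols - 1)))

# Effect size implied by the values reported in the text.
v_reported = cramers_v(0.02, 276, 2, 3)

# Hypothetical, perfectly proportional 2 x 3 table (NOT the study's counts).
chi2_example = pearson_chi2([[10, 20, 30], [20, 40, 60]])

print(round(v_reported, 2), chi2_example)
```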

Table 1 Composition and demographic information of the final sample

3.2 Procedure

After consent was given, the participants were randomly assigned to the initial CG or EG, where they had to read a vignette about a student named “L.” Immediately afterwards, the participants answered four simple content questions about the vignette to control for careful reading. After the dependent variables had been assessed, the study finished with four manipulation check questions, the collection of demographic data and open feedback. With the exception of the open feedback, items could not be skipped.

3.3 Material

3.3.1 Independent variables/ manipulation

The participants were presented with a vignette describing a student with learning difficulties. The student description was adapted from Lanfranchi and Jenny (2005), and reviewed by three special education teachers working in schools for social validity. Information about L.’s gender and other information that might potentially influence the participants’ answers, such as support from family or learning motivation, was not given. With the exception of the LB diagnosis, which was added in the former EG, the vignette read by the CG and (EG) participants was as follows:

L. is ten years old (and has a learning disability). A few weeks ago, L.’s fourth year of school started. Since the first grade, L. has shown learning and performance problems in reading, writing, and mathematics which have increased over time. L. works very slowly, is quickly overwhelmed and appears helpless. Once taught, L. struggles to remember things in the long term. For example, L. is unable to remember calculation methods that the teacher had practised intensively with L. the day before. L. calculates by counting in the number range up to 100 and often uses fingers. Subtraction tasks are particularly difficult for L. In addition, L. reads practised words very slowly and often by stringing letters together. L. also makes a lot of mistakes when writing.

In addition to the manipulation of the vignette, the instruction that the participants read was manipulated in the CG and (EG):

You should now work on a few tasks. In this context, imagine that you are the class teacher in the class of (the learning disabled child) L.

3.3.2 Comprehension questions and dependent variables

Comprehension questions:

To exclude participants who had not read the vignette carefully, we asked the following four questions. The first question asked how old L. is; answers between nine and eleven years were accepted as correct. The next three questions were single choice items, with the correct answer being underlined. Question two asked about the subjects in which L. has difficulties (in mathematics/in language arts/in mathematics and language arts/in science), and question three asked which of two statements was true: L. has only recently started having learning and performance problems, or L. has had learning and performance problems since the first grade. The last question asked which tasks L. finds the most difficult (addition tasks/subtraction tasks/L. does not have problems in mathematics).
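The screening rule can be sketched as a small function. The accepted age range is taken from the text; the correct single choice answers are inferred here from the vignette in Sect. 3.3.1 and are therefore an assumption, as are all names and the data layout.

```python
# Correct options inferred from the vignette (assumption, see lead-in).
CORRECT_CHOICES = {
    "subjects": "in mathematics and language arts",
    "onset": "since the first grade",
    "hardest_tasks": "subtraction tasks",
}

def passed_comprehension(age_answer, choices):
    """True only if the age is 9 to 11 and all three choice items are correct."""
    if not 9 <= age_answer <= 11:
        return False
    return all(choices.get(item) == answer
               for item, answer in CORRECT_CHOICES.items())

# Hypothetical responses for illustration.
careful = passed_comprehension(10, dict(CORRECT_CHOICES))
careless = passed_comprehension(
    10, {**CORRECT_CHOICES, "hardest_tasks": "addition tasks"}
)

print(careful, careless)
```

A single wrong answer leads to exclusion, which matches the rule stated in Sect. 3.1.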

We collected data for four dependent variables that are relevant in the context of the teachers’ expectations.

Choice of tasks for L.:

We asked the participants to assess which type of tasks they would assign to L. in terms of their difficulty and number in a typical mathematics and language arts class, with students working on individual tasks. The four items were assessed on a 5-point Likert scale. For the difficulty of the tasks, the scale ranged from 1 (tasks with the lowest difficulty that the lowest-performing students work on) to 5 (tasks with the highest difficulty that the best-performing students work on). For the number of tasks, 1 indicated the lowest number of tasks that the lowest-performing students work on, and 5 indicated the largest number of tasks that the best-performing students work on. The procedure was repeated for the language arts class. The four items were combined into one scale (α = .84).
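As an illustration of how an internal consistency coefficient such as the α reported above is typically obtained, the following Python sketch computes Cronbach's α from item-level ratings. It is a minimal sketch only: the ratings are invented for illustration and are not data from this study, and the function name `cronbach_alpha` is hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of sum scores)
    """
    k = len(rows[0])                      # number of items
    items = list(zip(*rows))              # transpose: one tuple per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Invented ratings (NOT study data): five respondents, four items
# (difficulty and number of tasks in mathematics and language arts).
toy_ratings = [
    [1, 2, 1, 2],
    [2, 2, 2, 3],
    [3, 3, 2, 3],
    [4, 4, 3, 4],
    [5, 4, 4, 5],
]

alpha = cronbach_alpha(toy_ratings)
print(round(alpha, 3))
```

The sketch uses sample variances (denominator n − 1) throughout; because the same denominator appears in the numerator and the denominator of the ratio, population variances would yield the same coefficient.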

Fostering of wellbeing vs. performance for L.:

The participants were instructed that the academic performance and wellbeing of students can be defined as different goals in class. Next, they were asked which goals they pursue to support L. The item was assessed on a 6-point Likert scale, with 1 indicating exclusive fostering of wellbeing and 6 indicating exclusive fostering of academic performance.

Track recommendation for the secondary school for L.:

In the German school system, students are segregated into different types of secondary schools relatively early. In most federal states, this takes place after four years of primary school. For the transition, class teachers give a (in some federal states binding) track recommendation at the end of primary school. Traditionally, the regular secondary school system has been three-tiered, consisting of basic, intermediate, and academic track schools. In recent years, the number of comprehensive schools has been growing, with these schools replacing basic and intermediate track schools. However, many comprehensive schools also offer academic track education and thus combine all the levels of secondary education. Apart from the regular education system, special schools also exist (Autorengruppe Bildungsberichterstattung, 2020). In our study, the participants were informed that as the class teacher, they must give a track recommendation for L. for secondary school. The choice was between a special, basic, intermediate, academic or comprehensive school.

Highest expected level of graduation that L. can achieve:

Traditionally, every secondary school track leads to a different graduation level which can be hierarchically ordered. The participants were asked to estimate the highest graduation level that L. would be able to achieve. The item was assessed on a 6-point scale, with 1 indicating no graduation, 2 indicating the graduation level that children who were classified as learning disabled can legally achieve, 3 representing the basic track graduation, 4 representing the intermediate track graduation, 5 indicating the lower academic track graduation, and 6 meaning the highest academic track graduation. Both academic track graduations are a condition for enrolling in academic tertiary education. The lower academic track graduation allows a student to apply to universities of applied sciences, and the highest academic track graduation allows a student to apply to all types of universities (Autorengruppe Bildungsberichterstattung, 2020).

3.3.3 Manipulation check

Participants were asked to remember the vignette presented at the beginning of the study and to answer if they think that L. has an LB (question 1) and whether it had been mentioned in the vignette that L. has an LB (question 3). These two questions had to be answered with yes or no. For both questions, a confidence rating (questions 2 and 4) was requested on a 5-point Likert scale ranging from 1 (very unsure) to 5 (very sure).

3.3.4 Statistical analyses

ANOVAs using Type III Sum of Squares were conducted for all the dependent variables (except for the track recommendation). Levene tests showed that the assumption of homogeneity of variances was not rejected for any of these variables. For the track recommendation, binomial logistic regressions were carried out. This is because although educational tracks in Germany can generally be ranked hierarchically according to their academic requirements, the above-mentioned comprehensive school cannot be positioned anywhere in this order, as it is open to students of all academic abilities.

Because there were differences between the duration of practical experiences in non-inclusive regular schools and special schools between the prospective RETs and SETs, additional ANCOVAs and logistic regressions were conducted, controlling for practical experiences. However, as the ANCOVAs did not alter the overall results and the distribution of both prospective RETs and SETs was spread equally over the three experimental groups, only the ANOVA results without the practical experiences as a covariate are reported.

Analyses were conducted with R version 4.1.1 (R Core Team, 2021), in RStudio version 2022.7.1.554 (RStudio Team, 2022). We used the afex package version 1.0-1 (Singmann et al., 2021), and the tidyverse package version 1.2.1 (Wickham et al., 2019).

4 Results

The mean values and standard deviations for the fostering of L., the choice of tasks for L., and the highest expected graduation level for L. are presented in Table 2. As assumed, most of the expectations for future academic performance were below average. Table 3 shows the percentages of the track recommendation for each type of school, and Table 4 provides the answers to the first manipulation check question. In Table 5, the intercorrelations of the demographic and dependent variables across all experimental groups are provided. The intercorrelations separated by the experimental group are provided in the electronic supplement.

Table 2 Descriptive statistics (means and standard deviations) for fostering of L., choice of tasks for L., and highest expected graduation level
Table 3 Percentage table for the track recommendation
Table 4 Percentage table for manipulation check question 1: Do you think that L. has a learning disability?
Table 5 Intercorrelations of the dependent and demographic variables

Table 6 presents the results of the ANOVAs with the choice of tasks, the fostering and the highest expected graduation level as dependent variables. The results of the binomial logistic regressions for the track recommendation are given in Table 7.

Table 6 Results of ANOVAs for fostering, choice of tasks, and highest expected graduation level
Table 7 Results of logistic regressions for track recommendation and manipulation check question 1

Choice of tasks for L.:

There was no effect of the experimental group, F(2,270) = 0.30, p = .742, ηp2 = .002, or the study programme, F(1,270) = 3.63, p = .058, ηp2 = .013. However, we found a medium-sized significant interaction effect of the two variables, F(2,270) = 3.91, p = .021, ηp2 = .028. Post-hoc analyses showed that only in the CG did prospective RETs (M = 2.33, SD = 0.61) assign significantly more and more difficult tasks to L. than prospective SETs (M = 1.94, SD = 0.65; p = .003). In both EG1 (p = .316) and EG2 (p = .145), SETs and RETs did not differ in their choice of tasks.
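For readers who wish to check the reported effect sizes, partial eta squared can be recovered from an F value and its degrees of freedom via ηp2 = (F × df_effect) / (F × df_effect + df_error). The following Python sketch applies this standard identity to the interaction effect reported above; the function name is hypothetical and the snippet is an illustration, not the analysis script used in the study.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic.

    Uses the identity eta_p^2 = SS_effect / (SS_effect + SS_error),
    rewritten in terms of F = (SS_effect / df_effect) / (SS_error / df_error).
    """
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)


# Interaction of experimental group and study programme for the choice of tasks,
# F(2,270) = 3.91 as reported in the text.
print(round(partial_eta_squared(3.91, 2, 270), 3))  # 0.028
```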

Fostering of wellbeing vs. academic performance for L.:

The analysis revealed a small main effect of the study programme for the fostering of wellbeing vs. academic performance for L., F(1,270) = 3.99, p = .047, ηp2 = .015, with prospective RETs (M = 3.51, SD = 0.65) supporting the academic performance more than prospective SETs (M = 3.36, SD = 0.67), who foster the wellbeing more. There was no main effect of the experimental group, F(2,270) = 0.27, p = .736, ηp2 = .002, or interaction of the two variables, F(2,270) = 1.39, p = .198, ηp2 = .012.

Track recommendation for the secondary school for L.:

Two logistic regressions were performed to examine whether the experimental group and the study programme influenced the track recommendation for L. In the first logistic regression, we tested the frequency of the recommendation for a special school vs. any kind of regular school. Therefore, all types of regular secondary schools were grouped as one. The binomial logistic regression model was not statistically significant, χ²(5) = 7.27, p = .201, Nagelkerke R² = .035. With the CG and prospective RETs set as the respective reference groups, neither the experimental group (p = .144 for EG1, p = .508 for EG2) nor the study programme (p = .415) nor any interaction (p = .359 for EG1*SETs, p = .189 for EG2*SETs) was a significant predictor of the track recommendation. In the second regression, we differentiated further between the types of regular secondary schools. The dependent variable was split into three groups: special school vs. basic/ intermediate track vs. comprehensive/ academic track (with academic track chosen by only one participant). This logistic regression model was not significant either, χ²(5) = 5.42, p = .366, Nagelkerke R² = .044. The experimental group (p = .626 for EG1, p = .406 for EG2), the study programme (p = .122), and the interactions (p = .351 for EG1*SETs, p = .541 for EG2*SETs) were not significant predictors of the track recommendation.
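To make the link between the model χ² and the Nagelkerke R² values more concrete, the following Python sketch uses the standard definition of Nagelkerke's R² (the Cox and Snell R² rescaled by its maximum attainable value). How the software used in the study computed the coefficient is not stated in the text, so this definition is an assumption. The counts in the example are invented and are not study data; both function names are hypothetical.

```python
import math


def null_log_likelihood(n_events, n_total):
    """Log-likelihood of an intercept-only binary logistic model."""
    p = n_events / n_total
    return n_events * math.log(p) + (n_total - n_events) * math.log(1 - p)


def nagelkerke_r2(model_chi2, ll_null, n_total):
    """Nagelkerke R^2 from the likelihood-ratio chi-square of a fitted model.

    model_chi2 = 2 * (LL_model - LL_null)
    Cox & Snell R^2 = 1 - exp(-model_chi2 / n)
    Maximum of Cox & Snell R^2 = 1 - exp(2 * LL_null / n)
    """
    cox_snell = 1 - math.exp(-model_chi2 / n_total)
    max_cox_snell = 1 - math.exp(2 * ll_null / n_total)
    return cox_snell / max_cox_snell


# Invented example (NOT study data): 100 cases, 50 events, model chi-square of 10.
ll0 = null_log_likelihood(50, 100)
print(round(nagelkerke_r2(10.0, ll0, 100), 3))
```

Because the rescaling depends on the null log-likelihood, the same χ² can correspond to different Nagelkerke R² values when the outcome categories are distributed differently.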

Highest expected graduation level that L. can achieve:

For the highest expected graduation level, no effect of the experimental group, F(2,270) = 1.60, p = .203, ηp2 = .012, or interaction of the experimental group and study programme was found, F(2,270) = 0.10, p = .908, ηp2 = .001. However, a medium-sized main effect of the study programme was detected, F(1,270) = 13.42, p < .001, ηp2 = .047, showing that prospective SETs (M = 3.45, SD = 1.04) expect a significantly lower graduation level for L. than prospective RETs (M = 4.02, SD = 1.26).

Figures for the choice of tasks, the fostering and the highest expected graduation level are shown in the electronic supplement.

Further analyses:

In the first question of the manipulation check, participants were asked to judge whether L. has an LB. A binomial logistic regression was performed to examine whether the experimental group and the study programme influenced this judgement. The regression model was significant, χ²(5) = 19.36, p = .002, Nagelkerke R² = .101. With the CG and prospective RETs set as reference groups, there was a significant effect for EG1 (p = .007) and EG2 (p = .002). We also found a significant effect of the study programme (p = .020). This finding reveals that compared with the CG (64.9%), more participants in EG1 (81.3%) and EG2 (85.9%) believed that L. has an LB. Also, more prospective SETs (83.7%) than RETs (72.1%) answered that L. has an LB (see Table 7). There was no difference in how confident participants were about their judgement depending on the experimental group, F(2,270) = 2.69, p = .219, ηp2 = .011, the study programme, F(1,270) = 1.09, p = .266, ηp2 = .005, or the interaction of the two variables, F(2,270) = 1.37, p = .462, ηp2 = .006.

5 Discussion

5.1 Summary of the results

Based on the results of studies that found adverse effects of the LD label on teachers’ performance expectations, we aimed to answer the research questions of whether (1) these adverse effects can also be found for the German LB label, whether (2) SETs and RETs differ in their performance expectations, and whether (3) the effects of the LB label differ between RETs and SETs. For the first question, we hypothesised that the LB label has a negative effect on teachers’ short-term and long-term performance expectations for a fictitious student. For the second question, we hypothesised that SETs have lower expectations than RETs. Moreover, for the third question, we hypothesised that the adverse effects of the LB label on the performance expectations are especially strong for SETs.

Contrary to experimental studies using the LD label, our study did not reveal any main effects of the LB label on the performance expectations of prospective teachers. The LB label did not influence the choice of tasks assigned to the fictitious student, the fostering of the student, the highest expected graduation level or the track recommendation for the secondary school of the student.

For the second research question, two main effects of the study programme were found: in comparison with RETs, SETs foster less performance and more wellbeing, and expect a lower graduation level for the fictitious student.

For the third research question, we found one interaction effect indicating that RETs gave a higher number of and more difficult tasks than SETs in the CG, but made an equal choice in both EGs. Hence, the hypothesised differential effect of the LB label between SETs and RETs was not found.

Even though there was no main effect of the LB label for any of the dependent variables, our study reveals that the label does influence whether participants assume that the fictitious student has an LB: participants who read the vignette with the LB label stated significantly more often that the student has an LB. This effect also occurred when participants stated that they did not recall having seen the LB label in the vignette, indicating a subconscious effect of the label. Furthermore, we found a main effect of the study programme: compared with RETs, prospective SETs more frequently stated that the student has an LB.

5.2 The role of the LB label

The negative impact of the LD label on teachers’ expectations could not be replicated for the German LB label in a sample of prospective RETs and SETs. The negative effects of the LD label in English-speaking countries might not just be influenced by the label itself, but also by the culture and school system.

In the following, we discuss five possible reasons why we failed to replicate the known negative effects, starting with the difference in the meaning of an LB and an LD. While the LD and LB labels share the defining core feature of low academic achievement, they differ in their diagnostic definition. This is particularly pronounced when the meaning of the German LB is compared with what is defined as an LD in the USA. Therefore, we cannot rule out that the different meanings of the two diagnoses are the reason why we did not find a negative effect of the LB label on the teachers’ expectations.

Secondly, the severity of the learning problems with which the fictitious student was described in the vignette may have influenced the results. Experimental studies that have used vignettes to examine the effects of the LD label manipulated not only the presence of the label but also the description of the student’s academic ability and the effort that the student puts into their work (e.g. Clark, 1997; Woodcock & Vialle, 2010, 2011; Woodcock, 2014). Even though the label had a main effect on the expectation of future failure in these studies, the discrepancy between the labelled and the non-labelled counterpart was the largest when the student was described as of high ability and demonstrating high effort (Clark, 1997; Woodcock & Vialle, 2010), or when the student was described as of high ability and low effort (Woodcock, 2014). While the role of effort seems inconclusive and no information about the effort was given in the present study, it appears that ability may play a moderating role in the effects of the label. In this study, the student’s academic problems were described as being rather severe, which may have led to participants thinking that the student’s ability was low. However, the label may reveal its negative effects to a greater extent when the described learning problems are less severe. Referring to attribution theory (e.g. Weiner, 1979, 2010), a strongly pronounced problem description may facilitate an attribution to a stable and uncontrollable cause for the issues, while a less pronounced description may facilitate an attribution to a less stable cause.

Thirdly, while in the present study it was mentioned that the student has an LB, the previously cited studies did not explicitly mention the LD diagnosis, but operationalised the LD by mentioning that the student is in receipt of special education support (see also Vlachou et al., 2014; Woodcock, 2013; Woodcock & Hitches, 2017; Woodcock & Vialle, 2016) or is in a special education class (Thelen et al., 2003). This different way of operationalising the LD may influence teachers’ expectations. Specifically, participants may think that the problems are more severe if they persist even though the student is already receiving special education support. However, one study that operationalised the LD label as in the present study did discover a negative effect between labelled and unlabelled students (Minner & Prater, 1984).

Fourthly, our study used a different approach to measuring expectations than other studies, in which participants were asked to rate the likelihood of future failure for the student (Clark, 1997; Vlachou et al., 2014; Woodcock, 2013; Woodcock & Hitches, 2017; Woodcock & Vialle, 2010, 2011, 2016). While a future failure in an exam, for example, is a relatively short-term expectation, it might not be possible to transfer these expectations much further into the future, such as to the expected graduation level or a different general goal for the fostering of the student.

Finally, it is possible that the participants answered in a socially desirable way (Lüke & Grosche, 2018). In their educational programme, prospective teachers might learn about labelling theory and have the possible negative consequences attached to labels in mind (Goffman, 1963; Link & Phelan, 2001). This might have made the prospective teachers aware of the LB label in the present study and led them to answer in a socially desirable way. However, two indicators speak against this assumption: firstly, while many participants criticised in the open feedback that the information provided about the fictitious student was insufficient to make good assessments, only one participant criticised the use of the LB label. Secondly, the result that participants who read the vignette with the LB label stated significantly more often that the student has an LB speaks against a critical attitude towards this diagnostic term. Without such a critical attitude, it is not necessary to answer the other questions in a socially desirable way. Even though the LB label did not affect teachers’ expectations in the present study, the latter result supports the assumption that the presence of the diagnostic label facilitates a medical model of a learning disability. That is, the student’s problems are seen as due to the student’s mental or physical impairment and thus inherent within the student, while factors such as the social structures that contribute to the situation are ignored (Ho, 2004).

5.3 The role of the education profession

While it is believed that SETs have lower expectations for their students with an LD (Heward, 2003), research on the differential effect of the LD label on the expectations of RETs compared with SETs is scarce, and the results are inconclusive. Vlachou et al. (2014) found that RETs in Greece have a higher expectation of future failure for both labelled and non-labelled students. Our results do not replicate this effect: in our study, prospective SETs had lower performance expectations than prospective RETs, irrespective of whether or not students with learning problems were labelled. While one reason for the different outcome could be the differences in the education of SETs in Greece versus in Germany (Tsibidaki et al., 2020), other factors, such as the fact that the participants in the present study were prospective teachers, might also have contributed to the discrepancies.

Differences in practical experience are a factor that might have contributed to SETs holding a lower graduation expectation. With studies indicating either no connection (Agirdag et al., 2013; Barriga et al., 2019) or a negative connection (Riegle-Crumb & Humphries, 2012; Whitley, 2010) between the practical teaching experience and teachers’ expectations, it seems unlikely that the relation between the two variables is a positive one (Wang et al., 2018). In the present study, prospective SETs had significantly more practical experience in special schools than prospective RETs, and might already have lowered their expectations concerning academic performance. This hypothesis is supported by a significant negative correlation (r = −.183, p = .002) between the expected graduation level and the practical experience in special schools. No such correlation was found between the expected graduation level and experience in inclusive regular schools (r = −.019, p = .755) or non-inclusive regular schools (r = −.066, p = .273).

When compared with RETs, SETs also favoured fostering wellbeing rather than academic performance. An explanation for this finding may also be the practical experience in special schools, possibly evoking teaching goals for students with learning difficulties that focus more on wellbeing than academic performance (Heward, 2003).

To summarise, while we did not find that the presence of the LB label decreases teachers’ performance expectations, prospective SETs sometimes seem to have lower performance expectations than prospective RETs.

A somewhat unexpected result of the present study is that significantly more prospective SETs than RETs believed that the fictitious student has an LB. This contrasts with previous findings. For instance, Woolfson et al. (2007) demonstrated that SETs attributed less stability to learners’ difficulties and saw a higher possibility of change than RETs. Also, Vlachou et al. (2014) found that compared with RETs, the responses of SETs conveyed less stable and uncontrollable attributions for failure. How can the different results in this study be explained? One reason might be that in Germany, SETs are responsible for diagnosing special educational needs. Therefore, the criteria of the LB diagnosis may have been more salient to prospective SETs when reading the vignette. The description of the fictitious student in the present study fulfils all the definition criteria for an LB (Grünke & Grosche, 2014), which may have facilitated the assumption that an LB is present. Prospective RETs are likely to have very little experience in this process. On the one hand, this knowledge about the diagnostic process may, as mentioned, facilitate the recognition of the indicators that speak for an LB. On the other hand, one can argue that SETs in particular are aware that a diagnosis of special educational needs in learning should also involve criteria other than academic problems, which were not described in the present study. Therefore, we had expected a more critical attitude towards the diagnosis of an LB based on the insufficient information provided.

5.4 Limitations and directions for future research

At least five caveats need to be considered. Firstly, although using vignettes is a practical approach to measuring teachers’ reactions to students, vignettes typically have a high internal validity but lack external validity (Aguinis & Bradley, 2014). Further studies evaluating the role of the LB label should focus on increasing the external validity.

A second caveat is that the manipulation in the present study was very subtle, mentioning the LB label only twice. Therefore, more than half of the participants in the initial EG did not recall the presence of the label in the vignette. Even though this situation allowed us to examine the effects of the label when it is not consciously remembered, it led to experimental groups with an unequal number of participants. Further studies should investigate different ways of manipulating the presence of the label, with a shorter vignette, for example.

Thirdly, the description of the typical definition criteria of an LB may have been so pronounced that it clearly fulfilled all the criteria (Grünke & Grosche, 2014), possibly leaving little room for other attributions and facilitating the activation and retention of stereotypes (Fiske et al., 2018; Fiske & Neuberg, 1990). Further research should investigate whether the effect of the LB label depends on the severity of the described problems in school.

Fourthly, some dependent variables consist only of one manifest item with unclear validity and reliability. Future research should focus on fewer dependent variables that are operationalised more extensively.

Finally, our participants were prospective teachers. Subsequent studies should concentrate on in-service teachers. In this way, one could further explore the role of teaching experience in teachers’ expectations. Specifically, investigations could look at whether experience in special schools, compared with experience in (inclusive) regular schools, influences teachers’ expectations differently for labelled and non-labelled students. This would also allow us to examine whether the experience of teaching in a particular type of school has a stronger influence on teachers’ expectations than the profession.

5.5 Practical relevance in the context of inclusion

We began our article with the international commitment to an inclusive school system that has brought many students with special educational needs into inclusive schools. To provide the appropriate support to students with special educational needs in inclusive schools, resources (like teachers, time, and money) are required. In many school systems, such resources are only activated with an official diagnosis of support needs. The diagnosis functions as a label and carries the risk of stigmatisation – a controversy known as the Dilemma of Difference (Hornby, 2015; Norwich, 2008, 2014). Teachers are crucial for the implementation of inclusion in schools (Avramidis & Norwich, 2002; Dignath et al., 2022), but like anyone else, they are at risk of having stereotypical expectations of students (Wang et al., 2018). The results of our study can be seen in both a positive and a negative light in the context of inclusion. They are negative because the presence of the label makes it more likely that prospective teachers believe that a learning disability is present. It seems likely that a diagnosis made by someone else is not questioned, but accepted as a given fact. It is positive that this assumption does not seem to change the performance expectations that prospective teachers have for students with academic performance problems. Our results might lead to the assumption that the LB label does not carry the negative risks attached to disability labels (Link & Phelan, 2001). However, the facilitated assumption of a learning disability caused by the LB label may manifest itself in other aspects that were not examined in this study, such as teachers’ emotions. Studies have shown that the LD label evokes more pity (Clark, 1997), more sympathy, and less frustration (Woodcock & Vialle, 2010, 2011). These emotions are likely to be perceived by the students (Frenzel et al., 2018; Keller & Becker, 2021) and might influence the inclusion of students with special educational needs into the class community.