1 Introduction

Teachers’ evaluations can be biased by students’ characteristics, such as their ethnic background (e.g., Harber et al., 2012; Holder & Kessels, 2017; Nishen & Kessels, 2022). Most studies indicate a negative bias, that is, a bias to the disadvantage of ethnic minority students. Teachers tend to judge these students’ abilities lower and often have lower expectations for ethnic minority students than for ethnic majority students (e.g., Bonefeld et al., 2020; Glock & Böhmer, 2018; Tenenbaum & Ruck, 2007). Surprisingly, when it comes to feedback, several studies have found a bias in the other direction: Ethnic minority students received more positive and less critical feedback than ethnic majority students (e.g., Harber 1998; Harber et al., 2012). However, feedback that sounds positive does not necessarily have positive consequences for students. For example, praising students’ abilities can strengthen the belief that abilities are fixed and thus dampen enthusiasm for learning, and overly positive feedback can set unrealistic, unattainable standards (e.g., Brummelman et al., 2014; Mueller & Dweck, 1998). Such dysfunctional feedback might be particularly detrimental to minority students, who tend to be very sensitive to teachers’ expectations and feedback (e.g., Cohen et al., 1999; Major et al., 2016).

The aim of this pilot study was to investigate whether teachers provide more dysfunctional feedback to students from immigrant backgrounds than to students from non-immigrant backgrounds. We developed a scenario-based test that measured content-related aspects of dysfunctional feedback (e.g., whether feedback addressed students’ abilities) as well as style-related aspects of dysfunctional feedback (e.g., whether feedback contained inflated praise). The pilot study was followed up with an expert survey to evaluate the feedback test.

1.1 Feedback that is functional or dysfunctional for supporting learners’ motivation

Teachers’ feedback can have a substantial impact on students’ motivation and learning (e.g., Hattie & Timperley, 2007). In our research, we focus on positively worded feedback – that is, feedback that expresses teachers’ intentions to praise, encourage, or comfort a student. Research has demonstrated that such feedback can have ambivalent consequences for students (e.g., Meyer, 1992; Mueller & Dweck, 1998). When feedback has positive consequences for students’ learning-related beliefs and motivation, we define it as functional. In contrast, feedback that has negative consequences for learning-related beliefs and motivation is defined as dysfunctional (for a similar classification, see, e.g., Baadte, 2019).

1.1.1 Functional feedback

Which feedback is functional for supporting learners’ motivation? We adapted three central criteria from Henderlong and Lepper (2002): First, functional feedback is sincere and appropriate. Feedback works best when the feedback recipient perceives the feedback sender as honest, when the feedback fits the specific situation, and when feedback is consistent with the recipient’s own perceptions (Henderlong & Lepper, 2002). Second, functional feedback encourages attribution to controllable causes and supports a growth mindset. Explaining one’s performance with controllable factors such as effort or strategies often goes along with higher motivation, because it offers possibilities to improve future learning processes (e.g., Weiner 1985). Effective feedback should therefore address, for example, the learning process or self-regulation strategies (Hattie & Timperley, 2007). Such feedback can also be regarded as helpful because it fosters a growth mindset, that is, the belief that abilities are malleable (Dweck & Leggett, 1988; Dweck & Yeager, 2019). Third, functional feedback communicates attainable goals and expectations. It should convey standards that are high but realistic (Henderlong & Lepper, 2002).

1.1.2 Dysfunctional feedback

How can feedback be dysfunctional for learners’ motivation, even when it is positively worded? Our central criteria for dysfunctional feedback complement those for functional feedback: First, dysfunctional feedback is insincere or inappropriate. For example, feedback can be perceived as inappropriate when it is very general or overly positive (Henderlong & Lepper, 2002). Second, dysfunctional feedback encourages attribution to uncontrollable causes and supports a fixed mindset. Explaining one’s performance with uncontrollable, internal factors such as one’s fundamental abilities can reduce learners’ motivation as it often goes along with a fixed mindset (i.e., the belief that abilities are fixed and unchangeable; Dweck & Leggett, 1988; Dweck & Yeager, 2019). That is also a reason why feedback on the self-level has been shown to be less effective than, for example, feedback on the process-level (Hattie & Timperley, 2007). Third, feedback is dysfunctional when it communicates low or unrealistic goals and expectations (Henderlong & Lepper, 2002).

Based on these criteria, we describe four examples of dysfunctional feedback: ability feedback (e.g., Rattan et al., 2012), effort feedback (e.g., Amemiya & Wang, 2018), inflated praise (e.g., Brummelman et al., 2014), and simplified language (e.g., Goodman & Freeman, 1993). The first two examples are matters of content (i.e., what does the teacher say?), while the third and fourth examples are matters of style (i.e., how does the teacher say it?).

Ability feedback refers to feedback based on students’ traits or characteristics, such as “You are really smart!” when students are successful, or “Never mind, not everybody can be good at this” when students struggle (e.g., Lou & Noels 2020a; Mueller & Dweck, 1998; Reavis et al., 2018). Ability feedback can boost students’ self-efficacy in the short run (e.g., Schunk 1983) but impairs learning in the long run because it can strengthen fixed mindsets and, in turn, reduce motivation and lead to unfavorable learning behavior (Dweck & Leggett, 1988; Dweck & Yeager, 2019). Students who have been praised for their abilities strive to demonstrate their abilities, and shy away from mistakes or challenges (Brummelman et al., 2014; Haimovitz & Corpus, 2011; Kamins & Dweck, 1999). Students who have been comforted for their low abilities often develop feelings of helplessness and lower self-esteem (Kamins & Dweck, 1999; Lou & Noels, 2020a; Rattan et al., 2012).

Effort feedback means feedback that refers to the effort students have invested or not invested in their learning, such as “Your effort has paid off!” when students were successful, or “If you try harder next time, you will succeed!” when students struggle. Such feedback is generally considered helpful because – as described above – it is process-oriented and provides students with a concrete possibility to improve their learning (e.g., Haimovitz & Corpus, 2011; Rissanen et al., 2019; Weiner, 1985). However, effort feedback affects students differently depending on the situation. Dweck and Yeager (2019) argue that praising effort is helpful when students’ effort was productive, and motivating students to try harder is helpful when students failed due to a lack of effort. In all other cases, effort feedback might not have the anticipated effects. For example, when students struggle because they lack good strategies or need task-related support, it is insufficient to tell them to “just try harder”. Instead, teachers need to figure out the reason for the problem and provide targeted individual support (see also Buttrick, 2020; Goegan et al., 2021; Pelletier et al., 2020). Praising effort can also be dysfunctional when students succeed without having invested much effort, or when they think the teacher had no opportunity to actually judge how hard they worked (see Henderlong & Lepper, 2002). In these cases, praising effort may seem effusive and insincere, and can even be interpreted as a sign that the teacher perceives the student’s ability to be low (e.g., Amemiya & Wang, 2018; Meyer, 1992). An explanation for this adverse effect is that students, especially adolescents, tend to regard ability and effort as opposites. From this perspective, people with high abilities do not need to put in much effort to perform well, whereas people with low abilities need to put in effort to compensate for their poor capacities (e.g., Nicholls, 1978). Inappropriately praising effort can therefore leave students with the impression that the teacher believes their level of ability is low (see also Henderlong & Lepper, 2002; Xing et al., 2018).

If feedback contains inflated praise, as indicated by overly positive words such as “incredible” or “fantastic”, it can set standards that appear unrealistically high and put pressure on students to continue to meet these exceptionally high standards (e.g., Baumeister et al., 1990; Brummelman et al., 2014). Moreover, inflated praise may be perceived as inappropriate or untrue, and is likely to be inconsistent with students’ beliefs about themselves (Henderlong & Lepper, 2002). Consequently, inflated praise may be rejected by students and can even reduce their self-esteem in the long run (Brummelman et al., 2017).

Feedback in simple language, such as very simple words and short sentences, can also have unfavorable effects (e.g., Goodman & Freeman 1993). In the context of the present study, teachers may assume that students from immigrant backgrounds are less fluent in the language of instruction, and thus use simplified language in an attempt to increase comprehensibility and reduce students’ cognitive load (see Crossley et al., 2012; Harber et al., 2019). Certainly, this assumption can be correct, given that poor language proficiency is a major barrier to the education and integration of immigrant students (e.g., de Paiva Lareiro 2019; Schipolowski et al., 2021). However, such simplification can create two problems: First, simple language can prevent students from making progress in learning the local language because their language proficiency develops more quickly when teachers use authentic language (e.g., Goodman & Freeman 1993). Second, automatically simplifying one’s language when talking to a student from an immigrant background may communicate low expectations – similar to when students are praised for simple tasks (Meyer, 1992).

1.1.3 Situational and personal influences – when and for whom is feedback dysfunctional?

It is important to emphasize that whether feedback is functional or dysfunctional depends on the situation and on the person that receives the feedback. With respect to situational influences, even small situational cues can determine how people interpret feedback (e.g., social situation vs. achievement situation, see Binser & Försterling 2004). In the present study, the distinction between situations of success and situations of failure, and between situations in which learners have or have not invested effort is particularly relevant. For example, as described above, effort feedback can be very helpful when given in situations when students have successfully invested effort or failed due to a lack of effort, but the same feedback can be demotivating when students have unsuccessfully invested effort or failed despite strong effort.

With respect to personal influences, research has shown that interpretations of feedback depend on learners’ individual preconditions and beliefs (e.g., whether they believe that negative feedback is related to liking, or whether they have low or high self-esteem; Binser & Försterling, 2004; Brummelman et al., 2014). Here, we focus in particular on whether learners are from immigrant or non-immigrant backgrounds. Overall, research indicates that ethnic minority students are more vulnerable to feedback effects than ethnic majority students (e.g., Major et al., 2016; McKown & Weinstein, 2002). Several studies have demonstrated that students who are stigmatized due to their ethnic background are more sensitive to teachers’ expectations than ethnic majority students (Hinnant et al., 2009; Jussim & Harber, 2005; Wang et al., 2018). In particular, underestimation of ethnic minority students often goes along with factual underperformance (McKown & Weinstein, 2002). One explanation for these strong self-fulfilling effects is that individuals are aware of negative stereotypes that exist about their social group, feel threatened, and confirm the stereotype by performing below their actual ability level (Steele & Aronson, 1995). Recent studies imply that certain teaching practices, such as linking success to abilities, increase stereotype threat (e.g., Bian et al., 2018; Canning et al., 2020; Muenks et al., 2020). Stereotyped students also tend to be susceptible to adverse consequences of positive feedback: When Black students received positive feedback from a White person, they assumed the feedback was biased by the person’s motivation to behave positively towards minorities, and reacted with feelings of threat and uncertainty (Major et al., 2016). In light of these findings, it seems particularly important that teachers provide accurate, functional feedback to ethnic minority students.

1.2 Teachers’ feedback to students from immigrant and non-immigrant backgrounds

Immigrant status often goes along with disadvantages in the new country’s educational system. In Germany, for example, students from immigrant backgrounds – that is, students who were born outside Germany or who have at least one parent who was born abroad – attend lower-track schools, exhibit poorer school performance, and receive lower school-leaving qualifications than students from non-immigrant backgrounds (e.g., de Paiva Lareiro 2019; Schipolowski et al., 2021; Schleicher, 2019). These discrepancies are particularly problematic when students’ trajectories do not align with their actual abilities but result from unfavorable expectations or decisions (e.g., Schindler & Lörz 2012; Weiss & Steininger, 2013). In this regard, teachers play a major role, as they accompany and influence students’ development.

Prior research has shown that teachers, rather than compensating for immigrant students’ disadvantages, often contribute to inequality in the classroom as their judgments and decisions regarding students from immigrant and non-immigrant backgrounds differ systematically (e.g., Bonefeld et al., 2017; Decristan et al., 2019; Lorenz et al., 2016). An explanation for such differences is that when people interact with members of certain ethnic groups, stereotypes are activated that can lead to biased judgments or behavior (e.g., Fiske & Neuberg 1990). For example, people in Germany often have negative stereotypes about people from Arab countries (e.g., Morocco, Syria, Tunisia) or Central Asia (e.g., Afghanistan; see Froehlich & Schulte 2019; Kotzur et al., 2019). Such stereotypes can then lead to biased evaluations or biased behavior towards members of the stereotyped group (e.g., Cuddy et al., 2007).

1.2.1 Stereotypical bias in teachers’ evaluations and feedback

Regarding teachers’ evaluations, stereotypical bias differs depending on whether we consider non-communicated judgments or feedback. Non-communicated judgments are evaluations that are not communicated to the target student, whereas feedback refers to evaluations that are communicated or at least intended to be communicated to the student (see Nishen & Kessels, 2022). With respect to teachers’ non-communicated judgments, most studies point to a negative bias, that is, a bias to the disadvantage of students from immigrant backgrounds. For example, preservice teachers evaluated students’ language skills lower when they had a Turkish name as opposed to a traditionally German name (Glock, 2016; Holder & Kessels, 2017), and judged a text less positively when they believed that it was written by a student of Turkish rather than German descent (Bonefeld & Dickhäuser, 2018). In a study conducted by Tobisch and Dresel (2017), primary school teachers rated students’ abilities and learning behavior lower when the student’s name implied an immigrant background, as compared to a traditionally German name. The teachers also held lower expectations for these students’ academic development. While these findings are based on laboratory studies with fictitious students, a similar pattern has been found in fieldwork: Teachers had lower expectations of Turkish-origin students than of students from non-immigrant backgrounds, even when their test performance was equal (Lorenz et al., 2016), and assigned poorer grades to students from immigrant backgrounds than to native students (Bonefeld et al., 2017). However, teachers seem able to suppress their biased thoughts to a certain extent (Glock & Krolak-Schwerdt, 2014), or even compensate for potential disadvantages by making more favorable judgments when they evaluate students from immigrant backgrounds (e.g., Biernat & Manis, 1994; Glock & Kleen, 2019; Holder & Kessels, 2017; Nishen & Kessels, 2022).

With regard to feedback, research also points to both negative and positive biases. As an example of the former, teachers’ speech was less positive when addressing ethnic minority students (Tenenbaum & Ruck, 2007). In contrast, a series of studies conducted by Harber and colleagues (1998; 2004; 2010; 2012; 2019) demonstrated a positive bias with respect to feedback. Teachers provided more positive feedback on an essay when they believed the writer was a Black student as compared to a White student. The researchers observed this effect when ethnic background was implied by a typical name, as well as when the feedback giver and recipient faced each other in person. In a recent study conducted by Nishen and Kessels (2022), German student teachers gave overly positive feedback to a student with a Turkish name, compared to a student with a German name. Similar studies indicated that ethnic minority students received less critical feedback and were made aware of difficult tasks less frequently than ethnic majority students (Croft & Schmader, 2012; Crosby & Monin, 2007). These differences relate to stereotypes, as they have been found with regard to negatively stereotyped groups but not with regard to groups that are associated with weaker or more positive stereotypes (Croft & Schmader, 2012; Harber et al., 2012). However, even though this bias appears positive at first glance, it can carry adverse implications for minority students. This is because, although the feedback that ethnic minority students received was framed in a more positive way, it was neither more informative nor more accurate. Moreover, this feedback bias reduced the amount of constructive criticism minority students received, which might have been helpful for future learning (e.g., Croft & Schmader, 2012; Harber et al., 2010; Ruscher et al., 2010). Taken together, it can be assumed that ethnic minority students do not receive more positive feedback but more dysfunctional feedback from teachers.

1.2.2 Influences of teachers’ beliefs on dysfunctional feedback

Teachers’ instructional practices are shaped by their beliefs about teaching and learning (e.g., Schoenfeld, 1998). In the present study, we selected three kinds of beliefs and explored whether they would influence dysfunctional feedback to students from immigrant backgrounds: (a) Beliefs about the nature of abilities (growth mindset vs. fixed mindset; Dweck, 2007) were chosen as they are likely to affect the types of dysfunctional feedback that we focused on, that is, ability-based or effort-based feedback. (b) Prejudices and (c) multicultural beliefs were chosen because they are likely to affect teachers’ evaluations of students from immigrant and non-immigrant backgrounds, and are therefore often included in similar studies (e.g., Glock, 2016; Hachfeld et al., 2012). In the following, we address these beliefs in more detail.

(a) A growth mindset is defined as the belief that abilities are malleable and can be improved by effort and practice (Dweck, 2007; Dweck & Yeager, 2019). Several studies indicate that teachers’ mindsets affect how they perceive and interpret classroom situations (e.g., De Kraker-Pauw et al., 2017; Kuntze, 2012). Teachers with a fixed mindset tend to attribute performance to students’ fundamental abilities, whereas teachers with a growth mindset focus more on variable factors such as effort or strategy use (Butler, 2000; Rattan et al., 2012). Hence, a fixed mindset is associated with dysfunctional ability feedback (Jonsson & Beach, 2012; Rattan et al., 2012). In contrast, a growth mindset is associated with more functional, motivating feedback (Dweck & Yeager, 2019) – and with more positive attitudes and a better communication style in intercultural contexts (Lou & Noels, 2020b). Thus, it can be assumed that a growth mindset supports functional feedback and reduces stereotype-based feedback differences.

(b) Prejudices are defined as (usually negative) attitudes towards people based on the social group to which they belong (Allport, 1954). Prejudices include an affective as well as a cognitive component, whereby the cognitive component is often referred to as stereotype (for a discussion about prejudices vs. stereotypes, see, e.g., Devine, 1989). Another feature of prejudices is that they are mostly unjustified or incorrect. In German schools, for example, students from immigrant backgrounds are often regarded as being less motivated and showing problematic learning behavior (see Hachfeld et al., 2012), although in fact most are even more motivated and interested in learning than non-immigrant students (OECD, 2006). Research has shown that stereotypes/prejudices based on ethnicity can affect evaluations in general (e.g., Fazio & Olson, 2014; Fiske & Neuberg, 1990) and teachers’ evaluations and feedback in particular (e.g., Croft & Schmader, 2012; Froehlich et al., 2015; Glock & Böhmer, 2018). In the context of this study, we assume that teachers’ prejudices towards students from immigrant backgrounds increase stereotype-based feedback differences.

(c) Multicultural beliefs describe the view that cultural differences should be valued and positively emphasized in social interaction (e.g., Hachfeld et al., 2012). In contrast, color blindness means that cultural differences should be mostly ignored (e.g., Plaut et al., 2018). Research findings about the effects of multicultural beliefs are mixed: On the one hand, multicultural beliefs can increase positive attitudes towards people from other countries, reduce prejudices, and support teachers’ conflict management in multicultural classes (e.g., Plaut et al., 2018; Wagner et al., 2001). On the other hand, multicultural beliefs can create an illusion of fairness or lead to overly friendly behavior towards people from other countries (Plaut et al., 2018; Rattan & Ambady, 2013). In our study, both directions seemed possible: Multicultural beliefs could either reduce or increase stereotype-based feedback differences.

2 Pilot study

2.1 Hypotheses

The aim of this pilot study was to investigate whether teachers would provide more dysfunctional feedback to students from immigrant backgrounds than to non-immigrant students. Dysfunctional feedback was assessed with scenario-based tasks and included ability feedback, effort feedback, inflated praise, and simple language. We experimentally varied the students’ immigrant status by using typical German names (i.e., non-immigrant background) or typical names from Arab and Central Asian countries (i.e., an immigrant background associated with the lowest competence stereotypes in Germany; see Froehlich & Schulte, 2019). We addressed the following hypotheses:

Ability feedback hypothesis:

One kind of dysfunctional feedback is feedback that refers to students’ fundamental abilities or traits (e.g., Rattan et al., 2012). We expected teachers to provide more of such dysfunctional ability feedback to students from immigrant backgrounds than to students from non-immigrant backgrounds.

Effort feedback hypothesis:

Another kind of dysfunctional feedback is feedback that refers to students’ effort when students fail despite putting in effort or succeed without effort (e.g., Dweck & Yeager, 2019). We expected teachers to provide more of such dysfunctional effort feedback to students from immigrant backgrounds than to students from non-immigrant backgrounds.

Inflated praise hypothesis:

Inflated praise can put pressure on students and weaken their self-esteem (e.g., Brummelman et al., 2014). We expected teachers to use more inflated praise when they address students from immigrant backgrounds compared to students from non-immigrant backgrounds.

Simple language hypothesis:

Simplified language can have a negative impact on students’ language learning and competence beliefs (e.g., Goodman & Freeman, 1993). We expected teachers to use simpler language when they address students from immigrant backgrounds compared to students from non-immigrant backgrounds.

Additionally, we aimed to explore whether teachers’ growth mindset, prejudices, and multicultural beliefs would moderate feedback differences by immigrant status.

2.2 Method

2.2.1 Design

The study followed an experimental design with students’ immigrant status as the independent variable (immigrant vs. non-immigrant background) and different facets of feedback as dependent variables (ability feedback, effort feedback, functional feedback, inflated praise, and simple language). The feedback assessment was based on written scenarios, in which the teachers needed to provide feedback to a fictitious student. Six scenarios measured ability feedback, effort feedback, and functional feedback with closed-ended questions; three scenarios measured inflated praise and simple language with open-ended questions. Participants were randomly assigned to one of two test versions. Both test versions contained the same feedback scenarios in the same order; however, the names of the students in the scenarios differed. For example, participants in test version A (n = 91) read the name of a student from an immigrant background in the first scenario, whereas participants in test version B (n = 95) read the name of a student from a non-immigrant background in the same scenario. Note that the student names also varied within each test version, so that all participants worked on scenarios describing students from immigrant and non-immigrant backgrounds (see Fig. 1). As can be seen in the figure, the fictitious students were either male or female, whereby their gender was constant within each situation type. Hence, gender was not confounded with immigrant status (see limitations for a detailed discussion on gender-related aspects).
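The counterbalancing logic described above can be illustrated with a short sketch. It is not the authors' procedure: the exact allocation of names to scenarios is shown only in Fig. 1, so the simple alternating pattern used here is an assumption, and the names `build_versions` and `assign_version` are hypothetical. The sketch only reflects what the text states, namely that the two versions are mirror images of each other, that version A starts with a student from an immigrant background, and that participants are randomly assigned to a version.

```python
import random

N_SCENARIOS = 6  # number of feedback scenarios reported in the design


def build_versions(n_scenarios=N_SCENARIOS):
    """Return two mirrored test versions as lists of background flags.

    Assumption (not taken from the source): backgrounds alternate across
    scenarios in version A; version B flips every flag of version A.
    "+IB" = immigrant background, "-IB" = non-immigrant background.
    """
    version_a = ["+IB" if i % 2 == 0 else "-IB" for i in range(n_scenarios)]
    version_b = ["-IB" if flag == "+IB" else "+IB" for flag in version_a]
    return {"A": version_a, "B": version_b}


def assign_version(rng):
    """Randomly assign one participant to test version A or B."""
    return rng.choice(["A", "B"])


versions = build_versions()
rng = random.Random(2024)  # fixed seed only to make the sketch reproducible
assignments = [assign_version(rng) for _ in range(10)]
```

Because every scenario carries opposite flags in the two versions, each scenario is answered for both backgrounds across the sample, while each participant still sees students from both backgrounds.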

Fig. 1 Overview of the scenarios

Notes. m = male student, f = female student, + IB = student from immigrant background, - IB = student from non-immigrant background; open-ended questions measured inflated praise and simple language, closed-ended questions measured ability feedback, effort feedback, and functional feedback.

2.2.2 Participants

A total of 186 teachers participated in the study (72% female; age M = 43.40, SD = 10.32). The sample exceeded the desired sample size of 146, which resulted from an a priori power analysis with the software G*Power based on the small effect sizes reported by similar studies (e.g., Bonefeld et al., 2017; Holder & Kessels, 2017) and a power of 1-β = 0.90. We only included those teachers who had completed the full survey and thus showed almost no missing data. Most of the participating teachers worked at a secondary school (76%), followed by primary school (12%), comprehensive school (11%), and special needs school (1%). Their average amount of teaching experience was 14.88 years (SD = 9.58). The sample included teachers from different subject areas: languages (59%), mathematics/science (56%), social sciences (46%), arts/sport (38%), and technology (12%), with multiple answers possible as teachers often teach multiple subjects. We contacted the teachers via email and invited them to participate in a study on feedback. They were informed that the study aimed to investigate how teachers provide feedback to students, with no reference to students from immigrant backgrounds. All participants were aware that they were participating in research and took part voluntarily. The study was registered and approved by the Regional Administrative Council (Regierungspräsidium) in Freiburg, Germany.
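To make the logic of an a priori power analysis concrete, the following sketch uses a textbook normal approximation for a two-sided one-sample or paired comparison. It is not a reconstruction of the G*Power computation: the text reports only the target power of 0.90 and small expected effects, so the test family, the significance level of 0.05, and the effect size below are assumptions, and the function `required_n` is a hypothetical helper. The sketch is not expected to reproduce the reported sample size of 146.

```python
from math import ceil
from statistics import NormalDist


def required_n(effect_size, alpha=0.05, power=0.90):
    """Approximate sample size for a two-sided one-sample/paired comparison.

    Normal approximation: n = ((z_{1 - alpha/2} + z_{power}) / d) ** 2,
    where d is the standardized effect size. Exact t-based calculations
    give slightly larger values.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(((z_alpha + z_power) / effect_size) ** 2)


hypothetical_d = 0.3  # placeholder value, NOT the effect size used in the study
n_needed = required_n(hypothetical_d)
```

The approximation shows why small expected effects call for a sample of this order of magnitude: the required sample size grows with the inverse square of the effect size.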

2.2.3 Materials

2.2.3.1 Feedback scenarios

Feedback was assessed with written scenarios, which described classroom situations (ca. 60–100 words) requiring the teacher to provide feedback to a student. To address teachers from all subject areas and school types, the scenarios did not refer to a particular subject or school context. We constructed the scenarios by drawing upon evidence-based resources (e.g., Mindset Works, 2016) and in cooperation with practitioners. Based on the Growth Mindset Feedback Tool (Mindset Works, 2016), we defined three types of classroom situations: situations in which a student failed despite strong effort (effort-fail), situations in which a student succeeded easily without strong effort (without-effort-succeed), and situations in which a student succeeded with strong effort (effort-succeed). We created two scenarios for each type of situation (see Table 1). Note that a fourth situation type – failing due to no effort – was not considered in this study because the focus was on dysfunctional feedback, and we aimed to describe situations in which both ability and effort feedback would be dysfunctional. In without-effort-fail situations, however, effort-based feedback such as motivating students to try harder would have been functional, similar to the effort-succeed situations in which effort feedback was also functional. We decided to leave out this situation type so that we had just one situation type in which effort feedback was functional.

Table 1 Overview of the feedback scenarios: situation types, scenario content, and expert ratings

2.2.3.2 Manipulation of students’ immigrant status

The names of the students implied either an immigrant or non-immigrant background. As names implying a non-immigrant background, we used popular names among non-immigrant schoolchildren in Germany (Gesellschaft für deutsche Sprache e. V., 2020). These were Sophie and Marie for girls, and Maximilian and Paul for boys. As names implying an immigrant background, we used popular Arab and Central Asian names (Names.org, 2021). These were Khadijah and Safija for girls, and Omar and Hamza for boys. In addition to the students’ names, we also mentioned the country of origin: Syria, Tunisia, Afghanistan, and Morocco. For example, the descriptions in the scenarios referred to “Safija from Tunisia” or “the Tunisian girl Safija”. We added the country to ensure that the teachers associated the names, which are very unfamiliar in Germany, with the correct country, so that the same negative stereotypes would be activated. A similar procedure was used, for example, by Glock (2016), who added information about the language spoken at home and the ethnicity of the students’ friends to the mere names of the students. The countries – Syria, Tunisia, Afghanistan, and Morocco – were chosen because they represent relevant immigrant groups in Germany and, most importantly, are associated with the strongest negative stereotypes about competence in Germany (Froehlich & Schulte, 2019). In the study conducted by Froehlich and Schulte (2019), all four countries were in the same cluster, and we therefore expected no differences between the countries regarding competence stereotypes.

2.2.4 Measures

2.2.4.1 Ability feedback, effort feedback, and functional feedback

Based on the scenarios, we assessed (dysfunctional) feedback with closed-ended and open-ended questions. Content-related aspects of feedback (i.e., ability feedback, effort feedback, and functional feedback) were measured with closed-ended questions to obtain values from all participants, not just from those who happened to address abilities or effort in their feedback. Style-related aspects of dysfunctional feedback (i.e., inflated praise and simple language) were measured with open-ended questions to receive an authentic impression of how teachers would phrase their feedback. However, as answering the open-ended questions required more survey time, we used only half of the scenarios for these open questions (i.e., one scenario per situation type). Open-ended and closed-ended questions were analyzed separately (see data analyses).

For assessing the content-related aspects of feedback, we presented three options beneath each scenario: (a) ability feedback, (b) effort feedback, and (c) functional feedback. For each option, the teachers indicated their likelihood of conveying such feedback to the student on a scale ranging from 1 (= do not agree) to 6 (= agree very much). We phrased the feedback based on previous studies (Brummelman et al., 2014; Lou & Noels, 2020a; Rattan et al., 2012) and resources from the Mindset Scholars Network (e.g., Mindset Works, 2016; PERTS, 2021). All feedback options were positively framed, that is, they expressed the teacher’s intention to motivate, encourage, or comfort the student. (a) Ability feedback was defined as feedback that refers to the student’s fundamental abilities or characteristics, such as “Well done! You are a very smart student!” when a student was successful, or “That’s okay! Maybe writing is just not your strength, but you’re good at oral tests” when a student failed. (b) Effort feedback was defined as feedback that refers to the effort a student has invested or not invested in learning, such as “Well done! You really did your best, like always!” when a student was successful, or “Never mind. If you try harder next time, you’ll succeed” when a student failed. As praising effort is helpful when students invested effort in a task and were successful, we only considered effort feedback as dysfunctional when students failed despite strong effort or when they succeeded easily without strong effort (effort-fail and without-effort-succeed scenarios). (c) Functional feedback conformed to the principles of feedback that fosters a growth mindset, motivation, and learning (e.g., Henderlong & Lepper 2002; Rissanen et al., 2019), and referred to students’ learning process, strategies, or dealing with difficulties and mistakes. For example, a teacher might say “Well done! Now you need something more challenging!” when students succeed easily without effort, or “That’s not yet quite as good as it could be. But if it were too easy, you wouldn’t learn anything” when students struggle. Functional feedback served as a control option in this study.

2.2.4.2 Inflated praise and simple language

Style-related aspects of dysfunctional feedback (i.e., inflated praise and simple language) were assessed with open-ended questions based on the scenarios. We instructed the participants to think about what they would say in the specific situation to motivate the student. Participants read a scenario and typed their answer to the question “What would you say to student x?” into an open response field. They were informed that their answers should be short (1–2 sentences) and spontaneous. Inflated praise was defined as use of words or expressions that seem exaggerated and go beyond “normal” praise, such as “incredible” or “I’m speechless” (see Brummelman et al., 2014, 2017). For each answer, we counted the number of overly positive expressions. Two independent raters, native German speakers who were blind to the hypotheses, coded the data. Interrater reliability was high (ICC = 0.95). Simple language was determined using the comprehensibility formula by Flesch (1948), adapted to the German language by Amstad (1978): Comprehensibility = 180 – ASL – (58.5 * ASW), with ASL = average sentence length and ASW = average number of syllables per word. Scores range from 0 to 100. Low values represent complex language; high values represent simple language.
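The comprehensibility formula above can be illustrated with a minimal Python sketch. The function names (`amstad_score`, `count_syllables`, `comprehensibility`) are hypothetical, and the syllable counter is only a crude vowel-group heuristic that we assume for illustration; it is not the procedure used in the study, which does not specify how syllables were counted.

```python
import re


def amstad_score(asl: float, asw: float) -> float:
    """Comprehensibility = 180 - ASL - (58.5 * ASW), as given in the text."""
    return 180.0 - asl - 58.5 * asw


def count_syllables(word: str) -> int:
    """Rough syllable estimate (assumption): one syllable per vowel group."""
    groups = re.findall(r"[aeiouyäöü]+", word.lower())
    return max(1, len(groups))


def comprehensibility(text: str) -> float:
    """Compute the score for a short German text from ASL and ASW."""
    # Sentences are split on terminal punctuation; empty fragments are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-zÄÖÜäöüß]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    asl = len(words) / len(sentences)  # average sentence length in words
    asw = sum(count_syllables(w) for w in words) / len(words)  # syllables per word
    return amstad_score(asl, asw)
```

For example, an answer with an average sentence length of 10 words and 1.5 syllables per word yields 180 - 10 - 87.75 = 82.25. Note that for very short answers made of short words, the raw formula can return values above 100.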

2.2.4.3 Growth mindset

Beliefs about the malleability of cognitive abilities were measured with eight items from the Theories of Intelligence Scale (Dweck, 2007). The items were translated from the original scale by our research team. Participants indicated their agreement with statements about the nature of abilities on a scale from 1 (= do not agree) to 6 (= agree very much). Example items are “No matter who you are, you can always significantly change your level of cognitive abilities” and “You can’t really change how intelligent you are”. Items that expressed a fixed mindset were reverse-coded, so that a high mean score represented a growth mindset. Internal consistency was good (α = 0.87).
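As an illustration of the scoring described above, the following sketch reverse-codes fixed-mindset items on the 1 to 6 scale and averages the result. The function names and the item positions passed as `reversed_items` are hypothetical; the text does not state which of the eight items were reverse-coded.

```python
from statistics import mean

SCALE_MIN, SCALE_MAX = 1, 6  # response scale reported in the text


def reverse_code(response: int) -> int:
    """Mirror a response on the scale, so that 1 becomes 6, 2 becomes 5, etc."""
    return SCALE_MIN + SCALE_MAX - response


def growth_mindset_score(responses, reversed_items):
    """Mean score after reverse-coding the items at the given positions.

    `reversed_items` holds zero-based positions of fixed-mindset items
    (an assumption made for this example).
    """
    if any(not SCALE_MIN <= r <= SCALE_MAX for r in responses):
        raise ValueError("responses must lie between 1 and 6")
    recoded = [
        reverse_code(r) if i in reversed_items else r
        for i, r in enumerate(responses)
    ]
    return mean(recoded)
```

With this coding, a participant who strongly agrees with growth-mindset items and strongly disagrees with fixed-mindset items receives a high mean score.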

2.2.4.4 Prejudices

Prejudices towards students from immigrant backgrounds were measured with five items from Hachfeld et al. (2012). To make sure that prejudices were assessed and not actual facts, the items addressed negative statements about immigrant students’ motivation, interest, and learning behavior – which are refuted by research (e.g., OECD 2006). Participants rated the statements on a scale from 1 (= do not agree) to 6 (= agree very much). Example items are “Students from immigrant backgrounds are less interested in school subjects than students from non-immigrant backgrounds” and “Students from immigrant backgrounds put less effort into learning than students from non-immigrant backgrounds”. High mean scores represent strong prejudices. Internal consistency was good (α = 0.86).

2.2.4.5 Multicultural beliefs

Teachers’ beliefs that diversity in the classroom should be emphasized and valued were measured with three items adapted from Hachfeld et al. (2012). Participants rated the statements on a scale from 1 (= do not agree) to 6 (= agree very much). Example items are “In my teaching, I try to address cultural differences” and “Students from immigrant backgrounds have different educational needs than students from non-immigrant backgrounds”. High mean scores represent strong multicultural beliefs. Internal consistency was acceptable (α = 0.68).

2.2.5 Procedure

Teachers first received information about the procedure and data protection, and provided informed consent as well as demographic data. Then, they were randomly allocated to one of the two test versions. Both versions contained the same scenarios in the same order, but differed in the names of the students (from immigrant vs. non-immigrant backgrounds) presented in each scenario. The first set of scenarios was followed by open-ended questions to assess style-related aspects of feedback. The second set of scenarios was followed by closed-ended questions to assess content-related aspects of feedback (see Fig. 1). Afterwards, the teachers indicated their growth mindset, multicultural beliefs, and prejudices. At the end of the study, the participants received an information sheet about feedback effects. Interested teachers could provide their email address in a separate survey, which was detached from the main survey for reasons of data protection, and be later informed about the study findings. The entire study took place online and took about 25 min for participants to complete.

2.2.6 Data analyses

We tested the hypotheses with multivariate ANOVAs, using immigrant status (immigrant vs. non-immigrant background) as a between-subject factor and the different feedback measures as dependent variables. Scores from the single scenarios were grouped for each feedback type into one MANOVA. Significant overall effects were followed up with univariate analyses. To account for the number of tests, we adjusted the alpha level with a Bonferroni-Holm correction. According to this method, different alpha levels are calculated for each post hoc test: The original alpha level (α) is divided by the number of comparisons (C) for the smallest p-value, then divided by (C – 1) for the second smallest p-value, divided by (C – 2) for the third smallest p-value, and so forth. We thus obtained alpha levels of .008, .010, .013, .017, .025, and .050 for tests that included six scenarios (i.e., ability and effort feedback), and alpha levels of .017, .025, and .050 for tests that included three scenarios (i.e., inflated praise and simple language). Exploratory moderation analysis was conducted by adding growth mindset, prejudices, and multicultural beliefs as covariates into the multivariate analyses. The potential moderators were not associated with experimental condition (ps > .11). When the (multivariate) interaction between a covariate and students’ immigrant status was significant, we investigated the single scenarios with (univariate) moderation analysis (Hayes, 2013). To estimate effect sizes, we report partial eta-squared. Values of ηp2 = 0.01, 0.06, and 0.14 are regarded as small, medium, and large effects, respectively.
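The Bonferroni-Holm alpha levels reported above can be reproduced with a short sketch. The function names are hypothetical, and the step-down decision rule in `holm_reject` follows the standard Holm procedure, which we assume matches the analysis software used in the study.

```python
def holm_thresholds(alpha: float, n_tests: int) -> list[float]:
    """Alpha levels for the smallest, second smallest, ... p-value."""
    return [alpha / (n_tests - k) for k in range(n_tests)]


def holm_reject(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Step-down Holm decisions, returned in the original order of p_values."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p-values are retained
    return reject
```

With α = .05, `holm_thresholds(0.05, 6)` gives .05/6, .05/5, .05/4, .05/3, .05/2, and .05, which correspond to the six alpha levels listed above after rounding to three decimals; `holm_thresholds(0.05, 3)` gives the three levels used for inflated praise and simple language.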

2.3 Results

2.3.1 Preliminary analyses

Descriptive values indicated that the likelihood of conveying dysfunctional feedback differed considerably depending on the situation type (see Table 2). For example, teachers were less likely to provide ability feedback when students failed despite effort, compared to when students succeeded with strong effort. Mean values also showed that there were inconsistencies between scenarios of the same situation type. For example, the likelihood of providing ability feedback was higher in the scenario effort-succeed B than in the scenario effort-succeed A, and the likelihood of providing effort feedback was higher in the scenario effort-fail A than in the scenario effort-fail B. With respect to beliefs, teachers scored rather high on growth mindset and multicultural beliefs, and low on prejudices (see Table 3). There were no substantial correlations between the beliefs.

Table 2 Descriptive statistics of the feedback measures
Table 3 Descriptive statistics of and correlations between teacher beliefs

2.3.2 Effects of students’ immigrant status on ability feedback and effort feedback

First, we tested the assumption that teachers would provide more dysfunctional ability feedback to students from immigrant backgrounds than to students from non-immigrant backgrounds (ability feedback hypothesis). The multivariate analysis indicated an overall effect of students’ immigrant status on dysfunctional ability feedback, V = 0.11, F(6,179) = 3.70, p = .002, ηp2 = 0.11. This effect remained significant when controlling for teachers’ beliefs (p < .001). Univariate analyses revealed that teachers reported that they would convey more ability feedback to an immigrant student in one of two without-effort-succeed scenarios (see Table 4).

Table 4 Dysfunctional feedback to students from immigrant and non-immigrant backgrounds: descriptive values and results of the univariate analyses

Second, we tested the assumption that teachers would provide more dysfunctional effort feedback to students from immigrant backgrounds than to students from non-immigrant backgrounds (effort feedback hypothesis). The multivariate analysis indicated an overall effect of students’ immigrant status on effort feedback, V = 0.08, F(6,179) = 2.75, p = .014, ηp2 = 0.08. This effect remained significant when controlling for teachers’ beliefs (p = .008). Univariate analyses revealed that teachers reported that they would convey more effort feedback to an immigrant student in one of two without-effort-succeed scenarios (see Table 4).

We conducted an additional exploratory analysis to investigate whether there were differences in functional feedback by immigrant status. Functional feedback served as a control option in the study. The multivariate analysis indicated no effect of students’ immigrant status on functional feedback, V = 0.03, F(6,179) = 0.81, p = .562, ηp2 = 0.03, even when controlling for teachers’ beliefs (p = .567).

These findings only partly support our hypotheses. In five of six scenarios, teachers did not differentiate substantially between students from immigrant and non-immigrant backgrounds with respect to ability or effort feedback. Yet in one of the two situations in which a student was successful without strong effort, teachers were more likely to provide dysfunctional ability and effort feedback when this student was from an immigrant background.

2.3.3 Effects of students’ immigrant status on inflated praise and simple language

First, we tested the assumption that teachers would use more inflated praise when addressing students from immigrant backgrounds compared to students from non-immigrant backgrounds (inflated praise hypothesis). The multivariate analysis indicated an overall effect of students’ immigrant status on inflated praise, V = 0.05, F(3,182) = 2.98, p = .033, ηp2 = 0.05. This effect remained significant when controlling for beliefs (p = .039). Univariate analyses revealed that teachers used more inflated praise when the fictive recipient was an immigrant student who succeeded without effort (see Table 4).

Second, we tested the assumption that teachers would use simpler language when addressing students from immigrant backgrounds compared to students from non-immigrant backgrounds (simple language hypothesis). The multivariate analysis indicated an overall effect of students’ immigrant status on simple language, V = 0.09, F(3,180) = 6.18, p = .001, ηp2 = 0.09, which remained significant when controlling for beliefs (p < .001). Univariate analyses revealed that teachers used simpler language when the fictive recipient was an immigrant student who had failed despite effort (see Table 4).

These results partly support our hypotheses. Teachers did not generally use more inflated praise and simpler language when the scenarios described students from immigrant backgrounds, but there were situation-specific differences. Inflated praise was more likely when students from immigrant backgrounds succeeded without effort, and simple language was more likely when students from immigrant backgrounds failed despite strong effort, compared to students from non-immigrant backgrounds in the same situations.
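The multivariate effect sizes reported in Sections 2.3.2 and 2.3.3 can be cross-checked from the F statistics and their degrees of freedom. The sketch below uses the standard conversion ηp2 = (F × df_effect) / (F × df_effect + df_error); the function name is hypothetical, and we assume this conversion applies here because the between-subject factor has only two levels.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta-squared from an F statistic and its degrees of freedom."""
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and both df must be positive")
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Values taken from the results reported in the text.
reported = {
    "ability feedback": (3.70, 6, 179),
    "effort feedback": (2.75, 6, 179),
    "functional feedback": (0.81, 6, 179),
    "inflated praise": (2.98, 3, 182),
    "simple language": (6.18, 3, 180),
}

for measure, (f_value, df1, df2) in reported.items():
    print(f"{measure}: {partial_eta_squared(f_value, df1, df2):.2f}")
```

For ability feedback, for instance, this gives 22.2 / 201.2 ≈ 0.11, in line with the reported ηp2 = 0.11.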

2.3.4 Exploratory moderation analysis

We tested whether feedback differences were moderated by (a) growth mindset, (b) prejudices, or (c) multicultural beliefs. (a) Regarding growth mindset, multivariate interactions between immigrant status and growth mindset were non-significant for all feedback measures (ps > .05). (b) Multivariate interactions between immigrant status and prejudices were also non-significant for all feedback measures (ps > .07). (c) Regarding multicultural beliefs, there were multivariate interactions with respect to ability feedback, V = 0.14, F(12,348) = 2.17, p = .013, ηp2 = 0.07 and effort feedback, V = 0.13, F(12,348) = 2.09, p = .017, ηp2 = 0.07. Separate moderation analyses indicated, however, that there were no significant interactions between immigrant status and multicultural beliefs within the single scenarios (all ps > .08). It can be assumed that there were tendencies towards significant interactions within the scenarios that added up to the significant overall effect. Furthermore, multivariate interactions were non-significant with respect to inflated praise (p > .42) but significant with respect to simple language, V = 0.11, F(6,350) = 3.50, p = .002, ηp2 = 0.06. Moderation analysis revealed that there was a significant moderation effect when students succeeded without effort, b = -5.91, t(182) = -2.56, p = .011. Teachers with low multicultural beliefs used simpler language when addressing a student from an immigrant background whereas teachers with high multicultural beliefs used less simple language when addressing a student from an immigrant background, compared to a student from a non-immigrant background. In sum, the analysis did not reveal a clear pattern of how teachers’ growth mindset, prejudice, and multicultural beliefs affect feedback preferences when addressing students from immigrant and non-immigrant backgrounds. Further research and more specific measures are needed to shed light on such effects.

2.4 Discussion

The pilot study aimed to investigate whether teachers would be more likely to provide dysfunctional feedback to students from immigrant backgrounds than to students from non-immigrant backgrounds. There were no consistent patterns of feedback differences across situations and across the different facets of dysfunctional feedback. Instead, we found selective effects: In one of two scenarios describing students who succeeded easily without effort, teachers were more likely to provide dysfunctional ability feedback, dysfunctional effort feedback, and more inflated praise if these students came from immigrant backgrounds rather than non-immigrant backgrounds. In a situation in which a student failed despite strong effort, teachers were more likely to use simple language with a student from an immigrant background than with a student from a non-immigrant background. The fact that there were no differences in functional feedback indicates that teachers did not report providing more “positive” feedback to students from immigrant backgrounds per se, but that differences were specific to dysfunctional feedback. Exploratory analysis further indicated no moderating effects of teachers’ growth mindset or prejudices on dysfunctional feedback. Multicultural beliefs were associated with use of less simple language when addressing students from an immigrant background who had succeeded without effort compared to students from non-immigrant backgrounds. In summary, the pilot study indicated that the likelihood of conveying dysfunctional feedback differed to some extent with regard to students’ immigrant status. However, it became apparent that feedback varied between situation types and also between the scenarios within one situation type. We conducted a brief expert survey to shed light on these differences.

3 Expert survey

3.1 Aims

In the expert survey, we investigated the scenario-based feedback test, including its scenarios and feedback statements, in more detail. Experts in teaching and learning research were asked to evaluate the feedback scenarios and the feedback statements that we had used to assess ability feedback, effort feedback, and functional feedback. The aims of this survey were twofold: Looking back, we aimed to explain unclear results of the pilot study. Looking ahead, we aimed to derive ideas from the expert survey for further improvement of the feedback test.

3.2 Method

We conducted a brief online survey, which took about 15 min. Twelve experts in teaching and learning research completed the survey. They read and evaluated all scenarios and feedback statements that we used in the pilot study. For each scenario, the experts rated (a) students’ success and (b) students’ effort. For each feedback statement, they rated (c) ability-relatedness, (d) effort-relatedness, and (e) potential to foster motivation. As we were not interested in effects of students’ names, the scenarios just referred to “a student” (without name). (a) Students’ success was measured with the question: “How successful do you assume the student was in the described situation?” Participants rated the extent of the student’s success on a scale ranging from 1 (= not successful at all) to 6 (= very successful). (b) Students’ effort was measured with the question “How much effort do you assume the student has invested for getting into the described situation?” Participants rated the extent of the student’s effort on a scale ranging from 1 (= no effort at all) to 6 (= very much effort). (c) Ability-relatedness was measured with the item “The statement addresses students’ fundamental abilities”. Participants indicated their agreement on a scale ranging from 1 (= do not agree) to 6 (= agree very much). (d) Effort-relatedness was measured with the item “The statement addresses students’ effort”. Participants indicated their agreement on a scale ranging from 1 (= do not agree) to 6 (= agree very much). (e) Potential to foster motivation was measured with the item “This statement hinders/fosters students’ long-term motivation”. Participants rated the item on a scale ranging from 1 (= hinders motivation) to 6 (= fosters motivation).

3.3 Results

3.3.1 Scenarios

The results indicated that the scenarios were largely evaluated as intended (see Table 1). In the effort-fail scenarios, experts rated the student’s success below the mean of the six-point scale (Ms < 3.5) and the student’s effort above the mean. However, success ratings of the scenario effort-fail A were quite high (M = 3.2) and (descriptively) higher than success ratings of scenario B. In the without-effort-succeed scenarios, the experts rated the student’s success as high and effort below the mean of the scale. In the effort-succeed scenarios, experts rated both success and effort above the mean of the scale, but effort ratings were (descriptively) higher for scenario A than for scenario B. We determined intraclass correlation coefficients (ICC, two-way random) to estimate consistency between the ratings. Interrater reliability was high for the success ratings (ICC = 0.98) and effort ratings (ICC = 0.90).

3.3.2 Feedback statements

Regarding ability feedback, expert ratings indicated that the six feedback statements clearly addressed students’ fundamental abilities, with mean ratings above M = 4.75 on the six-point scale. Yet, the experts’ evaluation of whether such feedback would foster students’ motivation was somewhat inconsistent. In line with previous research (e.g., Mueller & Dweck 1998; Rattan et al., 2012), experts rated five of the six ability feedback statements’ potential to foster motivation below the mean of the six-point scale (Ms < 3.5). In contrast, the statement “That’s a great result, I knew that you could do it!” was rated relatively positively (effort-succeed scenario B; M = 4.17; SD = 1.53). Perhaps, the focus on abilities was less obvious in this statement because it contained neither adjectives such as “smart” or “clever” nor ability-focused nouns such as “talent” or “strength”.

Regarding effort feedback, expert ratings indicated that the six feedback statements clearly addressed students’ effort, with mean ratings above M = 5.08 on the six-point scale. Based on the literature, we defined four of the statements – effort feedback when students fail despite effort or succeed without effort – as dysfunctional feedback; and two of the statements – effort feedback when students succeed with strong effort – as functional feedback. With regard to dysfunctional effort feedback, experts rated the potential to foster motivation below the mean of the six-point scale (Ms < 3.5) for three of the four statements. However, the statement “It’s great how much effort you put into writing the essay!” was rated relatively positively (effort-fail scenario A; M = 4.50; SD = 1.31). One possible explanation for this finding is that the associated scenario was somewhat ambiguous: The effort the student put into writing the essay in a neat manner seemed effective, whereas the effort put into achieving good content quality seemed ineffective. Given that praising effort is only dysfunctional in the latter case (see Dweck & Yeager 2019), experts might have interpreted the feedback in the first sense and therefore given the statement more positive ratings. With regard to functional effort feedback, experts rated the potential to foster motivation above the mean of the six-point scale (Ms > 3.5).

Regarding functional feedback, experts rated five of six feedback statements’ potential to foster motivation above the mean of the six-point scale (Ms > 3.6). Only in one of the effort-fail scenarios, experts rated the statement “That’s not yet quite as good as it could be. But if it were too easy, you wouldn’t learn anything” to have only a medium potential to foster motivation (effort-fail scenario A; M = 3.50; SD = 1.68). The statement was adapted from the growth mindset feedback tool (Mindset Works, 2016) and should foster students’ motivation by focusing on the process of learning (“not yet”) and encouraging students to take on challenges. Yet, the experts might have perceived the statement that the result was “not good” as demotivating.

We determined intraclass correlation coefficients (ICC, two-way random) to estimate the consistency of the ratings. Expert ratings exhibited good interrater reliability for ability-relatedness (ICC = 0.94), effort-relatedness (ICC = 0.96), and for the statements’ potential to foster motivation (ICC = 0.89).

3.4 Discussion

The expert survey helped us understand inconsistent results of the pilot study and gave us ideas for improving the feedback test. We summarize the findings according to the situation types: (1) effort-fail, (2) without-effort-succeed, and (3) effort-succeed. (1) The survey implied three problems regarding the scenarios describing how students failed despite strong effort. First, the student’s success was rated as fairly high in scenario A, although it was supposed to describe low success. The text described a student who has written an essay in a neat manner but with insufficient content quality. Perhaps the text could be revised so that it places more emphasis on the low quality and less on the neat writing. Second, the effort feedback statement in this scenario was rated as relatively motivating, although it was supposed to represent dysfunctional feedback. This finding explains why the probability of providing this feedback statement was quite high in the pilot study. A revised test should emphasize the statement’s dysfunctionality, for example, by adding the word “nevertheless” (“It’s nevertheless great how much effort you put into writing the essay”) and thus making it more comforting. Additionally, when the scenario is changed in such a way that the ineffective effort is emphasized, this feedback statement will automatically become more meaningless. (2) Regarding the scenarios describing how students succeeded easily without strong effort, the expert survey approved the scenarios and feedback statements. These scenarios had also produced the clearest results in the pilot study. (3) Regarding the scenarios describing how students succeeded with strong effort, the expert survey uncovered two problems: First, effort ratings were higher for scenario A than for scenario B. Hence, when revising the test, the text in scenario B should put more emphasis on the effort that the student made. Second, the ability feedback statement in scenario B was rated as relatively motivating, although it was supposed to represent dysfunctional feedback. Again, this finding explains why the probability of providing this feedback statement was quite high in the pilot study. Instead of saying “I knew that you could do it!”, a revised statement should refer more explicitly to fundamental abilities or talents of the student.

4 General discussion

The central aim of our study was to test whether teachers would provide more dysfunctional feedback to students from immigrant backgrounds than to students from non-immigrant backgrounds. The pilot study indicated that dysfunctional feedback was mostly unaffected by students’ immigrant background. However, in one of two situations describing an immigrant student who succeeded easily without effort, teachers reported that they would give more dysfunctional ability feedback, dysfunctional effort feedback, and more inflated praise to this student, compared to a non-immigrant student. In a situation in which a student failed despite strong effort, teachers were more likely to use simple language when this student was from an immigrant background as compared with a student from a non-immigrant background.

Regarding potential moderators, we found no influences of teachers’ growth mindset or prejudices, and just a single, unsystematic influence of multicultural beliefs: In one scenario, multicultural beliefs were associated with lower use of simple language when addressing students from an immigrant background. Thus, in this situation, teachers with stronger multicultural beliefs were less likely to use this questionable strategy. This result is in accordance with studies finding that multiculturalism reduces biased behavior and fosters positive intercultural interactions (e.g., Plaut et al., 2018; Rattan & Ambady, 2013). Further research is needed to investigate how teachers’ beliefs and assumptions shape their feedback to students from different ethnic backgrounds.

Two main implications arise from the findings. First, even a very small manipulation such as varying students’ names and stated countries of origin can evoke differences in teachers’ feedback preferences. Second, these differences in teachers’ dysfunctional feedback are situation-specific.

4.1 Feedback differences caused by students’ names

It is noteworthy that the feedback differences we observed resulted from a rather small and subtle manipulation. We kept all information about the situation and the student constant, while changing only the students’ names and countries of origin, for example by writing “Safija from Tunisia” instead of “Sophie”. A country of origin associated with low competence in Germany, such as Arab or Central Asian countries (Froehlich & Schulte, 2019), led to more dysfunctional feedback, at least in part. As in similar studies (e.g., Bonefeld et al., 2017; Holder & Kessels, 2017), the effect sizes were rather small. However, despite the small effect sizes, we see our findings as having practical significance because even occasional dysfunctional feedback can have detrimental effects on students’ learning (e.g., Brummelman et al., 2014; Meyer, 1992; Mueller & Dweck, 1998). Overall, the findings are in line with previous studies, in which student names were sufficient to activate group-specific stereotypes (e.g., Bonefeld et al., 2020; Bonefeld & Dickhäuser, 2018; Holder & Kessels, 2017; Sprietsma, 2013). Our results add to these studies by indicating that foreign-sounding names can distort not only teachers’ judgments but also their tendencies to provide dysfunctional feedback.

Certainly, teachers in real classrooms have much more information about their students than just their name and country of origin. One could thus argue that situations in which teachers need to provide feedback to students about whom they have only little information never occur in practice. Moreover, when teachers know their students, they are also better able to judge whether a feedback statement will be functional or dysfunctional for a specific student. However, it seems likely that stereotypes bias teachers’ thoughts and actions even when they know their students better. Teaching situations are cognitively demanding, as teachers need to deal with many aspects at the same time and often experience stress and time pressure (e.g., Maas et al., 2021). Such pressure can tie up the cognitive capacities teachers would need to judge and interpret situations thoroughly (e.g., Becker et al., 2020) – and when people act under pressure, they do not have the opportunity or capacity to control stereotype-based thoughts and actions (Chen & Chaiken, 1999; Fazio & Olson, 2014; Gilbert & Hixon, 1991). Situations in which teachers provide (verbal) feedback are also characterized by pressure because such feedback needs to be spontaneous, with little time to think about it beforehand. These situations seem to be prone to biases stemming from stereotypes rooted in salient student characteristics such as their name or country of origin (see also Reyna 2008).

4.2 Situation specificity of feedback differences

In the present study, we investigated three types of situations: situations in which a student failed despite strong effort, situations in which a student succeeded easily without effort, and situations in which a student succeeded with strong effort. Interestingly, the feedback differences we observed did not emerge equally across situation types, but were more pronounced for the first two types of situations than for the latter.

When a student failed despite strong effort, teachers used simpler language with a student from an immigrant background than with a student from a non-immigrant background. It was striking that the teachers engaged in this facet of dysfunctional feedback – simplification of language – only in this type of situation. To explain this finding, one should bear in mind that the “struggling student” situation confirmed negative stereotypes about students from immigrant backgrounds, whereas the other two situations disconfirmed stereotypes by describing successful students. Thus, this situation was the only one consistent with the prevalent stereotype that people from Arab countries possess lower competence and that students from immigrant backgrounds have more problems at school (see Froehlich & Schulte 2019; Hachfeld et al., 2012). Stereotype-confirming situations facilitate the activation of stereotypes (e.g., Fiske & Neuberg 1990), and are thus likely to trigger stereotype-based judgments. For example, Glock and colleagues (2016, 2018) found that teachers evaluated immigrant students’ language skills lower than the skills of non-immigrant students when these students performed below average (i.e., when they confirmed negative stereotypes) but not when they performed above average (i.e., when they disconfirmed the negative stereotype). Similarly, the struggling student in the present study, who confirmed negative stereotypes, might have activated stereotype-matching strategies teachers had available for supporting students from immigrant backgrounds, such as simplifying language. This strategy, which is common in the context of second language learning despite its potentially negative effects (see Crossley et al., 2012; Goodman & Freeman, 1993), might have been suppressed in the stereotype-disconfirming situations that described successful students.

In one of two scenarios describing students who succeeded easily without effort, teachers indicated that they would give more dysfunctional ability feedback such as “Well done! You are a very smart student!”, more dysfunctional effort feedback such as “Well done! You really did your best, like always!”, and more inflated praise such as “That’s incredible” or “I’m amazed” when addressing an immigrant student compared to a non-immigrant student. These findings are in line with previous studies demonstrating that ethnic minority students receive more positive feedback than ethnic majority students (e.g., Harber et al., 2010, 2012). Our study expands upon this research by showing that immigrant students are at risk of receiving not only more positive feedback, which might eventually have a negative impact, but also more feedback that is known to have a negative impact – for example, by reinforcing a fixed mindset or reducing self-esteem, motivation, and persistence (e.g., Brummelman et al., 2014; Mueller & Dweck, 1998). The tendency to provide more dysfunctional praise to a student from an immigrant background was specific to the situation in which the student succeeded easily without effort. In contrast to the situation of the struggling student described above, this situation disconfirmed negative stereotypes. We consider two reasons why this situation led to increased dysfunctional feedback. First, people often devote special attention to stereotype-disconfirming information (e.g., Erber & Fiske 1984), overemphasize unexpected characteristics of the target person, and seek explanations for the inconsistencies (see Glock 2016). In the present study, teachers might have attributed the immigrant student’s unexpected success, for example, to being exceptionally smart, thus leading to increased ability praise, or to having invested a lot of effort, thus leading to increased effort praise. In general, the special emphasis on the student’s unexpectedly good qualities might have made the teachers more prone to provide overly positive, dysfunctional feedback. A second reason why we found these feedback differences particularly in situations when students succeeded without effort may be that these situations referred to female students, whereas the other situations referred to male students. Previous research has revealed that teachers evaluate immigrant girls quite favorably, for example, regarding their motivation and achievement at school (Glock & Kleen, 2019; Kleen & Glock, 2018). Hence, the described situation might have been stereotype-confirming insofar as teachers have high expectations for female students from immigrant backgrounds. Future studies are needed to disentangle the interplay between gender stereotypes and ethnic stereotypes with respect to dysfunctional feedback (see limitations).

4.3 Limitations and directions for future research

The study has several limitations. First, it is an open question whether our findings based on written scenarios and default feedback scales generalize to a real classroom setting. Text-based measures have been used in similar studies (e.g., Bonefeld & Dickhäuser 2018; Glock & Kleen, 2019; Holder & Kessels, 2017) and allow researchers to control closely which information is given to participants. For example, in order to manipulate students’ immigrant status in a text, the student’s name can simply be varied, whereas in a video or live setting, much more unintended and distracting information about the students would be available, such as appearance, likeability, or language skills. The present study’s lower external validity thus goes hand in hand with increased internal validity. With respect to teachers’ feedback, we are aware of the large gap between survey responses and actual behavior. In similar studies (e.g., Harber et al., 2010, 2012; Nishen & Kessels, 2022), participants were also in an artificial, experimental situation, but were at least told that their feedback would be returned to the student. The feedback assessed in these studies was thus more authentic than in the present study, in which participants did not believe that their feedback would be communicated to anyone outside the research team. Further research is needed to investigate whether the feedback preferences in our survey are actually related to feedback preferences in practice. To make the feedback more authentic, participants should be told that their feedback will be communicated to the students. Furthermore, it would be interesting to examine whether feedback differences by immigrant status can be observed in real classrooms, for example by analyzing videos from authentic teaching situations.

A second limitation is that our study did not explicitly address the issue of gender. Research on gender stereotypes has a long tradition, and there is no doubt that gender stereotypes affect teachers’ thoughts and actions, just as ethnic stereotypes do. For example, teachers judge boys from immigrant backgrounds less accurately than girls of all backgrounds and boys from non-immigrant backgrounds (Bonefeld et al., 2020). In our study, we included both male and female students, and kept gender constant within each situation type. On the one hand, this design has the advantage that the manipulation of immigrant status was not confounded with gender differences, and thus feedback differences could be clearly traced back to students’ immigrant status. On the other hand, this design has the disadvantage that gender was confounded with situation type. Thus, if we were to analyze gender differences in dysfunctional feedback, we would actually be analyzing differences between situation-specific feedback statements. However, we conducted exploratory analyses regarding feedback style because these variables (inflated praise and simple language) were coded similarly across the situation types. Teachers used more inflated praise when addressing a girl in the without-effort-succeed scenario than when addressing a boy in the two other scenarios. This finding may imply that the without-effort-succeed situation triggered more inflated praise generally, but might also indicate that teachers praise female students more effusively than male students – which would correspond with the finding that teachers often judge female ethnic minority students quite positively (e.g., Glock & Kleen 2019; Kleen & Glock, 2018). Moreover, teachers used simpler language when addressing a boy who succeeded with strong effort than when addressing a girl who succeeded without effort. Again, this difference might have merely been due to the different situation, but could also imply gender differences: In line with research on the academic performance of immigrant girls and boys (e.g., Dronkers & Kornder 2014), teachers may assume that male immigrants have more language problems than female immigrants, and therefore use simpler language when providing feedback to immigrant boys. To shed more light on these assumptions, future studies should vary gender as an additional experimental factor and analyze gender-related feedback differences systematically.

Third, our manipulation of immigrant status by student names creates some difficulties. Names not only imply a certain origin but also evoke many other associations, for example, about the students’ socioeconomic status – which can have a considerable impact on teachers’ expectations (e.g., de Boer et al., 2010; Elhoweris, 2008; Tobisch & Dresel, 2017). As being an immigrant in Germany is often connected with a lower social status (see, e.g., Beyer 2017), our manipulation of students’ immigrant background might have also triggered unfavorable assumptions about their social situation. Further research is needed to investigate effects of perceived immigrant status and perceived socioeconomic status on dysfunctional feedback separately. Furthermore, our choice of countries for implying an immigrant background (Syria, Tunisia, Afghanistan, and Morocco) was challenging because names from these countries are rather unfamiliar in Germany and, to our knowledge, there are no studies investigating names from these countries systematically. Most studies similar to the present one addressed people from Turkey, as they are the biggest minority group in Germany (see Froehlich & Schulte 2019). Thus, there is a broad database on the effects of Turkish names (e.g., Kleen & Glock 2020). In contrast to these studies, we decided to focus on students who experience the most negative competence stereotypes, even more negative than those faced by Turkish students. These were, according to Froehlich and Schulte (2019), students from the four countries mentioned above. Names were then chosen on the basis of popularity, and information about the country was added to make sure that participants thought of the correct origin. One might argue, however, that this information evoked specific associations about when the student arrived in Germany – “Safiya from Tunisia” implies that the student came just a short time ago. A systematic pretest would be necessary to find names that are clearly assigned to the correct country or at least the correct region.

Fourth, the instruments that we used to measure feedback and beliefs should be further improved. Regarding the feedback test, the suggestions that we derived from the expert survey need to be implemented to ensure that all scenarios and feedback items work as intended (see discussion of the expert survey). Systematic errors could be reduced by assigning the feedback items randomly to the scenarios. In the present study, we phrased items that represented the same feedback type slightly differently each time so that they fit the described situation exactly. These different formulations, however, worked differently, as became apparent in the expert survey. Therefore, follow-up studies should use formulations that fit both scenarios of a situation type, and assign them randomly to the scenarios. Furthermore, the order of the scenarios should be randomized to rule out sequence effects. Also, the feedback test should be tested on a student sample. That is, instead of asking experts how motivating a feedback comment would be for a student, one could ask students directly how they perceive the feedback. Such a survey should be done before using the feedback test in a follow-up study. Regarding the belief measurements, it is questionable whether beliefs, which are supposed to be rather subconscious, can be assessed with self-report measures – as is often done in educational research. Implicit or qualitative measures might be preferable (see, e.g., Lüftenegger & Chen 2017). For example, the prejudice scale might be replaced by an implicit association test of stereotypes (Greenwald et al., 1998). The growth mindset questionnaire further had the disadvantage that it assessed only general beliefs about abilities. In the given school context, it would be useful to assess teachers’ growth mindset specifically about students’ abilities.

Fifth, follow-up studies might address the conditions under which feedback differences arise. Stereotypic reactions are less likely when people have the time and motivation to control their otherwise automatic thoughts and actions (e.g., Fazio & Olson 2014; Fiske & Neuberg, 1990). With regard to time, the participants in our study could take as long as they needed. Setting a time limit for the answers would resemble classroom situations in which teachers usually act under time pressure, and might strengthen the effects of the experimental manipulation. With regard to motivation, people’s goals influence the extent to which evaluations are shaped by stereotypes. For example, the goal of communicating one’s judgment to a third party usually leads to more individualized judgments (see effects of accountability; Pendry & Macrae 1996). It would be interesting to vary the goals that teachers have in mind when they provide feedback to students from immigrant and non-immigrant backgrounds. We assume that teachers who are motivated to provide good feedback, for example, because they aim to justify their decision to an expert or a colleague, are less likely to show stereotype-based feedback preferences.

4.4 Conclusion

Students from immigrant backgrounds often face challenges and inequalities in the destination country’s educational system. To support such students’ academic development, it is important to uncover systematic inequalities. In this pilot study, we investigated whether teachers were more likely to provide dysfunctional feedback to students from immigrant backgrounds than to students from non-immigrant backgrounds. Despite its small effects, the study has a twofold significance for the field: First, we provide tentative signs that students from immigrant backgrounds might be at risk of receiving more dysfunctional feedback. Previous studies indicated that these students often receive more positive feedback, which might have positive or negative effects on students (e.g., Harber et al., 2012; Nishen & Kessels, 2022). Adding to these studies, we were able to show that students from immigrant backgrounds might also receive more dysfunctional feedback – that is, feedback that actually has a negative impact on learners’ motivation – such as feedback that fosters a fixed mindset or communicates low expectations. These findings are particularly worrying because ethnic minority students are very sensitive to the expectations that teachers communicate directly or indirectly (e.g., Major et al., 2016; McKown & Weinstein, 2002; Steele & Aronson, 1995). Second, the study presents a new instrument to measure dysfunctional feedback. This instrument is practice-oriented, standardized, and economical, and thus suitable for teachers who often have little time for participating in research. The instrument appears to be sensitive to stereotypical bias and – after thorough revision – can thus be used to investigate systematic influences on dysfunctional feedback. Overall, we regard the study as a valuable starting point for future research on teachers’ feedback preferences.