Introduction

Teacher expectations and the impact of those expectations on students’ subsequent academic performance have been investigated for five decades, starting with Rosenthal and Jacobson’s (1968) ground-breaking experimental study, Pygmalion in the Classroom. That study demonstrated that when teachers expected students to perform at a high level, students tended to confirm this expectation. This phenomenon became known as the self-fulfilling prophecy, as originally defined by Merton (1948). Pygmalion in the Classroom marked the beginning of a flourishing tradition of investigating teachers’ expectations in regular classroom settings (Wang et al., 2018). In this field, the term teacher expectations has been described as follows: “Expectations are primarily cognitive phenomena, inferential judgments that teachers make about probable future achievement and behavior based upon the student’s past record and his present achievement and behavior” (Brophy & Good, 1974, p. 129).

Fifty years of research has provided a substantial body of evidence demonstrating that teachers can indeed develop differential expectations for their students (e.g., Dusek & Joseph, 1983; Timmermans et al., 2015). Various studies have demonstrated that negative achievement stereotypes and lower teacher expectations exist for students in minority groups (Wang et al., 2018). This holds, for example, for African-American and Latino students in the USA (e.g., Hughes et al., 2005; McKown & Weinstein, 2008; Ready & Wright, 2011), for Aboriginal students in Canada (e.g., Corenblum et al., 1997; Fitzpatrick et al., 2016; Riley & Ungerleider, 2008), for Māori and Pacific Island students in New Zealand (e.g., Meissel et al., 2017; Turner et al., 2015), and for students with immigration backgrounds in Europe (e.g., Holder & Kessels, 2017; Tobisch & Dresel, 2017; van den Bergh et al., 2010). Teachers express those expectations in differential treatment towards students and classes (e.g., Weinstein, 2002), and children perceive and appraise differential teacher expectancy behavior (e.g., Babad et al., 1989; Weinstein & Middlestadt, 1979). These differential expectations are confirmed in generally small to moderate self-fulfilling prophecies on student academic achievement (e.g., Jussim & Harber, 2005; Rubie-Davies, 2008).

Two developments in teacher expectation research formed the basis for the current study. First, a recent development, noted in the Wang et al. (2018) review, is that researchers have begun investigating the self-fulfilling prophecy effects of teacher expectations on a variety of psycho-social outcomes (i.e., self-concept, motivation, interest), in addition to the effects on academic achievement (e.g., Gilbert et al., 2014; Karwowski et al., 2015; Rubie-Davies et al., 2020; Upadyaya & Eccles, 2015). Second, over time, researchers started to realize that there were important moderators that could influence the size of teacher expectation effects (e.g., Brophy, 1983; Jussim & Harber, 2005), mostly with reference to groups of students who appeared to be more vulnerable or susceptible to teacher expectations. Unfortunately, empirical studies testing whether stigmatized groups of students (for example, girls in mathematics and minority background students) are more vulnerable to self-fulfilling prophecies are still scarce (McKown & Weinstein, 2002). The aim of the current study was to combine these two developments by investigating whether gender and minority background were moderators of teacher expectation effects for both academic outcomes and a variety of psycho-social outcomes (self-concept, utility value, intrinsic value) within the domain of mathematics.

Development 1: Teacher expectation effects on psycho-social outcomes

In the past 10 years, research into the self-fulfilling prophecy effects of teacher expectations on psycho-social variables has increased considerably (e.g., Boerma et al., 2016; Chen et al., 2011; Gilbert et al., 2014; Karwowski et al., 2015; Pesu et al., 2016; Woolley et al., 2010). In line with Expectancy-Value Theory (Eccles, 1983), the evaluations of significant others (e.g., teachers, parents, classmates) as well as reinforcements of one’s behavior by those significant others (e.g., Bong & Skaalvik, 2003; Gniewosz et al., 2014; Shavelson et al., 1976) may result in students’ self-perceptions being affected. Positive evaluations by a student’s teacher, felt and regarded as support and acceptance, could lead the student to evaluate themselves more positively, resulting in a more favorable self-concept (Liu & Wang, 2008). According to Expectancy-Value Theory, children’s perceptions of the expectations and attitudes of socializers may also affect their subjective task values (Wigfield & Eccles, 1992, 2000). This includes students’ interest in a particular task or domain, or how they appraise a task or domain in terms of utility.

Numerous studies have found positive associations between teacher expectations and (subsequent) student self-perceptions, including self-efficacy (e.g., Chen, 2006; Gilbert et al., 2014; Karwowski et al., 2015; Tyler & Boelter, 2008; Vekiri, 2010) and self-concept (e.g., Benner & Mistry, 2007; Blöte, 1995; Chen et al., 2011; Jussim, 1989; Liu & Wang, 2008; Pesu et al., 2016; Urhahne et al., 2011). However, many studies investigating the association between teacher expectations and student self-concept have relied on cross-sectional data or have failed to control for measures of initial levels of achievement and self-concept. Therefore, it is not possible to disentangle teacher expectation effects on these outcomes from potentially higher initial expectations for more confident or efficacious students (Timmermans et al., 2016).

Stronger evidence that teacher expectations affect students’ self-concept can be derived from the small number of studies in which initial self-concept was considered. By comparing changes in self-concept across a year among students who were in classes with high, average, and low expectation teachers, Rubie-Davies (2006) found that students’ self-concept changed to fall in line with their teachers’ expectations. Moreover, using latent growth curve models, Upadyaya and Eccles (2015) investigated whether teacher expectations predicted student self-concepts of ability in reading and mathematics. Teacher expectations predicted both students’ concurrent and subsequent self-concept in those two academic domains, even after students’ achievement and general verbal intelligence were controlled for.

However, the number of studies focusing on the effects of teacher expectations on students’ subjective task values (intrinsic and utility value) remains small. The common finding seems to be that teacher expectations are associated with more favorable outcomes on a variety of subjective task values (e.g., Boerma et al., 2016; Gilbert et al., 2014; Woolley et al., 2010). Significant and positive correlations have been found between teacher expectations and utility value (i.e., usefulness of mathematics; e.g., Benner & Mistry, 2007; Boerma et al., 2016; Gilbert et al., 2014; Lazarides & Watt, 2015) and intrinsic value (i.e., mathematics interest; Woolley et al., 2010). Again, most studies investigating an association between teacher expectations and subjective task value have suffered from a lack of baseline measures.

Development 2: Gender and minority background as moderators of teacher expectation effects

Studying student groups who are particularly vulnerable to teacher expectation effects may add to our knowledge of how teacher expectations contribute to educational inequality. Several mechanisms have been considered to explain why teacher expectations may have stronger effects for particular student groups. First, Attributional Ambiguity Theory (Crocker & Major, 1991) describes why being a member of a stigmatized group may be related to different responses to teachers’ expectations (McKown & Weinstein, 2002). Students who are members of academically stigmatized groups (for example, girls in mathematics) may interpret behavioral cues about low teacher expectations differently from students who are members of non-stigmatized groups. For example, a girl who perceives that her teacher expects low math performance may wonder whether the teacher’s belief is based on her individual ability or on the teacher’s general belief that girls are not good at mathematics. Attributing the teacher’s expectation to her own ability may erode her confidence, and subsequently negatively affect performance. Attributing the teacher’s expectation to the teacher’s stereotype may protect the student’s self-esteem, but may also lead to disengagement from schooling, which may eventually erode performance (Crocker & Major, 1991). Whatever the girl’s attribution about the teacher’s low expectation, cues about low expectations may have a more deleterious impact on members of stigmatized groups. Children’s responses to high expectations may depend on group membership as well (McKown & Weinstein, 2002). Members of stigmatized groups may mistrust and discount positive feedback when it is perceived as arising from sympathy for a stigmatized social identity, rather than from merit (Crocker & Major, 1991), leading positive expectations to be less beneficial for members of stigmatized groups.

A second explanation stems from generally lower teacher expectation accuracy for stigmatized students (Jussim et al., 1996). The more inaccurate an expectation, the larger its potential to create self-fulfilling prophecies (Jussim & Harber, 2005). Glock et al. (2015) showed that teacher judgments were less accurate for ethnic minority students than for ethnic majority students. Teachers felt less confident about the judgments they made for ethnic minority students and under- and overestimation of ethnic minority students were due to a less accurate encoding of the information about ethnic minority students compared to ethnic majority students. In particular, information about the grades of ethnic minority students was not strongly encoded by teachers.

Starting from the assumption that stigmatized students, or students who feel devalued in education, may be particularly vulnerable, Jussim and colleagues (Jussim et al., 1996) empirically tested student gender, SES, and ethnicity as moderators of teacher expectation effects on mathematics achievement in a sample of US middle school students. They showed that teacher expectation effects were more powerful among girls, students from lower socioeconomic backgrounds, and African-American students. Later, McKown and Weinstein (2002) showed that, among a sample of US primary school students and their teachers, gender and minority background moderated the effects in mathematics, but not in reading. In particular, when considering mathematics, girls and African-American students were more likely to confirm low expectations and less likely to benefit from high teacher expectations. Jamil and colleagues (Jamil et al., 2018) also showed that teacher expectation effects were gender specific, as effects were stronger for White girls, minority girls, and minority boys than they were for White boys.

The previous moderator effects of gender and minority background may not only hold for general achievement or mathematics. For example, girls have been found to be more susceptible to teacher expectation effects on their creativity (Karwowski et al., 2015) and reading motivation (Boerma et al., 2016). In contrast, no significant moderation effects of gender were found on the long-term effects of teacher expectations on students’ educational careers (De Boer et al., 2010).

The current study

The current study aimed to investigate whether gender and minority background were moderators of teacher expectation effects for both academic outcomes and self-concept and subjective task value (psycho-social factors) in the mathematics domain, in a sample of intermediate school students (Grades 6 and 7) in New Zealand. This study adds to the current knowledge base by empirically studying moderator effects of both gender and minority background. Additionally, whereas previous research mostly investigated self-concept only, we investigated the association between teacher expectations and students’ self-concept as well as their subjective task values (intrinsic and utility value). The following research questions guided our research:

  1. To what extent are beginning-year teacher expectations associated with gender, minority background, beginning-year achievement in mathematics, and students’ self-concept, utility value, and intrinsic value?

  2. To what extent are beginning-year teacher expectations associated with students’ end-of-year mathematics achievement and students’ self-concept, utility value, and intrinsic value?

  3. To what extent is the association between beginning-year teacher expectations and students’ end-of-year mathematics achievement and students’ self-concept, utility value, and intrinsic value moderated by gender and minority background?

The association between the research questions and the function of the variables is depicted in Fig. 1.

Fig. 1 Association between research questions and function of relevant variables

Method

Context of the primary data collection

The data analyzed for this study were collected as a part of a larger research project examining relations between student and teacher beliefs in New Zealand (e.g., Meissel & Rubie-Davies, 2015; Rubie-Davies & Peterson, 2016; Timmermans & Rubie-Davies, 2018). The New Zealand compulsory education sector is comprised of primary and secondary components. Students attend primary school from Year 1 to Year 8 (aged 5–12 years), with intermediate schools catering for Years 7 and 8 (Grades 6 and 7). Thereafter, students move to the secondary system which caters for Years 9 to 13. All New Zealand schools are self-governing, which means that a board comprised of the principal, a staff member, and several community members plays a role in the governance of the school. Most New Zealand primary students attend schools in their local area.

Intermediate school enrolment in urban and suburban Auckland, where the study took place, ranges from approximately 250 students to just over 1000. All schools in Auckland (the largest city) are ethnically diverse. In Auckland, 34% of intermediate-age students are New Zealand (NZ)/European, 17% are Māori, 21% are Pasifika, and 23% are Asian (Education Counts, 2021). As in many other Western societies, the non-dominant groups (in this case Māori and Pasifika students) achieve at lower levels than New Zealand/European and Asian students and they also tend to be located more frequently in schools situated in low socioeconomic areas (e.g., Bishop & Berryman, 2006; Hattie, 2008).

Sample and participants

Intermediate-level students were selected as participants in preference to younger primary school students because of evidence that questionnaire responses of older students are more reliable and valid than those of younger students (Rubie-Davies & Hattie, 2012). A list of all intermediate schools in one geographical Auckland area was downloaded (https://www.educationcounts.govt.nz/data-services/directories/list-of-nz-schools). Then, one school in each of a high-, middle-, and low-income area was randomly selected to participate. The principal of the first school in a middle-income area refused to take part and so another middle-income school was then randomly selected. As a result, the three intermediate schools participating in the study were located in different areas of Auckland with different student populations. Teachers in each of the schools (n = 72) were then approached about being part of the study and, of those, 18 declined to participate (teacher level response rate 75%).

Students were included in this study if the beginning-year teacher expectation for the student was available (n = 1663 students in n = 42 classes). Of these students, 51% were boys and 49% were in Year 7. Students were aged from 10 (1%) to 13 (5%) although most were 11 (40%) or 12 (54%). In relation to ethnicity, 38% were NZ European, 12% Māori (the indigenous group), 27% Pasifika (originating from one of the Pacific Islands), and 21% Asian (originating from South-East Asia).

Procedure and data collection

Following ethical approval for data collection by the University of Auckland, principals in three schools agreed to their teachers and students participating. Teachers and students participated voluntarily. Parent consent and student assent were sought for potential student participants. No parents or students declined to be part of the study.

Data were collected from teachers and students at the beginning and end of one school year. Three weeks into the academic year, in the absence of school records and away from their classroom, teachers completed a teacher expectation scale for all their students. Raudenbush’s (1984) meta-analysis established that teachers form their expectations early in the school year, normally within the first weeks, and, after that time, expectations are assumed to remain relatively stable. One week later, students completed standardized mathematics tests. The tests were couriered to each class, teachers administered them, and the completed tests were then returned to the researchers, who marked them. At the beginning of every test was a very clear protocol with explicit instructions, which teachers read aloud to the students. This helped to ensure consistent delivery across classes. At the end of the year, a similar mathematics test was administered following the same procedure. A researcher administered the student questionnaire to each class; at the same time, the teacher completed their questionnaire. A research assistant was on hand to assist if any students had difficulties completing the questionnaire.

Instruments and variables

All items, factor loadings, and reliability indices of the scales from the teacher and student questionnaires described in this section are presented in Table 1.

Table 1 Items, factor loadings, and reliability indices

Mathematics achievement

Student mathematics performance was assessed at the beginning and end of the academic year using e-asTTle mathematics (Electronic Assessment Tools for Teaching and Learning). e-asTTle is a standardized mathematics test used in New Zealand with Years 4–12 students (aged 8–16 years). The e-asTTle system can create tests of varying lengths and at different curriculum levels that assess different aspects of the curriculum and can be completed either online or in a paper-and-pencil version. All items were pre-calibrated in national norming trials using item response theory (Embretson & Reise, 2000), which means that students can be expected to score similarly, no matter which e-asTTle test they are given. Therefore, scores can be compared across classes, schools, and year levels. Once a test has been created, e-asTTle has the facility to generate a comparable test at a later time. Thus, the tests that students took at the beginning and end-of-year were not identical, as they consisted of different items, but the scores of the two tests could be transformed to a single underlying latent scale, making it possible to compare beginning with end-of-year scores. Using non-identical tests avoided practice effects.

In consultation with the deputy principals of the schools involved, a 40-min mathematics test was created that included items ranging from Levels 2 to 6. The levels related to the New Zealand curriculum levels. Students spend approximately 2 years at each curriculum level. Hence, average Year 7 and 8 students would normally be working at Level 4. At both the beginning and end-of-year, the tests included items related to number knowledge, number sense, and algebra. All students completed the tests in paper-and-pencil form, and the tests were then marked online in the e-asTTle system. Total scores for mathematics can range from 1100 to 1900 points. In the current study, scores at the beginning of the year ranged from 1226 to 1845 (M = 1500.11, SD = 92.97), and at the end of the year 1271 to 1845 (M = 1544.19, SD = 92.14). To be able to include both Year 7 and 8 students in a simultaneous analysis, the e-asTTle scores were standardized by first subtracting the student scores from the Year 7 and 8 national means (available for every 3 months).

Teacher expectations

Teachers provided their expectations in mathematics for each student at the beginning of the academic year. Teacher expectations were assessed using a five-item scale with a 7-point Likert response format (1–7). This scale was developed specifically for the current project (Rubie-Davies & Peterson, 2016) to avoid the use of just one item to assess expectations and to enable reliability estimates to be calculated. In relation to the five-item scale, teachers provided (1) a judgment in relation to mathematics of where students were currently achieving; (2) the level in mathematics they predicted students would achieve by end-of-year; (3) whether they predicted students would receive a good initial school report; (4) the degree to which they believed the student would be successful in their class; and (5) the degree to which they thought the student would have a successful school career. The scale showed good reliability at the beginning of the year.

Self-concept in mathematics

Self-concept was measured in a student questionnaire using a five-item scale with a 5-point Likert response format (1–5), adapted from Wigfield and Eccles (2000). An example item was “Compared to your other school subjects, how good are you in math?” The scale showed good reliability at both the beginning and end-of-year.

Intrinsic value

Students’ interest in mathematics was used as a measure of intrinsic value and was measured in a student questionnaire using a three-item scale with a 5-point Likert response format (1–5), derived from Wigfield and Eccles (2000). An example item was “I find working on math activities interesting.” The scale showed good reliability at both the beginning and end-of-year.

Utility value

Students’ perceived value of mathematics was used as a measure of utility value and was measured in a student questionnaire using a three-item scale with a 5-point Likert response format (1–5), derived from Wigfield and Eccles (2000). An example item was “I will use math in many ways when I grow up.” The scale showed sufficient reliability at both the beginning and end-of-year.

Gender and minority background

Regarding gender, boys were the reference group. For minority background, NZ European students were used as the reference group for comparisons with Māori, Pasifika, Asian, and students with other minority backgrounds.

Descriptive statistics are presented in Table 2. Following Bulmer’s (1979) guidelines, teacher expectations and utility value were somewhat skewed to the left. All other variables showed approximately normal distributions.
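As a minimal sketch of how such a skewness check could be carried out, the Python snippet below computes a moment-based skewness coefficient and applies the rule of thumb commonly attributed to Bulmer (1979): absolute skewness below 0.5 is treated as approximately symmetric, between 0.5 and 1 as moderately skewed, and above 1 as highly skewed. The function names and the example ratings are hypothetical and are not taken from the study's data; the exact skewness estimator used by the authors is not stated.

```python
from statistics import fmean

def sample_skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5, using population moments."""
    n = len(values)
    mean = fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def bulmer_category(skew):
    """Rule-of-thumb label for the size of a skewness coefficient."""
    size = abs(skew)
    if size < 0.5:
        return "approximately symmetric"
    if size <= 1.0:
        return "moderately skewed"
    return "highly skewed"

# Hypothetical 1-7 ratings bunched at the high end, with a long tail to the left.
ratings = [7, 7, 6, 6, 6, 6, 5, 5, 5, 4, 3, 1]
skew = sample_skewness(ratings)
print(round(skew, 2), bulmer_category(skew))
```

A negative coefficient corresponds to a distribution that is skewed to the left, which is the direction described for teacher expectations and utility value.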

Table 2 Descriptive statistics of the core variables at the beginning and end of the school year

Analytic strategy

Missing values

Collecting data through multiple questionnaires and at several moments during the school year inevitably leads to incomplete records. Of all values, 96.4% were observed; however, missing values were distributed over 355 (21.4%) students. Incomplete records were mostly due to missing values in the mathematics tests either at the beginning (14.3%) or end (14.1%) of the school year. Regarding the other variables, the percentage of missing values ranged between 0.0% (gender, minority background, teacher expectations) and 3.9% (end-of-year intrinsic value). The pattern of missing values was not completely at random; Little’s MCAR test χ2 (93) = 200.99, p < 0.001. The exact mechanism of the missing values is unknown; however, for both mathematics at the beginning and end-of-year, the pattern of missing values depended on the observed values of the other variables included. We therefore assumed that the values were missing at random and continued with the analyses of complete cases given that the sample size was still sufficient. For a full overview of missing values for mathematics test scores, see Table 3.
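The distinction between the share of observed values and the share of incomplete records can be illustrated with a short Python sketch. The records, variable names, and helper functions below are hypothetical; the sketch only shows how a high proportion of observed cells can coexist with a much larger proportion of students who have at least one missing value, and how a complete-case selection would then be made.

```python
# Hypothetical student records; None marks a missing value.
records = [
    {"math_begin": 1510.0, "math_end": 1560.0, "expectation": 5.2},
    {"math_begin": None,   "math_end": 1490.0, "expectation": 4.0},
    {"math_begin": 1475.0, "math_end": None,   "expectation": 4.6},
    {"math_begin": 1620.0, "math_end": 1655.0, "expectation": 6.4},
]

def missing_share(rows, variable):
    """Proportion of records with a missing value on one variable."""
    return sum(row[variable] is None for row in rows) / len(rows)

def observed_share(rows):
    """Proportion of all cells (records x variables) that are observed."""
    cells = [value for row in rows for value in row.values()]
    return sum(value is not None for value in cells) / len(cells)

def complete_cases(rows):
    """Keep only records without any missing value (listwise deletion)."""
    return [row for row in rows if all(v is not None for v in row.values())]

complete = complete_cases(records)
print(missing_share(records, "math_begin"))  # share missing on one variable
print(observed_share(records))               # share of observed cells
print(len(complete), "of", len(records), "records are complete")
```

In this toy example, most cells are observed, yet half of the records are dropped by listwise deletion, because the missing values are spread over different students.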

Table 3 Overview of the association between missing values in mathematics achievement at the beginning and end of the year in relation to other observed variables

Multilevel modeling

Given the nested structure of the data with students nested within classes (e.g., Snijders & Bosker, 2012), two-level hierarchical regression modeling with students at Level 1 nested within classes at Level 2 was conducted using MLwiN 3 software (Charlton et al., 2020). The school level was not included in the multilevel model because the data were gathered at only three intermediate schools, which was an insufficient number to be included as a hierarchical level. Given that the research question related to associations at the student level, we did not expect that omitting the school level (potential Level 3) would impact the significance testing. Effects of ignoring a hierarchical level on the standard errors are almost exclusively found at the ignored and the adjacent levels (Van den Noortgate et al., 2005). Nevertheless, to take into account that students attended three intermediate schools we created dummy variables that were included as fixed effects.

For all multilevel models, continuous predictor variables were centered around the grand mean (Enders & Tofighi, 2007). Unstandardized regression coefficients are reported in the tables. Standardized coefficients were derived by multiplying the unstandardized regression coefficient by the standard deviation of X and dividing by the standard deviation of Y (Snijders & Bosker, 2012).
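The two transformations described above can be sketched in a few lines of Python. Grand-mean centering subtracts the overall mean from every score, and the standardized coefficient follows the stated rule, β = b × SD(X) / SD(Y). The function names, the example scores, and the slope value are hypothetical and serve only to illustrate the arithmetic.

```python
from statistics import fmean, stdev

def grand_mean_center(values):
    """Subtract the overall (grand) mean from every score."""
    mean = fmean(values)
    return [v - mean for v in values]

def standardized_coefficient(b, sd_x, sd_y):
    """beta = b * SD(X) / SD(Y)."""
    return b * sd_x / sd_y

# Hypothetical predictor (X) and outcome (Y) scores.
x = [1400.0, 1450.0, 1500.0, 1550.0, 1600.0]
y = [4.0, 4.5, 5.0, 5.5, 6.5]
b = 0.003  # hypothetical unstandardized slope of Y on X

x_centered = grand_mean_center(x)
beta = standardized_coefficient(b, stdev(x), stdev(y))
print(x_centered)
print(round(beta, 3))
```

Centering leaves the slope unchanged and only shifts the intercept, so that the intercept refers to a student with an average score on the predictor.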

To answer the first research question, three linear multilevel regression models were estimated. In all models, teacher expectations served as a continuous dependent variable. First, an empty model with teacher expectations as the dependent variable was estimated to assess the proportion of variance in expectations at the teacher level (Model 0). Second, a model was estimated in which beginning-year mathematics achievement, gender, and minority background were included as predictor variables (Model 1). Additionally, we tested whether the extent to which expectations depended on gender and minority background differed per class; these results are presented in Supplementary Files S1. This was investigated by allowing random slopes at the class level for gender and minority background (Models 1A–1E). By means of the random slopes, it is possible to investigate whether the coefficient of a predictor variable varies among classes. We tested whether the random slopes improved model fit on a one-by-one basis (Hox et al., 2017). Where the random slopes led to a significant improvement in model fit, differences between classes in intercepts and slopes are presented by means of 95% coverage intervals (Leckie, 2013; see Footnote 1). Third, a model was estimated in which students’ beginning-year self-concept, utility value, and intrinsic value were added as predictor variables (Model 2).

For the second research question, a multivariate multilevel regression model with students (Level 1) nested within classes (Level 2) was estimated using the end-of-year measures of mathematics achievement, students’ self-concept, utility value, and intrinsic value as continuous dependent variables. The analyses were conducted in two steps. First, for each dependent variable, the same set of predictor variables was used that included the control variables gender and minority background and the beginning-year mathematics achievement, self-concept, utility value, intrinsic value, and teacher expectations (Model 3). Including these beginning-of-year variables is important to exclude them as potential alternative explanations for effects of teacher expectations on end-of-year scores. The variable of interest was the predictor teacher expectations. We allowed the multilevel regression model to estimate separate coefficients for the predictor variables for each of the four dependent variables, thereby allowing, for example, that expectations were significantly related to some, but not all of the dependent variables. For the third research question, Model 3 was expanded with interaction terms between the predictor variables teacher expectations and gender and between teacher expectations and minority background to test for moderation effects (Model 4; see Footnote 2).
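To make the moderation step concrete, the following Python sketch shows one way the predictors for a model such as Model 4 could be constructed: categorical variables are dummy coded against a reference group (boys; NZ European students), and each dummy is multiplied by the grand-mean-centered teacher expectation to form an interaction term. The helper names, the record layout, and the example student are hypothetical; the study itself fitted the models in MLwiN, and this sketch covers only the construction of the predictor columns, not the estimation.

```python
def dummy_code(value, categories, reference):
    """Return one 0/1 indicator per non-reference category."""
    return {c: int(value == c) for c in categories if c != reference}

def moderation_terms(expectation_centered, dummies):
    """Product terms between centered expectations and each dummy."""
    return {
        f"expectation_x_{name}": expectation_centered * flag
        for name, flag in dummies.items()
    }

backgrounds = ["NZ European", "Māori", "Pasifika", "Asian", "Other"]

# Hypothetical student: a Pasifika girl whose teacher's expectation lies
# 0.8 scale points below the grand mean.
student = {"gender": "girl", "background": "Pasifika",
           "expectation_centered": -0.8}

dummies = dummy_code(student["gender"], ["boy", "girl"], "boy")
dummies.update(dummy_code(student["background"], backgrounds, "NZ European"))
terms = moderation_terms(student["expectation_centered"], dummies)
print(dummies)
print(terms)
```

With this coding, the coefficient of teacher expectations refers to the reference groups, and each interaction coefficient expresses how much the expectation slope differs for the corresponding group.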

Results

Bivariate associations

Bivariate associations between the continuous variables are presented in Table 4, and between-group differences for gender and minority background are presented in Table 5. The teachers’ beginning-year expectations were positively related to all other beginning-year variables, but most strongly with students’ beginning-year mathematics achievement (r = 0.355, n = 1425, p < 0.001) and students’ self-concept (r = 0.285, n = 1661, p < 0.001). Moreover, significant differences were observed between the teachers’ expectations for boys and girls (t = -3.61, df = 1650, p < 0.001), with more positive expectations for girls (M = 5.16, SD = 1.10) compared with boys (M = 4.95, SD = 1.26). With respect to students’ minority background, significant differences were also observed between groups (F(4, 1658) = 42.11, p < 0.001), with the highest teacher expectations for students with Asian (M = 5.51, SD = 1.18) and NZ European (M = 5.26, SD = 1.13) backgrounds and the lowest for students with a Māori (M = 4.54, SD = 1.22) background.

Table 4 Correlation table (Pearson) for the core variables
Table 5 Descriptive statistics of the core variables split for categories of gender and minority background

Similar bivariate associations were observed at the end of the year. The teachers’ beginning-year expectations were significantly positively correlated with all end-of-year variables. Again, the strongest correlations were found between expectations and end-of-year mathematics achievement (r = 0.331, n = 1429, p < 0.001) and students’ self-concept (r = 0.307, n = 1609, p < 0.001). Moreover, strong correlations were observed between the beginning and end-of-year measurements of the same variables.

Predicting beginning-of-year teacher expectations (research question 1)

The results of the multilevel regression models for predicting beginning-year teacher expectations are presented in Table 6. From Model 0, the empty model, it appears that 22.1% of the variance in teacher expectations was associated with the teacher level. This showed that hierarchical modeling was necessary and that the largest part of the variance could potentially be explained by variables at the student level.
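The share of variance at the higher level in an empty two-level model is the intraclass correlation: the between-class variance divided by the sum of the between-class and within-class variances. The Python sketch below shows the calculation. The variance components are illustrative values chosen only so that the result lands near the reported 22.1%; they are not the estimates from Table 6.

```python
def intraclass_correlation(var_between, var_within):
    """Share of total variance located at the higher (class/teacher) level."""
    return var_between / (var_between + var_within)

# Illustrative variance components (NOT the Table 6 estimates).
var_between_classes = 0.221
var_within_classes = 0.779

icc = intraclass_correlation(var_between_classes, var_within_classes)
print(round(icc, 3))
```

The complement of this share is the proportion of variance that lies between students within classes, which is the part that student-level predictors can potentially explain.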

Table 6 Results of two-level multilevel models for predicting beginning of the year teacher expectations

In Model 1, the control variables were included as predictors of teacher expectations. By including beginning-year mathematics achievement, gender, and minority background, the fit of the model was substantially improved compared with Model 0: Δχ2(6) = 920.798, p < 0.001. In general, higher expectations were associated with better mathematics achievement (b = 0.003, β = 0.232, t(1415) = 10.06, p < 0.001). Moreover, after taking performance into account, expectations seemed higher for girls compared with boys (b = 0.224, β = 0.188, t(1415) = 4.23, p < 0.001). Regarding minority background, compared with the NZ European students, significantly higher expectations were found for students with an Asian background (b = 0.352, β = 0.296, t(1415) = 4.76, p < 0.001), and lower expectations for students with a Māori (b = −0.585, β = 0.492, t(1415) = 5.79, p < 0.001), Pasifika (b = −0.289, β = 0.243, t(1415) = 3.25, p = 0.001), or other backgrounds (b = −0.397, β = 0.509, t(1415) = 2.50, p = 0.013).
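Model comparisons of this kind are typically likelihood ratio (deviance difference) tests, in which the drop in deviance between two nested models is referred to a chi-square distribution with degrees of freedom equal to the number of added parameters. The Python sketch below computes the corresponding upper-tail probability using only the standard library. The function name is hypothetical, and it is an assumption that the reported p-values were obtained in exactly this way.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable (integer df >= 1)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: finite Poisson-type sum.
        total = sum(half ** i / math.factorial(i) for i in range(df // 2))
        return math.exp(-half) * total
    # Odd df: complementary error function plus a finite sum.
    total = sum(half ** (i - 0.5) / math.gamma(i + 0.5)
                for i in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total

# Deviance difference reported for Model 1 versus Model 0.
p_model_1 = chi2_sf(920.798, 6)
print(p_model_1 < 0.001)
```

For a difference as large as 920.798 on 6 degrees of freedom, the tail probability is far below 0.001, in line with the reported result.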

An additional series of five random slopes models was estimated to assess whether the association between teacher expectations and gender and between teacher expectations and minority background varied between classes. The full models can be found in Supplemental Materials S1. Adding random slopes for gender and minority background mostly resulted in non-significant results, with the exception of Model 1B with random slopes for the association between expectation and Māori background: Δχ2(2) = 9.900, p = 0.007. Assuming a normal distribution of between-class differences, 95% of the classes are expected to lie between −1.122 and −0.048 with respect to the difference in expectation for students with a Māori background compared with NZ European students (see Footnote 3). This finding implies that in some classes the differences in expectations for students with a Māori background compared with NZ European students were up to 1 point on the 7-point Likert scale, whereas in other classes, the differences were very close to 0.
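A 95% coverage interval of this kind is the average slope plus or minus 1.96 times the between-class standard deviation of that slope. The Python sketch below reproduces the arithmetic. The mean difference of −0.585 is the fixed effect reported for Model 1, which is also the midpoint of the reported interval; the standard deviation of 0.274 is back-calculated from the reported interval and is an assumption, not a value taken from the supplementary tables.

```python
import math

def coverage_interval(mean_slope, slope_variance, z=1.96):
    """Range expected to contain 95% of class-specific slopes (normality assumed)."""
    sd = math.sqrt(slope_variance)
    return mean_slope - z * sd, mean_slope + z * sd

mean_difference = -0.585   # average Māori vs. NZ European difference
assumed_sd = 0.274         # back-calculated from the reported interval (assumption)

low, high = coverage_interval(mean_difference, assumed_sd ** 2)
print(round(low, 3), round(high, 3))
```

Because the interval is symmetric around the average difference, its lower bound describes classes with a gap of roughly one scale point and its upper bound classes with almost no gap.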

Students’ self-concept, utility value, and intrinsic value were added as predictor variables in Model 2, which, compared with Model 1, led to a significant improvement in model fit: Δχ2(3) = 239.809, p < 0.001. After taking the previous control variables into account, higher expectations were found for students with higher initial self-concept in the domain of mathematics (b = 0.439, β = 0.354, t(1405) = 12.91, p < 0.001), and for students with lower levels of interest in mathematics (b = −0.092, β = 0.077, t(1405) = 2.63, p = 0.009). The latter is a relatively small effect: each standard deviation increase in students’ mathematics interest is associated with an expected decrease of 0.077 standard deviations in teachers’ expectations.

General teacher expectation effects (research question 2)

The results of the multivariate multilevel model (Model 3), in which teachers’ expectations as measured at the beginning of the year predicted end-of-year levels of mathematics achievement, self-concept, intrinsic value, and utility value, are presented in Table 7. For the dependent variable end-of-year mathematics achievement, only two significant associations were observed: with beginning-year mathematics achievement (b = 0.794, β = 0.808, t(1334) = 46.71, p < 0.001) and with teacher expectations (b = 3.210, β = 0.077, t(1334) = 2.51, p = 0.006). Both associations were positive, indicating that end-of-year achievement was higher if the student performed well at beginning-of-year and, to a much smaller extent, if the teachers’ expectation at beginning-of-year was relatively high.

Table 7 Results of multivariate multilevel models for testing teacher expectation effects on end of the year mathematics achievement, self-perception, intrinsic, and utility value

For the dependent variable self-concept, three significant positive associations and one significant negative association were found. Positively related to students’ end-of-year self-concept were their beginning-of-year self-concept (b = 0.711, β = 0.704, t(1334) = 35.55, p < 0.001) and, to a smaller extent, their beginning-of-year intrinsic value (b = 0.069, β = 0.070, t(1334) = 3.45, p < 0.001) and the teachers’ expectations (b = 0.077, β = 0.079, t(1334) = 5.13, p < 0.001). This latter finding implies that higher teacher expectations at beginning-of-year were somewhat predictive of higher self-concept of students at end-of-year. Compared with boys, and after controlling for the other variables in the model, girls had lower self-concept (b = −0.184, β = −0.190, t(1334) = 6.13, p < 0.001).

The intrinsic value of mathematics at end-of-year was positively predicted by beginning-of-year self-concept (b = 0.064, β = 0.063, t(1334) = 2.56, p = 0.011), intrinsic value (b = 0.628, β = 0.641, t(1334) = 26.17, p < 0.001), and utility value (b = 0.079, β = 0.064, t(1334) = 2.93, p = 0.003). Moreover, intrinsic value was dependent on the student’s gender, with lower levels for girls compared with boys (b = −0.072, β = 0.092, t(1334) = 2.00, p = 0.046), and on minority background, with the highest values for students from a Pasifika (b = 0.180, β = 0.186, t(1334) = 3.05, p = 0.002) or Asian background (b = 0.099, β = 0.102, t(1334) = 2.02, p = 0.044). Utility value at end-of-year was positively associated with intrinsic value (b = 0.051, β = 0.064, t(1334) = 2.32, p = 0.020) and utility value (b = 0.603, β = 0.595, t(1334) = 24.12, p < 0.001) at beginning-of-year. Moreover, the utility value of mathematics was scored lower by girls compared with boys (b = −0.093, β = −0.116, t(1334) = 2.82, p = 0.005). The teachers’ expectations were not significant in predicting intrinsic and utility value.

Moderation effects of gender and minority background (research question 3)

It should be noted that the effects of teacher expectations in the models as presented in Table 7 are general effects, assuming that the effects of expectations are similar for various groups of students. However, it is questionable whether this assumption holds if some groups are indeed more vulnerable to high or low expectations. The models testing for moderation effects (Model 4) of gender and minority background are presented in Table 8. Adding the interactions to the model did not lead to a significant improvement of the model fit: Δχ2(20) = 9.972, p = 0.524. Moreover, none of the coefficients of the interaction variables was significant.

Table 8 Results of multivariate multilevel models for testing moderator effects of gender and minority background and teacher expectation effects on end of the year mathematics achievement, self-perception, intrinsic, and utility value

Discussion

The aim of the current study was to investigate whether students’ gender and minority background were moderators of teacher expectation effects for mathematics outcomes as well as self-concept and subjective task value in the mathematics domain. This study adds to the current knowledge base by empirically studying moderator effects of both gender and minority background. Additionally, whereas previous research mostly investigated self-concept only, we investigated the association between teacher expectations and students’ self-concept as well as their subjective task values (intrinsic and utility value).

Research question 1

Regarding the first research question, in line with societal stereotypes and after controlling for students’ beginning-of-year mathematics achievement, teacher expectations were higher for Asian students and lower for Māori students than for NZ European students. These findings are in line with previous research in the same context (Turner et al., 2015), and also corroborate findings from other educational systems that teachers differentiate in their expectations based on students’ minority background (e.g., Glock & Krolak-Schwerdt, 2013; Tenenbaum & Ruck, 2007; Timmermans et al., 2015, 2018). However, contrary to the expected stereotypes, expectations within the domain of mathematics were higher for girls than for boys. This finding is inconsistent with earlier research in which, in line with stereotypes, girls were the target of low expectations in mathematics (Jussim et al., 1996). However, there is also some evidence (Jaremus et al., 2020) that expectations for girls and boys in mathematics at the elementary and middle school level are less differentiated (e.g., Gentrup & Rjosk, 2018) and that it is at higher levels of schooling (secondary and tertiary) where teachers tend to have higher expectations for boys in the STEM fields. Perhaps this finding reflects teachers more often disapproving of boys’ classroom behavior, causing boys generally to be perceived as academically poorer students (e.g., Bennett et al., 1993; Harlen, 2005; Hecht & Greenfield, 2002; Kenney-Benson et al., 2006). Moreover, the New Zealand public has been made aware of societal stereotypes related to girls in STEM and how greater numbers of girls need to be encouraged into these fields (e.g., https://www.curiousminds.nz/actions/community/women-and-girls/). It may be that this information has led teachers to consider the capabilities of girls in mathematics more carefully. All in all, these findings suggest that for students with comparable levels of achievement in the mathematics domain, teachers may hold different expectations because of the students’ gender or ethnicity.

Moreover, after controlling for background variables and beginning-of-year achievement, student self-concept and, to a lesser extent, interest in mathematics were associated with teachers’ expectations. This implies that teachers base their expectations on a wider range of student characteristics than just achievement and demographic background (Timmermans et al., 2016, 2019). Teachers’ expectations for the academic achievement of elementary school students have been found to be positively related to students’ perceived assertiveness, independence (Alvidrez & Weinstein, 1999; Bonvin & Genoud, 2006; Rubie-Davies, 2010), self-confidence (Driessen, 2006; Rubie-Davies, 2010), and self-concept (Upadyaya & Eccles, 2015). These studies have indicated that teachers tend to have higher expectations of a student they perceive as independent, more confident, or as having a greater self-concept. Although it is generally assumed that teachers use these student attributes in shaping their expectations (e.g., Rubie-Davies, 2008), the empirical evidence is still rather limited. The current finding therefore presents an important addition to the limited evidence base.

Research question 2

Regarding the second research question, teachers’ beginning-year expectations were predictive of achievement and self-concept of students at end-of-year. For mathematics achievement, these findings correspond to numerous studies showing that differential expectations are confirmed in generally small to moderate self-fulfilling prophecies in various academic domains (e.g., Jussim & Harber, 2005; Rubie-Davies, 2008; Tenenbaum & Ruck, 2007; Wang et al., 2018). The finding that beginning-of-year teacher expectations are not only predictive of achievement but also of students’ self-concept is an important contribution for two reasons. First, many previous studies have failed to take initial self-concept into account; therefore, it has remained largely unknown whether the reported positive associations were spurious. Second, children’s perceptions of themselves affect their motivation and subsequent behavior (Bandura, 1986; Dweck & Leggett, 1988; Harter, 1983; Marsh, 1990). The finding that teacher expectations are predictive of later student self-concept is in line with Expectancy-Value Theory (Eccles, 1983). It seems that the teachers’ expectations serve as an evaluation by a significant other (e.g., Bong & Skaalvik, 2003; Gniewosz et al., 2014; Shavelson et al., 1976), and when students experience these expectations as support and acceptance, the expectations lead them to evaluate themselves more positively (Liu & Wang, 2008).

Previous research had shown positive correlations between teacher expectations and subjective task values, as expectations were positively related to utility value (e.g., Benner & Mistry, 2007; Boerma et al., 2016; Gilbert et al., 2014; Lazarides & Watt, 2015) and intrinsic value (Woolley et al., 2010). The current study, however, failed to find this association. Various methodological differences may account for the difference in findings. Some prior studies tested this association in different or combined subject domains (Benner & Mistry, 2007; Boerma et al., 2016), or included students’ perceptions of teacher expectations instead of teacher ratings (Gilbert et al., 2014; Lazarides & Watt, 2015; Woolley et al., 2010). Furthermore, all of the studies above are based on data in which teacher expectations and task values were collected at a single moment. Perhaps, once beginning-of-year levels of intrinsic and utility value are controlled for, teacher expectations are not strong enough to predict intrinsic and utility value about 8 months later. Alternatively, although it may be assumed that attitudes of socializers (i.e., teachers) affect students’ subjective task values (Wigfield & Eccles, 1992, 2000), this perhaps holds for other teacher attitudes but potentially does not include teachers’ performance expectations.

Research question 3

Regarding the third research question, we expected that gender and minority background would moderate the effects of teacher expectations on mathematics performance (e.g., Boerma et al., 2016; McKown & Weinstein, 2002), with larger effects of expectations for stigmatized groups (girls in mathematics, students with minority backgrounds). In the current study, we did not find evidence for moderation effects of gender and minority background, which implies that, at least in the current sample, the effects of teacher expectations were of roughly similar magnitude for various student groups.

Strengths and limitations

In interpreting the results of this study, a number of strengths and limitations need to be considered. Many available studies investigating the association between teacher expectations and subsequent student self-concept and task values have failed to control for measures of current levels of achievement and self-concept. It is therefore not possible from these studies to disentangle teacher expectation effects on these outcomes from potentially higher initial expectations for more confident, efficacious, or interested students (Timmermans et al., 2016). A major strength of the current study was the design with measures at both the beginning and end of the school year, which made it possible to control for beginning-of-year ratings of students’ self-concept and task values. Therefore, the finding that higher teacher expectations at beginning-of-year are predictive of greater self-concept at end-of-year, together with the absence of an association with end-of-year intrinsic and utility value, is possibly more reliable and robust than the findings of earlier studies. Nevertheless, we cannot infer general conclusions about causal effects of teacher expectations, because a potentially important confounder may have been omitted from the multilevel models.

Moreover, the sample of the study consisted of 1663 students and their teachers from three intermediate schools in the geographical Auckland area. Although the sample was of reasonable size at the student (n = 1663) and teacher (n = 42) level, the number of participating schools was rather small. Nevertheless, in the selection of the three participating intermediate schools, variation in student population in relation to socioeconomic status was explicitly considered. For future research, it would be beneficial to assess the generalizability of these associations using samples consisting of more schools. Moreover, it remains worthwhile to replicate this study in various contexts (outside the Auckland area), in other educational systems, and for different age groups, in order to investigate the generalizability of the findings.