Introduction

The disparities in mathematics achievement by student subgroups have been a critical issue in addressing the inequities in mathematics education (Hanushek et al., 2019). In previous achievement gap studies, attention has been given to highlighting persistent achievement gaps by accounting for students’ sociodemographic characteristics and school contextual differences (e.g., Lee & Reeves, 2012; Lubienski & Lubienski, 2006; Young et al., 2017). However, caution must be stressed in overemphasizing the associations between students’ demographic characteristics and mathematics achievement, because it may reinforce the deficit views toward the underrepresented students when years of efforts to enhance the achievement of underrepresented students do not yield desirable educational outcomes (Garcia & Guerra, 2004). Also, over-emphasis on the effects of demographic markers may take our attention away from possible omitted variables that are associated with achievement, potentially obscuring the broader landscape of influences, including psychological aspects (Wilms et al., 2021). Therefore, we propose that studies exploring achievement gaps should be reframed from the anti-deficit and asset-oriented approach (Harper, 2010) to identify how some malleable factors, such as self-efficacy, are associated with the observed mathematics achievement gaps.

Mathematics self-efficacy is closely associated with students’ mathematics achievement. Self-efficacy refers to an individual’s beliefs in their capacities to execute courses of action to accomplish and succeed in given tasks (Bandura, 1977). According to social cognitive theory (Bandura, 1986), students’ self-efficacy is associated with individual factors, such as goal setting (Schunk & DiBenedetto, 2021), as well as with environmental factors, such as the schools that students attend (Bandura, 2001). Studies (e.g., Sakellariou, 2022) have shown a consistent positive correlation between self-efficacy in mathematics and students’ mathematics achievement.

Nevertheless, at least three gaps in the current studies on the role of mathematics self-efficacy in enhancing mathematics achievement remain. First, methodologically, previous studies (e.g., Soland & Sandilos, 2021) tend to overlook the potential associations between school environments and mathematics self-efficacy or mathematics achievement; as a result, the broader interpretative value of these findings may be constrained. Second, the relationship between mathematics self-efficacy and achievement has been shown to co-vary when considering some affective variables (e.g., mathematics anxiety, Kalaycıoğlu, 2015; interest, Zhang & Wang, 2020). These variables were often reported to be moderately to highly correlated with self-efficacy (e.g., Hiller et al., 2022). This may result in underappreciation of the extent to which mathematics self-efficacy and other affective variables are associated with mathematics achievement. Finally, while the association between mathematics self-efficacy and achievement may differ by students’ developmental stages (Shell et al., 1995), such potential differences by age or grade levels are rarely discussed. Thus, we designed this study to address these gaps in the literature by providing a better understanding of the association between mathematics self-efficacy and achievement gaps across grades and contexts. More specifically, we used a multilevel modeling (MLM) framework to nest students into schools to explore the association of mathematics self-efficacy with achievement, considering a series of student- and school-level demographics with data from the National Assessment of Educational Progress (NAEP) in Grades 4, 8, and 12.

With the findings from this study about academic self-efficacy, we aim to shift the attention to psychological variables that are malleable and may correlate with narrower achievement gaps among students from different demographic subgroups. Therefore, in this study, we sought to (1) estimate the degree to which mathematics self-efficacy is associated with student mathematics achievement; (2) examine changes in mathematics achievement gaps among student demographic subgroups when accounting for variations in mathematics self-efficacy; and (3) explore whether average mathematics self-efficacy at each school is related to the school’s overall mathematics achievement.

The association between mathematics self-efficacy and mathematics achievement

Research tends to report a positive association between self-efficacy and mathematics achievement. For example, Kalaycıoğlu (2015) explored the association of mathematics self-efficacy, socioeconomic status (SES), and mathematics anxiety with secondary school students’ mathematics achievement in PISA 2012 and found mathematics self-efficacy to be a strong correlational predictor across all countries examined, with medium effect sizes. Similarly, in a study using PISA 2012 data from Greece, Pitsia et al. (2017) found that, among a group of non-cognitive factors (e.g., mathematics self-concept, attitudes toward school, intrinsic motivation, and instrumental motivation), mathematics self-efficacy was a significant correlational factor in predicting middle-school students’ mathematics achievement after controlling for gender and school mean SES. Moreover, in a meta-analysis on psychological correlates of academic achievement, Richardson et al. (2012) reported a medium-sized positive correlation between academic self-efficacy and academic achievement. They also found that self-efficacy had the strongest correlation with academic achievement compared to approximately fifty other achievement-related cognitive and non-cognitive variables. Mathematics self-efficacy alone was associated with a considerable proportion of the variance in mathematics achievement, accounting for about 54% in Cheema and Kitsantas (2014) and 40% in Kitsantas (2011) within U.S. PISA data.

The association of mathematics self-efficacy with mathematics achievement across student subgroups

The literature generally highlights the positive correlation between self-efficacy and academic performance, noting variations in the association among student subgroups. For example, gender has been linked to differences in students’ mathematics self-efficacy, particularly during their high school years. Huang’s (2013) meta-analysis of 187 studies spanning elementary to high school ages found no statistically significant gender differences in mathematics self-efficacy among elementary or middle school students, but did find such a difference in high school, with males reporting higher self-efficacy in mathematics. Consistent with this pattern, Schwery (2015) compared fifth- to eighth-grade students and found no significant gender gap in mathematics self-efficacy levels or mathematics achievement, nor in the strength of the association between mathematics self-efficacy and mathematics achievement. These findings may suggest that the often-reported gender gap in mathematics achievement in high school or beyond may be associated, to some extent, with the reported difference in mathematics self-efficacy between genders (e.g., Cheema & Galluzzo, 2013).

Likewise, a student’s status as an English language learner (ELL) has been associated with variations in mathematics self-efficacy. Sandilos et al. (2020) investigated elementary ELLs and reported lower levels of self-efficacy and achievement in mathematics, even when student demographics and working memory were accounted for. They also found no significant association between limited English proficiency and either mathematics self-efficacy or mathematics achievement. In contrast, Soland and Sandilos (2021) found an association between ELL status and the growth trajectory of mathematics achievement through self-efficacy in middle school students. Variations in findings between these studies may be partially linked to their differing sample compositions in terms of race/ethnicity. Sandilos et al. (2020) focused on a sample predominantly composed of White students (43%) with a small proportion of ELLs (16%), whereas Soland and Sandilos (2021) examined a longitudinal cohort from a large urban middle school district with a significant representation of Hispanic (70%) and ELL students (30%). Therefore, the association involving ELL status reported in these studies might also relate to other socio-demographic factors within the samples.

Students with learning disabilities are often observed to have lower self-efficacy in mathematics. Larsen and Jang (2022) noted that Grade 6 students with individualized education programs (IEPs) tended to report lower self-efficacy than their non-IEP peers. Additionally, they found that when students with IEPs were placed in classes with inquiry-based teaching, there was an association with improved mathematics achievement. Similarly, Jungert and Anderson (2013) reported that the lower mathematics self-efficacy among fifth graders with learning disabilities correlated with a history of challenges in mathematics performance and prolonged experience of difficulty with mathematics learning. Their research suggests that instructional practices aimed at enhancing self-efficacy could be associated with supporting achievement among students with special needs.

The literature consistently reports an association between students’ SES and their self-efficacy in mathematics, which in turn is correlated with their mathematics achievement. Wiederkehr et al. (2015) found a correlation between lower SES and decreased levels of mathematics self-efficacy among elementary and secondary students, with this group also tending to have lower mathematics achievement compared to their higher SES peers. These associations were statistically significant with moderate to strong effect sizes. Similarly, McConney and Perry (2010) analyzed PISA 2003 data and found that a higher level of self-efficacy was associated with better mathematics performance when controlling for SES. Yet the measures of students’ SES used in these studies are often inconsistent, ranging from parents’ occupations (e.g., Wiederkehr et al., 2015) to an SES index constructed from a few variables (e.g., McConney & Perry, 2010). Such variability in measuring SES may relate to the observed strength of its association with self-efficacy or achievement.

Finally, using a large-scale dataset, Kotok (2017) found that high-achieving African American students self-reported the highest degree of self-efficacy, whereas high-achieving Asian students reported the lowest. After controlling for some student- and school-level variables (e.g., family background, peer engagement, math efficacy, mathematics identity, school SES, school types, and school locations), however, high-achieving African American students with higher mathematics self-efficacy were associated with lower mathematics achievement compared to their White and Hispanic counterparts. Although Kotok (2017) used only a sub-sample of high-achieving students, the result partly supports the findings of Cheema and Kitsantas (2014), who reported a small but significant negative relationship between mathematics self-efficacy and mathematics achievement among Black students in the PISA 2003 U.S. sample. Nevertheless, with HSLS:09 data, Andersen and Ward (2014) did not detect any statistically significant differences in the association between mathematics self-efficacy and achievement among Black, Hispanic, and White high-achieving students.

School contexts and their relationship with mathematics achievement

Students develop their academic knowledge through the dynamic interplay between their self-efficacy and social-ecological factors (Bandura, 2001); thus, considering school contexts is crucial for a more comprehensive understanding of the association between self-efficacy and mathematics achievement. Nevertheless, findings regarding the significance of school location as related to mathematics achievement vary. Kotok (2017) did not find a significant association between school location (city, town, rural) and achievement disparities. In contrast, Wu (2015), using NAEP data from 2000 to 2011 for Native American students at advanced academic levels, reported that regional school location was related to achievement in Grades 4 and 8. With PISA 2000 U.S. data, Williams (2005) found a marginal rural-town achievement gap in mathematics but a substantial urban-town achievement gap in favor of town schools. Webster and Fisher (2010) analyzed the TIMSS Australian sample and found that rural school students were associated with lower mathematics scores compared to urban school students, even when considering differences in socioeconomic status.

While Webster and Fisher (2010) found a higher availability of mathematics resources in rural schools compared to urban schools, the presence of mathematics resources did not show a substantial association with mathematics achievement. This raises the question of whether other variables might play a role in the relationship between school location and school-wide achievement. For example, school SES, reflecting the average of all students’ SES in a school (as used by McConney & Perry, 2010), is reported as a significant factor related to a school's overall achievement (e.g., Kotok, 2017; Pitsia et al., 2017), but the relationship between school SES and mathematics achievement varies by school location as well. For example, Williams (2005) found that school SES was particularly influential in urban schools, more so than in rural ones, in relation to mathematics achievement. In town schools, a significant correlation was observed between school SES and student mathematics achievement. When school SES was accounted for, the gap in mathematics achievement between rural and town schools diminished, yet the difference between urban and town schools persisted, with town schools having the advantage. McConney and Perry (2010) implied that, regardless of their mathematics self-efficacy levels, students attending high-SES schools were associated with higher mathematics achievement compared to those in lower-SES schools after controlling for students’ individual SES. The strongest association between school SES and mathematics achievement was seen in students with high self-efficacy from less affluent families. Therefore, more thorough investigations of the school’s environmental influences are warranted for a better understanding of student mathematics achievement.

Related to this point, to date, there has been little discussion on how access to gifted education programs at schools is associated with students’ mathematics achievement. Therefore, in this study, we included a variable that indicates access to gifted and talented programs at school as an additional school contextual variable. We view the percentage of gifted program participation as representing the rate of access to gifted education, following a talent development paradigm that focuses on increasing students’ access to high-quality gifted services to develop their talents in certain areas (Gentry et al., 2021). A common view is that schools with a higher percentage of students attending gifted programs tend to have higher average mathematics achievement because only already high-achieving students are identified to attend gifted programs. Such a view follows the traditional gifted child paradigm, which assumes that gifted children are born with high potential (e.g., IQ, Terman, 1925) and that the goal of gifted education is to make the fullest use of this potential (Dai & Chen, 2013). Researchers (e.g., Gentry et al., 2021) have criticized this paradigm for substantially excluding a broad range of students with gifts and talents in domain-specific areas, which exacerbates the inequity in gifted identification. Gentry et al. (2019) reported that rural and town schools demonstrated considerably less equity in identification for gifted programs than city and suburban schools. They also found that nationally, students in low-SES schools were less likely to be identified as gifted students; specifically, 58% of students were from high-SES schools. Thus, after accounting for school location and school SES, examining the association between the rate of gifted program participation at school and mathematics achievement will reflect the importance of access to gifted education services.

Collectively, these studies presented thus far provide evidence that students’ demographics and school contexts are associated with students’ self-efficacy. Given the significance of mathematics self-efficacy in relation to mathematics achievement, it is reasonable to speculate that students’ differences in mathematics self-efficacy may be associated with observed disparities in mathematics achievement among student demographic subgroups. Nonetheless, findings regarding the association of mathematics self-efficacy with achievement for different student subgroups have been variable, potentially due to limitations in sample compositions, research designs, and methodologies. Furthermore, the consistent association of mathematics self-efficacy with achievement across different grade levels has yet to be fully explored, which could be addressed with a systematic investigation with nationally representative student assessment data. As students’ mathematics self-efficacy may vary with their academic or developmental progression (Schunk & Pajares, 2002), exploring the role of mathematics self-efficacy at different academic levels provides additional insight into the relationship between motivation and achievement gaps.

Thus, we examined the association between mathematics self-efficacy and mathematics achievement, taking into account student- and school-level demographic variables for three grade levels (i.e., Grades 4, 8, and 12). We also analyzed the achievement gaps by subgroups with consideration of variations in mathematics self-efficacy. Sequential two-level cross-sectional multilevel models (MLM) were applied for each grade to address the following research questions with NAEP 2019, a nationally representative large-scale dataset. The research questions are:

RQ1. What is the distribution of variability in students’ mathematics achievement within schools and between schools?

RQ2. How are student-level characteristics (i.e., gender, ELL status, IEP status, NSLP eligibility, race/ethnicity, and mathematics self-efficacy) associated with student-level variations in students’ mathematics achievement?

  • 2.1 What proportion of the variance in mathematics achievement can be associated with mathematics self-efficacy when controlling for student-level demographic characteristics?

  • 2.2 How do associations between student subgroups (i.e., by gender, ELL, IEP, NSLP, and race/ethnicity) and mathematics achievement differ when mathematics self-efficacy is considered?

RQ3. What are the associations between school-level characteristics and variations in student mathematics achievement?

Method

NAEP Data

This secondary data analysis study used data from NAEP 2019 mathematics (IES license # 13090032). NAEP is designed to measure trends in the academic performance of U.S. students in Grades 4, 8, and 12 (National Assessment Governing Board, 2020). To make valid inferences about the mathematics achievement of students in the U.S. population, NAEP employed a two-stage stratified sampling design to select students in Grades 4, 8, and 12 who were attending public, private, Bureau of Indian Education (BIE), and Department of Defense Education Activity (DoDEA) schools for assessment (National Center for Education Statistics [NCES], 2022a). The two levels of clustering in NAEP were schools within geographic groups and students within schools. Questionnaires administered to students, teachers, and school principals contain questions regarding students’ backgrounds as well as school contexts. NAEP data are cross-sectional and consist of student- and school-level variables that are associated with students’ mathematics achievement at a given point in time. The list of NAEP variable names and descriptions used as dependent, independent, and weight variables in this study is presented in Appendix A.

Plausible values for NAEP mathematics achievement

NAEP mathematics achievement was reported as scaled scores [Footnote 1]. NAEP reported 20 plausible values of students’ mathematics achievement scores (variable name in the dataset: MRPCM1-20) for estimating population characteristics from the sampled students through marginal maximum likelihood analysis (NCES, 2016). The 20 plausible values of students’ mathematics achievement in NAEP 2019 served as the dependent variable as a set in this study.
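A set of plausible values is commonly analyzed by fitting the same model once per plausible value and then pooling the results. The sketch below illustrates that idea with Rubin's combining rules; it is an assumption about common practice, not a description of how NAEP or any particular software performs the computation, and the function name and numbers are hypothetical.

```python
from statistics import mean, variance


def pool_plausible_values(estimates, sampling_variances):
    """Pool one coefficient estimated separately on each plausible value.

    estimates: point estimates, one per plausible value.
    sampling_variances: squared standard errors, one per plausible value.
    Returns (pooled_estimate, pooled_standard_error).
    """
    m = len(estimates)
    if m < 2 or m != len(sampling_variances):
        raise ValueError("need matching lists with at least two plausible values")
    pooled = mean(estimates)                # average point estimate
    within = mean(sampling_variances)       # average sampling variance
    between = variance(estimates)           # variance across plausible values (divides by m - 1)
    total = within + (1 + 1 / m) * between  # total variance under Rubin's rules
    return pooled, total ** 0.5


# Hypothetical numbers for illustration only (not NAEP results).
est, se = pool_plausible_values([8.5, 8.7, 8.6, 8.8], [0.01, 0.01, 0.01, 0.01])
```

The pooled standard error grows when the estimates disagree across plausible values, which is how measurement uncertainty in the achievement scores carries into the reported standard errors.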

Student-level predictors

Mathematics self-efficacy

Students’ mathematics self-efficacy was a primary variable of interest. Several items measured students’ self-efficacy in mathematics at each grade, asking students to what extent they could answer subject-specific questions. For example, there were seven items in Grade 8, one of which asked students whether they could “list all of the different possible outcomes when a coin is flipped three times.” Students answered on a 5-point response scale from “I definitely cannot” to “I definitely can.” A two-parameter item response theory model (NCES, 2022b) was applied to estimate scaled scores for students’ self-efficacy, and the resulting variable (SQRPM7) [Footnote 2] was used for analysis. The higher the score, the higher the student’s mathematics self-efficacy.

Demographic characteristics

We also used students’ background variables, which included gender (DSEX), ELL status (LEP), IEP status (IEP), NSLP eligibility status (SLUNCH), and race/ethnicity (DRACE10). Gender was dummy coded, with female students coded as 1 and male students coded as 0. We created a set of dummy-coded variables to represent the categories of race/ethnicity with White students as the reference group, that is, Asian students (Asian = 1, White = 0), Hispanic students (Hispanic = 1, White = 0), Black students (Black = 1, White = 0), American Indian/Alaska Native students (American Indian/Alaska Native = 1, White = 0), Native Hawaiian/Pacific Islander students (Native Hawaiian/Pacific Islander = 1, White = 0), and students of two or more races (Two or more races = 1, White = 0). We collapsed the original NAEP categories of “No, formerly ELL” and “No” into the category of students who were not ELLs and dummy coded it as 0 (reference group), with ELLs coded as 1. Students with IEPs and students with 504 plans were also combined and coded as 1, with students having neither plan coded as 0. The categories of “reduced-price eligible” and “free lunch eligible” were collapsed into the group of students eligible for NSLP (dummy coded as 1), with students not eligible for any lunch program coded as 0.
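The coding scheme above can be sketched as a small function. This is a minimal illustration only: the category labels and dictionary keys are hypothetical stand-ins, not the actual NAEP codes, and missing or other categories are assumed to have been handled beforehand.

```python
# Non-reference race/ethnicity groups; White is the reference group (all zeros).
RACE_GROUPS = [
    "Asian",
    "Hispanic",
    "Black",
    "American Indian/Alaska Native",
    "Native Hawaiian/Pacific Islander",
    "Two or more races",
]


def code_student(record):
    """Return 0/1 indicators for one student record (a dict of category labels)."""
    coded = {
        "female": 1 if record["gender"] == "Female" else 0,
        # "No, formerly ELL" and "No" both fall into the non-ELL reference group.
        "ell": 1 if record["ell"] == "Yes" else 0,
        # IEP and 504 plans are combined into a single indicator.
        "iep": 1 if record["iep"] in {"IEP", "504 plan"} else 0,
        # Reduced-price and free lunch eligibility are collapsed into NSLP-eligible.
        "nslp": 1 if record["lunch"] in {"Reduced-price eligible", "Free lunch eligible"} else 0,
    }
    for group in RACE_GROUPS:
        coded[group] = 1 if record["race"] == group else 0
    return coded


example = code_student({
    "gender": "Female",
    "ell": "No, formerly ELL",
    "iep": "504 plan",
    "lunch": "Free lunch eligible",
    "race": "Hispanic",
})
```

With this coding, a student scoring 0 on every indicator is a White male student who is non-ELL, has no IEP or 504 plan, and is not NSLP eligible.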

School-level predictors

A set of background variables was selected to represent schools’ characteristics. The variables of the proportion of Hispanic students (SSCHHSP), Black students (SSCHBLK), American Indian/Alaska Native students (SSCHIND), and Native Hawaiian/Pacific Islander students (SSCHHPI) are continuous, ranging from 0 to 100 percent. The values of these four variables were added to form a new continuous variable, the proportion of underrepresented students. The proportions of ELLs (C044006), students with IEPs (C044007), students eligible for NSLP (C051651), students receiving targeted Title I services [Footnote 3] (C051801), and students in gifted and talented programs (C044004) are divided into several categories (for details, see Appendix A). The variable of school location (SULOCAL) [Footnote 4] was recoded as a set of dummy variables with suburban schools coded as 0 (reference group), which generated three new variables: City (city = 1, suburb = 0), Town (town = 1, suburb = 0), and Rural (rural = 1, suburb = 0). A school-level mathematics self-efficacy variable was created by taking the average of student mathematics self-efficacy in each school.
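The three derived school-level variables can be sketched as follows. The helper names, the location labels, and the use of an unweighted school mean are assumptions made for illustration; the source does not state whether the school average was weighted.

```python
from collections import defaultdict
from statistics import mean


def underrepresented_share(school):
    """Sum the four percentage variables named in the text (0 to 100 scale)."""
    return sum(school[k] for k in ("SSCHHSP", "SSCHBLK", "SSCHIND", "SSCHHPI"))


def location_dummies(location):
    """Suburb is the reference group, so it maps to all zeros."""
    return {name: int(location == name) for name in ("City", "Town", "Rural")}


def school_mean_self_efficacy(students):
    """Average student self-efficacy within each school (unweighted in this sketch)."""
    by_school = defaultdict(list)
    for s in students:
        by_school[s["school_id"]].append(s["self_efficacy"])
    return {sid: mean(vals) for sid, vals in by_school.items()}


# Hypothetical records for illustration only.
share = underrepresented_share({"SSCHHSP": 20, "SSCHBLK": 10, "SSCHIND": 1, "SSCHHPI": 2})
means = school_mean_self_efficacy([
    {"school_id": "A", "self_efficacy": 1.0},
    {"school_id": "A", "self_efficacy": 3.0},
    {"school_id": "B", "self_efficacy": 2.0},
])
```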

Data analysis

Because NAEP student data are nested within schools, we applied a set of two-level cross-sectional multilevel models (Raudenbush & Bryk, 2002) that enable us to examine within- and between-cluster relations of self-efficacy to mathematics achievement (Stapleton, 2013). We applied group mean centering and grand mean centering (McCoach, 2010) for independent variables to minimize bias in regression coefficient estimates (Enders & Tofighi, 2007). In this study, group mean centering [Footnote 5] was applied for all student-level variables, and grand mean centering [Footnote 6] was used for all school-level variables. We also used the final school weight (SMSRSWT) in school-level models and student conditional weights (i.e., the student original joint weight divided by the school weight, ORIGWT/SMSRSWT) in student-level models to obtain unbiased estimates of students’ mathematics achievement that are representative of the target population. To assess the association between self-efficacy and the magnitude of mathematics achievement gaps, we examined the rate of variance reduction between schools captured by models (Raudenbush & Bryk, 2002). We ran all MLM analyses with HLM 8.0 software (Raudenbush & Congdon, 2021). The plausible value feature in HLM 8.0 was used to handle the 20 plausible values of mathematics achievement when running models. Maximum likelihood estimation was used for all models.
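The two centering schemes and the conditional student weight can be sketched as below. This is an illustrative sketch with hypothetical helper names and unweighted means, which is an assumption; it is not the procedure implemented inside the HLM software.

```python
from collections import defaultdict
from statistics import mean


def group_mean_center(students, var):
    """Subtract each school's mean of `var` from its students' values."""
    by_school = defaultdict(list)
    for s in students:
        by_school[s["school_id"]].append(s[var])
    school_means = {sid: mean(vals) for sid, vals in by_school.items()}
    return [s[var] - school_means[s["school_id"]] for s in students]


def grand_mean_center(values):
    """Subtract the overall mean from every school-level value."""
    overall = mean(values)
    return [v - overall for v in values]


def conditional_student_weight(origwt, smsrswt):
    """Student weight conditional on school selection: ORIGWT / SMSRSWT."""
    if smsrswt <= 0:
        raise ValueError("school weight must be positive")
    return origwt / smsrswt


# Hypothetical data for illustration only.
students = [
    {"school_id": "A", "self_efficacy": 1.0},
    {"school_id": "A", "self_efficacy": 3.0},
    {"school_id": "B", "self_efficacy": 5.0},
]
centered_students = group_mean_center(students, "self_efficacy")
centered_schools = grand_mean_center([10.0, 20.0, 30.0])
```

After group mean centering, a student-level coefficient describes differences among students within the same school, while grand-mean-centered school variables make the intercept refer to a school at the overall average.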

MLM Modeling steps

To address the research questions in this study, a sequence of models (Models 0 through 5) was tested with data from each grade level. The details about model equations for Grades 4, 8, and 12 can be found in Appendix B. First, we fitted an unconditional model (Model 0) to the data without any predictors at the student or school level, which helped to determine the extent of variance in student mathematics achievement that is associated with school-level grouping. We calculated the intraclass correlation coefficient (ICC) to assess the unit dependence within the group. The ICC here represents the proportion of total variance in student mathematics achievement accounted for by between-school heterogeneity, which ranges from 0 to 1 (Musca et al., 2011). An ICC value between 0.1 and 0.15 (Scherbaum & Ferreter, 2009) or around 0.2 for school-based clustering (Hedges & Hedberg, 2007) supports investigating the cluster effects with multilevel modeling analysis.
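For a two-level unconditional model, the ICC is conventionally the between-school variance divided by the sum of the between-school and within-school variances. The sketch below shows that calculation; the function name and the variance components are hypothetical and are not estimates from this study.

```python
def intraclass_correlation(between_var, within_var):
    """ICC = between-school variance / (between-school + within-school variance)."""
    if between_var < 0 or within_var < 0:
        raise ValueError("variance components must be non-negative")
    total = between_var + within_var
    if total == 0:
        raise ValueError("total variance must be positive")
    return between_var / total


# Hypothetical variance components for illustration only.
icc = intraclass_correlation(between_var=200.0, within_var=800.0)
```

In this hypothetical case one fifth of the total variance lies between schools, which is in the range the cited guidelines treat as supporting a multilevel analysis.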

Next, we added the student demographic predictors into the Level-1 model (Model 1). The effects of all student-level demographic predictors were initially allowed to vary across schools but were fixed in subsequent analyses when their random effects were statistically non-significant. Students’ mathematics self-efficacy was then added to Model 1 to form Model 2, to observe its association with mathematics achievement alongside the demographic variables.

As the next step, we simultaneously entered the school-level demographic variables into the random slope models of Model 2 to explain the variation of the effects of the student predictors across schools (Model 3). Additionally, the school-level mean self-efficacy was included in Model 3 to observe the association of this contextual variable with the outcomes (Model 4). Finally, in Model 5, we added all the above-mentioned school-level characteristics into the Level-2 slope model for self-efficacy and tested the cross-level interactions. Note that the above analytical steps with MLM were repeated with data from Grades 4, 8, and 12, and the results were evaluated separately to address the research questions. Also, note that the final conditional model was not identical across grades because some effects of student-level predictors varied across schools at one grade but not at another. In addition, because no school location information was provided in the NAEP Grade 12 dataset, Models 3, 4, and 5 for Grade 12 did not include the school location variables.
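Comparing sequential models in terms of variance reduction is often done with a proportional reduction in a variance component relative to an earlier model. The sketch below shows that calculation under the assumption that this standard formula is the one intended; the function name and numbers are hypothetical.

```python
def proportional_variance_reduction(baseline_var, model_var):
    """Share of a baseline variance component no longer present in a later model.

    Can be negative if the later model's estimated component is larger.
    """
    if baseline_var <= 0:
        raise ValueError("baseline variance must be positive")
    return (baseline_var - model_var) / baseline_var


# Hypothetical variance components for illustration only.
reduction = proportional_variance_reduction(baseline_var=250.0, model_var=200.0)
```

Applied to the between-school or within-school component of two adjacent models, the same calculation gives the additional share of variance associated with the predictors added at that step.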

Findings

To minimize repetition in reporting the findings for each grade, we focus on the results for Grade 8 and highlight the key results from Grades 4 and 12. All estimates were computed based on the appropriate sample size for each variable per NCES guidelines for reporting statistical results (U.S. Department of Education & Institute of Education Sciences, 2005).

Mathematics achievement variation across schools

Results with the unconditional model (Model 0) in Grade 8 showed that the grand mean of the mathematics achievement scores across schools was 281.41 (SE = 0.57, p < 0.001). However, school averages varied significantly across schools, and on average, school means deviated about 16.45 points from the overall mean (p < 0.001). The calculated ICC value was 0.18, which meant that about 18% of the total variance in the eighth-grade mathematics achievement scores was accounted for by between-school heterogeneity. Similar results were found in the 4th and 12th grades. For Grades 4 and 12, the grand means of mathematics achievement (γ00) were 239.52 (SE = 0.4, p < 0.001) and 148.47 (SE = 0.67, p < 0.001), respectively. School means again varied significantly between schools (Grade 4: SD = 14.49, variance = 209.96, p < 0.001; Grade 12: SD = 13.38, variance = 178.98, p < 0.001). More specifically, about 18% and 16% of the variance in NAEP 4th and 12th grade mathematics, respectively, was attributable to between-school differences. While meaningful contextual effects exist across all grade levels, the association of school context with student mathematics achievement appears somewhat weaker in Grade 12 than in Grades 4 and 8.

The effect of mathematics self-efficacy on mathematics achievement

Two-level conditional MLM models (Models 1 & 2) were applied to examine the association between mathematics self-efficacy and mathematics achievement, alongside student demographic predictors. At Grade 8, no random effect of the student demographic predictors at Level-2 was statistically significant, and thus we removed these random effects from the Level-2 models for the Level-1 coefficients and set the effects as fixed, indicating that the association of these student variables with mathematics achievement was consistent across schools. In Grades 4 and 12, however, some random effects of student-level demographic predictors were detected. For example, only the intercept of IEP and NSLP demographic variables at Level-1 were allowed to vary across schools at Grade 4. Table 1 presents the results of Models 1 and 2 for Grades 4, 8, and 12. The intercept of Model 2 in Grade 8 indicated that the estimated mean achievement of a student who was at the level of 0 on all dummy-coded demographic variables (i.e., non-ELL, non-NSLP White male students without IEP) was 280 points (SE = 0.52). A similar interpretation applies to the conditional means for Grades 4 and 12.

Table 1. Summary of results from Models 1 and 2 for NAEP 2019 mathematics achievement in Grades 4, 8, and 12

The unique effect of mathematics self-efficacy after controlling demographics

The results of Model 2 showed that after controlling for demographic differences, mathematics self-efficacy was significantly associated with students’ mathematics achievement in Grade 8, as well as in Grades 4 and 12. The addition of mathematics self-efficacy accounted for an extra 11.86% of the variance in Grade 4 and 17.66% in Grade 12, suggesting a substantial association between students’ mathematics self-efficacy and their mathematics achievement. Furthermore, the magnitude of this association with mathematics achievement was larger in Grade 8 compared to Grades 4 and 12. An increase of one point in student-level mathematics self-efficacy was associated with an increase of 8.63 points (SE = 0.1) in mathematics achievement in Grade 8, 5.84 points (SE = 0.09) in Grade 4, and 7.34 points (SE = 0.15) in Grade 12. The variability in the association between mathematics self-efficacy and mathematics achievement across schools was significant for all grades, implying that this relationship was not uniform and varied by school context.
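To make the reported slopes concrete, the sketch below converts a difference in self-efficacy into the corresponding expected difference in achievement. It assumes a simple linear reading of the average slope and ignores the between-school variation in slopes noted above; the function name is hypothetical.

```python
# Average student-level slopes reported for Model 2: achievement points per
# one-point difference in group-mean-centered mathematics self-efficacy.
SLOPES = {"Grade 4": 5.84, "Grade 8": 8.63, "Grade 12": 7.34}


def expected_score_difference(grade, self_efficacy_difference):
    """Expected achievement difference between two otherwise similar students
    in the same school who differ only in self-efficacy (linear assumption)."""
    return SLOPES[grade] * self_efficacy_difference


# A two-point self-efficacy difference in Grade 8 under this linear reading.
grade8_difference = expected_score_difference("Grade 8", 2.0)
```

Because the slopes describe associations in cross-sectional data, such differences are descriptive and should not be read as the effect of raising a student's self-efficacy.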

Mathematics self-efficacy and achievement gaps

Compared with the results of Model 1, the results of Model 2 showed that achievement gaps between all student subgroups (except the gender gap at Grade 8) shrank considerably after accounting for mathematics self-efficacy at all grades. However, the gaps were still statistically significant. Further details are described below.

Achievement gaps by gender

Gender achievement gaps in mathematics, although notable, remained relatively small in all grades in Model 1. For example, after accounting for other demographic variables, in Grade 8, the average mathematics achievement of males was 2.68 points (SE = 0.19, p < 0.001) higher than that of females. After accounting for the association with mathematics self-efficacy, the gender gaps narrowed slightly in Grades 4 and 12 but widened slightly in Grade 8.

Achievement gaps by ELL, IEP, and NSLP

The results in Model 1 also showed a pronounced mathematics achievement disparity between ELL and non-ELL students in Grades 4, 8, and 12. The disparity in Grade 4 between ELL and non-ELL students was about 15.56 points (SE = 0.57) when controlling for other demographics. However, the disparity widened to 28.05 points (SE = 0.47) in Grade 8 and 29.38 points (SE = 1.23) in Grade 12. Even when controlling for differences in mathematics self-efficacy, the achievement gap by ELL status at Grades 4, 8, and 12 remained relatively wide.

The original mathematics achievement gap between IEP and non-IEP students in Grade 4 was 29.65 points (SE = 0.41), and the gap was wider in later grades: 35.82 points (SE = 0.32) in Grade 8 and 32.22 points (SE = 0.86) in Grade 12. After considering students’ mathematics self-efficacy, all gaps narrowed by about 5–9 points, but the disparities between IEP and non-IEP students were still large.

The gap between NSLP and non-NSLP students increased slightly from 9.64 points in Grade 4 (SE = 0.31) to 12.04 points in Grade 8 (SE = 0.22) and decreased to 9.11 points in Grade 12 (SE = 0.57). After controlling for mathematics self-efficacy, the gap shrank slightly.

Achievement gaps by race/ethnicity

The Black–White and Hispanic–White gaps were significant and extensive across grades, particularly the Black–White gap. Among students who were identical on the other demographic predictors (i.e., gender, ELL, IEP, NSLP), the gap between Black and White students varied across grades: about 9.84 points (SE = 0.43) in Grade 4, 19.15 points (SE = 0.37) in Grade 8, and 20.75 points (SE = 0.8) in Grade 12 (see results from Model 1). After the self-efficacy difference between Black and White students was controlled, the gaps were substantially reduced at all grades but remained significant. The gaps between NHPI/AIAN and White students were statistically significant regardless of their level of self-efficacy across grade levels. When mathematics self-efficacy was considered, the White–Asian gap in Grades 4, 8, and 12 remained wide and significant but shrank by about 4 points in Grades 8 and 12.

School contextual effects on mathematics achievement

Tables 2 and 3 summarize the results of Models 3, 4, and 5 for Grades 4, 8, and 12. In Model 3, the inclusion of school-level demographics explained 52.17% of the variance in average mathematics scores between schools in Grade 8, 45.9% in Grade 4, and 43.45% in Grade 12, suggesting that, in addition to students’ backgrounds (including mathematics self-efficacy), the demographic composition of schools was significantly associated with school mathematics achievement. After accounting for all the student-level variables, all school-level demographic variables, except the percentage of ELLs and city location in Grade 8, significantly explained the variation in school average mathematics achievement. Taking Grade 8 as an example, in comparison with suburban schools with similar school settings (i.e., percentage of ELL, IEP, students receiving Title I service, gifted students, and underrepresented students), school average mathematics achievement was 9.95 points (SE = 1.14) lower for town schools and 8.99 points (SE = 0.94) lower for rural schools. Likewise, a one-unit increase (e.g., from 26–50% to 51–75%) in the percentage of IEP students in schools was associated with a 3.82-point (SE = 0.52) decrease in school mean achievement, with all other contextual backgrounds equal. A one-point increase in the percentage of underrepresented students in schools was also associated with a 0.33-point (SE = 0.02) decrease in school mean achievement. All of these differences were statistically significant.
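
Percentages such as "52.17% of the variance between schools" are commonly obtained in multilevel modeling as a proportional reduction in the Level-2 variance component when predictors are added. The section does not state the exact formula used, so the Python sketch below illustrates that common approach under stated assumptions. The variance values are hypothetical placeholders, not estimates from this study.

```python
def proportion_variance_explained(tau_baseline: float, tau_conditional: float) -> float:
    """Proportional reduction in between-school variance when moving from a
    baseline model to a model with added school-level predictors."""
    if tau_baseline <= 0:
        raise ValueError("baseline variance must be positive")
    return (tau_baseline - tau_conditional) / tau_baseline


if __name__ == "__main__":
    # Hypothetical between-school variance components (NOT from the study).
    tau_without_school_predictors = 400.0
    tau_with_school_predictors = 200.0
    share = proportion_variance_explained(
        tau_without_school_predictors, tau_with_school_predictors
    )
    print(f"Between-school variance explained: {100 * share:.2f}%")
```

The same calculation, applied to the change between two nested models, would yield an "additional" percentage of variance explained. Note that this statistic can be negative if a variance estimate increases after predictors are added.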

Table 2 Summary of results from Models 3 and 4 for NAEP 2019 mathematics achievement in Grades 4, 8, and 12
Table 3 Summary of results from Model 5 for NAEP 2019 mathematics achievement in Grades 4, 8, and 12

The results of Model 4 indicated that school mean mathematics self-efficacy was significantly associated with school-level mean mathematics achievement after controlling for school-level demographics across grades. This finding indicates a potential contextual correlation with self-efficacy, as the association of average school mathematics self-efficacy with achievement was more pronounced than that of individual self-efficacy across the grade levels. For Grade 8, a one-point change in average school-level mathematics self-efficacy was associated with a 17.4-point difference (SE = 0.42) in average school mathematics achievement; for Grade 4, the difference was 14.46 points (SE = 0.51); and for Grade 12, 14 points (SE = 0.76). Including average school mathematics self-efficacy in Model 4 accounted for an additional 27.56% of the between-school variance in Grade 8, 20.42% in Grade 4, and 27.42% in Grade 12, suggesting a substantial correlation. Although we were originally interested in whether school-level characteristics, specifically school average self-efficacy, explain the variation in the relationship between mathematics self-efficacy and mathematics achievement across schools, we did not find significant variation in that relationship in Model 2. Accordingly, as reported in Table 3, the between-school variance explained by Model 5 changed only trivially compared with Model 4 across the three grades.

Discussion

Mathematics achievement disparities by students’ socio-demographic subgroups have been widely discussed for decades (Hanushek et al., 2019); yet instead of helping to shrink the achievement gaps, this discussion has reinforced negative stereotypes about traditionally underrepresented students (Garcia & Guerra, 2004). In this study, using asset-oriented perspectives on reducing achievement gaps by subgroups, we draw attention to the role of a malleable student characteristic (i.e., mathematics self-efficacy), which is a significant motivational process and predicts subsequent academic outcomes (Schunk & DiBenedetto, 2021). Our aim was to determine how mathematics self-efficacy is associated with mathematics achievement across different developmental stages and to consider its potential role as a factor related to the achievement gap attributable to students’ backgrounds. The findings indicate that mathematics self-efficacy is correlated with achievement, with the association appearing most prominent in Grade 8 compared to Grades 4 and 12. Contextual factors such as the proportion of students eligible for NSLP and from underrepresented groups, as well as school location, are also shown to be related to achievement, resonating with previous research (Pitsia et al., 2017; Wu, 2015). The findings highlight the relevance of learning or school contexts for understanding students’ achievement. Additionally, the data suggest that mathematics self-efficacy is related to the narrowing of achievement disparities across socio-demographic subgroups. In the subsequent sections, we delve into these associations in greater detail and discuss the implications of our findings for educational practice and future research, while acknowledging that our cross-sectional study design precludes the establishment of causal relationships.

The association between mathematics self-efficacy and achievement

Consistent with previous research (e.g., Bohrnstedt et al., 2020; Keşan & Kaya, 2018), findings from this study highlight the significant correlation between students’ mathematics self-efficacy and mathematics achievement across grades. Notably, results indicate that mathematics self-efficacy had the largest association with achievement in Grade 8. This observation points to the potential importance of supporting middle school students in bolstering their mathematics self-efficacy, which may be related to their current and prospective mathematics performance. We posit that there could be at least two reasons for this observed correlation.

First, as students progress through their academic coursework into middle school, they become more aware of the barrage of feedback cues about their mathematics performance from their primary sources (e.g., peers, mathematics teachers, and parents) (Hickman & Sherman, 2019). With the increasing complexity of mathematics in middle school, the positive cues students receive from their direct ecological environments, blended with their positive dispositions toward learning mathematics, play a determinant role in forming a high level of self-efficacy in mathematics (Hickman & Sherman, 2019). For example, students who have a history of outstanding mathematics performance and access to mathematics learning facilities and resources tend to feel that mathematics remains manageable in middle school and to self-affirm their mathematics abilities (Usher, 2009). It is posited that the level of mathematics self-efficacy is correlated with students’ self-regulation, persistence, engagement, and effort in tackling challenging mathematics problems, which in turn are associated with differences in mathematics achievement (Bandura, 1977).

Second, when students enter middle school, their mathematics experiences start to diverge because of tracking and course placement decisions. According to the talent development megamodel of Subotnik et al. (2021), when students enter early adolescence, their mathematics talent begins to flourish with the provision of advanced mathematics learning opportunities and psychosocial skills (e.g., self-efficacy). Middle school opens the door for able students to embrace multiple spheres of opportunity (e.g., honors classes, math clubs, academic summer camps) from which they build high self-efficacy through discourse and interactions with others. Unfortunately, students not placed in advanced mathematics classes often face the prospect of less rigorous coursework, a lack of teacher support, and low expectations, whereas their able peers are taught coherent conceptual understanding and higher-order thinking skills (Stiff et al., 2011). Additionally, research has shown a correlation between placement in lower academic tracks and a decrease in self-efficacy, which is associated with changes in cognition and performance (Gray et al., 2002). Furthermore, this situation appears to disproportionately affect underrepresented students (e.g., Black, Hispanic, Native American, students from low-income families, ELLs, and students with disabilities), with some indications that placement decisions may be associated with demographic data (Stiff et al., 2011).

Mathematics self-efficacy and achievement gaps

When students had similar levels of self-efficacy in mathematics, we observed that the associations of demographic factors with their achievement were less pronounced; mathematics achievement gaps by subgroup were reduced to varying degrees, from small to large. This implies that mathematics self-efficacy could be associated with higher mathematics achievement irrespective of subgroup identity. However, some student subgroups tend to have low mathematics self-efficacy, which is associated with lower achievement outcomes.

In our study, the correlation between mathematics self-efficacy and the narrowing of gender and racial-ethnic achievement gaps in mathematics was not significant, whereas a strong positive correlation with self-efficacy was observed in the achievement gaps of ELL, IEP, and NSLP students, particularly in Grade 8, which is in line with the findings of Polat et al. (2016) and Soland and Sandilos (2021). With the statistical control of differences in mathematics self-efficacy, we noted a reduction of achievement gaps, which suggests that the gaps often seen between well-represented and underrepresented students are associated with differences in self-efficacy levels. In terms of Bandura’s (1977) sources of self-efficacy, ELLs, students with IEPs, and students eligible for NSLP are traditionally documented to experience persistent underperformance in mathematics (Rodriguez et al., 2022) (mastery experiences), express frustration after social comparisons to normally functioning peers (Coleman, 2001) (vicarious experiences), receive little encouragement (Solomon et al., 1996) (verbal persuasion), and struggle with physical and psychological ill-being (Campbell & Gilmore, 2014) (emotional and physiological states). These factors might interact with the context of having under-proficient mathematics teachers (Abedi et al., 2006), who may not effectively reinforce mathematics self-efficacy, or being in environments with prevalent negative stereotypes (Steele & Aronson, 1995), or attending schools in high-poverty areas with limited resources (Darling-Hammond, 2013), all of which are associated with lower self-efficacy appraisals.

In contrast, White and Asian students, non-ELLs, students without learning disabilities, and students from families with more resources often have a wealth of experiences at school and at home that reinforce self-efficacy, such as skillful mathematics teachers who structure routine opportunities for success in mathematics and provide above-grade-level mathematics instruction (Usher, 2009); these experiences are correlated with higher self-efficacy in mathematics. Consequently, if ELL, IEP, and NSLP students were to exhibit levels of self-efficacy comparable to those of their peers, they too might show similar academic achievement, and the observed achievement gaps might be less pronounced. This points to the value of exploring how underrepresented groups cultivate an interest in mathematics, form aspirations related to mathematics, construct effective responses to stereotypes, and see themselves as capable mathematicians, and how these processes may be correlated with enhancements in their self-efficacy in mathematics.

Contextual correlations with demographic factors

The findings from this study also indicate a significant association between environmental factors (e.g., proportions of students eligible for NSLP and racially/ethnically underrepresented students, the proportion of gifted students, and school location) and overall mathematics achievement in schools. In particular, schools with a high proportion of students from low-income families, as well as Black, Hispanic, and Native American students, tend to have lower mathematics achievement than other schools. This underscores the importance of exploring how certain schools with comparable demographic profiles are associated with higher levels of student success in mathematics despite facing academic, social, and institutional challenges. For example, in-depth qualitative research into the student support strategies employed in such schools could yield insights that are correlated with the enhancement of educational outcomes for students from diverse cultural and racial backgrounds.

This underscores the importance of further research into scaling up interventions and practices that are associated with effectiveness in various locales. For example, findings from this study are consistent with the idea that increased access to gifted education services correlates with smaller mathematics achievement gaps. This aligns with the finding by Young et al. (2017), based on NAEP (2009) data, that there was no significant difference in mathematics achievement between Black and White fourth-grade girls when Black students had similar levels of access to gifted and talented programs.

Implications for practices

Previous studies (e.g., Plucker et al., 2013; Yang & Maeda, 2023) have identified the emergence of the mathematics achievement gap as early as fourth grade. This study finds a significant association between mathematics self-efficacy and mathematics achievement beginning in fourth grade. This highlights the correlation between early mathematics self-efficacy and achievement outcomes and suggests the potential benefits of supporting students’ mathematics self-efficacy during the early school years in relation to achievement gaps. Change in students’ mathematics self-efficacy is associated with both internal personal and external environmental conditions, which can be variable and controllable (Van der Bijl & Shortridge-Baggett, 2001). Likewise, students’ self-efficacy development in mathematics is associated with school contexts and involves interactions between students and educational elements such as effective mathematics teaching, high-quality mathematics programs, supportive school environments, and stakeholders (e.g., school leaders, mathematics teachers, and specialists) (e.g., Bobis et al., 2013; Griggs et al., 2013; Johnsen & Sheffield, 2021). Thus, fostering students’ resilient sense of self-efficacy in mathematics is associated with continued support from schools and educators at all levels, from elementary to high school.

As the correlations observed in our study suggest, there is a potential benefit in focusing efforts on rehumanizing mathematics education, particularly for students who are traditionally underrepresented in mathematics (Goffney et al., 2018). For example, positive outcomes may follow if public schools frontload high-quality mathematics education from early childhood education onward (Plucker et al., 2017) to build young students’ self-efficacy in mathematics. Teachers can create inclusive and culturally responsive mathematics learning environments that involve underrepresented students with high potential in meaningful and rigorous learning activities (Yang & Gentry, 2023), which are correlated with increased mastery experiences and thus may support self-efficacy. In this regard, we suggest revisiting the approach of anti-deficit achievement theory (Harper, 2010) and concentrating our conversations about closing achievement gaps on understanding how students from diverse cultural, linguistic, and socioeconomic backgrounds come to hold high levels of self-efficacy and how this relates to navigating the systemic environmental barriers placed on them. This would be more meaningful than exhaustive and overt statements of achievement gaps among student subgroups.

Limitations and future research directions

The large sample size and representative nature of the final analytic sample drawn from the NAEP dataset support the validity of the results reported in this study. However, some limitations are unavoidable; they call for caution in drawing implications and may be addressed in future studies.

First, the NAEP dataset used in this study is cross-sectional; therefore, we refrain from inferring any long-term influences of mathematics self-efficacy on mathematics achievement over time. Consequently, while our findings reveal various patterns in the association between self-efficacy and achievement, we advise researchers to employ longitudinal large-scale assessment data to more accurately discern the trends of demographic characteristics and self-efficacy in relation to mathematics achievement throughout the course of education. Additionally, qualitative research could provide valuable insights into the sources of self-efficacy among underrepresented students in mathematics.

Second, mathematics self-efficacy cannot capture all variation in mathematics achievement disparities within schools. Further research could examine the correlations between the mathematics achievement gaps among student subgroups and additional student motivation-related variables (e.g., achievement goal orientations, interest, enjoyment in mathematics, persistence, and effort expended). Future studies should also be mindful of the association of collective self-efficacy at the school level. Students are increasingly engaged in cooperative mathematics learning projects and teacher-designed mathematics curricula through collaborative work (Schunk & DiBenedetto, 2021). Therefore, examining how collective self-efficacy is related to students’ mathematics achievement could shed light on promoting students’ self-efficacy and mathematics achievement. In addition, research has indicated that teachers’ collective self-efficacy in teaching is significantly associated with students’ academic performance and educational environment, even after controlling for previous student accomplishments and crucial demographic variables, such as socioeconomic status (e.g., Bandura, 1993). Schools that achieve success are characterized by teachers’ shared beliefs in the capabilities of their staff to assist students in their growth and learning (Klassen et al., 2011). Therefore, investigating the factors contributing to mathematics teachers’ collective self-efficacy, especially in schools with scarce resources and a high proportion of underrepresented students, and how this is associated with their students’ mathematics abilities, could contribute valuable insights.

Finally, mathematics performance disparities are associated with more than just the demographic composition of schools. Findings from other NAEP studies have shown that contextual factors, such as teachers’ professional development (Havard et al., 2018), classroom instruction (Lubienski, 2006), and teacher resources (Lee & Reeves, 2012), are also associated with students’ self-efficacy and mathematics achievement. Future researchers may consider including a broader spectrum of school-level variables to understand these contextual correlations further. We also encourage future researchers to include equity-related variables in mathematics learning among the contextual variables. In this study, we considered a variable representing students’ access to gifted programs, acknowledging that such variables need further careful operationalization and modeling to enhance their interpretative validity.

Conclusion

The purpose of this study was to explore the association between mathematics self-efficacy and mathematics achievement disparities. As an exploratory study, it sheds light on how mathematics self-efficacy is related to the achievement gaps among student subgroups. We found a significant correlation between mathematics self-efficacy and U.S. students’ mathematics achievement, with a notably larger association in Grade 8. The study provides empirical evidence supporting an asset-oriented approach that considers students’ mathematics self-efficacy as a potentially influential and malleable factor in relation to achievement gaps.