1 Introduction

The term self-efficacy (SE) refers to individuals’ confidence that they can carry out future tasks and challenges (Bandura, 1997, p. 3). Previous meta-analyses have demonstrated the importance of academic SE in terms of its relationship with performance (e.g., Honicke & Broadbent, 2016) and demonstrated the contribution of SE as a predictor, and not only as an outcome, of performance (e.g., Talsma et al., 2018). Students’ mathematics self-efficacy (MSE) is important because it is related to motivation for learning and studying mathematics, effort and perseverance, performance, and future career choices (see, e.g., Martin et al., 2015; Pajares & Graham, 1999).

Within education SE is studied in numerous domains, e.g., reading, language, music, physical activity, and, of course, mathematics. Investigations have targeted students’, teachers’, and parents’ SE. More than a decade ago, Klassen and Usher (2010) provided a review of the status of research on SE in education, and a review by Pantziara (2016) on the state-of-the-art of research on SE within the field of mathematics education is now 7 years old. Furthermore, the field of mathematics education research is dynamic, with a large increase in the number of studies focusing on emotions in mathematics education research 2014–2021 (Schukajlow et al., 2023). Considering this backdrop, gaining oversight of the various approaches to investigating this important construct as it relates to the field of mathematics education is valuable.

In this review study we took a scoping approach (Munn et al., 2018) to provide a snapshot of research on students’ MSE conducted within the last 5 years. Scoping reviews are appropriate tools to report on the types of available evidence of a given field as well as the way the research has been conducted (Munn et al., 2018, p. 2). Knowledge about the kinds of research currently conducted is important for understanding the direction that research on students’ SE has recently taken, particularly within the field of mathematics education, considering both substantive themes and methodological approaches to investigating these themes. This might inform the field of mathematics education research as to whether there are important gaps that future research should address. There is a relationship between the substantive and the methodological, where methodological advances can promote new research designs and advance our knowledge, and unresolved issues in current research can promote the use of new methods for approaching these (Marsh & Hau, 2007). To achieve our aims, we posed the following research question: Which substantive foci and methodological approaches have been used in recent (2018–2022) studies of MSE?

2 Theoretical background

The construct of SE was coined by Albert Bandura, whose conceptualization of the construct has remained the most influential. According to Bandura, SE is a multidimensional construct related to individuals’ confidence that they can accomplish specific tasks or challenges in the future (Bandura, 1997, p. 42). Within mathematics one can consider SE at the domain level, or for specific mathematical areas or tasks. This gives a structure of SE with specific areas, in learning domains, nested within fields. Another way of distinguishing SE beliefs is whether they are specific to an individual learner or to situations, which is also related to the stability of the construct.

Figure 1 provides a model of students’ MSE, including its relationship to predictor and outcome variables. In the following sections we will build on this model to provide a brief overview of the key theoretical and methodological considerations in relation to the construct of SE, as it relates to the field of mathematics education.

Fig. 1 Model of students’ MSE

2.1 Conceptualization of SE

Bandura emphasized SE as a dynamic and multidimensional form of human agency (Bandura, 1997), which can change across time and situations, that is (1) in terms of the content or task in question, (2) the level of difficulty of that task, and (3) the strength of confidence students have in their ability to carry the task out. Although Bandura highlighted multidimensionality as a key conceptual feature of SE and provided recommendations for how to design measures that incorporated all three dimensions, the single dimension of SE strength has largely dominated education research (Street et al., 2017) to the exclusion of other critical dimensions. To also investigate the role of the content specificity (e.g., what is the mathematical object or task in focus) and the level of difficulty of the content (e.g., is this task easy or hard for me) is valuable in terms of gaining detailed knowledge regarding students’ MSE across different learning situations, and how best to support students’ SE in classroom situations. Another way to describe MSE is through the characteristics listed by Schukajlow et al. (2023), where the construct of SE is of positive valence, and its temporal stability is likely to vary according to the assigned object-specificity (with higher stability for more general SE constructs).

As a type of self-belief, SE is related to but also different from other self-beliefs such as self-esteem, self-worth, outcome expectancies, and self-concept (see, e.g., discussions by Bong & Skaalvik, 2003 and Klee et al., 2022, for important distinctions between various self-belief constructs). The key distinctive features of SE are its specificity (it should be conceptualized at the domain level or more specific), its future orientation (it should be related to specific tasks or challenges in the future, e.g., a test or learning situation), its mastery orientation (including a goal focus rather than comparative focus), and its relation to individuals’ confidence that they can carry out tasks or accomplish challenges (“can” rather than “will” statements) (e.g., Bong & Skaalvik, 2003). In a previous review as much as 50% of the assessed research on SE was a-theoretical, that is, the studies did not follow key recommendations for measuring SE (Klassen & Usher, 2010). A more recent review by Diego-Mantecón et al. (2019) on conceptualization issues in mathematics education research indicated that this problem is relevant within this field of research, and still prevails. One of the most common mix-ups is likely with the related construct of self-concept. Despite the similarities, repeated studies have concluded that there are important distinctions between these two constructs (see, e.g., Klee et al., 2022; Marsh et al., 2019) and advocated for the careful consideration of theory when constructing their respective scales. While past studies have demonstrated that both constructs are relevant and important to consider for the field of education, it is important to observe to what extent clarity of conceptualization and operationalization of SE prevails in the field of mathematics education in relation to other constructs which are often conflated with SE.

2.2 The relationship between SE and other constructs

SE can be studied as an independent or dependent variable, or both (see Fig. 1). Initially, many studies on SE were concerned with establishing the importance of the construct, i.e., what is the effect of SE on other variables (e.g., Pajares & Graham, 1999). This could include the effects of SE on outcomes (SE as a predictor), whether the effects of one variable on another are transmitted through SE (SE as mediator), and whether the effects of one variable on another differ according to students’ SE (SE as moderator). As the construct has become established, more studies have focused on how other variables might affect students’ SE (SE as an outcome, e.g., Pampaka et al., 2011). Given the theoretically proposed reciprocal determinism of Bandura’s social cognitive theory (1997), which postulates a continuous reciprocal interaction between behavior, cognition, and environment, studies have also sought to determine the dominant direction of the relationship between SE and performance. Previous research has investigated SE in relation to, e.g., gender, school grade (or age), or cognitive constructs such as grades or test performance (Klassen & Usher, 2010), showing that SE both orients students to engagement and is itself modified by the outcomes of engagement.

2.2.1 SE as an outcome

SE is proposed to change according to the most recent influential experiences – that is, individuals’ cognitive appraisals of their learning and performance experiences (Bandura, 1997). According to Bandura, there are four key sources of students’ SE beliefs. These are mastery experiences (i.e., individuals’ own experiences of success or failure), vicarious experiences (i.e., the experiences of success or failure of relevant others, e.g., friends or classmates), social persuasion (i.e., verbal or non-verbal encouragement), and physiological states (i.e., bodily experiences of pulse, sweat, arousal, serenity, etc.). Importantly, the sources do not influence SE directly but are cognitively appraised, or interpreted, by the individual, such that the same experience might lead to highly different outcomes in different students. This highlights the role of students’ wider self-beliefs, e.g., their ability beliefs (e.g., Dweck, 2008) and emotions (Mega et al., 2014), as potential sources of SE or moderators of the relationship between sources of SE and SE. A meta-analysis by Byars-Winston et al. (2017) confirmed mastery experiences as the most influential source of SE change, while the relative effect of the other sources appears to differ across participants and contexts (e.g., Usher & Pajares, 2008).

2.2.2 SE as a predictor

Numerous studies have demonstrated the importance of SE, in terms of positive effects on effort, perseverance, motivation, self-regulation, (reduced) anxiety, and performance experiences (Zimmerman, 2000). SE can influence performance directly through increased effort or perseverance, or indirectly through affecting motivations or learning behaviours that in turn, lead to higher competence. SE is generally considered an affective or motivational construct and is also associated with other such constructs. Stronger SE predicts higher intrinsic motivation, effort, and perseverance, as individuals have stronger confidence in their own ability to bring about desired outcomes. In other words, it is worth trying, when you believe you have the potential to change an outcome. In contrast, a weak sense of SE is associated with low sense of agency or control. For students with experiences of failure in mathematics, a low sense of SE might mean they do not perceive that they can bring about a more positive outcome in the future, which might in turn be associated with mathematics anxiety (see, e.g., Dowker et al., 2016).

2.2.3 Reciprocal relationships

There is a theoretically proposed mutual influence between people’s behaviors and cognition and their environment (reciprocal determinism, see, e.g., Bandura, 1997). The nature of this reciprocal relationship has often been considered in terms of the relationship between students’ SE and their performance, where on the one hand SE can influence performance experiences, while on the other appraised performance (mastery) experiences are considered a key source of SE. Investigations have focused on (a) to what degree SE is a reflection of, versus a contributor to, performance, (b) what the dominant direction of effect is, and (c) whether the reciprocal relationship is constant or changes over time (e.g., Weidinger et al., 2018). A meta-analysis (Talsma et al., 2018) found support for the theoretically proposed reciprocal relationship between SE and performance, where the dominant direction of effect was from performance to SE. The relationships varied according to participant age, but only two of the studies included school-age children. Furthermore, the studies focused on a variety of domains (including mathematics).

2.3 Change in SE over time

Capturing the minute mechanisms of SE change (i.e., a degree of temporal stability) is challenging, considering the multiple potential influencing sources as well as the role of individual students’ appraisals of these. Studies have documented domain-specific declines in, e.g., students’ motivation and achievement (Gottfried et al., 2007) and perceived competence and task values (Jacobs et al., 2002) with increasing age, and investigating changes in students’ MSE is important. From a quantitative perspective, at least two measurement points are required to study change (e.g., increased strength of SE between time points T1 and T2), while a minimum of three are needed to detect non-linear change (e.g., a steep rise in SE between T1 and T2, followed by a flattening trend between T2 and T3). Appropriate time-lags (the time between occasions) should be considered depending on whether one investigates developmental change or transitions (in years), sequences of learning in class (days or weeks), or processes during a task (minutes or hours) (McNeish & Hamaker, 2020; for an overview see Fig. 1 in Street et al., 2022). Longitudinal designs are more costly, which is probably why cross-sectional designs dominate the field (Pantziara, 2016).
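The measurement-point argument can be made concrete with a minimal sketch. It assumes equally spaced occasions and uses hypothetical SE scores; the function name `change_summary` is illustrative and not taken from any cited study. With two occasions only one difference is available, so curvature cannot be assessed. A third occasion yields a second difference, which is the simplest indicator of non-linear change.

```python
def change_summary(scores):
    """Return first and second differences of equally spaced SE scores."""
    # First differences: change between adjacent occasions (T1->T2, T2->T3, ...).
    first = [b - a for a, b in zip(scores, scores[1:])]
    # Second differences: change in the rate of change; empty with two occasions.
    second = [b - a for a, b in zip(first, first[1:])]
    return first, second


# Hypothetical mean SE scores, not data from the reviewed studies.
two_waves = change_summary([3.0, 4.0])           # T1, T2
three_waves = change_summary([3.0, 4.0, 4.25])   # T1, T2, T3

print(two_waves)    # one difference, no information about curvature
print(three_waves)  # a negative second difference indicates a flattening trend
```

In practice, growth models estimate such trends with measurement error taken into account; the sketch only shows why the third occasion is needed at all.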

2.4 Situational specificity of SE

While change in SE can be investigated from a situational or a long-term perspective, differential effects of situational, personal, or group factors on students’ MSE can also be investigated. Situational factors might change from one lesson to the next (e.g., the task at hand, quality of feedback), individual factors are characteristics of the student (e.g., motivation), while group factors could be considered at the classroom, school, district, or country level (e.g., the teacher of a class, the socio-economic make-up of a school). Multi-level analyses are important when students are nested within classrooms or schools, to partition variance into within- and between-level variance, adjusting standard errors appropriately. Importantly, if a study includes nested data (e.g., situations nested within students or students nested within classrooms/schools), failing to account for the hierarchical structure of the data could lead to Type I errors which could give rise to atomistic fallacies (i.e., wrongly attributing the “cause” to individuals) or ecological fallacies (i.e., wrongly attributing the “cause” to group-level characteristics) (Hox et al., 2017).
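A minimal sketch of the variance partitioning described above, under stated assumptions: the data are hypothetical SE scores for students in three equally sized classrooms, the intraclass correlation is estimated from a one-way ANOVA rather than from a fitted multilevel model, and the design effect uses the common approximation 1 + (n − 1) × ICC. The function name `icc_oneway` is illustrative.

```python
from statistics import mean


def icc_oneway(groups):
    """ICC(1) from a one-way ANOVA; assumes equally sized groups."""
    k = len(groups)
    n = len(groups[0])
    if k < 2 or n < 2 or any(len(g) != n for g in groups):
        raise ValueError("need at least two equally sized groups of two or more")
    grand = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]
    # Between-group mean square: variation of classroom means around the grand mean.
    ms_between = n * sum((m - grand) ** 2 for m in group_means) / (k - 1)
    # Within-group mean square: variation of students around their classroom mean.
    ms_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    ) / (k * (n - 1))
    return (ms_between - ms_within) / (ms_between + (n - 1) * ms_within)


# Hypothetical SE scores for three classrooms of four students each.
classrooms = [
    [2.0, 3.0, 3.0, 4.0],
    [4.0, 5.0, 5.0, 6.0],
    [6.0, 7.0, 7.0, 8.0],
]
icc = icc_oneway(classrooms)
n_per_class = len(classrooms[0])
# Approximate factor by which sampling variance is inflated by clustering.
design_effect = 1 + (n_per_class - 1) * icc
effective_n = sum(len(c) for c in classrooms) / design_effect

print(round(icc, 3), round(design_effect, 3), round(effective_n, 3))
```

When the intraclass correlation is high, the effective sample size falls well below the number of students. An analysis that treats all students as independent therefore understates standard errors, which is the route to the inflated Type I error rate noted above.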

Methodological advances mean there is more data available and there is also a range of approaches for investigating this large-scale data. Within psychology there have been repeated calls that researchers should consider the principle of ergodicity (Molenaar, 2004) when investigating learning and development processes. In short, we cannot make inferences about development (change over time) based on cross-sectional findings. Theories that have been developed on the basis of cross-sectional designs might not fit individuals, i.e., they may be inadequate for describing or understanding the development of individual students’ learning. As one important focus of the field of mathematics education is to provide knowledge for the field of practice, e.g., teachers in the classroom, this is an important point to consider. In short, it is important to consider individual variation within a group, and nestedness in terms of, e.g., time-points within individuals within classrooms within schools.

2.5 Other methodological considerations

Depending on the aim of a study many study designs and analytical approaches are possible and meaningful. Study designs can be conceptualized in terms of the timeline of the study (e.g., cross-sectional/longitudinal), whether the aim is to describe students’ SE or to change it (e.g., observational/experimental), or to what degree the study considers nestedness (e.g., time-points in students in classrooms in schools in countries). Further methodological choices related to the design of a study are the participants included (how many? how old? from where?). Previous studies (e.g., Ahn et al., 2016; Klassen, 2004) have demonstrated cultural differences in students’ SE beliefs, in terms of both strength of SE and its relationship with performance outcomes. There is a need for research from different cultures to represent a variety of perspectives and participants, as different cultural contexts may bring different assumptions and social mechanisms, relevant to the study of MSE. Other methodological choices are related to the analytical approaches applied (quantitative/qualitative/mixed methods), and the data sources included (e.g., questionnaires/performance indicators/interviews/observations). Furthermore, when performance indicators are considered in relation to SE, the measures can be more or less congruent in specificity (e.g., task-specific SE and task-specific mathematics test performance/task specific SE and domain-specific mathematics test performance) as well as in relation to the timing of measurement (e.g., SE as a predictor of performance should be measured prior to, not after, the performance test) (Pajares & Miller, 1995).

Considering the large number of studies including a focus on emotions within mathematics education, together with the highly interrelated relationships between SE and student outcomes such as engagement and performance and a large variety of available methodological approaches to investigate these relationships, there is a need to gain oversight over the types of available evidence on students’ SE in mathematics, how research is conducted in this field, and identify potential knowledge gaps. Taking a systematic approach, we conducted a scoping review (Munn et al., 2018) to investigate the kinds of substantive foci and methodological approaches that have been used to investigate students’ MSE in the recent 5-year period.

3 Method

To conduct our scoping review, we developed a coding scheme and coded a selection of publications in which SE was included in the title to get a snapshot of studies in the last 5 years (2018–2022). As a precursor of a systematic review or meta-analysis we followed the PRISMA guidelines for systematic (scoping) reviews of a given field (Tricco et al., 2018) and guidelines of Munn et al. (2018): to identify types of available evidence, clarify key concepts and dimensions, examine how research is conducted, identify key characteristics or factors related to a concept, and identify and analyze knowledge gaps.

3.1 Search process and eligibility criteria

We conducted a systematic search (search #1) on March 7th 2023, including the search terms self-efficac* AND math* in the title, using the Web of Science Core Collection of research titles. Furthermore, we limited our search to English language, peer-reviewed, empirical reports, published from Jan 2018 to March 2023. This gave 182 hits. In addition, on June 2nd 2023, we included the search term self-efficac* in the title in six key mathematics education journals (Journal for Research in Mathematics Education, Educational Studies in Mathematics, International Journal of Science and Mathematics Education, Journal of Mathematics Teacher Education, Mathematical Thinking and Learning, and ZDM Mathematics Education), the other search terms and limitations being equal (search #2). Authors are perhaps less likely to include “mathematics” in the title in such journals, and we did not want to risk underrepresenting research in the field of mathematics education. Search #2 resulted in 29 hits, of which 13 were duplicates from search #1, rendering 16 unique hits. The abstracts of the 198 hits from search #1 and #2 were screened for inclusion and exclusion criteria (see Table 1), where 108 studies were excluded. The remaining 90 hits were kept for full text screening, considering full text eligibility and (the same) inclusion and exclusion criteria. A further 41 studies were excluded, giving 49 studies that were included in our review and coded (see Fig. 2).

Table 1 Inclusion and exclusion criteria

Fig. 2 Article selection process

In terms of inclusion/exclusion criterion 2 (conceptualization of SE), we followed previous reviews (e.g., Klassen & Usher, 2010), considering the SE measure as a whole. That is, we considered whether most items (e.g., three out of four) of a measure were congruent with theory and inclusion criteria 2a–2d (see Table 1). The response options (scale and anchors) of the measure were also included in this consideration, where the response option and the statement together should reflect the participant’s degree of confidence in their ability to carry out a certain task in the future (versus, e.g., participants’ intentions to do things; their expectancies of outcomes not contingent on their own actions; how often they believe things will occur; or their assessment of how good they are/how much better than others).

A large number of studies focusing on undergraduates’ (36) and teachers’ (42) SE were excluded from the 198 initially identified studies (see Fig. 2). Of the 90 records included for full text screening, 26 were excluded due to inadequate/inconsistent conceptualization (inclusion/exclusion criteria 2a–2d; see Fig. 2). That is, among the studies we examined more carefully, 29% were excluded because their definition or operationalization of SE did not align with theoretical tenets.
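The screening counts reported in this section can be cross-checked with a short calculation. All numbers below are taken from the text; the variable names are illustrative only.

```python
# Counts as reported for the two searches.
search_1_hits = 182
search_2_hits = 29
duplicates = 13

unique_search_2 = search_2_hits - duplicates   # unique hits added by search #2
screened = search_1_hits + unique_search_2     # abstracts screened

excluded_at_abstract = 108
full_text = screened - excluded_at_abstract    # records kept for full text screening

excluded_at_full_text = 41
included = full_text - excluded_at_full_text   # studies coded in the review

# Share of full-text records excluded for inadequate/inconsistent conceptualization.
conceptual_exclusions = 26
conceptual_share = round(100 * conceptual_exclusions / full_text)

print(unique_search_2, screened, full_text, included, conceptual_share)
```

The calculation reproduces the reported chain of 198 screened abstracts, 90 full-text records, and 49 included studies, as well as the 29% figure.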

3.2 Coding process

To code the included studies, we used a coding manual including 13 categories (see supplementary materials). Following our model (see Fig. 1), the first author developed a draft of the coding scheme, which was then discussed, adjusted, and agreed upon by all authors. Two coders (the first and second author) trial coded five papers, followed by a discussion and clarifications to the coding manual. The two coders then proceeded to independently code 27 and 22 (respectively) of the remaining 44 studies, including an overlap of five articles (= 10% of the studies). Inter-rater reliability for these five articles indicated substantial agreement on average, with Cohen’s kappa ranging from 0.46 to 0.96 (mean 0.71), while exact match ranged from 0.77 to 0.97.
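For readers unfamiliar with the two agreement indices, the following is a minimal sketch of how exact match and Cohen’s kappa are computed for two coders. The ratings are hypothetical and are not the coding data of this review; the function name `cohens_kappa` and the code labels are illustrative.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning one nominal code per item."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same code by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, based on each rater's marginal code frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical code throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical codes for ten coding decisions (predictor, outcome, mediator).
coder_1 = ["pred", "pred", "out", "med", "out", "pred", "med", "out", "pred", "out"]
coder_2 = ["pred", "out", "out", "med", "out", "pred", "med", "pred", "pred", "out"]

exact_match = sum(a == b for a, b in zip(coder_1, coder_2)) / len(coder_1)
kappa = cohens_kappa(coder_1, coder_2)

print(exact_match, round(kappa, 4))
```

Because kappa subtracts the agreement expected by chance, it is lower than the exact match proportion for the same ratings, which is consistent with the pattern of values reported above.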

4 Results

49 studies were included and coded for our review. An overview of the results (number and percentage of studies assigned to each code) is provided in Table 2. In the following sections we describe the coding results and provide examples.

Table 2 Results

4.1 Substantive foci of studies

4.1.1 Conceptualization of SE

Considering the multidimensional definition of SE, all studies included a focus on SE strength (students’ degree of confidence), while only a few focused on variation of SE either as a function of the mathematical content (five studies = 10%) or the level of difficulty of the tasks (five studies = 10%). Only two studies included a focus on all three dimensions of SE (i.e., level of difficulty, strength of confidence, and specificity of the mathematical object or content area). Liu et al. (2020) investigated students’ domain- and task-specific problem-posing SE and performance for different levels of difficulty problems, while Street et al. (2022) investigated starting point and change in students’ SE across a series of lessons according to perceived task difficulty and mathematical content area.

Most commonly, the studies focused on students’ MSE at the domain level (55%), for example the relationship between students’ perceived responsibility for learning, MSE and sources of MSE (Lau et al., 2018). A relatively large percentage of studies included students’ SE for a mixture of mathematical topics (29%). Several of these studies used available PISA data (e.g., Soland, 2019), which includes different topic areas of mathematics in the SE measure. A smaller number of studies included a focus on students’ SE for specific topic areas, including geometry (five studies), functions (four studies), arithmetic (three studies), algebra (two studies), or combinatorics/probability (two studies). In addition, five studies focused on students’ SE for other objects of mathematics, including programming/coding (Jiang et al., 2022), mathematical modelling (Krawitz & Schukajlow, 2018), and problem posing (Q. Liu et al., 2020).

4.1.2 The relationship between SE and other variables

Across the studies, SE was investigated in terms of its role in predicting and affecting other outcomes (41%), such as mathematics achievement (Hiller et al., 2022), mathematics anxiety (Du et al., 2021), or mathematical creative thinking (Rahyuningsih et al., 2022). Studies also investigated SE as a mediator of the relationship between other variables (31%), such as the relationship between teacher-student relationships and students’ problem solving ability (Zhou et al., 2020), the relationship between implicit theories of intelligence and mathematics career interest (Huang et al., 2019), or the effects of perceived parental support on students’ engagement in mathematics (Sağkal & Sönmez, 2022). Only two studies investigated the role of SE as a moderator of the relationship between other variables. Schukajlow et al. (2019) investigated whether prior SE moderated the effect of developing multiple solutions on students’ MSE and perceived competence, while Cai et al. (2019) investigated whether there were differential effects of the use of tablets on students’ conceptions and approaches to learning mathematics according to their SE.

Furthermore, studies investigated SE as an outcome variable (35%), such as the role of cognitive activation on students’ MSE (Liu et al., 2022), how sources of SE raised or lowered students’ MSE in a rural, high poverty, area (Usher et al., 2019), or the role of perfectionism on how students perceive information from sources of SE (Ford et al., 2023).

Six studies (12%) investigated reciprocal effects between SE and other variables. Five of these studies included a focus on performance, where, e.g., Arens et al. (2022) compared the constructs of SE and self-concept, in terms of their reciprocal relationships with performance measures (achievement test and school grades), Grigg et al. (2018) investigated reciprocal relationships between math interest, intentions, SE, and performance, and Du et al. (2021) investigated the reciprocal relationships between students’ mathematics interest, anxiety, SE, and achievement.

24% of the studies treated SE as a correlate, including, e.g., the association between mathematical literacy and visual math literacy SE (Katranci & Şengül, 2019), and the relationship between parents’ and children’s MSE and emotional arousal to mathematics (Bartley & Ingram, 2018).

A range of variables were included as key foci in the studies assessed. Many studies (67%) focused on some type of affective variable, including interest (Grigg et al., 2018), self-regulation (Tian et al., 2018), and emotional arousal (Bartley & Ingram, 2018), not counting anxiety. As the number of studies including a focus on anxiety turned out to be large, we coded this affective variable separately, and found that 22% of studies included a focus on this (e.g., Du et al., 2021). As could be expected, many studies included a focus on performance, through either test performance (63%), assigned grades (12%), or both (e.g., Arens et al., 2022). Furthermore, many studies included a focus on gender (33%, e.g., Huang et al., 2019), while a smaller number of studies included a focus on sources of SE (16%, e.g., Usher et al., 2019). Other foci in the studies included student behaviour variables (three studies = 6%), e.g., student response times (Soland, 2019), teacher variables (one study), i.e., teacher behaviour, SE, and knowledge (Kaskens et al., 2020), and other contextual variables (18%), such as the role of the educational center (Zamora-Araya et al., 2022). No studies emerged that focused on collective SE, although the search procedure as well as inclusion–exclusion criteria would have enabled this.

4.1.3 Change in SE over time

A majority of the studies (67%) were cross-sectional in design, where data came from a single time of measurement, e.g., using data from PISA (Borgonovi & Pokropek, 2019). A total of 33% of studies had some type of longitudinal design: 20% of the studies included two measurement occasions (e.g., Rakoczy et al., 2019), while six studies (12%) included three or more measurement occasions (e.g., Street et al., 2022). Thirteen studies (27%) included long gaps in between each measurement occasion (i.e., weeks, months, or years apart), focusing, e.g., on the effect of a three-week intervention (Rakoczy et al., 2019) or the reciprocal relationship between SE and performance across the space of three years (Arens et al., 2022). Only three studies (6%) were micro-longitudinal in design, that is, measuring SE across learning situations (i.e., minutes or days apart), investigating, e.g., the starting point and change of students’ SE across a sequence of lessons in mathematics (Street et al., 2022). This micro-longitudinal code includes one case study where the measurement points were not specified in time, i.e., Leong (2021), who investigated how the SE of a single year-nine student in Singapore changed from moment to moment and across different tasks during a lesson in mathematics.

Five of the longitudinal studies applied an experimental design (10% of all studies). Examples are the impact of instruction based on dynamic versus static visualization on students’ SE and problem solving in real-time as well as three months later (Kohen et al., 2022), and the effect of prompting students to construct multiple solutions on their perceived competence and SE (Schukajlow et al., 2019). All other studies were observational (90%).

4.1.4 Situational specificity of SE

A majority of the studies (82%) focused on a single level of analysis (including case studies), while 18% of the studies also investigated effects of nestedness (e.g., the effects of variations across time-points, teachers, schools, or districts). Examples are the effects of teacher- (i.e., teacher behaviour, SE, and knowledge) and student-level (i.e., MSE, self-concept, and anxiety) variables on students’ arithmetic fluency and problem-solving (Kaskens et al., 2020), and the moderating effect of the educational center on the relationship between students’ MSE and performance in Costa Rica (Zamora-Araya et al., 2022). Importantly, we focused in our coding manual on whether studies included a substantial focus on, e.g., school effects on students’ MSE. Several additional studies accounted for the nestedness of their data by applying multilevel analytical techniques but did not report the estimates for the group-level variation (as this was not a focus or aim of the study). We did not code whether studies should have accounted for nestedness (but failed to do so), e.g., if students were grouped within different classrooms but the shared variance of the classroom was not accounted for.

4.2 Methodological approaches of studies

4.2.1 Participants

Many different countries (21) were represented in the studies, in terms of the country of the participants. The largest number of studies (22%) included Chinese students, followed by students from the USA (14%), Germany (12%), and Turkey (8%). One study included participants from four countries (Pepper et al., 2018), and three studies included data across (unspecified) OECD countries (e.g., Borgonovi & Pokropek, 2019).

Most of the studies included secondary school participants (86%), while 18% of studies included participants from elementary levels of education (two studies included both). Of the studies that included elementary students, one study included participants from 3rd grade (Lau et al., 2018), while the other studies included participants from grade four and up.

Many studies (39%) included a large number of participants, i.e., more than 1,000 students. Among these, four studies included between 158,000 and 605,000 students. These numbers were largely enabled through large-scale assessment data such as PISA or national repositories. One of these large-scale studies included data from Chinese students (Zhang & Wang, 2020) while the other three studies included data from students in the OECD (e.g., Borgonovi & Pokropek, 2019).

4.2.2 Analytical methods

A large majority of the studies (88%) took a quantitative approach to investigating MSE, including quantitative (mainly questionnaire) data and using statistical methods for analyses (including Maximum Likelihood and Bayesian approaches). Four studies (8%) collected both quantitative and qualitative data (i.e., mixed methods); for example, Gao (2020) used a survey followed by a Q-sorting procedure to untangle the differential effects of exposure and importance of different sources of MSE. Two studies included qualitative data alone (4%). These used cognitive interviews to validate the PISA measure of SE (Pepper et al., 2018) and investigated switches in SE for a single student across a lesson in mathematics (Leong, 2021).

4.2.3 Data sources

All but three studies included questionnaires (94%) as a source of data, and more than half of the studies (55%) included a type of performance measure. In addition, many studies (29%) included secondary data; these comprised all the very-large-scale studies mentioned above. Beyond these methods, three studies included interviews (6%) and a single study included observation data (Leong, 2021). Usher et al. (2019) drew on multiple data sources to demonstrate both general trends (structured questionnaire data to investigate sources of SE) and details of these trends (open-ended questions to investigate the importance of the sources).

4.2.4 Congruence of measures

For the studies that investigated the relationship between SE and a domain of functioning or performance variable (31 studies = 63%), we investigated the congruence or alignment between the SE measure and the task/performance measure, focusing on whether the measures corresponded in terms of content and degree of specificity. Slightly more than half of the studies were deemed congruent in this regard (58%). In Borgonovi and Pokropek (2019) students were asked about their exposure to specific types of tasks, e.g., “Solving an equation like 3x + 5 = 17.”, and for SE “How confident do you feel about having to do the following mathematics tasks … 3x + 5 = 17”. This illustrates an exact match between the exposure to a certain type of task and the SE measure of that type of task.

Similarly, we looked at the order of measurement, and whether this was aligned with the treatment of SE in the study (whether SE was treated as a predictor or an outcome). Of the relevant studies, 74% were aligned with theoretical recommendations in this regard (e.g., performance measured prior to SE in a study focusing on SE as an outcome), while 13% were misaligned. Four studies (13%) did not provide sufficient detail to determine the order of measurement.

5 Discussion

To review the state of SE research in mathematics education, we examined which substantive foci have been addressed and which methodological approaches have been used in recent (2018–2022) studies of MSE. Herein we identify and analyze knowledge gaps, which could serve as a basis for systematic reviews or meta-analyses, and propose future directions for research on MSE.

5.1 Substantive foci

In terms of substantive foci, the reviewed studies included a clear focus on the domain of mathematics. Some studies also addressed a specific topic of mathematics (e.g., geometry or arithmetic), a specific competency (e.g., modelling), or other objects (e.g., programming). Because of the small number of studies on SE regarding specific objects, our knowledge of, for example, geometry SE or problem-solving SE remains limited. Very few studies compared SE for different objects (e.g., SE for solving modelling problems on linear functions and on the Pythagorean theorem in Krawitz & Schukajlow, 2018). Thus, we call for more research that focuses on such comparisons in the future, to gain deeper insights into the mathematics-specific sources of SE.

As most studies investigated the strength of SE, few studies investigated the role of perceived difficulty in students’ SE (e.g., differences in the starting point and change of students’ SE as a function of the perceived level of difficulty of the object in Street et al., 2022). Future studies should expand our understanding of MSE as a multidimensional construct by investigating the role of both objective and subjective task difficulty in the change and development of students’ SE.

We found no studies investigating students’ collective MSE, which may indicate this type of research is conducted with a focus on teachers, rather than on students. Future studies should contribute to theory development by collecting evidence about students’ collective MSE.

Students’ (individual) MSE was defined as predictor, outcome, both predictor and outcome in reciprocal effects, and as mediator and moderator between other variables. While there has been a long-standing debate regarding the relative effects of SE and performance experiences in their theoretically proposed reciprocal relationship (see, e.g., Arens et al., 2022), SE is now often included as a learning outcome, indicating that the importance of students’ SE is widely accepted, beyond its influence on performance. Although Bandura (1997) defined physiological states as a source of SE, we found no studies that incorporated biophysiological states as predictors. Future studies could incorporate measures of, e.g., heart rate or electrodermal activity as indices of stress or arousal, in addition to self-reported anxiety and emotions (Martin et al., 2023). Many studies included SE as a mediator, in order to analyze how effects of predictors (e.g., emotions or strategies) on learning outcomes (e.g., performance) can be transmitted via students’ SE. In the future, we call for more studies in which SE is included as a moderator, where the effects of predictors on learning outcomes can differ according to students’ SE, as shown by Cai et al. (2019). Thus, including SE as a moderator can bring additional evidence about the role of SE in mathematics learning.
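The idea of SE as a moderator can be made concrete with a small sketch in which the slope of an outcome on a predictor is estimated separately for students with low and high SE. All data, variable names, and the helper `ols_slope` are invented for illustration and do not reproduce the analysis of any reviewed study. Splitting the sample into two groups is a simplification; applied work would more commonly model SE as a continuous variable with an interaction term.

```python
from statistics import mean


def ols_slope(x, y):
    """Ordinary least squares slope of y on x for a single predictor."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical predictor scores (e.g., an emotion measure), identical in both groups.
predictor = [1.0, 2.0, 3.0, 4.0]

# Hypothetical outcome scores (e.g., performance) for each SE group.
outcome_low_se = [8.0, 6.0, 4.0, 2.0]
outcome_high_se = [8.0, 7.5, 7.0, 6.5]

slope_low = ols_slope(predictor, outcome_low_se)
slope_high = ols_slope(predictor, outcome_high_se)

# A difference between the two slopes is what a moderation effect looks like:
# the predictor-outcome relationship depends on the level of SE.
print(slope_low, slope_high, slope_high - slope_low)
```

In this invented example the negative relationship between predictor and outcome is weaker in the high-SE group. A mediation analysis would instead ask whether the predictor changes SE, which in turn changes the outcome.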

While many studies included a focus on students’ mathematics performance, a larger proportion of studies included a focus on affective/motivational variables, as per the affective turn in mathematics education research described by Schukajlow et al. (2023). Consistent with Bandura’s theory on sources of SE, emotions being a key potential source, the associations between SE, affective, and motivational variables could be suitable for future meta-analyses. Given the prevalence of anxiety among the emotional correlates of SE, the type of emotional construct could be considered as a moderator of MSE in future studies. Other suitable moderators for meta-analyses, and constructs that should be addressed in future studies, are gender, sources of SE, and students’ appraisals or interpretations of SE sources.

Longer term longitudinal studies enabled investigations into broader changes in SE over time, while a smaller number of micro-longitudinal studies focused on the nature and mechanisms of changes of SE in mathematics classrooms. Qualitative data and analytical approaches enabled investigations into substantive foci such as different students’ interpretations and weighting of different sources of information, and shifts in perspective across events. Expanding our knowledge on intraindividual changes in students’ MSE within and across lessons in mathematics, e.g., through applying mixed methods and within (micro)longitudinal designs, would be valuable in future studies. Given the importance of development of SE over time for theoretical and practical implications, we also call for more intervention studies, which allow collecting indications on theoretically proposed mechanisms of change and sources of SE within ecological classroom settings.

Multilevel studies enabled a focus on the impact of contextual factors (e.g., migrant background or school characteristics) on students’ SE, while only a few studies focused on student behaviour or teacher variables. Furthermore, while there is a growing body of research demonstrating the crucial role of teacher-student interactions in students’ development (Vandenbroucke et al., 2018), only one study in our review included classroom observation data enabling such a focus. Investigating the role of teacher-student interactions in changes in students’ SE beliefs is a promising avenue for future research. Future studies on MSE could focus on the role of teachers’ involvement with students, their emotional and instructional support to students, and the quality of teacher-student interactions.

5.2 Methodological approaches

In terms of methodological approaches, studies using quantitative data and analytical approaches dominated in our review, along with surveys and questionnaires as data collection tools. This mirrors findings of a previous review of research on SE in education more broadly (Klassen & Usher, 2010). As qualitative approaches tend to dominate within mathematics education research (Schukajlow et al., 2018), this might indicate that the bulk of research on MSE is being conducted by researchers outside the field of mathematics education. The relatively few mixed-methods and qualitative inquiries in our review demonstrated the value of methodological diversity in terms of expanding our understanding of the complex interplay between students’ SE and their appraisals of their experiences and wider contextual factors. Large-scale surveys are not well suited to capture, e.g., the process of SE change including situational, personal, temporal, and social conditions together with students’ individual cognitive appraisals (Usher & Pajares, 2008, p. 784). The study by Usher et al. (2019) is one example of how mixed methods and qualitative in-depth recording of students’ experiences can be used to fruitfully investigate the complex and unstable construct of MSE. Thus, we encourage more mixed-methods and qualitative research, and we encourage mathematics education researchers to (keep) contributing to research on SE as it pertains to mathematics, to ensure the inclusion of mathematics education perspectives when developing knowledge on this important construct.

The inclusion of MSE in international databases (i.e., PISA) enables broad generalization, while in-depth qualitative studies enable minute situation-specificity. Sample sizes in our review ranged from 1 to more than 600,000, with many relatively large (> 1,000 participants) studies. This demonstrates a distinct increase in participants from a previous review (Klassen & Usher, 2010). Large-scale cross-sectional and longitudinal studies in hierarchically nested designs (i.e., students in classrooms) in a range of countries (e.g., Spain, China, Germany) enable us to draw conclusions at the individual, classroom, and school levels. The largest single contributing country was China, followed by the USA, reflecting a shift from a previous “western focus” in education research (Klassen & Usher, 2010). Maintaining diversity is important in terms of testing cross-cultural applicability of theories and deriving knowledge pertaining to mathematics classroom practice across the world.

More studies in our review included secondary than elementary students, and no participants younger than grade three were included, possibly mirroring the dominance of questionnaires as data source among the studies in our review (as well as the methodological and ethical challenges associated with research on younger children). We concur with Schukajlow et al. (2023), who suggested there is a need for studies that include children, in particular those of elementary age. One example is the need to untangle the effects of age and domain-specificity on the SE-performance relationship.

A prevailing concern in SE research is the operationalization of the construct. While Klassen and Usher (2010) concluded that more than half of the studies in their review suffered from shortcomings in measurement, we excluded slightly fewer of the identified studies from our search due to issues with conceptualization and operationalization of SE. Furthermore, many of the included studies in our review were coded as lacking in congruence of measures, meaning there was a mismatch between the measure of SE and the domain of functioning. Also, while there is a convergence of measurement (e.g., item-selection) particularly in large-scale international comparisons, we need to continue to be aware of cultural and sub-group differences in respondents’ understanding of items (Pepper et al., 2018). We reiterate recommendations by, e.g., Klassen and Usher (2010) and Diego-Mantecón et al. (2019), that researchers pay close attention to available guidelines for theoretically based approaches to conceptualization and measurement of the construct of interest.

Through our work with this review, we identified many studies of potential interest to researchers within mathematics education. We present a selection of these within our references (marked **), as illustrations of what we consider the state-of-the-art in current research on MSE. The special consideration articles highlight in different ways either best-practice contributions or innovative approaches to bring the field forward, in substantive and/or methodological terms. We summarise briefly: Many studies have been concerned with investigating reciprocal effects between SE and performance, and Arens et al. (2022) illustrated a best-practice approach to inform this long-standing issue. While most studies investigated MSE at the domain level or for a mixture of topics, Krawitz and Schukajlow (2018) demonstrated the value of focusing on students’ MSE for specific topics and types of problems and comparing these in the same study. One trend in our review was the many studies with a large number of participants, enabled through large-scale assessment data such as PISA. Soland (2019) found that item response times on very difficult items in PISA were strongly correlated with students’ SE, demonstrating a promising avenue for measuring students’ SE more directly. A key issue for the future is to develop new methodologies to capture change in a dynamic construct that interacts with situational and contextual factors. Two studies (Gao, 2020; Usher et al., 2019) illustrated innovative approaches to untangling students’ differential appraisals of SE sources, and Leong (2021) took a case-study approach to investigating changes in MSE from moment to moment and across different tasks during a single lesson in mathematics.

5.3 Limitations

There are limitations to our study. One is our search strategy, i.e., the sampling of the study. We searched for papers that included our search terms in the title only, meaning we knowingly did not target the large number of studies that focus on students’ MSE but do not include the terms in their title. We included only research published in English, meaning research conducted in non-Western societies may not be appropriately represented.

6 Conclusion

Many trends uncovered in our review are promising, in terms of a large variety of thematic foci, coupled with novel (e.g., questionnaire followed by a Q-sorting procedure) and advanced (e.g., multilevel structural equation models) methodological approaches enabling new insights. As methodological advancement is rapid, new methods keep giving scope for phrasing new research questions. In the future we are likely to see further methodological-substantive synergies in the field of MSE. At the same time, we see the potential and need for future studies that continue the focus on MSE as a multidimensional and dynamic concept, i.e., considering factors such as mathematical object specificity and (perceived) task demand, as well as designs that enable investigations into the nature and mechanisms of MSE changes over time. Qualitative or mixed-methods studies from the field of mathematics education focusing on, e.g., unpacking student appraisals of their experiences with mathematics, could constitute major future contributions to our understanding of students’ MSE. Careful consideration of the theoretical background of the construct of MSE in relation to the aims of each individual study continues to be important in order to bring the field forward.