Introduction

With the continuous rise of science, technology, engineering, and mathematics (STEM) education in global basic education, STEM teachers’ self-efficacy has become an academic research focus. Elementary and high school teachers are the main promoters of STEM education, and their self-efficacy is crucial for teaching quality and student learning outcomes. At the theoretical level, this study therefore aimed to explore STEM education’s impact on elementary and high school teachers’ self-efficacy to provide new perspectives and ideas for educational research. At the practical level, because teacher self-efficacy is widely recognized as an important factor in teaching quality and student learning outcomes, a deep understanding of the degree of STEM education’s impact on elementary and high school teachers’ self-efficacy, and of the factors that influence it, is of great significance for educational decision-making and teachers’ professional development.

Extant research on STEM education’s impact on elementary and high school teachers’ self-efficacy has shown that STEM education can enhance teacher self-efficacy, giving educators more confidence and improving their ability to teach STEM (Chen et al., 2021). Teachers’ personal characteristics, such as their educational background, teaching experience, and level of involvement in professional development, are closely associated with STEM teacher self-efficacy (DeCoito & Myszkal, 2018; Warsihna et al., 2021). Moreover, the support and resources schools provide to STEM teachers, such as professional development opportunities and teaching support, have been found to positively impact teacher self-efficacy levels (Gagnier et al., 2022; Skaalvik & Skaalvik, 2018). Nevertheless, the existing research in this area is relatively limited and fragmented, and there are still some controversies. In the past decade, relevant studies have mainly focused on two aspects, the first of which is the development of STEM teacher self-efficacy measurement tools, such as the Science Teacher Efficacy Belief Instrument (STEBI; Unfried et al., 2022; Koutsianou & Emvalotis, 2019; Slater et al., 2021), the Mathematics Teacher Efficacy Belief Instrument (MTEBI; Koutsianou & Emvalotis, 2019; Segarra & Julià, 2022), the Technology Teacher Self-Efficacy Scale (TTSE; Hammack & Ivey, 2017; Lee et al., 2019), and the Engineering Teacher Self-Efficacy Scale (TESS; Yoon et al., 2012; Webb & LoFaro, 2020). The majority of researchers have adopted quantitative research methods and continuously developed and applied relevant scales to investigate STEM teacher self-efficacy. However, due to scholars’ varying understandings of the concept of teacher self-efficacy as well as STEM teaching, the academic community still lacks a unified measurement questionnaire. 
Additionally, researchers have shown a tendency to diversify their research methods when investigating teacher self-efficacy, with an emphasis on the comprehensiveness of research methods. However, in the field of STEM teacher self-efficacy research, the use of a combined quantitative and qualitative research method is not yet widespread. Second, although there is research on the current status and influencing factors of STEM teachers’ self-efficacy, these studies have primarily relied on empirical surveys, with most scholars conducting cross-sectional studies (Unfried et al., 2022). Previous studies have analyzed the current state of STEM teachers’ self-efficacy and explored potential influencing factors. Research has indicated that teachers’ personal characteristics, such as their educational background, teaching experience, and level of involvement in professional development, are closely related to STEM teacher self-efficacy. Existing research has focused chiefly on studying teachers’ individual factors, such as sex, age, and education level, while neglecting comprehensive research and analysis on the multidimensional factors that may affect teachers’ self-efficacy. Some scholars have also conducted longitudinal studies (Hammack & Ivey, 2017; Seals et al., 2017) that entailed measuring STEM teachers’ self-efficacy before and after their participation in a specific professional development program to explore the role of professional development courses. Research has shown that participating in professional development courses and training can significantly improve STEM teachers’ self-efficacy levels. Moreover, teachers’ positive attitudes toward STEM education and their belief in their own abilities in STEM teaching are closely related to their self-efficacy levels.

We still do not know how STEM education as a whole affects elementary and high school teachers’ self-efficacy. Are there significant positive effects? Which specific factors have a more significant impact on self-efficacy? Are there differences between teachers at different educational stages? Are there differences in the impact of different types of STEM education on elementary and high school teachers’ self-efficacy? For example, do different fields of education, such as science, technology, engineering, and mathematics, have similar effects on teachers’ self-efficacy? Do different teacher characteristics (e.g., teaching experience, sex, and educational stage) moderate the impact of STEM education on elementary and high school teachers’ self-efficacy? These are questions that need to be addressed. Therefore, we conducted a systematic meta-analysis to synthesize and integrate existing research results and gain a more comprehensive understanding of STEM education’s impact on elementary and high school teachers’ self-efficacy. By synthesizing a large amount of research data, we sought to identify patterns, trends, and effect sizes that can inform future research and guide educational practice. The resulting conclusions, being more accurate and reliable, can be used to validate or modify existing theoretical perspectives and to highlight new research directions and methods for future studies. Additionally, by examining the influence of self-efficacy in different educational contexts, we sought to contribute to a more nuanced understanding of how self-efficacy impacts teaching effectiveness and student outcomes.

STEM teacher self-efficacy measurement tools

In the development of STEM teacher self-efficacy measurement tools, the context specificity of different teaching situations and tasks is crucial. Existing tools are mostly designed for teachers of a single subject: science, mathematics, technology, or engineering. In 1990, considering the specificity of teaching situations in different subject areas, Enochs and Riggs limited the content scope of their teacher self-efficacy measurement questionnaire to science teaching and targeted elementary school science teachers (Enochs & Riggs, 1990). Their STEBI aligns with Bandura’s self-efficacy theory and has two structural dimensions: science teaching self-efficacy expectations and outcome expectations (Bandura, 1977). The questionnaire comprises 25 items rated on a 5-point Likert scale. Subsequently, Slater et al. (2021) adjusted the scale and developed the STEBI-B specifically for measuring the self-efficacy of preservice elementary science teachers completing graduate programs. The original scale was used to measure in-service elementary science teachers’ self-efficacy and is labeled the STEBI-A. The STEBI has been widely used to research science teachers’ self-efficacy. However, researchers have adjusted its structure and item wording many times to better accommodate specific subject areas and target populations (Unfried et al., 2022; Wray et al., 2022). These adjustments reflect the need for teacher self-efficacy measurement tools with sufficient sensitivity to capture self-efficacy in specific teaching environments (Gagnier et al., 2022; Stylos et al., 2022).

Mathematics teacher self-efficacy measurement tool

Segarra and Julià (2022) modified the STEBI and developed the Mathematics Teacher Efficacy Belief Instrument (MTEBI) for preservice mathematics teachers (Koutsianou & Emvalotis, 2019). The MTEBI has two dimensions: mathematics teaching self-efficacy expectations and outcome expectations. The mathematics teaching self-efficacy expectations subscale contains 13 items, and the mathematics teaching outcome expectations subscale contains eight items, all measured on a 5-point Likert scale. Confirmatory factor analysis has indicated that the two subscales are independent, enhancing the MTEBI’s structural validity (Koutsianou & Emvalotis, 2019; Segarra & Julià, 2022).

Technology teacher self-efficacy measurement tool

Hammack and Ivey (2017) investigated technology teachers’ self-efficacy among trainee teachers at a school in Singapore using the Technology Teacher Self-Efficacy (TTSE) scale. The TTSE has five dimensions: three dimensions of self-efficacy expectations (basic teaching skills, advanced teaching skills, and educational technology) and two dimensions of outcome expectations (traditional application of technology and constructivist application). Hammack and Ivey (2017) found that both the measurement model and the structural model demonstrated good model fit and that teacher self-efficacy significantly influenced teachers’ use of technology in either traditional or constructivist ways (Lee et al., 2019).

Engineering teacher self-efficacy measurement tool

Yoon et al. (2012) conducted a comprehensive review and analysis of relevant literature on K–12 engineering education, the development of self-efficacy measurement scales in the STEM field, and the impact of teachers integrating engineering into K–12 instruction. Based on their findings, those scholars constructed the K–12 Engineering Teachers’ Self-Efficacy Scale (TESS). Exploratory and confirmatory factor analyses were conducted using data from 434 teachers in 19 states of the United States. The final TESS scale comprises 23 items across four dimensions: engineering content knowledge, engagement, discipline self-efficacy expectations, and outcome expectations. The scale demonstrated high internal consistency and reliability upon validation. Furthermore, as a scale designed for the engineering teaching environment, the TESS provides valuable insights for K–12 engineering curriculum instruction and can help improve the quality of pre-engineering courses (Yoon et al., 2012; Webb & LoFaro, 2020).

Research on the status and influencing factors of STEM teacher self-efficacy

In empirical research, most scholars have conducted cross-sectional and longitudinal studies to establish the relationships between influencing factors and teacher self-efficacy and explore the specific mechanisms through which influencing factors impact self-efficacy based on a theoretical framework.

Cross-sectional studies

Most scholars have conducted cross-sectional studies to analyze the status of STEM teacher self-efficacy and explore potential influencing factors. Some researchers have emphasized the relationship between subject knowledge and teacher self-efficacy. Warsihna et al. (2021) used the STEBI to survey in-service elementary school teachers’ science subject knowledge and self-efficacy beliefs. The results showed a statistically significant relationship between teachers’ science teaching self-efficacy and their science subject knowledge. Unfried et al. (2022) studied K–5 teachers and found that teachers’ awareness of STEM subject knowledge and instructional effectiveness were positively correlated with their teaching self-efficacy, whereas years of teaching experience showed no correlation with it.

Longitudinal studies

Some scholars have employed longitudinal studies to measure STEM teachers’ self-efficacy before and after the teachers’ participation in specific professional development programs, with the aim of exploring the program’s impact. Seals et al. (2017) conducted a study on the influencing factors of STEM teacher self-efficacy among 49 teachers who participated in a 1-year teacher professional development program called the Urban STEM Initiative. Those scholars found that factors such as external challenges, lack of resources, and organizational environment can affect teachers’ ability to meet students’ diverse needs, thereby impacting teaching self-efficacy.

As an important application area of self-efficacy theory, research on teacher self-efficacy has been conducted for over 40 years and has yielded rich results. Based on teacher self-efficacy’s significant role in professional development and its close relationship with teaching behaviors, instructional effectiveness, student learning outcomes, and student self-efficacy, STEM teachers’ self-efficacy is gradually becoming a focus of academic research. Studies on the status and influencing factors of STEM teachers’ self-efficacy have mainly relied on empirical surveys, with most scholars employing a cross-sectional research design to analyze the current state of STEM teacher self-efficacy; however, a comprehensive study on the extent of STEM education’s impact on teacher self-efficacy and various factors’ influence on teacher self-efficacy is lacking.

Purpose of meta-analysis

This study employed a meta-analytical approach to integrate and statistically analyze the results of multiple independent studies and reveal the overall impact of STEM education on elementary and high school teachers’ self-efficacy. Regarding data analysis, our study sought to systematically analyze the collected data, evaluate their quality for inclusion in the study, quantify the overall effect, explore sources of heterogeneity, evaluate publication bias, and ultimately provide meaningful insights into the research issues under investigation. The research questions that guided this study were as follows:

  • RQ1: What is the overall effect of STEM education on elementary and high school teachers’ self-efficacy?

  • RQ2: Which factors within STEM education significantly influence elementary and high school teachers’ self-efficacy?

  • RQ3: Do different sample characteristics, such as subject/discipline, education level, and nationality, moderate the impact of STEM education on elementary and high school teachers’ self-efficacy?

  • RQ4: Are there any publication biases or other potential research biases that may affect the findings?

Methods

This study explored the effects of STEM education on teacher self-efficacy via meta-analysis comprising the main steps of conducting a literature search, formulating literature inclusion criteria, screening the obtained literature, coding the literature, assessing the accuracy of the coding, calculating and pooling the effect sizes, performing a heterogeneity analysis, and testing for publication bias.

Literature search

To ensure the comprehensive inclusion of relevant quantitative domestic and foreign research literature, we searched several databases: Web of Science, ProQuest Education, PsycINFO, and the Psychology and Behavioral Sciences Collection. We applied Boolean search rules to find literature published between 2007 and 2022 using the query (“STEM” OR “science” OR “technology” OR “engineer*” OR “math*”) AND (“self-efficacy”) AND (“teacher”). Additionally, we used keyword searches in Google Scholar to supplement the database results. This procedure returned a total of 2,308 documents; after removing duplicates, 1,964 documents remained.

Formulation of literature inclusion criteria

Formulating appropriate inclusion criteria can limit or circumvent the influence of subjective factors. Inclusion criteria should be formulated strictly based on the research purpose and content and the study’s statistical needs. In this study, the following meta-analysis literature inclusion criteria were developed in keeping with the research theme, and the literature was screened accordingly.

First, research topics had to include “STEM education” and “teacher self-efficacy.” Second, papers had to be written in English or Chinese. However, during screening, because there were no Chinese papers that met the inclusion criteria, all the included literature was in English, and all papers had been published in SCI core journals between 2007 and 2022. Third, the research content had to include the impact of different types of STEM education on teacher self-efficacy, the education model could be any kind conforming to STEM education content, and the self-efficacy research object had to be teachers. Fourth, regarding study type, the research had to be empirical, which excluded qualitative research but included quantitative research. Regarding study design, only experimental studies were sought, including randomized trials or quasi-experimental designs with experimental (STEM education models) and control groups (non-STEM education models, including traditional education models) to clarify STEM education’s impact on teacher self-efficacy relative to other education modes. Fifth, to ensure the feasibility of calculating or converting single and combined effect sizes, included articles had to report statistically relevant information such as mean, standard deviation (SD), sample size (n), t, F, and effect size.

Literature screening

Applying the above inclusion criteria, we first screened the literature by title and obtained 196 relevant studies. Of these, 126 were excluded based on their abstracts, leaving 70 studies. Finally, the full text of each remaining article was screened, which resulted in the retention of 20 articles that met the inclusion criteria, with a total of 209 effect values (individual studies contributed as many as 38 effect values). Figure 1 depicts the screening process.

Fig. 1 Preferred reporting items for systematic reviews and meta-analyses flow of study analysis in phases

Literature coding

Moderator variables

A moderator variable is a variable that affects the direction or strength of the relationship between the independent and dependent variables, so different moderators can produce different results. The common moderators used in this study were discipline, education level, and country. Additionally, this study identified sample size and questionnaire measurements as moderators, noting that they have been used in previous studies to test factors that can contribute to the heterogeneity of effect sizes (Chen et al., 2021; Wang, 2022). Table 1 provides information on the moderators of all the included studies. After each source document was identified, it was coded, and a data file was generated for the calculation of subsequent indicators. Following the rules Lipsey and Wilson (2001) proposed, the coding of each document had to include at least the following information: author, publication year, subject/course involved in the study, and education stage, as these moderators were used to test contributions to the heterogeneity of effect sizes (Chen et al., 2021). The literature codes for this study were as follows.

Large samples were encoded as L, and small samples were encoded as S. Referring to Table 1, a sample size greater than 100 was defined as large, and a sample size less than 100 was considered to be small. Regarding teaching to transmit knowledge, knowledge disciplines are teaching subjects. In the STEM Bill, which was written subsequent to the knowledge transmission perspective, a subject is the name of a subject or class to be studied (Wahono et al., 2020). Referring to Wahono et al.’s (2020) classification, this study was based on subjects for which research has explored teacher self-efficacy, including science, mathematics, technology, and engineering. Science was encoded as Sc, math as Ma, technology as Te, and engineering as En.

In the meta-analysis, education level is a standard moderator variable. Different education stages can affect experimental results, leading to heterogeneity in effect sizes (Fu et al., 2011); it was therefore anticipated that teacher efficacy would likewise be affected by education level. Accordingly, consistent with the school-level classification criteria, this study divided education level (EL) into primary (students aged 5–11 years), secondary (students aged 11–13 years), and high school (students aged 14–17 years), encoded as PE, SE, and HE, respectively.

Education systems and methods vary by country, and the same educational concept will have differing impacts when applied in different education systems. Therefore, in this study, the countries that originated the sampled literature were divided into two categories by latitude and longitude; countries in the East were coded as E, and those in the West were coded as W.

Control variable analysis allowed us to determine whether STEM education models are more beneficial than non-STEM education models for promoting teacher self-efficacy, as well as to ascertain STEM education’s impact on teachers’ self-efficacy. The control group was used as a moderator with respect to the relevant prior literature, and the experimental group was compared with the control group (Lee et al., 2019; Romero-Ariza et al., 2021). In the meta-analysis conducted in this study, control group coding was divided into non-STEM (NO-STEM) and pre-test coding.

Questionnaire measurement refers to measuring a particular characteristic of an object using a questionnaire; that is, using a scale or tool to measure it quantitatively or qualitatively. General scales are divided into four types: nominal, ordinal, interval, and ratio scales. The questionnaire measurements (QMs) and factors used in this study were the STEBI-B, teacher efficacy and attitudes toward STEM (T-STEM), self-efficacy for teaching integrated STEM (STETIS), the Personal Science Teaching Efficacy (PSTE) Scale, the TESS Scale, the Technology Integration Self-Efficacy Scale (T-TISES), content knowledge (CK), pedagogic content knowledge (PCK), teacher self-efficacy (SE), teacher commitment (TC), self-efficacy (SEL), beliefs (BLF), practice (PRA), teaching self-efficacy (TSE), personal teaching efficacy and beliefs (PTEB), STEM instruction (SI), and the Engineering Design Self-Efficacy Instrument (EDSI). After sorting, we coded the information of the original documents, as shown in Table 1.

Coding accuracy assessment

Coding accuracy is critical for meta-analysis. In this study, two researchers independently coded the literature before the meta-analysis so that the degree of coding agreement could be calculated. The coefficient of agreement (Cohen’s kappa) should be greater than 0.7; we obtained a coding consistency of 0.9, which suggests that the coding results are reliable. The RR formula was then used to calculate the individual effect sizes and the pooled effect size by extracting the sample size, mean, SD, and other information pertaining to the 209 effect values. STEM education’s impact on teacher self-efficacy was assessed according to the response ratio, which was calculated as follows:

$$\mathrm{RR}=\ln\left(\frac{X_{\mathrm{P}}}{X_{\mathrm{C}}}\right)=\ln\left(X_{\mathrm{P}}\right)-\ln\left(X_{\mathrm{C}}\right) \tag{1}$$

Note. RR is the natural logarithm of the response ratio (the effect value), XP represents STEM teachers’ self-efficacy score, and XC represents non-STEM teachers’ self-efficacy score. A positive RR indicates an improved self-efficacy score, and a negative value indicates a decrease in the teachers’ self-efficacy score due to the implementation of STEM education.
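As an illustration of Eq. 1, the following minimal Python sketch computes the log response ratio from two group means. The function name and the example means are hypothetical and are not taken from any included study; the original analysis was carried out in R.

```python
import math


def log_response_ratio(mean_stem: float, mean_control: float) -> float:
    """Return ln(X_P / X_C), the log response ratio defined in Eq. 1."""
    if mean_stem <= 0 or mean_control <= 0:
        # The logarithm is only defined for positive means.
        raise ValueError("both group means must be positive")
    return math.log(mean_stem) - math.log(mean_control)


# Hypothetical self-efficacy means on a 5-point scale (illustrative only).
rr_gain = log_response_ratio(4.0, 3.2)  # positive: STEM group scored higher
rr_loss = log_response_ratio(3.0, 3.5)  # negative: STEM group scored lower
```

Because the ratio is taken on a log scale, gains and losses of the same proportional size are symmetric around zero, which is convenient when effect values are pooled.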

Given that over 50% of the case studies did not provide a measure of variance, the case studies were weighted according to the number of studies and experiments using the mixed-effects model in R. The variance (υ) for each RR was calculated as follows:

$$\upsilon=\frac{S_{\mathrm{P}}^{2}}{n_{\mathrm{P}}X_{\mathrm{P}}^{2}}+\frac{S_{\mathrm{C}}^{2}}{n_{\mathrm{C}}X_{\mathrm{C}}^{2}} \tag{2}$$

Note. nP and nC are the sample sizes, and SP and SC are the standard deviations of teachers’ self-efficacy scores in STEM and non-STEM education, respectively.
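The variance in Eq. 2 can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the input values are hypothetical, and the normal-approximation confidence interval is a common convention rather than something specified in the text.

```python
import math


def rr_variance(sd_p, n_p, mean_p, sd_c, n_c, mean_c):
    """Sampling variance of the log response ratio, following Eq. 2."""
    return sd_p ** 2 / (n_p * mean_p ** 2) + sd_c ** 2 / (n_c * mean_c ** 2)


def rr_confidence_interval(rr, variance, z=1.96):
    """Normal-approximation interval for RR (z = 1.96 gives about 95%)."""
    half_width = z * math.sqrt(variance)
    return rr - half_width, rr + half_width


# Hypothetical summary statistics for one comparison (illustrative only).
v = rr_variance(sd_p=0.8, n_p=40, mean_p=4.0, sd_c=0.8, n_c=40, mean_c=3.2)
rr = math.log(4.0 / 3.2)
low, high = rr_confidence_interval(rr, v)
```

With these inputs the two terms of Eq. 2 are 0.001 and 0.0015625, so the group with the smaller mean contributes more variance even though both groups have the same SD and sample size.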

Table 1 Moderator information for the included studies

Results

Publication bias analysis

Because we compiled our dataset from published peer-reviewed studies, it was necessary to test for publication bias, which arises when statistically significant positive findings are more likely to be published than negative or nonsignificant findings. Had this study been affected by publication bias, we would have needed to correct our database to ensure more reliable meta-analysis results. According to the method proposed by Egger et al. (1997), publication bias can be tested using Egger’s test when the study sample is sufficiently large (k ≥ 20). The 208 individual samples (effect values) across the 20 included studies were tested for publication bias using the metabias function in R’s meta package (Balduzzi et al., 2019).
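To make the logic of Egger’s test concrete, the sketch below implements the classical regression formulation in plain Python: each standardized effect (effect divided by its standard error) is regressed on its precision (one divided by the standard error), and an intercept far from zero signals funnel asymmetry. The function name and data are hypothetical, and the routine used in the R meta package may differ in its details, so this should be read as an illustration rather than a reproduction of the analysis.

```python
import math


def egger_regression(effects, std_errors):
    """Regress effect/SE on 1/SE and return the intercept, its t value, and df."""
    k = len(effects)
    if k < 3 or k != len(std_errors):
        raise ValueError("need at least three effects with matching standard errors")
    x = [1.0 / se for se in std_errors]                 # precision
    z = [y / se for y, se in zip(effects, std_errors)]  # standardized effect
    x_bar = sum(x) / k
    z_bar = sum(z) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standard errors must not all be identical")
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    slope = sxz / sxx
    intercept = z_bar - slope * x_bar
    # Ordinary least squares standard error of the intercept.
    resid_ss = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    df = k - 2
    se_intercept = math.sqrt(resid_ss / df * (1.0 / k + x_bar ** 2 / sxx))
    t_value = intercept / se_intercept if se_intercept > 0 else float("nan")
    return {"intercept": intercept, "t": t_value, "df": df}


# Hypothetical effects and standard errors (illustrative only).
result = egger_regression([0.5, 0.5, 0.4, 0.4], [0.5, 0.25, 0.2, 0.1])
```

The t value is compared against a t distribution with k − 2 degrees of freedom; a nonsignificant intercept is read as no evidence of small-study asymmetry.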

We also used the funnel function in the meta package to draw contour-enhanced funnel plots to visualize publication bias clearly in a scatter plot of the effect value of each study on the abscissa, with each study’s sample size or the reciprocal of the variance of the effect value as the ordinate (Fig. 2).

Fig. 2 Funnel plot

A contour-enhanced funnel plot is a traditional funnel plot to which contour lines corresponding to statistical significance levels (e.g., 0.01, 0.05, 0.1) have been added. The contours divide the funnel into regions of different statistical significance, namely the white, dark gray, gray, and silver gray regions in Fig. 2, making it possible to identify whether the points plotted on the funnel are statistically significant. The funnel plot’s degree of symmetry indicates the degree of publication bias in the research sample: the more asymmetrical the funnel plot, the higher the publication bias. As shown in Fig. 2, most of the research samples’ effect sizes were in the middle and upper part of the funnel, with the middle being a high-concentration area. A few studies were distributed at the bottom. The study samples’ effect sizes were relatively uniform and symmetrically distributed on both sides of the average effect value, indicating a small publication bias in the study sample. Egger’s test yielded t = 1.566 < 1.96, p = 0.29 > 0.05, further indicating that the publication bias in the research samples was not significant. Given that no significant asymmetry was found in the collected datasets, coupled with the fact that the sensitivity analyses yielded similar results, we concluded that publication bias likely did not affect the findings, suggesting the reliability of our meta-analytical results.

Heterogeneity test analysis

The heterogeneity test was used to determine whether the variation in the samples’ effect sizes reflected real differences among the samples included in the meta-analysis rather than sampling error. Given that there may be heterogeneity among studies due to different study designs and effects, it was necessary to determine the model appropriate for use in the meta-analysis based on the heterogeneity test results. When there is no significant heterogeneity among the studies, a fixed-effects model is appropriate; when heterogeneity is significant, a random-effects model should be used. If heterogeneity among the studies is detected, it is also necessary to investigate whether there are outliers, where individual studies’ effect sizes are extremely large or small. These extreme values may affect the overall effect size and even distort the research results. A common method for detecting extreme values is to observe whether the confidence interval of a study’s effect size overlaps with that of the combined effect size. If not, the effect size is considered to be an extreme value. The sensitivity test results showed that the confidence intervals of the individual studies’ effect sizes overlapped with that of the combined effect size, indicating no extreme values. The heterogeneity test results in this study showed that Cochran’s Q was 54,081.35 with 207 degrees of freedom and that I² was 100% (p < 0.001), indicating significant heterogeneity among the included studies (I² > 80%), resulting in the selection of a random-effects model for estimation. Due to the strong heterogeneity among the studies, subgroup analysis and meta-regression were used to determine the reasons for the heterogeneity.
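The heterogeneity statistics can be illustrated with a short Python sketch. It assumes inverse-variance weights and the DerSimonian-Laird estimator of the between-study variance, which is one common choice and is not confirmed by the text as the estimator actually used; all function names and the two-study example are hypothetical. Applied to the reported values, Q = 54,081.35 with 207 degrees of freedom gives an I² of roughly 99.6%, consistent with the rounded figure of 100%.

```python
import math


def i_squared(q, df):
    """I² in percent: the share of total variation attributed to heterogeneity."""
    return max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0


def random_effects_pool(effects, variances):
    """Cochran's Q, I², DerSimonian-Laird tau², and the pooled random-effects estimate."""
    k = len(effects)
    w = [1.0 / v for v in variances]                     # inverse-variance weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0      # between-study variance
    w_re = [1.0 / (v + tau2) for v in variances]         # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return {"Q": q, "df": df, "I2": i_squared(q, df), "tau2": tau2,
            "pooled": pooled, "se": se}


# Two hypothetical effect values with equal variances (illustrative only).
example = random_effects_pool([0.1, 0.3], [0.01, 0.01])

# I² implied by the Q statistic and degrees of freedom reported in the text.
reported_i2 = i_squared(54081.35, 207)
```

Adding tau² to each study’s variance flattens the weights, so under strong heterogeneity large and small samples contribute more equally to the pooled estimate than they would under a fixed-effects model.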

STEM education’s impact on teacher self-efficacy

Based on the heterogeneity test results, this study used a random-effects model to combine the effect sizes. After applying our inclusion criteria, the final analysis included 20 published articles, with 208 independent samples of STEM education’s effect on teachers’ self-efficacy scores. The study that contributed the most samples was Johnson et al. (2021; 18.18%, n = 38), followed by Lee MH (11.48%, n = 24), Deehan et al. (2019; 11.00%, n = 23), Caner and Aydin (2021; 7.66%, n = 16), Gagnier et al. (2022; 6.70%, n = 14), and Gray (2017; 4.78%, n = 10); the 14 other studies each provided fewer than ten samples. The pooled effect size of the 208 samples included in this study was RR = 0.224 (p < 0.05; 95% confidence interval [CI]: 0.187, 0.26). Overall, STEM education’s effect on teacher self-efficacy was found to be significantly higher than that of other education modes.
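Because RR is defined on a natural-log scale, it can be back-transformed into a percentage difference between group means. The following Python sketch applies that conversion to the pooled estimate and its confidence limits; the conversion follows from the definition of RR and is offered as a reading aid, not as a result reported by the study, and the function name is hypothetical.

```python
import math


def percent_change(log_ratio: float) -> float:
    """Convert a log response ratio into a percentage difference in means."""
    return (math.exp(log_ratio) - 1.0) * 100.0


pooled_pct = percent_change(0.224)  # about 25%
lower_pct = percent_change(0.187)   # about 21%
upper_pct = percent_change(0.26)    # about 30%
```

Read this way, the pooled RR corresponds to self-efficacy scores that are roughly one quarter higher under STEM education than under the comparison condition.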

Subgroup analysis and meta-regression analysis

Subgroup analysis and meta-regression analysis (adjustment variable analysis) are important ways to explore the source of heterogeneity. Subgroup analysis is mainly suitable for studies where the moderator variable is categorical, and meta-regression is suitable for studies where the moderator variable is continuous.

Subgroup analysis

Based on the factors that previous studies have reported to influence the effects of STEM education, the sample size of a single study (Sample_size), the study area (Country), education level (Education_level), subject (Subject), year (Year), control group’s education mode (Control), and questionnaire design differences in teacher self-efficacy rating scales (QM) were used as moderating variables to test for moderating effects on the research results. Applying the random-effects model, this study used the metagen function in the meta package in R to perform subgroup analyses on the categorical variables in turn: individual studies’ sample size, country, education level, subject, education mode applied in the control group, and self-efficacy rating scale design. Meta-regression analysis was performed on publication year (Figs. 3 and 4).

This study divided the original research sample into three categories corresponding to education levels: primary (n = 157), secondary (n = 15), and high school (n = 36). Regarding the between-groups effect, the difference comparison result was Qbet = 37.14, with p < 0.0001, indicating a moderating effect of Education_level; that is, the three education levels significantly impacted the consistency of teacher self-efficacy across STEM education and other education modes. Specifically, primary school showed a positive effect size, whereas the effect sizes of secondary and high school were both negative. The primary school effect size was RR = 0.147 (p < 0.05), which was significantly higher than that of high school. The high school effect size was RR = -0.013 (p < 0.05), and that of secondary school was RR = -0.058 (p < 0.05). These results indicate that STEM education’s effect on teacher self-efficacy was significantly stronger in primary school than in secondary and high school, with the high school effect size being slightly higher than that of secondary school, albeit with no significant difference.
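The between-groups statistic used in a subgroup comparison can be sketched as follows. This minimal Python example assumes the common formulation in which each subgroup’s pooled estimate is weighted by the inverse of its squared standard error; the function name and the input values are hypothetical, because the subgroup standard errors are not reported in the text.

```python
def q_between(estimates, std_errors):
    """Between-subgroup Q statistic and its degrees of freedom."""
    if len(estimates) != len(std_errors) or len(estimates) < 2:
        raise ValueError("need at least two subgroups with matching standard errors")
    weights = [1.0 / se ** 2 for se in std_errors]
    # Weighted mean of the subgroup estimates.
    grand = sum(w * est for w, est in zip(weights, estimates)) / sum(weights)
    q = sum(w * (est - grand) ** 2 for w, est in zip(weights, estimates))
    return q, len(estimates) - 1


# Two hypothetical subgroup estimates with equal standard errors (illustrative only).
q_value, q_df = q_between([0.1, 0.3], [0.1, 0.1])
```

The resulting value is compared against a chi-square distribution with one fewer degree of freedom than the number of subgroups; a large Q relative to that distribution indicates that the subgroup effect sizes differ by more than sampling error would explain.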

Fig. 3 Effect size in different subgroups

In this study, the original research participants were divided into four categories: Multi (mixed subject, n = 127), En (engineering, n = 14), Sc (science, n = 51), and Te (technology, n = 16). Considering the between-groups effect, the difference comparison result was Qbet = 0.170, with p < 0.05, indicating a moderating effect of Subject; that is, subject area or discipline significantly impacted the consistency of teachers’ self-efficacy across STEM education and other education modes. Specifically, the four subjects’ effect sizes were positive. The effect size of Te was RR = 0.773 (p < 0.05), which was significantly higher than that of En (RR = 0.173, p < 0.05). The Sc effect size was RR = 0.091 (p < 0.05), and that of Multi was RR = 0.018 (p < 0.05), indicating that STEM education improved teacher self-efficacy in Te subjects significantly more than in multidisciplinary subjects. Regarding other subjects, the En effect size was significantly higher than that of Multi (p < 0.05).

This study divided the original research sample’s QM into the following 12 categories: EDSI/TESS (n = 9), PSTE (n = 47), TSE (n = 14), STEBI-B (n = 21), PTEB/SI (n = 23), SE/TC (n = 6), CK/PCK (n = 4), SEL/BLF/PRA (n = 8), STETIS (n = 38), TESS (n = 5), T-STEM (n = 17), and T-TISES (n = 16). Considering the between-groups effect, the difference comparison result was Qbet = 0.170, with p = 0.001 < 0.05, suggesting a moderating effect of QM; that is, the 12 QM types significantly impacted the consistency of teacher self-efficacy across STEM education and other education modes. Specifically, the effect sizes of TSE and SE/TC were negative, that of T-STEM was 0, and the effect sizes of the other nine types were all positive. Among them, the T-TISES effect size was RR = 0.773 (p < 0.05), which was significantly higher than that of the other types. The TSE effect size was RR = -0.297 (p < 0.05), and the SE/TC effect size was RR = -0.142 (p < 0.05); both were significantly lower than the other types’ effect sizes. These results indicate that measured STEM teacher self-efficacy was significantly higher for T-TISES than for the other scales (p < 0.05) and significantly lower for TSE and SE/TC than for the other scales (p < 0.05).

This study divided the original research sample’s control group treatment into two categories: NO-STEM (the education mode is not STEM education, n = 82) and Pre (pre-test prior to implementing STEM education, n = 126). Regarding the between-groups effect, the difference comparison result was Qbet = 0.170, with p > 0.05, indicating no moderating effect of Control; that is, NO-STEM and Pre had no significant effect on the consistency of teacher self-efficacy across STEM education and other education modes. Specifically, the effect sizes of the two types of Control were positive, with that of NO-STEM being RR = 0.164 (p > 0.05); however, no significant difference was found between the two (p > 0.05), indicating no significant difference in the improvement of teacher self-efficacy due to STEM education under differing treatment of the control group.

In this study, the original research sample size was divided into two categories: L (large sample, n = 112) and S (small sample, n = 96). Regarding the between-groups effects, the difference comparison result was Qbet = 0.170, with p > 0.05, indicating no moderating effect of sample size; that is, studies with small and large samples showed no significantly different impact on the consistency of teacher self-efficacy across STEM education and other education modes. Specifically, the average effect size of the two sample sizes was positive, with the effect size of L being RR = 0.109 (p < 0.05), which was almost equal to that of S (RR = 0.099; p < 0.05), indicating no significant difference between studies with small and large sample sizes in terms of STEM education’s improvement effect on teacher self-efficacy (p > 0.05).

In this study, the original research sample’s originating country was divided into two categories: W (Western country, n = 118) and E (Eastern country, n = 90). Regarding the between-groups effect, the difference comparison result was Qbet = 0.170, with p = 0.001 < 0.05, indicating a moderating effect of Country; that is, Western countries and Eastern countries showed significantly different impacts on the consistency of teacher self-efficacy across STEM education and other education modes. Specifically, the combined effect sizes of the research samples for W and E were both positive; the effect size of W was RR = 0.157 (p < 0.05), which was significantly higher than that of E (RR = 0.036, p < 0.05), indicating greater improvement in teacher self-efficacy due to STEM education in Western compared to Eastern countries (p < 0.05).

Meta-regression analysis

The meta-regression analysis used the publication year of each research paper as a predictor variable to examine whether it influenced the effect size of STEM education on teacher self-efficacy. Figure 4 shows that the moderating effect of publication year was not significant, indicating that publication year did not affect STEM education’s effect size with respect to teacher self-efficacy.
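The logic of a meta-regression on publication year can be sketched as an inverse-variance weighted regression of effect size on year. The sketch below is illustrative and is not the authors' R code: the function name is hypothetical, and the between-study variance tau2 is assumed to be supplied by the caller (tau2 = 0 gives a fixed-effect meta-regression). The slope is tested with a two-sided z-test.

```python
import math

def meta_regression(effects, variances, years, tau2=0.0):
    """Weighted least-squares regression of effect size on publication year.

    Returns (slope, intercept, standard error of slope, two-sided p-value).
    Requires at least two distinct years; otherwise the slope is undefined.
    """
    w = [1.0 / (v + tau2) for v in variances]
    sw = sum(w)
    x_bar = sum(wi * x for wi, x in zip(w, years)) / sw
    y_bar = sum(wi * y for wi, y in zip(w, effects)) / sw
    sxx = sum(wi * (x - x_bar) ** 2 for wi, x in zip(w, years))
    sxy = sum(wi * (x - x_bar) * (y - y_bar)
              for wi, x, y in zip(w, years, effects))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    se = math.sqrt(1.0 / sxx)
    z = slope / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return slope, intercept, se, p
```

A slope whose p-value exceeds 0.05, as reported for Fig. 4, would mean that the effect size shows no reliable linear trend across publication years.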

Fig. 4 Meta-regression analysis of year and response ratio

Discussion

Through a literature search with inclusion criteria and screening, we identified 20 English-language studies on STEM education’s impact on teacher self-efficacy. We also tested the factors that influence the heterogeneity of effect differences through literature coding and moderator variables. We identified articles reporting an association between STEM education and teacher self-efficacy and sought to address whether education level, subject, questionnaire measurement design, control group treatment, and sample size influenced this relationship. The study results suggest that STEM education can effectively influence teachers’ self-efficacy.

Relationship between STEM education and teacher self-efficacy

We found a statistically significant relationship between STEM education and teacher self-efficacy (r = 0.224, p < 0.05) based on 20 relevant published articles and 208 independent samples. These meta-analytic findings are consistent with past literature reporting a positive association of STEM education with teacher self-efficacy. Although most researchers have agreed that STEM education influences teacher self-efficacy, research models generally fail to consider other moderator variables (e.g., subject/discipline, education level, nationality, control, QM, and sample size). Existing studies have mainly utilized quantitative research methods to determine the relationship between STEM education and teacher self-efficacy and have explored the specific mechanisms of various factors based on relevant theories. This is currently the main research paradigm in the study of STEM teacher self-efficacy. However, to explain STEM teachers’ self-efficacy adequately, it is necessary to conduct an in-depth analysis of the causes underlying the various influencing factors. Some influencing factors cannot be fully explored and explained through quantitative research alone; therefore, future research should combine quantitative and qualitative research methods to elucidate the factors influencing STEM teachers’ self-efficacy.

Moderation analysis

We sought to determine whether the relationship between STEM education and teacher self-efficacy was influenced by moderators. After including all moderators and interactions in a meta-regression model, education level, subject, QM, and country proved to be significant moderators of the relationship between STEM education and teacher self-efficacy. The factors influencing self-efficacy can vary among individuals in similar and different task environments. Based on theory and research on the sources of teacher self-efficacy, it can be inferred that the factors influencing STEM teacher self-efficacy cannot be limited to a single aspect. This necessitates a comprehensive study and analysis of multiple factor dimensions.

The results indicated a moderating effect of education level. The between-groups comparison result was Qbet = 0.170, with p = 0.001 < 0.05, suggesting a significant difference in the results for different education levels; that is, the difference was unlikely to be due to random factors. Furthermore, the primary school effect size (RR = 0.147, p < 0.05) was significantly higher than that of secondary and high school, meaning that STEM education had a greater effect on teacher self-efficacy at the primary level than at the other education levels.

A possible reason could be that elementary school STEM education covers multiple subjects. Hence, elementary school teachers need to possess a wide range of knowledge and skills to teach STEM content effectively. Therefore, elementary school teachers’ self-efficacy may be influenced by their confidence in their STEM knowledge and teaching methods (Akerson et al., 2009). In contrast, secondary and high school teachers tend to be specifically responsible for a particular STEM subject, such as mathematics, physics, or chemistry. Hence, they may have more in-depth expertise and experience in one of these subject areas. Therefore, secondary school teachers’ self-efficacy may be related to their confidence in their teaching abilities and professional knowledge in a specific subject (Caprara et al., 2006).

Regarding subject/discipline, the difference in comparison result was Qbet = 0.170, with p < 0.05, indicating a moderating effect of the subject/discipline in the relationship between STEM education and teacher self-efficacy. As previously mentioned, the original research subjects were divided into Multi, En, Sc, and Te categories. Results indicated that the effect size of En subjects was significantly higher than that of the other subjects. In science and technology teaching, interdisciplinary learning is infrequently mentioned, and emphasis is placed on explaining knowledge points. However, interdisciplinary courses integrate multidisciplinary knowledge pertinent to life problems and familiar daily life situations. The contextual and instructional features of interdisciplinary courses can significantly enhance teachers’ self-efficacy. However, the specific effects depend on teachers’ subject knowledge, teaching abilities, and adaptability to interdisciplinary teaching. In single-subject teaching, teachers usually focus on their specific subject area. They may have a deep understanding and mastery of the subject’s content and teaching methods, leading to higher self-efficacy in that particular domain. Interdisciplinary teaching requires teachers to integrate knowledge and teaching methods from different subject areas to help students develop interdisciplinary thinking and problem-solving skills. If teachers can adapt to the demands of interdisciplinary teaching and continuously enhance their subject knowledge and teaching abilities, they may gradually strengthen their self-efficacy in an integrated multi-subject environment.

This study divided the original research sample’s QM into 12 categories. Regarding the between-groups effect, the difference comparison result was Qbet = 0.170, with p = 0.001 < 0.05, suggesting a moderating effect of QM; that is, the 12 QM types significantly impacted the consistency of teacher self-efficacy across STEM education and other education modes. Teacher self-efficacy is characterized by dependence on the environment and a specific subject matter. In the development of teacher self-efficacy measurement tools, it is crucial to consider the context specificity of different teaching situations and tasks (Tschannen-Moran et al., 1998). The existing teacher self-efficacy measurement instruments consider the contextual nature of teachers’ self-efficacy and commonly employ questionnaires tailored to specific teaching domains or subject characteristics (Tschannen-Moran et al., 1998). Given the continuous integration of STEM concepts into science education, Mobley (2015) developed the Science Teachers’ Self-Efficacy Scale (SETIS) based on the STEM integration concept. The scale has three dimensions: social, personal, and material factors. Mobley (2015) conducted semi-structured interviews to further validate the quantitative questionnaire’s structural dimensions. The interviews comprised 11 questions, including “What are the knowledge and skill requirements in teaching science under the STEM integration concept?” and “What challenges do you anticipate in teaching within the STEM integration framework?” (Mobley, 2015). The three dimensions can help researchers and educators gain a comprehensive understanding of the characteristics and variations in STEM teacher self-efficacy and can provide targeted support for teachers’ professional development and training.

Past research has noted differences in teacher self-efficacy between Eastern and Western countries (Klassen et al., 2009). This study reinforced this conclusion, finding that STEM education improved teacher self-efficacy to a significantly greater degree in Western countries than in Eastern countries. This difference can be attributed to East–West differences in educational systems and resource support, social recognition and support, and the educational culture and teaching methods. These factors have differing effects on teachers’ self-efficacy. First, regarding educational systems and resource support (Henson, 2001), Eastern and Western countries have different educational systems and resources available for STEM education. Some Western countries invest more resources in STEM education, for example, by providing advanced laboratory equipment, technological support, and training opportunities, which can enhance teachers’ confidence and sense of competence, thus improving their self-efficacy. In contrast, in some developing countries or regions, teachers may face more challenges due to limited resources, which can negatively impact their self-efficacy.

Second, regarding social recognition and support (Skaalvik & Skaalvik, 2018), Eastern and Western countries have different levels of social recognition and support for STEM teachers. In some Western countries, STEM teachers are regarded as important professionals, and their work is highly appreciated and respected, which can enhance teachers’ self-efficacy. However, in some Eastern countries, especially in developing countries, STEM teachers’ social status and recognition are relatively low, which can adversely affect their self-efficacy. Finally, Eastern and Western countries also differ regarding educational culture and teaching methods (Woolfolk & Burke, 2005). Some Western countries emphasize student-led inquiry-based learning and practical activities and encourage students to innovate and solve real-world problems. This teaching approach can enhance teachers’ self-efficacy. In contrast, in some Eastern countries, the educational culture may place more emphasis on traditional teacher-centered instruction and exam-oriented approaches, which can adversely impact teachers’ self-efficacy.

Conclusion

This meta-analysis aimed to investigate the impact of STEM education on elementary and high school teachers’ self-efficacy. The research findings address the four research questions proposed in this study and provide clear evidence of the positive impact of STEM education on the self-efficacy of elementary and high school teachers. The key influencing factors and the reliability of the research results are also identified. These findings hold significant implications for education policymakers and practitioners in promoting and implementing STEM education programs.

The first research question aimed to explore the overall impact of STEM education on the self-efficacy of elementary and high school teachers. Through a comprehensive analysis of the research findings, we found that STEM education has a significant positive effect on teachers’ self-efficacy. This suggests that by engaging in STEM education activities, teachers can enhance their confidence and cognition in their teaching abilities, thereby improving their self-efficacy. This finding confirms the positive impact of STEM education on teachers’ self-efficacy.

The second research question aimed to identify which factors in STEM education have a significant impact on the self-efficacy of elementary and high school teachers. Through a meta-analysis, we found that factors such as educational training, teaching resources and support, and teacher involvement in STEM education have a significant impact on teachers’ self-efficacy. These results indicate the importance of providing effective training and support, as well as encouraging active teacher participation in teaching activities when designing and implementing STEM education programs. This finding reveals the significance of key factors in STEM education on teachers’ self-efficacy.

The third research question aimed to investigate whether different sample characteristics, such as subject area, educational level, and nationality, moderate the impact of STEM education on the self-efficacy of elementary and high school teachers. The research findings showed significant moderating effects of different sample characteristics on the impact of STEM education on teachers’ self-efficacy. Firstly, educational level showed a significant moderating effect on the relationship between STEM education and teachers’ self-efficacy. Specifically, elementary school teachers demonstrated a more significant improvement in self-efficacy through STEM education compared to teachers at other educational levels. This suggests that STEM education has a greater influence on the self-efficacy of elementary school teachers. Secondly, subject area was also found to have a significant moderating effect on the relationship between STEM education and teachers’ self-efficacy, with engineering subjects having a greater impact on self-efficacy. This may be attributed to the emphasis on practical skills and problem-solving abilities in engineering subjects, which have a more positive influence on teachers’ self-efficacy. Additionally, the study categorized the teacher self-efficacy measurement questionnaires into 12 categories and found significant moderating effects of these categories on teachers’ self-efficacy under STEM education compared to other educational models. This indicates that the characteristics of teachers’ self-efficacy depend on the environment and specific subject content, and different categories of self-efficacy measures may capture varying impacts of STEM education. Lastly, the research findings also confirmed the differences in teachers’ self-efficacy between Eastern and Western countries. STEM education had a more significant impact on the self-efficacy of teachers in Western countries, which can be attributed to differences in educational systems and resource support, social recognition and support, as well as educational culture and teaching methods.

The fourth research question aimed to explore the potential publication bias or other research biases that may affect the research results. Through sensitivity analysis and publication bias tests, we found the research findings to be robust and reliable. This indicates that our research results are not significantly influenced by publication bias or other potential research biases.

Despite the scale of this meta-analysis, there are several limitations to note. First, our study was limited to published peer-reviewed literature, and our search excluded unpublished work (e.g., reports and dissertations). Although this study established, based on Egger’s statistic, that publication bias in the research sample likely did not affect the findings (that is, this study’s meta-analytical results are relatively reliable), we nevertheless recognize that there remains potential for publication bias because the exclusion of unpublished literature might have led to the over- or underestimation of the moderators’ influence. Second, a comprehensive examination of the multidimensional factors influencing teacher self-efficacy in STEM is lacking, which means that this study may have focused on certain influencing factors without conducting a comprehensive analysis of all relevant factors. Finally, there is a lack of comparative research on teachers’ self-efficacy in STEM across different subject domains. Previous studies focused on a specific subject domain without conducting a comprehensive comparison across multiple subjects.

Based on the aforementioned limitations, we suggest that future research should consider the following aspects. First, conducting comprehensive multidimensional studies would be beneficial. Researchers can explore various factors, such as teachers’ personal characteristics, teaching experience, instructional support, and teaching environment, to gain a comprehensive understanding of the characteristics and differences in STEM teachers’ self-efficacy. Second, comparative studies across different subject domains are needed. Researchers can compare STEM teachers from different subject areas to understand the extent to and manner in which different subjects influence teachers’ self-efficacy. Third, it would be beneficial to explore strategies to enhance STEM teachers’ self-efficacy. Based on these research findings, tailored teacher training and support programs can be designed for different teacher populations and subject domains to enhance STEM teachers’ self-efficacy.