Introduction

Science, Technology, Engineering, and Mathematics (STEM) education has gained increasing attention worldwide over the past decades because of its important role in improving living standards, supporting economic growth, and sustaining global competitiveness (Carnevale et al., 2011; Xie et al., 2015). However, research indicates that STEM teachers continue to be underprepared, even though STEM education is of great importance to students’ education (National Science Board, 2016, 2022). Furthermore, schools serving high proportions of minority students or students living in poverty are more likely to lack qualified K-12 STEM teachers. In addition, greater proportions of STEM teachers at these schools entered the profession through alternative routes to certification that lack student teaching and formal training experience (National Science Board, 2022; Rotermund & Burke, 2021).

Research evidence indicates that STEM teachers’ self-efficacy (STEM-TSE) is a fundamental factor determining their job satisfaction and willingness to stay in the profession (Kasalak & Dagyar, 2020). STEM teachers face challenges such as high attrition rates and high stress levels (Fuller & Pendola, 2019). Teachers’ confidence in their ability to teach is also essential to teaching STEM subjects effectively (Kelly et al., 2017; Wenner, 1995). Therefore, to improve the quality of STEM teachers’ instruction, their job satisfaction, and their retention rates, it is important to increase STEM educators’ self-efficacy.

A number of empirical studies show that providing STEM teachers with high-quality, ongoing professional development (PD) is critical for improving STEM-TSE (Gardner et al., 2019; Kelley et al., 2020; Parker et al., 2020). PD is also associated with greater teacher capacity and lower turnover (Nguyen & Redding, 2018). PD activities can help teachers reflect on their professional practice, improve their pedagogy, and strengthen their content knowledge (Fulton & Britton, 2011). Additionally, PD whose timing is coherently embedded within teachers’ daily work has been shown to be effective in promoting teacher outcomes (Croft et al., 2010). Effective PD can therefore improve both teachers’ knowledge and their confidence in teaching STEM content, which can have a profound impact on teacher outcomes.

To inform the design and implementation of future STEM teacher PD, it is crucial to determine how effective current PD programs are in improving STEM-TSE. Synthesizing the key determinants of best practice in developing high-quality PD for K-12 STEM teachers will also help guide future teacher training and PD programs. The objectives of this study were to investigate the effectiveness of PD in improving K-12 STEM-TSE and to examine whether learning STEM pedagogy and other substantive PD characteristics positively influenced the estimated effects of PD on STEM-TSE.

Conceptual framework and literature review

PD and four sources of teachers’ self-efficacy

Self-efficacy is rooted in Bandura’s social cognitive theory (1977, 1986). A teacher’s sense of efficacy is defined as a teacher’s “judgment of his or her capabilities to bring about desired outcomes of student engagement and learning, even among those students who may be difficult or unmotivated” (Tschannen-Moran & Hoy, 2001, p. 783). Effective teaching practices are influenced by teachers' beliefs about their own teaching capacities (Tschannen-Moran & Hoy, 2001). Teachers with higher efficacy beliefs are more likely to have better classroom management procedures, student engagement strategies, problem-solving skills, and adequate instructional strategies that encourage students' learning (Zimmerman, 2000).

Bandura (1977, 1997) proposed four sources of self-efficacy: mastery experiences, vicarious experiences, social persuasion, and emotional arousal. Mastery experiences are gained when an individual takes on a new challenge and succeeds; they are the most powerful source of one’s sense of efficacy (Bandura, 1997). Vicarious experiences come from observing other people successfully complete a task; people are more likely to develop positive beliefs about themselves when they have positive role models. Social persuasion occurs when individuals receive feedback about their performance on a specific task. Emotional arousal occurs when a person experiences joy, excitement, or contentment while performing an activity.

PD is widely used as a means of educating teachers. Teacher PD is defined as “activities that develop an individual’s skills, knowledge, expertise, and other characteristics as a teacher” (OECD, 2009, p. 49). Research recognizes teacher PD as an essential approach to improving the quality of education (Coe et al., 2014; Darling-Hammond, 2000), and teacher PD programs have been found to significantly improve student achievement across various disciplines (Blank & De Las Alas, 2009; Didion et al., 2020). In addition, several meta-analytic reviews have reported that teacher PD has a significant positive effect on teacher practices and outcomes, including teacher knowledge and skills, teacher social-emotional competence, and teacher well-being (Balta et al., 2015; Fischer et al., 2018; Iancu et al., 2018; Kraft et al., 2018; Oliveira et al., 2021; Thurlings & den Brok, 2017).

PD has also been found to contribute significantly to the sources of efficacy (Hill et al., 2004; Hoi et al., 2017; Ross & Bruce, 2007). For instance, PD gives teachers opportunities to complete a series of tasks related to their teaching practice, providing authentic evidence of success in the form of mastery experiences (Tschannen-Moran & McMaster, 2009). When PD provides opportunities to observe other teachers’ teaching and practice, teachers gain vicarious experience, developing a sense of their own adequacy by comparing themselves with others. Teachers experience social persuasion in PD when they receive positive or negative feedback from colleagues and trainers on how well they are doing. When teachers are enthusiastic about and enjoy what they are doing, they experience positive physiological arousal, which increases their sense of efficacy (Tschannen-Moran & Hoy, 2007). Thus, PD programs can improve teachers’ self-efficacy by adjusting the training environment and by sequencing training tasks in order of increasing difficulty.

Core features of STEM teacher PD and their impact on teachers’ self-efficacy

Desimone (2009) proposed a widely acknowledged core conceptual framework for investigating the effects of professional development on teachers' and students' outcomes. This model indicated five core features of high-quality PD that positively impact teacher attitudes and beliefs, including content focus, active learning, coherence, sustained duration, and collective and collaborative participation (Desimone, 2009). Later, Darling-Hammond et al. (2017) confirmed this framework and expanded it with an emphasis on collaboration, modeling, and support for teachers.

Regarding core PD features for teachers in STEM disciplines, several reviews have concluded that most STEM teacher PD builds on the frameworks of Desimone (2009) and Darling-Hammond et al. (2017), centering on developing teachers’ content and pedagogical knowledge through various training approaches and formats (Gonzalez et al., 2022; Huang et al., 2022; Lo, 2021). Empirical evidence has also demonstrated the impact of STEM teacher PD on TSE, establishing TSE as an important outcome measure of such training (Huang et al., 2022).

The present meta-analysis adopted the key components of Desimone’s model (2009) to examine the impact of PD on STEM-TSE, because Desimone’s work (2009) is one of the most widely used conceptual frameworks for studying the effectiveness of teacher PD programs. The core features of teacher PD examined in the present study are STEM-focused pedagogy training, sustained PD duration, instructional design for active learning, collaborative participation, and coherence, as reflected in the timing of PD delivery.

STEM-focused

STEM-focused pedagogy has been described as including inquiry-based teaching, engineering-design-based teaching, problem-based teaching, task/project-based teaching, and integrated STEM teaching (Guzey et al., 2020; Huang et al., 2022; Mohamad Hasim et al., 2022; Park et al., 2018; Thibaut et al., 2018). Teachers’ knowledge of STEM pedagogy has usually been developed along two paths: (a) collective learning through lectures or presentations (e.g., Atiles et al., 2013) and (b) scaffolded authentic experiences (Huang et al., 2022).

Empirical research has found that the development of teachers' STEM pedagogy can significantly impact STEM-TSE (Mohamad Hasim et al., 2022). Teachers who investigated inquiry-based science teaching at a science museum, for example, demonstrated significant improvement in TSE and understanding of inquiry-based instructions (Duran et al., 2009). Several other studies that investigated how inquiry-based teaching and learning pedagogy training affected STEM-TSE found similar results (Deniz & Akerson, 2013; Gosselin et al., 2010; Patrick et al., 2014). Likewise, completing tasks involving data processing, mathematical modeling, or problem-solving positively affected TSE beliefs in classroom teaching (Ertmer et al., 2014; Evans, 2011; Haney et al., 2007; Maass et al., 2022). Additionally, engaging in the engineering design process and having an authentic integrated STEM learning experience were also found to be beneficial for the development of STEM-TSE (Ferand et al., 2020; Hammack et al., 2020).

Sustained PD duration

Literature shows that PD duration affects teacher gains (Kennedy, 2016). Both intensity (i.e., hours per day or per week) and total contact time are essential components of effective PD (Garet et al., 2001; Kowalski et al., 2020; Loucks-Horsley et al., 2009). Sustained STEM PD duration is highly correlated with other core PD components (Blank & De Las Alas, 2009), and PD that is sustained over time and includes a substantial number of contact hours is more likely to be of high quality (Ufnar & Shepherd, 2019). That is, sustained PD gives teachers more opportunities to collaborate, engage in active learning, and spend time on subject learning, thereby improving their skills and competencies (Garet et al., 2001).

Instructional design for active learning

Active learning in teacher PD refers to participants engaging directly with training materials that are closely tied to their own classrooms and students (Darling-Hammond et al., 2017). Active learning in STEM teacher PD is implemented in widely varying ways, through instructional approaches as diverse as self-directed theory learning and self-explored practice, observing role models, reflection, providing and receiving feedback, and learning by doing (Blank & De Las Alas, 2009; Huang et al., 2022). These active learning activities are grounded in established learning theories, such as Experiential Learning Theory (Kolb, 1984), Constructivism Theory (Vygotsky, 1978), and Social Learning Theory (Bandura & Walters, 1963), which combine behavior modeling with cognitive learning.

Research supports a positive relationship between STEM teachers’ active learning in PD and TSE. Richter et al. (2013) compared beginning secondary mathematics teachers with and without constructivist-oriented mentoring, which included opportunities for reflection, experimentation with various teaching strategies, and independent decision-making. Beginning teachers who received constructivist mentoring reported higher levels of efficacy and job satisfaction, and lower levels of emotional exhaustion, after the training. Peters-Burton et al. (2015) examined how a PD program based on a cognitive apprenticeship model affected 19 in-service STEM teachers’ perceptions of scientific thinking and inquiry instruction, as well as their self-efficacy. The teachers shifted their views of inquiry and maintained a high level of self-efficacy throughout the study. However, a lecture-focused STEM integration PD with hands-on activities and teacher self-reflection during professional learning community sessions yielded mixed findings regarding its impact on STEM-TSE (Wang & Nam, 2015).

Collaborative participation

In STEM PD training, based on social learning theory (Bandura, 1977), collaborative participation has been increasingly seen as a core element. It refers to teacher collaborative learning facilitated by peer teachers, schools, and/or districts across small groups or disciplines (Darling-Hammond et al., 2017). STEM teachers' collaborative participation has been found to significantly impact TSE. For example, Kelley et al. (2020) investigated the effectiveness of collaborative PD training on high school STEM-TSE for integrated STEM instruction. Both science and engineering teachers worked collaboratively through an engineering design within a community of practice. The results indicated that STEM-TSE increased significantly after the PD training (Kelley et al., 2020).

Coherence

Coherence in teacher PD refers to supporting teachers in connecting their learning with their existing knowledge and beliefs, and in connecting their teaching with student learning in ways consistent with school, district, and state policies (Desimone, 2009). In this study, the coherence feature of teacher PD is captured by the timing of PD delivery. Specifically, teacher PD is usually provided under one of two general timing options: job-embedded or non-job-embedded (Croft et al., 2010). Literature supports that high-quality, ongoing job-embedded PD is effective in improving TSE (Althauser, 2015). The term "job-embedded" refers to teacher learning that is rooted in daily teaching practice and student learning (Darling-Hammond & McLaughlin, 1995; Hirsh, 2009). Ongoing job-embedded PD usually involves activities such as developing formative assessments, examining student work, implementing individual professional growth plans, lesson study, and professional learning communities (Croft et al., 2010). In contrast, non-job-embedded PD, such as a summer institute or other separate training sessions, may take place in or outside the school but is removed from instruction and away from students, and it centers on issues related to practice (Croft et al., 2010).

Research questions

Although STEM education is important for students, few studies have systematically analyzed the characteristics of current PD for K-12 STEM teachers or examined the effectiveness of PD on STEM teachers’ outcomes. Previous reviews of STEM teacher PD (see Appendix A) include seven systematic/content reviews (Chai, 2019; Huang et al., 2022; Lo, 2021; Margot & Kettler, 2019; Mohamad Hasim et al., 2022; Seneviratne et al., 2019; Thibaut et al., 2018) and one meta-analysis (Gonzalez et al., 2022). These reviews focused on the characteristics of general STEM teacher PD (n = 5) and on features of PD for STEM integration (n = 3). They found that STEM teacher PD has most often been examined in terms of content and pedagogical knowledge training and features of STEM instructional practices.

No studies, however, have specifically synthesized existing research on the effect of in-service teacher PD on overall TSE in STEM disciplines, even though the literature supports PD as an effective approach to developing TSE (Gardner et al., 2019; Kelley et al., 2020; Parker et al., 2020). More specifically, no studies have used meta-analysis to examine the extent to which STEM teacher PD affects TSE, or how effective learning STEM instructional practices and other STEM PD features is in enhancing TSE.

Drawing on Bandura’s social cognitive theory (1977) as it relates to the sources of efficacy, and on Desimone’s framework (2009) of core features of teacher PD, this study used a meta-analytic approach to examine the effect of STEM teacher PD on TSE (see the conceptual model in Fig. 1). The meta-analysis focuses on identifying PD features that can have a positive effect on STEM-TSE. Its findings can provide insight into how PD contributes to STEM-TSE and thereby inform the design of future PD. The following research questions are examined:

  1. What is the overall effect of PD on STEM in-service teachers’ self-efficacy?

  2. How do STEM-focused pedagogy training and PD duration moderate the effect of PD on STEM in-service teachers’ self-efficacy?

  3. How do substantive PD characteristics, including active learning type, coherence, collaborative participation, and contribution to the sources of teacher self-efficacy, moderate the effect of PD on STEM in-service teachers’ self-efficacy?

Fig. 1 Conceptual model

Methods

Literature search

The search covered six databases: (a) the Education Resources Information Center (ERIC), (b) Education Source, (c) APA PsycInfo, (d) Web of Science, (e) Engineering Village, and (f) ProQuest. PD and teacher efficacy were used as the keyword concepts. For each search, thesaurus terms were searched along with keywords in titles and abstracts to find all relevant citations. The last search was conducted in February 2023. The keywords included professional development, professional training, in-service teacher education, continuing education, teacher education, faculty development, efficacy, and teacher efficacy, among others. The reference lists of the included articles and of existing reviews on STEM teacher PD effects on TSE were also screened to identify additional studies that the database search might have missed (Badampudi et al., 2015). To locate non-peer-reviewed literature (e.g., dissertations, gray literature, and technical reports), the same search terms were used to search the ProQuest database. The full list of search words and terms appears in Additional File 1.
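The exact search strings are given in Additional File 1 and are not reproduced here. As a hypothetical illustration of how two keyword concepts are typically combined, the sketch below ORs the synonyms within each concept and ANDs the concepts together. Only the keywords named in the text are used (the source list ends with "etc."), and the helper names `or_group` and `build_query` are invented for this example; real databases differ in field tags and thesaurus syntax.

```python
# Hypothetical sketch: keywords are only those named in the text; the full
# term list in Additional File 1 is not reproduced here.
PD_TERMS = [
    "professional development",
    "professional training",
    "in-service teacher education",
    "continuing education",
    "teacher education",
    "faculty development",
]
EFFICACY_TERMS = ["efficacy", "teacher efficacy"]


def or_group(terms):
    """Quote multi-word phrases and join synonyms for one concept with OR."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(*concept_groups):
    """Require a record to match every concept (AND across concept groups)."""
    return " AND ".join(or_group(group) for group in concept_groups)


query = build_query(PD_TERMS, EFFICACY_TERMS)
print(query)
```

Grouping synonyms with OR inside parentheses keeps recall high within a concept, while the AND between groups restricts results to records that address both PD and efficacy.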

Inclusion criteria

All studies were reviewed and screened by two independent reviewers. The final included studies all met the following criteria:

  a. Empirical examination of the effects of PD on teacher efficacy. Secondary data analyses, literature reviews, and conceptual papers were excluded.

  b. Available in English and published between January 1977 and February 2023 (the start date corresponds to Bandura, 1977, the onset of social cognitive theory).

  c. Samples included K-12 in-service teachers teaching a subject in a STEM discipline. Studies with samples of higher education educators were excluded.

  d. Examined the effectiveness of PD on teacher self-efficacy in classroom teaching (i.e., classroom management, student engagement, instructional strategy, and overall teaching self-efficacy) as measured quantitatively by standardized instruments and/or researcher-designed tools. Studies that measured only collective teacher efficacy were excluded.

  e. Provided the quantitative information necessary for calculating or estimating an effect size.

  f. Employed a randomized experimental or quasi-experimental design with a control group. Single-group pre/post-test designs were excluded.

  g. Studies that compared the effect of the same PD on efficacy across teachers with different characteristics, such as educational level or prior experience, were excluded.

Screen procedures

The search of the six electronic databases yielded an initial set of 9,177 studies (Fig. 2). Covidence, software designed specifically for systematic reviews, was used for article screening. After duplicates were removed, 6,365 studies entered the title and abstract screening stage, and 927 studies were identified as eligible for full-text screening. The full-text review then yielded 158 references for a further coding eligibility screening stage. Two independent raters screened studies at both the title-and-abstract and full-text stages, reaching 89% and 92% interrater agreement, respectively. The two raters then discussed and resolved conflicts in Covidence to reach 100% agreement at each stage.

Fig. 2 PRISMA flowchart

Next, the two raters screened the 158 studies for data coding eligibility. Because this study aimed to apply the most rigorous design standard for assessing the impact of PD interventions on STEM-TSE, studies were excluded if they did not use a pre/post design with a control group. Agreement at this stage was 100%. Of the 158 studies, seven had a pre/post control group design but reported insufficient statistical information to compute an effect size. The researchers contacted the authors of these seven studies, and one author provided the full data results by email. Overall, 137 of the 158 studies were excluded at the coding eligibility stage. In summary, the multistep literature search and screening procedure resulted in a total of 21 research articles (see Additional File 2) containing 48 independent cases on in-service STEM teachers’ PD effects on TSE.
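The stage counts reported in this section can be cross-checked with a short tally. The sketch below uses only the numbers given in the text (9,177, 6,365, 927, 158, and 21 records); the stage labels and the `exclusions` helper are illustrative names, and the per-stage removals it prints are derived by subtraction rather than taken from the PRISMA figure.

```python
# Record counts at each screening stage, as reported in the text.
stages = [
    ("records identified", 9177),
    ("after duplicate removal", 6365),
    ("eligible for full-text screening", 927),
    ("eligible for coding screening", 158),
    ("included articles", 21),
]


def exclusions(stages):
    """Return (stage, number of records removed to reach that stage) pairs."""
    return [
        (name, prev_n - n)
        for (_, prev_n), (name, n) in zip(stages, stages[1:])
    ]


for name, removed in exclusions(stages):
    print(f"{name}: {removed} removed")
```

The final step reproduces the 137 exclusions at the coding eligibility stage, and the removals across all stages sum to 9,177 − 21 = 9,156 records.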

Study coding

An extensive coding scheme spreadsheet developed for systematic reviews (Egert et al., 2018; Huang et al., 2022) was adapted to document the following study features: (a) publication information, (b) teacher and PD characteristics, (c) outcome measures (TSE), and (d) statistics for calculating effect sizes. All features of each study were coded by two independent researchers, who met to check their agreement; where codes conflicted, the two coders revisited the studies and decided on the most appropriate code. Interrater reliability was assessed with Cohen’s kappa: most variables showed perfect agreement (κ = 1), and the coherence and collaborative participation variables reached κ = 0.98 and 0.93, respectively. Where the two coders could not agree, the third author met with them to resolve the disagreements, resulting in 100% agreement.
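Cohen’s kappa corrects raw percent agreement for the agreement expected by chance, given each rater’s marginal category frequencies. The sketch below is a minimal illustration of the standard two-rater formula. It is not the study’s code, and the example codes (timing categories for four hypothetical studies) are made up.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning nominal codes to the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from the product of each rater's marginal proportions.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:  # both raters used one identical category for every item
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical coherence (timing) codes from two coders for four studies.
coder_1 = ["job", "job", "summer", "blended"]
coder_2 = ["job", "summer", "summer", "blended"]
print(round(cohens_kappa(coder_1, coder_2), 3))
```

In this made-up example, raw agreement is 75%, but kappa is only about 0.64 because some of that agreement would be expected by chance.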

Outcome variables

The outcome measure was teacher self-efficacy (TSE), focusing on teachers’ overall self-efficacy in teaching STEM disciplines, including self-efficacy in classroom management, instructional strategy, student engagement, and STEM subject teaching. TSE in the included studies was measured with (a) standardized instruments or modified versions of them, such as the Teachers' Sense of Efficacy Scale (TSES; Tschannen-Moran & Hoy, 2001), the Math/Science Teaching Efficacy Belief Instrument (STEBI/MTEBI; Enochs & Riggs, 1990; Enochs et al., 2010), the Teacher Efficacy and Attitudes Toward STEM Survey (Friday Institute for Educational Innovation, 2012), and the Dimensions of Attitude Toward Science (DAS; van Aalderen-Smeets & Walma van der Molen, 2013); and (b) instruments designed by the authors of the included studies (i.e., Nugent et al., 2018; Pinner, 2012).

Moderator variables

Based on the conceptual framework of this study (Fig. 1; Bandura, 1986; Desimone, 2009) and a previous review of STEM teacher PD trends (Huang et al., 2022), six moderators were examined for their contribution to improving TSE through PD. The variables analyzed were:

  (a) STEM-focused

    Previous studies indicate that a content focus on pedagogy learning is one of the most important components of high-quality PD and is dominant in current STEM teacher training (Huang et al., 2022). As the literature indicates, widely adopted STEM pedagogies include inquiry-based, task-based, problem-based, engineering design-based, and/or integrated STEM teaching (Guzey et al., 2020; Huang et al., 2022; Mohamad Hasim et al., 2022; Park et al., 2018; Thibaut et al., 2018). Therefore, based on the description of each PD program in the included articles, two categories were coded: teachers who received any of these types of STEM pedagogy training versus those who did not receive STEM pedagogy training.

  (b) Sustained PD duration

    As the literature indicates, both the intensity and the duration of PD are important aspects of effective PD (Garet et al., 2001; Kowalski et al., 2020). The current meta-analysis recorded the total PD contact hours for each included study, computed as the product of PD intensity (hours per dose) and PD frequency (total number of doses).

  (c) Active learning

    According to our proposed conceptual model, PD instructional design is one of the dominant factors determining teachers’ active learning in a PD program (Blank & De Las Alas, 2009; Darling-Hammond et al., 2017; Huang et al., 2022). Three categories were identified among the included studies. The first is constructivist-oriented training (e.g., van Aalderen‐Smeets & Walma van der Molen, 2015), based on social learning theory rooted in constructivism (Vygotsky, 1986). Constructivism emphasizes that learners construct new understanding and knowledge through active participation in the learning process and through social discourse with others (Piaget, 1973; Vygotsky, 1986, 1987). Studies were coded into this category if they stated that the PD instruction was based on constructivism or used a self-exploratory, inquiry-based format, since these adhere to the constructivist learning model. The second category is cognitive apprenticeship-based design (Collins et al., 1987, 2018). This type of PD instructional design trains teachers by having them observe exemplary teachers’ best practices and uses modeling to support cognitive and metacognitive development, along with practice and reflection (Peters-Burton et al., 2015). Examples among the included studies are Chen (2020) and Long (2015). The third category is a mix of the two preceding types (e.g., Knowles, 2017; Mintzes et al., 2013).

  (d) Collaborative participation

    According to collaborative learning theory, group learning helps teachers develop higher-level thinking and communication skills (Shabani et al., 2010). This theory is based on Vygotsky’s Zone of Proximal Development (1978). We created two categories for this moderator: teachers who participated in collaborative activities versus no collaborative participation in PD. Collaborative participation was identified based on Huang et al.’s (2022) and Sancar et al.’s (2021) descriptions of collaborating with peers, such as lesson study with small-group learning or discussion, co-teaching, peer coaching, peer tutoring by sharing experiences, and peer communication in the research community.

  (e) Coherence

    This study coded the timing of PD delivery in the included studies as the indicator of the coherence feature. Following the definitions of Croft et al. (2010), four categories were identified: job/classroom embedded, summer institute, separate training sessions after school or during the semester, and blended (either job-embedded plus summer institute, or summer institute plus follow-up training during the semester).

  (f) PD contribution to sources of efficacy

    Following the definitions of the sources of teacher efficacy in the literature (Bandura, 1997; Hill et al., 2004; Hoi et al., 2017; Ross & Bruce, 2007), we coded each PD program’s contribution to the sources of efficacy based on its design and task types. Five categories were recorded: mastery experience, vicarious experience, social persuasion, emotional arousal, and mixed (if the PD contributed to more than one source).
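To make the coding of the duration, STEM-focus, collaboration, and coherence moderators concrete, the sketch below computes total contact hours as intensity × frequency and turns one coded program into 0/1 indicators of the kind a meta-regression would use. The field names, the short category labels, and the choice of "job_embedded" as the reference level are hypothetical. They are not taken from the study’s coding sheet, and the active learning and sources-of-efficacy moderators are omitted for brevity.

```python
from dataclasses import dataclass

# Hypothetical labels for the four coherence (timing) categories.
TIMING_LEVELS = ("job_embedded", "summer_institute", "separate_sessions", "blended")
REFERENCE_LEVEL = "job_embedded"  # assumed reference category for dummy coding


@dataclass
class PDProgram:
    hours_per_dose: float  # PD intensity
    n_doses: int           # PD frequency
    stem_pedagogy: bool    # received any STEM pedagogy training
    collaborative: bool    # included collaborative participation
    timing: str            # one of TIMING_LEVELS


def total_contact_hours(program):
    """Total contact hours = intensity (hours per dose) x frequency (number of doses)."""
    return program.hours_per_dose * program.n_doses


def dummy_code(program):
    """Turn one coded PD program into numeric predictors for meta-regression."""
    if program.timing not in TIMING_LEVELS:
        raise ValueError(f"unknown timing category: {program.timing}")
    row = {
        "contact_hours": total_contact_hours(program),
        "stem_pedagogy": int(program.stem_pedagogy),
        "collaborative": int(program.collaborative),
    }
    # One 0/1 indicator per non-reference timing category.
    for level in TIMING_LEVELS:
        if level != REFERENCE_LEVEL:
            row[f"timing_{level}"] = int(program.timing == level)
    return row


example = PDProgram(hours_per_dose=3, n_doses=6, stem_pedagogy=True,
                    collaborative=False, timing="summer_institute")
print(dummy_code(example))
```

A four-level categorical moderator needs only three indicators. Omitting the reference level avoids perfect collinearity with the model intercept.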

Effect size calculation

Morris’s (2008) procedures were used to calculate an unconditional effect size (Hedges’ g) for the effect of PD on STEM-TSE in each study (see formulas in Appendix B). These procedures allow treatment and control groups to be compared while controlling for possible differences in pre-training conditions. A meta-analysis of Hedges’ g computed from pre-training scores showed no statistically significant baseline difference between treatment and control groups before the PD training (g_pre = −0.13, 95% CI [−0.27, 0.01], p = 0.06).
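Appendix B is not reproduced in this section. The sketch below therefore assumes the pretest-posttest-control estimator that Morris (2008) labels d_ppc2. This estimator divides the difference in mean pre-post gains by the pooled pretest standard deviation and multiplies the result by the small-sample correction c_p = 1 − 3/(4(n_T + n_C − 2) − 1). The input values are hypothetical, and the estimator’s sampling variance, which also depends on the pre-post correlation, is not shown.

```python
import math


def morris_dppc2(m_pre_t, m_post_t, sd_pre_t, n_t,
                 m_pre_c, m_post_c, sd_pre_c, n_c):
    """Pretest-posttest-control effect size (d_ppc2 in Morris, 2008).

    Difference in mean gains (treatment minus control), divided by the
    pooled pretest SD and multiplied by a small-sample bias correction.
    """
    df = n_t + n_c - 2
    sd_pre_pooled = math.sqrt(
        ((n_t - 1) * sd_pre_t ** 2 + (n_c - 1) * sd_pre_c ** 2) / df
    )
    c_p = 1 - 3 / (4 * df - 1)
    gain_difference = (m_post_t - m_pre_t) - (m_post_c - m_pre_c)
    return c_p * gain_difference / sd_pre_pooled


# Hypothetical TSE scale means: treatment gains 0.6 points, control gains 0.1.
g = morris_dppc2(3.0, 3.6, 0.5, 14, 3.0, 3.1, 0.5, 14)
print(round(g, 3))
```

Using the pooled pretest SD as the standardizer keeps the metric unaffected by any change in variability caused by the treatment itself. Morris (2008) gives this as a reason for preferring this variant.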

Data analysis

Meta-analysis rests on several underlying assumptions, one of which is that the combined effect sizes are statistically independent. However, a large portion of the included studies reported multiple data cases, yielding more than one effect size per study. Effect sizes from the same study may not be statistically independent, because a single experimental manipulation may be assessed with several related dependent variables. Therefore, a multilevel random-effects model was used to account for this dependency. Heterogeneity in the observed study-to-study variation was tested using the Q statistic.
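As a simplified illustration of the heterogeneity test, the sketch below computes Cochran’s Q for a single-level model. Q is the inverse-variance-weighted sum of squared deviations of each effect size from the weighted mean. Under homogeneity, it is referred to a chi-square distribution with k − 1 degrees of freedom. The multilevel model described above generalizes this by using the full sampling covariance matrix, so the sketch is not the computation metafor performs for that model. The effect sizes are made up.

```python
def cochran_q(effects, variances):
    """Cochran's Q: weighted squared deviations from the inverse-variance mean.

    Returns (Q, degrees of freedom, weighted mean effect).
    """
    weights = [1 / v for v in variances]
    mean = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - mean) ** 2 for w, y in zip(weights, effects))
    return q, len(effects) - 1, mean


# Hypothetical effect sizes with equal sampling variances.
q, df, mean = cochran_q([0.2, 0.5, 0.8], [0.04, 0.04, 0.04])
print(q, df, mean)
```

A Q value well above its degrees of freedom indicates more between-study variation than sampling error alone would produce. This excess variation is what motivates the random-effects model and the moderator analyses.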

Specifically, the meta-analysis was conducted using the metafor package (version 4.0.5) in R (Viechtbauer, 2010). A meta-regression model including all moderators was fitted to minimize Type I error. Robust variance estimation (RVE) of the variance–covariance matrix, implemented with the clubSandwich package in R (version 4.0.5), was used to fit the multilevel model to the dependent data. RVE was chosen because it allows all dependent effect sizes to be included in a single regression model even when the exact dependency structure is unknown (Hedges et al., 2010; Pustejovsky & Tipton, 2022; Tanner-Smith & Tipton, 2014; Tanner-Smith et al., 2016). The correlation between effect sizes within the same study was assumed to be r = 0.6 (Pustejovsky & Tipton, 2022). In addition, to avoid multicollinearity in the full meta-regression, separate models were run for each moderator as a robustness check on the heterogeneity results.
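The assumed within-study correlation of r = 0.6 enters the analysis through a working sampling covariance matrix. Effect sizes from the same study receive covariance r·√(v_i·v_j), and effect sizes from different studies are treated as independent. In R, clubSandwich’s `impute_covariance_matrix()` serves this purpose. The pure-Python sketch below only illustrates the idea and is not the study’s code. It builds the matrix but does not fit the multilevel model or compute the cluster-robust (sandwich) standard errors. The variances and study IDs in the example are hypothetical.

```python
import math


def constant_corr_vcov(variances, cluster_ids, r=0.6):
    """Working sampling covariance matrix with a constant within-study correlation.

    Diagonal: each effect size's sampling variance. Off-diagonal: r * sqrt(v_i * v_j)
    if both effect sizes come from the same study (cluster), otherwise 0.
    """
    k = len(variances)
    V = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            if i == j:
                V[i][j] = variances[i]
            elif cluster_ids[i] == cluster_ids[j]:
                V[i][j] = r * math.sqrt(variances[i] * variances[j])
    return V


# Hypothetical example: two effect sizes from study "A", one from study "B".
V = constant_corr_vcov([0.04, 0.09, 0.05], ["A", "A", "B"], r=0.6)
for row in V:
    print(row)
```

The resulting matrix is block diagonal by study. RVE then guards against misspecification of r, which is why an approximate value such as 0.6 is usually considered acceptable.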

Power analysis

A power analysis indicated that 48 effect sizes were adequate for detecting the PD effect on STEM-TSE with a multilevel random-effects meta-analysis (Harrer et al., 2021). The analysis was performed in R with the power.analysis function of the dmetar package, which implements the formula of Borenstein et al. (2011). The plausible overall effect size of the treatment relative to the control, expressed as a standardized mean difference (d), was set at 0.50. The expected number of studies (k) was 48, and the mean sample size was set at n = 14 for both the treatment and the control group. The alpha level was 0.05, and between-study heterogeneity was set to "moderate." The estimated power was 100%, indicating an adequate sample for this meta-analysis. As a second check, the power analysis was repeated using the number of included articles (k = 21); this yielded a power of 99.61%, well above the conventional 80% threshold.
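The power calculation can be sketched from its inputs. Under the approach of Borenstein et al. (2011), the variance of a single study’s d is (n₁+n₂)/(n₁n₂) + d²/(2(n₁+n₂)). Dividing by k gives the variance of the pooled mean, which is then inflated for between-study heterogeneity, and power follows from a two-tailed z-test. The heterogeneity multipliers below (1.33, 1.67, and 2.0 for "low," "moderate," and "high") are our assumption about dmetar’s internal convention and are not stated in the text. With them, the sketch gives roughly 1.00 for k = 48 and roughly 0.996 for k = 21, consistent with the values reported above.

```python
from statistics import NormalDist

# Assumed variance inflation for between-study heterogeneity
# (our reading of dmetar's power.analysis convention, not stated in the text).
HETEROGENEITY_FACTOR = {"fixed": 1.0, "low": 1.33, "moderate": 1.67, "high": 2.0}


def meta_power(d, k, n1, n2, alpha=0.05, heterogeneity="moderate"):
    """Approximate two-tailed power to detect a pooled SMD of d with k studies."""
    v_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))  # variance of one study's d
    v_mean = HETEROGENEITY_FACTOR[heterogeneity] * v_d / k   # variance of the pooled mean
    lam = d / v_mean ** 0.5                                  # noncentrality of the z-test
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    phi = NormalDist().cdf
    return 1 - phi(z_crit - lam) + phi(-z_crit - lam)


# Inputs reported in the text: d = 0.50, n = 14 per group, alpha = 0.05.
print(round(meta_power(0.5, 48, 14, 14), 4))
print(round(meta_power(0.5, 21, 14, 14), 4))
```

Because power grows with k through the variance of the pooled mean, the check with only 21 articles is the more conservative of the two.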

Publication bias

Publication bias can skew meta-analytic results and may warrant considerable doubt about meta-analyses that report positive findings (Mathur & VanderWeele, 2020). Publication bias was examined with a funnel plot (Duval & Tweedie, 2000) and Egger’s regression test (Egger et al., 1997). Additionally, sensitivity analyses were conducted using leave-one-out analysis and Cook’s distance, because Cook’s distance combines information about a study’s leverage and fit in the meta-analysis (Viechtbauer & Cheung, 2010). The metafor package in R was used for the publication bias and sensitivity checks (Viechtbauer, 2010).
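The core of Egger’s test and of a leave-one-out check can be sketched in a few lines. In the classic formulation of Egger’s test, the standardized effect (y/se) is regressed on precision (1/se) by ordinary least squares, and an intercept far from zero suggests funnel-plot asymmetry. The leave-one-out function recomputes an inverse-variance mean with each effect size omitted in turn. Both are simplified, single-level illustrations with made-up data. They do not compute the intercept’s standard error or p-value, and they are not the multilevel computations performed with metafor (whose `regtest()` uses a related but not identical model).

```python
def egger_regression(effects, std_errors):
    """Classic Egger test: OLS of y/se on 1/se. Returns (intercept, slope)."""
    x = [1 / se for se in std_errors]                      # precision
    z = [y / se for y, se in zip(effects, std_errors)]     # standardized effect
    n = len(x)
    x_bar, z_bar = sum(x) / n, sum(z) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    slope = sxz / sxx
    intercept = z_bar - slope * x_bar
    return intercept, slope


def leave_one_out_means(effects, variances):
    """Inverse-variance (fixed-effect) mean recomputed with each effect size omitted."""
    out = []
    for i in range(len(effects)):
        w = [1 / v for j, v in enumerate(variances) if j != i]
        y = [e for j, e in enumerate(effects) if j != i]
        out.append(sum(wi * yi for wi, yi in zip(w, y)) / sum(w))
    return out


# Hypothetical, perfectly symmetric data: every study estimates 0.5.
print(egger_regression([0.5, 0.5, 0.5, 0.5], [0.1, 0.2, 0.25, 0.5]))
print(leave_one_out_means([0.2, 0.5, 0.8], [0.04, 0.04, 0.04]))
```

When small studies report systematically larger effects, their standardized effects exceed what the line through the origin predicts, which pushes the intercept away from zero.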

Results

Summary of studies

In total, 48 effect sizes were extracted from 21 articles. Data from a total of 1412 teachers were included in the analysis. The sample size varied from 13 to 166 STEM educators at the end of each PD. Of the included articles, about 62% were published (n = 13) and 38% were unpublished (n = 8; i.e., conference papers and dissertations), with publication years ranging from 1997 to 2022. The majority of the articles (n = 14, 67%) reported studies conducted in the United States, and the remaining articles (n = 7, 33%) were from China, Canada, Lebanon, Germany, Greece, and the Netherlands.

In terms of the methodological features of the included studies (n = 48), half of them (n = 24) had a randomized controlled trial design, and the other half (n = 24) had a quasi-experimental design with non-random assignment of participants. Regarding the grade level of the participating STEM teachers, 41% of studies involved primary-level teachers, 40% involved only secondary-level teachers, and 19% involved both primary and secondary levels. The sample size and study characteristics are presented in Table 1. A more detailed description of each included study is provided in Additional File 3.

Table 1 Sample sizes and study characteristics

The descriptive statistics for specific characteristics of the PD training programs are presented in Table 2. Regarding STEM-focused pedagogy training, 29% of studies introduced teachers to inquiry-based learning pedagogy (n = 14), 23% included instruction in integrated STEM learning pedagogy (n = 11), and 19% covered task/problem-based learning (n = 9). Another 13% of studies covered general STEM content and pedagogy without these specific learning approaches (n = 6). Furthermore, 16% of studies did not focus on training teachers' content and pedagogical knowledge and therefore included no related instruction (n = 8). Regarding sustained PD duration, the mean and median total contact hours of the PD programs were 25.86 and 18.00 h, respectively (n = 31).

Table 2 PD characteristics of the included studies (n = 48)

For active learning, coded by PD instructional design, 33% of studies were coded as constructivist-oriented (n = 16), 40% as cognitive apprenticeship-based (n = 19), and 27% as adopting a mixed PD instructional design (n = 13). Moreover, 69% of studies reported that teachers participated collaboratively in the PD training (n = 33), and the other 31% did not (n = 15). For coherence, coded by PD providing timing, 13% of studies delivered PD as job-embedded training (n = 6), 29% offered PD during the summer as summer institutes (n = 14), 31% provided separate PD training during the semester (n = 15), and 19% used a blended schedule combining job-embedded training with a summer institute, or a summer institute with follow-up training during the semester (n = 9). Four cases did not clearly report their PD providing timing.

Regarding PD contribution to the sources of teacher efficacy, in 21% of studies the PD program contributed only to mastery experience (n = 10), in 6% only to social persuasion (n = 3), in 19% only to vicarious experience (n = 9), and in 4% only to emotional arousal (n = 2). The remaining 50% of PD programs contributed to multiple sources of self-efficacy (n = 24).

Research question 1: main effect of overall unconditional estimate

To address our first research question, the effect of PD on K-12 STEM teachers’ self-efficacy was estimated with an unconditional model without moderator variables (Table 3). The overall summary estimate across the 48 standardized mean difference (SMD) effect sizes was 0.64 (95% CI [0.20, 1.08], p = 0.0045). Heterogeneity was high and statistically significant (I² = 94.99%, Q(47) = 362.381, p < 0.0001; Higgins et al., 2003). Thus, compared with the control group, STEM teachers’ PD training on average had a statistically significant, medium-sized effect on STEM teachers' self-efficacy (Cohen, 1977).
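Plugging the reported Q(47) = 362.381 into the Q-based formula I² = (Q − df)/Q of Higgins et al. (2003) gives about 87%, not 94.99%. The reported value was therefore presumably computed from the fitted model's variance components, but the text does not say which definition was used. The sketch below contrasts the Q-based formula with one multilevel generalization, Σσ² / (Σσ² + v̄), where v̄ is the "typical" within-study sampling variance. Both function names are hypothetical, and this is not necessarily the authors' computation.

```python
def typical_sampling_variance(variances):
    """'Typical' within-study sampling variance (Higgins & Thompson), weights w_i = 1 / v_i."""
    w = [1.0 / v for v in variances]
    k = len(w)
    sum_w = sum(w)
    return (k - 1) * sum_w / (sum_w ** 2 - sum(wi ** 2 for wi in w))


def multilevel_i2(sigma2_components, variances):
    """Proportion of total variance due to heterogeneity, summed over all
    variance components (e.g., between- and within-study) of a multilevel model."""
    heterogeneity = sum(sigma2_components)
    return heterogeneity / (heterogeneity + typical_sampling_variance(variances))


def q_based_i2(q, df):
    """I^2 from Cochran's Q (Higgins et al., 2003), truncated at zero."""
    return max(0.0, (q - df) / q)


# Q-based I^2 from the reported Q(47) = 362.381
print(round(q_based_i2(362.381, 47), 4))  # -> 0.8703
```

Both quantities describe "how much of the variability is not sampling error". They diverge in multilevel models because there Q no longer maps one-to-one onto the variance components.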

Table 3 Unconditional model

Sensitivity analysis and publication bias

The results showed no evidence of significant publication bias in the current meta-analysis. The funnel plot (Fig. 3) was approximately symmetric, suggesting little evidence of publication bias. Egger’s regression test (Egger et al., 1997), used as a supplementary test, was non-significant (t = 0.72, p = 0.47), further indicating no statistically significant publication bias. The positive effect of PD on TSE was not driven by any single study, because it was replicated in all leave-one-out sensitivity analyses ([0.4922, 0.6517]; ps < 0.0001). Cook's distance identified one study as a potentially influential case (Cook's d = 0.22). The random-effects model was therefore run twice, once with and once without that study. Removing the study changed I² by only 1.17%, indicating a minimal decrease in between-study variation. Given these results, the study was retained in the analyses.
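The leave-one-out procedure re-estimates the pooled effect k times, dropping one effect size each time, and checks that the conclusion survives every omission. To keep the example self-contained, the sketch below swaps in a simpler univariate DerSimonian-Laird random-effects estimator instead of the authors' multilevel RVE model. The function names and test data are hypothetical.

```python
def dersimonian_laird(effects, variances):
    """Random-effects pooled estimate using the DerSimonian-Laird tau^2."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)  # truncated at zero
    w_star = [1.0 / (v + tau2) for v in variances]
    return sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)


def leave_one_out(effects, variances):
    """Pooled estimate recomputed with each effect size omitted in turn."""
    estimates = []
    for i in range(len(effects)):
        ys = effects[:i] + effects[i + 1:]
        vs = variances[:i] + variances[i + 1:]
        estimates.append(dersimonian_laird(ys, vs))
    return estimates
```

Reporting the minimum and maximum of the returned list gives a range like the one quoted above. If every estimate keeps the same sign and significance, no single study drives the result.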

Fig. 3 Funnel plot

Research question 2: moderation effect of STEM-focused pedagogy learning

A meta-regression model was fitted to examine the moderation effect of STEM-focused pedagogy learning (see Model 1 in Table 4). We hypothesized that teachers who received STEM-focused pedagogical knowledge, such as inquiry-based, problem-based, and/or task-based learning strategies, would show a larger improvement in self-efficacy than teachers who did not receive STEM pedagogical knowledge from the PD training. However, the adjusted moderation effect of STEM pedagogy learning was negative and statistically significant (β = −1.46, 95% CI [−2.55, −0.37], p < 0.01). This indicates that PD with STEM-focused pedagogy learning was less effective in promoting TSE than PD without it.

Table 4 Effect of STEM pedagogy learning and its interaction with PD duration

Subsequent analysis

The effect of STEM-focused pedagogy learning was negative relative to teachers who did not learn STEM pedagogy. This unexpected finding prompted us to examine whether the impact of STEM-focused pedagogy learning interacted with PD duration or dosage. The literature shows that PD duration significantly affects improvement in TSE (Liu & Liao, 2019; Yang, 2020). Furthermore, PD duration is an important factor in strengthening specific knowledge, because longer PD provides more active learning opportunities and content-focused instructional activities than shorter PD (Desimone, 2009; Postholm, 2012). Because STEM-focused pedagogy is the foundation of instructional content in the STEM field, we ran a second model to estimate the interaction effect of STEM-focused pedagogy learning and PD duration on TSE (see Model 2 in Table 4).

The results indicated that the effect of PD duration on TSE was significantly larger for teachers who received STEM-focused pedagogy learning than for teachers who did not (β = 0.24, k = 27, p < 0.0001). As Fig. 4 shows, for teachers who did not receive STEM-focused pedagogy training in PD, the predicted effect on TSE decreased as PD duration increased. For teachers who received STEM pedagogy learning, it increased.
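To show how an interaction model like Model 2 is specified and read, the sketch below fits effect = b0 + b1·pedagogy + b2·hours + b3·pedagogy·hours by inverse-variance weighted least squares. This is a simplification. It has no random effects and no RVE, unlike the authors' multilevel model, and all data are hypothetical. Assuming duration entered as a continuous predictor, the reported interaction β = 0.24 plays the role of b3. The duration slope is b2 for the no-pedagogy group and b2 + b3 for the pedagogy group, which is what the two lines in Fig. 4 depict.

```python
def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def interaction_wls(effects, variances, pedagogy, hours):
    """Inverse-variance WLS fit of effect = b0 + b1*P + b2*H + b3*P*H; returns [b0, b1, b2, b3]."""
    rows = [[1.0, p, h, p * h] for p, h in zip(pedagogy, hours)]
    w = [1.0 / v for v in variances]
    n_params = 4
    # Normal equations: (X'WX) b = X'Wy
    xtwx = [[sum(wi * r[i] * r[j] for wi, r in zip(w, rows)) for j in range(n_params)]
            for i in range(n_params)]
    xtwy = [sum(wi * r[i] * y for wi, r, y in zip(w, rows, effects)) for i in range(n_params)]
    return solve(xtwx, xtwy)


def duration_slope(coefs, pedagogy):
    """Simple slope of PD hours for the no-pedagogy (0) or pedagogy (1) group."""
    return coefs[2] + coefs[3] * pedagogy
```

A positive b3 combined with a negative b2 reproduces the crossing pattern described above: longer PD helps only when it includes STEM pedagogy.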

Fig. 4 Interaction effect of STEM pedagogy learning and PD duration

Research question 3: moderation effect of PD substantive features

Meta-regression models were fitted to examine differences in the average effect across levels of each moderator variable. The summary estimates for each moderator variable are shown in Table 5.

Table 5 Meta-regression model for substantive PD features

Regarding active learning by PD instructional design, the cognitive apprenticeship-based group, which uses demonstration-based learning with reflection and feedback, showed a smaller effect than the other approaches, although the difference was not statistically significant (β = −0.94, p > 0.05). When the moderation effect of PD providing timing was examined, the reference group, PD provided as job-embedded training, was more effective than the other types of timing. All other timings were significantly less effective in improving teachers' self-efficacy [i.e., summer institutes (β = −2.36, p < 0.0001), separate training during the semester/after school (β = −2.17, p < 0.0001), and mixed timing (β = −2.22, p < 0.01)].
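The timing coefficients are contrasts against the job-embedded reference group. This follows from treatment (dummy) coding, sketched below under the assumption that timing entered the model as one 0/1 column per non-reference category. The labels and function name are hypothetical. Under this coding, β = −2.36 for summer institutes means a predicted effect size 2.36 SMD units lower than for job-embedded PD, holding other terms constant.

```python
def dummy_code(values, reference):
    """Treatment coding: one 0/1 column per non-reference level.
    Rows at the reference level are all zeros, so each coefficient
    is a difference from the reference category."""
    levels = sorted(set(values) - {reference})
    columns = {level: [1 if v == level else 0 for v in values] for level in levels}
    return levels, columns


# Hypothetical timing labels for five effect sizes
timing = ["job-embedded", "summer institute", "semester", "mixed", "job-embedded"]
levels, cols = dummy_code(timing, reference="job-embedded")
# levels -> ['mixed', 'semester', 'summer institute']
```

Choosing a different reference level changes the individual coefficients but not the fitted values. The significance tests above therefore answer the specific question "is this timing worse than job-embedded PD?".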

In addition, collaborative participation and PD contribution to sources of efficacy did not show statistically significant moderation effects. PD that included collaborative participation did not improve teachers’ teaching efficacy significantly more than PD without it. The sources of efficacy targeted by PD also did not differ significantly in predicting teachers’ self-efficacy.

We are, however, cautious in interpreting these results because of the relatively small sample size. Fitting a single meta-regression model that includes all moderators guards against an inflated Type I error rate. We nonetheless also added each moderator to the model separately to address concerns about multicollinearity (see Appendix C). The results in Appendix C are similar to those in Table 5 and support them.

Discussion

This meta-analysis supports the significant contribution of PD as a source of teachers’ self-efficacy. It provides important information about the components of PD that are effective in improving the self-efficacy of K-12 STEM teachers. To the best of our knowledge, this is the first study to examine the effect of experimentally designed PD on STEM teacher self-efficacy. Teacher PD had a medium effect (g = 0.64) on STEM teachers’ self-efficacy in classroom teaching (Cohen, 1977). This confirms findings from previous empirical studies (e.g., Aaron Price & Chiu, 2018; Mintzes et al., 2013).

Importance of STEM-focused pedagogy training

The current meta-regression analysis quantitatively showed that learning STEM pedagogy contributes positively to STEM teachers' overall self-efficacy when PD duration is taken into account. By contrast, a previous review (Seneviratne et al., 2019) qualitatively synthesized mixed findings on the effect of PD on science teachers’ efficacy specifically in inquiry-based pedagogy. In the current meta-analysis, the predicted effect on TSE increased with PD duration for teachers who received STEM pedagogy training and decreased for teachers who did not. In other words, when PD did not contain a pedagogy component, longer training was associated with a smaller effect. This implies that when STEM teachers are given adequate contact time to learn STEM content and pedagogical knowledge, their self-efficacy is more likely to reach an optimal level than that of teachers who receive no STEM pedagogy training in PD.

This result highlights the importance of sustained, content-focused STEM teaching pedagogy in STEM teacher PD programs. The current STEM teaching workforce remains uneven in preparation and qualifications across countries (Athanasia & Cota, 2022; National Science Board, 2022). Research has shown that a considerable number of STEM teachers enter the profession through alternative certification programs rather than formal teacher education programs (Rotermund & Burke, 2021). As a result, these teachers often lack student teaching experience and formal training in teaching pedagogy. They are also more often placed in schools with high rates of minority enrollment (NSB, 2016; Rotermund & Burke, 2021). Thus, many STEM teachers may become aware of these challenges only after they begin teaching, which increases the need for training in STEM pedagogy.

Moreover, this study suggests that considering sustained PD duration, rather than the impact of pedagogy learning alone, may be more appropriate for predicting the effects of PD training. STEM teacher PD programs should recognize that the effect of learning pedagogy on TSE may vary with the contact time of training. According to Fig. 4, the interaction between PD duration and STEM-focused pedagogy learning showed an increasing trend in TSE, but the growth curve is gradual. This relatively slow growth implies that STEM pedagogy learning improves TSE over an extended period of time. That is, STEM-TSE cannot be expected to improve dramatically after a one-shot or short period of STEM pedagogy learning. Rather, improvement takes time, effort, and patience. Policymakers and teacher PD program developers should therefore attend to the long-term effect of learning STEM pedagogy and invest more effort in providing ongoing, consistent PD training that develops teachers’ STEM pedagogy. Future research may investigate the optimal duration and intensity of STEM instructional strategy training.

Importance of multi-faceted PD design

Previous studies indicated that, among the four sources of efficacy, mastery experience supported by teacher PD has the strongest impact on TSE (Tschannen-Moran & McMaster, 2009). However, our results showed no significant difference among the sources of efficacy to which PD contributed when predicting STEM-TSE. This finding has implications for future STEM PD programs. Rather than focusing only on developing teachers' mastery experience in teaching STEM subjects, effective PD design should balance support for mastery experience with support for the other sources of efficacy in order to maximize the proximal effect on TSE.

More specifically, the moderation effects of collaborative participation and PD instructional design further support the importance of contributing to the sources of TSE through multi-faceted PD designs. In the present meta-analysis, the effects on TSE did not differ by whether teachers participated in collaborative activities in PD. Among the included studies, collaborative participation was mostly designed to support verbal persuasion, for example through small-group discussions and peer sharing and feedback (e.g., Aaron Price & Chiu, 2018; Kaschalk-Woods et al., 2021). When embedding collaborative learning opportunities in STEM PD training, training outcomes may improve if programs go beyond supporting verbal persuasion through peer feedback. Programs could add higher-order thinking activities such as argument-based inquiry, and hands-on collaborative activities such as group STEM experiments, to support affective states and mastery experience. Additionally, receiving expert feedback on one's practice (e.g., Nadelson et al., 2013) and collaborating with the higher education research community on STEM project teaching and learning (e.g., McCollough et al., 2016) might support a multi-faceted PD design by contributing to multiple sources of efficacy.

Further, regarding the active learning reflected by PD instructional design, this study found no statistically significant differences among constructivist-oriented PD, cognitive apprenticeship-based PD, and a mixed approach combining both designs. Therefore, to maximize PD effects, policymakers and program developers designing future PD programs should consider integrating mastery experience with verbal persuasion, vicarious experience, and affective states. Vicarious experience can be afforded by encouraging STEM teachers to observe exemplary teachers' modeling and to share their thoughts. Beyond this, it might be more effective to foster teachers’ cognitive and metacognitive development by scaffolding authentic practice of STEM pedagogy, and by having teachers design tasks and lessons for their own classes that follow the characteristics of their STEM subject.

Therefore, multi-faceted PD designs might be more effective in increasing STEM teachers’ overall self-efficacy in classroom teaching. Such designs would combine apprenticeship experiences to support vicarious experience, constructivist-oriented instruction to support mastery experience through learning by doing, and collaborative participation to enhance verbal persuasion. They would also create a comfortable training environment to support emotional arousal (e.g., expert on-site support in classroom teaching, a choice of training format, or encouragement and assistance throughout the PD).

PD providing timing matters

This study found that job-embedded PD training tailored to teachers' classroom instruction had the largest effect among all types of PD providing timing, our indicator of the coherence PD feature. In line with previous empirical research, ongoing job-embedded PD training could maximize the effect on TSE (Althauser, 2015; Croft et al., 2010). Our findings suggest that job-embedded PD connects PD content with STEM teachers' classroom teaching while also providing agile support adapted to teachers' needs.

Conclusion and future directions

This meta-analysis supports the considerable role that PD plays in TSE and sheds light on key PD elements that can help boost STEM-TSE. As far as we are aware, this is the first study to specifically examine the overall impact of experimentally designed PD on STEM-TSE. The findings offer school personnel and policymakers insights for future STEM teacher PD programs. The significant positive interaction between PD duration and STEM pedagogy learning on TSE confirms the importance of ongoing STEM pedagogy learning for boosting TSE at a proximal level. Regarding other PD features, the findings suggest that future PD programs might raise TSE more effectively by integrating all sources of efficacy rather than focusing on developing teachers' mastery experience. Such programs would enable teachers to plan lessons that follow the characteristics of their STEM subject, let them watch exemplary teachers teach, closely link PD content to teachers' everyday instruction, encourage teachers’ reflection and feedback, and create a comfortable training environment. Additionally, ongoing job-embedded PD timing matters for enhancing STEM-TSE.

Although the scope and rigor of this meta-analysis permit considerable confidence in the results, several limits to the generalizability of the findings should be noted. First, the sample size is relatively small, so the results should be interpreted with caution. Second, although the analysis includes research from seven different countries, it was limited to studies published in English, which may have affected our estimates. However, excluding non-English articles is unlikely to substantially alter the overall meta-analytic conclusions (Morrison et al., 2012; Nussbaumer-Streit et al., 2020). Moreover, meta-analyses, like individual studies, are not exempt from coder/rater bias, which threatens internal validity. To minimize this threat, we assessed inter-rater agreement with Cohen’s kappa.
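Cohen's kappa corrects the raw proportion of coder agreement for the agreement expected by chance, κ = (p_o − p_e) / (1 − p_e). The minimal sketch below computes it for two coders assigning nominal codes, such as PD instructional design categories, to the same effect sizes. The function name and the example codes are hypothetical. The formula is undefined when p_e = 1, that is, when both coders use a single identical category throughout.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders assigning nominal codes to the same items."""
    n = len(coder_a)
    # Observed agreement: share of items coded identically
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement from each coder's marginal category frequencies
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical codes for four effect sizes
a = ["constructivist", "constructivist", "apprenticeship", "apprenticeship"]
b = ["constructivist", "apprenticeship", "apprenticeship", "apprenticeship"]
print(cohens_kappa(a, b))  # -> 0.5
```

Here the coders agree on 75% of items. Chance alone would produce 50% agreement, so κ = 0.5 credits only the agreement beyond chance.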

For researchers, studies investigating the effective duration or dosage of STEM instructional strategy training for improving TSE would help identify best practices for introducing STEM pedagogy to teachers. Further, the impact of in-service PD training on the self-efficacy of STEM teachers certified through alternative programs is still rarely investigated in the literature. Future research may seek best practices for developing the knowledge and skills of this group of teachers. Meanwhile, this study took a rigorous approach to investigating the effect of PD on STEM-TSE by including only studies with a pre/post, control-group design. The final 21 eligible studies were screened from an initial pool of 6,365. This indicates that much of the experimental research in the STEM teacher PD field lacks a control group. A control group protects the internal validity of a study and provides a benchmark against which the results of the experimental group can be compared (Campbell & Stanley, 1963; Guetterman et al., 2018; Shadish et al., 2002). Therefore, future research should include comparison groups when studying teacher PD in STEM education.