1 Introduction: need to disentangle online PD courses

Online teacher professional development (PD) courses have increasingly been developed and investigated since the 2010s (Aldon et al. 2019; Borba and Llinares 2012) for developing teachers’ content knowledge (CK), general pedagogical knowledge (GPK), and pedagogical content knowledge (PCK, which is the focus of this paper), with a peak during the pandemic closures in 2020–2022 (Engelbrecht et al. 2020).

However, efficacy studies on PD online courses have been limited (Hennessy et al. 2022; Koellner et al. 2023), in particular in Germany (Lipowsky and Rzejak 2021), the country of the current study. In a recent survey, Capparozza et al. (2023) found only 16 studies investigating effects of online PD across all subjects; only three of them investigated the effects of the online PD course on changes in teachers’ knowledge (rather than emotional reactions to the PD or changes in teachers’ orientations). As most of these studies holistically investigated the effects of complete courses (Capparozza et al. 2023; Means et al. 2013), a persistent need was articulated for conceptually disentangling the underlying elements and principles of instructional design and for separately studying their effects (Aldon et al. 2019; Asterhan and Lefstein 2024; Desimone and Pak 2017; Koellner et al. 2023).

In our study, we focus on PD sessions that consist of two phases: In the inquiry phase, learners work on problems with limited guidance to become aware of the specific hurdles. In the systematization phase, the discovered knowledge is consolidated (Kapur 2010).

The goal of the focused randomized controlled efficacy trial presented in this paper was to study the effects of particular PD design elements in the systematization phase of an online PD session. The design elements in our PD programme were PCK videos (summarizing PCK elements treated in the inquiry phase; see Seago et al. 2018) compared to group discussions, an often-discussed design element to be maintained also for online PDs (Aldon et al. 2019).

In the theory section of this paper, we explain our study focus by embedding it into the state of research on two-phase instructional approaches and the design of online PD courses and derive a hypothesis of differential effects. In the methods section, we describe how we used the online PD sessions as research format for a focused randomized controlled trial. The results section reports on the findings of the study, and the conclusion discusses the differential effects of the design elements in light of other studies and derives affordances and challenges of conducting online PD sessions.

2 State of research on systematizing videos and discussions in online PD courses

2.1 Two-phase instructional designs: inquiry and systematization in PD courses

Two-phase instructional approaches (in which a first inquiry phase is followed by a systematization phase) have been identified as effective in instructional psychology for students and adult learning for various content areas (Kapur 2010). The research reviewed by Loibl et al. (2017) revealed that in the inquiry phase, the learners’ prior knowledge should be activated while working on problems, even if they are not completely solved. After these inquiries, the systematization phase has been outlined as essential to organizing the partially constructed knowledge and relating it systematically to the regular knowledge in the content area, with explicit learning opportunities to fill knowledge gaps and prioritize certain ideas, so that possible partial failure in the inquiry phase becomes productive. In classroom studies on two-phase approaches, students’ prior knowledge was shown to have a differential impact on their learning from the systematization phase, in which students holding the targeted misconceptions profited more (Loibl and Leuders 2019).

For teachers’ learning of PCK, active learning has also been identified as critical: There is strong empirical evidence for initiating teachers’ inquiries into specific knowledge elements and content-related teaching practices being a key design principle in effective PD courses (Garet et al. 2001; Timperley et al. 2007; Yoon et al. 2007). However, the ways in which constructed knowledge and reflected teaching practices are systematized in PDs and connected to the targeted PCK elements have received only limited research attention so far (hardly explicated in the research overviews mentioned above), although systematization has been identified as critical in qualitative studies (e.g., Borko et al. 2014; Lesseig et al. 2017). We therefore chose to study the under-researched systematization phase with respect to differential effects.

2.2 Design principles and design elements in PD courses: systematizing videos and group discussions

A domain-overarching meta-analysis for various target groups (Means et al. 2013) and a conceptual review of 362 papers (Bozkurt et al. 2017) showed that online courses are not per se more or less effective than face-to-face learning, as their effects depend on the implemented principles and elements of design.

Online PD courses

Research reviews focusing on online PD courses for teachers revealed that design principles identified for effective face-to-face PD also apply to online PD courses: long duration, content focus, active learning, interaction between peers, and proximity to relevant teaching practices (Aldon et al. 2019; Capparozza et al. 2023; Engelbrecht et al. 2020; Quinn et al. 2019). Online PD courses have additional affordances in their higher flexibility and accessibility for teachers with difficulties in reaching face-to-face PD courses (Hennessy et al. 2022; Koellner et al. 2023). These affordances were key to the quick increase of online PD courses during the pandemic closures (Engelbrecht et al. 2020) that also reached the German context. However, very few studies have investigated online PD courses in the German context (Capparozza et al. 2023; Lipowsky and Rzejak 2021), and as for all massive open online courses across subjects (Bozkurt et al. 2017), there is a need to investigate in more depth the effects of particular design elements (Aldon et al. 2019; Asterhan and Lefstein 2024; Capparozza et al. 2023; Hennessy et al. 2022; Koellner et al. 2023).

Classroom videos

documenting typical teaching practices count as effective design elements for teachers’ inquiries aiming at developing teachers’ knowledge, as their joint analysis and discussion bear many opportunities for this (Major and Watson 2017) and for the underlying PCK aspects (Blomberg et al. 2013). Yet the depth of teachers’ learning in these video-based inquiries heavily depends on the teacher educators’ abilities to facilitate the discussions so that teachers’ contributions are focused on and explicitly connected to the targeted PCK elements (Blomberg et al. 2013; Major and Watson 2017). But not all teacher educators are equally prepared for facilitating these focused and connective discussions (Borko et al. 2014; Lesseig et al. 2017). These findings call for further investigation into how particular design elements for systematization phases, for instance, unmoderated small-group discussions, can support or partially replace teacher educators’ facilitation: “although many PD models utilize video … as the centerpiece … there is dearth of research on the design features and, in particular, the articulation of how video is intended to contribute to teacher learning” (Seago et al. 2018, p. 30).

Instructional PCK videos

(i.e., videos that provide explanations about the PCK elements in view) might be a promising design element for supporting the systematization phase in teacher PD (as used, e.g., by Seago et al. 2018). Empirical studies on other content areas have supported this design choice with their evidence that instructional videos can be effective for learning (e.g., Kulgemeyer and Peters 2016, for physics classrooms). In line with general findings that instructional explanations must not replace learners’ own cognitive engagement (Wittwer and Renkl 2008), the sequencing of video use has been studied for learning other topics, such as software programming: Review videos summarizing the content after an active learning phase were shown to have an additional effect on participants’ learning gains (van der Meij and Dunkel 2020). To emphasize the specific consolidation function of the instructional video in our two-phase PD approach, we use the term systematizing PCK video to refer to instructional videos that are implemented for systematizing the PCK elements actively addressed in the inquiry phase. We hope to transfer the instructional affordances of systematizing videos from students’ learning of science content knowledge elements to teachers’ learning of targeted PCK elements, for which no research was documented in a survey on use of videos in teacher PD (Major and Watson 2017).

Group discussions

Given the overall high relevance of teacher collaboration (e.g., in communities of inquiry; Jaworski 2006), substantial design research efforts have also been invested into initiating peer communication in online PD courses (Aldon et al. 2019; Borba and Llinares 2012), in synchronous online PD sessions (e.g., with breakout rooms with and without facilitators) and in asynchronous online PD sessions (e.g., with collaborative digital boards; see Seago et al. 2022). Collaborative settings such as group discussions provide opportunities for active learning, which is a key characteristic of effective PD (Desimone and Pak 2017; Lipowsky and Rzejak 2021). Active learning opportunities through discussions can consist of the analysis of classroom videos, student products, or participants’ individual summaries. Because these kinds of group discussions might serve to consolidate insights and knowledge, we investigated them as another promising design element.

In our study, we aimed to compare the aforementioned two options that may be able to activate learners: systematizing videos and group discussions. Both group discussions and systematizing videos can be used as follow-ups to confront learners’ individually constructed, possibly flawed or incomplete ideas with the intended ideal. Both forms of consolidation of acquired knowledge can enable a comparison with one’s own current knowledge and the option of adaptive self-correction (Kapur 2010; Loibl et al. 2017), yet little is known about whether one design element of the PD is more effective than the other and whether teachers with low prior knowledge might profit more, as has been shown for students (Loibl and Leuders 2019).

2.3 Online PD courses and two levels of evaluating success

To evaluate the success of teacher PD courses, Kirkpatrick’s (1956) four levels of evaluation have been widely used: teachers’ immediate (emotional and affective) reactions, teachers’ learning of knowledge, their behaviour (in terms of teaching practices), and results (in terms of student achievement; Lipowsky 2010). For the PD course content in this paper, earlier studies provided evidence for the efficacy on the third and fourth evaluation levels of behaviour (Prediger et al. 2023) and results in student achievement gains (Prediger et al. 2019). On this basis, the current focused controlled trial can concentrate on the more immediate evaluation levels, teachers’ reactions and learning of knowledge, because these are essential to scrutinizing the effects of particular design elements in efficacy trials (Sloane 2008).

The evaluation of teachers’ emotional and motivational reactions is meaningful for their learning, and this also applies to teacher PD courses (Kirkpatrick 1956; Korthagen 2017). Gaines et al. (2019) found that positive experiences during PD sessions were expressed through engagement, application, and reflection, but negative experiences slowed down engagement and decreased the chances that teachers would transfer the PD content to their own teaching. We selected boredom, enjoyment, and perceived usefulness as informative indicators for teachers’ reactions: Boredom provides information about the suitability of the learning pace offered and the way in which the learning content and learning activities are sequenced (Götz and Hall 2014). Enjoyment indicates the satisfaction of the participants with their achievement after engaging with the learning content (Ainley and Hidi 2014). Perceived usefulness provides information on how well a certain activity fits into future plans (Eccles and Wigfield 2020), for instance, whether the participants consider the learning content to be relevant to their teaching activities. Thus, reactions with low values of boredom, high values of enjoyment and perceived usefulness can operationalize the PD success on the first evaluation level.

For online PD courses, most evaluations have not gone further than this first evaluation level of teachers’ reactions, and they have rarely been combined with assessing learning, in other words, growth in teachers’ knowledge (Capparozza et al. 2023). The current focused controlled trial addresses this combination (the PCK content in the current study is introduced in Sect. 3.2).

2.4 Research questions

According to the state of research, online PD courses with classroom videos and systematizing PCK videos seem to bear potential for gaining positive reactions and possibly for enhancing teachers’ knowledge (Aldon et al. 2019; Capparozza et al. 2023; Engelbrecht et al. 2020; Quinn et al. 2019), but have still been under-researched: “There is not yet a large enough body of research on how to design and implement effective online PD” (Koellner et al. 2023, p. 554). The research gap is large (a) in the German context (Capparozza et al. 2023; Lipowsky and Rzejak 2021), (b) with respect to systematizing the PCK being studied, and (c) with particular focus on teachers with low prior knowledge (a group currently increasing in the German context due to teacher shortage and increasing out-of-field teaching). It was these concerns that drove the current study’s investigation of teachers’ emotional and motivational reactions and PCK growth in a video-based online PD session.

This research focus was built upon the substantial body of research on PD design principles and design elements for initiating and supporting teachers’ inquiries into teaching practices and their underlying PCK elements (Garet et al. 2001; Timperley et al. 2007; Yoon et al. 2007) in classroom videos (Blomberg et al. 2013; Borko et al. 2014). It aims to fill the research gap on the systematization phase, which is often left to spontaneous oral facilitation by teacher educators (Borko et al. 2014) but could be scaffolded more strongly. Because the latter is particularly relevant for online PD courses with varying degrees of facilitation capacities (Seago et al. 2022), we compare two particular design elements that might compensate for limited facilitation capacities: systematizing PCK videos and group discussions. We pursue three research questions in our efficacy trial:

  • RQ1: Teachers’ emotional and motivational reactions: How do the teachers differ in their motivational-affective reactions (operationalized by boredom, enjoyment, and perceived usefulness) across conditions in the online PD session?

  • RQ2: Comparing two design elements: Can teachers’ PCK be significantly enhanced by the online PD session? Is there a difference between the treatment conditions of the video group (i.e., watching a systematizing video) and the discussion group (i.e., discussing individual summaries)?

  • RQ3: Teachers with low prior PCK: Do teachers with low prior PCK benefit equally in terms of their growth in PCK from systematizing videos and small-group discussions?

3 Research context of the Mastering Math PD course

The focused randomized efficacy trial was conducted in a 2-hour online PD session within the research context of the 6‑month Mastering Math PD course (Prediger et al. 2019). In this section, we briefly present this overall PD context (Sect. 3.1) and the PCK content for the efficacy trial (Sect. 3.2).

3.1 Overall PD context: Mastering Math PD program on fostering students’ understanding of basic arithmetic concepts

The Mastering Math intervention program was developed for low-achieving students in Grades 5 and 6 who need a second chance to develop conceptual understanding of basic arithmetic concepts such as place value understanding and the meaning of multiplication. As German mathematics teachers in Grades 5–10 have tended not to be prepared for teaching these basic concepts from Grades 2 and 3, the Mastering Math PD course aims at developing teachers’ PCK on basic arithmetic concepts (Prediger et al. 2019). The PD course is usually organized around teachers’ own teaching experiments with the Mastering Math curriculum material, with six PD sessions introducing the underlying teaching principles, working with teachers on specifying the relevant concept elements, on noticing students’ learning needs in these concept elements, and on preparing the intervention sessions for enhancing students’ understanding. The overall effectiveness of the Mastering Math PD course and intervention program has been shown in earlier cycles with respect to teachers’ practices and knowledge (Prediger et al. 2023) and students’ learning gains (Prediger et al. 2019).

Based on these effectiveness findings in larger field trials, we can now focus on disentangling effects of particular design elements. In 2022, the course was conducted twice in six online PD sessions of 2 h each with volunteer mathematics teachers from all over Germany who were interested in overcoming students’ conceptual learning deficits after the pandemic school closures, and who had started to work with the Mastering Math curriculum materials in between.

3.2 PCK content of the current study: specifying, noticing, and enhancing students’ understanding of multiplication

For achieving highly controlled laboratory conditions, the efficacy study concentrated on the second session of the online PD course, in which the thematic focus was on PCK about understanding multiplication. According to Shulman (1986), teachers’ PCK is the knowledge needed for teaching subject matter content (such as multiplication) and entails several components: knowledge of representations and instructional strategies, and knowledge about students’ learning and (mis)conceptions.

Understanding multiplication was selected as the content for this second session, as students’ challenges have often been documented (Clark and Kamii 1996). The key meaning of multiplication is counting in units, in other words, 3 × 5 is to be interpreted as three units of five each (Götze and Baiker 2021) in graphical representations such as dot arrays or as jumps on the number line (Barmby and Milinkovic 2011). The PD session aimed at enabling teachers to unpack and notice the lack of deep understanding in students’ incomplete explanations such as “the dot array matches 3 × 5 because here is 3 and here is 5.” Classroom interventions in which the unit structure is repeatedly articulated by teachers and students (in phrases such as “the dot array consists of three fives” or “three rows of five each”) have been shown to lead to significantly higher learning gains than interventions with the same tasks and graphical representations, but without meaning-related phrases articulated (Götze and Baiker 2021). This means that explicating the unit structure is a crucial part of effective instructional strategies for enhancing students’ understanding of multiplication. In total, teachers’ mastery of the PCK (in this case, the students’ understanding of multiplication) needs to unfold in teachers’ practices of specifying the relevant knowledge elements to be learned, noticing what students have understood, and enhancing students’ understanding, all of which were treated in the PD session (Prediger et al. 2023).

4 Methods of the focused randomized controlled efficacy trial

4.1 Sample

In our efficacy trial, 102 teachers participated. They had an average of 8.1 (SD = 8.6) years of teaching experience, with 69.6% reporting having a mathematics teaching certificate and 42.2% having encountered the Mastering Math curriculum material before.

All participating teachers were assigned completely at random to the two treatment groups, which had almost equal group sizes (54 participants in the discussion group D and 48 participants in the video group V). While all participants were included in the two treatment groups D and V when answering RQ1 and RQ2, RQ3 was restricted to teachers with low prior PCK, so we eliminated teachers with high pre-test performance (i.e., greater than 50%, n = 21). Additionally, we took into account technical difficulties resulting in missing answers in the pre-test to ensure its validity, so we excluded teachers with very low pre-test performance (i.e., lower than 25%, n = 25). By making this selection in the treatment groups D′ and V′ (both n = 28), we were able to examine the learning trajectories of teachers whose zone of proximal development we expected to be particularly well supported by the intervention. The intervention was tailored to typical difficulties of understanding multiplication and should thus meet the needs of participants with low prior knowledge.
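The selection rule for the subsample D′/V′ can be sketched as follows. This is a minimal illustration rather than the analysis script of the study (which was run in SPSS); the function name `select_low_prior`, the example scores, and the decision to keep scores lying exactly on the 25% or 50% bounds are assumptions made for the sketch.

```python
def select_low_prior(pre_scores, lower=0.25, upper=0.50):
    """Return the indices of participants whose pre-test score lies in [lower, upper].

    Scores are standardized to the range 0..1. Keeping scores exactly on the
    bounds is an assumption; the text only states that scores lower than 25%
    and greater than 50% were excluded.
    """
    return [i for i, score in enumerate(pre_scores) if lower <= score <= upper]


# Hypothetical pre-test scores for illustration only (not study data).
example_scores = [0.10, 0.25, 0.33, 0.50, 0.67, 0.42]
kept = select_low_prior(example_scores)

# Consistency check on the reported sample sizes:
# 102 participants minus 21 (high) minus 25 (very low) leaves 2 x 28.
remaining = 102 - 21 - 25
```

With the reported numbers, `remaining` equals 56, which matches the two subgroups of 28 teachers each.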

Table 1 documents the teachers’ backgrounds for the whole sample and the treatment groups D/V for RQ1 and RQ2 and D′/V′ for RQ3. The two treatment groups D and V differ significantly only in the years of teaching experience, for which participants in the discussion group D reported significantly longer teaching experience (M = 10.0 years) than the video group V (M = 5.9) with p = 0.014. For the other variables, the treatment groups D/V and D′/V′ slightly vary, while χ2 tests do not detect significant differences (Table 1).

Table 1 Descriptive Data of the Sample and the Two Randomly Assigned Treatment Groups

4.2 Methods of data gathering for the efficacy study

4.2.1 Overview on the research design for the online PD session as a data-gathering format

Figure 1 provides an overview of the research design for the efficacy study that was organized in a short-term pre-post comparison trial with a completely randomized assignment of participants to two treatment conditions: the systematization phase with a systematizing PCK video (in the video group) or with small-group communication on individual summaries (in the discussion group). The treatment conditions formed the independent variable; the participants’ emotional and motivational reactions and their growth in PCK on understanding multiplication formed the dependent variables. We started the 2‑hour online PD session with a pre-test; continued with an inquiry phase on specifying, noticing, and enhancing the understanding of multiplication; followed with a systematization phase; and finished with a post-test.

Fig. 1 Online PD session with THINK–PAIR–SHARE activities as a research format for a randomized controlled trial

Figure 1 shows how the pre-test and post-test were presented for the participants as an integral part of their PD experience alternating between activities and brief inputs in THINK (individual work), PAIR (partner work exchanging the individual ideas in breakout rooms), and SHARE (plenary discussions and brief inputs after activities) settings. All participants were transparently informed about the research interests and gave consent to the use of their written data for research purposes.

For the teachers, the pre-test was framed as the first PD activity of the inquiry phase: In the 10-minute THINK Activity 1, three vignette-based items (serving as pre-test for the research) invited them to dive into individual reflection about the meaning of multiplication, followed by a 5-minute questionnaire on some control variables (instruments are described below). These reflections were picked up in the 55-minute inquiry phase with PAIR and SHARE activities on specifying relevant concept elements of multiplication, noticing students’ understanding, and enhancing it. This inquiry phase was organized around a classroom video showing a teacher enhancing their students’ understanding.

In the systematization phase, the participants were randomly distributed to two treatment conditions for the 20-minute PAIR Activity 4: Participants assigned to the discussion group were invited to discuss the main insights they had gained among themselves in breakout rooms (without moderation by a PD facilitator). Meanwhile, the participants in the video group were invited to watch a 10-minute instructional PCK video (summarizing and systematizing the relevant PCK elements developed in the inquiry phase) and write individual notes.

In the 10-minute THINK Activity 5 (serving as post-test for the research), the participants were asked to “think through” the same items as in Activity 1 to check individually how their PCK had changed. Additionally, they completed a brief evaluation questionnaire capturing emotional and motivational reactions to the design of the PD session.

In total, the intervention on multiplication took 100 min including pre- and post-test. After that, a 20-minute transfer phase informed about available curriculum materials and initiated the transfer to division.

This 120-minute online PD session was administered in three identical video conferences in Zoom on three days, keeping group sizes small to allow intense plenary discussions. The THINK activities were written down in a “Thinking Space” in LimeSurvey, in which the random assignment of participants to the two treatment conditions was also organized.

4.2.2 Instruments

Control variables

A brief self-report questionnaire captured teachers’ backgrounds with respect to years of teaching experience, teaching certificate in mathematics, and whether they had already encountered the Mastering Math curriculum materials before.

Emotional and motivational reactions as the first evaluation level

To capture the first level of evaluation (Kirkpatrick 1956; Lipowsky 2010) of the designed online PD session, we collected data on participants’ emotional and motivational reactions in three facets: boredom, enjoyment, and perceived usefulness. To assess boredom and enjoyment during the online PD session, we applied the boredom scale and the enjoyment scale of the Achievement Emotions short scale (Bieleke et al. 2021), using statements such as “The PD session bores me” (four items, Cronbach’s α = 0.85) and “I enjoy participating in the PD session” (four items, Cronbach’s α = 0.86). To assess the perceived usefulness of the PD content, we extracted four items of the usefulness scale (Intrinsic Motivation Inventory; Ryan et al. 1991), for instance, “I think this is an important PD session” (four items, Cronbach’s α = 0.92). The participants provided answers to these items on 5‑point Likert scales ranging from 0 (strongly disagree) to 4 (strongly agree), inverted for boredom.
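How a scale score on the 0 to 4 response format can be inverted and how Cronbach’s α is obtained from item-level answers may be sketched as follows. The sketch uses the standard α formula; the function names and the example answers are hypothetical and do not stem from the study data.

```python
from statistics import pvariance


def invert_item(score, scale_min=0, scale_max=4):
    """Reverse-code a Likert answer, e.g. 0 <-> 4 and 1 <-> 3 on a 0..4 scale."""
    return scale_max + scale_min - score


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of participants, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of sum scores).
    Assumes complete data and a non-zero variance of the sum scores.
    """
    k = len(rows[0])
    item_columns = list(zip(*rows))
    item_variance_sum = sum(pvariance(column) for column in item_columns)
    total_variance = pvariance([sum(row) for row in rows])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical answers of three participants to a four-item scale.
answers = [[0, 0, 0, 0], [2, 2, 2, 2], [4, 4, 4, 4]]
alpha = cronbach_alpha(answers)
```

In this artificial example all four items agree perfectly, so α reaches its maximum of 1; real scales such as those reported above fall below that value.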

Dependent variable for the second evaluation level

The brief PCK pre-test and post-test on understanding multiplication consisted of three items each (for which the validity was based on qualitative cognitive labs and expert judgments; Fig. 2).

Fig. 2 Items in the PCK pre-test and post-test with typical answers and scores

Participants were asked to specify what understanding multiplication entails using their knowledge on representations (Shulman 1986) in Item 1 and to analyse children’s understanding in Item 3 (noticing using knowledge on students’ learning). In Item 2, the participants received an incomplete dot array drawn by an imaginary student that misses a reference to multiplicative unit structures and were asked to write questions and instructional strategies to enhance the student’s understanding. The open enhancing Item 2 was administered before the more closed noticing Item 3 to avoid influence.

4.3 Methods of data analysis

Coding of open items

For each item, teachers’ answers were coded and rated with respect to conceptual understanding and the language needed to learn it (see Fig. 2), resulting in standardized partial scores between 0 and 1. The open items were rated by one of the authors and a student research assistant who had received a 4-hour training. Ratings that did not match were resolved by discussion. A satisfactory interrater reliability of Cohen’s κ = 0.83 was reached (see Prediger and Wischgoll 2023, for details). The PCK pre-test and post-test scores were determined as averages of the partial scores of the three items.
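The two computations named in this paragraph, the chance-corrected agreement between two raters and the averaging of partial scores, can be illustrated with a short sketch. It implements the standard unweighted Cohen’s κ; the function names and the example codes are hypothetical, and the actual coding scheme is documented in Fig. 2.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equally long lists of category codes."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance from the raters' marginal code frequencies.
    expected = sum(counts_a[code] * counts_b[code] for code in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)


def mean_partial_score(partial_scores):
    """Test score as the average of the standardized partial scores (0..1)."""
    return sum(partial_scores) / len(partial_scores)


# Hypothetical codes assigned by two raters to six answers.
codes_a = [0, 1, 1, 0, 1, 0]
codes_b = [0, 1, 1, 0, 0, 0]
kappa = cohens_kappa(codes_a, codes_b)

# Hypothetical partial scores of one teacher on the three items.
score = mean_partial_score([1.0, 0.5, 0.0])
```

In the example, the raters agree on five of six answers, but half of that agreement is expected by chance, so κ is clearly lower than the raw agreement rate.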

Statistical analysis

The analysis of the variables used for this study revealed 7.4% missing values distributed over 37.25% of all cases. To test whether the missing data were distributed completely at random, we calculated Little’s test, which was significant (χ2 = 193.498, df = 133, p < 0.001), indicating that the data were not missing completely at random (MCAR). To test whether data were missing at random, a binary logistic regression model was calculated for the variable of interest, the dependent variable PCK post-test score (dummy coded). The other variables were used as covariates. The results were significant only for boredom and enjoyment, which indicates missing at random (MAR) for these scales. We decided to impute data using the expectation-maximization algorithm (Dempster et al. 1977), with metric variables (PCK pre-test and post-test score, years of teaching experience, boredom, usefulness, and enjoyment) and categorical variables (prior encounter of Mastering Math material and teaching certificate in mathematics).
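The two descriptive missingness rates reported here (share of missing values and share of cases with at least one missing value) can be computed as in the following sketch. It does not reproduce Little’s test or the EM imputation, which were run in SPSS; the function name, the use of `None` as the missing-value marker, and the example data are assumptions.

```python
def missing_summary(rows):
    """Return (share of missing cells, share of cases with any missing cell).

    rows: one list of variable values per participant, with None marking a
    missing value (an assumption of this sketch).
    """
    n_cells = sum(len(row) for row in rows)
    n_missing = sum(value is None for row in rows for value in row)
    n_incomplete_cases = sum(any(value is None for value in row) for row in rows)
    return n_missing / n_cells, n_incomplete_cases / len(rows)


# Hypothetical data for four participants and three variables.
example = [[1, None, 3], [1, 2, 3], [None, None, 3], [1, 2, 3]]
cell_rate, case_rate = missing_summary(example)

# If all 102 participants entered this analysis, the reported 37.25% of cases
# would correspond to 38 teachers with at least one missing value.
implied_share = round(38 / 102 * 100, 2)
```

The example shows why the two rates can differ strongly: a small share of missing cells can be spread over a much larger share of cases.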

To answer RQ1, we calculated independent t-tests with the treatment group as grouping variable and boredom, enjoyment, and perceived usefulness as the respective dependent variables.
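For readers who want to retrace such a group comparison, the test statistic and Cohen’s d can be sketched as below. The sketch assumes the classical Student variant with a pooled standard deviation (the text does not state whether equal variances were assumed), uses hypothetical scale values, and leaves out the p-value, which would require the t distribution from a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def pooled_sd(x, y):
    """Pooled standard deviation of two independent samples."""
    nx, ny = len(x), len(y)
    return sqrt(((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2))


def student_t(x, y):
    """Independent-samples t statistic (equal variances assumed) and its df."""
    standard_error = pooled_sd(x, y) * sqrt(1 / len(x) + 1 / len(y))
    return (mean(x) - mean(y)) / standard_error, len(x) + len(y) - 2


def cohens_d(x, y):
    """Standardized mean difference based on the pooled standard deviation."""
    return (mean(x) - mean(y)) / pooled_sd(x, y)


# Hypothetical usefulness ratings (0..4) in two small groups.
video_group = [3, 4, 4, 3]
discussion_group = [2, 3, 3, 2]
t_value, df = student_t(video_group, discussion_group)
d_value = cohens_d(video_group, discussion_group)
```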

To answer RQ2 and RQ3, we calculated an analysis of variance with repeated measures with the treatment group as between-subjects factor and the PCK pre-test and post-test scores as within-subjects variables. The variables years of teaching experience, teaching certificate in mathematics, and prior encounter of the Mastering Math material were not included in the analyses of variance as covariates because the assumption of homogeneity of regression slopes was violated.
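With two groups and two measurement points, the time × group interaction of such a repeated-measures ANOVA is equivalent to an independent t-test on the gain scores (post-test minus pre-test), with F equal to the squared t value. The following sketch uses this equivalence instead of the full ANOVA decomposition; it is not the SPSS procedure of the study, and the function names and example scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def gain_scores(pre, post):
    """Individual gains from pre-test to post-test."""
    return [after - before for before, after in zip(pre, post)]


def interaction_f(pre_a, post_a, pre_b, post_b):
    """F value and degrees of freedom of the time x group interaction.

    Computed as the squared pooled-variance t statistic of the gain scores,
    which is valid for a design with two groups and two measurement points.
    """
    gains_a, gains_b = gain_scores(pre_a, post_a), gain_scores(pre_b, post_b)
    na, nb = len(gains_a), len(gains_b)
    df_error = na + nb - 2
    pooled_variance = ((na - 1) * variance(gains_a)
                       + (nb - 1) * variance(gains_b)) / df_error
    t = (mean(gains_a) - mean(gains_b)) / sqrt(pooled_variance * (1 / na + 1 / nb))
    return t * t, (1, df_error)


# Hypothetical raw scores (arbitrary scale) for two groups of three teachers.
f_value, dfs = interaction_f([2, 3, 4], [6, 8, 7], [2, 3, 4], [4, 5, 6])
```

For the whole sample of 54 + 48 teachers, this reasoning yields the error degrees of freedom of 100 reported for the interaction test.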

All statistical analyses were done with SPSS 29.0 and conducted on an alpha level of 0.05. Effect size measures were reported as η2 (0.01 as a small effect, 0.06 as a medium effect, and 0.14 as a strong effect) or Cohen’s d (0.20 to 0.50 as a small effect, 0.50 to 0.80 as a medium effect, and > 0.80 as a large effect, Cohen 1988).
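Partial η² can be recovered from a reported F value and its degrees of freedom, which allows readers to retrace the effect sizes given in Sect. 5. The sketch below uses the standard conversion and the η² benchmarks named above; the function names and the label "negligible" for values below 0.01 are assumptions of the sketch.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic: F*df1 / (F*df1 + df2)."""
    return f_value * df_effect / (f_value * df_effect + df_error)


def label_eta_squared(eta_squared):
    """Verbal label following the benchmarks 0.01, 0.06, and 0.14."""
    if eta_squared >= 0.14:
        return "large"
    if eta_squared >= 0.06:
        return "medium"
    if eta_squared >= 0.01:
        return "small"
    return "negligible"  # label not taken from the text


# F values as reported in Sect. 5.2 and 5.3.
time_all = partial_eta_squared(137.51, 1, 100)
time_low = partial_eta_squared(105.03, 1, 54)
interaction_low = partial_eta_squared(6.67, 1, 54)
```

Rounded to two decimals, the three values are 0.58, 0.66, and 0.11, in line with the effect sizes reported with the ANOVA results.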

5 Empirical findings

5.1 Teachers’ emotional and motivational reactions to the PD session

In RQ1, we asked for teachers’ emotional and motivational reactions to the design of the PD session. Means and standard deviations for the three scales of boredom, enjoyment, and perceived usefulness are documented in Table 2.

Table 2 Teachers’ Emotional and Motivational Reactions to the PD Session

Although the PD session included two tests (25 out of 100 min), the participants in the whole sample reported low values for perceived boredom and high values for the experienced enjoyment during the PD session and the perceived usefulness of the PD content.

We also compared the emotional and motivational reactions between the treatment groups and tested for differences using independent t-tests. For boredom and enjoyment, no significant differences between both groups were found (for t-test statistics, please see Table 2). The video group (M = 3.44) seemed to have perceived the PD content as slightly more useful than the discussion group (M = 3.12), but this difference is not significant on the 5% level (p = 0.052).

In total, teachers in both treatment conditions experienced this research format equally as a useful, non-boring, and joyful PD session.

5.2 Effects of the PD session on the growth of all teachers’ PCK

In RQ2, we compared the growth in PCK from pre-test to post-test for all teachers in the two treatment conditions. Table 3 documents that both treatment groups (D and V) benefitted substantially from the brief PD session: Although the groups differed slightly in their PCK pre-test scores, both ended with substantially higher PCK post-test scores.

Table 3 Effects of the PD Session With Two PD Design Elements on All Teachers’ PCK Growth

The repeated-measures ANOVA revealed a significant main effect of time (Ftime(1, 100) = 137.51, p < 0.001) with a large effect size (ηp² = 0.58). The interaction effect between time and treatment was not significant (Ftime×group(1, 100) = 3.32, p = 0.071). These results show that PCK about understanding multiplication increased significantly from pre-test to post-test. The non-significant interaction indicates that, in the whole sample, PCK growth did not differ significantly between video group V and discussion group D, even though the descriptive data show a slight advantage for the video group.
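
As a plausibility check, partial eta squared can be recovered from a reported F value and its degrees of freedom, since ηp² = SSeffect / (SSeffect + SSerror) = F · df1 / (F · df1 + df2). The Python sketch below applies this identity to the main effect of time reported above. The function name is hypothetical, and the snippet is not part of the SPSS output.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Main effect of time in the whole sample: F(1, 100) = 137.51
eta_time = partial_eta_squared(137.51, 1, 100)
print(round(eta_time, 2))  # 0.58, matching the reported value
```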

5.3 Effects of particular design elements on PCK growth for teachers with low prior PCK

RQ3 aimed at comparing the two design elements for teachers with low prior PCK, operationalized as teachers with some PCK in the pre-test (above 25% of the maximum score) but not fully accomplished PCK (below 50%). Table 4 documents that both treatment groups (D′ and V′) benefitted substantially from the PD session: Both groups started with almost equal PCK pre-test scores and ended with substantially higher PCK post-test scores.

Table 4 Effects of Two PD Design Elements on PCK Growth for Teachers with Low Prior PCK

The repeated-measures ANOVA revealed a significant main effect of time (Ftime(1, 54) = 105.03, p < 0.001) with a large effect size (ηp² = 0.66). The interaction effect between time and treatment was significant (Ftime×group(1, 54) = 6.67, p = 0.013) with a medium effect size (ηp² = 0.11). These results show that PCK on multiplication increased significantly from pre-test to post-test. Specifically, for the teachers with low prior PCK, the PCK growth of the video group was significantly larger than that of the discussion group.

In contrast, the repeated-measures ANOVA for teachers with high prior knowledge (pre-test scores above 50%; MD′ = 0.53, SDD′ = 0.07, MV′ = 0.54, SDV′ = 0.07) showed a significant, large main effect of time (Ftime(1, 19) = 7.50, p = 0.013, ηp² = 0.28; post-test scores MD′ = 0.59, SDD′ = 0.15, MV′ = 0.67, SDV′ = 0.13), but no interaction effect between time and treatment group (Ftime×group(1, 19) = 1.23, p = 0.282). Likewise, the repeated-measures ANOVA for teachers with very low prior knowledge (pre-test scores below 25%; MD′ = 0.17, SDD′ = 0.05, MV′ = 0.17, SDV′ = 0.00) showed a significant, large main effect of time (Ftime(1, 23) = 45.17, p < 0.001, ηp² = 0.66; post-test scores MD′ = 0.40, SDD′ = 0.22, MV′ = 0.44, SDV′ = 0.22), but no interaction effect between time and treatment group (Ftime×group(1, 23) = 2.47, p = 0.129).
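
The three prior-knowledge bands used in this section (below 25%, between 25% and 50%, above 50% of the maximum pre-test score) can be written as a simple assignment rule. The Python sketch below is illustrative; the function name and band labels are hypothetical. Because the text does not state how scores of exactly 25% or 50% were treated, the sketch assigns them to the higher band, which is an assumption.

```python
def prior_pck_band(pre_score: float) -> str:
    """Assign a pre-test score (proportion of the maximum score, 0 to 1) to a band."""
    if not 0.0 <= pre_score <= 1.0:
        raise ValueError("pre_score must be a proportion between 0 and 1")
    if pre_score < 0.25:
        return "very low"
    if pre_score < 0.50:  # assumption: exactly 0.25 counts as "low"
        return "low"
    return "high"  # assumption: exactly 0.50 counts as "high"


# Hypothetical pre-test proportions (not data from the study)
for score in (0.17, 0.33, 0.54):
    print(score, prior_pck_band(score))
```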

6 Discussion

6.1 Summary and embedding of findings

“While more research is needed to ascertain the effectiveness and impact of various forms of online PD, it is clear they offer a valuable option for large numbers of teachers” (Koellner et al. 2023, p. 554). The state of research suggests that online PD courses are promising, and thereby of high interest for more research: on the first evaluation level of teachers’ emotional and motivational reactions, but also on the second level of teachers’ growth of knowledge (Capparozza et al. 2023), and not only in the overall effects, but for particular design elements (Asterhan and Lefstein 2024; Koellner et al. 2023).

Like other evaluations of online PD courses, our study showed that online PD sessions can be well received (Capparozza et al. 2023). This matters because learning research with adolescents shows that emotions and motivation are important for successful learning processes (Pekrun et al. 2007). In our study, teachers reacted to the course in a way that is considered conducive to learning in PD courses (Gaines et al. 2019). The experience of boredom was low, whereas enjoyment was high for both treatment groups. We conclude from this result that the sequencing of learning activities as think-pair-share, with consolidation by group discussion or systematizing videos, met the participants’ demands with regard to learning pace and content selection, which is a necessary condition for processing the learning content.

Perceived usefulness is another important indicator of whether PD courses fit participants’ demands. The high values in participants’ reactions in our study can be interpreted as indicating that the learning content and the processing matched their personal goals (Eccles and Wigfield 2020). In consequence, we infer that the course attracted persons whose goals for teacher PD were met to a high degree. Although our results did not show significant differences between the treatment groups, we see a tendency that we believe should be pursued in further research: Video group participants, who watched a systematizing video, tended to perceive the PD session as more useful than discussion group participants, who discussed their individual summaries. This might be traced back to the function of the video to draw participants’ attention to the most important aspects. On the other hand, we have to note that the discussion group did not have facilitators to ensure productive discussions. We assume that facilitators who conduct group discussions in a productive manner, that is, who follow established discussion norms, include the contributions of all participants, pursue a defined goal, and review relevant insights of the discussion (Lesseig et al. 2017), could have an impact similar to that of the systematizing video. However, research has shown that facilitating productive discussions in teacher PD courses is not easy (Borko et al. 2014).

Research questions RQ2 and RQ3 focused on the second level of evaluation: the effects on the growth of teachers’ PCK (Kirkpatrick 1956; Lipowsky 2010). While not all online PD courses have been shown to be effective for knowledge growth (Bozkurt et al. 2017), our PD session resulted in significant growth in teachers’ PCK with substantial effect sizes over time for both treatment groups. Whereas many studies have investigated teachers’ inquiry phases (Garet et al. 2001; Jaworski 2006), we focused on the systematization phase, which is critical for sustained knowledge consolidation after initial inquiries (Loibl et al. 2017). We see this as a first indication of the affordances of online PD courses with thoroughly designed systematization phases, at least for short-term knowledge growth. Significant growth was found even for teachers with high prior knowledge.

The main empirical finding relates to RQ3, which involved comparing two design elements for the systematization phase for the subsample of teachers with low prior PCK. Teachers with low prior knowledge are an important and growing target group for PD courses in Germany (due to increasing teacher shortages). While it made no difference to teachers with high prior PCK whether the systematization phase was implemented via systematizing videos or via collaborative group discussions, clear differences were discernible for the roughly half of the teachers who had low prior PCK: These teachers’ PCK growth benefitted from the systematizing video significantly more than from group discussions. This result shows that systematizing videos can be a valuable tool, especially in digital or digitally supported professional development. It also confirms the efficacy of systematizing videos, which has so far mainly been shown in other areas (van der Meij and Dunkel 2020). An explanation for this differential effect might lie in matching phenomena identified for students by Loibl and Leuders (2019), whose two-phase instructional approaches were particularly effective for those students whose initial misconceptions were explicitly treated in the systematization phase. We can hypothesize that the teachers with PCK pre-test scores above 25% and below 50% were those whose ideas were best leveraged in the matching systematizing video, whereas the videos might have matched less well for the teachers with higher pre-test scores. A second hypothesis for explaining the differential effect concerns the role of the video in directing attention to the essential PCK elements. While teachers with high prior PCK might have already been able to focus their attention on critical aspects in the inquiry phase, those with low prior PCK might have needed the guidance of the video to direct their attention to the most relevant aspects.

Even if this finding is still bound to short-term laboratory conditions, it is highly interesting from the practical side beyond online PD, as systematizing videos might also support teacher educators in their often-challenging facilitation of systematizing discussions (Borko et al. 2014; Lesseig et al. 2017). Theoretically, the finding fits with those from other learning contexts (with students or adults in the context of content knowledge) in which an input that follows after a rich inquiry process can be powerful if it is sufficiently connected to the learners’ ideas emerging in the inquiry phase (Loibl et al. 2017). This can be achieved by expert facilitators (as shown by Borko et al. 2014; Lesseig et al. 2017) or by systematizing instructional videos (as in our video condition), but less by unmoderated discussions of the participants (as in our discussion condition), and we assume less by discussions moderated by inexperienced facilitators, but this is to be investigated in the future.

In sum, our findings on the first and second level of evaluation resonate with general findings that an explicit content focus and an explicit focus on teaching practices can support teachers’ learning and their emotional and motivational reactions to the PD content (Lipowsky 2010; Garet et al. 2001) and that systematizing videos can better support the needed explicitness.

6.2 Limitations and future research

As in all controlled trials, the findings must be interpreted with caution and with regard to the methodological limitations of the study. For the findings on teachers’ emotional and motivational reactions and the overall effects on PCK growth, no comparison group was available; the evaluation was restricted to the whole sample participating in our PD session. This limitation often occurs (see the review by Capparozza et al. 2023) and should be overcome in the future by comparing different PD settings such as face-to-face settings, asynchronous settings, or other types (Seago et al. 2022).

Whereas we hoped for few dropouts in the short-term setting, we lost participants’ answers due to internet time-outs and insufficient guidance in the switch between the video conference tool and LimeSurvey. In the future, dropouts might be further reduced by more explicit guidance and technical tools that are stable, for instance, against unintended closing of windows.

One methodological limitation inherent in the short-term research design is that our tests only had three items, for which statistical consistency is hard to achieve. Rather than striving for a unidimensional construct for a small PCK area with high resolution, we decided to ground the item validity of our intervention-sensitive instrument in qualitative cognitive labs and expert judgment. In the future, more methodological research should investigate how to cover a relevant and valid set of PCK elements in a short test duration and how to capture teachers’ enacted practices.

In our study, we used the systematizing video directly before the post-test. This must be viewed critically from the perspective of cognitive psychology, in particular cognitive load theory (Sweller 2016): The explicit instruction provided by the video may have contributed to the participants’ working memory being relieved by stronger guidance. This might explain a better performance in the post-test. In comparison, the discussion group received less guidance, which could explain the poorer performance. However, another point to consider is the expertise-reversal effect, according to which learners with low prior knowledge benefit from stronger guidance than learners with high prior knowledge. The results of this study suggest this effect and should be verified in further studies with additional follow-up tests with regard to sustainability of learning growth.

Finally, although the treatment groups were well balanced according to the reported participant characteristics, we do not know to what extent the groups’ formation had an influence on the performance in the post-test. For instance, we know that collective inquiries can be more or less productive depending on the interest to gain insight and the agreement to discuss thoroughly or not (Jaworski 2006). These unknown effects of collaboration can be taken into account in future research.

Additionally, potential ceiling effects for teachers with high prior PCK might also be overcome in the future by digital technologies, used for immediate assignments of participants to adaptive PD learning opportunities according to participants’ pre-test scores. All participants of PD sessions can benefit from this type of assignment.

6.3 Consequences for future PD and PD efficacy trials

The under-researched systematization phase seems to be relevant also for professional learning and should be systematically designed in future PD courses, with or without the moderation of PD facilitators and teacher educators.

Beyond this, we conclude with a remark on the practical aspects of research: This paper presented online PD sessions not only as a research subject, but also as a research format that offers practical and economic research opportunities for efficacy trials focused on particular design elements (operationalized as alternative treatment conditions). These have been realized here in a compact research design, with the pre-test and post-test taking 25 out of 100 min in one short-term PD session. In spite of the methodological limitations discussed above, the short-term PD session has been shown to reveal measurable effects and to offer many practical advantages for focused efficacy trials. The reported positive emotional and motivational reactions suggest that the PD format is promising for transfer to the future focused trials that many researchers have requested (Koellner et al. 2023; Seago et al. 2018). We intend to extend the format to other design elements in the future.