Introduction

Professional development (PD) plays a key role in helping teachers stay up-to-date on current trends in education and equipping them with the knowledge and skills they need for teaching (Copur-Gencturk et al., 2019; Desimone, 2009; Hill et al., 2013; Penuel et al., 2007). Teachers in general, and mathematics teachers in particular, need mathematics-focused PD programs, given that teachers can enter the profession with limited prior preparation in the mathematics they are expected to teach (e.g., Heubeck, 2022). In fact, only 24% of US middle school mathematics and computer science teachers had both a degree and certification in their field in 2017–2018 (Gray et al., 2022). At present, 35% of middle school mathematics and computer science teachers have neither a degree nor certification in the field in which they teach. Such a need for content-focused PD also exists at the international level in that teachers need opportunities to enhance their content and pedagogical content knowledge (e.g., Bold et al., 2017; Burns, 2023).

Yet in-person PD programs may not be accessible or available to the teachers who need them. Prior work has documented notable disparities in teachers’ access to quality PD programs (e.g., Glover et al., 2016; Powell & Bodur, 2019). Even when PD is available, the reality of teachers’ lives may not allow them to be at a certain place at a certain time. Online asynchronous PD, which can be completed anytime, thus has the potential to accommodate teachers’ busy schedules (e.g., Fishman et al., 2013). Such PD could reach nearly all US teachers and most other teachers worldwide, regardless of their geographic location (e.g., Burns, 2023; Irwin et al., 2023; UNESCO, 2023; Van Ness & Varn, 2021). Yet while asynchronous PD is an attractive alternative to face-to-face or online synchronous PD, this format has potential drawbacks for teacher learning, such as limited (or no) interaction with a PD facilitator and the difficulty of providing timely feedback. Thus, research is needed to investigate the extent to which teachers can learn in online asynchronous PD and under which design conditions they do so.

Although considerable research has been conducted around online learning, very little has been dedicated to online asynchronous teacher PD. For myriad reasons, most research on online PD has focused on synchronous designs (e.g., Bragg et al., 2021; Fishman et al., 2013). Of the few available studies on online asynchronous PD (e.g., Polly & Martin, 2020), limited research, if any, has been conducted on the degree to which online asynchronous PD can be effective in equipping teachers with the knowledge and skills they need in teaching. Thus, research into asynchronous PD that can be completed anytime and almost anywhere is warranted.

Given that one of the concerns with online asynchronous PD is the need to provide adequate support (e.g., Kastberg et al., 2014; Lee et al., 2011; Polly, 2015), we implemented an intelligent tutoring system to provide just-in-time feedback as the teachers worked through the course. While considerable research has been conducted on intelligent tutoring systems for teaching mathematics (Mousavinasab et al., 2018; Steenbergen-Hu & Cooper, 2013), research on the use of such systems for teacher PD is scarce, if it exists at all. Thus, research on the use of intelligent tutoring systems as a tool for supporting teacher PD is needed. Our research addresses this gap in the literature as well.

Our work aimed to address gaps in the literature by investigating the potential for online asynchronous PD to enhance teachers’ content knowledge of mathematics (CK) and pedagogical content knowledge of mathematics (PCK), two important indicators of the expertise needed in teaching (e.g., Copur-Gencturk & Tolar, 2022; Li & Kaiser, 2011). This interactive, intelligent, and virtual program (IVIP) was designed to overcome a potential limitation of online asynchronous PD by creating an interactive learning environment using intelligent tutoring systems and providing just-in-time feedback based on mathematics teachers’ responses. Given that little is known about the potential of online asynchronous PD in teacher learning, this study provides initial empirical evidence on the fundamental viability of an interactive, online asynchronous mathematics-PD program with just-in-time feedback to support mathematics teacher learning. Specifically, we addressed the following research questions:

(1) To what extent did mathematics teachers’ CK change by completing IVIP with just-in-time feedback?

(2) To what extent did mathematics teachers’ PCK change by completing IVIP with just-in-time feedback?

(3) To what extent was mathematics teachers’ prior online learning experience related to the development of their CK and PCK?

In the sections below, we introduce the conceptual framework that guided both the design of the program and the measures used in the study. We then situate our study in prior literature on intelligent tutoring and asynchronous programs and describe the design features of the PD we developed for this study.

Conceptual framework: teacher knowledge

Our work was grounded in two theoretically important and empirically distinct elements of the content-specific expertise teachers need in teaching: CK and PCK (e.g., Copur-Gencturk & Tolar, 2022). In line with prior conceptualizations of a robust understanding of mathematics (Copur-Gencturk & Tolar, 2022; National Research Council [NRC], 2001), we define CK as the knowledge of the rules and definitions of the domain and the ability to carry them out flexibly and appropriately (Eisenhart et al., 1993; NRC, 2001; Rittle-Johnson et al., 2015), an understanding of the conceptual underpinnings of these rules (Copur-Gencturk, 2021), reasoning about and evaluating mathematical situations (Copur-Gencturk & Tolar, 2022), and showing strategic competence with word problems (Copur-Gencturk & Doleck, 2021). For example, being able to identify what makes a situation proportional and solve mathematics problems involving ratios are indicators of mathematics CK.

We conceptualized PCK as the knowledge needed to make the content accessible to learners (Ball et al., 2008; Shulman, 1986). Pedagogical content knowledge includes knowledge of particular representations, common student errors, and strategies for engaging a variety of learners; selecting activities that are aligned with a learning goal; responding to students’ mathematical needs based on their level of understanding; and identifying the areas of instruction that need improvement (e.g., Ball et al., 2008; Copur-Gencturk & Tolar, 2022; Copur-Gencturk et al., 2019). For example, knowing that some students struggle to reason multiplicatively (e.g., 6 is 3 times as much as 2) and instead tend to reason additively (6 is 4 more than 2) is an indicator of teachers’ PCK. Other examples of PCK would be knowing how to use drawn representations to support students’ understanding of a mathematical concept or knowing how to draw on students’ knowledge, resources, and lived experiences as a catalyst for their learning.

Literature review

Mapping the terrain of online PD research

In this literature review, we both situate our design in the existing literature and highlight the research gaps this study addresses. From a review of the literature about online PD for teachers, Dede and his colleagues (Dede et al., 2009) noted that in the early 2000s, most research on any form of online PD cohered around one of four main areas: program design elements, the effectiveness of specific programs, the effects of technology on the learning experience, and the ways in which participants communicated in the environment. The current study moves forward from these questions to focus on a system explicitly designed to improve teachers’ PCK and CK, what Dede and colleagues call “enablers of improvement” (p. 13), a category that in 2009 accounted for only 12% of the studies across all the online teacher PD research considered in their review.

More recently, Huang and Manouchehri (2019) surveyed mathematics teacher educators to learn about online mathematics PD offerings in any form. They found that most reported courses focused on mathematics content broadly or on specific facets of PCK, such as curriculum analysis. And, as recently as 2019, mathematics teacher educators perceived teaching content and PCK online as more difficult than teaching the same ideas face-to-face (Huang & Manouchehri, 2019).

In our review of the literature, we found that most studies of any form of online PD in mathematics education still consist largely of individual case studies (e.g., Hjalmarson, 2017; McCrory et al., 2008; Polly, 2015; Polly & Martin, 2020) and are still grounded in best practices from face-to-face instruction (e.g., Surrette & Johnson, 2015), such as participant interactions. Consistent with these studies, our work shares important information about one such design. We extend the literature in that ours is a study of PD that enables improvement (Dede et al., 2009), a claim that requires a quasi-experimental or experimental design.

While online PD research in general has grown, particularly since COVID-19 changed the educational landscape, much of the research on STEM-related PD offered online focuses on either synchronous PD (e.g., Choppin et al., 2021) or online asynchronous PD facilitated by a human (e.g., Fishman et al., 2013; Kastberg et al., 2014; Polly, 2015). We found no studies of interactive, virtual learning environments like ours in mathematics education. This gap was echoed in a recent systematic literature review of 53 studies about intelligent tutoring systems (Mousavinasab et al., 2018): the authors reported no PD studies at all using intelligent tutoring systems, although they did review six studies with K-12 students and one with college students that focused on aspects of mathematics.

Findings related to math teacher learning in online PD

Research consistently shows that online PD, whether synchronous or asynchronous, can be effective when it follows the principles of high-quality PD posited by Desimone (2009), including being content focused and aligned with the content teachers are expected to teach. This effectiveness extends to mathematics teachers’ learning of CK and PCK (e.g., Cady & Rearden, 2009; Polly, 2015; Polly & Martin, 2020; Pusmaz & Azdemir, 2012). Surrette and Johnson (2015) conducted a meta-analysis that included 20 studies of mathematics PD conducted between 2000 and 2012. They concluded that content-focused online PD can support teachers in deepening their knowledge, applying new pedagogies, and instructing students in ways that improve achievement in the content. The researchers noted a lack of research focused on the coherence between the content of the PD and the local or state curriculum, a gap our work can address because of the widespread adoption and adaptation of the Common Core State Standards for Mathematics in the USA (CCSSM; NGA Center & CCSSO, 2010). The CCSSM, and the state standards related to it, have made it more feasible to show alignment between PD content and local curriculum. Our instructional modules were developed in alignment with the CCSSM.

While it is unsurprising that online synchronous PD can lead to meaningful learning (e.g., Choppin et al., 2021; Francis & Jacobson, 2013), considerable research has shown that well-designed online asynchronous PD also leads to positive outcomes. In fact, studies have shown no statistical difference in outcomes between online asynchronous and face-to-face instruction in science (e.g., Fishman et al., 2013) and in mathematics (e.g., Russell et al., 2009; Seago et al., 2022). Further, studies have also highlighted that teachers prefer online learning to face-to-face learning (Pape et al., 2015). However, teacher educators continue to perceive online mathematics content courses and mathematics methods courses as harder to teach than face-to-face courses (Huang & Manouchehri, 2019). By designing a virtual facilitator to deliver instruction, we made a one-time investment in the difficult endeavor of teaching mathematics CK and PCK online asynchronously, an investment that, once developed, can be reused many times.

Using video in PD

Research on mathematics teacher learning online has demonstrated that outcomes can be comparable across learning modalities and that many of the same activities that succeed in face-to-face instruction can be used online. For example, in their study, Seago, Knotts, and DePiper (2022) used three different approaches to teaching their video-rich PD: face-to-face facilitated by project staff, face-to-face facilitated by district staff, and self-paced, online asynchronous. Across the four two-hour modules, not only were there no statistical differences between the three conditions in learning PCK-related skills, such as establishing mathematics goals and posing purposeful questions, but the level of completion was also similar between conditions, with about 82% of participants completing all or nearly all the materials. Like that study, ours relies on videos and the analysis of student work as instructional strategies.

In another study relying on video-based PD, McCrory and her colleagues (McCrory et al., 2008) used Adventures of Jasper Woodbury videos, which present mathematical situations through real-world enactments of stories, for CK development, and journal-article-based discussion for PCK development in their online program. The researchers sought to understand the nature of the conversation that arose for two different groups: one group of elementary education generalists (n = 21) and one group of K-12 mathematics teachers (n = 25) in online courses that used both synchronous and asynchronous meetings. They noted that the Jasper Woodbury videos prompted high-quality, content-focused discussion for both groups but did not lead to fruitful conversation about pedagogy. In the course for mathematics teachers, they found that a journal article focused on student thinking was more successful at prompting discussion about pedagogy. Thus, they concluded that tasks in online courses are as important as those in face-to-face courses. Building from this, we used somewhat different instructional strategies when designing for CK instruction versus PCK instruction.

Other PD using video and other observation-based approaches has also been shown to be effective. For example, Chieu and Herbst (2016) found that watching animation-based conversations was an effective way to engage preservice teachers in learning. In their online asynchronous conversations that were part of a blended course, the participants perceived the animations as having a social presence and were able to evaluate the teaching presented in the animations in meaningful ways. In our study, we intended the teachers to interact with the virtual facilitator as if she were present in their course.

Summary

In short, the research has shown that online asynchronous instruction can support the development of CK and PCK of mathematics. By using design principles proven successful in face-to-face PD, and by incorporating the same kinds of elements that succeed face-to-face, online course developers can support meaningful learning for teachers. All the examples presented above used some kind of interaction with facilitators and among participants as part of their learning design. In our study, we step away from that to ask whether PD that follows best practices (e.g., Desimone, 2009) and seeks to improve CK and PCK can engage teachers with tasks, videos, and analysis of student work to support teacher growth without a live facilitator present. Instead, we relied on just-in-time feedback from a virtual facilitator. In the next sections, we examine what is known about interaction in online PD and facilitation in intelligent tutoring environments. Then, we describe our PD system in detail.

Interaction in online learning

Both inside and outside of mathematics education, there is a strong literature base specifically focused on interaction in both synchronous and online asynchronous courses. These studies rely on interaction among the participants (e.g., Chieu & Herbst, 2016; Dennen, 2007; Hjalmarson, 2017) and/or are grounded in theories that focus on social engagement (e.g., McCrory et al., 2008; Sing & Khine, 2006; Yoon et al., 2020). For example, Yoon and her colleagues (2020) considered the “depth” of social interaction in an online asynchronous course for high school biology teachers. In their analysis, they found that 63% of the content-focused posts were constructive-level posts: posts that included learner-generated output beyond what was provided in the instructional materials but did not take the opinions of others into account, meaning interaction was not contributing in meaningful ways to participants’ depth of learning.

Sing and Khine (2006) looked at similar questions through the lens of knowledge building versus knowledge acquisition. They considered a course with 11 in-service teachers from across the curriculum seeking an Advanced Diploma in Information Technology. In that study, despite efforts to foster a knowledge-building community, the researchers found that 60% of the posts in the learning space were Phase 1, meaning they stated an opinion or asked a clarifying question. Twenty percent were Phase 2, which focused on identifying and clarifying disagreements, and the remaining 20% covered all of Phases 3–5, the phases in which knowledge building was expected to happen. This suggests that while participants report that interaction is important to them (e.g., Bragg et al., 2021; Lee et al., 2011), they may not engage fully in that interaction.

This is echoed again in Kellogg, Booth, and Oliver (2014), who used social network analysis to examine interaction in two PD MOOC courses: one for district leaders who wanted to learn about technology planning and another for elementary and middle school teachers to learn about learning trajectories. In both online asynchronous courses, which were designed to foster the development of peer networks, the researchers found little interaction between participants. Some participants tended to broadcast without responding to others, while others engaged little or not at all. In fact, only 23% of participants were networkers in the course for leaders and 36% in the course for teachers. Further, 87% of participants in the district leader course and 79% of participants in the teachers’ course were found to be “periphery” participators, meaning they had essentially no interaction with anyone else in the course.

Studies like these raise questions about the importance of interaction in any form of online learning. In these studies, even with a strong focus on interaction in the PD, the learners focused on their own experience and learning. Further supporting this point, and of relevance to our study, Amador and colleagues (2019) posited that online asynchronous learning environments may be more “cognitive” because they are text-based, whereas online synchronous learning may be more “social.” Combined, studies of interaction in mathematics (and related fields’) PD suggest that engaging in an entirely online PD with no interaction with other participants may be acceptable, or perhaps even preferable, for a large subset of adult learners. Thus, focusing on immediate feedback provided by an avatar that feels present in the instruction, an element of intelligent tutoring systems such as the one used in this study, rather than on interaction, may yield strong outcomes.

Facilitation and support in online learning

Another area of research related to interaction that informed our design and study has focused on the facilitator’s role. This is particularly important to our study because one of the strongest arguments for using an intelligent tutoring system for PD is the just-in-time feedback it can provide to any participant. Such feedback is not practical for online courses facilitated by humans, despite the importance of feedback for learning (e.g., Kastberg et al., 2014; Lee et al., 2011; Polly & Martin, 2020). Moreover, when online PD is unmoderated, participants indicate that they miss having support (e.g., Renninger et al., 2011). Further, Bragg et al. (2021) found in their systematic review of literature focused on online PD that learner supports are generally under-addressed in studies of PD. This study aims to add to that literature.

Polly and Martin (2020) noted the importance of feedback in their study of a 40-h face-to-face workshop followed by a 20-h online asynchronous experience in which teachers learned to use formative assessments in their mathematics teaching. Polly and Martin found that it was a challenge to help the teachers feel supported in the online environment, even with the face-to-face experience preceding it. They concluded that having the teachers feel supported was critical for carryover of the intervention into the classroom.

Lee and colleagues (2011) analyzed 110 public health students’ perceptions of an online course to determine the extent to which perceptions of support were related to satisfaction and course performance. For this study, support was defined as a system that included instructional support, peer support, and technical support. They found a moderate correlation between each of these three kinds of support and student satisfaction with the course. Further, there was no significant relationship between support and course performance; however, there was a relationship between satisfaction and learning. Thus, they concluded that instructor feedback was critical for satisfaction and learning. They went on to assert that both immediate communication and ease of access to support were critical elements in helping students feel supported. These are elements that are inherent to intelligent tutoring systems and that we specifically attended to in our PD environment.

Research focused on instructor roles and feedback in online learning environments hints that virtual facilitation may be effective and acceptable to learners. For example, Dennen (2007) studied positionality of the instructor in online asynchronous learning. She found that in three online college courses (one in library science and two in communication), the instructor was either intentionally or tacitly positioned as the expert nearly all the time. And, even when the instructor attempted to change positionality, there was a strong tendency by the students to continue to hold the instructor in the position of expert. This suggests that having a model of an expert voice in the learning environment may be natural for learners.

This study extends prior research by taking aspects of online facilitation proven to be effective and automating them in ways we did not find elsewhere in the literature. We suspected that an omnipresent expert providing immediate, specific feedback on teachers’ thinking would be accepted by teachers. This acceptance would show up both in teachers’ satisfaction and in their growth in the course (e.g., Lee et al., 2011).

Learning in intelligent tutoring systems

Research on intelligent tutoring systems has demonstrated not only that mathematics can be learned in these systems (e.g., Anderson et al., 1995; Han et al., 2019), but also that K-12 students who used such systems in place of a tutor 50% of the time improved as much as students who worked with a tutor for the entire study (Steenbergen-Hu & Cooper, 2013). In their meta-analysis, Steenbergen-Hu and Cooper (2013) reported on two other studies in which using an intelligent tutoring system had a greater effect than receiving no human support. Intelligent tutoring systems, although not widely used in PD settings, have been used in K-20 mathematics learning for decades (e.g., Anderson et al., 1995; Ma et al., 2014; Pane et al., 2014; Steenbergen-Hu & Cooper, 2013).

As noted above, we have been unable to locate studies of intelligent tutoring systems focused on mathematics PD. However, there is a body of research from which to learn about the design of feedback in these systems. In one systematic literature review that included studies from 2009 to 2018 focused on feedback, Cavalcanti and colleagues (2021) considered whether automatic feedback improves the student experience and how such feedback is generated. The researchers identified 63 studies from across a wide array of subject areas. They found that while 22 papers did not provide evidence of student performance as it related to feedback, the remaining 41 studies all showed a positive relationship between feedback and student performance. Further, they found that of the 39 articles that discussed their methods for generating automatic feedback, 15 compared the learner’s solution with a desired solution. Because we were using AutoTutor, we used this approach to automatic feedback.

As described by Ma and colleagues (Ma et al., 2014), AutoTutor is designed to match learner responses to text that represents expectations. In our case, we identified common misunderstandings and struggles teachers and students are known to have about proportional relationships (e.g., Copur-Gencturk et al., 2022; van Dooren et al., 2008) and created feedback aligned to the tasks that addressed those misunderstandings. The resulting tutor moves can consist of hints, feedback, prompts, or assertions (Graesser et al., 2004). In their meta-analysis, Ma and colleagues (2014) analyzed 107 articles about intelligent tutoring systems using this approach to feedback. They found that for postsecondary participants (n = 6,767), there was a significant effect size for the instruction of 0.43 (SE = 0.05, p < 0.05). For mathematics and accounting topics, regardless of age group (n = 8,038), the effect size was 0.35 (SE = 0.06), which was also significant (p < 0.05). When looking at knowledge type, regardless of age group, the researchers found a significant effect size (p < 0.05) for each of the four knowledge types they measured: procedural (0.39, SE = 0.05); declarative (0.37, SE = 0.07); mixed procedural and declarative (0.65, SE = 0.29); and not reported (0.43, SE = 0.06). In short, the authors found that intelligent tutoring systems using feedback structures similar to those used in our study led to higher outcomes than all other modes of instruction except small-group tutoring and individual tutoring; in those cases, the outcomes of each condition were statistically similar. Combined, the research on feedback in intelligent tutoring systems supports our design. Our research extends the literature on the effectiveness of intelligent tutoring by looking at a specific, novel population (teachers) and by considering different categories of knowledge (CK and PCK).

Summary

Overall, our review of the literature supports both our design decisions and the approach we took to creating this intelligent tutoring system. It also highlights some of the gaps in the literature our study addresses. Specifically, we are contributing to the research on mathematics teacher PD by examining the development of both CK and PCK of mathematics in a system supported by AutoTutor, which has proved effective in postsecondary settings for those studying a combination of procedural and declarative knowledge. Given the attention to feedback suggested in the literature on online mathematics teacher PD and the ongoing practical need to offer PD at scale, understanding whether this approach to feedback can support learning is critical for the creation of more scalable PD systems that align with the principles of high-quality PD (e.g., Burns, 2023; Desimone, 2009).

The intelligent, interactive, virtual program (IVIP) with just-in-time feedback

Because we wanted not only to create online asynchronous PD that was available to most teachers anytime and anywhere but also to create an interactive learning environment with just-in-time feedback, we took advantage of the affordances of intelligent, adaptive tutoring systems. The system allowed us to create an active learning environment in which teachers could build on their knowledge by engaging with the learning materials (e.g., Desimone, 2009). In particular, teachers solved problems, analyzed student mathematical thinking in a written solution or from a clip of mathematics instruction, and reflected on teaching practices. The system also provides a means to create an adaptive system that personalizes learning for the user based on content knowledge, mood, or emotions (Han et al., 2019). Consistent with the four conceptual components of intelligent tutoring systems (e.g., Ma et al., 2014), our system served as an interface through which the user engaged with the topics of interest. In particular, teachers communicated with a virtual agent (e.g., the talking head in the upper right in Figs. 1 and 2) that used natural language (voice or text) and allowed for users’ natural language responses (Nye et al., 2014).

Fig. 1 An example content knowledge activity

Fig. 2 A sample pedagogical content knowledge activity focused on analyzing teaching and reflecting on teaching practices

Given that receiving just-in-time feedback is critical for teachers’ learning, particularly for tailoring the learning process to teachers’ specific needs and existing knowledge (Philipsen et al., 2019), we relied on a set of key discourse moves in the system that were driven by an expectation–misconception framework. Specifically, for each activity, we anticipated how teachers would approach the activity (i.e., expectations or misconceptions), and we developed specific hints and prompts (e.g., asking teachers to explain their reasoning; highlighting an important element in the question) to lead them to the targeted answers. As a teacher answered an open-ended question, the dialog-based tutor analyzed the input to identify the parts of an ideal response the teacher had already mentioned and to provide hints to help the teacher explain each of the remaining concepts (Nye et al., 2021; see Table 1 for an example of a participating teacher’s interactions with the virtual facilitator).

Table 1 Example of teachers’ interactions with the virtual facilitator in one content knowledge learning activity
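To make the expectation–misconception mechanism concrete, the sketch below shows one minimal way such matching could work, using simple word overlap to decide which expectation a response has covered and which hint to deliver next. This is an illustrative simplification, not the actual IVIP/AutoTutor implementation, which relies on richer semantic matching of natural language responses; all task texts, function names, and thresholds here are hypothetical.

```python
# Illustrative sketch only: a word-overlap stand-in for the semantic
# matching an expectation-misconception tutor performs. All content,
# names, and thresholds are hypothetical.
import re

EXPECTATIONS_TO_HINTS = [
    ("the ratio of flour to sugar stays constant",
     "What stays the same about the two quantities in this recipe?"),
    ("both quantities are multiplied by the same factor",
     "How does each quantity change as the recipe is scaled up?"),
]

def words(text: str) -> set[str]:
    """Lowercase word tokens, ignoring punctuation."""
    return set(re.findall(r"[a-z]+", text.lower()))

def coverage(response: str, expectation: str) -> float:
    """Fraction of the expectation's words present in the response."""
    exp = words(expectation)
    return len(words(response) & exp) / len(exp)

def next_tutor_move(response: str, threshold: float = 0.6) -> str:
    """Hint at the first expectation the response has not yet covered."""
    for expectation, hint in EXPECTATIONS_TO_HINTS:
        if coverage(response, expectation) < threshold:
            return hint
    return "You have covered all of the key ideas."

# The response covers the first expectation, so the tutor hints at the second.
print(next_tutor_move("The ratio of flour to sugar stays constant."))
```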

We created the PD grounded in a model of the CK and PCK that teachers should have for quality teaching (Ball et al., 2008; Copur-Gencturk & Tolar, 2022; Desimone, 2009). We conceptualized CK as an understanding of the conceptual underpinnings of the mathematics teachers are expected to teach in school (e.g., Copur-Gencturk, 2021; Copur-Gencturk, 2022; Copur-Gencturk & Olmez, 2022) and PCK as teachers’ understanding of students’ learning of these concepts (e.g., common patterns and struggles in their learning) as well as their knowledge of the role instructional tools and practices play in promoting or hindering students’ learning of these concepts (e.g., Ball et al., 2008; Copur-Gencturk & Li, 2023). The particular content targeted in this program was ratios and proportional relationships, given that prior research has documented that both students and teachers need additional support in this content area (e.g., Brown et al., 2020; Copur-Gencturk et al., 2022; Izsák & Jacobson, 2017; Jacobson et al., 2018; Weiland et al., 2021).

The program included CK and PCK modules with several submodules, each of which targeted key components of the corresponding knowledge domain identified in prior work (Cramer et al., 1993; Fisher, 1988; Lamon, 2012; Lobato et al., 2010; see Table 2 for the content and learning objectives of each module).

Table 2 Modules, submodules, and learning objectives

Through multiple cycles of revision over a 3-year period, we developed and finalized the content and activities of the program. We began by conducting an extensive literature review and requesting input from PD facilitators who had conducted PD in ratios and proportional relationships, as well as from content specialists in school districts. We interviewed PD facilitators and content specialists from over a dozen school districts across the USA to obtain feedback on the module content and to learn more about teachers’ needs, the resources they had available, and the issues they saw in students’ learning of proportions. We then revised the content of the program based on this input and obtained feedback from scholars of mathematics who served on the Advisory Board of the project. The revised program content and module activities were then tested with six participants, all of whom had experience teaching mathematics in middle school. These teachers completed the program and answered a set of questions regarding their experience so that we could further improve the program content as well as the interactive nature of the program. The revised program was then completed by four middle school mathematics teachers. The team, led by the first author, met virtually with these participants as they were completing each activity. The teachers were asked a set of questions about each activity and about the overall program so that we could gather as much information as possible for the second round of revisions to the program content. We again revised the module activities based on their feedback. Additionally, we shared the modules with the Advisory Board to gain their feedback and revised the program accordingly.

The final program included two main modules: CK and PCK. The CK module included 35 interactive activities that aimed to enhance teachers’ mathematical understanding by providing them with opportunities to work on both qualitative and quantitative mathematics tasks (see Fig. 1), evaluate different solution strategies, and use multiple representations (e.g., Boston, 2013; Copur-Gencturk & Lubienski, 2013; Orrill & Brown, 2012). The PCK module consisted of 18 interactive, multistage activities organized into three submodules that aimed to enhance teachers’ PCK as it is used in the work of teaching through engagement with actual student work and middle school mathematics teachers’ instruction. Teachers were provided with opportunities to select and adapt tasks that were aligned with learning goals, unpack students’ work and connect the mathematical ideas in students’ responses, and reflect on teaching (see Fig. 2). In the PCK module, teachers also analyzed video clips of mathematics instruction to consider how to improve the mathematical understanding of the students in the videos (Boston et al., 2003; Middleton & van den Heuvel-Panhuizen, 1995).

Methods

Study context

In this study, we utilized data collected from two groups of teachers who completed IVIP with just-in-time feedback to investigate its role in teachers’ development of CK and PCK. The first group completed the program in early summer of 2021, whereas the second group was invited to complete the program a month after the first group had done so. We made slight changes in the program based on the feedback we received from the first group. We checked whether teachers’ learning differed by the group to which they belonged and did not find any statistically significant differences (t(55) = −0.70, p = 0.49 for the external CK measure; t(57) = 0.17, p = 0.87 for the diagnostic CK measure; t(55) = 1.12, p = 0.27 for the external PCK measure; and t(56) = −1.19, p = 0.24 for the diagnostic PCK measure). Thus, we did not control for group membership in our subsequent analyses.
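As an illustration, a group-equivalence check of this kind can be run with an independent-samples t-test. The sketch below shows a minimal version in Python with placeholder data; the cohort sizes and variable names are hypothetical, chosen only so that the degrees of freedom match those reported above for the external CK measure.

```python
# Minimal sketch of the group-equivalence check with placeholder data.
# Cohort sizes are hypothetical, chosen so that df = 30 + 27 - 2 = 55.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group1_ck = rng.normal(0.72, 0.10, size=30)  # cohort 1, external CK scores
group2_ck = rng.normal(0.72, 0.10, size=27)  # cohort 2, external CK scores

t, p = stats.ttest_ind(group1_ck, group2_ck)  # independent-samples t-test
df = len(group1_ck) + len(group2_ck) - 2
print(f"t({df}) = {t:.2f}, p = {p:.2f}")
```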

Because the program focused on ratios and proportional relationships, during the data collection phase of the study, we targeted mathematics teachers who were teaching at the grade levels in which these concepts are learned. In addition, to estimate the impact of our program accurately, we collected data from teachers located across the USA.

Teachers were contacted via email regarding the program and the study. Those who were interested in participating were invited to attend online meetings (scheduled based on their availability) to provide them with more context regarding the study and our expectations. During these meetings, the research team introduced the program and how to navigate the system as well as our data collection procedures. The teachers asked questions about the program, and those who agreed to participate in the study completed a set of surveys before and after they had completed the program.

Study sample

Of the 62 middle school mathematics teachers who agreed to participate in the study, 60 completed the program and at least one of the measures capturing the changes in their CK and PCK. As shown in Table 3, the majority of participants in the analytic sample were female and White, similar to the profile of the US teaching workforce. Those who held a credential in teaching mathematics made up 60% of the sample. More than half the sample (53.3%) entered teaching through a 4-year teacher education program, 26.7% entered through a 5-year teacher education program that offered a master’s degree, and 15% entered through an alternative teacher education program.

Table 3 Background characteristics of teachers in the present sample compared with a nationwide sample of US teachers

Data collection procedures

Teachers first completed a set of surveys to capture their background information and baseline CK and PCK of proportional reasoning before they began the program. Teachers who finished the presurveys were given access to the program immediately afterward, and each person had one month to complete the program. Each knowledge component was measured by two sets of assessments: (1) validated proximal measures that were closely aligned with the content targeted in the program, to diagnose teachers’ mastery of the targeted concepts (“diagnostic” measures) and (2) external distal measures, to capture the change in teachers’ overall CK and PCK of ratios and proportional relationships (“external” measures). Teachers completed the diagnostic measures after completing each subcomponent of the program, whereas they completed the external measures after they had completed the entire program.

Measures

Diagnostic CK

This scale was developed by the team lead (the second author) to capture the degree to which teachers had mastered the key ideas targeted in the modules. To establish response process validity, which is the alignment between the test developers’ intentions for an item and the test taker’s understanding of that item (e.g., Bonner et al., 2021; Bostic, 2021), we conducted think-aloud interviews with five participants who each completed one form (about half the items) and then participated in an interview about the form. To establish face validity, we analyzed the participants’ responses to each item to determine whether it was “working” (e.g., whether it was answered correctly or incorrectly for the intended reasons) and to identify which items to advance to the final item set. We shared items that appeared problematic or that did not seem to differentiate teachers with two members of our Advisory Board, one of whom was a mathematician and the other a mathematics educator. Finally, we collected data from over 200 middle school mathematics teachers for item validation. We identified 22 items for this scale by considering the expert feedback from the Advisory Board and teachers’ responses in the think-aloud interviews, as well as the performance of individual items in the validation study. We then created item response theory (IRT) scale scores by using data collected from teachers in the item validation as well as those in this study (N = 288). The empirical reliability of the scale was 0.71.
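For readers unfamiliar with the empirical reliability of IRT scale scores, one common convention computes it as the variance of the ability estimates relative to that variance plus the mean error variance. The sketch below illustrates this computation under that assumption, with placeholder ability estimates and standard errors; the function name and values are hypothetical, not the project’s code.

```python
# Hypothetical sketch: empirical reliability of IRT scale scores under
# one common convention, var(theta) / (var(theta) + mean(SE^2)).
import numpy as np

def empirical_reliability(theta: np.ndarray, se: np.ndarray) -> float:
    """Signal variance over signal-plus-error variance."""
    observed_var = np.var(theta, ddof=1)  # variance of ability estimates
    error_var = np.mean(se ** 2)          # mean squared standard error
    return observed_var / (observed_var + error_var)

rng = np.random.default_rng(1)
theta = rng.normal(0.0, 1.0, 288)  # placeholder ability estimates, N = 288
se = np.full(288, 0.64)            # placeholder standard errors
print(round(empirical_reliability(theta, se), 2))  # roughly 0.7 here
```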

External CK

We also created a scale to capture the changes in teachers’ CK of ratios and proportional relationships by using a subset of items developed for a proportional reasoning form by the Learning Mathematics for Teaching (LMT) Project (Hill et al., 2004). The scale included 11 questions and 25 items (i.e., some items were testlets). Cronbach’s alpha (an indicator of reliability) was 0.74. Teachers’ scores were computed by dividing their total number of correct answers by the total number of items in the scale.
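As a worked illustration of these two computations, the sketch below implements the standard Cronbach’s alpha formula, alpha = k/(k − 1) × (1 − Σ item variances / total-score variance), and the proportion-correct scoring described above. The data are random placeholders, not study data.

```python
# Hypothetical sketch: Cronbach's alpha and proportion-correct scoring
# for a respondents-by-items matrix of 0/1 answers (placeholder data).
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / total-score variance)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

rng = np.random.default_rng(2)
answers = (rng.random((60, 25)) < 0.72).astype(int)  # 60 teachers, 25 items
print(f"alpha = {cronbach_alpha(answers):.2f}")
print("first three scores:", answers.mean(axis=1)[:3])  # proportion correct
```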

Diagnostic PCK

For the diagnostic items on teachers’ PCK, we followed the same procedures and validation process as we used for the CK assessment. As with the CK assessment, we aimed to ask questions that were closely related to the content of the PCK modules and presented in ways similar to that content. For example, in the PCK module, instruction focused on ideas such as selecting tasks to use, interpreting students’ work, identifying errors, and making sense of representations; the PCK assessment items paralleled these concepts. The final PCK scale included 14 items. The IRT scale scores had an empirical reliability of 0.50.

External PCK

Teachers’ PCK was also measured by a 10-item scale consisting of 2- to 3-min-long video clips of authentic instruction on ratios and proportional relationships with corresponding open-text questions, as well as two multiple-choice questions. The video clips encompassed student–teacher interactions centering on ratio concepts. The video-based questions came from a pool of items that had been developed to measure teachers’ knowledge as it is used in the work of teaching and that had been used in prior work (Copur-Gencturk & Li, 2023; Kersting et al., 2012). Teachers were given a brief description of the context for each video so that they would have the background to understand the instruction presented. The two multiple-choice items designed to capture teachers’ PCK were developed by the LMT Project.

Teachers’ responses were scored using a 4-point rubric created by the first author and another researcher in mathematics education, based on their prior research on capturing teachers’ PCK (see Copur-Gencturk & Li, 2023). The data were blinded before coding to ensure that raters did not know whether the data came from a pre- or post-administration of the instrument or from which group of participants the data were collected. A Cohen’s kappa statistic of 0.81 indicated strong agreement between the two raters. The reliability of the scale was 0.61 (Cronbach’s alpha). Teachers’ scores on this scale were calculated by dividing the total number of points they received by the maximum number of points for the scale.
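For illustration, inter-rater agreement of this kind can be computed with an off-the-shelf implementation of Cohen’s kappa. The sketch below uses scikit-learn with made-up rubric codes; it assumes the 4-point rubric is scored 0–3 per response, which is our illustrative assumption rather than a detail reported above.

```python
# Hypothetical sketch: Cohen's kappa between two raters' rubric codes,
# plus the points-earned-over-points-possible scale score. Placeholder
# data; assumes the 4-point rubric is scored 0-3 per response.
from sklearn.metrics import cohen_kappa_score

rater_a = [3, 2, 0, 1, 3, 2, 1, 0, 2, 3]  # made-up codes, one per response
rater_b = [3, 2, 0, 1, 3, 1, 1, 0, 2, 3]
print(f"kappa = {cohen_kappa_score(rater_a, rater_b):.2f}")

max_points = 3 * len(rater_a)  # maximum possible points on the scale
print(f"scale score = {sum(rater_a) / max_points:.2f}")
```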

Prior online PD experience

We asked teachers before they began IVIP about any prior online PD experience they had. We asked whether they had participated in any online PD program, and if they had, we asked them to report the extent to which they felt they had learned from that program. Our rationale for including this variable in the analysis was that teachers’ prior online PD experience might help them adapt to learning in this kind of learning environment. Using this information, we created a variable with three categories: those who did not have prior online PD experience, those who had prior online PD experience with “some” reported learning, and those who reported learning “a lot” from a prior online program.

Data analysis

To investigate the change in teachers’ CK, we compared teachers’ scores on the pre- and posttests by using paired t-tests. To explore the change in teachers’ PCK, we followed the same procedure of comparing teachers’ scores on the PCK measures before and after the program with paired t-tests. Each knowledge component was measured by both diagnostic and external measures; thus, we report the results from four measures. To investigate the role of teachers’ prior learning experience from online PD, we conducted a set of regression analyses in which teachers’ posttest CK (or PCK) scores were predicted by their baseline CK (or PCK) scores and their prior online PD experience.
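A minimal sketch of this analysis pipeline follows: a paired t-test with a paired-samples effect size (mean gain divided by the SD of the gains), then a regression of posttest scores on baseline scores and the three-category prior-PD variable. The data frame, column names, and placeholder values are hypothetical; this illustrates the method, not the study’s actual code.

```python
# Hypothetical sketch of the analysis pipeline with placeholder data.
import numpy as np
import pandas as pd
from scipy import stats
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 60
df = pd.DataFrame({
    "pre_ck": rng.normal(0.72, 0.10, n),                  # baseline scores
    "prior_pd": rng.choice(["none", "some", "a_lot"], n),  # prior online PD
})
df["post_ck"] = df["pre_ck"] + rng.normal(0.04, 0.10, n)  # placeholder gain

# Paired t-test and paired-samples effect size (mean gain / SD of gains).
diff = df["post_ck"] - df["pre_ck"]
t, p = stats.ttest_rel(df["post_ck"], df["pre_ck"])
d = diff.mean() / diff.std(ddof=1)
print(f"t({n - 1}) = {t:.2f}, p = {p:.3f}, d = {d:.2f}")

# Posttest predicted by baseline score and prior online PD experience,
# with "none" (no prior online PD) as the reference category.
model = smf.ols("post_ck ~ pre_ck + C(prior_pd, Treatment('none'))", df).fit()
print(model.summary())
```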

Results

According to data from the external CK measure, the teachers’ initial CK score was 0.72 on average before they began the program (see Fig. 3). After completing the program, their CK score increased to 0.76 on average, a statistically significant change (an effect size of 0.40 SD, p = 0.004). A similar pattern was observed with the diagnostic CK measure in that teachers’ CK scores increased statistically significantly (p < 0.001), and the magnitude of this gain was more pronounced: teachers’ scores increased on average from 0.14 to 0.56, an effect size of 0.99 SD.

Fig. 3 Teachers’ mean scores on the content knowledge and pedagogical content knowledge measures before and after completing the program. Note: error bars indicate an interval of ±1 SD

Teachers’ PCK also increased over the period in which they completed the program. The mean PCK score on the external measure was 0.35 before the program and increased to 0.39 after the program, a statistically significant change (p = 0.01) with an effect size of 0.34 SD. The change in teachers’ PCK scores, as measured by the diagnostic scale, was similar in magnitude, 0.32 SD (p = 0.02); teachers’ scores increased from 0.28 logits to 0.40 logits on this measure.

Our analysis of the role of prior online PD experience, and of reported learning from such experience, in teachers’ learning from IVIP indicated that teachers’ prior experience had limited predictive power for their posttest scores on the CK and PCK measures (see Table 4). In general, compared with no exposure to online PD programs, a positive prior experience seemed to be associated with an increase in teachers’ posttest scores after taking into account their entry-level knowledge, although these results were not statistically significant. Similarly, a somewhat negative prior experience, captured by teachers’ reports of little or some learning from their prior experience, was linked to a smaller increase in posttest scores compared with no prior online PD experience; these results were also not statistically significant.

Table 4 Linear Regression Models Predicting Teachers’ Posttest Content Knowledge (CK) and Pedagogical Content Knowledge (PCK) Scores

Discussion

The potential of online asynchronous PD facilitated by a virtual agent for supporting teachers’ learning is a new research area (e.g., Mousavinasab et al., 2018). In this study, we aimed to add to the literature by exploring the extent to which a program that provides interaction and just-in-time feedback through an intelligent tutoring system could enhance teachers’ CK and PCK. We found, based on the results from both locally developed and external assessments, that an intelligent, adaptive PD program with just-in-time feedback could lead to improvements in teachers’ knowledge.

In our study, we used two measures for each knowledge domain, which confirmed that teachers could indeed enhance their CK and PCK in online asynchronous PD. Our findings echo prior research on intelligent tutoring systems, which has consistently shown knowledge growth (e.g., Anderson et al., 1995; Han et al., 2019). Given that CK and PCK are important indicators of the quality of teachers’ instruction and students’ learning (Baumert et al., 2010; Hill et al., 2005; Kersting et al., 2012) and that they are distinct elements of the content-specific expertise needed in teaching (e.g., Copur-Gencturk & Tolar, 2022), the gain in teachers’ knowledge after completing the program is critical. This finding is important and exciting, given the increasing need for teachers to have flexible ways to refine their understanding of the content they teach. Any teacher can access this training from the internet by using a basic web browser. More importantly, an intelligent, adaptive PD program with just-in-time feedback can be a scalable model that can be used to train teachers in other subject areas.

It is also important to draw attention to the design of the system (e.g., Dede et al., 2009). We assert that the success of our PD program is tied to the careful planning of the learning materials. We spent a significant amount of time developing and refining activities and interaction materials to identify potential differences in teachers’ understanding and move them toward the targeted level of understanding. Although we were able to find research on nuances in teachers’ understanding of mathematical concepts (i.e., CK), we struggled to find systematic research on teachers’ PCK of specific concepts, such as the ways in which teachers respond to common student struggles in their instruction. Thus, more research on teachers’ PCK is needed to create these kinds of learning environments in other content and subject areas. Still, even though designing the learning materials can be a time-consuming, detail-oriented process because of the limited prior work, once the learning materials are developed, scaling the program is a matter of sharing the URL with teachers who have internet access. This process is considerably different from scaling an intervention led by a human facilitator, which requires that both materials and training be created to support efficacious implementation across sites.

Our findings also indicated different levels of gain in CK and PCK in this environment, which aligns with prior research (Pape et al., 2015). Teachers seemed to gain more CK than PCK. This finding could be related to the low reliability of both the diagnostic and external PCK measures. Alternatively, it is possible that developing PCK is more difficult than developing CK and that teachers may need more opportunities to develop PCK than CK. Thus, further research with more reliable measures could shed light on this result.

We also noted disparities in the growth of teachers’ CK depending on the measure. The diagnostic measure was more sensitive to growth than was the external measure. This result could be due to the proximal nature of the items that were included in the diagnostic measure. That measure was explicitly aligned to the instruction in the modules in both content and language use, whereas there were subtle differences between the CK modules and the presentation of items in the external CK measure. However, the gains detected for the diagnostic and external PCK measures were much closer. One possible explanation for the similar levels of learning detected is that both measures focused on the knowledge as it is used in instruction.

Although these findings were not statistically significant, we want to draw attention to teachers’ prior experience in similar programs as potentially shaping their current learning experience. This seemed particularly true for those who reported having “some” learning compared with those who had a more positive learning experience. It is possible that some teachers struggle with learning on their own and prefer learning with other teachers in a face-to-face program, but our study results cannot answer these questions. Similarly, although we did not measure teachers’ comfort level with technology in this study, this factor could also play a role in their learning. Teachers who are not comfortable with technology might struggle to navigate the system, whereas those who are comfortable might use the program features more effectively. Taken together, we urge researchers to investigate in greater depth how teachers’ prior experience and comfort level with technology might influence their learning. We particularly urge extra attention when targeting low-income countries and rural areas, as those teachers typically have much less access to technology than their peers in higher income or more urban areas (UNESCO, 2023).

While this study provides initial empirical evidence that online asynchronous PD with a virtual facilitator supported by AutoTutor can improve teachers’ CK and PCK, further research is needed to determine whether the results would be similar if a large number of teachers completed the program. Although the middle school mathematics teachers in our sample came from different parts of the USA and were diverse in terms of their educational background as well as their school profiles, they still constituted a relatively small number of teachers. Exploring the extent to which teachers retain the knowledge they gained is also an important area for future studies, given that prior work has indicated such changes might not be sustained in every area of change observed (e.g., Copur-Gencturk et al., 2016). Additionally, future work is needed to investigate the possibility of developing such platforms in other countries. In part because of the global pandemic, teachers now have more access to the internet than before. However, there are still stark disparities researchers need to consider when pursuing such work. For example, 10% of US public school teachers lack adequate access to technology at home, though they have access at school (UNESCO, 2023). In contrast, in low-income countries, access is much lower, with only 7% of homes having adequate access and some primary schools having no computer access for teacher use (UNESCO, 2023). Despite these limitations, we believe scholars in the many countries in which online PD has been used and where internet access is not an issue would be able to investigate the possibility of using this approach to promote teachers’ CK and PCK in their own contexts. Such countries include most high- and middle-income countries, as well as some low-income countries (UNESCO, 2023). This number increases if such PD is formatted for smartphones, as more teachers have internet access by phone than through other digital devices (Burns, 2023).

When interpreting these findings, it is also important to note that both the diagnostic and external PCK measures had limited reliability. One underlying reason for the low reliability is that the PCK measures contain few items. Our plan is to improve the diagnostic PCK measure by developing more items to capture teacher learning more accurately. Still, future work is needed to develop measures of PCK that capture such knowledge more precisely. Because the sample size was small, we could not examine several important indicators of teachers’ professional background that might have played a role in their learning. Larger sample sizes would allow us to investigate how teachers’ professional background helps or hinders their learning in these types of programs.

It is important to note that our work focused only on the improvement of teacher knowledge from this type of PD. Yet a PD system like this one would allow researchers to collect data from teachers as they progress through learning activities, which in turn would allow them to uncover how teachers learn during PD. These systems have the potential to uncover how teachers develop CK and PCK as they progress through the program and explore the possible learning patterns that teachers with different levels of understanding of CK and PCK exhibit. Finally, although our study provides evidence of the development of teacher knowledge in online asynchronous PD supported by AutoTutor, future research is also needed on the extent to which features of this system are more effective than others. Our study provides initial evidence of the potential use of such a system to develop teachers’ CK and PCK, yet we cannot pinpoint specifically how the feedback teachers received played a role in their knowledge development. Experimental studies with different versions of the same program are needed to distinguish the individual roles of certain design features.

Conclusion

In this study, we set out to determine whether an intelligent, interactive, online asynchronous PD program with just-in-time feedback could lead to changes in teachers’ CK and PCK. The results of our work suggest that this approach can be an effective tool for supporting teacher learning of both CK and PCK. Although additional research is needed to understand the mechanisms whereby it is effective and how similar systems could be effective in other countries and with other subjects (e.g., science and literacy), this study provides initial empirical evidence that this may be one approach to tackling the growing need for accessible and scalable PD. It also opens the door to creating interactive learning platforms for teachers to develop the knowledge and skills needed in teaching.