Introduction

Post-secondary STEM faculty have the opportunity to educate future generations of scientists and technical experts, but they typically do so without explicit training in educational best practices (Auerbach & Andrews, 2018; Stains et al., 2018). Understanding of supported strategies in STEM teaching has improved considerably in past decades (AAAS, 2009; Bennett et al., 2020; Freeman et al., 2014; Handelsman, 2007; NRC, 2003; Schwartz et al., 2016), but the professional development needed to create truly excellent STEM learning remains inconsistent (Bradforth et al., 2015; Mack & Winter, 2018; Seymour & Hunter, 2019; White et al., 2021). The underlying problems in training STEM faculty, which include misaligned reward structures and undervaluation of teaching and learning, persist (Erdmann et al., 2020; Gess-Newsome et al., 2003). This lack of transformative teaching for professors is a major factor in systemic problems like poor retention and inequitable outcomes, especially for students from backgrounds that are traditionally underserved (Banilower et al., 2018; Pfeifer et al., 2023; Sithole et al., 2017; Theobald et al., 2020; Zemenick et al., 2022). Exploration of better professional development during STEM instructional careers is ongoing (Bouwma-Gearhart et al., 2019; Callens et al., 2019; Chen et al., 2021; Du et al., 2019; Erdmann et al., 2020; Stephens et al., 2012; Von Vacano et al., 2022), but will only be maximized by innovations for early-career scientists to develop their practice as inclusive educators prior to the start of their faculty roles (Calabrese Barton & Tan, 2019; Dewsbury, 2017). The program described below is intended to help in these efforts.

Research on professional development in STEM highlights specific benefits of teaching techniques that facilitate active learning and support students’ emotional needs (Beals et al., 2021; Handelsman et al., 2022; Jones & Kahn, 2017; NRC, 2015; Robnett et al., 2018). These complex teaching practices may be best learned in the formative stages of new faculty careers, both for uptake of modern teaching techniques and also for the broadest possible impact on students (Handelsman et al., 2004; Moreira et al., 2019; NRC, 2018). Most STEM teaching faculty build scientific disciplinary habits in graduate school, but few future STEM faculty have opportunities to learn and practice interpersonal teaching techniques (Henderson & Dancy, 2007). A similar lack of professional development around assessment hampers faculty (Brownell & Tanner, 2012), especially as summative assessments drive much of student anxiety and dissatisfaction within STEM education (Drew, 2011; Green et al., 2007; Kuh et al., 2014). Many STEM teaching faculty have only limited exposure to socioemotional teaching practices (Dewsbury, 2017), even as the need for more inclusive teaching in STEM is becoming increasingly apparent (Killpack & Melón, 2016; O’Leary et al., 2020). Exposure to evidence-based instructional practices during graduate training is likely to catalyze ongoing sustainable growth with socioemotional tools that help students to identify with and be resilient in science (Darling-Hammond, 1998; Gasser et al., 2018; Rosenberg et al., 2018; Ruzek et al., 2016; Tharani et al., 2017; Thomas & Zolkoski, 2020; Zeichner, 1987).

Instilling active, inclusive teaching skills is a natural continuation of widespread efforts to improve early-career capacity in STEM teaching. National calls for better graduate training programs have echoed this need (Leshner, 2018; Leshner & Scherer, 2019; Love Stowell et al., 2015; Tanner & Allen, 2006), as well as the inherent benefits of pedagogical training for the research field (Feldon et al., 2011). Some of these needs may be met by course-based learning about pedagogy (Baumgartner, 2007; Deshler et al., 2015), while other methods include explicit mentorship (Lockwood et al., 2014), community-based practice (Brower et al., 2007; Price et al., 2021), or apprenticeship in K-12 teaching (Ufnar & Shepherd, 2021). Frequently, these efforts are targeted at the level of the graduate teaching assistant (DeChenne et al., 2015; Deshler et al., 2015; Gilmore et al., 2014; Reeves et al., 2016; Rivera, 2018; Schussler et al., 2015; Wan et al., 2020), which is an environment that is well suited for improving TA-led classroom effectiveness and introducing graduate trainees to pedagogical development. In this paper, we focus on practice-based professional development for future instructors (Stroupe et al., 2020) to build on these prior efforts.

The Science Teaching Experience Program for Upcoming PhDs (STEP-UP) at the University of Washington was built to provide a mentored teaching experience for doctoral students in STEM disciplines. STEP-UP is a teaching professional development experience advocated for by research, and was originally proposed and designed by STEM graduate students (Love Stowell et al., 2015). Their goals in creating this training program (in the first cohort of which they participated) were to increase their range of skills that would transfer into research professions, to gain and practice skills that would serve them in mixed research/teaching careers, and to learn more about the teaching methods that would make them more employable in a wider range of industries. During four cohorts, the program featured an in-person autumn course for future professors to (1) learn about active learning strategies and the supporting literature (Bielik et al., 2023; Yik et al., 2022), (2) consider how issues of identity, equity, and justice affect science teaching and learning (Byars-Winston et al., 2015; Flores et al., 2023; Shultz et al., 2022), (3) teach a mock class session for an audience of volunteer undergraduates, and (4) develop their practice and identities as educators through self-reflection and feedback. Over the subsequent two quarters of each cohort, participants then designed and taught an undergraduate course (see Fig. 1). While all autumn trainee courses were taught in-person, trainee-taught courses in the spring of 2020 were taught synchronously online.

Fig. 1

This diagram represents the primary elements of the STEP-UP training program and gives examples (in smaller type) of the activities and practices happening during different quarters of the year-long program. For example, at the end of autumn quarter, graduate trainees do exercises during class sessions about their teaching reflections and philosophy.

Core teaching practices in STEP-UP (Table 1) form the basis for class sessions, during which graduate trainees spend most of their time in direct practice of teaching techniques and methods ranging from microskills (like deliberately pausing after a question) all the way up to macroskills (like designing and carrying out an entire 20-minute active learning module). Scaffolding these practice sessions to build on previously-learned elements of teaching and to become progressively more difficult is important, as are frequent opportunities for metacognition around those practice sessions and how this will translate to authentic teaching (Ford & Yore, 2012; Van Es & Sherin, 2002). For clarity hereafter, we refer to participants in this program as graduate trainees when their own learning is central and instructors when highlighting the perspective of their undergraduate students. As a core design principle of the program, STEP-UP intentionally and frequently engages participants in problems of practice around equity and identity in science education, with the broader goal of developing their skills to address some of the inequities found in science classrooms.

Table 1 Core teaching practices

Theoretical Framework

This study of STEP-UP draws on Clarke and Hollingsworth’s (2002) Interconnected Model of Professional Growth (IMPG) in teaching, which describes the specific mechanisms and the concurrent, nonlinear nature of teacher professional growth in four domains: Personal Domain, External Domain, Domain of Practice, and Domain of Consequence. Clarke and Hollingsworth developed this model based on several longitudinal studies on teachers enacting new practices and reflecting on both teaching and student learning, as they became more central participants in a community of practice (Wenger, 1998). This model accounts for teacher learning as both the development of different kinds of teacher knowledge (Shulman, 1987) and as practice through apprenticeship (Lave & Wenger, 1991). This includes accounting for the situated nature of learning by documenting the features of a social setting that afford or constrain the practices that make up learning, as teachers move through cycles of enaction and reflection. This model is useful for examining a practice-based teacher education program like STEP-UP because the IMPG accounts for the complex ecology of learning (Scott et al., 2020) that trainees encounter through the multiple entry points and interrelated processes of the year-long course/teaching sequence. Course features such as role-play, introduction to educational research, and collaborating with other trainees to design and teach multiple times provided multiple entry points for trainees to approach teaching. Because we hope this research will inform the administrators of similar programs in the future, we used this model to focus on the specific programmatic features of STEP-UP that supported graduate trainees. This study focuses on the following activities in each domain of the IMPG:

  • External Domain: Teachers encounter new information or practice, such as seeing new practices modeled and participating in in-service meetings. In STEP-UP, this included immersion and reflection on inclusive, active teaching strategies in the trainee course (autumn). The External Domain influences the Domain of Practice and the Personal Domain, which in turn may influence the Domain of Consequence.

  • Domain of Practice: Teachers try out new practices through professional experimentation. In STEP-UP, graduate trainees engaged in this domain both through a variety of role playing and practice in the trainee course (autumn) and again when they taught undergraduate courses (spring).

  • Personal Domain: Teachers develop changes in knowledge, beliefs, and attitudes based on their professional experimentation with new information or practices. In STEP-UP, this includes trainees’ perceptions of these changes during and after the program, about their own teaching practice, their identities as teachers, and about student outcomes and student learning.

  • Domain of Consequence (Salient Outcomes): Professional experimentation with new information or practices, as well as related teacher changes in beliefs and attitudes, result in undergraduate or classroom outcomes. In our study of STEP-UP, we focused especially on how trainees’ professional experimentation with skills they developed in STEP-UP led to undergraduate perceptions, reported in surveys, of emotional well-being and self-efficacy as learners (Lawson et al., 2007; Trujillo & Tanner, 2014; Zee & Koomen, 2016). In the data collection methods, self-efficacy is also described as confidence, using common language. Ideally, programmatic features of STEP-UP that support observable trainee outcomes will be identified and described.

Current Study

In this study of STEP-UP, we use the IMPG to map the experiences of individuals and groups of graduate trainees in each domain, as they participate in professional learning, apply what they have learned through teaching, reflect on their experience, and plan for future iterations. These cycles of practice, enactment with students, and reflection throughout the program are also in line with the approaches of Design-Based Research (DBR), discussed further in the next section.

Research Questions

This study draws on the IMPG (Clarke & Hollingsworth, 2002) for teachers (see Fig. 2) to address the following questions:

  1. What features of a practice-based teacher education program best help graduate trainees increase enactment of and reflection on best practices?

  2. What teaching practices were transferred from the practice-based teacher education program to the domain of practice (i.e., enactment)?

  3. Which of these practices are most valuable in terms of undergraduate students’ learning and engagement?

  4. Which of these practices are most valuable in terms of graduate trainees’ domains of consequence and personal knowledge?

Fig. 2

Research questions from this study are mapped onto a close adaptation of Clarke & Hollingsworth’s (2002) Interconnected Model of Professional Growth (IMPG).

Methods

To design the program and address our research questions, we took a design-based research (DBR) approach (Anderson & Shattuck, 2012; Barab & Squire, 2004; Scott et al., 2020; The Design-Based Research Collective, 2003). Design-based research situates research in real-world educational settings, focuses on the design and testing of an intervention, and uses mixed methods (Scott et al., 2020). Multiple iterations of a design-enact-study cycle are conducted and the findings in each cycle inform further enactment. This approach is appropriate for this study because the purposes of DBR are to inform practice, contribute to theory, and develop products (Bannan-Ritland, 2003; Barab & Squire, 2004). While DBR studies generally do not yield measurable effect sizes, they provide “rich descriptions of the contexts in which the studies occurred, the challenges of implementation, the development processes involved in creating and administrating the interventions, and the design principles that emerged” (Anderson & Shattuck, 2012). DBR studies often include both quantitative and qualitative approaches, which provide opportunities for both graduate trainees and undergraduate participants to share their experiences and give feedback on the program activities. In particular, observations and interviews/focus groups have the potential to provide high internal validity (LeCompte et al., 1993). DBR aligns with this study’s goals to contribute to program development and to the broader goal of supporting teaching professional development experiences for advanced doctoral students in STEM (Connolly et al., 2018).

Throughout the analysis, we used a DBR approach to better identify design principles and instructional approaches that can be generalized to other programs to support mentored teaching experiences for early career STEM faculty. A DBR approach allows us to contribute to the broader field and further specify our theoretical model by providing practical principles for the design of learning environments in the service of specific learning outcomes. DBR allows for a collaborative, mixed methods approach to the analysis in which expertise from the learning sciences can contribute to an interdisciplinary perspective that provides a “rich picture of how the instructional tools and their implementation influence student learning” (Scott et al., 2020).

The first year of data collection was a pilot of the program and focused primarily on iterative refinement of tools. In this study, the tools, analysis, and results described are primarily from data collected in 2021–2022. One important exception is the inclusion of the short Undergraduate Exit Ticket survey; data for this survey from both the 2019–2020 and 2021–2022 academic years are reported here. All exit tickets were collected using online surveys, so the online nature of spring 2020 courses did not impact the method of collection of these data.

In addition to quantitative survey data, we also collected qualitative data from surveys, observations, and focus groups to provide rich accounts of graduate trainees’ and undergraduates’ experiences to address our research questions. Below, we describe these methods in more detail by research question, including a summary of data collected in Table 2.

Table 2 Data collected

Participants

Participants were late-stage STEM PhD students enrolled in a large, public university in the Northwest region of the USA. We collected data from 19 graduate students (trainees) enrolled in the program (6 in 2019–2020; 13 in 2021–2022) and 355 undergraduate students (primarily seniors in STEM majors) enrolled in the trainees’ courses. Graduate trainees were in their 4th year of a STEM Ph.D. program on average; 37% self-identified as students of color, 53% as women, 26% as neurodivergent, and 26% as LGBTQ+. Participation was voluntary and mechanisms for protection of research participants were controlled by the university’s institutional review board (under study #UWIRB00006242). The experience equates to training on the order of roughly 30% FTE (full-time equivalent) for three quarters and is free for participants.

Data Collection

Surveys

To examine features of practice-based teacher education (RQ1), we distributed surveys at the end of the quarter (Supplemental A) to collect responses from graduate trainees about their experiences, including an Autumn Course Post-survey and a Spring Teaching Survey (Table 3). This survey focused on trainee perceptions of confidence in a set of core teaching practices, based on the teaching development course design. In 2021, we used a retrospective pretest survey model that asked respondents to rate their perceptions at the time of the survey (post) and then to recall their perceptions before the program started (Gouldthorpe & Israel, 2013). This model is particularly useful for addressing response-shift bias, in which respondents shift their frame of reference used to answer pretest and posttest questions, or when they might over- or underestimate their ratings in the pretest based on limited knowledge (Gouldthorpe & Israel, 2013). Since the graduate trainees’ prior knowledge of teaching practices was unknown before their autumn course in teaching methods, using a retrospective survey after trainees’ experience with the focal teaching practices (post-autumn-course, pre-spring) was the most appropriate method. Additional posttest questions prompted graduate trainees to reflect on their expectations for the course, intent to implement teaching practices, and suggestions for additions/revisions to the course. Finally, we asked trainees to identify which course features they believed best supported their confidence and then to elaborate on their responses in open-ended survey questions.

Table 3 Analysis of data collection tools by research question

Course Observations

To examine the development of teaching practices from professional development through mentored teaching experiences (RQ2), we conducted Course Observations of graduate trainees’ teaching. Graduate trainees recorded their teaching in at least one class session (~35–65 minutes), which occurred either in-person or online (6 online observation videos in 2020; 13 in-person observation videos in 2022). We created an observation protocol (sample questions in Supplemental A) based on the observable teaching practices and then developed content logs for each video. To develop content logs, we conducted repeated viewings of the videos to document enactments of graduate trainees’ observable teaching practices according to the project’s theoretical framework and course design (Derry et al., 2010). This strategy allowed for triangulation of the Course Observations data with survey, focus group, and exit ticket data (Denzin, 1978).

End-of-year Focus Group

The End-of-year Focus Group examined graduate trainees' overall experience in STEP-UP (6 in 2020; 12 in 2022). In 2022, we conducted two 75-minute focus groups, with six graduate trainees in each session. Graduate trainees reflected on their aggregated survey data, completed a short journaling activity, discussed their experiences with successful teaching strategies, and reflected broadly on the program and their preparation for the future (e.g., job market).

Undergraduate Exit Ticket

To examine the value of teaching practices for undergraduate learning and engagement (RQ3), we distributed the Undergraduate Exit Ticket (a short survey; sample items in Supplemental A) to those enrolled in STEP-UP trainees’ courses in spring 2020 (Likert-scale items) and 2022 (open-ended items). We asked students to complete exit tickets after three class sessions. The prompts in the exit tickets asked students to describe how equitable and inclusive their class was that day, their perceived opportunities for active participation, and how these compared to their typical STEM classes. In 2022, students were also prompted to describe anything their instructor may have done to impact their confidence or well-being as learners. Students earned participation credit for their responses, but could opt for their responses to not be included in the research project. The results presented here include descriptive statistics for both quantitative data from 2020 and coded qualitative data from 2022.

Alumni Interview

To examine the value of teaching practices to early career STEM faculty (RQ4), we conducted Alumni Interviews with three alumni from previous training years as they transitioned into teaching faculty roles. Alumni who had already graduated and continued into teaching careers were recruited for interviews from the prior cohort. Each Alumni Interview lasted approximately 30–45 minutes and focused on their current positions/institutions, their experiences on the job market and in a new position, how the training program may have contributed to their job search and current role, and how they are currently applying what they learned with STEP-UP. We recorded each interview, created transcripts, and then analyzed each transcript using the qualitative methods described below.

Analysis

To analyze data from the tools described above, we used both quantitative and qualitative analysis methods. For Likert-scale items in the surveys and exit tickets, we used descriptive statistics to describe changes in graduate trainees’ perceptions. Because of the small sample size for each dataset, we did not conduct tests to determine statistically significant differences, but instead triangulated the quantitative results with qualitative data to strengthen our claims. We report the percentage of “positive” responses—the sum of the top two positive responses on a five-point scale, typically Agree + Strongly Agree.
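The "positive" response percentage described above can be illustrated with a minimal sketch. The ratings below are hypothetical and are not data from this study, and the function name `percent_positive` is our own. The sketch also assumes that a pre-to-post change is the simple difference between the two percentages (in percentage points), which the text does not specify.

```python
from typing import Sequence


def percent_positive(ratings: Sequence[int], threshold: int = 4) -> float:
    """Percentage of ratings in the top two categories of a 1-5 scale.

    With the default threshold of 4, this counts responses of 4 and 5
    (e.g., Agree + Strongly Agree, or Confident + Very Confident).
    """
    if not ratings:
        raise ValueError("ratings must not be empty")
    positive = sum(1 for r in ratings if r >= threshold)
    return 100.0 * positive / len(ratings)


# Hypothetical responses from 13 respondents to one retrospective item
# (illustrative values only, not study data).
pre_ratings = [2, 3, 1, 2, 4, 3, 2, 3, 2, 1, 3, 2, 3]
post_ratings = [4, 5, 4, 4, 5, 4, 3, 5, 4, 4, 5, 4, 4]

pre_pct = percent_positive(pre_ratings)
post_pct = percent_positive(post_ratings)

# Assumption: change is reported as post minus pre, in percentage points.
change_points = post_pct - pre_pct

print(f"pre {pre_pct:.0f}%, post {post_pct:.0f}%, change {change_points:+.0f} points")
```

Collapsing a five-point scale into a single top-two percentage keeps small-sample results easy to read, at the cost of hiding shifts within the lower categories (for example, from 1 to 3).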

To analyze qualitative data in open-ended survey items, focus groups, and exit tickets, we conducted qualitative coding using a combination of constant comparative method (Glaser & Strauss, 1968) and analytic induction (Robinson, 1951). This combination is typical of many approaches to coding because it allows for the testing of hypotheses while allowing for unanticipated outcomes (Merriam & Tisdell, 2015). In this study, the core teaching practices formed a framework for our inquiry as phenomena of interest (analytic induction) and we used the constant comparative method of analysis (both within and between trainee responses) to generate categories that described how graduate trainees and undergraduate students experienced those phenomena. For each dataset, we identified codes based on the teaching practices we expected to arise in each dataset, then iteratively applied and refined both the codes and hypotheses as we analyzed each dataset. The constant comparative analysis also allowed for triangulation between parts of a complex ecology of learning (Scott et al., 2020), in which we compared or cross-checked between datasets (e.g., surveys and observations) and between researchers in order to improve internal validity (Denzin, 1978, 2012).

For focus group and interview data, the research team also employed coding consistency checks by member checking preliminary findings with both the program director and the program participants themselves at different points during the analysis. By involving people who have direct experience and concerns with the program and the data, member checking served to enhance the credibility of the findings (Thomas, 2006).

Finally, informed by work on how inclusivity and emotional well-being are linked to student learning of science (Gannon, 2018; Lee et al., 2021; Moriña, 2017; Ruzek et al., 2016; Yorke et al., 2021), we focused our analysis of the student exit tickets in similar directions. In the analysis of the 2020 undergraduate exit tickets, we focused on how undergraduate students identified courses as “more inclusive compared to other courses.” We included this subset of the 2020 data in our analysis for this study because the wording in these survey items remained appropriate to the latest iteration of data analysis. In the analysis of the 2022 dataset, we first coded a sample of the responses using the qualitative methods described above (constant comparative method and analytic induction) to identify trends in open-ended responses. The analytical focus was on student perceptions of how instructors impacted students’ emotional well-being and self-efficacy as learners (Bandura et al., 1999; Connolly et al., 2018; Gasser et al., 2018; McMullen et al., 2015; Seymour & Hunter, 2019).

Positionality of the authors

We include the positionality of the authors to situate ourselves within the context of this study. Qualitative coding and interpretation are subjective, and researcher experiences may influence coding, although we made every effort to be objective. One co-first author (S.S.) is an external research and evaluation consultant with an M.Ed. and PhD in science education. She has extensive experience in both design-based research (DBR) and program evaluation. She co-designed the data collection tools and conducted the bulk of the data analysis. The other co-first author (B.W.) is an affiliate faculty member at the institution studied and a professor at a local community college. He has an MS in molecular science and a PhD in education. He was the director of the STEP-UP program and research PI, although, in order to minimize bias, he did not himself collect or analyze data. The third author (B.V.) is a Ph.D. student focused on discipline-based education research (DBER) while situated in a traditional biology department. She has undergraduate and master’s degrees in biology. Her master’s degree research was also focused on DBER. She helped to interpret results and helped with manuscript preparation.

Results

Research Question #1: What features of a practice-based teacher education program best help graduate trainees increase enactment of and reflection on best practices?

Teaching Self-efficacy

Analysis of graduate trainees’ survey data demonstrated that STEP-UP contributed to their self-efficacy in all of the focal teaching practices over the course of the program. The greatest gains in self-efficacy overall were as follows (Table 4): activate passive lecture material using active learning techniques (86% increase pre to post), create a syllabus (86% increase), assess students in ways that challenge higher-level thinking (79% increase), design a class session starting from learning goals (79% increase), and strategies for calling on students gently (79% increase).

Table 4 Results from retrospective pre-survey taken by trainees before and after STEP-UP autumn course (n=13). Percentages in the first two columns represent total positive responses (confident, 4, and very confident, 5). The final column shows the change from pre to post scores. (Scale: Not at all Confident = 1, to Very Confident = 5.)

Graduate trainees reported that their confidence in teaching practices continued to increase during their spring teaching experiences (Table 5). All trainees reported a continued increase in their confidence to frame active learning strategies while teaching, which they attributed to the heavy focus in the autumn course. This makes sense in light of prior work focusing on learning communities, such as in Polizzi et al. (2021) and Zhou et al. (2023). Other frequent self-efficacy gains (83% of trainees) during teaching were in asking provoking questions, assessing students using methods likely to build trust, creating a syllabus, designing a class session starting from learning goals, and writing an effective learning goal. Graduate trainees reported that the experience of teaching multiple times helped them gain self-efficacy and, importantly, comfort in their teaching (Table 5). This allowed them opportunities to try new things, reflect on the process, and pivot as needed. Fewer trainees reported increases in confidence to productively pull meaning out of student-written teaching evaluations (33% of trainees, likely because they had not received these at the time of the survey) and to connect emotionally with students (42% of trainees). Several graduate trainees reported that they faced challenges in connecting with students for a variety of reasons, including COVID-related absences, masking, and teaching only a section of a course. Trainees noted that opportunities to interact with students on a 1:1 basis made a difference in how they were able to develop skills in this area as well.

Managing a classroom of students allowed me to experiment with different strategies and receive real-time feedback about student understanding and engagement. If one approach was not working, I could pivot and try another and see tangible changes in student responses. I also learned from watching my co-instructors' approach to teaching the same group of students. –Spring Survey

Table 5 Results from far-post survey taken by graduate trainees after their spring teaching experience. Percentages represent the trainees who indicated that their confidence increased during spring teaching. (Scale: decreased, stayed the same, increased)

Key Course Features

Graduate trainees identified a number of features unique to the autumn course that supported their confidence in teaching practices in the surveys. The most valuable features were opportunities to practice microskills, role play activities in which graduate trainees tried out scenarios as both instructors and students, collaborating with their colleagues (being part of a cohort of learners), and discussing educational research. For most graduate trainees, these activities represented unique opportunities in their graduate career to practice key teaching skills and iteratively design course syllabi, activities, and assessments. More information can be found in Supplemental B.

The teaching activities we did in the fall were very helpful for developing the teaching strategies listed [here]. My co-teacher and I also did a practice lesson before the course started and that helped us improve our framing of active learning activities. –Spring Survey

Research Question #2: What teaching practices were transferred from the practice-based teacher education program to the domain of practice (i.e., enactment)?

Enactment of Teaching Practices

Observations of graduate trainees’ spring teaching provided evidence for how trainees enacted strategies they learned in STEP-UP. The observation protocol was designed to focus on the observable STEP-UP practices that were also the focus of the autumn and spring surveys. Graduate trainees demonstrated their use of a number of instructional strategies aimed at supporting student confidence and emotional well-being, which are two primary aims of STEP-UP. The primary teaching strategies evident in observations were active learning strategies and asking provoking questions, as well as gently throwing attention to students and giving instructions for active learning activities. (Gentle “throws” are those transitions in attention that give the next speaker useful amounts of time and encouragement to speak without delaying the overall conversation; this technique is used widely in broadcast journalism to position speakers for smooth responses.) Graduate trainees employed some strategies for supporting students socioemotionally and effectively assessing students, often through the use of discussion questions.

Active Learning Strategies

Graduate trainees implemented two categories of active learning strategies most frequently in the observations of teaching sessions. The most frequently used strategy was having students discuss an instructor-posed question with a neighbor or small group. This was often followed by a gentle throw to a group or a region of the room to share student responses, though instructors often asked for a response from the whole room and called on student volunteers. Most examples of the strategy were a modified version of think-pair-share (e.g., turn to your neighbor or group, often with think time first). Polls (online or thumbs/hands) were also a common follow-up. The second strategy trainees commonly used was organized group work. When graduate trainees used this strategy, students often engaged in analysis and then presented figures, paper summaries, etc. In one class session focused on data visualization, small groups analyzed progressively more complex data visualizations, came up with improvements, and then practiced skills in groups that they would need for a final project.

Trainees frequently supported students by giving explicit instructions for active learning. Most often, graduate trainees used this to help students understand what was going to happen next, the format of the activity, and what the expectations for participation were. In some instances, the instructor gave some sense of purpose (e.g., you’ll need to do this on your midterm exam or in the final assignment; this is a debate that scientists in my lab are having, etc.).

Supporting Students Socioemotionally

In the observations, trainees regularly made efforts to connect with students, often by expressing enthusiasm for material, connecting their professional experiences to the course work, and/or making current debates in the field or in their own labs explicit. Graduate trainees also took opportunities to communicate to students that instructors had their interests at heart, primarily by noting material that was complex, reassuring students of what they did not need to know, acknowledging and explaining how instructors were listening to student input, explaining why institutional or class structures were the way they were, and giving examples of how instructors designed the course to reduce the overall cognitive load for students. Student exit tickets noted that these strategies had a positive impact on student learning.

Graduate trainees posed questions to assess students’ understanding in ways that were likely to build trust between instructor and student. Across the observations, trainees frequently made use of “gentle throws” to ask questions to groups or sections of the room. Graduate trainees consistently responded to students’ contributions to class in positive ways, encouraging and affirming student responses and regularly revoicing and elaborating on student answers. Importantly, they deftly handled incorrect or partially correct student responses, identifying which parts of a student response were correct and processing the answer out loud with students to break down any partially or wholly incorrect responses in a productive and encouraging way. In several cases, the trainees used a polling app that allowed students to respond anonymously but also see their peers’ responses, then engaged the whole class in breaking down the correct answers.

Questioning

Finally, graduate trainees’ questions to students, both pressing and probing, were very common in observations. As the trainees noted in the focus group, they had many opportunities to practice using provoking questions. Trainees used complex questions throughout each session and also specifically identified recall questions or rote questions meant to help everyone go through a process, or set of steps, or to ensure a shared baseline understanding or review from previous course work. This strategy was often used in place of instructor lecture; students were encouraged to contribute and hear from each other.

There were some instances in the observations in which it was apparent that the trainee was reflecting on their practice in the moment, evidenced by posing a question to students and then immediately revising it into a more meaningful question requiring higher-level thinking. For example, in an interaction with a small group in a breakout room, one trainee helped students decide which statistical tests would be appropriate for their hypothesis about crow behavior. The trainee went back and forth between asking students a number of probing questions to help them narrow down their ideas to even more specific statistical tests (one-tailed vs. two-tailed t-test) and giving them “just in time” information to explain or further elaborate on their choices.

You are correct, it is a chi-square test. So yeah… do you know the difference between a — or let me backup -– uh…what do you know about the differences between these two tests?

The difference between the quality of the trainee’s initial question and their revised question, and specifically the higher-level thinking required to answer the new question, is evidenced by the yes/no answer the first question would yield compared with the thorough explanation a student gave about the goodness of fit.

Trainee Reflections on Enactment of Teaching Practices

In the focus group, graduate trainees described a number of examples of supporting students’ emotional well-being and self-efficacy. One area of heavy support they noticed was COVID-related: absences, inability to contribute to group projects due to illness, and gaps in knowledge (presumably because of missed classes or difficulty completing classwork during the height of COVID responses). Trainees noted that their interactions with students in these areas were opportunities to practice supporting students’ well-being broadly, but also provided instructors with positive feedback about students’ experiences of the course. One trainee described how students who were absent with COVID contacted instructors directly. Students were apologetic and expressed disappointment at missing class because they were enjoying the work so much. In another instance not related to COVID, a student emailed their instructors to let them know that they would miss class due to a panic attack. The graduate trainees noted that the act of sharing this information alone was evidence of student comfort in the course. They went on to have several positive interactions with this student over the quarter: the student went beyond the requirements for missing class, seemed to take more pride in their project, and came to office hours to learn more about getting a degree.

Trainees noted how their use of active learning strategies seemed to have a positive impact on students as well (as will be explored in further detail in the next section). One trainee noted that students who were less vocal during class completed impressive presentations and had positive email exchanges with the instructor about their growth over the quarter. Another trainee noted that walking around to hear small group discussions allowed them to bring some of those conversations back to the large group and that students were much more willing to engage. Finally, a graduate trainee noted that rather than just reading their final presentation slides, as is typical, their undergraduate students incorporated active learning strategies into their final presentations.

Supplemental C and Supplemental D describe highlights from trainees’ reflections on their teaching, specifically their stories about a teaching strategy they used successfully. Supplemental C summarizes the main themes across the stories 12 graduate trainees told as they reflected on evidence that their use of the strategy was successful, the course features that supported their success, and their reflection on this experience, including conclusions they are drawing and potential next steps after this success. Supplemental D highlights illustrations, or mini-case studies, of how trainees interpreted and enacted what they learned in the autumn course, reflected on evidence of student outcomes, and came to conclusions about the results for their teaching and their students.

Research Question #3: Which of these practices are most valuable in terms of undergraduate students’ learning and engagement?

To understand the undergraduate experience in each course, students completed exit tickets at least three times during the course. Overall, undergraduate reflections provided additional evidence that the strategies graduate trainees employed successfully resulted in a more participatory and inclusive course environment for students. Almost all undergraduates in the spring courses felt that their class provided opportunities for their active participation, which most (72%) described as “more opportunity to participate than other similar courses.” Students identified specific contributing factors such as the focus on active learning, the increased interaction between students in smaller breakout groups, and checking in as a large group.

Almost half of student responses (44%) described their class as “more equitable and inclusive than other courses.” Most of the remaining half of students said that it was similar to other courses. Many students also noted that the department has an equitable and inclusive environment generally. Again, students pointed out features of the course, such as small breakout groups, that helped them feel encouraged to contribute and provided multiple avenues to understand scientific concepts.

When asked about active participation opportunities compared to other science classes, students very consistently said that there were more varied, useful, and meaningful opportunities for active participation in the course taught by a STEP-UP trainee. For example, one student said, “[The instructors] allow for open responses from the class, and I enjoy how casual the class is. I feel like I could mention a topic, and the instructors will even navigate the class in a direction towards a point I made. It makes me feel important.” Another said,

I would say they absolutely provide an opportunity to engage at every point in the material. Not everyone can speak, but there are always several opportunities to consult resources, discuss in small groups, and then present. This cycle, especially incorporating the opportunity to really apply, research, and cross-reference makes for a very engaging class! –Undergraduate Student

Students identified a number of examples of what worked for them in class. Most often students described examples of active learning strategies, including group work, varied participation structures, specific resources, or instructor-provided scaffolds such as worksheets and content understanding.

More than half of students in the sample (56%) reported that what they identified as working for them was directly related to something the instructor did (see Fig. 3). For example, most students described the way the instructors set up the course (e.g., set up group work, set up clear slides, explained concepts clearly) as a factor that ultimately helped them most in the course. One student said that they “appreciated being given a heads up prior to being called on.” They went on to say that it was related to something the instructor did and “it allowed my group time to collect our thoughts and prepare to answer the questions that was presented to us.” This student further described this practice as having a positive impact on their emotional well-being because it “relieved some anxiety that my group would have had.”

Fig. 3

This graph shows the results from several multiple-choice survey items answered by undergraduate students of STEP-UP instructors as part of post-session exit tickets. These exit tickets were filled out online (n = 644).

About 22% of students reported that the instructor had a positive impact on their emotional well-being. Similarly, about a quarter of students in the sample (26%) reported that the instructor had a positive impact on their confidence as a learner.

Though not all students reported that their instructor had a positive impact on their well-being and/or confidence, those who did were often very emphatic or effusive in their explanation and told important stories about their experiences. For example, a couple of students reported how surprising it was to be referred to by name, or to have a trainee go beyond their expectations to meet an accommodation they needed. Below is a preliminary list of student-identified instructor moves that contributed to their confidence and emotional well-being:

  • Acknowledging common stressors, accommodating absences and student concerns; stating explicitly that the material is difficult

  • Inviting all kinds of responses and positively receiving right and wrong answers. Stating explicitly that wrong answers are good and welcome

  • Taking the time to explain things, wrong answers, Poll Everywhere answers, etc.

  • Acknowledging students personally—using their names, making eye contact, responding to individual emails

  • Creating a generally positive environment: (words students used) relaxing, calm, supportive, fun, engaging, safe, low pressure, encouraging

  • Participation structures: calling on groups or letting people know they’ll be called on; providing multiple ways to participate, earn points, make-up work

  • Implementing student accommodation for a break for the whole class; including disability statements in the syllabus

  • Acknowledging feedback from students

Supplemental E highlights sample student reflections from the subset of students who reported that the instructor had a positive impact on their well-being and confidence as a learner. These quotes illustrate some of the ways that students were impacted as a result of graduate trainees’ enactment of STEP-UP teaching practices.

Research Question #4: Which of these practices are most valuable to graduate trainees?

To understand alumni perspectives on their experience with STEP-UP, including which practices seemed most valuable to them as they entered the job market, we conducted interviews with three alumni who have taken teaching-focused positions since their graduation. All three reported that their experience with STEP-UP was valuable in that they learned about teaching strategies, but also started to situate their teaching in a broader framework that gave them credibility and improved their self-efficacy. They reported these gains in relation to their teaching and their job search experience. Alumni appreciated that their work in the program was grounded in educational research and evidence-based strategies, multiple opportunities to rehearse and practice teaching, a cohort model, and opportunities to reflect on their teaching. Instructor supports were crucial. Alumni described how the program mentor modeled socioemotional and academic/professional support for graduate trainees (e.g., named strategies when in use), intentionally built community among the trainees, and supported them during the job search.

STEP-UP informed and influenced their career searches and perceived marketability. According to the alumni, the program was especially consequential for the job search because the experience gave them language for their work and helped professionalize their experience as educators. They described this professionalization of teaching as a career or job responsibility in contrast to their R1 graduate experience more generally, which they reported consistently devalued teaching. The experience supported alumni’s ability to navigate the interview process, as they learned about the different kinds of institutions and job possibilities and how to recognize jobs that would value their approach to teaching. Two alumni said they would not have gotten the job without STEP-UP and the third said that it heavily influenced their application/process. Alumni described how the program mentor advised on application packages (including how to present their teaching and write teaching statements), facilitated practice teaching sessions for sample lessons, and provided a reference.

In the interviews, alumni described how they are using what they learned directly in their teaching every day, especially in how they plan, reflect, and adjust their teaching. They are teaching with intentionality, continually assessing where they are in their teaching in relation to a broader framework through regular reflection on what good interaction with students looks like and asking themselves where there are opportunities in their teaching. They described meaningful examples of strategies they have tried out, reflected on the results (sometimes with data), and made adjustments with a goal in mind. One alumnus described requiring students to come to office hours before the first test. The alumnus made this decision after several students who failed the first test came to office hours and then drastically improved their subsequent test scores.

The first two semesters’ students would appear that they were doing fine in the class and just bomb the first exam…so I was thinking, what could I do to increase the chance that they’re going to come see me before that first exam, that they’re gonna participate in class before that exam. And that’s when I made those mandatory meetings before the first exam and a lot of them think it’s [about] me getting to know them and them getting to know me… I want them to feel comfortable with me because sometimes you think that the professor is someone you can’t talk to.

This alumnus reflected on student data and students’ experience in office hours, recognized the impact of student comfort and relationship building on student test scores, and adapted their course requirements to better support students.

Two examples of how alumni described teaching with intentionality that was informed by STEP-UP were highlighted in their interviews; in their teaching, they described working toward a specific goal/construct and constructing a set of overlapping strategies that support that goal. These two examples highlight how the alumni are focused on empathizing with and supporting the students’ experience. In the first example, alumni described employing strategies for setting up their class to encourage engagement in the moment and helping students learn how to learn in a safe space. These strategies included using online polling apps for students to register questions anonymously and recording class sessions with explicit instruction for how to rewatch the video and try out the content after class. In another example, alumni described strategies for giving regular feedback to students as part of students’ building expertise. These strategies included using both formative and summative assessment, limiting homework so that there is enough time to give meaningful feedback, and providing a relatively rigorous process for allowing revisions, including student reflection on the process itself.

Discussion

The goal of this research was to improve professional development for early-career faculty (i.e., STEM doctoral students) by describing the relationships between features in a teaching course, graduate trainee practices in the classroom, and outcomes for undergraduates in those classrooms. STEP-UP implemented an evidence-based theory of change by supporting graduate trainees to engage undergraduate students by enacting active-learning techniques and supporting well-being in their courses. STEP-UP provided opportunities for graduate trainees to practice evidence-based, culturally responsive teaching strategies in a cohort of peers with meaningful feedback (External Domain). New knowledge about teaching strategies and opportunities for comfortable, low-stress practice in the External Domain encouraged professional experimentation during training and while teaching (Domain of Practice). Similarly, positive teaching perspectives and evidence-based strategies in the External Domain influenced changes in the attitudes and beliefs held by graduate trainees about teaching (Personal Domain). Changes in knowledge, beliefs, and attitudes around self-efficacy and experimentation (Personal Domain) led to enactment of and reflection on teaching practices (Domain of Practice). The combined changes in the External Domain, Personal Domain, and Domain of Practice resulted in undergraduate student reports of engagement and well-being, appreciation of supportive classroom climate, connections with peers, and participation structures that students found to be inclusive in graduate trainee courses (Domain of Consequence).

Increased Teaching Self-efficacy

In this study, we aimed to describe the features of a practice-based teacher development course that helped graduate student trainees increase enactment of and reflection on best practices in teaching (Research Question #1). The most valuable features of STEP-UP were opportunities to practice teaching strategies with peers and mentors and discuss teaching strategies that were new to them. Through these activities and practice, we found that graduate trainees’ teaching self-efficacy increased and that trainees with increases in self-efficacy went on to practice reflection and enactment of teaching strategies. This result is evidence for social cognitive theory which states that those with higher teaching self-efficacy perform better at teaching (Bandura et al., 1999). While we did not measure teaching performance of the graduate trainees, the mere fact that they practiced reflection and enactment indicates their growth as teachers. Other data sources in this study also reveal that their teaching practices were recognized and appreciated by the undergraduate students they taught.

In the spring focus groups, trainees reported that STEP-UP increased their confidence in teaching, resulting in personal pride that encouraged graduate trainees to experiment further with teaching practices. For some trainees, the experience reinforced their love of teaching, while for others it provided experience that allowed them to discern if they would like to teach more and at what level. Several graduate trainees reported these changes amidst a research supervisory environment that was unsupportive of or antagonistic to their teaching, yet both trainees and alumni continue to recognize the value of their efforts and growth. Although graduate trainees in STEP-UP increased their teaching self-efficacy, literature suggests that departmental culture regarding teaching can be a significant factor in graduate students’ perceptions of teaching and furthermore, a poor departmental culture surrounding teaching can lead to a lack of teaching self-efficacy and ultimately poor teaching performance (Burke & Hutchins, 2007; DeChenne et al., 2015). Instilling self-efficacy throughout the program, both by using a rigorous practice-based training program and by providing opportunities for well-supported experimentation, should be seen as a key aspect of future efforts especially for those situated in a campus or departmental culture that is unsupportive of teaching efforts.

Active Learning and Supporting Student Well-being

We intended to describe teaching practices that were transferred from the practice-based teacher education program STEP-UP to the domain of practice (Research Question #2). Across data collected for this study, we found that graduate trainees were able to implement two teaching practices most explicitly: supporting students socioemotionally and using active learning techniques especially through Socratic questioning practices.

Trainees were provided numerous practice opportunities during STEP-UP for teaching methods from microskills (statements of support and connection, for example) to macroskills (design of assignments with structures that put students in positive positions for collaboration). When observed, graduate trainees regularly made efforts to connect with undergraduate students through supportive actions and enthusiasm. For example, graduate trainees were observed reflecting on and discussing their professional successes and failures. Ovid et al. (2021) found that one way to create a more inclusive classroom environment was through incorporating positive non-content talk. Although that study found that a low percentage (14%) of students remembered when instructors shared personal stories, those stories were viewed positively by undergraduate students. In addition, sharing stories that humanize, create relevancy, and increase engagement for students may motivate them in the classroom (DeSurra & Church, 1994; Dewsbury & Brame, 2019; Freeman et al., 2007; Stolk et al., 2021; Trujillo & Tanner, 2014).

Active learning practices ranged from microskills (giving students a stated purpose for an activity) to medium-scale (using questioning strategies to probe or press student understanding) to macroskills (building classroom activities that provided opportunities for active-learning). The most common way that trainees enacted active learning practices was through questioning techniques. While formal assessment was not typically visible in the observations, we believe that the presence of questioning practices has the potential to lead to equitable and high-quality formative assessment of student learning (Morris et al., 2021). Taking a broad stance on what counts as assessment (as an equity move) to include instructor-posed questions provides evidence across the trainee observations of fair and equitable assessment through the following practices: asking questions that encourage higher-level thinking, gentle throws to groups, giving think time, random call, asking probing questions, revoicing student responses, giving positive feedback, reframing incorrect/partially correct responses. Additionally, there were many examples of trainee questions, small group discussions, and assignments that involved higher-level thinking. Throughout observations and discussions, trainees were clearly practicing and reflecting on occasions in which they positively handled incorrect student responses during active learning, which helped undergraduates to process their answers in an encouraging way.

In the exit ticket survey responses, undergraduate students found these trainee moves to be particularly compelling and reported that these practices created a collaborative and safe learning environment. This enabled students to take risks and contribute meaningfully in class without fear of being subjected to negative feedback or ostracization. Studies have shown that active learning can benefit student learning and performance (Freeman et al., 2014; Theobald et al., 2020). Although there is debate about who active learning helps most, and which students may be left behind (Cooper & Brownell, 2016; England et al., 2017; Gin et al., 2020), our study indicates that graduate trainees in STEP-UP have prioritized student well-being while guiding active learning. We posit that this combination is essential for undergraduate students to feel comfortable in an active learning environment which asks them to be vulnerable by offering their ideas to their peers in the class.

In-class Participation and Emotional Support from Instructor

We intended to describe which of these practices were most valuable in terms of undergraduate students’ learning and engagement (Research Question #3). Across several types of undergraduate-facing data, the two most salient outcomes were that undergraduates had many opportunities to participate meaningfully in class and that those undergraduates felt supported emotionally by their instructors. While comparing perceptions to other courses was beyond the scope of this study, undergraduates consistently reported that they felt able to participate comfortably at frequencies that seem to indicate success of graduate trainee teaching methods surpassing undergraduates’ perceived norms. While it may not be surprising to see supportive teaching in a positive classroom environment where graduate trainees were explicitly oriented and trained towards these practices (Rozhenkova et al., 2023), it is notable that these outcomes are often associated with the most experienced professors (Ambrose, 2010; Stronge, 2013; Whitaker, 2020; Wilson, 2004). Undergraduates participated and were routinely given opportunities to analyze their own understanding of the material presented to them (Winne & Azevedo, 2014). Metacognitive practices have been correlated with improved learning outcomes and retention in STEM especially for students historically and currently excluded in academia (Hansen et al., 2023; Knight et al., 2022; McKinney et al., 2021; Seymour, 1995). Furthermore, the intentionality of emotional support displayed by graduate trainees is likely to accentuate undergraduates’ use of active learning opportunities as they perceive safety in doing so, as well as to help undergraduates identify themselves as people likely to succeed in STEM.

This research was conducted in part during the COVID pandemic, which started in 2020. While the method of instruction for graduate trainees did not change (but took a hiatus in the depth of the pandemic), one cohort of instructors was forced to quickly switch to online teaching. While not enough data for well-supported conclusions were collected about this cohort alone, the positive undergraduate and instructor feedback, which matched other cohorts, suggests that trainees were able to apply skills and perspectives to their new teaching environment. Anecdotally, they may have been faster and more adept at shifting curriculum, perhaps due to the advantage of habits of mind and lenses on education that focused on the student experience in their recent training course. More research will be needed to delve into the challenges and opportunities for trainee learning and practice.

For those considering creating a teaching professional development program for graduate students in the life sciences, this study offers a roadmap to design features that are likely to positively support both graduate students and undergraduate students. Programs that provide opportunities for authentic teaching experimentation after rigorous practice in active, supportive teaching methods are likely to see similar signs of success in the experiences of their graduate trainees and undergraduates. We suggest that basing the design of such programs in practice-based teacher development (Stroupe et al., 2020), creating an environment of intentionally instilling self-efficacy for trainees, and explicitly modeling and providing opportunities to practice teaching moves that demonstrate socioemotional support are key features in this arena.

Limitations

Tempering these positive findings is the knowledge that our research did not assess individual undergraduate students in their own personal contexts (such as demographics like race/ethnicity and gender); this kind of deeper case-based qualitative work will be needed to understand examples of particular teaching moves that graduate trainees might use to explicitly catalyze undergraduate progress for specific student groups. We intended to assess which of these practices are most valuable for trainees as they become early-career faculty. Theoretically, practices perceived as successful should translate to both their Domains of Consequence (career outcomes, like positions achieved) and Personal Domains (knowledge and beliefs that they take into those careers, like attitudes towards future development in their own teaching). Our limited assessment with STEP-UP alumni gives initial clues that graduate trainees are becoming student-centered teachers who are likely to both advocate for students and continue improving in their teaching. However, it is important to emphasize that all alumni who chose to respond for interviews are employed by teaching institutions; hence, these alumni may value teaching more than those who did not respond. Much of the design and research of this program occurred during a global pandemic, which changed instructional modes and stressed participants in uncontrollable ways. Importantly, alumni report on teaching moves and design ideas that have rapidly progressed beyond those that they explicitly practiced as part of STEP-UP, indicating continued cycles of enactment and reflection that are likely to drive future interconnected professional growth.

We would also like to note that this study took place at one institution and so the results cannot necessarily be generalized to other institutions. The impacts and benefits observed by this research are unlikely to remain for less-robust experiences in which graduate trainees spend less time learning and practicing. We hope that our recommendations for developing a teaching professional development program could be used to help jumpstart other programs with similar goals of teaching graduate students evidence-based teaching strategies and promoting inclusion in the classroom.

Conclusion

The problem of developing teaching experts from STEM graduate students is complex and important, and it continues to be worthy of investment. Tackling it requires developing dual expertise in both science and teaching strategies, but the potential benefits for science are profound. This study examines how graduate students can be supported to develop their teaching practice and self-efficacy in ways that positively affect the students that they teach. The findings from this study, which were prevalent across different types of data, are aligned with the goals most often discussed around improving college STEM education. Additionally, we provide a model for how a Design-Based Research approach might inform other teaching professional development experiences for science graduate students, especially in the interest of scaling best practices. We present this work in hopes of contributing to the design of future programs that improve on this model of STEM teaching development.