Program evaluation seeks to gather information about the value or merit of a series of activities or events in a systematic way [1]. Common program evaluation approaches used in medical education, like Kirkpatrick’s hierarchy and the logic model, assume a linear relationship between activities/changes and satisfaction/outcomes and thus work best for simple programs. Evaluating events in a linear way limits understanding of how multiple factors interact to affect outcomes, does not account for unanticipated issues and macro-level changes [2, 3], and therefore does not work for complicated or complex programs. To handle complexity, other fields have used systems thinking to design evaluation approaches [2, 3, 4, 5].

The COVID-19 pandemic highlighted complexity in medical education and provides a good illustration of how systems thinking can be applied to evaluation. A key applicable systems-thinking feature is the ability to account for the interplay of factors during constant change [2]. It is critical to evaluate the inter-relations of factors to inform future decision-making during the pandemic and in preparation for future acute crises. There is little research on how medical education was impacted by prior pandemics and natural disasters, aside from perspective articles [6, 7, 8]. Two additional articles investigated the impact of Hurricane Katrina on academic performance, but both lacked an evaluation framework [9, 10]. The purpose of this article is to provide a proof of concept for a systems-based evaluation model of the educational impact of COVID-19.

Activity

Systems thinking was used to create a model (Fig. 1) for evaluating the educational impact of COVID-19 at the University of Utah School of Medicine during academic year 2019–2020. We identified three layers in the model: individuals (e.g., students, course directors, faculty), changes to instruction and assessment (e.g., the shift to virtual lectures and OSCEs), and system-level curricular changes (e.g., graduation requirements). The evaluation model was shared with education leaders, refined, and implemented 1 week later. Specifics of layers 1 and 2 were developed and refined with input from course directors.

Fig. 1

A systems-based evaluation model for the educational impact of the COVID-19 pandemic

To evaluate layer 1, we added questions about communication, sense of community, resources/support, and wellness to end-of-course evaluations and end-of-year faculty surveys. Survey items were developed and refined through pilot testing. To evaluate layer 2, we used a value-driven outcomes framework [11] to capture cost, time, and preference for COVID-19 adaptations relative to the original in-person versions. We used instructional event type (e.g., small group discussion) and assessment type as the unit of analysis. Layer 2 information helped us understand which adaptations should be continued. Finally, evaluation of layer 3, the overall learning impact and cost, will take place after the acute phase of the pandemic is over. Thus, this article reports data from layers 1 and 2 only.

The University of Utah School of Medicine Institutional Review Board deemed this study exempt. For layer 1, individuals rated whether each communication, sense of community, resources, and wellness survey item in Table 1 was exceptional, adequate, or lacking. A threshold of > 24% lacking indicated a concern for the Curriculum Committee. For layer 2, course directors indicated the time (more, less, the same), cost (more, less, the same), and their preference for each adaptation relative to the original in-person version. Students indicated whether they preferred in-person, synchronous, or asynchronous learning; they could also indicate no preference or that it depended on the topic. Student preference for the original vs. the adapted version was determined with a 70% majority threshold (combined with the percent of no-preference/depends-on-topic responses). An overall value determination (high, moderate, low, or no added value) for each instructional event type and assessment type at the course level was based on the number of metrics (cost, time, course director preference, student preference) favoring the adapted version. Instructional events/assessments with at least three value metrics favoring the adaptation were considered high value, those with two were deemed moderate value, those with one were considered low value, and those with none were considered to have no added value.
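To make these decision rules concrete, the following is a minimal sketch, in Python, of how the layer 1 flagging threshold and the layer 2 value classification could be implemented. It is illustrative only, not the study’s analysis code; all function and field names are hypothetical, and we assume a cost or time metric "favors" the adaptation when the adaptation required less of it.

```python
from dataclasses import dataclass

# Layer 1: > 24% of respondents rating an item "lacking" flags it for the Curriculum Committee.
LACKING_THRESHOLD = 0.24

def layer1_flag(percent_lacking: float) -> bool:
    """Flag a survey item as a concern when more than 24% of respondents rate it lacking."""
    return percent_lacking > LACKING_THRESHOLD

@dataclass
class AdaptationMetrics:
    """Layer 2 value metrics for one instructional event or assessment type (hypothetical fields)."""
    less_cost: bool                 # adaptation cost less than the in-person version
    less_time: bool                 # adaptation took less time than the in-person version
    director_prefers_adapted: bool  # course director preferred the adaptation
    students_prefer_adapted: bool   # students preferred the adaptation (70% majority threshold)

def overall_value(m: AdaptationMetrics) -> str:
    """Classify overall value by the number of metrics favoring the adapted version."""
    favoring = sum([m.less_cost, m.less_time,
                    m.director_prefers_adapted, m.students_prefer_adapted])
    if favoring >= 3:
        return "high value"
    if favoring == 2:
        return "moderate value"
    if favoring == 1:
        return "low value"
    return "no added value"

# Example: an adaptation that saved time and was preferred by students
print(layer1_flag(0.30))  # True: more than 24% rated the item lacking
print(overall_value(AdaptationMetrics(less_cost=False, less_time=True,
                                      director_prefers_adapted=False,
                                      students_prefer_adapted=True)))  # "moderate value"
```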

Table 1 Percent of University of Utah School of Medicine students, course directors, and course faculty perceiving education metrics as lacking during COVID-19 in academic year 2019–2020

Results

Response rates were above 90% for students and course directors and 71% for faculty. Table 1 provides the percent of students, course directors, and faculty who rated each education metric item as lacking. Greater than 24% of year 2 students found all communication items, sense of community with faculty, personal space for remote sessions, and personal ability to preserve mental health to be lacking. Greater than 24% of year 1 students found sense of community with students, personal space at home for remote sessions, and personal ability to preserve mental health to be lacking. No items were rated as lacking by more than 24% of year 3 students, year 4 students, course directors, or course faculty.

For the purposes of this paper, we provide value metrics for two course types (a foundational science course and a clerkship) in Table 2. There were 36 instructional event type and 32 assessment type adaptations due to COVID-19. Almost half of the adaptations (48%, 33) did not have added value over the original versions, 34% (23) had low value, 13% (9) had moderate value, and 1% (1) had high value. A higher percentage of assessment adaptations were considered of low to high value (63%, 20) relative to instructional event adaptations (38%, 13). Similarly, a higher percentage of year 3–4 clerkship course adaptations were considered of low to high value (57%, 20) relative to year 1–2 course adaptations (42%, 13). Finally, course directors did not reach consensus in their preferences regarding similar adaptations, and student preference for similar adaptations also varied across courses.

Table 2 Instructional and assessment event adaptations due to COVID-19 for a year 1 foundational science course and a year 3 clerkship

Discussion

This is the first study of a systems-based evaluation model in a time of crisis. As with feedback gathered through surveys alone, layer 1 data provided us with insight into how to better support our students. Sense of community was challenging to foster, especially while simultaneously enforcing stay-at-home orders and physical distancing and not wanting to overwhelm students with too many online meetings. For both students and faculty, sense of community was endangered by online learning and changes in social interaction. We also identified aspects that needed to be addressed beyond normal course support, such as having enough personal space and Wi-Fi at home and the ability to tend to one’s mental health. Based on these findings, we arranged for on-campus building space so students who needed to complete a final examination outside their home could do so.

Layer 1 data also highlighted how differently year 1–2 and year 3–4 students experienced course changes, which has made us more mindful of their varying educational environments, course loads, and developmental needs. For instance, year 1–2 students interact in much larger groups than year 3–4 students and may therefore have felt the sudden loss of those interactions more acutely. Moreover, early in medical school, peer and student-faculty relationships and a sense of community are still intensely developing, in comparison with later stages when students have had years to form peer bonds, build a support network, and acclimate to the culture of the school. Additionally, year 1–2 students were taking three courses at the same time and may have faced more disruptions while trying to study at home relative to year 3–4 students, who were in a single rotation during COVID-19.

Layer 2 data revealed no clear consensus on preference for, or perceived value of, the adaptations made by course directors, which at first glance seemed counterintuitive. However, the lack of consensus reminds us why a systems-thinking approach for evaluating educational activities, particularly during uncertain times, is so important. We suspect that perceptions varied greatly because of the constantly changing and differing contexts of students and faculty. Additionally, adaptations made in years 3–4 seemed to have more value than those in years 1–2, which could have affected the overall receptivity and stress of students, or vice versa. Overall, there was much interplay between layers 1 and 2, demonstrating that our layer 2 data are inherently affected by our layer 1 data and underscoring why a linear evaluation model is less than ideal for this situation.

As others have outlined [2, 3, 4, 5], using systems-thinking principles is helpful in designing evaluations of complex educational situations [12]. To design our nimble approach, we used the lens of systems thinking from the outset, considered the inter-relations of factors, and captured diverse stakeholder input. Given the novelty of the COVID-19 crisis, results from this case study provide insight for other medical schools. The systems-based model has potential for transferability because other schools can adapt it to their needs. We expect to see more studies using systems-based evaluation approaches in the future as the landscape of medical education becomes more complex.