Introduction

The United States Medical Licensing Examination (USMLE) Step 1 is a high-stakes exam that was designed to measure competency in clinical basic sciences knowledge and provides a basis for medical licensing eligibility. Most medical schools require it for graduation and residency programs require it for entry into their programs [1]. Though Step 1 was designed to be a benchmark and measure of content knowledge, it was not designed to be a predictor for success in residency [2, 3]. However, the Step 1 score has been used heavily in the residency application process for filtering and ranking of large numbers of applicants, particularly in certain competitive specialties [4]. The consequences of Step 1 performance for residency selection have drawn considerable student stress, focus, and resources, including time and money, to the exam [2]. Students devote their time to both participating in the medical school curriculum and separately studying for the “parallel curriculum” of optimizing their Step 1 performance [5]. This “Step 1 climate” has resulted in increased stress and medical students reducing their involvement in curricular activities in favor of Step 1 studying throughout their pre-clerkship education [6]. Because of these factors, the use of USMLE Step 1 in residency selection has had detrimental effects on student mental health, well-being, and education quality [7, 8].

Over the past decade, the medical education community has made many calls to alter how the Step 1 exam is used in resident selection, including shifting the exam from three-digit scoring to providing only pass/fail scoring [9]. On September 15th, 2021, the USMLE announced that Step 1 would move to a pass/fail scoring system, done in part to decrease the stress associated with the exam [10]. However, emerging literature suggests that this transition has led to other stresses and concerns for students. In focus groups, students have voiced concerns about how residency programs will react to the change to pass/fail and “almost panic” about how they themselves will be impacted [11]. Some of these stresses relate to increased emphasis on the USMLE Step 2 score, which is a similar exam more focused on clinical applications, as well as clerkship grades in an opaque grading environment [12] and pressure to engage in extracurricular activities [11]. Kogan and Hauer have suggested that a singular change in Step 1 scoring without other adjustments to the residency selection process will result in increased emphasis on and stress over the aforementioned items [13].

Given the anticipated concerns from thought leaders, educators, and students suggesting that the Step 1 change to pass/fail may not have had the expected effects of decreasing overall medical student stress, it is important to determine whether there is an effect of the change on student stress specifically related to the Step 1 exam. Our study aims to compare student stress levels, overall and in relation to Step 1, in the period leading up to the Step 1 exam in students who took the exam pass/fail compared with students who took the exam for a score.

Materials and Methods

Study Design and Setting

This single-institution longitudinal survey study was conducted at the Georgetown University School of Medicine and was approved by the institutional review board. Study participants included two cohorts of second-year medical students: (1) the graduating class of 2023, who took the Step 1 exam for a score in 2021 (score cohort), and (2) the graduating class of 2024, who took the Step 1 exam pass/fail in 2022 (pass/fail cohort).

The medical school curriculum at our institution is a 1.5-year pre-clerkship curriculum that ends in December of the second year and is followed by an 8-week break dedicated to studying for the USMLE Step 1 exam. Students must take and pass the USMLE Step 1 exam before advancing to their third-year clinical clerkships which begin in early March.

Procedures and Instruments

All second-year medical students in the score and pass/fail cohorts were invited via email to complete a voluntary series of online surveys via Qualtrics (Provo, Utah) about their perceived stress. The surveys were administered four times to each cohort: at the beginning of the second year (M2) of medical school (time point 1), halfway through the M2 year (time point 2), beginning of the dedicated study period for Step 1 (time point 3), and middle of the dedicated study period for Step 1 (time point 4) (Fig. 1).

Fig. 1 Survey time points. The survey was administered at the beginning of each cohort’s second year of medical school (M2), halfway through their M2 year, at the beginning of their dedicated study period for Step 1, and halfway through the dedicated study period. Traditionally, most students take their exam 6–8 weeks into their dedicated period.

The 14-item survey included four demographics questions, four questions from the Perceived Stress Scale (PSS-4), and six additional questions about stress levels pertaining to the potential stress items identified in the literature (Step 1, Step 2, research experience, extracurricular activities, pre-clerkship coursework, and clerkship coursework). The full list of questions can be found in Table 1. The Perceived Stress Scale (PSS) is a popular tool for measuring psychological stress, intended to compare subjects’ stress related to objective events [14]. Higher scores are associated with higher levels of stress [14, 15]. The 5-point PSS-4 rating scale is traditionally scored 0–4, and the scores from the four items are summed into a total PSS-4 score. Because our intent was not to compare participant stress scores with population norms, but rather to look at changes in stress in the participants over time, we opted to use 1–5 scoring for the scale and in determining the total PSS-4 score. Therefore, our PSS-4 scores are not directly comparable to published population scores and guidelines on what constitutes average vs high stress. For consistency, we also used the PSS-4 rating scale for the six questions about stress related to specific items.
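The relationship between the traditional 0–4 coding and the 1–5 coding can be illustrated with a minimal sketch. It assumes, as in the standard PSS-4 instrument but not stated in this paper, that items 2 and 3 are positively worded and reverse-scored; the function name and the example responses are hypothetical.

```python
# Sketch of PSS-4 totals under the two scoring conventions described above.
# Assumption (not stated in this paper): items 2 and 3 are the positively
# worded PSS-4 items and are reverse-scored, as in the standard instrument.

REVERSED_ITEMS = {1, 2}  # zero-based positions of items 2 and 3


def pss4_total(responses, low):
    """Sum four item responses coded on a 5-point scale starting at `low`.

    low=0 gives the traditional 0-4 coding; low=1 gives the 1-5 coding.
    """
    if len(responses) != 4:
        raise ValueError("PSS-4 needs exactly four item responses")
    high = low + 4
    total = 0
    for position, value in enumerate(responses):
        if not low <= value <= high:
            raise ValueError(f"response {value} is outside {low}-{high}")
        # Reverse scoring mirrors the response around the scale midpoint.
        total += (low + high - value) if position in REVERSED_ITEMS else value
    return total


# Hypothetical respondent: the same four answers coded both ways.
answers_0_to_4 = [3, 1, 2, 4]
answers_1_to_5 = [a + 1 for a in answers_0_to_4]

traditional = pss4_total(answers_0_to_4, low=0)
shifted = pss4_total(answers_1_to_5, low=1)
```

Under these assumptions each item shifts by one point, so a 1–5 total runs from 4 to 20 rather than 0 to 16 and sits exactly 4 points above the traditional total for the same answers. Within-study comparisons are unaffected by this constant offset, while comparisons with published norms would require subtracting it.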

Table 1 14-item survey questions

Data Analysis

We calculated descriptive statistics for all participants on the survey responses at each time point. We compared results from the two cohorts using a two-tailed t test for independent means. Each time point was evaluated separately as a cross-sectional data point that averaged the responses of all participants who completed the survey at that particular time point. Because survey responses were anonymous, we were not able to follow or link participant responses over time. We performed a single-factor analysis of variance (ANOVA) to compare the results within each individual cohort across the four time points. P values less than 0.05 were considered significant. Effect sizes for statistically significant results were measured using Hedges’ g, as the sample sizes were different for each group.
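As an illustration of the between-cohort comparison, the following sketch computes an independent-samples t statistic and Hedges’ g using only the Python standard library. It is not the authors’ analysis code: the equal-variance (pooled) form of the t test and the common approximate small-sample correction for g are assumptions, and the response values are hypothetical.

```python
import math
from statistics import mean, variance


def pooled_sd(a, b):
    """Pooled standard deviation of two independent samples."""
    n1, n2 = len(a), len(b)
    return math.sqrt(((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2))


def t_statistic(a, b):
    """Student's t for independent means (equal-variance form) and its df."""
    n1, n2 = len(a), len(b)
    t = (mean(a) - mean(b)) / (pooled_sd(a, b) * math.sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def hedges_g(a, b):
    """Cohen's d multiplied by the approximate small-sample correction."""
    df = len(a) + len(b) - 2
    d = (mean(a) - mean(b)) / pooled_sd(a, b)
    return d * (1 - 3 / (4 * df - 1))


# Hypothetical 1-5 ratings of Step 1 stress at a single time point.
score_cohort = [4, 4, 3, 5, 4, 3]
pass_fail_cohort = [2, 3, 2, 3]

t, df = t_statistic(score_cohort, pass_fail_cohort)
g = hedges_g(score_cohort, pass_fail_cohort)
```

The correction factor shrinks Cohen’s d slightly, which matters most when groups are small or unequal in size, as with the response counts in this study. A two-tailed p value would then be read from the t distribution with `df` degrees of freedom, for example with a statistics package.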

Results

Descriptive Statistics of the Cohort

A total of 411 students were surveyed for the study across four time points and data comparing stress levels are summarized in Table 2. Response rates varied across both cohorts; however, the average response rate across both cohorts and all time points was 18.1%. The minimum and maximum response rates were 9.9% (N = 20) and 33.7% (N = 70), respectively. All response rates are shown in Table 3. Most participants identified as white in both the score (78.9%) and pass/fail cohorts (67.9%), an overrepresentation when compared with the compositions of each student body. See Table 4. There were slightly more students identifying as male in the score cohort (55.1%) and slightly more students identifying as female in the pass/fail cohort (57.1%).

Table 2 Average perceived stress for each cohort at each time point
Table 3 Response rates for each cohort at each time point
Table 4 Demographic data for each cohort

Overall Stress (PSS-4) and Stress over Time

Within both the score cohort and the pass/fail cohort, there was no significant difference from time point 1 to time point 4 in reported PSS-4 stress levels (p = 0.23 and p = 0.78, respectively) or in stress surrounding Step 2 (p = 0.19 and p = 0.26, respectively). However, both the score cohort and the pass/fail cohort saw a significant difference in stress levels surrounding Step 1 from time point 1 to time point 4 (p < 0.005 for both).

There was also no significant difference in reported PSS-4 stress levels between the two cohorts at any given time point (Fig. 2). Stress related to Step 1 was significantly lower in the pass/fail cohort initially, but over time, stress levels related to Step 1 became similar between the cohorts (Fig. 3). Stress related to Step 2 varied over time for both cohorts but was higher in the pass/fail cohort (Fig. 4).

Fig. 2 Average PSS-4 stress levels over time. This figure details the average PSS-4 stress level for each cohort over all time points. We conducted a two-sample t test at each time point comparing both cohorts. Error bars represent a 5% confidence interval. There was no significant difference between groups at any time point.

Fig. 3 Average Step 1 stress levels over time. This figure details the average stress level related to Step 1 for each cohort over all time points. We conducted a two-sample t test at each time point comparing both cohorts. Error bars represent a 5% confidence interval. At time points one and two, there was a significant difference in stress related to Step 1 between the cohorts.

Fig. 4 Average Step 2 stress levels over time. This figure details the average stress level related to Step 2 for each cohort over all time points. We conducted a two-sample t test at each time point comparing both cohorts. Error bars represent a 5% confidence interval. At time point one, there was a significant difference in stress related to Step 2 between the cohorts.

Stress Levels at Each Time Point

Stress related to the Step 1 exam was significantly lower for the pass/fail cohort than the score cohort at the beginning of the M2 year (time point 1) (2.33 vs 3.75, p < 0.001). These results corresponded to a 28.4% decrease in stress related to Step 1. Stress related to the Step 2 exam at time point 1 was greater for the pass/fail cohort than the score cohort (2.30 vs 1.44, p < 0.001). These results corresponded to a 17.24% increase in stress related to Step 2. There were no significant differences in stress levels related to research experience, extracurricular activities, or pre-clerkship/clinical coursework. See Table 2.
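The text does not state how the percentage changes were derived. One reading that reproduces the reported Step 1 figure is to express the difference in cohort means as a share of the 5-point scale. The sketch below works through that arithmetic under this assumption; the function name is hypothetical.

```python
# Assumption: percentage change = difference in cohort means / 5 scale points.
# This denominator is inferred from the reported numbers, not stated in the text.
SCALE_POINTS = 5


def percent_of_scale(mean_a, mean_b, scale_points=SCALE_POINTS):
    """Absolute difference between two means as a percentage of the scale."""
    return round(abs(mean_a - mean_b) / scale_points * 100, 1)


# Time point 1 means as reported (score cohort vs pass/fail cohort).
step1_change = percent_of_scale(3.75, 2.33)
step2_change = percent_of_scale(1.44, 2.30)
```

With the rounded means this gives 28.4 for Step 1 and 17.2 for Step 2. The latter is close to the reported 17.24%, and the small gap would be consistent with the percentage having been computed from unrounded means, although that cannot be confirmed from the text.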

Midway through the M2 year (time point 2), stress related to the Step 1 exam was significantly lower for the pass/fail cohort compared to the score cohort (3.22 vs 4.04, p < 0.001). There was a trend toward higher stress related to research experiences in the pass/fail cohort that was not statistically significant (3.81 vs 3.33, p = 0.078). There were no differences in stress levels between the cohorts for any other items. See Table 2.

At the start of the dedicated study period (time point 3), stress related to Step 1 no longer showed a significant difference between the pass/fail and score cohorts (4.00 vs 3.54, p = 0.104). Stress related to clerkship coursework was higher in the pass/fail cohort (3.23 vs 2.53, p = 0.033). See Table 2.

Halfway through the dedicated study period (time point 4), there was no significant difference in Step 1 stress levels between the pass/fail and score cohorts (4.55 vs 4.53, p = 0.92). Stress related to pre-clerkship coursework was higher in the pass/fail cohort (2.75 vs 1.51, p < 0.001). See Table 2.

Discussion

We found that while there was no difference in general overall stress as measured by the PSS-4 between the students who took Step 1 for a score (M2023) and students who took Step 1 pass/fail (M2024), we did see differences in stress specific to the Step 1 exam. The score cohort reported significantly more stress related to Step 1 at the beginning of and halfway through their second year than the pass/fail cohort. Their stress over Step 1 outweighed their stress over any other element measured, such as Step 2, the curriculum, research, or extracurricular activities. However, by the time the dedicated study period commenced, the pass/fail cohort’s stress related to Step 1 reached the same level as the score cohort and remained the same through midway into the study period. Additionally, the pass/fail cohort reported more stress than the score cohort related to specific items at varying points: Step 2 at the beginning of their second year, clerkships at the start of the dedicated study period for Step 1, and pre-clerkship grades when midway through the dedicated study period for Step 1.

Our results showed that at our institution, the change to pass/fail scoring on USMLE Step 1 did ameliorate student stress related to the exam during part of the pre-clerkship period, suggesting that the change was successful in reducing the “Step 1 climate” during the pre-clerkship curriculum as stakeholders had intended [9, 10]. While the pass/fail cohort started with less stress surrounding Step 1, we found that both cohorts saw a significant increase in stress related to Step 1 during the dedicated study period. Though the reported stress levels in students taking Step 1 pass/fail eventually rose to the same level as in students who had taken Step 1 for a score, this did not occur until after they had completed the pre-clerkship curriculum and entered the dedicated study period. Thus, pass/fail scoring did indeed lead to decreased stress related to Step 1 during the pre-clerkship curriculum. Regarding the similar stress seen in both student cohorts during the dedicated study period, one might postulate that stress during dedicated study may be natural, given the high-stakes nature of passing the exam [16]. This could suggest that a rise in stress during the dedicated study period should be expected and might need to be accepted, with medical schools continuing to offer educational and emotional support and resources.

Ideally, the drop in Step 1 stress would have resulted in a lowering of student stress overall. However, the PSS-4 scores, which measure general stress, were similar between the two cohorts across all time points. It is possible that changes in Step 1 stress were not large enough to impact the overall stress that typically accompanies medical school and/or the PSS-4 was not sensitive enough to pick up the relatively smaller changes in stress levels. Another possibility is the one suggested by Kogan, Hauer, and others, where students’ stress may be shifted to other elements without a change in overall stress [12, 13]. For instance, we found stress related to Step 2 and clerkship coursework appeared higher in the pass/fail group at certain time points which could support their concerns. However, we did not find any significant trends in increased stress surrounding research or extracurricular activities.

Of note, the elevated stress related to Step 2, clerkship coursework, and even pre-clerkship coursework seen in the pass/fail cohort occurred only at single time points. For instance, the pass/fail cohort had more stress related to Step 2 at the start of their second year of medical school, but this difference was not sustained throughout all time points, whereas a sustained difference is what one would expect if Step 1 stress were displaced toward Step 2. It is possible that in the Step 1 pass/fail reporting environment, students were able to use some of their stress bandwidth to think about other concerns, with different concerns occupying their attention at different time points: Step 2 at the beginning of the second year, clerkships at the end of the pre-clerkship curriculum and start of the dedicated study period, and the pre-clerkship curriculum when pre-clerkship performance information is released midway through the dedicated study period. This could potentially be taken as a positive sign of students’ ability to allocate attention to something other than Step 1 performance.

There are limitations to our study. This was a single-institution study involving only two medical school classes. Our response rates were low, particularly in the pass/fail cohort. It is possible that students who were the most stressed would have been the least likely to complete the survey, resulting in an underassessment of student stress, particularly in the pass/fail cohort. Also, because we were not able to link participant responses over time, our results from cross-sectional observations may be due to differences in who completed the survey at each time point. While we attempted to measure student stress longitudinally, we measured it only during the second year of medical school and in the period leading up to the Step 1 exam. It is possible that the changes in stress related to other items, such as clerkships, research, extracurricular activities, and Step 2, would have increased after the Step 1 exam and closer to residency application time. Furthermore, the PSS-4 may not have been a sensitive enough tool to assess overall stress. Finally, this study was performed during a time period of curricular adjustments due to the COVID-19 pandemic, which might have influenced student perceptions of stress. We recommend additional investigations by others to confirm our findings related to the change in stress in students prior to taking their Step 1 exam and further examine the impact on potential later stress related to residency application.

Conclusions

The change in USMLE Step 1 scoring from a 3-digit score to pass/fail appears to have decreased the stress specifically related to this examination experienced by second-year medical students at one institution. However, this reduction in stress was not sustained as students entered their dedicated study period to prepare for Step 1.