
Medical School Resourcing of USMLE Step 1 Preparation: Questioning the Validity of Step 1

  • Gary L. Beck Dallaghan
  • Julie Story Byerley
  • Neva Howard
  • William C. Bennett
  • Kurt O. Gilliland
Commentary

Continuing concerns have been raised about the burgeoning “Step 1 climate” overtaking medical schools and causing significant distress for medical students, faculty, and staff [1, 2]. The United States Medical Licensing Examination (USMLE) Step examinations are a required gateway to physician licensure, a fact noted on the USMLE website as the purpose of the four examinations [3]. However, as Chen et al. [1] have suggested, the first of these examinations has mutated into something beyond its creators’ intentions.

Medical schools intentionally strive toward more competency-based objectives, focusing on developing physicians who are reflective, lifelong learners [4]. As part of competency-based medical education, frequent assessments must be conducted in a variety of contexts to ensure trainees have the requisite competence to continue to the next stage of development [4, 5]. These instructive assessments designed to drive learning toward competency should be holistic, formative, and developmental [6].

Whether formative or summative, medical students perceive a test as a test [7]. Chen et al. [1] point out that exemplary performance on USMLE Step 1 has become students’ sole concern. The single three-digit score, and its misuse as the gateway to career choice and residency match, has led students to strive obsessively for maximal numeric achievement. In fact, the national mean for USMLE Step 1 has risen over the past 15 years from 216 to 230 [8, 9]. This has provoked a proverbial “arms race,” with implications for medical schools attempting to be innovative and future-focused in preparing effective physicians for a changing health care landscape. Students disengage from, or express discontent toward, topics not directly relevant to USMLE Step 1, even though such content may be important for their future careers as physicians. Examples range from communication skills and empathy development to newer areas such as quality improvement and patient safety.

Medical students invest a great deal of time and money in tuition and an increasing amount beyond tuition on supplemental commercial products promising improved scores on USMLE Step 1 [10]. What has not been reported is the burden placed on medical schools to divert resources to ensure their students perform as well as they can on this examination. Here, we offer our medical school’s experience as a case study to articulate the resources, time, and effort being invested in USMLE Step 1 preparation. We recognize that many schools are undertaking similar initiatives, and even greater investments may be made at better-funded private schools or higher-tuition settings. This is occurring at a time when our national conversation laments the debt burden on learners [11]. While not all test preparation expense is wasted in producing an expert physician workforce, we wish to spark discussion about the degree of investment required to compete in this arms race of USMLE Step 1 preparation. In questioning the pedagogical value of such curricular investments, we also question the validity of Step 1 as it and its implications currently exist, examining construct and consequences validity evidence. As a community of leaders in medical education should continually do, we invite readers to ask: when is enough, enough?

Case Study

In the early 2000s, the School of Medicine (SOM) noted the upward trend in USMLE Step 1 scores nationally but did not overemphasize them locally, given the success of our graduates over the years in both residency placement and clinical performance. As with many other medical schools, we restructured our curriculum to integrate the basic and clinical sciences. This revision resulted in a pre-clinical curriculum (Foundation Phase), launched in 2014, that spans 3 semesters subdivided into 13 systems-based blocks. Having consulted with other medical schools that had changed their curricula, we anticipated that USMLE Step 1 scores might drop somewhat, and they did, while the national mean continued to rise. Figure 1 shows the trend in examination performance from 2013 to the present.
Fig. 1

Medical school vs. national mean for USMLE Step 1. *Parenthetic numbers indicate the pass rate for first-time test-takers at the medical school. The new curriculum was initiated in 2014; performance in 2014 reflects the second-year class, and the first class of the new curriculum took Step 1 in 2015. Multiple interventions were introduced for the class taking the exam in 2017, but not early enough to see an impact. The class taking Step 1 in 2018 received all of the interventions that began in 2016–2017

Prior to the start of the 2016–2017 academic year, an audit of the Foundation Phase was undertaken to evaluate content discrepancies between our curriculum and “high-yield” Step 1 topics. A commercial review book popular with medical students was used to detect alignment and deviation between the content of our curriculum and USMLE Step 1. This information was shared with course directors. Enhanced curricular efforts also targeted discipline-specific deficiencies identified in USMLE Step 1 performance reports to improve the respective metrics.

For the class matriculating in the fall of 2016, the SOM began administering National Board of Medical Examiners (NBME) examinations twice per semester during Foundation Phase. This exposed students to the rigors and format of nationally standardized examinations, but the performance was not factored into student grades. In the fall of 2017, customized NBME exams were fully integrated into Foundation Phase blocks as graded final exams.

Additional teaching modalities were also introduced. More formative assessments and feedback mechanisms were implemented to foster self-regulated learning. Technology was employed specifically to introduce spaced retrieval with repetition of core concepts. One of our pediatric faculty (author NH), with expertise as an educator, was hired to teach active learning techniques, test-taking skills, and overall integration and organization of content. This pediatrician also conducts one-on-one test-taking counseling with medical students; over half of the class took advantage of her services in the first year. Although this pediatrician, and now another physician, offer a wide array of evidence-based pedagogical interventions for students, the intent of their hire was to improve Step 1 scores. Finally, we purchased subscriptions to USMLE-Rx™ and UWorld© for Foundation Phase students.

Following the efforts summarized in Fig. 2, the results were impressive. The first-time pass rate for the class matriculating in 2016 was 99.4%, and the class mean was 3 points above the national average. This dedicated focus on preparing students for USMLE Step 1 resulted in improved numeric outcomes.
Fig. 2

Timeline of test preparation interventions

These investments come at a cost of both time and money to the SOM. The customized NBME exams cost approximately $75 per student per exam, now totaling more than $100,000 annually. Because these exams are given at the end of each block, the faculty member responsible for building them devotes approximately 4 h per block, three to four times per semester, in addition to the cost of each exam. Further, subscriptions for USMLE-Rx and UWorld total approximately $150,000 annually.

Faculty who were hired by the SOM to provide feedback or examination training represent an additional expense. A pediatrician at 0.5 FTE and an emergency medicine physician at 0.25 FTE working with students on study and test-taking skills cost approximately $195,000. Additionally, a family medicine physician analyzing performance metrics to guide the SOM costs $50,000 annually. An overall sum is difficult to calculate for the time and effort spent by course faculty, who must not only prepare material they believe fundamental for medical student competency but also ensure content remains relevant to USMLE Step 1. In total, these identifiable investments come to approximately $495,000 annually, which may ultimately be passed on as tuition increases.
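
For transparency, the line items reported above can be tallied directly. The following is a minimal Python sketch; the figures are the approximate annual amounts stated in this section, and the per-student NBME derivation is left as a comment because class size and number of exams per year vary.

```python
# Approximate annual Step 1 preparation costs reported in this case study (USD).
cost_items = {
    "customized NBME block exams": 100_000,    # derived from ~$75 per student per exam
    "USMLE-Rx and UWorld subscriptions": 150_000,
    "study/test-taking skills faculty (0.5 + 0.25 FTE)": 195_000,
    "performance-metrics faculty": 50_000,
}

total = sum(cost_items.values())
print(f"Approximate annual investment: ${total:,}")  # -> $495,000
```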

Threats to Validity

With investments this significant and a rising cost of tuition, we evaluated our curricular efforts and focus with regard to USMLE Step 1 preparation. Based on our experience, questions naturally arose regarding the validity of USMLE Step 1 in preparing and evaluating physician competency. The Standards for Educational and Psychological Testing [12] offer guidance on validity evidence for examinations. Reviewing this guidance, we find that threats to the validity of USMLE Step 1 exist, specifically regarding construct and consequences validity.

Construct Validity

Construct validity has been defined as the degree to which a test measures what it claims to be measuring [12]. Construct validity considers the appropriateness of inferences made from collected measurements. Essentially, the concern with USMLE Step 1 is whether or not the instrument is measuring the intended construct. The focus of this assessment is medical knowledge [3]. The “Step 1 climate” [1] has led to learners and faculty focusing exclusively on this content, which inadequately addresses domains necessary for minimally competent physicians, such as patient safety, quality improvement, population health, and teamwork [13].

Trochim highlights several threats to construct validity relevant to this argument about USMLE Step 1 [14]. Mono-method bias arises when a construct is measured with a single instrument. In using USMLE Step 1, we assume it is a measure of medical knowledge. Although experts are brought together to develop this examination, the assertion that it establishes a threshold for minimum competence in medical knowledge has gone largely unchallenged. Furthermore, in recent years, the mean score for the exam has continued to trend upward [9]. Additionally, the cottage industry of examination preparation has clouded the construct validity of USMLE Step 1, raising concerns that test-taking ability, not medical knowledge, is truly being tested [15, 16, 17].

This ties directly to evaluation apprehension among test-takers, which can result in poor performance [14]. It has been well documented that African American men and women, older students, and women as a group tend to perform worse on USMLE Step 1 [18, 19, 20]. With this in mind, inferences drawn from scores become confounded. If medical students perform similarly to peers within their particular program yet underperform on this examination, questions should be raised about the construct itself rather than inferring that medical schools or students are not properly preparing for the test, or, worse yet, that the students demonstrate a relative deficiency in medical knowledge.

Consequences Validity

The act of administering an exam, interpreting results, and basing decisions or actions on results impacts those being assessed [21]. Consequences validity evidence derives from examining the impact of a test on examinees, educators, and schools. This form of validity evidence explores the impact of an exam, whether beneficial or harmful, intended or not [12]. As Cook and Lineberry note, consequences validity addresses whether or not the act of assessment and subsequent interpretation achieve intended results with minimal negative side effects [21].

We have already established that, according to the USMLE website [3], USMLE Step 1 is purportedly a measure of medical knowledge. The scoring of this exam employs Hofstee and Angoff methods to determine a minimum passing score [8]. Therefore, it can be argued that receiving a passing score demonstrates minimum competence in medical knowledge. However, USMLE Step 1 results are currently interpreted as though the exam were an aptitude test, suggesting that higher scores imply students will be better physicians. Furthermore, the Step 1 climate noted above is a negative side effect that heavily threatens consequences validity. Whether intended or not, this misuse of USMLE Step 1 scores undermines the consequences validity of the exam [22].
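
To make the standard-setting logic concrete, the Angoff approach asks a panel of judges to estimate, for each item, the probability that a minimally competent examinee would answer it correctly; the cut score is built from those estimates. The sketch below is a generic illustration of that averaging step, not the USMLE’s actual procedure, and the judge ratings are hypothetical. (The Hofstee method, by contrast, reconciles judges’ acceptable cut-score and fail-rate bounds with the observed score distribution; both are minimum-competence procedures, which is the point of the argument that follows.)

```python
from statistics import mean

# Hypothetical Angoff ratings: for each item, each judge's estimated probability
# that a *minimally competent* examinee answers it correctly.
ratings = [
    [0.60, 0.70, 0.65],  # item 1
    [0.40, 0.50, 0.45],  # item 2
    [0.80, 0.75, 0.85],  # item 3
    [0.55, 0.60, 0.50],  # item 4
]

# The borderline examinee's expected score on each item is the mean judge rating;
# summing across items gives the raw cut score (in number of items correct).
item_expectations = [mean(item) for item in ratings]
raw_cut_score = sum(item_expectations)

print(f"Raw Angoff cut score: {raw_cut_score:.2f} of {len(ratings)} items")
# The passing standard is then the scaled-score equivalent of this raw cut score.
```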

Residency program directors argue that applicants’ USMLE Step 1 scores are a good predictor of later performance on specialty board exams [23]. It makes sense that performance on one standardized test would predict performance on another, and this bears out in other studies [23, 24]. However, further studies have identified USMLE Step 2 Clinical Knowledge (CK) as a better predictor of board pass rates [25]. Furthermore, USMLE Step 2 CK has been shown to correlate negatively with other meaningful indicators, such as future malpractice suits in practice [26].

Gumbert et al. have argued that knowing USMLE Step 1 scores permits estimation of academic excellence, not just competence [27]. However, skill in taking standardized exams obscures whether the exam is truly measuring the construct it claims to assess. By using scores as a proxy for academic excellence, program directors are in essence treating the exam as an aptitude test rather than a minimum competency exam. One study found USMLE Step 1 to be a poor predictor of future career performance with regard to academic rank and board passage [28]. If we as educators are to believe USMLE Step 1 is a credible exam, we must also recognize that this standardized, multiple-choice exam simply provides a measure of minimum competence in medical knowledge. To interpret it in any other way is akin to educational malpractice [29], costing medical students and medical schools large sums of money and time to ensure students not just pass, but pass with arbitrarily inflated numbers.

Conclusions

As outlined in our case study, medical schools invest substantial financial, temporal, and personnel resources in USMLE Step 1 preparation. The goal of medical school curricula is to graduate physicians ready to enter clinical specialty training. USMLE Step 1 content represents only an incomplete segment of the competence necessary for students to become successful physicians.

This does not mean we believe USMLE Step 1 should be eliminated. If the medical education community is serious about transforming medical education toward holistic competency, one alternative might be to make the exam criterion-referenced. Criterion-referenced examinations measure performance against specific learning standards, in this case medical knowledge. Unlike the current USMLE scoring, which scales students’ results across a normal distribution, a criterion-referenced examination could theoretically be passed by all students who answer a specified percentage of questions correctly, as illustrated in the sketch below. Students, administrators, and other stakeholders would have a clear sense of an individual student’s medical knowledge based on the raw score and percentage. Since evidence shows the exam is biased against underrepresented minorities and women [18, 19, 20], providing concrete results may be a better method than reporting normalized scores and would also allow for individualized learning plans [20, 30].
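
The contrast between the two reporting approaches can be shown with a small sketch. The passing threshold, raw percentages, and scaling below are hypothetical, chosen only to illustrate that a criterion-referenced decision depends on each student’s own raw performance, whereas a normalized score depends on where the student sits in the cohort’s distribution.

```python
from statistics import mean, pstdev

# Hypothetical raw percentages of questions answered correctly.
raw_pct = {"A": 82.0, "B": 74.0, "C": 68.0, "D": 90.0}

# Criterion-referenced: every student who clears the (hypothetical) standard passes.
CRITERION = 70.0
criterion_results = {s: ("pass" if pct >= CRITERION else "fail")
                     for s, pct in raw_pct.items()}

# Norm-referenced: the same raw results re-expressed relative to the cohort,
# so a student's reported score shifts with everyone else's performance.
mu, sigma = mean(raw_pct.values()), pstdev(raw_pct.values())
normalized = {s: round((pct - mu) / sigma, 2) for s, pct in raw_pct.items()}

print(criterion_results)  # depends only on each student's own performance
print(normalized)         # depends on the cohort's distribution
```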

Additionally, Dumas et al. point out that performance on a single exam is merely a snapshot of knowledge at one point in time [31]. They suggest using a dynamic measurement model that incorporates longitudinal data scaled across time. This model calculates a growth score for each learner, incorporating repeated performance measures to estimate improvement over time. Such a growth score may be a more appropriate measurement for program directors to consider.
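
Dumas et al.’s dynamic measurement model involves nonlinear latent growth estimation; as a greatly simplified stand-in, the sketch below fits an ordinary least-squares slope to each learner’s repeated assessment scores and treats that slope as a crude “growth score.” The learners, scores, and scoring scale are hypothetical.

```python
# Simplified illustration (not the dynamic measurement model itself): estimate a
# per-learner growth score as the least-squares slope of repeated exam scores.
def growth_score(scores):
    """Return the OLS slope of scores over equally spaced assessment occasions."""
    n = len(scores)
    times = range(n)
    t_bar = sum(times) / n
    s_bar = sum(scores) / n
    num = sum((t - t_bar) * (s - s_bar) for t, s in zip(times, scores))
    den = sum((t - t_bar) ** 2 for t in times)
    return num / den

# Hypothetical scores across repeated standardized assessments.
learners = {
    "steady climber": [62, 66, 71, 75, 79],
    "late bloomer":   [55, 56, 63, 72, 80],
    "flat profile":   [78, 77, 79, 78, 77],
}

for name, scores in learners.items():
    print(f"{name}: final score {scores[-1]}, "
          f"growth per assessment {growth_score(scores):+.1f}")
```

Two learners with similar final snapshots can show very different trajectories, which is the information a single Step 1 score discards.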

We offer this case study as an example of the unreported costs of the “Step 1 climate” and of the substantial efforts made in preparation for an examination whose meaning is being misused by the national community of medical education leaders, to the detriment of student development. We support Haertel’s recent call to action imploring educators and stakeholders to stop using exams for purposes other than their original design [29]. We add to that call, noting costs and threats to validity, in the hope of encouraging those responsible for the future of our profession to take action.


Compliance with Ethical Standards

Conflict of Interest

The authors have no conflicts of interest to declare.

Ethical Approval

Not applicable.

Informed Consent

Not applicable.

Previous Presentations

Not applicable.

References

  1. Chen DR, Priest KC, Batten JN, Fragoso LE, Reinfield BI, Laitman BM. Student perspectives on the “Step 1 climate” in preclinical medical education. Acad Med. 2019;94(3):302–4.
  2. Prober CG, Kolars JC, First LR, Melnick DE. A plea to reassess the role of United States Medical Licensing Examination Step 1 scores in residency selection. Acad Med. 2016;91:12–5.
  3. United States Medical Licensing Examination. What is USMLE? Available at https://www.usmle.org. Accessed January 30, 2019.
  4. Nousianinen MT, Caverzagie KJ, Ferguson PC, Frank JR; on behalf of the ICBME Collaborators. Implementing competency-based medical education: what changes in curricular structure and processes are needed? Med Teach. 2017;39(6):594–8.
  5. Lockyer J, Carraccio C, Chan M-K, Hart D, Smee S, Touchie C, et al.; on behalf of the ICBME Collaborators. Core principles of assessment in competency-based medical education. Med Teach. 2017;39(6):609–16.
  6. Harris P, Bhanji F, Topps M, Ross S, Lieberman S, Frank JR, et al.; on behalf of the ICBME Collaborators. Evolving concepts of assessment in a competency-based world. Med Teach. 2017;39(6):603–8.
  7. Wormwald BW, Schoeman S, Somasunderam A, Penn M. Assessment drives learning: an unavoidable truth? Anat Sci Educ. 2009;2:199–204.
  8. United States Medical Licensing Examination. USMLE score interpretation guidelines. Available at https://www.usmle.org. Accessed January 30, 2019.
  9. Manthey DE, Hartman ND, Newmyer A, Gunalda JC, Hiestand BC, Askew KL, et al. Western J Emerg Med. 2017;28(1):105–9.
  10. Kumar AD, Shah MK, Maley JH, Evron J, Gyftopoulos A, Miller C. Preparing to take the USMLE Step 1: a survey on medical students’ self-reported study habits. Postgrad Med J. 2015;91:257–61.
  11. Rappley MD. The most important question we can ask. 128th annual Association of American Medical Colleges leadership plenary. Available at https://news.aamc.org/research/article/leadership-plenary-2017/. Accessed January 30, 2019.
  12. American Educational Research Association, American Psychological Association, National Council on Measurement in Education. Standards for Educational and Psychological Testing. Washington, DC: American Educational Research Association; 2014.
  13. Radabaugh CL, Hawkins RE, Welcher CM, Mejicano GC, Aparicio A, Lynne K, et al. Beyond the United States Medical Licensing Examination score: assessing competence for entering residency. Acad Med. 2019;94(7):983–9.
  14. Trochim WMK. Threats to construct validity. Available at https://socialresearchmethods.net/kb/consthre.php. Accessed January 30, 2019.
  15. Baños JH, Pepin ME, Van Wagoner N. Class-wide access to a commercial Step 1 question bank during preclinical organ-based modules: a pilot project. Acad Med. 2018;93(3):486–90.
  16. Giordano C, Hutchinson D, Peppler R. A predictive model for USMLE Step 1 scores. Cureus. 2016;8(9):e769. https://doi.org/10.7759/cureus.769.
  17. Ehrlich A. Acing Step 1 isn’t just about intelligence. February 15, 2018. Available at https://www.usmlepro.com/single-post/Scoring-260-on-USMLE-step-1. Accessed January 30, 2019.
  18. Kleshinski J, Khuder SA, Shapiro JI, Gold JP. Impact of preadmission variables on USMLE Step 1 and Step 2 performance. Adv Health Sci Educ. 2009;14:69–78.
  19. Rubright J, Jodoin M, Barone MA. Examining demographics, prior academic performance, and United States Medical Licensing Examination scores. Acad Med. 2019;94(3):364–70.
  20. Williams TS. Some issues in the standardized testing of minority students. J Educ. 1983;165(2):192–208.
  21. Cook DA, Lineberry M. Consequences validity evidence: evaluating the impact of educational assessments. Acad Med. 2016;91:785–95.
  22. Camargo SL, Herrera AN, Traynor A. Looking for a consensus in the discussion about the concept of validity: a Delphi study. Methodology. 2018;14(4):146–55.
  23. Katsufrakis PJ, Chaudhry HJ. Improving residency selection requires close study and better understanding of stakeholder needs. Acad Med. 2019;94(3):305–8.
  24. Zuckerman SL, Kelly PD, Dewan MC, Morone PJ, Yengo-Kahn AM, Magarik JA, et al. Predicting resident performance from preresidency factors: a systematic review and applicability to neurosurgical training. World Neurosurg. 2018;110:475–84.
  25. Welch TR, Olson BG, Nelsen E, Beck Dallaghan GL, Kennedy GA, Botash A. United States Medical Licensing Examination and American Board of Pediatrics certification examination results: does the residency program contribute to trainee achievement? J Pediatr. 2017;188:270–4.
  26. Cuddy MM, Young A, Gelman A, Swanson DB, Johnson DA, Dillon GF, et al. Exploring the relationships between USMLE performance and disciplinary action in practice: a validity study of score inferences from a licensure examination. Acad Med. 2017;92(12):1780–5.
  27. Gumbert SD, Guzman-Reyes S, Pivalizza EG. Letter to the editor. Acad Med. 2016;91(11):1469.
  28. Gelinne A, Zuckerman S, Benzil D, Grady S, Callas P, Durham. United States Medical Licensing Exam Step 1 score as a predictor of neurosurgical career beyond residency. Neurosurgery. 2019;84:1028–34.
  29. Haertel EH. Tests, test scores, and constructs. Educ Psychol. 2018;53(3):203–16.
  30. Pereira AG, Woods M, Olson APJ, van den Hoogenhof S, Duffy BL, Englander R. Criterion-based assessment in a norm-based world: how can we move past grades? Acad Med. 2018;93(4):560–4.
  31. Dumas D, McNeish D, Schreiber-Gregory D, Durning SJ, Torre DM. Dynamic measurement in health professions education: rationale, application, and possibilities. Acad Med. 2019;94(9):1323–8.

Copyright information

© International Association of Medical Science Educators 2019

Authors and Affiliations

  1. Office of Medical Education, University of North Carolina School of Medicine, Chapel Hill, USA
