Medical School Resourcing of USMLE Step 1 Preparation: Questioning the Validity of Step 1
Continuing concerns have been raised about the burgeoning “Step 1 climate” overtaking medical schools and causing significant distress for medical students, faculty, and staff [1, 2]. The United States Medical Licensing Exam (USMLE) Step examinations are a required gateway to physician licensure, a fact noted on their website as the purpose of the four examinations. However, as Chen et al. have alluded, the first of these examinations has mutated into something beyond its creators’ intentions.
Medical schools intentionally strive toward more competency-based objectives, focusing on developing physicians who are reflective, lifelong learners. As part of competency-based medical education, frequent assessments must be conducted in a variety of contexts to ensure trainees have the requisite competence to continue to the next stage of development [4, 5]. These instructive assessments designed to drive learning toward competency should be holistic, formative, and developmental.
Whether formative or summative, medical students perceive a test as a test. Chen et al. point out that exemplary performance on USMLE Step 1 becomes students’ only concern. The single three-digit score and its misuse as the gateway to career choice and residency match have led students to obsess over maximal numeric achievement. In fact, the national mean for USMLE Step 1 has risen over the past 15 years from 216 to 230 [8, 9]. This has provoked a proverbial “arms race,” with implications for medical schools attempting to be innovative and future-focused in preparing effective physicians for our changing health care landscape. Students disengage from, or express discontent toward, topics not directly relevant to USMLE Step 1, although such content may be important for their future careers as physicians. Examples range from communication skills and empathy development to newer areas such as quality improvement and patient safety.
Medical students invest a great deal of time and money in tuition, and an increasing amount beyond tuition on supplemental commercial products promising improved scores on USMLE Step 1. What has not been reported is the burden placed on medical schools to divert resources to ensure their students do the best that they can on this examination. Here, we offer our medical school experience as a case study to articulate the resources, time, and effort being invested in USMLE Step 1 preparation. We recognize that many schools are undertaking similar initiatives, and even greater investments may be occurring at better-funded private schools or in higher-tuition settings. This is happening at a time when our national conversation laments the debt burden on learners. While, of course, not all test preparation expense is wasted in producing an expert physician workforce, we hope to spark discussion about the degree of investment required to compete in this arms race of USMLE Step 1 preparation. In questioning the pedagogical validity of such curricular investments, we also consider the validity of Step 1 itself, as it and its implications currently exist, by examining construct and consequences validity. As a community of leaders in medical education should continually do, we invite readers to ask: when is enough, enough?
Prior to the start of the 2016–2017 academic year, an audit of the Foundation Phase was undertaken to evaluate content discrepancies between our curriculum and “high-yield” Step 1 topics. A commercial review book popular with medical students was used to detect alignment and deviation between the content of our curriculum and that of USMLE Step 1. This information was shared with course directors. Enhanced curricular efforts also targeted deficiencies identified in USMLE Step 1 reports for specific disciplines to improve the respective metrics.
For the class matriculating in the fall of 2016, our School of Medicine (SOM) began administering National Board of Medical Examiners (NBME) examinations twice per semester during the Foundation Phase. This exposed students to the rigors and format of nationally standardized examinations, but performance was not factored into student grades. In the fall of 2017, customized NBME exams were fully integrated into Foundation Phase blocks as graded final exams.
Additional teaching modalities were also introduced. More formative assessments and feedback mechanisms were implemented to foster self-regulated learning. Technology was employed specifically to introduce spaced retrieval with repetition of core concepts. One of our pediatric faculty (author NH), with expertise as an educator, was hired to teach active learning techniques, test-taking skills, and overall integration and organization of content. This pediatrician also conducts one-on-one test-taking counseling with medical students; over half of the class took advantage of her services in the first year. Although this pediatrician, and now another physician, offer a wide array of evidence-based pedagogical interventions for students, the intent of their hire was to improve Step 1 scores. Finally, we purchased subscriptions to USMLE-Rx™ and UWorld© for Foundation Phase students.
These investments come at costs of time and money to the SOM. The customized NBME exams cost approximately $75 per student per exam, now totaling more than $100,000 annually. Now that these are given at the end of each block, the faculty member responsible for building them devotes approximately 4 hours per block, three to four times per semester, in addition to the cost of each exam. Further, subscriptions for USMLE-Rx and UWorld total approximately $150,000 annually.
Faculty hired by the SOM to provide feedback or examination training represent an additional expense. A pediatrician at 0.5 FTE and an emergency medicine physician at 0.25 FTE working with students on study and test-taking skills cost approximately $195,000. Additionally, a family medicine physician analyzing performance metrics to guide the SOM costs $50,000 annually. An overall sum is difficult to calculate for the time and effort spent by course faculty, who not only have to prepare material they believe fundamental for medical student competency but must also ensure content remains relevant to USMLE Step 1. In total, these investments cost approximately $495,000, which may ultimately result in tuition increases.
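The itemized figures above can be tallied directly; all values are the approximate amounts reported in this case study:

```latex
\underbrace{\$100{,}000}_{\text{NBME exams}}
+ \underbrace{\$150{,}000}_{\text{USMLE-Rx and UWorld}}
+ \underbrace{\$195{,}000}_{\text{0.75 FTE coaching}}
+ \underbrace{\$50{,}000}_{\text{performance analytics}}
= \$495{,}000
```

This tally excludes the hard-to-quantify course-faculty time noted above, so it understates the true annual cost.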
Threats to Validity
With investments this significant and a rising cost of tuition, we evaluated our curricular efforts and focus with regard to USMLE Step 1 preparation. Based on our experience, questions naturally arose regarding the validity of USMLE Step 1 in preparing and evaluating physician competency. The Standards for Educational and Psychological Testing offers guidance on validity evidence for examinations. Reviewing these standards, we find that threats to the validity of USMLE Step 1 appear to exist, specifically regarding construct and consequences validity.
Construct validity has been defined as the degree to which a test measures what it claims to be measuring. Construct validity considers the appropriateness of inferences made from collected measurements. Essentially, the concern with USMLE Step 1 is whether the instrument measures the intended construct. The focus of this assessment is medical knowledge. The “Step 1 climate” has led learners and faculty to focus exclusively on this content, which inadequately addresses domains necessary for minimally competent physicians, such as patient safety, quality improvement, population health, and teamwork.
Trochim highlights several threats to construct validity relevant to this argument about USMLE Step 1. Mono-method bias refers to measuring a construct with a single instrument. In using USMLE Step 1, we assume it is a measure of medical knowledge. Although experts are brought together to develop this examination, the assertion that the exam provides a threshold for minimum competence in medical knowledge has not been challenged. Furthermore, in recent years, the mean for the exam has continued to trend upward. Additionally, the cottage industry of examination preparation has clouded the construct validity of USMLE Step 1, raising concerns that test-taking ability, not medical knowledge, is what is truly being tested [15, 16, 17].
This ties directly to evaluation apprehension among test-takers, which can result in poor performance. It has been well documented that African American men and women, older students, and women as a group tend to perform worse on USMLE Step 1 [18, 19, 20]. With this in mind, inferences about scores become confounded. If medical students perform similarly to peers within their particular program yet underperform on this examination, questions should be raised about the construct itself rather than inferring that medical schools or students are not properly preparing for the test, or worse, that students demonstrate a relative deficiency in medical knowledge.
The act of administering an exam, interpreting results, and basing decisions or actions on those results impacts those being assessed. Consequences validity evidence derives from examining the impact of a test on examinees, educators, and schools. This form of validity evidence explores the impact of an exam, whether beneficial or harmful, intended or not. As Cook and Lineberry note, consequences validity addresses whether the act of assessment and subsequent interpretation achieve intended results with minimal negative side effects.
We have already established that, according to the USMLE website, USMLE Step 1 is purportedly a measure of medical knowledge. The scoring of this exam employs the Hofstee and Angoff methods to determine a minimum passing score. Therefore, it can be argued that receiving a passing score demonstrates minimum competence in medical knowledge. However, USMLE Step 1 results are currently interpreted in the context of an aptitude test, suggesting that higher scores imply students will be better physicians. Furthermore, the Step 1 climate noted above is a negative side effect that heavily threatens consequences validity. Whether intended or not, this misuse of USMLE Step 1 scores undermines the consequences validity of the exam.
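As a simplified illustration of standard setting (not the NBME’s actual implementation), in an Angoff-style procedure each of $J$ judges estimates the probability $p_{jk}$ that a minimally competent examinee answers item $k$ correctly, and the raw cut score over the $K$ items is the sum of the averaged judgments:

```latex
\text{cut score} \;=\; \sum_{k=1}^{K} \left( \frac{1}{J} \sum_{j=1}^{J} p_{jk} \right)
```

The Hofstee method then serves as a compromise check, bounding the cut score by judges’ stated minimum and maximum acceptable passing scores and failure rates. The point for our argument is that both methods define a minimum competence threshold, not a ranking of examinees.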
Residency program directors argue that applicants’ USMLE Step 1 scores are a good predictor of later performance on specialty board exams. It makes sense that performance on one standardized test would predict performance on another, and this is borne out by other studies [23, 24]. However, further studies have identified USMLE Step 2 Clinical Knowledge (CK) as a better predictor of board pass rates. Furthermore, USMLE Step 2 CK has been shown to correlate negatively with other meaningful outcomes, such as future malpractice suits as a physician.
Gumbert et al. have argued that knowing scores on USMLE Step 1 permits estimation of academic excellence, not just competence. However, knowing how to take standardized exams obfuscates whether the exam is truly measuring the construct it claims to assess. By using scores to assume academic excellence, program directors are in essence using the exam as an aptitude test rather than a minimum competency exam. One study found USMLE Step 1 was a poor predictor of future career performance with regard to academic rank and board passage. If we as educators are to believe USMLE Step 1 to be a credible exam, we must also recognize that this standardized, multiple-choice exam simply provides a measure of minimum competence in medical knowledge. To interpret it in any other way is akin to educational malpractice, costing medical students and medical schools large sums of money and time to ensure medical students not just pass, but pass with arbitrarily inflated numbers.
As outlined in our case study, medical schools divert substantial financial, temporal, and personnel resources toward USMLE Step 1 preparation. The goal of medical school curricula is to graduate physicians ready to enter clinical specialty training. USMLE Step 1 content represents only an incomplete segment of the competence necessary for students to become successful physicians.
This does not mean we believe USMLE Step 1 should be eliminated. If the medical education community is serious about transforming medical education toward holistic competency, one alternative might be to make the exam criterion-referenced. Criterion-referenced examinations measure specific learning standards, in this case medical knowledge. Unlike the current USMLE scoring, which ranks students along a normal distribution, all students could theoretically pass a criterion-referenced examination by answering a certain percentage of questions correctly. Students, administrators, and other stakeholders would have a clear sense of each student’s medical knowledge based on the raw score and percentage. Since evidence shows the exam is biased against underrepresented minorities and women [18, 19, 20], providing concrete results may be a better method than reporting normalized scores, and would also allow for individualized learning plans [20, 30].
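Under such a criterion-referenced scheme, the pass decision reduces to a fixed threshold on the raw percentage correct; the 70% threshold below is purely illustrative, not a proposed cut score:

```latex
\text{pass} \iff \frac{\text{items answered correctly}}{\text{total items}} \;\geq\; 0.70
```

Because the threshold is fixed in advance, every examinee who reaches it passes, regardless of how peers perform.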
Additionally, Dumas et al. point out that performance on a single exam is merely a snapshot of knowledge at that point in time. They suggest using a dynamic measurement model that incorporates longitudinal data scaled across time. This model calculates a growth score for each learner, incorporating repeated performance measures to estimate improvement over time. This growth score may be a more appropriate measurement for program directors to consider.
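As a sketch only (the specific functional form here is our illustration, not necessarily the one Dumas et al. employ), dynamic measurement models typically fit a nonlinear growth curve to each learner’s repeated scores and report the estimated individual parameters rather than a single observation:

```latex
y_{it} = \theta_i \left( 1 - e^{-r_i t} \right) + \varepsilon_{it}
```

Here $y_{it}$ is learner $i$’s score at time $t$, $\theta_i$ is the learner’s estimated asymptotic capacity, $r_i$ is an individual growth rate, and $\varepsilon_{it}$ is measurement error; the fitted $\theta_i$ and $r_i$ together play the role of the “growth score.”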
We offer this case study as an example of the unreported costs of the “Step 1 climate” and the substantial efforts made in preparation for an examination whose meaning is being misused by the national community of medical education leaders, to the detriment of student development. We support Haertel’s recent call to action imploring educators and stakeholders to stop using exams for purposes other than their original design. We add to that call, noting costs and threats to validity, in the hope of encouraging those responsible for the future of our profession to take action.
Compliance with Ethical Standards
Conflict of Interest
The authors have no conflicts of interest to declare.
- 3. United States Medical Licensing Examination. What is USMLE? Available at https://www.usmle.org. Accessed January 30, 2019.
- 8. United States Medical Licensing Examination. USMLE score interpretation guidelines. Available at https://www.usmle.org. Accessed January 30, 2019.
- 11. Rappley MD. The most important question we can ask. 128th annual Association of American Medical Colleges leadership plenary. Available at https://news.aamc.org/research/article/leadership-plenary-2017/. Accessed January 30, 2019.
- 12. American Educational Research Association, American Psychological Association, National Council on Measurement in Education. Standards for Educational and Psychological Testing. Washington, DC: American Educational Research Association; 2014.
- 14. Trochim WMK. Threats to construct validity. Available at https://socialresearchmethods.net/kb/consthre.php. Accessed January 30, 2019.
- 17. Ehrlich A. Acing Step 1 isn’t just about intelligence. February 15, 2018. Available at https://www.usmlepro.com/single-post/Scoring-260-on-USMLE-step-1. Accessed January 30, 2019.
- 22. Camargo SL, Herrera AN, Traynor A. Looking for a consensus in the discussion about the concept of validity: a Delphi study. Methodology. 2018;14(4):146–55.
- 28. Gelinne A, Zuckerman S, Benzil D, Grady S, Callas P, Durham. United States Medical Licensing Exam Step 1 score as a predictor of neurosurgical career beyond residency. Neurosurgery. 2019;84:1028–34.