Introduction

Optimizing the criteria to rank residency applicants is a difficult task. The National Resident Matching Program (NRMP) is designed to be applicant-centric, with the overarching goal of providing favorable outcomes to the applicant while giving programs the opportunity to match high-quality candidates. From a program’s perspective, the NRMP process is composed of three phases: the screening of applicants, the interview, and the creation of the rank list. While it is easy to compare candidates based on objective measures, these do not always reflect the qualities required to be a successful resident or physician. Prior studies have demonstrated that objective measures such as Alpha Omega Alpha status, United States Medical Licensing Examination (USMLE) scores, and class rank do not correlate with residency performance measures [1]. Because these factors predict success inconsistently and non-cognitive traits are recognized as important, most programs place increased emphasis on candidate interviews to assess fit [2].

Unfortunately, the interview process lacks standardization across residency programs. Industry and business have more standardized interviews and utilize best practices that include blinded interviewers, use of structured questions (situationally and/or behaviorally anchored questions), and skills testing. Because residency interviews are so heterogeneous, studies have failed to show that the interview reliably predicts who will perform well during residency. Additionally, resident success has many components, such that isolating any one factor, such as the interview, may be problematic; this argues for a more holistic approach to resident selection [3]. Nevertheless, there are multiple ways the application review and interview can be standardized to promote transparency and improve resident selection.

Residency programs have begun adopting best practices from business models for interviewing, which include standardized questions, situationally and/or behaviorally anchored questions, blinded interviewers, and use of the multiple mini-interview (MMI) model. The focus of this review is to take a more in-depth look at practices that have become standard in business and to review the available data on the impact of these practices in resident selection.

Unstructured Versus Structured Interviews

Unstructured interviews are those in which questions are not set in advance and represent a free-flowing discussion that is conversational in nature. The course of an unstructured interview often depends on the candidate’s replies and may drift away from topics that are important to applicant selection. While unstructured interviews may involve specific questions such as “tell me about a recent book you read” or “tell me about your research,” the questions do not seek to determine specific applicant attributes and may vary significantly between applicants. Due to their free-form nature, unstructured interviews may be prone to biased or illegal questions. Additionally, because they lack a specific scoring rubric, unstructured interviews are open to multiple biases in answer interpretation and as such generally show limited validity [4]. For the applicant, unstructured interviews allow more freedom to choose a response, with some studies reporting higher interviewee satisfaction with these questions [5].

In contrast to the unstructured interview, structured interviews use standardized questions that are written prior to an interview, are asked of every candidate, and are scored using an established rubric. Standardized questions may be behaviorally or situationally anchored [5]. Due to their uniformity, standardized interviews have higher interrater reliability and are less prone to biased or illegal questions.

Behavioral questions ask the candidate to discuss a specific response to a prior experience, which can provide insight into how an applicant may behave in the future [5]. Not only does the candidate’s response reflect a possible prediction of future behavior, it can also demonstrate the knowledge, priorities, and values of the candidate [5]. Questions are specifically targeted to reflect qualities the program is searching for (Table 1) [5,6,7].

Table 1 Behavioral questions and character traits [5,6,7]

Situational questions require an applicant to predict how they would act in a hypothetical situation and are intended to reflect a realistic scenario the applicant may encounter during residency; this can provide insight into priorities and values [5]. For example, asking what an applicant would do when receiving sole credit for something they worked on with a colleague can provide insight into the integrity of a candidate [4]. These types of questions can be especially helpful for fellowships, as applicants would already have the clinical experience of residency to draw from [5].

Using standardized questions provides a method to recruit candidates with characteristics that ultimately correlate with resident success and good performance. Indeed, structured interview scores have demonstrated an ability to predict which students perform better with regard to communication skills, patient care, and professionalism in surgical and non-surgical specialties [8•]. In fields such as radiology, non-cognitive abilities that can be evaluated in behavioral questions, such as conscientiousness or confidence, are thought to critically influence success in residency and even influence cognitive performance [1]. This has also been demonstrated in obstetrics and gynecology, where studies have shown that resident clinical performance after 1 year had a positive correlation with the rank list percentile that was generated using a structured interview process [9].

Creating Effective Structured Interviews

To be effective, standardized interview questions should be designed in a methodical manner. The first step in standardizing the interview process is determining which core values predict resident success in a particular program. To that end, educational leaders and faculty within the department should come to a consensus on the main qualities they seek in a resident. From there, questions can be formatted to elicit those traits during the interview process. Some programs have used personality assessment inventories to establish these qualities. Examples include openness to experience, humility, conscientiousness, and honesty. Further program-specific additions can be included, such as potential for success in an urban versus rural environment [10].

Once key attributes have been chosen and questions have been selected, a scoring rubric can be created. The scoring of each question is important as it helps define what makes a high-performing versus low-performing answer. Once a scoring system is determined, interviewers can be trained to review the questions, score applicant responses, and avoid revising the questions during the interview [11]. Questions and the grading rubric should be further scrutinized through mock interviews with current residents, including discussing responses of the mock interviewee and modifying the questions and rubric prior to formal implementation [12]. Interviewer training itself is critical, as adequate training leads to improved interrater agreement [13]. Figure 1 demonstrates the steps to develop a behavioral interview question.

Fig. 1 Example of standardized question to evaluate communication with scoring criteria

Rating the responses of the applicants can come with errors that ultimately reduce validity. For example, central tendency error involves interviewers not rating students at the extremes of a scale but rather placing all applicants in the middle; leniency versus severity refers to interviewers who either give all applicants high marks or give everyone low marks; contrast effects involve comparing one applicant to another rather than solely focusing on the rubric for each interviewee. These rating errors reflect the importance of training and providing feedback to interviewers [4].
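
The rating errors described above can, in principle, be screened for when giving interviewers feedback. The following is a minimal sketch, not a validated method: it assumes a hypothetical 1–5 rubric, and the function name, cutoffs, and example scores are all illustrative assumptions rather than values drawn from the cited studies. A flag only suggests a pattern worth reviewing, since an interviewer may simply have seen an unusually strong or weak group of applicants.

```python
from statistics import mean, pstdev


def flag_rater_patterns(scores_by_rater, scale_min=1, scale_max=5,
                        spread_cutoff=0.5, shift_cutoff=1.0):
    """Flag interviewers whose scores suggest a possible rating error.

    scores_by_rater maps an interviewer label to the list of scores
    that interviewer gave. The cutoffs are hypothetical and would need
    to be tuned to a program's own rubric.
    """
    midpoint = (scale_min + scale_max) / 2
    flags = {}
    for rater, scores in scores_by_rater.items():
        avg = mean(scores)
        spread = pstdev(scores)  # population standard deviation
        rater_flags = []
        # Central tendency: scores cluster tightly around the scale midpoint.
        if spread < spread_cutoff and abs(avg - midpoint) < shift_cutoff:
            rater_flags.append("possible central tendency")
        # Leniency: the interviewer's average sits well above the midpoint.
        if avg - midpoint >= shift_cutoff:
            rater_flags.append("possible leniency")
        # Severity: the interviewer's average sits well below the midpoint.
        if midpoint - avg >= shift_cutoff:
            rater_flags.append("possible severity")
        flags[rater] = rater_flags
    return flags


# Illustrative (made-up) scores for four interviewers.
example_scores = {
    "rater_a": [3, 3, 3, 3, 3, 3],
    "rater_b": [5, 5, 4, 5, 5, 4],
    "rater_c": [1, 2, 1, 2, 1, 2],
    "rater_d": [1, 3, 5, 2, 4, 3],
}
example_flags = flag_rater_patterns(example_scores)
```

Contrast effects are not covered by this sketch, because detecting them would require the order in which applicants were interviewed.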

Blinded Interviewers

Blinding the interviewers to the application prior to meeting with a candidate is intended to eliminate various biases within the interview process (Table 2) [14, 15]. In addition to grades and test scores, aspects of the application that can either introduce or exacerbate bias include photographs, demographics, letters of recommendation, selection to medical honor societies, and even hobbies. Impressions of candidates can be formed prematurely, with the interview then serving to simply confirm (or contradict) those impressions [16•]. Importantly, application blinding may also decrease implicit bias against applicants who identify as underrepresented in medicine [17].

Table 2 Examples of bias [14, 15]

Despite the proven success of these various interview tactics, their use in resident selection remains limited, with only 5% of general surgery programs using standardized interview questions and fewer than 20% using even a limited amount of blinding (e.g., blinding of photograph) [2]. Some programs have continued to rely on unblinded interviews and prioritize USMLE scores and course grades in ranking [18]. Given the potential benefits of these practices and their ability to standardize the interview process, it is critical that programs become familiar with them so that they can select the best applicants while minimizing the significant bias in traditional interview formats.

Multiple Mini-interview (MMI)

The use of multiple interviews by multiple interviewers provides an opportunity to ask the applicant more varied questions and also allows potential interviewer bias to average out, leading to more consistent applicant scoring and prediction of applicant success [7]. Training of the interviewers in interviewing techniques, scoring, and avoiding bias is also likely to decrease scoring variability. Similarly, the use of the same group of interviewers for all candidates should be encouraged in order to limit variance in scoring among faculty [19].
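
As a rough illustration of how pooling scores across interviewers can dampen individual bias, the sketch below standardizes each interviewer's scores before averaging them per applicant. Standardization (z-scoring) is an assumption added here for illustration and is not a method described in the cited studies; the function name and data are hypothetical. The approach assumes every interviewer scores the same set of applicants, as the paragraph above recommends.

```python
from statistics import mean, pstdev


def average_standardized_scores(scores_by_rater):
    """Average per-interviewer z-scores for each applicant.

    scores_by_rater maps an interviewer label to a dict of
    {applicant: raw_score}. Each interviewer's scores are centered on
    that interviewer's own mean and scaled by their own spread, so a
    uniformly lenient or severe interviewer no longer shifts the result.
    """
    z_scores = {}
    for rater, scores in scores_by_rater.items():
        values = list(scores.values())
        avg = mean(values)
        spread = pstdev(values)
        for applicant, raw in scores.items():
            # An interviewer who gives identical scores carries no ranking signal.
            z = 0.0 if spread == 0 else (raw - avg) / spread
            z_scores.setdefault(applicant, []).append(z)
    return {applicant: mean(zs) for applicant, zs in z_scores.items()}


# Made-up example: one lenient and one severe interviewer who nonetheless
# agree on the ordering of applicants A, B, and C.
panel_scores = {
    "lenient_rater": {"A": 5, "B": 4, "C": 3},
    "severe_rater": {"A": 3, "B": 2, "C": 1},
}
combined = average_standardized_scores(panel_scores)
```

In this example the two interviewers differ by two points on every applicant, yet the combined scores preserve the shared ordering (A above B above C) without the offset.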

One interview method that incorporates multiple interviewers and is increasingly used in both medical school and residency interviews is the MMI model. This system provides multiple interviews in the form of 6–12 stations, each of which poses a non-medical question designed to assess specific non-academic applicant qualities [20]. While the MMI format can intimidate some candidates, others find that it provides an opportunity to demonstrate traits that would not be observed in an unstructured interview, such as multitasking, efficiency, flexibility, interpersonal skills, and ethical decision-making [21]. Furthermore, the MMI has shown increased reliability: in a study of five California medical schools, inter-interviewer consistency was higher for MMIs than for traditional interviews, which were unstructured and had a 1:1 ratio of interviewer to applicant [22].

The MMI format is also versatile enough to incorporate technical competencies even through a virtual platform. In general surgery interviews, MMI platforms have been designed to test traits such as communication and empathy but also clinical knowledge and surgical aptitude through anatomy questions and surgical skills (knot tying and suturing). Thus, MMIs are not only versatile, but also have an ability to evaluate cognitive traits and practical skills [23].

MMI also has the potential to reduce resident attrition. For example, a study of students applying to midwifery programs in Australia compared attrition rates and grades among students admitted before and after MMIs were incorporated into a selection process that had relied on academic rank. The authors found that when using MMIs, enrolled students had not only higher grades but significantly lower attrition rates. MMI was better suited to show applicants’ passion and commitment, which then led to similar mindsets among accepted applicants as well as a support network [24]. Furthermore, attrition rates have been found to be higher in female residents in general surgery programs [25]. Perhaps with greater diversity, which is associated with use of standardized interviews, the number of women in surgical specialties can increase and thus reduce the attrition rate in this setting as well.

Impact of Interview Best Practices on Bias and Diversity

An imperative of all training programs is to produce a cohort of physicians with broad and diverse experiences representative of the patient populations they treat. To better address diversity within surgical residencies, particularly regarding women and those who are underrepresented in medicine, it is important that interviews be designed to minimize bias against any one portion of the applicant pool. Diverse backgrounds and cultures within a program enhance research, innovation, and collaboration as well as benefit patients [26]. Patients have shown greater satisfaction and reception when they share ethnicity or background with their provider, and underrepresented minorities in medicine often go on to work in underserved communities [27].

All interviewers undoubtedly have elements of implicit bias; Table 2 describes the common subtypes of implicit bias [14]. While it is difficult to eliminate bias in the interview process, unstructured or “traditional” interviews are more likely to risk bias toward candidates than structured interviews. Studies have demonstrated that Hispanic and Black applicants receive scores one quarter of a standard deviation lower than Caucasian applicants [28]. “Like me” bias is just one example of increased subjectivity with unstructured interviews, where interviewers prefer candidates who may look like, speak like, or share personal experiences with the interviewer [29].

Furthermore, unstructured interviews provide opportunities to ask inappropriate or illegal questions, including those that center on religion, child planning, and sexual orientation [30]. Inappropriate questions tend to be disproportionately directed toward certain groups, with women more likely to get questions regarding marital status and to be questioned and interrupted than male counterparts [28, 31].

Structured interviews, conversely, have been shown to decrease bias in the application process. Faculty trained in behavior-based interviews for fellowship applications showed reduced racial bias in candidate evaluations, which was attributed to scoring rubrics [12]. Furthermore, as structured questions are determined prior to the interview and involve training of interviewers, structured interviews are less prone to illegal and inappropriate questions [32]. Interviewers can ask additional questions such as “could you be more specific?” with the caveat that probing should be minimized and kept consistent between applicants. This way the risk of prompting the applicant toward a response is reduced [4].

Implementing Interview Types During the Virtual Interview Process

An added complexity to creating standardized interviews is incorporating a virtual platform. Even prior to the move toward virtual interviews instituted during the COVID-19 pandemic, studies on virtual interviews showed that they provided several advantages over in-person interviews, including decreased cost, reduction in time away from commitments for applicants and staff, and ability to interview at more programs. A significant limitation, for applicants and for programs, is the inability to interact informally; such interaction allows applicants to evaluate the environment of the hospital and the surrounding community [33•]. Following their abrupt implementation in 2020 during the COVID-19 pandemic, virtual interviews have remained in place and likely will remain in place in some form into the future due to their significant benefits in reducing applicant cost and improving interview efficiency. Although these types of interviews are in their relative infancy in the resident selection process, studies have found that standardized questions and scoring rubrics that have been used in person can still be applied to a virtual interview setting without degrading interview quality [34].

The virtual format may also allow for further interview innovation in the form of standardized video interviews. For medical student applicants, the Association of American Medical Colleges (AAMC) has trialed a standardized video interview (SVI) that includes recording of applicant responses, scoring, and subsequent release to the Electronic Residency Application Service (ERAS) application. Though early data in the pilot was promising, the program was not continued after the 2020 cycle due to lack of interest [35]. There is limited evidence supporting the utility of this type of interview in residency training, and one study found that these interviews did not add significant benefit, as the scores were not associated with other candidate attributes such as professionalism [32]. Similarly, a separate study found no correlation between standardized video interviews and faculty scores on traits such as communication and professionalism. Granted, there was no standardization in what the faculty asked, and they were not blinded to academic performance of the applicants [36]. An evaluation of six emergency medicine programs did demonstrate a positive linear correlation between the SVI score and the traditional interview score, but the correlation coefficient (r) was very low; thus the authors concluded that the SVI was not adequate to replace the interview itself [37].

Conclusions: Future Steps in Urology and Beyond

The shift to structured interviews in urology has been slow. Within the last decade, studies demonstrated that, consistent with other specialties, urology program directors prioritized USMLE scores, reference letters, and away rotations at the program director’s institution as the key factors in choosing applicants [38]. More recently, a survey of urology programs found that < 10% blinded the recruitment team at the screening step, with < 20% blinding the recruitment team during the interview itself [39]. In 2020 our program began using structured interview questions and blinding interviewers to all but the personal statement and letters of recommendation. After querying faculty and interviewees, we have found that most interviewers do not miss the additional information, and applicants feel that they are able to have more eye contact with faculty who are not looking down at the application during the interview. Structured behavioral interview questions have allowed us to focus on the key attributes important to our program. With time we hope to see that inclusion of these metrics helps diversify our resident cohort, improve resident satisfaction with the training program, and produce successful future urologists.

Despite the slow transition in urology and other fields, there is a growing body of literature in support of standardized interviews, both for evaluating key candidate traits that ultimately lead to resident success and for reducing bias while increasing diversity. With time, the hope is that programs will continue incorporating these types of interviews in the resident selection process.