What this paper adds

  • Medical schools commonly use personal interviews in admissions, yet psychometric studies show these interviews to have low reliability and little predictive validity. One solution is multiple independent sampling, with the Multiple Mini-Interview (MMI) being the exemplar. However, MMIs have substantial resourcing demands and potential negative recruitment impacts. The Modified Personal Interview (MPI) seeks to address these two issues with fewer interview stations [4] but a longer interview duration. In this paper we demonstrate its measurement utility for general MD admissions. The MPI can be an alternative for medical schools seeking to enhance the measurement properties of interviews within resource and recruitment constraints.

Introduction

Admission to medical school is predicated upon applicants possessing academic ability alongside a range of key personal competencies [1]. Admission personal interviews are commonly used to rate personal competencies. However, evaluations of the admission personal interview have demonstrated low inter-rater reliability and poor prediction of performance. Albanese et al. [2] reported that panels with multiple interviewers jointly interviewing applicants appeared to have minimal impact upon inter-rater reliability. An Association of American Medical Colleges (AAMC) 2011 Analysis in Brief reported that the admission personal interview has been slow to change and remains the default interview format [3]. Koenig et al. reported that schools used interview formats ranging from unstructured to structured, with the most commonly used format being semi-structured [1]. This publication [1] noted interviewing formats warranting attention, such as the Multiple Mini-Interview (MMI) [4]. Despite weak measurement properties, admission personal interviews provided institutional opportunities to present a ‘human touch’ reflective of the importance ascribed to person-to-person interactions and personal attributes in the selection of future physicians within an otherwise stressful application process [2]. Moreover, a recent AAMC Last Page publication identified admission interviews as important to applicants’ decision-making regarding medical school selection [5]. Substantial recruitment value was ascribed to interviews enabling interaction with senior medical students, faculty and administrative staff.

One solution to optimize reliability in admission interviews is adoption of multiple independent sampling (MIS). Instead of one evaluation of an applicant, several independent evaluations of the same applicant are averaged to generate the applicant’s evaluation [6]. However, uptake by American medical schools of the MMI, an MIS-based interview format, has been limited due to 1) its substantial resource requirements and 2) associated modifications of campus recruitment activities and diminution of the recruitment advantages associated with admission personal interviews [6]. Of note, successful applicant recruitment contributes to student selectivity within a medical school’s admission processes and its associated prestige ranking [7], with prestige being a vaunted element of today’s medical school environment [7–10]. Furthermore, a recruitment advantage for admission personal interviews over the MMI has also been reported for emergency medicine residency selection decision-making [11]. To address MIS and the combined issues of reliability, recruitment and resourcing within admission decision-making, Axelson and Kreiter investigated adapting MIS to admission personal interviewing. For their study, they conducted secondary data analysis of multiple admission interview cohorts, predicting reliabilities ranging from 0.48 for three brief personal interviews (each with one independent interviewer) to 0.73 for nine such interviews. They recommended adapting admission personal interviewing via MIS as an ‘intermediate approach’ for schools wishing to address this combination of issues yet unable or unwilling to adopt MIS as utilized in the MMI, given its resourcing and/or recruitment impacts [6].
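
These projected reliabilities are consistent with the Spearman-Brown prophecy formula, the standard expression for the reliability of an average of k parallel measurements. The sketch below back-calculates a single-interview reliability of roughly 0.24 (our inference, not a figure reported by Axelson and Kreiter) and reproduces the 0.48 and 0.73 projections:

```python
# Spearman-Brown prophecy: reliability of the mean of k independent
# interviews, given the reliability r1 of a single brief interview.
def spearman_brown(r1: float, k: int) -> float:
    return k * r1 / (1 + (k - 1) * r1)

r1 = 0.235  # assumed single-interview reliability, inferred from the reported range
for k in (3, 9):
    print(k, round(spearman_brown(r1, k), 2))  # prints 3 0.48 and 9 0.73
```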

Inspired by the ‘intermediate approach’ model of Axelson and Kreiter [6], we developed the Modified Personal Interview (MPI) employing MIS for small-scale specialized medical student selection tasks (leadership [10] and MD PhD [12]). As a medical school intent upon bringing MIS to large-scale admission interviewing but unwilling to implement the MMI due to recruitment and resourcing issues, we adapted our small-scale MPI model to our large-scale admission context (please see Appendix A and Fig. 1 for a description of this large-scale MPI adaptation). Cognizant of the aforementioned combination of admission issues, we evaluated outcomes relevant to each issue through our large-scale MPI implementation: MPI reliability, MPI predictive validity, and MPI admission recruitment and resourcing outcomes. We hypothesized 1) MPI reliability to be intermediate between 0.48 and 0.73, as predicted by Axelson and Kreiter [6], 2) positive MPI predictive validity for in-programme skills assessments such as a first year objective structured clinical examination (OSCE), and 3) maintenance of recruitment outcomes and resource efficiency.

Fig. 1 Modified Personal Interview (MPI)

Methods

The study’s methods were selected to evaluate the MPI against multiple admission outcomes: 1) psychometric evaluation of MPI reliability and predictive validity and 2) institutional recruitment and resourcing outcomes associated with MPI implementation.

MPI reliability

Analyses

We examined the reliability of the 2013/2014 administration of the MPI. Reliability was analyzed with generalizability theory via G‑string IV (perd.mcmaster.ca). Generalizability theory allows modelling of multiple sources of variance and, via decision studies (D-studies), the impact of changing the number of measurements. Our generalizability model examined the error due to stations and to items nested in each station. The purpose was to evaluate inter-interview reliability across the four interviews and to determine the amount of error due to rater, station type, and the items evaluated within each interview.
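
For a design with items nested in stations, the generalizability coefficient for relative decisions takes the following general form (a standard formulation; the exact terms fitted by G‑string IV depend on the specified design):

$$E\rho^2 = \frac{\sigma^2_p}{\sigma^2_p + \dfrac{\sigma^2_{p \times s}}{n_s} + \dfrac{\sigma^2_{p \times i:s}}{n_s\,n_i}}$$

where $\sigma^2_p$ is true applicant variance, $\sigma^2_{p \times s}$ the applicant × station interaction, $\sigma^2_{p \times i:s}$ the residual, and $n_s$ and $n_i$ the numbers of stations and items per station.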

To evaluate whether interview and interviewer type were influencing scoring, we conducted two analyses. Firstly, we computed item-total correlations for all of the station types and compared differences in scoring using a factorial repeated measures ANOVA with station type as the repeated measures variable and circuit as the between-subjects variable. Secondly, we conducted a repeated measures ANOVA between rater types (medical students, residents, and faculty) who rated the same applicants in each circuit.
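
As a minimal sketch of the factorial analysis in Python (the synthetic data, column names and station labels are hypothetical, and the original analyses may have used different software):

```python
import numpy as np
import pandas as pd
import pingouin as pg  # pip install pingouin

# Hypothetical long-format ratings: one row per applicant x station type,
# with the circuit that produced each score.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'applicant': np.repeat(np.arange(60), 4),
    'station_type': np.tile(['collaborator', 'advocate', 'self_reflection',
                             'scholar'], 60),  # hypothetical station labels
    'circuit': np.repeat(rng.choice(['A', 'B'], 60), 4),
    'score': rng.normal(5, 1, 240),
})

# Factorial repeated-measures ANOVA: station type as the within-subject
# factor, circuit as the between-subjects factor.
print(pg.mixed_anova(data=df, dv='score', within='station_type',
                     subject='applicant', between='circuit'))
```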

MPI predictive validity

Preliminary validity evaluation

We examined the predictive validity of the 2013/2014 admitted cohort against performance in first year medical school courses. We evaluated predictive validity by regressing MPI total score against a five-station OSCE for first year medical students within the Art and Science of Clinical Medicine year 1 course (ASCM year 1) and against year 1 in-programme didactic courses. The multiple linear regression model examined the prediction of MPI total score after controlling for undergraduate grade point average (GPA), Medical College Admission Test (MCAT) sub-scores (Verbal Reasoning, Physical Sciences, and Biological Sciences), File Review total score, and age. The OSCE consists of five examiners evaluating students across clinical scenarios intended to test physical examination skills and communication skills in the form of history-taking. While communication skills were scored across all stations, two stations examined history-taking specifically while the other stations focused on basic physical examination and rudimentary clinical judgment. Stations were scored using global ratings and we constructed three criterion measures: 1) average global rating score across all stations, 2) average global rating on the two history-taking stations, and 3) average global rating on the physical examination stations. We corrected correlations for range restriction [13] but not for reliability, since the numbers of stations in the OSCE and MPI are fixed.
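
A common formulation of the range restriction correction (Thorndike’s Case 2, for direct restriction on the predictor) is sketched below; [13] describes the approach, though the exact variant applied may differ, and the illustrative SD ratio is our assumption:

```python
import math

def correct_range_restriction(r: float, sd_pool: float, sd_selected: float) -> float:
    """Thorndike Case 2 correction for direct range restriction.
    sd_pool is the predictor SD in the full applicant pool; sd_selected
    is the (smaller) SD among admitted students."""
    u = sd_pool / sd_selected
    return (r * u) / math.sqrt(1 + r**2 * (u**2 - 1))

# Hypothetical illustration: an observed r of 0.19 with a pool SD 1.65x
# the admitted-cohort SD corrects to roughly 0.30.
print(round(correct_range_restriction(0.19, 1.65, 1.0), 2))
```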

Institutional recruitment and resourcing assessment outcomes associated with MPI implementation

To assess the institutional impact of MPI implementation upon recruitment and resourcing, we assessed relevant outcomes across the pre-MPI (2012/2013) and MPI implementation (2013/2014) admission cycles. Prior to MPI implementation, we used a traditional admission interview format. This format employed ‘open-file interviewing’, with two interviewers (a faculty member and a second year medical student) reviewing applicants’ admission files pre-interview for background information and generation of interview questions. Pre-interview file review is a common medical school practice [14]. The MPI innovation implemented in 2013/2014 is described in Appendix A and Fig. 1. We compared recruitment acceptance yields across these admission cycles, calculated as: recruitment acceptance yield = total applicant admission acceptances/total applicant admission offers. We compared admitted medical student cohorts’ gender distribution, GPA and MCAT scores, and interviewer cadres’ size and demographics across these admission cycles.
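
As a worked illustration of the yield calculation with hypothetical figures (not the study’s counts):

```python
# Hypothetical: 350 admission offers made, 259 acceptances received.
offers, acceptances = 350, 259
print(f"acceptance yield = {acceptances / offers:.2f}")  # 0.74
```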

MPI implementation was deemed programme evaluation, with results secondary to quality assurance activities, and was exempt from research ethics review. This determination was made by our local University of Toronto Health Sciences Research Ethics Board. The ASCM year 1 OSCE validity evaluation received approval via the University of Toronto Research Ethics Board.

Results

MPI reliability results

We report variance due to factors that could contribute to error, including applicants, interview day, circuit, station, and item (Table 1). Each variance component indicates the amount of measurement error due to that factor and its interactions with other factors. For perfect reliability, all of the variance (i.e. 100 % of variation in scores) would be attributable to true differences between applicants. For the implementation administration, MPI inter-interview reliability was 0.56; i.e. the average score across all four stations had a reliability of 0.56, indicating moderate reliability. The reliability of a single station was 0.12. The reliability of the Likert scale items within a single station was 0.92.

Table 1 Variance components for full implementation generalizability analysis

The majority of variance in scores was driven by the applicant × interview station interaction (51 %), suggesting that variation in the scoring of applicants across different stations contributed most of the measurement error. The omnibus interaction term (i.e. the variance due to random interaction between circuit, station, items, and applicants) was 18 %, and true differences between applicants accounted for 17 %. The remainder of the variance was attributed to small interactions with circuit and day of interview. The D‑study predicted that eight stations would be needed to reach an inter-interview reliability of 0.7 or higher.
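
The D-study projection follows directly from these components. The sketch below illustrates the calculation using the reported percentages; the item count per station and the treatment of the omnibus term as residual error are our assumptions, so the figures approximate rather than reproduce the published coefficients:

```python
# D-study projection of inter-interview reliability from the reported
# variance components (as percentages of total variance).
var_p = 17.0      # true differences between applicants
var_ps = 51.0     # applicant x station interaction
var_resid = 18.0  # omnibus interaction, treated here as residual error

def g_coefficient(n_stations: int, n_items: int = 6) -> float:
    # n_items = 6 is an assumed item count per station, for illustration only.
    error = var_ps / n_stations + var_resid / (n_stations * n_items)
    return var_p / (var_p + error)

print(round(g_coefficient(4), 2))  # ~0.56 with the four stations used
print(round(g_coefficient(8), 2))  # ~0.72, clearing the 0.7 threshold
```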

Item-total analyses showed all MPI stations had similar item-total correlations (0.2–0.3). Similarly, the repeated measures ANOVA failed to detect a difference between station types in overall score (F(3,1680) = 0.811, p = 0.488). Likewise, we failed to detect a significant difference between medical student and resident raters compared with faculty when rating the same applicants (F(2,243) = 0.126).

The correlation between total MPI score and admission file review score was 0.25 (p < 0.05), suggesting a moderate association; correction for restriction of range increased this to 0.34. The MPI self-reflection interview had a marginally higher correlation (0.175) with the file review compared with the other MPI interviews (0.15 to 0.16). The correlations with GPA and MCAT total score were not significant (r = −0.013, p = 0.25 for GPA and r = 0.10, p = 0.12 for MCAT).

MPI predictive validity results

Total MPI scores had no significant correlations with grades in biomedical and basic science courses; correlations ranged between −0.1 and 0.1. However, MPI scores significantly predicted the ASCM year 1 OSCE average global rating across all stations at 0.19 (p < 0.001) and average performance on the history-taking stations at 0.20 (p < 0.001). Performance on the physical examination stations was correlated at 0.16 (p < 0.05). While these correlations were low in absolute magnitude, correcting for range restriction raised the correlation with performance across all stations to 0.30 and on the history-taking stations to 0.33, with physical examination performance correlated at 0.24. Linear regression showed that the control covariates (GPA, MCAT sub-scores, age, File Review total score) together had an R² of 0.051 (i.e. predicted 5.1 % of variance in ASCM OSCE performance). Interestingly, t‑tests of the coefficients showed that the pattern of prediction was driven by negative prediction of OSCE performance by age and by the Physical Sciences (i.e. physics, chemistry) sub-score of the MCAT. File Review total score did not predict OSCE score. Addition of MPI total score increased the R² of the whole model to 0.095, an R² change of 0.043, which was significant (F(1,248) = 11.86, p < 0.001). Comparison of the standardized coefficients showed the MPI total score to have the largest effect among the predictors. The complete regression model was significant (F(7,247) = 3.69, p < 0.001). Coefficients and confidence intervals for each model are provided in Table 2 (MPI total score regressed against ASCM OSCE).
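
The significance test for that R² change can be reconstructed from the reported quantities (a sketch; the small discrepancy from the published F reflects rounding of the R² values):

```python
# F-test for the R^2 change when MPI total score joins the covariate-only
# model; degrees of freedom follow the reported F(1,248).
r2_reduced, r2_full = 0.051, 0.095
n_added = 1     # one predictor added (MPI total score)
df_resid = 248  # residual df of the full model

f_change = ((r2_full - r2_reduced) / n_added) / ((1 - r2_full) / df_resid)
print(round(f_change, 2))  # ~12.06 vs. the reported 11.86
```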

Table 2 Modified personal interview total score regressed against ASCM OSCE

Institutional recruitment and resourcing results associated with MPI implementation

Applicant acceptance recruitment yield was maintained with MPI implementation, as were the prestige markers reflected in admitted applicants’ GPA and MCAT scores. Rater resourcing for the new MPI format proved efficient, with substantially fewer total interviewers (a reduction of 130) required to assess slightly more (an increase of 13) applicants (Table 3).

Table 3 Impact of modified personal interview (MPI) on the MD admissions

Discussion

The purpose of this study was to evaluate the combined admission issues of reliability, recruitment and resourcing for a modified personal interview that incorporated multiple independent sampling while balancing resourcing and recruitment requirements. Additionally, we explored other psychometric properties of this admission interview format, such as predictive validity.

Our MPI implementation’s measured reliability of 0.56 fell within the range predicted by Axelson and Kreiter [6] for this number of interviews but is less than the traditionally accepted standards of 0.7 to 0.9 [13]. As noted in our D-study, an MPI with eight interviews would attain a reliability of 0.7 or higher. An eight station MMI is not always feasible for medical schools and in some cases may not be desirable, as schools may wish to spend the interview day hosting other recruitment activities for applicants. Furthermore, while high reliability is always a desirable goal, educational assessment has progressed beyond simply relying on measurement as the marker of assessment success [15]. The capability of educational assessments to incorporate institutional goals – in our case recruitment and resourcing – is important. For our context, MPI implementation was associated with maintenance of our school’s acceptance yield and of the associated markers of high academic ability in our cohort, such as MCAT and GPA scores. Just as important, it was a resource efficient method of interviewing students.

Furthermore, our study advances Axelson and Kreiter’s work [6] with preliminary validity evidence of MPI score correlation with the ASCM year 1 OSCE. Prediction of performance is a significant marker of validity evidence, since selection processes aim to choose the candidates most likely to succeed in medical school. The MPI shows early evidence of predicting aligned outcomes – communication skills performance – via an OSCE. Communication skills are best evaluated in face-to-face encounters, and thus it is no surprise that MPI interview performance should predict performance in a similar in-programme simulated communication scenario. While the magnitude of the correlations is small, they should be contextualized: the ASCM OSCE is a low stakes, short examination used for generating feedback and identifying skills deficits. Furthermore, the MPI was the significant positive predictor in the admission process. As with the MMI [16], the magnitude of MPI prediction could be expected to increase over time as more robust and reliable OSCE examinations are presented to our medical students. Validation is a continuing process that relies on the sufficiency of evidence to substantiate claims. Further work must clearly be done to build a chain of evidence for the MPI’s validation.

Moreover, our MPI used caring and conscientiousness as the global competency, creating an opportunity to help establish caring, compassion and conscientiousness as institutional values for both faculty and learners. Thus, in addition to creating a more ‘human touch’ [2] for applicants, making caring and conscientiousness a focus of the MPI necessitated thoughtful consideration of these constructs by raters in every interview. We were able to leverage MPI rater training as an educational development opportunity for faculty, current medical students and residents, sharing this institutional value through online and face-to-face sessions. These activities added institutional value beyond the explicit attention afforded the institutional goals pertaining to recruitment and resourcing.

Our admission file assessment scores were not highly correlated with our MPI scores. This is likely a restriction of range issue, but whether file assessment is a good screen for interview selection or in fact provides other information is worth investigating in the future. This finding presents our school with new admission resourcing and recruitment questions. Admission file review scores as a stand-alone method do not predict performance on the ASCM OSCE, or indeed other assessments within our programme. However, ‘open-file interviewing’ via review of admission file materials within our self-reflection MPI station had measurement characteristics similar to the other MPI stations. The resourcing and recruitment questions now to be addressed involve the heavy resource demand of the admission file review phase in the face of its poor predictive capability. However, use of the applicant file in the interview has recruitment value, as it is highly valued by our applicants. A next step is consideration of further use of ‘open-file interviewing’; for example, expansion of the MPI circuit to five interview stations to include a second station using ‘open-file interviewing’.

Our focus upon institutional resourcing and recruitment factors ignored applicant resourcing and recruitment factors. Recruitment strategies integrated within interviews and interview day activities obviously require applicants’ attendance, with travel costs for many applicants. Attention to this matter is essential as medical education costs and medical student debt continue to rise. A next step will be in-depth qualitative investigation of the differential recruitment impact of the MPI and other recruitment strategies, including both institutional and applicant components. Investigation of recruitment strategies is essential as the field explores internet-based MMIs [17]. Internet-based interviewing may eliminate much of the need for applicant travel to attend campus-based admission interviews. To keep pace with technological innovation, comparative investigation of campus-based versus internet-based admission interviewing and their associated resourcing needs and recruitment strategies is necessary.

Our assessment of MPI implementation has limitations. We were interested in assessing the combined admission issues of psychometric rigour (reliability and predictive validity), recruitment and resourcing with MPI implementation by comparing these issues and associated outcomes across the pre-MPI and MPI admission cycles. We were able to do so with respect to recruitment and resourcing. However, we could only assess the reliability of the MPI administration; we were unable to assess the reliability of our pre-MPI traditional admission interview format because each applicant was interviewed at only a single interview station.

Our MPI seven-point Likert rating scale was selected for consistency with our large interviewer cadre’s experiences in other local medical school in-programme assessment contexts. Selection of a broader Likert rating scale might have attained greater MPI reliability than the measured 0.56.

In addition, our resourcing measure related to the size of the admission interviewer cadre alone; we did not include other resource components such as space and administrative staff requirements, and question development time, as outlined by Rosenfeld et al. [18].

We believe this report provides evidence to advance uptake of Axelson and Kreiter’s ‘intermediate approach’ model for schools considering adoption of MIS-based interviewing in the face of questions regarding resourcing and/or recruitment [6]. Future study will explore psychometric factors such as longer-term MPI predictive validity. We will also investigate the differential value of various resourcing and recruitment strategies in preparation for a future without onsite medical school admission interviewing.