Setting and Participants
This was a prospective content validation study to identify and rate the quality of locally developed EPAs designed to assess the competence of internal medicine residents at our institution in 2014. Our program consists of 168 resident physicians, 33 rotation directors, and ten associate program directors (APDs). All rotation directors received basic education regarding the definition and purpose of EPAs at a noon conference. This one-hour educational session involved an interactive presentation by our local expert in EPAs, who reviewed the literature on EPAs and discussed the purpose of creating EPAs for evaluation. This education was reinforced with a subsequent e-mail message. Rotation directors were then instructed to write and submit EPAs pertaining to their subspecialty rotations, resulting in 229 locally developed EPAs (see electronic addendum). We rated the quality of these EPAs, as well as previously published EPAs,17 using an instrument developed for the purpose of this study. The study was deemed exempt by the Mayo Clinic Institutional Review Board.
Instrument Development
To develop an instrument for rating the quality of EPAs, we created a team of raters separate from the rotation directors who had submitted the EPAs. This team consisted of five APDs (authors JP, CW, DD, KT, and TB) with prior experience in graduate medical education, resident assessment, and scale development and validation.
Identifying Item Content
The team reviewed salient literature on the topic of EPAs2,5–9,11,13–15,19,20 and then met to discuss the qualities of EPAs that would be useful in assessment. After repeated review and discussion, the team reached consensus on the following essential domains of an EPA: 1) Focus, 2) Observable, 3) Clear Intention, 4) Realistic, 5) Articulates Trustworthiness, 6) Generalizable Across Rotations, and 7) Integrates Multiple Competencies. For each domain, every team member proposed three potential items, resulting in 105 candidate items equally distributed across these seven domains. Subsequently, the team reached agreement on three items for each EPA quality domain. Items were scored on a five-point scale (1 = strongly disagree, 5 = strongly agree).
Determining Item Reliability
To pilot test the instrument, all team members rated a convenience sample of ten locally developed EPAs. Intraclass correlation coefficients (ICCs) were calculated to determine inter-rater reliability for each item, with ICCs < 0.4 considered poor, 0.4 to 0.75 fair to good, and > 0.75 excellent.21 This first round of pilot testing revealed good to excellent reliability (ICC range 0.72 to 0.94) for the domains of Focus, Observable, Realistic, Generalizable, and Integrates Multiple Competencies. However, reliability was poor to fair (ICC range 0.24 to 0.61) for the domains of Clear Intention and Articulates Trustworthiness. In a second round of pilot testing with ten different EPAs, these two domains continued to perform poorly, so they were dropped from the instrument, leaving the five domains of Focus, Observable, Realistic, Generalizable, and Integrates Multiple Competencies.
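The inter-rater reliability analysis above can be sketched in code. The paper does not state which ICC variant was used, so the two-way random-effects, single-rater form (Shrout & Fleiss ICC(2,1)) is an assumption here; the function below computes it from the standard ANOVA decomposition of a subjects-by-raters matrix of scores.

```python
import numpy as np

def icc_2_1(ratings):
    """Two-way random-effects, single-rater ICC (Shrout & Fleiss ICC(2,1)).

    ratings: n_subjects x n_raters array of scores.
    NOTE: the ICC variant is an assumption; the source paper does not
    specify which form was computed.
    """
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    grand = ratings.mean()
    row_means = ratings.mean(axis=1)   # per-subject means
    col_means = ratings.mean(axis=0)   # per-rater means
    # ANOVA sums of squares
    ss_total = ((ratings - grand) ** 2).sum()
    ss_rows = k * ((row_means - grand) ** 2).sum()
    ss_cols = n * ((col_means - grand) ** 2).sum()
    ss_err = ss_total - ss_rows - ss_cols
    # Mean squares
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )
```

With this form, perfectly identical ratings yield an ICC of 1, while a constant offset between raters (e.g., one rater scoring each EPA one point higher) lowers the ICC, since ICC(2,1) treats rater bias as disagreement.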
Rating the Quality of EPAs
The final instrument was named the Quality of EPA (QUEPA) tool. Because the items in these five domains had demonstrated good to excellent inter-rater reliability, team members were randomly assigned equally sized subsets (n = 46) of the locally developed EPAs, which they rated independently using the QUEPA tool. Each team member also rated the AAIM End of Training EPAs (EOTEPA) to assess the application of the QUEPA tool to EPAs that were not locally developed. Study data were collected and managed using REDCap electronic data capture tools hosted at Mayo Clinic Rochester.22 REDCap (Research Electronic Data Capture) is a secure, web-based application designed to support data capture for research studies.
Study Variables
Independent variables were primary ACGME competency (patient care, ICS, professionalism, medical knowledge, SBP, and PBLI), practice location (inpatient, outpatient), rotation type (general medicine, medicine subspecialty, non-medicine specialty), and activity type (EPA vs. OPA). For primary ACGME competency and activity type, each team member classified every locally developed EPA. Although EPAs reflect multiple competencies, the group assessed which competency was most represented by each EPA. This step is relevant because many programs will likely use performance in EPAs to assess resident progress in the sub-competencies, and high-quality EPAs will need to be created to inform these decisions. In circumstances where the team did not agree, author JP adjudicated the final decision. The outcome variable used to examine associations with clinical context was the average QUEPA score (scale 1–5).
Data Analysis
Characteristics of rotation directors were summarized using descriptive statistics, including sex, academic rank, time on faculty, and proceduralist versus non-proceduralist specialty. For the purposes of this study, non-proceduralist specialties were general medicine, allergy, endocrinology, hematology, infectious diseases, nephrology, neurology, preventive medicine, and rheumatology; proceduralist specialties were cardiology, gastroenterology, emergency medicine, and pulmonary critical care. The dimensionality of the final 15-item QUEPA instrument was determined using factor analysis with orthogonal rotation. To account for the clustering of multiple ratings within raters and EPAs, we generated an adjusted correlation matrix using generalized estimating equations. This adjusted correlation matrix was then used to perform confirmatory factor analysis with orthogonal rotation. As a sensitivity analysis, we also performed factor analysis using an unadjusted correlation matrix and within rater and EPA combinations separately. Factors were identified using the minimum proportion criterion, and the threshold for item retention was a factor loading > 0.4. We then calculated the percentage of shared variance that the extracted factors contributed to the original variables. Internal consistency reliability for items within factors and overall was calculated using Cronbach α, with coefficients > 0.7 considered acceptable. ANOVA models with random effects for rotation directors were used to compare overall QUEPA scores for subcategories within ACGME competency, practice location, rotation type, and activity type. The level for statistical significance was set at α = 0.05. Statistical analyses were conducted using SAS 9.3 (SAS Institute Inc., Cary, North Carolina).
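The internal consistency calculation described above follows the standard Cronbach α formula. A minimal sketch is shown below; the variable names and example data are illustrative only, not the study's actual ratings, and this sketch covers only the α step, not the GEE-adjusted factor analysis.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for internal consistency.

    items: n_observations x k_items array of item scores.
    alpha = k/(k-1) * (1 - sum(item variances) / variance of total score)
    """
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)        # per-item sample variances
    total_var = items.sum(axis=1).var(ddof=1)    # variance of summed scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)
```

When all items move together perfectly, α equals 1; values above the 0.7 threshold used in the study indicate that the items within a factor measure a common construct.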