Introduction

An entrustable professional activity is a professional task that postgraduate medical and surgical residents must master during their training; that is, these are tasks and responsibilities that faculty entrust a trainee to perform unsupervised once an adequate level of competence has been achieved [36]. Typically, entrustable professional activities consist of units of tasks that make up a management or evaluation process (such as managing a patient with a hip fracture), which, when put together, form the mass of critical elements that define a profession [38]. To date, entrustable professional activities have been created for the medical specialties of pediatrics [22], internal medicine [10, 12, 27], family medicine [35], anesthesiology [23], and psychiatry [6]. As of June 2014, the Accreditation Council for Graduate Medical Education required reporting on selected milestones and incorporating entrustable professional activities into training programs [31]; however, to date, little research has been carried out on how to best assess resident performance of an entrustable professional activity [19].

A trial competency-based medical education program was initiated at the University of Toronto in 2009, and as of July 2013, all first-year residents (postgraduate year [PGY]-1) have been automatically enrolled in the competency-based medical education program as part of their medical education and training [17]. Using a process of consensus, our faculty created a list of the top 10 entrustable professional activities for the program (Table 1) from a previously established list of competencies [39]. The ability of orthopaedic residents to perform these top 10 entrustable professional activities unsupervised before graduation from our residency program was thought to be critical.

Table 1 Top 10 orthopaedic EPAs for PG residency program*

Options to assess a resident’s ability to perform an entrustable professional activity independently include practice-based assessment and assessment in the simulated setting. In orthopaedics, objective structured clinical examinations have been used to determine the ability of residents to manage clinical problems [15] and to assess their communication and management skills [14]. The ability to perform technical procedures will always be best assessed in the operating room, but it can be difficult to standardize surgical procedures [33], manage time constraints [16], and ensure optimal patient safety and clinical outcomes [26]. For these reasons, simulation is increasingly being used to provide opportunities for residents to perform procedures independently, demonstrate knowledge and skills deficits, and commit errors before actually performing surgery on patients in the operating room [2, 7, 8, 13, 21, 26]. At this time, the best method of assessing a resident’s ability to perform an entrustable professional activity independently is unknown. Many entrustable professional activities are longitudinal and describe care that takes place over time [12]. In the clinical setting, components of many entrustable professional activities would require evaluation at separate times and in different settings, making objective assessment difficult. For example, a resident might assess a patient for TKA preoperatively and perform the technical procedure on a different patient later that week. The use of simulation to assess an entrustable professional activity would help overcome these difficulties, allowing faculty to determine a resident’s ability to perform tasks in the clinical setting with the appropriate level of supervision.

We therefore asked: (1) Is simulation-based assessment of resident performance of entrustable professional activities reliable? (2) Is there evidence of important differences between senior and junior residents when performing simulated entrustable professional activities?

Materials and Methods

For our study, from the list of top 10 entrustable professional activities, three entrustable professional activities were selected: management of the patient for TKA; management of the patient with an intertrochanteric hip fracture; and management of the patient with an ankle fracture. The three entrustable professional activities were selected because each is an important component of the first phase of competency-based training at our institution, typically completed within the first year of residency. Furthermore, each of these activities was listed as a key physician competency in the Orthopaedic Milestones Project published by the Accreditation Council for Graduate Medical Education [1]. Approval for this study was obtained from the institutional research ethics board.

Each entrustable professional activity assessment was 40 minutes long, divided into three parts, each performed at a different station: preoperative management (10 minutes), performance of the technical procedure (20 minutes), and postoperative management (10 minutes). The preoperative and postoperative stations followed a previously described and validated Objective Structured Clinical Examination format [15], focusing on the skills of patient history-taking, physical examination, image interpretation (images displayed on a computer screen), surgical decision-making, obtaining patient consent, and management of patient risk factors (Table 2). History-taking and physical examination were performed on a standardized patient (a trained actor from our standardized patient program) in the TKA activity. Consent was not required from the standardized patients because they were paid to participate. The postoperative management stations involved care of the patient after surgery and management of complications.

Table 2 Three EPAs showing breakdown of individual components assessed

The three technical procedure stations were performed using sawbones models (Fig. 1). For TKA, residents were asked to perform the distal femoral cut and AP cuts using standard equipment (an industry representative familiar with the equipment was present to guide the specific use of instrumentation). For the intertrochanteric station, a femoral sawbones without a fracture was placed inside a soft tissue cover on a radiolucent table. Residents were instructed to place a sliding hip screw into the femoral head under an image intensifier. For the ankle fracture station, an ankle sawbones in soft tissue was used; residents exposed the fibula, allowing faculty to create a Weber B fracture. Residents were then asked to reduce the fracture with a lag screw and obtain stable fixation with a plate.

Fig. 1A–C

Technical procedures performed as a component of each entrustable professional activity are shown: (A) ORIF of oblique fibular fracture; (B) insertion of dynamic hip screw under image guidance; (C) performance of distal femoral cut and AP cuts for TKA. ORIF = open reduction and internal fixation.

For each of the pre- and postoperative management stations, a checklist was created using a modified Delphi technique with multiple surveys [25]. In this manner, a group of content experts (DO-H, WK, MN, PF, JSS, JT, TD, VW) reviewed initial checklists and were asked to add items or alter the wording as required. Reviewers then accepted, rejected, or questioned each item, and this process was repeated until consensus was achieved. Items receiving over 95% consensus were accepted. The checklists were provided to guide expectations at each station.

Examiners also rated the residents using an overall global rating scale based on the Dreyfus model of skill acquisition (novice, advanced beginner, competent, proficient, expert) [4, 9]. Examiners were instructed to deem a resident competent if the resident performed to the level of a qualified orthopaedic surgeon and was able to perform the procedure independently without supervision. During the technical procedure, residents were evaluated using a task-specific checklist (also created using a modified Delphi technique) and a previously validated global rating scale designed for use in objective structured assessment of technical skills (OSATS) [32, 33]. Examiners were also asked to provide written comments on the performance of each of the technical procedures.

Study participants included nine of 12 available PGY-1 residents who were at the end of their first year of orthopaedic training. Each of the PGY-1 residents was enrolled in the competency-based medical education program and had been deemed competent in the modules of basic arthroplasty, hip and basic fractures, emergency fractures, and management of medical comorbidities in the surgical patient. As a comparison group, nine of 12 available PGY-4 residents were invited to participate; only one of the PGY-4 residents was enrolled in the competency-based medical education program. All senior residents had undertaken arthroplasty rotations and had recently undertaken a 6-month trauma rotation.

All participants underwent the three entrustable professional activities assessments on the same day. Staff surgeons and fellows served as the examiners (DO-H, WK, MN, JSS, JT); the same examiner marked each station to maximize consistency. Two examiners marked all three TKA stations for each resident, two examiners marked all of the ankle fracture stations, one examiner marked the hip fracture technical procedure, and one examiner marked the hip fracture pre- and postoperative management stations. During performance of technical procedures, examiners provided assistance as requested but were instructed not to provide feedback.

Examiners were asked to disregard the year of training of the resident, if known, when performing the assessment. To help answer our first question about reliability, six of nine stations (all three ankle fracture stations, the hip fracture technical procedure station, and the TKA preoperative management and technical procedure stations) were videotaped and reviewed by a blinded observer (MP), allowing interrater reliability to be calculated. To answer our second research question about differences between PGY-1 and PGY-4 residents, the mean global rating score on each of the three stations of each entrustable professional activity was compared between the two groups. Correlation was also sought between the checklist and the global rating for each station.

Statistical Analysis

All data (checklists, global ratings) were deidentified, entered into an Excel spreadsheet (Microsoft Inc, Redmond, WA, USA), and analyzed with the use of SPSS (Version 21; IBM Corp, Armonk, NY, USA). Reliability was calculated using Cronbach α for the overall global rating scale of the examination (sum of the global ratings for all nine stations converted to a percentage) and for the overall global rating scale for each entrustable professional activity (sum of the global ratings for the three stations of each entrustable professional activity). Individual station reliability was calculated with the use of the Cronbach α “if item deleted,” whereby the overall reliability was recalculated after removing each station. If removing any station increased the α, it implied that the station was performing poorly. The correlation between the checklist scores and the global rating scale for each station was assessed with the Pearson product moment correlation. An independent-samples t-test was used for analysis of the difference between the two groups of residents. Interrater reliability was calculated for each examiner (MN, JSS, DO-H, WK, JT) and the blinded assessor (MP) using an intraclass correlation coefficient. The number of participants was set (nine in each group); therefore, a power analysis was performed using a t-test with an α value of 0.05, an effect size of 0.5 on the 5-point global rating scale, and a sample size of 18; the power was 0.26.
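The reliability statistics described above can be illustrated with a minimal Python sketch. This is not the SPSS procedure used in the study; the function names and the example ratings below are hypothetical and serve only to show how Cronbach α and the α "if item deleted" are obtained from a table of global ratings (one row per resident, one column per station).

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach alpha for a list of rows (residents) x columns (stations)."""
    k = len(scores[0])  # number of stations (items)
    # Sample variance of each station's ratings across residents
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of each resident's summed rating
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def alpha_if_item_deleted(scores):
    """Recompute alpha after dropping each station in turn."""
    k = len(scores[0])
    results = []
    for drop in range(k):
        reduced = [[v for j, v in enumerate(row) if j != drop] for row in scores]
        results.append(cronbach_alpha(reduced))
    return results


# Hypothetical global ratings (1 = novice ... 5 = expert) for six residents
# on the three stations of one activity: preoperative, technical, postoperative.
example_ratings = [
    [2, 2, 3],
    [3, 2, 2],
    [3, 3, 3],
    [4, 3, 4],
    [4, 4, 3],
    [2, 3, 2],
]

overall = cronbach_alpha(example_ratings)
per_station = alpha_if_item_deleted(example_ratings)
print(f"alpha = {overall:.2f}")
for name, value in zip(["preoperative", "technical", "postoperative"], per_station):
    print(f"alpha if {name} station deleted = {value:.2f}")
```

A station whose removal raises α above the overall value would be flagged as performing poorly; a station whose removal lowers α is contributing to the internal consistency of the examination.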

Results

Reliability of Simulation-based Assessment of Entrustable Professional Activities

Using performance on the three stations of each entrustable professional activity, internal consistency was 0.84 for the TKA activity, 0.88 for the hip fracture activity, and 0.89 for the ankle fracture activity. The Cronbach α “if item deleted” decreased for every station, demonstrating that each station was performing well (Table 3). A good to high correlation was seen for all nine stations between the checklist scores and the overall global rating scale, suggesting that examiners were using the checklists appropriately (0.72; range, 0.65–0.8; p = 0.01). All videotaped entrustable professional activities showed strong interrater agreement with a mean intraclass correlation coefficient of 0.87 (0.07; p < 0.001). Pre- and postoperative management stations of the hip fracture activity as well as the postoperative management station of the TKA were not recorded as a result of feasibility issues.

Table 3 Cronbach α ‘if item deleted’ for each station*

Differences Between Senior and Junior Residents in Simulated Entrustable Professional Activities

For the hip fracture activity, the PGY-4 group had a higher mean global rating scale than the PGY-1 group for preoperative management (3.56 [0.5] versus 2.33 [0.5], p < 0.001), postoperative management (3.67 [0.5] versus 2.22 [0.7], p < 0.001), and technical procedures (3.11 [0.3] versus 3.67 [0.5], p = 0.015; Table 4). For the TKA activity, the PGY-4 group scored higher for postoperative management (3.5 [0.8] versus 2.67 [0.5], p = 0.016) and technical procedures (3.22 [0.9] versus 2.22 [0.9], p = 0.04) than the PGY-1 group, but there was no difference for preoperative management with the numbers available (PGY-4, 3.44 [0.7] versus PGY-1, 2.89 [0.8], p = 0.14). For the ankle fracture activity, the PGY-4 group scored higher for postoperative management (3.22 [0.8] versus 2.33 [0.7], p = 0.18) and technical procedures (3.22 [1.2] versus 2.0 [0.7], p = 0.018) than the PGY-1 group, but there was no difference for preoperative management with the numbers available (PGY-4, 3.22 [0.8] versus PGY-1, 2.78 [0.7], p = 0.23). In general, the majority of PGY-4 residents were able to achieve a level of competency or better in each of the stations; a higher number of PGY-1 residents were not able to, especially in the technical procedure stations (Table 5).

Table 4 Mean global rating scale (SD) for each of the EPA station components
Table 5 The number of residents deemed competent or better at each station of each of the entrustable professional activities

Discussion

As postgraduate medical education slowly moves toward competency-based medical education, the need to provide objective assessments of competence will continue to be an issue. Creating entrustable professional activities allows faculty to identify and select the most important, representative, and critical tasks that should be mastered [30]. Scheele et al. [34] recommended focusing on those tasks critically important in daily practice or that address high-risk or error-prone activities; certainly the list of competencies published by the Accreditation Council for Graduate Medical Education includes care of patients with ankle fractures, knee osteoarthritis, and hip fractures [1]. In such settings, entrustable professional activities may then be used to define five levels of responsibility: observe the activity, act under direct supervision, act under indirect supervision (available within minutes) on call, act unsupervised, and ability to supervise others [36, 38]. The ability to achieve level 4 (acting independently) in predetermined entrustable professional activities is a critical component of competency-based medical education [37]. The results of our study demonstrated that simulated activities may be used to determine which residents can perform tasks competently in the simulated setting, allowing these procedures to be performed by residents in the clinical setting under supervision. Most importantly, simulated activities allow for identification of those residents who require further training or remediation to achieve a minimal level of competency. In this way, the simulation of entrustable professional activities can be used effectively to supplement workplace-based assessment of residents.

Our study had a number of limitations. First, although we believe that use of simulated patient encounters and simulated technical procedures is valuable, these assessments should be complemented by practice-based assessment. Second, although a simulated assessment of entrustable professional activities was shown to be reliable, validity evidence was limited to the finding that senior residents were able to perform the activities to a higher level than junior residents. Further research is required to demonstrate that the simulated performance of an entrustable professional activity correlates with actual performance in the clinical setting. Third, only components of each technical procedure were performed as a result of time limitations rather than the entire procedure. For example, it is possible that a resident who was able to perform the femoral cuts of a TKA competently may have had difficulty with the tibial resection. Another important limitation in this study was the potential for rater error or bias, because some of the examiners would have been aware of the year of training of residents. To examine for bias, videotaping was performed on six of the nine stations, and a strong correlation was seen between the examiner ratings and the assessments of the blinded reviewer on these stations. However, not every station was videotaped, so it is not possible to exclude the effect of bias (whereby examiners might be overly stringent on junior residents and overly lenient on senior residents or vice versa) on the findings of differences on the postoperative management of the hip and TKA activities. The study was also rather severely underpowered; for this reason, although we were not able to identify differences between senior and junior residents in the preoperative management of TKA and ankle fracture activity, we cannot exclude the possibility that there was a difference. Finally, simulation was carried out using sawbones models rather than cadavers. The advantages of dry models are many, including relative ease of preparation and reduced cost compared with use of cadavers [7]; the majority of comparative studies have also demonstrated that, overall, low-fidelity simulators are similarly effective but less expensive than high-fidelity simulators with regard to the acquisition of surgical skills [11, 18, 28, 29].

Results of our study demonstrated high reliability for the entrustable professional activity examination overall, for each individual component of the examination, and for each station, as well as strong interrater agreement. However, we have minimal evidence of concurrent validity in this setting or correlation with clinical performance. Interestingly, each of the junior residents had previously been deemed competent in basic arthroplasty and trauma; however, the majority of residents were not assessed specifically on technical procedures such as insertion of a dynamic hip screw or fixation of an ankle fracture in the operating room, and those performing TKA always did so in the presence of staff providing assistance and feedback, necessary to ensure patient safety and maximize clinical outcomes. Few studies have evaluated the assessment of entrustable professional activities in postgraduate residency training, either simulated or in the clinical setting. Hauer et al. [19] conducted two pilot entrustable professional activity-based assessments (inpatient discharge and family meeting) in the clinical setting, testing them on PGY-1 residents in internal medicine. Both the residents and faculty felt the assessments improved skills and facilitated useful feedback. Aylward et al. [3] developed a patient handoff entrustable professional activity for interns in internal medicine and pediatrics, identified as a critical skill for residents. Under direct observation, the interns were judged using the five levels of entrustment as described by ten Cate and Scheele [38], with the majority of residents judged as being able to perform under direct or indirect supervision. The results of our study showed consistent evidence that senior residents were able to perform most components of each entrustable professional activity at a higher level than junior residents.
Although this might be expected, within the competency-based medical education format, all junior residents had previously been deemed competent at each of these activities. Whether there is an issue with skill retention or with the assessment methods used after their previous rotations is unknown. However, we were able to demonstrate deficiencies in both junior and senior residents in various aspects of each simulated activity. Certainly, these findings are consistent with the Orthopaedic Milestones Project, which lists milestone levels that residents will attain as they progress through training [1]. For example, in hip fracture patient care, junior residents are expected to move from an ability to take a focused history and perform a focused examination (level 2), to being able to make a comprehensive assessment of fracture patterns and capable of performing surgical repair (level 3), and to being capable of treating postoperative complications such as infection (level 4). Using these competencies as a curriculum guide, the identification of any technical and nontechnical deficits in an objective setting allows for remediation and reassessment to the advantage of the resident, the program, and future medical practice. Practice-based assessment will always be a critical component, but it is likely desirable that a resident demonstrate an ability to perform a task at a competent level, without supervision, in a simulated environment before working in the operating room or in a clinical setting.

In medical training, supervision must gradually decrease to build self-confidence and trustworthiness [20]. For supervisors to make valid entrustment decisions, however, sufficient acquaintance of preceptors with trainees, a concept known as “time to trust,” is critical [20]. Kennedy et al. [24] identified that faculty grant residents independence based on the resident’s knowledge and skill as well as their insight into limitations. Long rotations are often required to build relationships sufficient to determine a trainee’s strengths and limitations [5], and brief, fragmented faculty–resident contact is often not an ideal way to draw valid, reliable conclusions [22]. However, such close contact between a single resident and faculty can be limited, making a simulated entrustable professional activity valuable in determining a resident’s abilities.

The results of our study show that simulated entrustable professional activities may be used to determine the ability of a resident to perform professional tasks that are critical components of medical training. In this manner, educators can ensure competent performance of these skills in the simulated setting, before actual practice with patients in the clinical setting. Future research needs to demonstrate a correlation between competent performance in the simulated setting and performance in the workplace.