Introduction

Increased constraints on residents’ operative experience and the surge in techniques to master have led to increased skills development outside of the operating room (OR) [1,2,3,4]. This is largely due to residents performing fewer open cases, resulting in a lack of confidence in performing these cases at graduation [5,6,7,8]. To address this gap, surgical programs have studied the value of skills acquisition programs and the feasibility of implementing an open skills curriculum as a whole, as well as its utility for early surgical trainees [9,10,11,12,13]. Additionally, research has shown benefits of various individual components of simulation labs, including dry labs [14,15,16], at-home skills practice [17], virtual reality [18], cadaveric labs [19], and home-video assessments [20].

However, effective practice in any setting must be accompanied by feedback [21,22,23]. It has quickly become apparent that the growth in practice modalities has outstripped the limited resources (e.g. faculty time) available to provide feedback [24]. For this reason, peer feedback, crowd-sourcing, and artificial intelligence have been examined as potential solutions [20, 25, 26]. Feedback with scoring rubrics has been shown to allow surgical peers, and even nonmedical professionals, to recapitulate the gold standard of “expert” (faculty) feedback [20, 26].

Less well studied in the setting of video-based learning is the narrative feedback component, the typical mechanism of feedback provision in the OR and in real-time clinical scenarios. van de Ridder et al. defined valuable feedback as “specific information about the comparison between a trainee’s observed performance and a standard, given with the intent to improve the trainee’s performance” [27]. In non-video-based learning settings, schemas have been used to characterize the quality of narrative feedback by valence (positive/reinforcing vs. negative/corrective feedback), specificity (specific vs. nonspecific feedback), and relevance (relevant vs. irrelevant feedback) [28,29,30]. While “specific” and “relevant” feedback are generally considered high quality, authors also emphasize the importance of incorporating both “corrective” and “reinforcing” feedback valences [28, 29, 31,32,33].

As residency programs continue to refine or adopt open skills curricula to adapt to the changing landscape of surgery, limited resources demand scrutiny of how each individual component within a curriculum impacts skills acquisition for learners. The optimal strategy for curricular construction and feedback implementation, both critical for skill acquisition, in the setting of simulation curricula is unknown. We therefore aim to elucidate the individual impact of each curricular and feedback component within our curriculum on trainee assessment performance and to identify areas of focus.

Methods

Study design and population

We conducted this retrospective cohort study in two phases. In the first phase, we analyzed the association between the surgical skills curriculum components and assessment performance. Components of the surgical skills curriculum included skills lab attendance, homework completion, homework scores, and receipt of feedback on homework. Assessment performance was measured by scores from an in-person assessment that occurred at the conclusion of the curriculum. In the second phase, we reviewed and assessed the quality of narrative feedback provided on homework assignments and the association of feedback quality with assessment score.

We reviewed data from two consecutive cohorts of surgical interns at our single academic institution. Interns represented general surgery, urology, otolaryngology—head and neck surgery, neurosurgery, orthopedics, and ophthalmology (Table 1). Interns from all specialties participated in the same curriculum. Our Institutional Review Board determined this study to be exempt from review (IRB22-37166).

Table 1 Demographic data of participating interns

Surgical skills curriculum

The surgical skills curriculum consisted of four components: (1) in-person skills lab, (2) submission of video-recorded homework, (3) receipt of scores on homework, and (4) receipt of narrative feedback on homework. The curriculum culminated in a final assessment.

  1. The skills lab consisted of five 2-h sessions. Sessions occurred once every 2 weeks and focused on knot tying (two sessions), basic suturing (two sessions), and review (one session). Faculty members and senior residents facilitated sessions by introducing material, providing feedback, and debriefing lessons. We did not assess performance during these sessions; we recorded attendance only.

  2. Homework consisted of basic open skills that interns self-recorded using a cellphone and tripod and then uploaded to the Practice platform (Practice XYZ, Inc.), with two assignments due every 2 weeks. The first intern cohort (2021–2022) was assigned 12 homework assignments and the second intern cohort (2022–2023) was assigned 14, including four baseline assignments each year completed before the start of the curriculum. This curriculum is based on a previously published home-video curriculum for basic surgical skills and consisted of basic knot tying (e.g. two-handed square knots, one-handed half-hitch (slip) knots, tying under tension) and suturing skills (e.g. running subcuticular sutures, vertical mattress sutures, simple running push-pull sutures) [34].

  3. Homework assessors (peers and instructors) were untrained but scored the self-recorded videos using a microskill-based checklist that varied depending on the homework exercise. This checklist has been used in prior work to assess trainees' knot tying and suturing, with one prior paper reporting validity evidence for checklists used in several of the tasks [20, 35].

  4. In addition to the numerical score, assessors had the option to provide narrative feedback for each homework assignment in response to the prompt: “What is one thing that was done well? What could be improved?” Peers or instructors returned homework scores and feedback on the Practice platform, generally only if the assignment was submitted on time (before the next skills lab session). Only peer feedback was available for the first intern cohort (2021–2022), while both instructor (surgical education fellow) and peer feedback were available for the second intern cohort (2022–2023).

The final assessment included an evaluation of eight key knot tying and suturing skills similar to those completed in the homework assignments (e.g. two-handed square tie, atraumatic tie, suture ligation, simple running suture, vertical mattress suture). Faculty members were not trained but were provided with a rubric to formally assess each intern on their assigned task. Intern performance was rated on a five-point Likert scale, on which a score of three or greater represented OR-readiness. Faculty entered performance scores into Research Electronic Data Capture (REDCap), a secure web platform for managing online databases and surveys, immediately after the intern completed their task [36, 37].

Feedback analysis

We collated all narrative feedback for each exercise and participant. We then assessed feedback quality using a component model modified from prior published work [28, 38]. We defined quality based on comment relevance, valence, and specificity. Under relevance, we defined “relevant” feedback as feedback that references the task. Under valence, we defined “reinforcing” as any positive feedback and “corrective” as any feedback that identifies an area for improvement. Under specificity, we defined “pragmatically specific” as specific feedback that may require inference from context clues and “semantically specific” as specific feedback that does not require inference on the part of the recipient. Exemplar feedback of each type is shown in Fig. 1. Two raters graded the feedback for each exercise against these criteria for the first cohort; a third rater then reconciled all inter-rater discrepancies. We summed the sub-components of quality for each exercise and then averaged quality scores across exercises to obtain a final overall feedback quality score per participant, by component.
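As a concrete illustration of this aggregation (not the code used in the study), the R sketch below sums each quality sub-component per participant and exercise and then averages those sums across exercises; the data frame `ratings` and all column names are hypothetical.

```r
# Illustrative sketch only. Assumes a hypothetical long-format data frame
# `ratings` with one row per feedback comment and 0/1 flags for each
# quality sub-component.
library(dplyr)

feedback_quality <- ratings %>%
  # Sum each sub-component of quality within every participant-exercise pair
  group_by(participant, exercise) %>%
  summarise(across(c(relevant, reinforcing, corrective,
                     pragmatic_specific, semantic_specific), sum),
            .groups = "drop") %>%
  # Average the per-exercise sums across exercises for each participant
  group_by(participant) %>%
  summarise(across(relevant:semantic_specific, mean), .groups = "drop")
```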

Fig. 1
figure 1

Feedback grading rubric with sample feedback in each category. Feedback that was semantically specific was also pragmatically specific by definition. Pragmatically specific feedback was also relevant by definition.

Statistical analysis

We calculated the standard deviation of the mean of the four pre-curriculum (baseline) exercises for one cohort to assess dispersion in intern skill on entrance to intern year. We analyzed the association of skills lab attendance, homework completion, homework scores, and receipt of feedback on homework with assessment scores using bivariate correlations and linear regression models. The difference in the number of homework assignments between cohorts was addressed by weighting homework scores by the number of assignments in each respective year. The number of variables that could be included in the models was constrained by sample size. We used peer scoring for homework scores; when no peer score was available, we used the instructor score (available only in the 2022–2023 cohort), based on prior studies demonstrating no difference between peer and instructor scores [20]. We determined effect sizes using partial omega-squared (small < 0.06, medium 0.06 to < 0.14, large ≥ 0.14).
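As a rough sketch of how such a model can be fit (the study used SAS, Excel, and R, but the exact code and variable names were not reported), the following R lines fit the multivariable regression and compute partial omega-squared for each predictor. The data frame `interns` and its column names are hypothetical, with predictors and the outcome expressed as percentages.

```r
# Illustrative sketch only; `interns` and its column names are hypothetical.
library(effectsize)  # omega_squared() for partial omega-squared

form <- assessment_score ~ lab_attendance_pct + homework_completion_pct +
  homework_score_pct + feedback_received_pct

summary(lm(form, data = interns))   # estimates, p-values, adjusted R-squared

# Partial omega-squared per predictor (small < 0.06, medium 0.06-0.14, large >= 0.14)
omega_squared(aov(form, data = interns), partial = TRUE)
```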

To compare homework scored by peers with homework scored by instructors, and to provide validity evidence for substituting instructor scores where peer scores were not available, we conducted a paired t-test for the 2022–2023 cohort [20]. This analysis was conducted using Excel’s T.TEST function. For the feedback analysis, we used Pearson’s correlation coefficient to calculate the association of each feedback quality variable with assessment score. Inter-rater reliability was assessed using Cohen’s kappa. We performed these statistical analyses using SAS version 9.4, Microsoft Excel, and R version 4.3.1 [39].
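The remaining comparisons map onto standard functions in base R and the irr package; the object and column names below are hypothetical stand-ins for the study data.

```r
# Illustrative sketch only; all object and column names are hypothetical.
library(irr)  # kappa2() for Cohen's kappa

# Paired comparison of peer vs. instructor homework scores (2022-2023 cohort)
t.test(scores$peer, scores$instructor, paired = TRUE)

# Association of one feedback quality component with assessment score
cor.test(quality$relevant, quality$assessment_score, method = "pearson")

# Inter-rater reliability of the two feedback raters prior to reconciliation
kappa2(cbind(ratings$rater1, ratings$rater2))
```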

Results

Of the 97 surgical interns in the two cohorts, 71 (73%) participated in all aspects of the curriculum. Demographics are shown in Table 1. On average, participating interns attended 84% of skills lab sessions and submitted 87% of assigned homework assignments. Among submitted homework assignments, 83% received feedback. The mean score on assessed homework was 90% (Fig. 2). The standard deviation of the pre-curriculum homework assignment scores (available only for the 2022–2023 cohort) was 7.6%. For the 2022–2023 cohort, where both instructor and peer scores were available (n = 267), the mean instructor score was 90% (SD = 11%) and the mean peer score was 91% (SD = 9.0%), a nonsignificant difference (paired t-test, p = 0.12).

Fig. 2
figure 2

Overall participation in each curriculum component

In a multivariable model assessing the relationship between the measured curricular components (skills lab attendance, homework completion, homework score, and receipt of feedback on homework) and assessment performance, skills lab attendance and homework submission were not significantly associated with assessment score, and both components had small effect sizes. However, for each additional homework assignment that received feedback, the assessment score increased by 0.54 percentage points (p < 0.001; effect size 0.16). Because interns were assigned 13 homework assignments on average, one additional assignment with feedback corresponds to a 7.7 percentage point increase in the proportion of assignments that received feedback (e.g., 9 of 13 rather than 8 of 13). The model estimate for the percentage of homework assignments that received feedback was therefore multiplied by 7.7 (0.07 × 7.7 = 0.54) so that the effect could be interpreted per additional assignment with feedback rather than per percentage point of feedback coverage (Table 2). For each percentage point increase in homework score, assessment score increased by 0.21 percentage points (p = 0.02; effect size 0.07) (Table 2). The model explained 18% of the variance in total assessment score (adjusted R-squared = 0.18).
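Written out, the conversion reported above (all values from the text and Table 2) is:

$$0.07 \times \frac{100}{13} \approx 0.07 \times 7.7 \approx 0.54,$$

where 0.07 is the model estimate per percentage point of feedback coverage and 100/13 ≈ 7.7 is the percentage-point change in coverage produced by one additional assignment with feedback.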

Table 2 Linear regression evaluating association of each predictor variable (skills lab attendance, homework completion, homework scores, and receipt of feedback on homework) with the outcome variable (assessment score)

For the homework feedback comment ratings, the initial inter-rater kappa was 0.80, indicating substantial agreement prior to reconciliation by the third rater. On average, each intern received written feedback on 72% of assignments (SD: 22%). Within the narrative feedback, 93% of comments were relevant, 99% were reinforcing, 80% were corrective, 83% were pragmatically specific, and 57% were semantically specific. Examples with associated definitions are shown in Fig. 1. Pearson’s correlation coefficient revealed significant correlations between assessment score and relevant feedback (r = 0.26, p = 0.02), between relevant comments and pragmatically specific feedback (r = 0.74, p < 0.001), and between relevant feedback and semantically specific feedback (r = 0.54, p < 0.001). None of the other dimensions (reinforcing (r = 0.00), corrective (r = 0.06), pragmatically specific (r = 0.17), or semantically specific (r = 0.15) feedback) was significantly associated with assessment score. While relevant feedback was correlated with corrective feedback (r = 0.70, p < 0.001), relevant feedback was not correlated with reinforcing feedback (r = 0.01, p = 0.47). Finally, compared with the 2021–2022 cohort, which received narrative feedback from peers only, the 2022–2023 cohort, which received feedback from both instructors and peers, had higher proportions of relevant (98% vs. 88%), corrective (83% vs. 72%), pragmatically specific (89% vs. 73%), and semantically specific (66% vs. 40%) comments, with similar proportions of reinforcing comments (98% vs. 99%) (Table 3).

Table 3 Comparison of feedback quality between the cohort graded by peers only (2021–2022) and the cohort graded by both peers and instructors (2022–2023)

Discussion

Our first finding demonstrated that, relative to the other curricular components, higher homework scores and receipt of feedback were the only factors associated with a small, positive effect on assessment scores. Submitting homework alone was not associated with assessment score. These findings highlight that participation in homework is not enough and re-emphasize that receiving feedback is key. In this particular curriculum, interns received feedback only if they submitted homework on time, prior to the next skills lab session. The measured improvement in assessment score was small (+0.54 percentage points per homework assignment provided with feedback), but it translated to a one standard deviation increase in assessment score for every 6.33 additional homework assignments provided with feedback. Though the real-world implications are unclear, this may represent an opportunity to emphasize the need to submit homework on time, not only for one's own learning but also to give peers the opportunity to provide feedback, which has itself been shown to improve the feedback provider's performance [20].

This first finding prompted our team to further study the aspects of feedback on video-based learning that contribute to skills acquisition for surgical trainees. This led to our second finding: relevant feedback was significantly correlated with corrective, pragmatically specific, and semantically specific feedback, whereas reinforcing comments were not correlated with relevant or specific feedback. Because comments such as "good job" or "great work" were considered reinforcing but not relevant (they did not specifically reference the task), this suggests a deficit in the provision of relevant and specific reinforcing feedback. High-quality reinforcing feedback has been shown to be associated with higher student performance in clinical practice [40]. Additionally, positive feedback may motivate trainees and improve morale more than negative feedback [33]. Our findings contradict a previous study suggesting the quality of positive feedback is generally higher than, or similar to, the quality of negative feedback [40]. While that study relied on a subjective Likert-scale score of 1–100 assigned by feedback recipients, it is an interesting comparison point to our study, in which feedback was graded by independent raters. Our findings suggest that graders may default to specific feedback when providing corrections and take a more global approach when providing reinforcing comments. This discrepancy is also notable in that relevant feedback was significantly positively correlated with assessment score. However, it is difficult to interpret this finding in isolation because none of the other dimensions (reinforcing, corrective, pragmatically specific, or semantically specific feedback) was significantly associated with assessment score. Given the retrospective nature of this study, this presents an opportunity for further exploration of other curricular components and of how interns review and react to comments, to understand why these specific characteristics of feedback did not correlate with assessment score. Future studies of interventions to provide relevant and specific reinforcing feedback could reveal the impact on both feedback provider and recipient.

Third, our study found that the overall quality of narrative feedback was higher in the year with both instructor (surgical education fellow) and peer feedback (2022–2023) than in the year with peer feedback alone (2021–2022) (Table 3). However, our rubric-based homework scores were consistent with prior findings that, for assignments with both peer and instructor rubric-based scores, peer scores do not differ from instructor scores [20]. While this difference in narrative feedback quality is likely due in part to interns' time constraints relative to the more available surgical education fellows, it also suggests a gap in narrative feedback training for peer raters. Future iterations of this curriculum should provide clearer feedback guidelines and training to raters.

This study is limited in that it was conducted at a single institution and included only interns; the findings will therefore be most applicable to institutions with similar curricular components and participants. Nonetheless, the framework of analysis can be broadly applied to understand how each component of different curricula may influence trainee skills acquisition and performance. In addition, there was no direct pre- and post-assessment given the study's retrospective nature. Because some of the variability in trainee performance on the final assessment may reflect differences in skill level prior to the start of the curriculum, we used the four graded pre-curriculum exercises (available only for the 2022–2023 cohort) to evaluate overall variability in entrance skill. The standard deviation of these pre-curriculum scores was 7.6%; this small variation mitigates the limitation by supporting the assumption that most trainees entered with comparable baseline skill and that variation in assessment score is therefore attributable to curricular aspects. Finally, further studies will need to investigate other causes of differences in assessment performance, given that 82% of the variance is not attributable to the variables measured. In this vein, it is difficult to conclude that there is no value in the components not associated with assessment performance, such as skills lab attendance, because we recorded only attendance and not nuances such as learner engagement, performance in the skills lab, and the amount and quality of feedback received from circulating facilitators. In comparison, the retrospective data on homework assignments offered much more nuance, including submission, scores, receipt of feedback, and quality of received feedback. Thus, this retrospective study should be viewed as hypothesis-generating, and future prospective studies should examine the skills lab experience in greater detail and how it contributes to perceived and measurable changes in performance [19].

Conclusion

In developing an open skills curriculum, it is important to critically evaluate each skills training and feedback component. Like prior studies, we found that repeated practice (or "participation") alone was not enough and that provision of relevant feedback was a key component of developing basic open surgical skills. Our novel finding that feedback providers defaulted to more relevant and/or specific corrective feedback and to more global (not relevant or not specific) reinforcing feedback highlights an area of focus for feedback training, given the known benefit of high-quality reinforcing feedback to trainees [40]. Future prospective studies examining interventions to provide higher-quality reinforcing feedback (while maintaining high-quality corrective feedback) and their effects on intern surgical skills and motivation may yield positive results for both feedback recipient and feedback provider in the era of multimodal learning.