Surgical Endoscopy, Volume 26, Issue 6, pp 1516–1521

Proficiency-based training for robotic surgery: construct validity, workload, and expert levels for nine inanimate exercises

  • Genevieve Dulan
  • Robert V. Rege
  • Deborah C. Hogg
  • Kristine M. Gilberg-Fisher
  • Nabeel A. Arain
  • Seifu T. Tesfay
  • Daniel J. Scott



We previously developed nine inanimate training exercises as part of a comprehensive, proficiency-based robotic training curriculum that addressed 23 unique skills identified via task deconstruction of robotic operations. The purpose of this study was to evaluate construct validity, workload, and expert levels for the nine exercises.


Expert robotic surgeons (n = 8, fellows and faculty) and novice trainees (n = 4, medical students) each performed three to five consecutive repetitions of nine previously reported exercises (five FLS models with or without modifications and four custom-made models). Each task was scored for time and accuracy using modified FLS metrics; task scores were normalized to a previously established (preliminary) proficiency level and a composite score equaled the sum of the nine normalized task scores. Questionnaires were administered regarding prior experience. After each exercise, participants completed a validated NASA-TLX Workload Scale to rate the mental, physical, temporal, performance, effort, and frustration levels of each task.


Experts had performed 119 (range = 15–600) robotic operations; novices had observed ≤1 robotic operation. For all nine tasks and the composite score, experts achieved significantly better performance than novices (932 ± 67 vs. 618 ± 111, respectively; P < 0.001). No significant differences in workload between experts and novices were detected (32.9 ± 3.5 vs. 32.0 ± 9.1, respectively; n.s.). Importantly, frustration ratings were relatively low for both groups (4.0 ± 0.7 vs. 3.8 ± 1.6, n.s.). The mean performance of the eight experts was deemed suitable as a revised proficiency level for each task.


Using objective performance metrics, all nine exercises demonstrated construct validity. Workload was similar between experts and novices and frustration was low for both groups. These data suggest that the nine structured exercises are suitable for proficiency-based robotic training.


Robotic surgical training · Simulation · Construct validity · Proficiency-based training

The adoption of robotics in surgery has increased rapidly over the last 10 years. Indeed, the da Vinci Surgical System (Intuitive Surgical Inc., Sunnyvale, CA) is now present in over 1,000 hospitals throughout the United States. However, adequate training for surgeons has struggled to keep pace with the expanded usage of this system. Robotic training may be difficult when using traditional operating room teaching methods, given the physical distance between the operating surgeon and the bedside surgeon, other constraints related to communication, and the inability of the mentor to manually control the trainee’s instruments. Robotics may be amenable to simulation-based training outside of the operating room. However, in contrast to laparoscopy, very little validation work has been done to date. For example, the Fundamentals of Laparoscopic Surgery program (FLS) has been extensively validated and proven highly effective [1, 2] in enhancing performance in the clinical setting. Accordingly, the FLS program has been widely adopted as a standardized curriculum for surgery residents. Additionally, given the wealth of validation supporting the use of FLS for assessment purposes, it is now a requirement of the American Board of Surgery [3]. By comparison, no such standardized curricula or validated assessment methods exist for robotic surgery.

To overcome this deficit we carefully performed task deconstruction of actual robotic procedures, identified 23 unique skills required for robotic operations, and developed a comprehensive, proficiency-based robotic training curriculum that includes an online tutorial, half-day hands-on session, and nine inanimate training exercises. Our prior study documented pilot feasibility for the proficiency-based curriculum (Dulan et al., unpublished data). A novice learner reached proficiency for all three components after 13 h of training. Additionally, significant performance differences were detected between expert and novice for the nine inanimate exercises, thus supporting preliminary construct validity. The purpose of this study was to more formally evaluate construct validity and workload of the nine exercises by enrolling a larger cohort of experts and novices. For this purpose construct validity was defined as the ability of a test to measure the traits it purports to measure [4]. Additionally, we aimed to establish revised expert performance levels for the purpose of proficiency-based training.


This study was conducted at the University of Texas Southwestern Medical Center at Dallas in the Southwestern Center for Minimally Invasive Surgery Training Laboratory. After receiving IRB approval, expert robotic surgeons (n = 8) and novice trainees (n = 4, medical students) voluntarily enrolled in our study. Surgeons known to have significant experience were recruited as experts; this cohort included one fellow and seven faculty from General Surgery (n = 3), Gynecology (n = 2), and Urology (n = 3). Novice medical students (n = 4) who demonstrated an interest in surgery but had no prior experience using the da Vinci robot were recruited for the study. These groups included one expert and one novice whose data were used for our pilot work.

Participants completed a short survey detailing demographic information, previous laparoscopic simulator experience, robotic surgery simulator experience, and comfort with laparoscopic and robotic technical skills. Each participant watched a standardized video that demonstrated error avoidance strategies and the correct methods for performing each of the nine inanimate exercises previously described as part of our comprehensive curriculum (Fig. 1). Briefly, five tasks used FLS models with or without modifications, including Peg Transfer, Clutch/Camera Peg Transfer, Pattern Cut, and Interrupted and Running Suture. Four tasks used other commercially available and custom-made components, including Rubber Band Transfer, Stair Rubber Band Transfer, Clutch/Camera Navigation, and Running/Cutting Rubber Band. These four exercises were developed specifically to address skills required for robotic procedures that were not addressed by the FLS tasks. Participants performed three to five consecutive repetitions of each exercise.
Fig. 1. The nine inanimate tasks used in the curriculum

Each participant was allowed up to one warm-up per task. Each task was scored by a single trained proctor for time and accuracy using modified FLS metrics. Each task had a cutoff time and well-defined errors, as previously described. The following formula was used: Task Score = cutoff time − completion time − (weighting factor × sum of errors); to heavily penalize suboptimal performance, errors were weighted by a factor of 10 (tasks 1–7 and 9) or 50 (task 8). A higher score indicated superior performance. A score of zero was assigned if a negative value was derived. Task scores were then normalized to give equal weight to each task. Normalization was performed by dividing the task score by preliminary proficiency levels, based on prior data obtained from a single expert, then multiplying this value by 100. A composite score equaled the sum of the nine normalized task scores.
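The scoring arithmetic described above can be illustrated with a minimal Python sketch. The function names, the cutoff time, the completion times, and the proficiency level in the example are hypothetical values chosen for illustration and are not taken from the study; only the formula, the error weights (10 or 50), the zero floor, and the normalization to 100 follow the text.

```python
from typing import Sequence


def task_score(cutoff_s: float, completion_s: float, errors: int, weight: float = 10.0) -> float:
    """Task Score = cutoff time - completion time - (weight x sum of errors).

    A negative result is replaced by zero, as described in the text.
    """
    raw = float(cutoff_s - completion_s - weight * errors)
    return max(raw, 0.0)


def normalized_score(score: float, proficiency_level: float) -> float:
    """Express a task score as a percentage of that task's proficiency level."""
    if proficiency_level <= 0:
        raise ValueError("proficiency level must be positive")
    return 100.0 * score / proficiency_level


def composite_score(normalized_scores: Sequence[float]) -> float:
    """Sum the normalized scores across tasks (nine tasks in this curriculum)."""
    return float(sum(normalized_scores))


# Hypothetical numbers for illustration only (not study data).
example = task_score(cutoff_s=300, completion_s=120, errors=2, weight=10)
print(example)                              # 160.0
print(normalized_score(example, 200))       # 80.0
print(task_score(300, 290, 1, weight=50))   # 0.0 (negative raw score floored at zero)
```

Under this scheme, a participant performing exactly at the proficiency level on every task would obtain a normalized score of 100 per task and a composite score of 900.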

After completing repetitions for each task, participants evaluated workload using the validated NASA-TLX rating tool [5]. Participants rated six domains (mental demand, physical demand, temporal demand, performance, effort, and frustration levels) on a ten-point scale, with high ratings indicating increased workload. A composite NASA-TLX score was defined as the sum of the six individual domain ratings.
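As a brief illustration of the composite workload calculation, the sketch below sums the six domain ratings. The domain keys and the example ratings are hypothetical and serve only to show the arithmetic.

```python
from typing import Mapping

# Short keys for the six NASA-TLX domains named in the text (key names are an assumption).
TLX_DOMAINS = ("mental", "physical", "temporal", "performance", "effort", "frustration")


def tlx_composite(ratings: Mapping[str, float]) -> float:
    """Composite workload = sum of the six individual domain ratings."""
    missing = [d for d in TLX_DOMAINS if d not in ratings]
    if missing:
        raise ValueError(f"missing domain ratings: {missing}")
    return float(sum(ratings[d] for d in TLX_DOMAINS))


# Hypothetical ratings for one participant on one task (not study data).
ratings = {"mental": 6, "physical": 5, "temporal": 6,
           "performance": 5, "effort": 6, "frustration": 4}
print(tlx_composite(ratings))  # 32.0
```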

To determine construct validity, expert and novice scores were compared. Statistical analysis was conducted using SigmaPlot 11.0 software (Systat Software, Inc., San Jose, CA). Mann-Whitney rank sum tests were used to compare expert and novice task scores, composite scores, and NASA-TLX ratings; P < 0.05 was considered significant. Values are mean ± SD unless otherwise stated.

To establish revised proficiency levels, the aggregate expert data for each task were analyzed and the group mean and standard deviation were determined. Outliers (>2 SD beyond the group mean) were trimmed and the new group mean was defined as the proficiency level, as described in prior studies [6, 7].
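The trimming step can be sketched as follows. This is a minimal illustration that assumes a single trimming pass and the sample standard deviation; the text does not specify either choice, and the function name and example scores are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def trimmed_proficiency_level(scores: Sequence[float], n_sd: float = 2.0) -> Tuple[float, int]:
    """Return (trimmed mean, number of outliers removed).

    Values more than n_sd standard deviations from the group mean are dropped
    once, then the mean of the remaining values is taken as the proficiency level.
    Assumption: sample SD (statistics.stdev) and a single pass.
    """
    group_mean = mean(scores)
    group_sd = stdev(scores)
    kept = [s for s in scores if abs(s - group_mean) <= n_sd * group_sd]
    return mean(kept), len(scores) - len(kept)


# Hypothetical expert scores for one task (not study data); 40.0 is an outlier.
scores = [100.0, 102.0, 98.0, 101.0, 99.0, 100.0, 103.0, 97.0, 100.0, 40.0]
level, removed = trimmed_proficiency_level(scores)
print(level, removed)  # 100.0 1
```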


The eight experts had performed 119 (range = 15–600) robotic operations; three experts had prior robotic simulator training and five had prior robotic animal training. Novice participants were all interested in surgery as a career and had completed proficiency-based open knot-tying and suturing [7] and basic laparoscopic simulator training (Southwestern Stations [8]). They had observed ≤1 robotic operation and had no prior simulator-based robotic experience.

Experts achieved significantly better performance than novices according to each of the nine task scores (Fig. 2) as well as the composite score (932 ± 67 vs. 618 ± 111; P < 0.001), thus supporting construct validity.
Fig. 2. Expert vs. novice performance on the nine inanimate tasks. Significant differences were detected for all nine tasks

No significant differences in workload were detected between experts and novices for any of the nine tasks (Fig. 3) or according to the composite workload ratings (32.9 ± 3.5 vs. 32.0 ± 9.1, n.s.). Importantly, the average frustration ratings across all tasks were relatively low for both groups (4.0 ± 0.7 vs. 3.8 ± 1.6, n.s.).
Fig. 3. NASA-TLX workload scores as rated by surgeons after performing each of the nine inanimate tasks. No significant differences in workload were detected between experts and novices

The expert performance data were reviewed in detail for the purpose of establishing a revised set of proficiency levels. Of 252 total data points, 13 values were identified as outliers and trimmed. The resulting mean values (Table 1) were compared to our previous pilot proficiency levels. As only small differences were seen between the two data sets and since this new data set was based on a larger cohort of experts, the new data set was deemed suitable for use as revised proficiency levels.
Table 1. Revised proficiency levels*

Task      Expert task score             Standard deviation
Task 1    97 (66 s with no errors)
Task 2    104 (67 s with no errors)
Task 3    94 (68 s with no errors)
Task 4    100 (69 s with no errors)
Task 5    106 (70 s with no errors)
Task 6    104 (71 s with no errors)
Task 7    110 (72 s with no errors)
Task 8    101 (73 s with no errors)
Task 9    133 (74 s with no errors)

* Based on the performance (trimmed mean) of eight experts


We developed this curriculum because of an overall lack of validated methods for robotic surgical training. While our prior studies confirmed that the online tutorial and half-day interactive session components were important to the content validity of our curriculum, the nine inanimate tasks are critical for teaching and assessing robotic technical skills. Our curriculum was designed to incorporate proficiency-based training, as such practices have resulted in excellent skill acquisition and long-term retention while maximizing the efficiency of the learning process [8, 9].

Paramount to this process is the use of valid metrics. Specifically, in order for expert levels to be used as meaningful training end points, the metrics must demonstrate construct validity, whereby novices and experts may be clearly distinguished according to their task performance scores. Importantly, this study demonstrated construct validity for all nine tasks, suggesting that they are appropriate for use in our curriculum. Even though our groups contained only eight experts and four novices, our protocol included multiple repetitions and the data set was sufficiently robust to allow meaningful statistical comparisons.

These data are not surprising, since our exercises were based heavily on FLS tasks and metrics that have been validated extensively in other studies [9, 10, 11, 12, 13, 14, 15]. However, only two prior studies have evaluated FLS tasks for robotic training, and both involved only the interrupted suturing task [12, 13, 14, 16]. While both studies documented favorable results, ours is the first to document construct validity for the five FLS models as well as the other four drills used in our comprehensive curriculum.

From a validation standpoint, we have now accumulated substantial data for this curriculum, with prior studies supporting content and face validity and our current study supporting construct validity. Regarding feasibility of implementing this curriculum, we anticipate good results. Our group of experts represented multiple disciplines and these tasks should be useful for a variety of learners. Our workload data suggest that the demands placed on trainees should not be unreasonable. Likewise, the revised proficiency levels seem suitable since they are based on a larger cohort of experts than our prior levels.

Regarding our curriculum’s use of inanimate exercises, several issues are worth mentioning. While the da Vinci system dominates the market currently, should other systems designed for similar applications become available in the future, the same inanimate tasks may be applicable. For the da Vinci system, the surgeon must rely heavily upon visual cues since there is no tactile or haptic feedback. For example, it is quite easy to fray suture, bend needles, or traumatize tissues using the robotic system if sufficient expertise has not been acquired. The inanimate models allow the learner to develop the necessary skills to overcome these constraints since the system interacts with actual physical models. Additionally, our scoring system assigns penalty points when these types of errors are committed. By comparison, virtual reality robotic systems may struggle to accurately simulate these interactions and are relatively expensive, costing $80,000 or more. However, tradeoffs also exist for our inanimate models since actual materials, such as suture, and the use of real robotic instruments may require substantial resources, as previously reported.

In conclusion, best methods for simulation-based training have evolved, and this curriculum aims to make maximal use of these principles for robotic training. This study builds on our prior data and documents construct validity for all nine exercises included in our curriculum. The expert data were sufficiently robust to allow revision of our training performance goals. Implementation of our comprehensive proficiency-based robotic surgical training curriculum is encouraged.



G. Dulan, R. V. Rege, D. C. Hogg, K. M. Gilberg-Fisher, N. A. Arain, S. T. Tesfay, and D. J. Scott have no conflicts of interest or financial ties to disclose.


  1. Sroka G, Feldman LS, Vassiliou MC, Kaneva PA, Favez R, Fried GM (2010) Fundamentals of laparoscopic surgery simulator training to proficiency improves laparoscopic performance in the operating room—a randomized controlled trial. Am J Surg 199:115–120
  2. McCluney AL, Vassiliou MC, Kaneva PA, Cao J, Stanbridge DD, Feldman LS, Fried GM (2007) FLS simulator performance predicts intraoperative laparoscopic skill. Surg Endosc 21:1991–1995
  3. Fundamentals of laparoscopic surgery. Accessed 25 May 2011
  4. Gallagher AG, Ritter EM, Satava RM (2003) Fundamental principles of validation, and reliability: rigorous science for the assessment of surgical education and training. Surg Endosc 17:1525–1529
  5. Hart SG, Staveland LE (1988) Development of a multi-dimensional workload rating scale. In: Human mental workload. Amsterdam, Elsevier
  6. Ritter EM, Scott DJ (2007) Design of a proficiency-based skills training curriculum for the fundamentals of laparoscopic surgery. Surg Innov 14:107–112
  7. Scott DJ (2006) Proficiency-based training for surgical skills. Semin Colon Rectal Surg 19:72–80
  8. Goova MT, Hollett LA, Tesfay ST, Gala RB, Puzziferri N, Kehdy FJ, Scott DJ (2008) Implementation, construct validity and benefit of a proficiency based knot-tying and suturing curriculum. J Surg Educ 65:309–315
  9. Mashaud LB, Castellvi AO, Hollett LA, Hogg DC, Tesfay ST, Scott DJ (2010) Two-year skill retention and certification exam performance after fundamentals of laparoscopic skills training and proficiency maintenance. Surgery 2:194–201
  10. Stefanidis D, Korndorffer JR, Black FW, Dunne JB, Sierra R, Touchard CL, Rice DA, Markert RJ, Kastl PR, Scott DJ (2006) Psychomotor testing predicts rate of skill acquisition for proficiency-based laparoscopic skills training. Surgery 140:252–262
  11. Fried GM, Feldman LS, Vassiliou MC, Fraser SA, Stanbridge D, Ghitulescu G, Andrew CG (2004) Proving the value of simulation in laparoscopic surgery. Ann Surg 240:518–525
  12. Stefanidis D, Wang F, Korndorffer JR Jr, Dunne JB, Scott DJ (2010) Robotic assistance improves intracorporeal suturing performance and safety in the operating room while decreasing operator workload. Surg Endosc 24:377–382
  13. Stefanidis D, Hope WW, Scott DJ (2011) Robotic suturing on the FLS model possesses construct validity, is less physically demanding, and is favored by more surgeons compared with laparoscopy. Surg Endosc 25(7):2141–2146
  14. Korndorffer JR Jr, Clayton JL, Tesfay ST, Brunner WC, Sierra R, Dunne JB, Jones DB, Rege RV, Touchard CL, Scott DJ (2005) Multicenter construct validity for Southwestern laparoscopic videotrainer stations. J Surg Res 128:114–119
  15. Hamilton EC, Scott DJ, Fleming JB, Rege RV, Laycock R, Bergen PC, Tesfay ST, Jones DB (2002) Comparison of video trainer and virtual reality training systems on acquisition of laparoscopic skills. Surg Endosc 16:406–411
  16. Seixas-Mikelus SA, Stegemann AP, Kesavadas T (2011) Content validation of a novel robotic surgical simulator. BJU Int 107:1130–1135

Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  • Genevieve Dulan (1)
  • Robert V. Rege (1)
  • Deborah C. Hogg (1)
  • Kristine M. Gilberg-Fisher (1)
  • Nabeel A. Arain (1)
  • Seifu T. Tesfay (1)
  • Daniel J. Scott (1)

  1. Department of Surgery, Southwestern Center for Minimally Invasive Surgery, University of Texas Southwestern Medical Center, Dallas, USA
