Background

Female sterilization is the most common method of contraception worldwide [1]. More than 600,000 tubal sterilizations are performed annually in the USA [2]. Although sterilization was initially performed laparoscopically, the hysteroscopic method is becoming increasingly popular [3, 4].

Hysteroscopy is generally considered a safe procedure with a low complication rate [5, 6]. Its practice ranges from diagnostics in an outpatient setting to a surgical alternative for many gynecological problems. Teaching hysteroscopy skills traditionally has been based on a mentored model, where trainees are exposed to procedures with the guidance of an experienced teacher. However, in recent years, the surgical volume has been limited by restrictions on resident working hours, and fewer highly skilled teachers are available [7, 8]. This results in difficulties in acquiring sufficient skills in advanced endoscopic surgery [9, 10]. Effective use of simulation and training models is a possible solution to this problem [9–11].

Development and validation research on training models and simulators has mainly focused on laparoscopy. Training models allow a surgeon to safely overcome the learning curve of a new technique before practicing on a patient [9, 12, 13]. Virtual reality (VR) simulators in particular allow more independent instruction and objective, immediate feedback for more reliable, unbiased assessment of psychomotor skills [9, 12, 14]. In addition, they allow for repeated practice without any risk to patients. Training on a VR system bypasses the ethical concerns associated with practice on animals or cadavers. Moreover, many VR systems allow for practice at varying levels of difficulty and across a wide range of scenarios, thus accommodating trainees at many levels [14, 15].

Prior to implementation of a new training tool in a curriculum, evaluation and validation of the simulator and its parameters are mandatory [16–19]. Validity measures whether a simulator is actually teaching or measuring what it is intended to teach or measure [17]. Different aspects of validity exist. Face validity refers to whether the model resembles the task or procedure it is aiming to train for, by determining the opinion of users on realism of the simulation. Objective approaches consist of construct and predictive validity. Construct validity refers to whether the model measures the quality or ability it is supposed to measure [17]. In this regard, the simulator must be able to differentiate between the experienced and the inexperienced surgeon, or in addition, measure improvement in novices’ performance by training. Predictive validity is the extent to which the simulator predicts future performance by assessing whether the skills acquired on a simulator actually result in improved skills in patients in the real-time clinical setting [17, 20].

Excellent data are available to support the validity and effectiveness of VR training of surgical skills in general surgery [21–23], urology [20], and gynecology [24, 25]. VR training leads to more efficient movements and fewer errors, which translates into shorter operating time and improved patient safety.

In comparison to laparoscopy, little work has been done regarding hysteroscopy training despite its growing use and applicability during the last decades. Several training methods have been designed, focusing mainly on the development of physical models and box trainers [26–28]. A collaboration between gynecologists and technicians in Switzerland led to the development of the Hysteroscopic Surgery Simulator System (HystSim™), a VR simulator for hysteroscopic interventions. Face and construct validity have been established for a diagnostic training module [29, 30]. Recently, a new procedural training module became available with which the Essure® sterilization method can be practiced (EssureSim™).

The hysteroscopic sterilization method by Essure® Permanent Birth Control system (Conceptus; Mountain View, CA, USA) was approved in 2001 by the European Health Office and in 2002 by the U.S. Food and Drug Administration. Micro-inserts placed in both tubal ostia cause a sterile inflammatory response of the intramural and isthmic parts of the Fallopian tube, thereby occluding the tubes within 3 months. Since its introduction, this method has been performed by gynecologists around the world and has become an accepted alternative to laparoscopic sterilization. The procedure was initially taught with significant hands-on supervision; the EssureSim™ was developed to train, more efficiently and without risk to the patient, gynecologists who want to start performing this procedure.

The aim of this study is to determine the face and construct validity of this VR training module for the hysteroscopic placement of tubal sterilization micro-inserts.

Methods

Participants

Between June 2010 and April 2011, 25 ob-gyn residents and 44 consultant gynecologists (N = 69) were randomly recruited at the Annual Meeting of the Dutch Society of Obstetrics and Gynecology and from a university hospital and a major teaching hospital in the Netherlands.

Given that hysteroscopic sterilization is performed as a type of therapeutic vaginoscopic hysteroscopy, without use of a speculum and tenaculum, three groups were made. This division was based on a combination of Essure® experience level and experience level in therapeutic vaginoscopic hysteroscopy. “Novices” (N = 17) had never performed an Essure® placement or a therapeutic vaginoscopic hysteroscopy; “experts” (N = 17) had performed >25 Essure® placements and >25 therapeutic vaginoscopic hysteroscopies; “intermediates” (N = 35) had any experience between that of a novice and an expert. Participants’ experience was assessed from self-estimated numbers of both procedures.
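The grouping rule above can be written out as a small sketch. The function name and argument names are hypothetical and chosen here only for illustration; the thresholds (0 and >25 for both procedures) are the ones stated in the text.

```python
def classify_experience(essure_placements: int,
                        vaginoscopic_hysteroscopies: int) -> str:
    """Assign an experience group from self-reported procedure counts.

    Hypothetical helper: the thresholds follow the grouping described in
    the text; the name and signature are assumptions for illustration.
    """
    if essure_placements < 0 or vaginoscopic_hysteroscopies < 0:
        raise ValueError("procedure counts cannot be negative")
    # Novice: never performed either procedure.
    if essure_placements == 0 and vaginoscopic_hysteroscopies == 0:
        return "novice"
    # Expert: more than 25 of BOTH procedures.
    if essure_placements > 25 and vaginoscopic_hysteroscopies > 25:
        return "expert"
    # Everything in between.
    return "intermediate"
```

Because the expert label requires both thresholds to be met, a participant with many Essure® placements but few therapeutic vaginoscopic hysteroscopies (or the reverse) falls into the intermediate group.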

Equipment

The EssureSim™ consists of an adapted hysteroscope (10-mm resectoscope), an Essure® simulation device, simulation hardware and software (Fig. 1). The simulation software runs on standard laptop hardware (2.40 GHz Intel® Core™ 2 DUO CPU P8600, 2 GB RAM, NVIDIA Quadro FX 2700 M graphic card). The system does not possess haptic feedback. The software contains eight different cases with varying degrees of difficulty.

Fig. 1 Setup of the EssureSim™ (with permission of VirtaMed AG)

Face validity

Participants completed a questionnaire immediately after completing the cases on the simulator. It included questions about participants’ demographics and experience level in hysteroscopy training, several hysteroscopy procedures, and hysteroscopic sterilization. The opinion of each participant was assessed with 14 questions about the simulator and sterilization module. These questions concerned the realism of the simulation and its training capacity, and were presented on a 5-point Likert scale [31]. Additionally, two statements were presented to elicit further opinion. These were answered with “agree,” “disagree,” or “no opinion.” Face validity was determined by analyzing the opinion of the participants with prior Essure® experience. In this manner, realism and training capacity of the simulator were evaluated only by the participants who had knowledge of the real-time procedure and who could make a comparison between both environments.
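The face validity analysis described above (restrict to respondents with prior Essure® experience, then summarize Likert scores by their median and statements by percentage agreement) can be sketched as follows. This is only an illustration: the record layout and field names are assumptions, not the study's actual data format.

```python
from statistics import median

def face_validity_summary(responses):
    """Summarize questionnaire answers of respondents with Essure experience.

    `responses` is a list of dicts with hypothetical keys:
    essure_placements (int), realism and training_capacity (Likert 1-5),
    and useful_preparation ("agree", "disagree" or "no opinion").
    """
    # Only respondents who performed at least one placement are analyzed.
    experienced = [r for r in responses if r["essure_placements"] >= 1]
    if not experienced:
        raise ValueError("no respondents with prior Essure experience")
    agree = sum(1 for r in experienced if r["useful_preparation"] == "agree")
    return {
        "n": len(experienced),
        "median_realism": median(r["realism"] for r in experienced),
        "median_training": median(r["training_capacity"] for r in experienced),
        "percent_agree": round(100 * agree / len(experienced), 1),
    }
```

Respondents without prior placements are excluded before any summary is computed, so their answers cannot influence the realism or training capacity scores.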

Construct validity

To investigate construct validity, the participants performed tasks on the simulator. All participants received a standard introduction to the simulator and the sterilization procedure. They then familiarized themselves with the VR simulator by performing one tubal micro-insert placement in a uterus with normal tubes. In the first case (case 1), the participant performed a bilateral sterilization in a uterus with normal tubes, as shown in the animation (Online Resource 1). The second case (case 2) comprised a bilateral placement in a more difficult uterus, with thickened endometrium, decreased visibility, and slightly more lateral insertion of the tubes (Online Resource 2). All participants were supervised by one supervisor (J.A.J.), who answered questions and gave instructions if a participant was not able to proceed.

Cases 1 and 2 were used for analysis. The parameters measured by the simulator and used for data analysis were task time, path length, trauma, patient comfort, amount of distension fluid used, and successful placement. A description of all parameters used is given in Table 1. These parameters were compared between the groups for each case separately, since the cases differed in difficulty.

Table 1 Description of all parameters used

Use of statistics

Data were analyzed using the statistical software package SPSS 17.0 (SPSS, Inc., Chicago, IL). Differences in general demographics and performance between the three groups were analyzed using the Kruskal–Wallis test for nonparametric data. If the Kruskal–Wallis test resulted in a significant difference, then a comparison between two separate groups was done using the Mann–Whitney U test with post hoc Dunn’s (Bonferroni) correction. To verify the minimum sample size, a power analysis was performed. A total sample of 69 subjects achieves a power of >.80 with the Kruskal–Wallis test with a target significance of .05. The average within-group standard deviation assuming the alternative distribution is 1.0 (PASS 2008; NCSS, LLC, Kaysville, UT). A p value of <.05 was considered to be statistically significant. Values are presented as medians with interquartile ranges unless stated otherwise.
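As a rough illustration of the testing sequence described above, the sketch below runs a Kruskal–Wallis test on three groups and, only if it is significant, pairwise Mann–Whitney U tests with a Bonferroni adjustment. It is not the SPSS procedure used in the study. The function names and the sample task times are invented; the Kruskal–Wallis p value uses the chi-square approximation (which has a closed form for three groups, df = 2); the Mann–Whitney p value uses the normal approximation with continuity correction; and the adjustment simply multiplies each pairwise p value by the number of comparisons.

```python
import math
from itertools import combinations

def _ranks(values):
    """Return average ranks (1-based) and the tie term sum(t^3 - t)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_term = 0
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1
    return ranks, tie_term

def kruskal_wallis_3(groups):
    """Tie-corrected H and chi-square p value for exactly three groups."""
    if len(groups) != 3:
        raise ValueError("this sketch handles exactly three groups")
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks, tie_term = _ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        total += sum(ranks[start:start + len(g)]) ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    h = max(h / (1 - tie_term / (n**3 - n)), 0.0)
    return h, math.exp(-h / 2)  # chi-square survival function, df = 2

def mann_whitney_u(a, b):
    """U statistic and two-sided p value (normal approximation)."""
    n1, n2 = len(a), len(b)
    n = n1 + n2
    ranks, tie_term = _ranks(list(a) + list(b))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:
        return u1, 1.0  # all observations tied
    z = (abs(u1 - n1 * n2 / 2) - 0.5) / math.sqrt(var)  # continuity correction
    return u1, math.erfc(max(z, 0.0) / math.sqrt(2))

def compare_groups(data, alpha=0.05):
    """Omnibus test first; adjusted pairwise tests only if it is significant."""
    names = list(data)
    h, p = kruskal_wallis_3([data[k] for k in names])
    result = {"H": h, "p": p, "pairwise": {}}
    if p < alpha:
        pairs = list(combinations(names, 2))
        for x, y in pairs:
            _, p_pair = mann_whitney_u(data[x], data[y])
            result["pairwise"][(x, y)] = min(1.0, p_pair * len(pairs))
    return result

# Synthetic task times in seconds (illustrative only, not study data).
example = {
    "novice": [212, 198, 240, 225, 260],
    "intermediate": [150, 171, 160, 182, 145],
    "expert": [120, 138, 131, 142, 126],
}
print(compare_groups(example))
```

Gating the pairwise tests on the omnibus result mirrors the two-step procedure in the text. For small groups an exact Mann–Whitney p value would be preferable to the normal approximation used here.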

Findings

Table 2 shows the general demographics of the participants. A significant difference for age was seen between groups (p < .05), while gender and handedness did not differ significantly. Of all participants, one expert and three participants of the intermediate group had been introduced to the HystSim™ at other conference venues.

Table 2 Baseline characteristics of all participants

Face validity

Of the 69 participants, all completed the entire questionnaire. Table 3 summarizes the median values of the scores considering the realism and training capacity of the simulator, awarded by the participants with prior Essure® experience (N = 22). In the questionnaire, realism of the sterilization procedure was scored with a median of 4.00 points on a 5-point Likert scale. Training capacity of the sterilization procedure was awarded a median of 5.00 points. Of all participants with prior Essure® experience, 100.0 % agreed with the statement that the hysteroscopy simulator offers procedural training of hysteroscopic skills. Furthermore, 95.5 % indicated the training module for the Essure® sterilization method as a useful preparation for real-time placement.

Table 3 Results face validity

Construct validity

All 69 participants completed all cases. Median values of the assessed parameters for cases 1 and 2 are shown in Table 4. The simulator was able to differentiate between subjects with varying hysteroscopy experience for two out of six parameters.

Table 4 Results of construct validity for each group

The parameter task time was able to differentiate significantly between all groups in both cases. The novice group performed both cases significantly slower in comparison to the other groups (p = .001 for both cases). In addition, all groups required more time to finish the second case, a uterus of a more difficult level, in comparison to the first case.

Similarly, the parameter path length showed significant differences between groups in both cases. The novices had a significantly longer path length in comparison to the intermediate and expert group (case 1, p = .001 in both groups; case 2, p = .006 in comparison with the intermediate group).

The results for parameter task time and path length are visualized in Fig. 2. Both parameters reflect a more efficient performance of hysteroscopy by experienced gynecologists; however, the clinical relevance of a shorter duration of 1 to 1.5 min per patient is uncertain.

Fig. 2 Results of construct validity in box plots. Box plots for the parameters task time and path length, for all groups performing cases 1 and 2. Bars are medians, boxes show interquartile range, whiskers show range, dots are outliers, and large horizontal bars indicate statistically significant differences, specified with p values

In the first case, the parameter trauma displayed a significant difference between the novices and the intermediate group and a similar trend in comparison to the expert group. However, in the second case, a reversed (nonsignificant) effect is observed. The novice group achieved a median score of 8 contacts in comparison to 13 in the expert group. A similar contradictory trend in both cases is seen for the parameter patient comfort.

The analysis of the parameter distension medium did not show significant results, while the intermediate group used the largest amount of fluid in both cases. The last parameter, the number of correctly placed devices, did not differ significantly between the three groups and no specific trend could be observed. Both inexperienced and experienced participants were able to position the sterilization micro-inserts in a correct manner.

Discussion

The aim of this study was to determine the validity of a new training module by which the Essure® sterilization method can be practiced on a commercially available VR simulator. We assessed the realism of the simulator by questionnaires (face validity) and determined the capacity of the simulator to distinguish between experienced and inexperienced hysteroscopists (construct validity). Face validity was established with high scores, while construct validity showed moderate results.

The study was preceded by a power analysis and contained a sufficient number of participants. One supervisor coached all participants to limit inter-supervisor bias.

Because hysteroscopic sterilization is usually performed as a type of therapeutic vaginoscopic hysteroscopy [32, 33], participants were grouped by their experience in both procedures.

In general, gynecologists with ample experience in performing hysteroscopies are considered experts. In the absence of generally accepted criteria for the classification of experience levels, we applied the arbitrary thresholds of 0 and 25 therapeutic procedures to form three levels. The novice and expert groups were of equal size (N = 17), whereas the intermediate group was about twice as large (N = 35), indicating that the majority of our study population had at least some therapeutic vaginoscopic hysteroscopy experience. Face validity was assessed by taking into account only the opinion of those participants who had tubal sterilization experience (performed ≥1 Essure® placement). In this manner, realism was evaluated only by those participants who had knowledge of the real-time procedure.

Not all performance parameters measured by the simulator were able to differentiate between participants with varying hysteroscopy experience. We hereby confirm findings of previous studies by Bajka et al. [30] and Panel et al. [34], who investigated the face and construct validity of the diagnostic and sterilization module on this hysteroscopy simulator, respectively. Both studies found that less than half of all used parameters significantly correlated with hysteroscopy experience.

One possible reason for this finding in the current study is that an active coaching strategy was adopted, in which the supervisor was easily accessible for questions and practical advice. It should be emphasized that this might have reduced possible differences between experienced and inexperienced participants. Another reason may be the lack of haptic feedback, which might especially impair the experienced hysteroscopist. Not only the visual aspect but also haptic feedback guides the operator toward efficient and safe hysteroscopies. As a result, the parameters trauma and patient comfort (the latter a combination of the number of traumas and the distension pressure exerted by the fluid on the uterine wall) might not differentiate consistently between novices and experts.

Also, the parameter distension medium should be interpreted with caution because of missing data, as the simulator tended not to register fluid use during all placements. Further refinement of the software and scoring systems is therefore necessary. Incorrectly placed devices were mainly caused by placing them too deeply into the tubes, whereby the coils were not visible in the uterine cavity after deployment. The fact that novices scored high percentages for correctly placed devices might be explained by the observation that participants without any hysteroscopy experience tended to adhere more closely to any practical advice given during device placement, in contrast to the more experienced groups. In addition, the assessment of the participants’ experience was self-reported and is therefore subject to recall bias. Also, the division into three levels of experience could be seen as a potential source of bias, since the norms for both hysteroscopy and sterilization experience must be met to be classified as an expert.

One could ask oneself in general if a slower performance with more use of distension medium is not preferred when a higher correct placement rate is achieved with better patient comfort. The parameters used by this simulator might not be the only measures of hysteroscopy performance. For procedural exercises, one could design a global rating scale (GRS), which is a scoring system that is built on certain clinically relevant performance parameters [35, 36].

Conclusion

In conclusion, this simulator received high scores regarding both procedure realism and training capacity. It was able to differentiate between subjects with varying hysteroscopy experience for two out of six parameters. We consider this study an essential basic step in the validation cascade of a VR simulator for training operative hysteroscopies and for hysteroscopic sterilization in particular. We also believe this simulator could be suitable for future training of hysteroscopic sterilization skills, after further refinement of the software. The next important step would be the investigation of the learning curve, with concurrent use of a clinically relevant GRS. The learning curve is a vital part of construct validity and in addition addresses implementation of the simulator in hysteroscopy training curricula. The learning curve could possibly indicate the number of training sessions needed to contribute to efficient and safe daily practice. Assessing predictive validity would be a last and ideal step in the validation cascade, providing data on the extent to which the simulation can predict real-life hysteroscopic performance.