Introduction

To enhance patient safety, it has become increasingly important to measure outcomes in health care. Surgical outcomes such as blood loss, operative time, and the occurrence of adverse events are widely applied, readily available measures. These measures, as well as the skills and experience of the surgeon (usually expressed as the number of cases performed), are still used as quality predictors [1]. However, it is also established that surgical outcome is influenced not only by surgical experience but also by co-factors such as the composition of the OR team and, inherently, patient factors (i.e., the case mix). These factors are not taken into account when the aforementioned crude, unadjusted parameters are used to measure and present the actual surgical outcome [2, 3].

With respect to patient-related factors, recent research in laparoscopic hysterectomy (LH) demonstrated five significant covariates predicting successful outcome: uterus weight, body mass index, number of surgeons present at surgery, prior abdominal surgery, and type of laparoscopic hysterectomy (i.e., total laparoscopic hysterectomy, supracervical laparoscopic hysterectomy, or laparoscopic-assisted vaginal hysterectomy) [4]. Moreover, experience predicts successful surgical outcome in LH, with respect to blood loss and adverse events, up to at least a hundred procedures. This finding was also observed in the field of advanced colorectal laparoscopic surgery [5, 6]. Finally, recent research demonstrated a significant experience-independent and case mix-adjusted surgical skills factor (SSF) with regard to successful outcome in LH [4].

The aforementioned findings support the view that surgical outcomes in laparoscopic hysterectomy should be monitored continuously, as both case mix and the surgeon’s skills may vary over time, and experience alone does not sufficiently predict these outcomes. Like the traditional outcome measures, the traditional single-outcome learning curves in surgery, which were applied to assess surgical proficiency, do not take these findings into account [7–10]. Monitoring tools based on cumulative sum (CUSUM) analysis, already used in obstetrics and general surgery, overcome these shortcomings [11–16]. In the industrial setting, since 1974, CUSUM charts have been shown to be ideally suited to detect relatively small persistent changes in event rates over time [3]. Traditional CUSUM approaches, however, make no adjustment for different risk profiles because machine inputs are usually relatively homogeneous. In contrast, patients undergoing a particular surgical intervention are often very heterogeneous in their clinical presentation. Additionally, the surgical approach may vary considerably due to the clinical presentation as well as the preference of the surgeon. As a result, the probability of successful outcome may vary considerably between patients. By using a likelihood-based scoring method, the cumulative sum procedure is adapted so that it adjusts for the surgical risk of each patient estimated preoperatively [2, 17, 18]. As a result, the user is provided with a graphical representation of his/her surgical outcomes, corrected for patient mix and instantly compared to the national average. Trends are visualized, and significant deterioration in surgical outcome is noticed.

In gynecology, a shift toward implementing more advanced surgical procedures is currently observed. However, several studies suggest that these advanced laparoscopic surgical procedures are characterized by a specific learning curve because unique operative skills must be acquired [19]. Consequently, this learning curve is considered a barrier to widespread implementation of advanced laparoscopic surgery [5]. Other research has already revealed that even in basic laparoscopy, nearly a fifth of surgeons never become proficient enough to perform laparoscopic surgery adequately [20]. These insights, combined with the call for constant monitoring of patient safety, lead us to strive for risk-adjusted continuous quality assessment during mentorships and beyond, so that performance can be adjusted when the quality of surgery is at risk.

The aim of this study is to develop such a tool. To signal derailing surgical performance in a timely fashion, a risk-adjusted real-time quality control system for laparoscopic hysterectomy is analyzed, evaluated, and launched.

Methods

A previously described data set of 1,534 LHs, performed by 79 surgeons, was used to compose and validate a risk-adjusted CUSUM graph for LH [4]. The significant predictive covariates were included: uterus weight, body mass index, number of surgeons present at surgery, prior abdominal surgery, and type of laparoscopic hysterectomy (Table 1).

Table 1 Association between predictors and primary outcomes in laparoscopic hysterectomy

The CUSUM score depends on four factors: the current average level of surgical performance, a chosen level of surgical performance deemed undesirable, the patient’s surgical risk estimated preoperatively, and the actual surgical outcome in this patient. Preoperative surgical risk estimation was based on body mass index, uterus weight, and prior abdominal surgery. The continuous surgical outcomes, blood loss and operative time, were dichotomized using the rounded mean observed value. Consequently, successful surgical outcome was defined as blood loss <200 mL, operative time <120 min, and no adverse event. Because the incidences of these outcomes varied, with correspondingly varying influences of the covariates, we applied three risk-adjusted CUSUM graphs, one for each outcome.
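As an illustration of the dichotomization described above, the following minimal Python sketch converts one procedure record into three failure flags, one per CUSUM graph. The record layout and the names `Procedure` and `failure_flags` are hypothetical, and treating a value exactly equal to the cutoff as a failure is an assumption that follows from defining success with a strict "<".

```python
from dataclasses import dataclass

# Cutoffs taken from the text: success is blood loss <200 mL and
# operative time <120 min, with no adverse event.
BLOOD_LOSS_LIMIT_ML = 200
OPERATIVE_TIME_LIMIT_MIN = 120


@dataclass
class Procedure:
    """Hypothetical record of the three primary outcomes of one LH."""
    blood_loss_ml: float
    operative_time_min: float
    adverse_event: bool


def failure_flags(procedure):
    """Return one failure flag per outcome (True means failure).

    Assumption: a value exactly at the cutoff counts as a failure,
    because success is defined with a strict "<".
    """
    return {
        "blood_loss": procedure.blood_loss_ml >= BLOOD_LOSS_LIMIT_ML,
        "operative_time": procedure.operative_time_min >= OPERATIVE_TIME_LIMIT_MIN,
        "adverse_event": procedure.adverse_event,
    }


# Example: modest blood loss, long operative time, no adverse event.
flags = failure_flags(Procedure(blood_loss_ml=150, operative_time_min=130, adverse_event=False))
```

Each flag would feed its own risk-adjusted CUSUM graph, so a single procedure can count as a success on one chart and a failure on another.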

With the chosen level of surgical performance deemed undesirable, we aimed to minimize the number of procedures before possibly derailing performance is signaled, while also minimizing “false alarms.” For quality control, a lower boundary line is not used. To allow sensitive and timely detection of “eventful” procedures, the model resets itself to 0 each time the x-axis is hit [18]. As a consequence, the median number of procedures needed to detect an unacceptable failure rate (when a surgeon performs below an acceptable level) is based on the upper boundary (“out of control,” defined as an odds ratio of 2 compared to average performance). Nevertheless, the model cannot prevent average clinical performance from occasionally being “flagged” as derailing (Fig. 2). The primary outcome of this study is the number of procedures after which surgeons are flagged, both true positive and false positive.

To apply a risk-adjusted (i.e., based on the patient’s surgical risk estimated preoperatively) CUSUM analysis, we have to estimate the logistic regression model as described earlier [4]. Based on this model, we can compute the probability of an unfavorable outcome (failure) for each procedure. For ease of notation, suppose we use only uterus weight Ut as a predictor. Then, provided that the surgeon is performing exactly on the national average (i.e., is in control), the probability of failure in procedure i is:

$$ p_0(i)=1/\left(1+\exp\left(-\beta_0-\beta_1\times \mathrm{Ut}(i)\right)\right) $$

β0: the intercept in the logistic regression model

β1: the log odds ratio for uterus weight

If the surgeon performs worse than average (OR = 2 compared to the national average), the probability of failure becomes larger and is given by:

$$ p_1(i)=1/\left(1+\exp\left(-\beta_0-\beta_1\times \mathrm{Ut}(i)-\log(2)\right)\right) $$
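As a short worked step (a sketch, not part of the original derivation), adding log(2) to the linear predictor is the same as doubling the odds of failure, which gives the out-of-control probability directly from the in-control probability:

$$ \frac{p_1(i)}{1-p_1(i)}=2\times\frac{p_0(i)}{1-p_0(i)}\quad\Longrightarrow\quad p_1(i)=\frac{2\,p_0(i)}{1+p_0(i)} $$

For an illustrative patient with an in-control failure probability of 0.20 (a value chosen for the example, not taken from the cohort), this gives 0.40/1.20 ≈ 0.33.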

Given the outcome of procedure i, we can compute the log likelihood ratio as

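$$ W(i)=\left\{\begin{array}{ll} \log \left(p_1(i)/p_0(i)\right) & \mathrm{if}\;\mathrm{failure} \\ \log \left(\left(1-p_1(i)\right)/\left(1-p_0(i)\right)\right) & \mathrm{if}\;\mathrm{success} \end{array}\right. $$

Because the probability of failure under worse-than-average performance exceeds that under average performance, W(i) is positive after a failure and negative after a success, so that failures move the graph toward the upper boundary.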

Now, we construct the CUSUM graph by plotting X(i) = max(0, X(i − 1) + W(i))

X(i) reflects the direction and weight of the outcome of procedure i on the CUSUM graph, corrected for uterus weight. In our model, we included all covariates (uterus weight increase per 100 g, BMI increase per 5 points, number of prior abdominal surgeries, one or two performing surgeons, and type of laparoscopic hysterectomy).
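The construction above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this study: the intercept and coefficient are hypothetical placeholders rather than the fitted values from the cohort [4], and only uterus weight (per 100 g) is used as a predictor. The weights are written as log(p1/p0) after a failure and log((1 − p1)/(1 − p0)) after a success, so that an unexpected failure moves the chart upward toward the upper boundary.

```python
import math


def failure_probability(intercept, coefficients, covariates, log_odds_shift=0.0):
    """Probability of failure from a logistic model.

    log_odds_shift=0 gives p0 (average performance);
    log_odds_shift=log(2) gives p1 (odds ratio 2 versus average).
    """
    linear = intercept + log_odds_shift
    linear += sum(b * x for b, x in zip(coefficients, covariates))
    return 1.0 / (1.0 + math.exp(-linear))


def cusum_weight(p0, p1, failed):
    """Log likelihood ratio score W(i): positive after a failure, negative after a success."""
    if failed:
        return math.log(p1 / p0)
    return math.log((1.0 - p1) / (1.0 - p0))


def risk_adjusted_cusum(cases, intercept, coefficients, odds_ratio=2.0):
    """Return the chart values X(1), ..., X(n) for (covariates, failed) pairs."""
    x = 0.0
    path = []
    for covariates, failed in cases:
        p0 = failure_probability(intercept, coefficients, covariates)
        p1 = failure_probability(intercept, coefficients, covariates, math.log(odds_ratio))
        x = max(0.0, x + cusum_weight(p0, p1, failed))  # the chart resets at the x-axis
        path.append(x)
    return path


# Hypothetical coefficients, NOT the fitted values from the cohort [4]:
# an intercept and one slope per 100 g of uterus weight.
INTERCEPT = -2.0
COEFFICIENTS = [0.3]

# Each case: ([uterus weight in units of 100 g], failed?)
cases = [([2.0], False), ([2.0], True), ([6.0], True), ([2.0], False)]
chart = risk_adjusted_cusum(cases, INTERCEPT, COEFFICIENTS)
```

In this sketch, the failure in the higher-risk case (third entry) raises the chart less than the same failure in the lower-risk case (second entry), which is the effect of the risk adjustment. A signal would be raised when the chart value reaches the chosen upper boundary.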

Results

Figure 1 provides an example of the principle of a risk-adjusted CUSUM graph of 21 consecutive LHs performed by one surgeon, with respect to blood loss <200 mL. The horizontal axis represents the number of consecutive procedures. The vertical axis represents the cumulative sum of the risk-adjusted scores per procedure. As can be seen at “no. 1,” the fourth procedure was complicated by blood loss >200 mL in a regular patient, followed by three regular procedures with blood loss <200 mL. The eighth procedure (no. 2) was performed uneventfully in a “challenging patient” (e.g., high BMI and large uterus weight). At the 13th procedure (no. 3), blood loss >200 mL occurred; however, this occurred in a challenging case (high BMI and large uterus weight compared to no. 1). At the 15th procedure, another failure occurred (no. 4); however, because of average patient characteristics (see also Table 1), this procedure was expected to be uneventful. A steep rise in the curve represents this discordance between the observed and expected outcome. At attempt number 21 (no. 5), the CUSUM graph goes out of control. Consequently, the chart signals.

Fig. 1 Example of a cumulative summation analysis graph in one gynecologist with respect to blood loss <200 mL (see “Results” for explanation)

For each of the defined outcomes of LH (blood loss, operative time, and adverse events), a separate risk-adjusted CUSUM graph was constructed. To detect unacceptable failure rates (clinical performance OR 2.0 compared to average clinical performance) within 20 procedures, the thresholds have to be set such that a surgeon with average surgical outcomes will be flagged, without justified bad performance, approximately once every 70–75 procedures (Fig. 2). Reference values are based on the previously described cohort of 1,534 procedures performed by 79 gynecologists.

Fig. 2 Threshold curves for blood loss, operative time, and adverse events. The horizontal axis represents the number of procedures before flagging in case of out-of-control performance (OR 2.0 compared to average performance). When performing exactly on average, flagging will occur as frequently as depicted on the vertical axis
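The trade-off between timely detection and false alarms can be explored with a small Monte Carlo simulation. The Python sketch below is an illustration under simplifying assumptions, not the method used to derive the threshold curves: every patient is given the same in-control failure probability, and the values of `p0` and `threshold` in the example are hypothetical. The function names are likewise chosen for this example.

```python
import math
import random
import statistics


def run_length(p0, true_odds_ratio, threshold, rng,
               design_odds_ratio=2.0, max_cases=100_000):
    """Number of procedures until the chart first reaches `threshold`.

    Assumption: every patient has the same in-control failure probability p0.
    """
    p1 = design_odds_ratio * p0 / (1.0 - p0 + design_odds_ratio * p0)
    w_failure = math.log(p1 / p0)
    w_success = math.log((1.0 - p1) / (1.0 - p0))
    # Failure probability the simulated surgeon actually operates at.
    p_true = true_odds_ratio * p0 / (1.0 - p0 + true_odds_ratio * p0)
    x = 0.0
    for n in range(1, max_cases + 1):
        failed = rng.random() < p_true
        x = max(0.0, x + (w_failure if failed else w_success))
        if x >= threshold:
            return n
    return max_cases  # no signal within max_cases procedures


def median_run_length(p0, true_odds_ratio, threshold, n_sim=500, seed=1):
    """Median run length over n_sim simulated surgeons."""
    rng = random.Random(seed)
    return statistics.median(
        run_length(p0, true_odds_ratio, threshold, rng) for _ in range(n_sim)
    )


# Hypothetical settings: 30% in-control failure rate, upper boundary at 2.0.
in_control = median_run_length(p0=0.3, true_odds_ratio=1.0, threshold=2.0)
out_of_control = median_run_length(p0=0.3, true_odds_ratio=2.0, threshold=2.0)
```

Raising `threshold` lengthens both run lengths: false alarms under average performance become rarer, but truly derailing performance is also detected later.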

Once one of the three CUSUM graphs signals, the surgeon should analyze at least 20 of his/her past procedures using a concise checklist, as depicted in Table 2. Five fields address possible causes. If a field is ticked once or more, it should be studied and addressed in particular. This checklist has not yet been validated.

Table 2 Checklist after signaling of CUSUM graph

A web-based, non-commercial, and protected application is available to process the proposed CUSUM graphs for LH and to provide surgeons with their performance statistics at a glance (https://www.qusum.org). The program is primarily designed for a national multicenter validation study; however, anyone is free to register and use the application. It should be possible to integrate this software easily with (existing) data recording systems in the near future. The five characteristics (uterus weight in grams, body mass index (kg/m²), number of previous abdominal surgeries, one or two surgeons, and type of LH) and the three primary outcomes (operative time in minutes, blood loss in milliliters, and adverse event) can be entered immediately postoperatively or at any given moment.

Discussion

With the proposed validated and risk-adjusted CUSUM graphs, gynecologists can continuously monitor their surgical performance in laparoscopic hysterectomy and consequently identify suboptimal factors with respect to operative time, blood loss, and adverse events. As a result, they are able to enhance patient safety.

Despite correction for patient case mix (i.e., identified risk factors), this analysis model still inevitably flags some surgeons with average clinical performance. This is due to the sensitivity of the model. If the CUSUM analysis has to identify derailing performance (OR 2 compared to average performance) within a reasonable number of procedures (i.e., 20 laparoscopic hysterectomies), occasional flagging of surgeons with average clinical performance is inevitable. The proposed cutoff limits are set primarily to identify possible suboptimal situations and to enhance patient safety. The goal is twofold. Firstly, when out-of-control limits are signaled in a timely fashion, surgeons can evaluate their own performance, as well as that of their surgical team and even their equipment, and act if necessary. Secondly, by providing (national) averages as a standard of care, hypothetically, in the long term, suboptimally performing surgeons who do not cross the out-of-control line will also improve their outcomes.

Although this proposed CUSUM system for laparoscopic hysterectomy is based on national averages of a Dutch cohort from 2009, we suggest that the reference values are applicable to every gynecologist. The proposed cutoff values might appear “mild.” However, if these values are raised, signaling will be delayed. This will result in less adequate flagging of potentially derailing performance.

If implemented in a straightforward digital registry tool (or stand-alone computer program), this CUSUM for LH provides easy-to-understand and quick-to-apply insight into tailor-made proficiency curves. We suggest that out-of-control signaling should primarily be discussed internally and, only after a certain acclimatization period, with expert peers in order to identify suboptimal care and to provide “Best Practices.”

A number of aspects of the proposed model should be addressed. Firstly, is the average signaling rate of one in 75 procedures in surgeons with average clinical performance acceptable? Yes; however, proper information and efficient evaluation are prerequisites. Time-consuming evaluation will harm initial motivation. When a CUSUM chart goes out of control, the surgeon should be provided with a concise check box-based questionnaire in order to identify the origin of the derailing performance (Table 2). This could be due to skills, technical issues, misjudging of a series of cases, problems with the OR team, etc. These issues should be addressed. Secondly, ideally, the CUSUM chart (and preferably also its evaluation system) should be integrated and implemented in an already existing electronic patient file system. Registration of patient data in multiple sources will adversely affect the quality and quantity of the data. Thirdly, the national averages set in this tool should be updated on a frequent basis, preferably every 5 years. Hypothetically, the cohort will improve its surgical outcomes over time. As a result, averages and out-of-control limits should be fine-tuned as well.

An example is found in the field of (surgical) oncology, in which the value of continuous quality assurance is well studied [21–23]. However, these examples evaluate care on a yearly basis and often lack correction for patient case mix. Furthermore, most of these registries use adverse events as the sole primary outcome and address hospitals rather than individual surgeons. Some registries compare hospital outcomes with national averages; however, most systems compare with (outdated) literature. CUSUM analysis addresses all of the abovementioned points of interest.

For a start, the CUSUM should be applied and compared internally only. By means of a multicenter prospective cohort study, the proposed cutoff values should be validated and the feasibility of this system should be investigated. More information, as well as the web-based CUSUM tool, can be found at www.qusum.org. In conclusion, applying CUSUM charts as quality assurance for surgical performance and clinical outcome measures in LH might enhance patient safety.