Introduction

Efforts to evaluate surgical performance are increasing [1, 2]. The relationship between a surgeon’s experience and individual performance has been well evaluated at the procedure-specific level [3,4,5]. However, previous studies regarding the learning process mostly report the partial experience of a limited number of surgeons [6,7,8]. These reports fail to describe the heterogeneity among surgeons and to display a generalized index to measure a surgeon’s proficiency [9, 10].

Reduced operation time has traditionally been assumed to be the standard measurable parameter for determining when the learning curve has been overcome in surgery [11, 12]. Other clinical markers such as complication rate and reoperation rate have been studied to determine the learning curve of colon surgery; however, this approach has not yet been generalized across the field of surgery [13].

Gastrectomy with radical lymph node dissection is regarded as the standard treatment to increase survival for gastric cancer patients with locally advanced disease [14,15,16]. However, gastrectomy is a demanding technique, especially in older patients with a higher body mass index and multiple comorbidities [17, 18]. Although higher proficiency in gastrectomy has been related to superior postoperative outcomes and patient survival compared to initial experiences, a universal metric for determining when the learning curve in gastrectomy has been overcome is still lacking [19, 20].

To address this, we investigated the influence of surgical experience on a surgeon’s performance in radical gastrectomy in terms of the clinical outcome and actual survival outcome. The aim of this study is to determine an indicator to predict the overcoming of the learning curve of distal gastrectomy in gastric cancer surgery.

Materials and methods

Study cohort

The study cohort consisted of 2100 consecutive gastric cancer patients who underwent open radical distal gastrectomy performed by nine surgeons from eight institutions in Korea (Chonbuk National University Hospital, Dong-A University Hospital, Hallym University Dongtan Sacred Heart Hospital, Keimyung University Dongsan Hospital, National Cancer Center, Seoul National University Hospital, Soon Chun Hyang University Bucheon Hospital, and Yonsei University Severance Hospital) between January 2001 and December 2006. The surgeons who finished fellowship training and started their practice as independent staff members in tertiary hospitals during the period were enrolled in the study.

All surgeons had a minimum of 2 years of surgical fellowship training for gastric cancer in a tertiary institute before enrolling patients. All information and patient data were obtained after approval of the institutional review board in all institutions in accord with the ethical standards of the Helsinki Declaration of 1975. Patients who underwent adjacent organ resection or combined resection were excluded.

Data collection

The database from the retrospective review consisted of patient age, sex, BMI (kg/m2), comorbidities, surgical parameters (extent of lymph node dissection, residual tumor status, operation time), pathological outcomes (size, differentiation, lymphovascular invasion, venous invasion, margin status, number of harvested lymph nodes, 7th AJCC TNM staging), clinical outcome (complication, hospital stay) and survival outcome of the 5-year follow-up. Postoperative complications were defined as adverse events that occurred during the primary admission or any complication that occurred within 30 days after an operation, and were graded using the Clavien–Dindo classification [21].

CUSUM chart and change point analysis

The CUSUM value was defined as \(S_n = \sum_{i=1}^{n} (X_i - X_0)\), where \(X_i = 0\) for success or 1 for failure and \(X_0\) was set at 0.9 for a 90% acceptance rate; charts were built using this formula [22]. The CUSUM chart registers each failure on a case-by-case basis and displays the result as a change in the slope of the chart.
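As a minimal sketch of the formula above, the running sum can be computed as follows. The function name `cusum` and the use of Python are illustrative assumptions rather than the tooling used in the study; the reference value is left as a parameter, with the stated 0.9 as its default.

```python
from itertools import accumulate


def cusum(outcomes, x0=0.9):
    """Return the running sums S_1..S_n of (X_i - x0).

    outcomes: sequence of 0 (success) or 1 (failure), in case order.
    x0: reference value subtracted from every case (0.9 as stated in the text).
    """
    return list(accumulate(x - x0 for x in outcomes))


# Illustrative data only: success, failure, success.
chart = cusum([0, 1, 0])
print(chart)
```

With this definition, each success changes the running sum by -x0 and each failure changes it by 1 - x0, so a run of failures appears as a change in slope.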

For any acceptable outcome, the CUSUM curve runs horizontally (slope gradient = 0), and as the degree of failure increases, the slope of the curve will tend to incline. CUSUM charts were studied for each surgeon for two different parameters (clinical outcome and operation time). The two CUSUM charts were analyzed and compared for goodness of fit for gastrectomy in terms of short- and long-term outcomes.

General usage of the term ‘change point’ indicates ‘the time at which a change began to occur’. In this study, to apply an objective and unified method to detect the exact point of change in the CUSUM charts, the change-point analysis (CPA) technique was used (Change-point analyzer, Taylor Enterprises, Illinois, http://www.variation.com). From this method, we can identify (1) the presence of a change in trend, (2) the number of changes, (3) the precise site (case number) of the change and (4) the confidence level of the change in each CUSUM chart.
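The following is a simplified, single-change sketch of a commonly described form of change-point analysis: a CUSUM of deviations from the overall mean, with a confidence level estimated by randomly reordering the data. It is not the Change-Point Analyzer software itself, and the function names, the number of reorderings and the fixed seed are assumptions made for illustration.

```python
import random


def cusum_of_deviations(values):
    """Return S_0..S_n, where S_0 = 0 and S_i = S_(i-1) + (x_i - mean)."""
    mean = sum(values) / len(values)
    running, sums = 0.0, [0.0]
    for v in values:
        running += v - mean
        sums.append(running)
    return sums


def detect_change_point(values, n_boot=1000, seed=0):
    """Estimate one change point and its confidence level.

    Returns (first_case_after_change, confidence), with a 1-based case number.
    """
    rng = random.Random(seed)
    sums = cusum_of_deviations(values)
    s_diff = max(sums) - min(sums)
    # The index where |S_i| peaks is the last case before the change.
    last_before = max(range(len(sums)), key=lambda i: abs(sums[i]))
    shuffled = list(values)
    smaller = 0
    for _ in range(n_boot):
        rng.shuffle(shuffled)  # reorder without replacement
        boot = cusum_of_deviations(shuffled)
        if max(boot) - min(boot) < s_diff:
            smaller += 1
    return last_before + 1, smaller / n_boot


# Illustrative operation times (minutes): 20 slower cases, then 20 faster ones.
times = [200] * 20 + [150] * 20
print(detect_change_point(times))
```

Detecting several changes in one series would require applying such a procedure repeatedly to the segments on either side of each detected change, which this sketch does not attempt.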

Clinical outcome CUSUM

First, we hypothesized that outcomes such as severe morbidity, extended hospital stays, positive resection margins and insufficient nodal dissection are important indicators of performance in radical distal gastrectomy [23]. In this analysis, each case was considered a failure when at least one of the following criteria was met: (1) the complication grade was greater than Clavien–Dindo grade II, (2) the postoperative hospital stay was longer than 30 days, (3) the number of retrieved lymph nodes was fewer than 16 or (4) the proximal resection margin was positive in the final pathology [21, 23,24,25]. To investigate the given parameters as a learning index, a clinical performance-based CUSUM chart was formulated for each surgeon to document the change of performance failure over time.
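The four failure criteria above can be expressed as a simple per-case rule. The sketch below is illustrative only: the `Case` record, its field names and the encoding of the Clavien–Dindo grade as an integer (0 for no complication, 1 to 5 for grades I to V, ignoring sub-grades) are assumptions, not the study's database schema.

```python
from dataclasses import dataclass


@dataclass
class Case:
    clavien_dindo_grade: int       # assumed encoding: 0 = none, 1..5 = grade I..V
    hospital_stay_days: int        # postoperative stay
    retrieved_lymph_nodes: int
    proximal_margin_positive: bool


def is_failure(case):
    """A case fails when at least one of the four criteria is met."""
    return (
        case.clavien_dindo_grade > 2          # (1) complication above grade II
        or case.hospital_stay_days > 30       # (2) stay longer than 30 days
        or case.retrieved_lymph_nodes < 16    # (3) fewer than 16 nodes retrieved
        or case.proximal_margin_positive      # (4) positive proximal margin
    )


# Illustrative cases only.
cases = [Case(0, 10, 38, False), Case(3, 12, 40, False), Case(1, 9, 12, False)]
outcomes = [int(is_failure(c)) for c in cases]
print(outcomes)
```

The resulting sequence of 0 and 1 values is the kind of input a clinical outcome CUSUM chart is built from.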

Operation time CUSUM

In a second analysis, the operation time was used to determine the surgeon’s surgical adaptation performance. A CUSUM chart was derived in terms of operation time, and the trend in operation time was evaluated using a change-point analysis for each surgeon.

The operation time at the change point was compared with the mean operation time of the initial 30 cases, and its reduction rate was calculated. The correlation between the reduction rate in operation time and the probability of a case belonging to the post-CP phase was then investigated.
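The reduction rate described above can be sketched as a relative decrease from the early baseline. Taking the operation time at the change point to be the time of that single case is an assumption made here for illustration, as are the function and parameter names.

```python
def reduction_rate(op_times, cp_case, n_initial=30):
    """Relative reduction of operation time at the change point.

    op_times: operation times in case order (minutes).
    cp_case: 1-based case number of the change point.
    n_initial: number of initial cases used as the baseline (30 in the text).
    """
    initial = op_times[:n_initial]
    baseline = sum(initial) / len(initial)
    return (baseline - op_times[cp_case - 1]) / baseline


# Illustrative data: 30 cases at 200 min, then 10 cases at 150 min.
times = [200] * 30 + [150] * 10
print(reduction_rate(times, cp_case=31))  # 0.25
```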

Identification of the changing point and comparison of groups before and after the changing point

Two CUSUM charts were built for each surgeon in terms of the clinical parameters and operation time. The change-point analyzer was applied to each CUSUM chart, and the actual change point was calculated.

After identifying the change points (CPs) for each surgeon, the patients were divided into two phases. Pre-CP (learning phase) was defined as the patient group before the change point (CP) and post-CP (post-learning phase) as the patient group after the change. The clinicopathologic outcomes and survival data were compared.

Statistical analysis

The chi-square test was used for categorical comparisons, and Student’s t test was applied to compare means. Survival outcomes were studied with the Kaplan–Meier method, and the log-rank test was performed to compare the survival outcomes. In the analysis, survival time was defined as the time from surgery to death from any cause. All correlation analyses were performed using bivariate correlation analyses. All tests were two sided, and a p value < 0.05 was considered significant in all studies. All statistical analyses were performed using IBM SPSS Statistics version 21 (IBM Inc., Chicago, IL, USA).
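For readers unfamiliar with the Kaplan–Meier method mentioned above, a minimal product-limit estimator is sketched below. The analyses in this study were run in SPSS; the Python function, its name and the toy data are illustrative assumptions only.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one death.

    times: follow-up time for each patient.
    events: 1 if the patient died at that time, 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Toy data: deaths at t = 1, 3 and 4, one patient censored at t = 2.
print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))
```

Censored patients leave the risk set without lowering the survival estimate, which is why the step at t = 3 is computed from two patients at risk rather than three.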

Results

Patient features and clinical outcome

Of the 2100 patients, 1461 patients were male (69%) with a mean age of 58.7 ± 11.6 years and a mean BMI (kg/m2) of 23.3 ± 3.2. Comorbidity was present in 40.4% of the population; hypertension was the most common (20.4%), followed by diabetes (11.5%), pulmonary disease (2.0%), liver disease (4.6%), cardiac disease (3.6%), cerebrovascular disease (1.9%), other malignancies (0.7%), renal disease (0.6%) and other conditions (5%). The majority of the patients underwent D2 dissection (86.5%), with R0 resection (98.9%). With regard to the lymph nodes, the mean number of retrieved nodes was 37.8 ± 15; 87 (4%) patients had fewer than 16 harvested nodes. The distributions of stages I, II and III were 54.6%, 19.2% and 26.2%, respectively, and 40.7% of the patients underwent chemotherapy.

The pathologic stage and clinical outcomes for each surgeon are summarized in Table 1. The volume of patient enrollment (112–355) and mean operation time (171.5 ± 53.4 min; range 135.5–229.9 min) varied among surgeons. The overall complication rate was 17.5% (n = 368), and the rate of complications greater than Clavien–Dindo grade II was 6.0% (n = 127). The mean postoperative hospital stay was 11.5 ± 7.0 days, and the proportion of patients who were admitted for longer than 30 days was 2.3% (n = 48).

Table 1 Clinicopathologic outcomes of each surgeon

Clinical outcome CUSUM analysis and surgical performance

The CP values regarding the clinical outcome of each surgeon were 37, 194, 198, 102, 196, 73, 70 and 74 for surgeons A, B, C, D, E, F, H and I, respectively (Table 2). The number of cases in the pre-CP group was 944 and that of the post-CP group was 956. Surgeon G showed no CP in this analysis; thus, that surgeon's data were excluded from this analysis. In comparison to pre-CP (18.4%), post-CP (14.6%) showed a decrease in complication rate (p = 0.031) and a shorter postoperative stay (11.8 vs 10.7 days, p = 0.001). Surgeons A, C, D and F showed significantly higher numbers of harvested lymph nodes in post-CP compared to pre-CP. However, the overall number of harvested lymph nodes showed no difference between the pre-CP and the post-CP group.

Table 2 Group comparison by clinical outcome CUSUM

Operation time CUSUM analysis and surgical performance

With respect to the CUSUM analysis based on the operation time, all surgeons displayed a CP (Table 3). The number of cases in the pre-CP group was 730 and that of the post-CP group was 1370. In this comparison, post-CP showed a larger mean number of harvested lymph nodes (34.1 vs 39.8, p = 0.001), a lower proportion of patients with fewer than 16 harvested lymph nodes (6% vs 3.1%, p = 0.001) and a lower rate of positive proximal resection margins (0.4% vs 0%, p = 0.042).

Table 3 Group comparison by operation time CUSUM

Survival outcome in clinical score CUSUM versus operation time CUSUM

In the clinical score CUSUM, there was no difference in overall survival between the two phases (82.1% vs 83.1%, p = 0.622), and no differences were noted in stage-by-stage comparisons (stage I, 96.2% vs 93.2%, p = 0.180; stage II, 87.7% vs 84.1%, p = 0.109; stage III, 53.5% vs 60.0%, p = 0.423).

However, when the phases were divided according to the operation time CUSUM, post-CP showed a better overall survival rate (79.4% vs 83.5%, p = 0.013) and a higher survival rate in stage II (76% vs 86.1%, p = 0.010) and stage III (51.5% vs 60.6%, p = 0.013) compared to pre-CP (Fig. 1).

Fig. 1

Survival analysis in operation time CUSUM model vs clinical score CUSUM model. a Survival analysis in operation time CUSUM model showed no difference between Pre-CP and Post-CP in stage I (5-year survival rate: Pre-CP 94% vs Post-CP 94.1%, p = 0.798). b Post-CP group showed better survival than Pre-CP in operation time CUSUM model for stage II (5-year survival rate: Pre-CP 76.1% vs Post-CP 86.1%, p = 0.010). c Post-CP group showed better survival outcome in operation time CUSUM model for stage III (5-year survival rate: Pre-CP 51.5% vs Post-CP 60.6%, p = 0.013). d Survival analysis in clinical score CUSUM model showed no survival difference in stage I patients between Pre-CP and Post-CP (5-year survival rate: Pre-CP 96.2% vs Post-CP 93.2%, p = 0.180). e Survival analysis in clinical score CUSUM model showed no survival difference in stage II patients between Pre-CP and Post-CP (5-year survival rate: Pre-CP 87.7% vs Post-CP 84.1%, p = 0.109). f Survival analysis in clinical score CUSUM model showed no survival difference in stage

Operation time and case volume

To understand the degree of reduction in operation time, the operation time at the CP was compared to the mean operation time of the initial 30 cases (Table 4). At the CP, the operation time had decreased by 25.7% to 42.4% relative to that of the initial 30 cases. The relationship between reduced operation time and allotment to post-CP is depicted in Fig. 2. The probability of post-CP allotment increased as the reduction in operation time increased. For example, after a 40% decrease in operation time compared to the mean operation time of the initial 30 cases, there is a 97.3% probability that the case will correspond to a post-CP case.
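The text does not state how the probability of post-CP allotment was estimated. One simple, hypothetical way to obtain such a figure is the empirical proportion of post-CP cases among all cases whose operation time reduction reaches a given threshold, as sketched below. The function name, inputs and toy data are assumptions for illustration and should not be read as the authors' method.

```python
def post_cp_probability(reductions, is_post_cp, threshold):
    """Share of post-CP cases among cases with reduction >= threshold.

    reductions: per-case relative reduction versus the initial-30 mean (0..1).
    is_post_cp: per-case flag, True if the case falls after the change point.
    Returns None when no case reaches the threshold.
    """
    flagged = [post for r, post in zip(reductions, is_post_cp) if r >= threshold]
    if not flagged:
        return None
    return sum(flagged) / len(flagged)


# Toy data only: four cases with increasing reduction.
reductions = [0.10, 0.30, 0.45, 0.50]
phase = [False, False, True, True]
print(post_cp_probability(reductions, phase, threshold=0.40))  # 1.0
```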

Table 4 Reduction of operation time according to surgeon
Fig. 2

Relationship between operation time reduction and post-CP allotment. The probability of having overcome the learning curve varies with the reduction in operation time relative to the mean time of the initial 30 cases. When operation time is reduced by 25% compared to the mean operation time of the initial 30 cases, surgeons have a 72.4% probability of having overcome the learning curve. When operation time is reduced by 40% compared to the mean operation time of the initial 30 cases, surgeons have a 97.3% probability of having overcome the learning curve

The overall changing pattern of operation time and retrieved lymph node numbers according to case volumes is depicted in Fig. 3. A gradual decrease followed by a plateau in operation time is shown, with a gradual increase in the number of retrieved lymph nodes. In contrast, the trend of the clinical outcomes indicates a steady pattern throughout the analysis.

Fig. 3

Trend of operation time and clinical outcomes. a Overall, the mean operation time decreases gradually and the mean number of harvested lymph nodes increases steadily with experience. b The complication rate, the severe complication rate (Clavien–Dindo grade ≥ III) and the proportion of patients with a long hospital stay (≥ 30 days) remain at a constant plateau regardless of experience

Discussion

Emerging demands to understand the process of learning have placed an emphasis on methods to evaluate a surgeon’s performance. To our knowledge, this is the first multicenter study to quantify the long-term learning experience of gastrectomy among surgeons beyond their fellowship training. The clinical and oncologic outcomes were collected and analyzed, beginning from the initial experience to late-period performances. Using these data, we tested which parameter may be used as a surrogate to precisely determine the cutoff point of the learning curve in terms of actual survival.

The major finding of this study is that, regardless of surgeon, operation time decreased and the number of retrieved lymph nodes increased after a certain amount of surgical experience, while other clinical outcomes such as the rate of severe complications, excessively long hospital stays and positive proximal margin status did not change significantly over time. In our analysis, the surgical experience required to perform acceptable surgeries exceeds the experience required to achieve a steady morbidity rate. Patient groups associated with a decrease in operation time demonstrated a significant difference in the number of retrieved lymph nodes, resulting in a better survival outcome; operation time should therefore be considered as the surrogate marker for overcoming the learning curve.

Since longer procedure times and higher complication rates are anticipated early in a surgeon's experience, most studies apply two parameters to investigate the learning curve in surgical performance: operation time and clinical outcome [26, 27]. In a previous study of laparoscopic colon cancer surgery, clinical outcomes such as complications, conversion rate and re-admission rate were considered a better index for determining the learning curve than operation time; shorter operation times were not accompanied by better conversion or re-admission rates [13]. Initially, we hypothesized that better clinical outcomes, such as fewer severe complications or a shorter hospital stay, would reflect better surgical quality and lead to improved survival. However, according to our data, severe complication rates and length of hospital stay did not have an inverse relationship with survival rate.

In radical gastrectomy, the quality of complete removal of the lymph nodes around the feeding vessels and adjacent organs is important. At the same time, it is challenging and time consuming to perform this procedure without morbidity. According to our data, the surgical experience required to ensure safe lymph node dissection exceeds the experience required to maintain a steady morbidity rate, and this could be measured more adequately by operation time rather than by the clinical outcome.

The number of metastatic lymph nodes correlates well with prognosis, and completeness of radical lymph node dissection is the most important step for increased survival [28]. In gastric cancer, there is a general agreement that dissection of a sufficient number of lymph nodes (15 or greater) is of great benefit to provide adequate and accurate postoperative N staging, and the number of examined lymph nodes is a potentially independent factor associated with the prognosis of gastric cancers [29]. Therefore, the number of retrieved lymph nodes could be considered a good index for determining surgical experience and proficiency.

In our study, operation time was more informative than the clinical parameters in understanding the correlation between the learning process and patient survival. Operation time decreased as surgical experience accumulated and eventually reached a plateau for all surgeons, thereby displaying the classic form of a learning curve. Although operation time can be influenced by many patient factors such as sex and BMI, there were no significant longitudinal differences in patient demographics between the surgeons. Therefore, operation time can be expected to decrease after considerable experience, despite the presence of many factors that may affect it.

The CUSUM chart is primarily used for two purposes: assessing a learning curve and quality control [18, 30]. Since its introduction, it has been applied in many surgical performance studies; however, the method used to define the cutoff point of learning varies arbitrarily among studies, and only a few articles have addressed this issue [30,31,32]. In our analysis, we required a uniform method to interpret multiple CUSUM plots from different surgeons, and this problem was resolved by using the CPA technique.

In general, the surgical outcomes can be influenced by many factors, and surgical performance may display fluctuations resulting in multiple changes over time. A suitable changing point analysis technique should detect all changing points in terms of changes in behavior; likewise, it should analyze the priority of the change and, most importantly, show the direction and the strength of the change. Since the CPA was introduced by Taylor (http://www.variation.com) to report the CP with estimated confidence intervals using the bootstrap method, it was considered suitable for our analysis. With this method, we can successfully define the first significant change as the change point for all surgeons.

Our study has several limitations. First, we did not investigate the factors that affect individual differences in obtaining proficiency. For example, the present study could not include the amount of practice each surgeon obtained during their period of fellowship training. Second, we did not include data from laparoscopic surgeries. Laparoscopic surgery was introduced during this period, bringing changes in the treatment approach for gastric cancers [33]. In the latter phase, selected patients for whom surgery was indicated underwent laparoscopic gastrectomy, whereas patients with a higher BMI and comorbidities underwent conventional gastrectomy. This has raised concerns that this latter portion of the population might have had a negative impact on the overall patient outcomes. However, patient population demographics showed no significant changes over time; therefore, this paradigm shift was not problematic. Another issue is that the current outcomes are based on specialized high-volume centers in East Asia, which raises the question of whether they are applicable to the West, where the prevalence of gastric cancer is low and patient demographics are quite different. For clarification, further investigation is required for a wider application of our results.

One of the strengths of this study is that it is a multicenter study including a large amount of prospectively registered data for consecutive patients in contrast to most of the published papers on this topic, which consist of a single surgeon’s experience.

Overcoming the learning curve of distal gastrectomy for gastric cancer can be better predicted by operation time rather than by the combination of several postoperative clinical parameters. It is recommended that surgeons operate on stage I cancer patients initially before overcoming the learning curve.