Introduction

Surgical audit has become increasingly important in the modern healthcare system. Previous risk models for surgical audit have focused on short-term outcomes, such as postoperative mortality. Copeland et al. [1] from the UK devised a scoring system for a comparative audit of postoperative mortality and morbidity, designated as the Physiological and Operative Severity Score for the enUmeration of Mortality and morbidity (POSSUM). In the USA, the Department of Veterans Affairs established the National Surgical Quality Improvement Program in 1994 [2]. This program was expanded to private sector hospitals in collaboration with the American College of Surgeons [3, 4]. In Japan, we developed a prediction model for postoperative mortality in general gastrointestinal surgery, designated as Estimation of Physiologic Ability and Surgical Stress (E-PASS) [5–7]. We recently modified this model to maintain its discrimination power with a reduced number of variables; the modified model is termed mE-PASS [8]. We also demonstrated its efficacy in specialty fields of surgery, namely surgery for colorectal carcinoma [9], gastric carcinoma [10], liver carcinoma [11], and choledochocystolithiasis [12].

Although these efforts have made significant progress in outcome research, surgical audit for long-term outcome has rarely been investigated in any type of cancer. In this study, we aimed to generate a prediction model for overall survival (OS) following gastrectomy for gastric cancer and propose a method to gauge long-term outcome.

Patients and methods

Study design

This study was conducted as part of the multicenter prospective cohort study ‘Establishment of quality of care among hospitals in digestive surgery’ (UMIN000001410). This cohort study was conducted to investigate the effectiveness of mE-PASS for surgical audit regarding postoperative mortality and to generate a new model to predict long-term survival following surgery for digestive cancers. The results of mE-PASS have been previously reported [8]. This article addresses only the long-term outcome of gastric cancer resection. The protocol required institutional investigators to obtain written informed consent from all participating patients and was approved by the Central Ethics Committee of the National Hospital Organization (NHO), Japan, on 1 November 2004. Eighteen NHO hospitals registered patients for this analysis.

Patients

Patients undergoing any type of elective resection for gastric cancer in the operating room were eligible. Exclusion criteria were as follows: (1) patients who did not sign the consent forms, (2) those who had concomitant cancer of different organs, (3) those who had a history of cancer in the previous 5 years, (4) those who received concomitant surgery in different surgical fields, and (5) those who received endoscopic resection [8]. Patients were enrolled between 1 April 2005 and 8 April 2007.

Surgical procedures

Surgical procedures for gastric cancer have long been standardized in Japan. In the guidelines of the Japanese Gastric Cancer Association (JGCA) [13], systematic lymph node dissection, termed D2, has been recommended for most tumors, except for stage IA. Endoscopic resection has been recommended for lesions of differentiated-type adenocarcinoma without ulcerative findings in which the depth of invasion is clinically diagnosed as confined to the mucosal layer (cT1a) and the diameter is less than 2 cm. Wedge resection of the stomach is occasionally applied when endoscopic resection fails to obtain a sufficient resection margin for the indicated lesions. D1 gastrectomy has been recommended for other cT1a lesions without lymph node metastasis. D1+ gastrectomy has been recommended for lesions of differentiated adenocarcinoma in which the depth of invasion is confined to the submucosal layer and the lesion is less than 1.5 cm in diameter without regional lymph node metastasis. These policies have been widely disseminated throughout the country.

Data collection

Data on 10 variables for E-PASS and 20 variables for POSSUM were prospectively collected along with 18 variables of pathological findings and 6 variables regarding surgical procedures. Pathological and surgical findings were recorded according to the JGCA classification [14]. Data for UICC’s M category and N category were also collected, and we analyzed data using the UICC stage [15]. Data were entered into a database on the website of the Japan Clinical Research Assistant Center (JCRAC), Tokyo, Japan. Postoperative complications were classified according to Clavien’s classification [16]. The primary end point was postoperative OS. Patients were followed up every 3–6 months by the attending doctor. To ensure the reliability of these measurements, institutional investigators were required to periodically enter survival status as well as confirmed dates of survival or death in the database. Follow-up data entries were discontinued on 16 April 2012 so that the last patient registered could be followed up to 5 years after surgery.

Statistical analysis

All statistical analyses were performed using the SPSS 17.0 (SPSS Inc., Chicago, IL, USA) software program. OS rates were calculated by the Kaplan-Meier method. Univariate analysis for OS was performed using the log-rank test. Two-tailed P values less than 0.05 were considered significant. Cox’s proportional hazard analysis was used to generate a prediction model for OS. To assess proportionality of the model, we plotted −log[−log S(t)] curves for the independent variables of the newly devised Cox model [17]. As previously reported, model discrimination and calibration were evaluated by the area under the receiver-operating characteristic curve (AUC) and the Hosmer-Lemeshow goodness-of-fit test, respectively [18]. Categorical variables between groups were compared using the chi-square test with Yates' correction for continuity where appropriate. Correlation between interval and interval variables was analyzed by Spearman’s rank correlation (ρ). Correlation between different continuous variables was analyzed by Pearson’s correlation coefficient (r).
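
The Kaplan-Meier calculation named above can be illustrated with a minimal Python sketch. This is not the SPSS procedure used in the study; the function name `kaplan_meier` and the follow-up data are hypothetical and serve only to show how the product-limit estimate handles censored patients.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one death.

    times  : follow-up time of each patient (any consistent unit)
    events : 1 if death was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)      # everyone leaving the risk set at time t
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the conditional survival at t.
            surv *= 1.0 - d / at_risk
            curve.append((t, surv))
        # Deaths and censored patients both leave the risk set after t.
        at_risk -= exits[t]
    return curve


# Hypothetical follow-up in years; 0 marks a censored patient.
times = [1, 2, 2, 3, 4, 5, 5, 5]
events = [1, 1, 0, 1, 0, 0, 0, 0]
curve = kaplan_meier(times, events)
```

In this sketch a patient censored at the same time as a death is still counted as at risk for that death, which is the usual convention.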

The ratio of observed-to-estimated 5-year OS rates (OE ratio) was used as a metric of the quality of care between hospitals. When an OE ratio of a hospital is greater than 1, the observed survival rate is higher than the expected rate, indicating that the quality of care is better than expected. In contrast, an OE ratio smaller than 1 indicates a poorer performance. For the comparison of OE ratios between hospitals, a sample size determination was made based on the hypothesis that a difference between the 5-year OS rate of 72 %, the baseline rate of our subjects, and that of 42 % was significant. If we determined the α error to be 0.05 and the β error to be 0.2, the sample size needed was 42. Therefore, we compared OE ratios between hospitals that registered more than 42 patients.
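
The text does not state which sample size formula was used. As a sketch under that caveat, the classical formula for comparing two independent proportions (pooled variance under the null hypothesis, two-sided α) can be written in Python as follows; the function name `n_per_group` is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, beta=0.20):
    """Patients needed per group to detect p1 vs p2 (assumed classical formula)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided alpha error
    z_b = NormalDist().inv_cdf(1 - beta)        # power = 1 - beta
    p_bar = (p1 + p2) / 2                       # pooled proportion under the null
    numerator = (z_a * sqrt(2 * p_bar * (1 - p_bar))
                 + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)


# 5-year OS of 72 % (baseline) versus 42 %, alpha = 0.05, beta = 0.2
n = n_per_group(0.72, 0.42)
```

With these inputs the assumed formula gives 42 patients per group, which agrees with the threshold stated above; other formulas (for example, one without the pooled variance term) give slightly different values.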

Results

Of the 796 patients enrolled, three were found not to have undergone resection and were therefore excluded from the analysis. Of the remaining 793 patients, 31 lacked sufficient data for the analysis. Accordingly, we analyzed 762 patients (group A) to develop a model to predict overall survival. Among them, 697 patients (91 %) completed the 5-year follow-up (group B). The reasons for loss to follow-up were loss of contact because the patient moved, transfer to another hospital, or disappearance without notice. We analyzed the accuracy of the model using patients with full follow-up data. Table 1 shows the demographic data of group A. Among the patients, 98 (13 %) were 80 years or older. Median numbers of dissected lymph nodes were 4 for D0, 19 for D1, 30 for D2, and 44 for D3.

Table 1 Demographic data of group A (n = 762)

Univariate analysis for postoperative OS identified 13 physiologic- and 9 tumor-related variables as significant among the 29 variables (Table 2). Using these significant variables, we performed a forward stepwise Cox regression analysis to obtain a prediction model for OS (Table 3). We designated this model as the Estimation of Postoperative Overall Survival for Gastric Cancer (EPOS-GC). To assess whether appropriate variables were incorporated into the model, we checked the proportionality of the independent variables (Fig. 1). The curves did not cross each other and remained parallel for all the variables.
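
The reason parallel curves support the proportional hazards assumption can be seen from the general form of a Cox model. The notation below is generic and introduced here for illustration (S_0 is the baseline survival function, β the coefficient vector, x the covariate vector); the fitted coefficients of the EPOS-GC are those of Table 3 and are not reproduced here.

```latex
S(t \mid x) = S_0(t)^{\exp(\beta^{\top} x)}
\quad\Longrightarrow\quad
\log\bigl[-\log S(t \mid x)\bigr]
  = \log\bigl[-\log S_0(t)\bigr] + \beta^{\top} x
```

Under this form, the transformed curves for two categories of a variable differ only by a constant vertical shift that does not depend on t, so they should neither converge nor cross if the assumption holds.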

Table 2 Univariate analysis of prognostic factors for overall survival in gastric carcinoma resection
Table 3 Equation to predict postoperative overall survival in gastric carcinoma
Fig. 1

Assessing proportionality of EPOS-GC variables. When the number of cases in a category was ≤50, the category was merged into the adjacent category. For example, for the depth of invasion, the T4 category (n = 28) was merged into the T3 category (n = 142). S(t) survival function, Circ circumferential involvement

The EPOS-GC yielded the high χ² value of 519 in the Cox hazard analysis, which was comparable to that of the model where all 22 significant variables were incorporated (χ² value 563). Furthermore, the predicted 5-year survival rates of the EPOS-GC were significantly correlated with those of the model utilizing all 22 significant variables (ρ = 0.93, n = 697, P < 0.0001). As shown in Fig. 2, the EPOS-GC demonstrated good discrimination power as the AUC (95 % CI) was 0.89 (0.86–0.91), which was significantly higher than that for the UICC TNM stage of 0.81 (0.77–0.84) and that for the residual disease status of 0.60 (0.56–0.65). Furthermore, the AUC (95 % CI) for the model utilizing all 22 significant variables was 0.89 (0.86–0.91), being the same as that of the EPOS-GC. The calibration power of the EPOS-GC was also good as judged by the Hosmer-Lemeshow test (χ² = 23.5, df = 8, P = 0.77).
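
For readers unfamiliar with the AUC, it equals the probability that a randomly chosen patient who died received a higher predicted risk than a randomly chosen survivor. The following Python sketch computes it by direct pairwise comparison; the function name `auc` and the data are hypothetical and are not taken from the study.

```python
def auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    scores : predicted risk for each patient
    labels : 1 if the event (death within 5 years) occurred, else 0
    Tied pairs count as half a correct ranking.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical predicted 5-year death risks (1 - predicted survival)
# and observed outcomes (1 = died within 5 years).
risk = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
died = [1, 1, 1, 0, 0, 0]
value = auc(risk, died)
```

In this toy data set, 8 of the 9 possible pairs are ranked correctly, so the AUC is 8/9. The pairwise form is quadratic in the number of patients; it is used here for clarity rather than efficiency.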

Fig. 2

Receiver-operating characteristic curve analysis to predict 5-year overall survival. The 5-year death rate of the current model, the EPOS-GC, was obtained as (1 − the predicted 5-year survival rate) and was used for this analysis. UICC definitions were used for the TNM stage and residual disease status (Ref. [15]). EPOS-GC Estimation of Postoperative Overall Survival for Gastric Cancer

Subsequently, we investigated the effects of predicted OS rates on survival in each TNM stage (Fig. 3). In stage I patients, the observed 5-year OS rate was 0 % at predicted OS rates (Y) < 0.5 (n = 3). The observed rate increased to 38 % at 0.5 ≤ Y < 0.7 (n = 16) and 89 % at Y ≥ 0.7 (n = 311). All patients with Y < 0.7 (n = 19) were 70 years or older, 13 of whom (68 %) were 80 years or older. In stage IV patients, the observed 5-year OS rate was only 7.7 % at Y < 0.5 (n = 52), but it increased markedly as the predicted rate (Y) increased. The patients at 0.5 ≤ Y < 0.7 (n = 8) included five patients aged 70 years or older, but all the patients at Y ≥ 0.7 (n = 3) were under 70 years old.

Fig. 3

Effects of predicted overall survival rates on survival in each stage. Overall survival (OS) curve was plotted according to the predicted OS rates (Y) determined by the EPOS-GC in each stage

Subsequently, we determined the OE ratios of the participating hospitals and found that they did not vary significantly between hospitals (Table 4). When we checked the OE ratios of the model utilizing all the significant variables, they were significantly correlated with those of the EPOS-GC (r = 0.79, n = 10, P = 0.0061).
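
The OE ratio of a single hospital can be sketched in Python as follows. The sketch assumes that the estimated 5-year OS rate of a hospital is the mean of its patients' predicted 5-year survival probabilities; the text does not spell out this detail, and the function name `oe_ratio` and both data sets are hypothetical.

```python
def oe_ratio(observed_alive, predicted_survival):
    """Observed-to-estimated 5-year OS ratio for one hospital.

    observed_alive     : 1 if the patient was alive at 5 years, else 0
    predicted_survival : model-predicted 5-year OS probability per patient
    Assumption: the estimated rate is the mean predicted probability.
    """
    n = len(observed_alive)
    observed_rate = sum(observed_alive) / n
    expected_rate = sum(predicted_survival) / n
    return observed_rate / expected_rate


# Hypothetical hospital A: outcome matches the case mix (ratio near 1).
ratio_a = oe_ratio([1, 1, 1, 0], [0.9, 0.8, 0.7, 0.6])

# Hypothetical hospital B: fewer survivors than predicted (ratio below 1).
ratio_b = oe_ratio([1, 0, 0, 0], [0.5, 0.5, 0.5, 0.5])
```

Because the denominator reflects each hospital's own case mix, a hospital treating older or sicker patients is compared against a correspondingly lower expected rate.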

Table 4 Comparative audit of postoperative overall survival among centers

Discussion

In this study, we generated a model to predict postoperative OS using Cox hazard analysis in light of physiologic status and tumor characteristics, and compared the quality of care between hospitals. Although the findings were not validated in an external data set, this report may lead to staging a risk-adjusted surgical audit for long-term outcome in gastric carcinoma surgery.

Previous studies of surgical audit have focused on short-term outcomes, such as perioperative mortality. Nevertheless, some concerns would surface among surgical oncologists if solely a comparative audit of perioperative care were performed. Hospitals that tend to perform operations with fewer nodal dissections would get higher rankings. Therefore, surgical audit of long-term survival should be performed simultaneously in order to evaluate referral hospitals fairly. When evaluating long-term survival, we must keep in mind the patients’ physiological conditions. As developed countries have become unprecedentedly aged societies, how to deal with the growing number of individuals with multiple chronic conditions (MCC) is a major task for the modern healthcare system [19]. These people sometimes go back and forth between long-term care facilities and hospitals. It is often difficult to predict what disease will terminate the patient’s life. Therefore, it is inappropriate to compare stage-stratified OS rates between hospitals as a metric of the quality of care. Hospitals treating higher proportions of MCC patients will get lower rankings. In contrast, hospitals selecting non-MCC patients will get high rankings. Therefore, risk adjustment for physiological status and tumor characteristics is essential for surgical audit. Costa et al. [20] constructed a prognostic score for gastric cancer using clinical, pathological, and therapeutic variables in order to obtain a comprehensive prognostic parameter. This score, which consists of six variables, showed better predictive power for OS than the TNM stage. In this study, we extended this idea and obtained the equation for predicted OS rates using a Cox analysis. The EPOS-GC allows us to calculate the OE ratios of centers, enabling a comparative audit. Of course, it could be argued that all potential confounding factors should be incorporated for the surgical audit. Nevertheless, the EPOS-GC achieved the same AUC value as the model in which all the significant variables were incorporated. Furthermore, the predicted 5-year survival rates of the EPOS-GC correlated highly with those of the model utilizing all the significant variables. Accordingly, the OE ratios of the model utilizing all the significant variables correlated significantly with those of the EPOS-GC. Therefore, we considered that the EPOS-GC can be used for surgical audit instead of the model utilizing all the significant variables. The advantage of using the stepwise model is that fewer variables would be required in a future audit survey if the model is universally validated.

Stage-stratified 5-year OS rates in this study were lower than those in the previous nationwide survey. The 5-year OS in stage III patients was 27 % in this study, while the national registry in 2002 reported it as 41 % [21]. These results may be, at least in part, attributable to the higher proportion of elderly patients in this study; this study included 13 % of patients aged 80 years or older, but the previous survey included only 8 % of this group. Indeed, 15 % (25/165) of the stage III patients in the current study were aged 80 years or older.

In the current model, the depth of invasion and nodal status of the TNM stage were, as expected, identified as independent variables, but distant metastasis was not selected by the Cox regression analysis. This may be because distant metastasis is strongly associated with the depth of invasion and nodal status; therefore, this factor would add no significant contribution to the accuracy of the model. Instead, circumferential involvement was identified as an independent variable, which may be unassociated with the depth of invasion or nodal status. Most cases of linitis plastica were included in this category. For physiological status, age and ASA class were identified as independent variables as anticipated. Surprisingly, serum sodium levels were also selected by this analysis. Since many critical conditions are associated with hyponatremia, including heart failure, liver cirrhosis, and renal failure, this variable may augment the predictive power in another direction.

In the process of constructing the EPOS-GC, we did not include variables relating to surgery, such as blood loss during the operation or the level of nodal dissection. This is because the purpose of the model was surgical audit. We consider that inclusion of surgery-related variables might interfere with the results of OE ratios. On the other hand, we wished to know whether the inclusion of surgery-related variables would raise the predictive power for OS. Special attention was paid to the ratio of metastatic lymph nodes (RML), since previous studies demonstrated that RML was superior to the number of metastatic nodes in predicting OS [22, 23]. In our analysis, inclusion of surgery-related variables, including RML, did not increase the AUC values compared with the current model. Therefore, we conclude that there is no merit in adding surgery-related variables.

The current study did not demonstrate a significant difference in the OE ratios between hospitals. This may be explained by the permeation of standardized surgical procedures for gastric cancer. Since gastric cancer has long been one of the most frequent subjects of surgical oncology in Japan, most surgeons have accumulated a sufficient caseload in their careers. Therefore, there may be no significant difference between the hospitals.

The main limitation of this study is that participating hospitals did not include high-volume hospitals for gastric cancer surgery, since this study was conducted within the NHO hospitals. Furthermore, the study protocol required informed consent from the patients; therefore, not all the patients who qualified were incorporated into this study. Validation studies that include high-volume centers with complete enrollment of the qualified patients will be necessary. Another limitation was that this study was done only in Japan. Since D2 gastrectomy has now prevailed throughout the world [24–28], validation studies should be feasible in other countries.

In conclusion, this preliminary study suggests the possibility of surgical audit for long-term outcome. If validated in the future, the current methodology would open the door to a new field of outcome research.