Introduction

Although the incidence of gastric carcinoma (GC) is declining, the disease remains the second most common cause of cancer-related death worldwide [1]. Fortunately, because of the increase in early detection, the survival of patients with GC has improved, particularly in Eastern countries such as Korea and Japan. However, the prognosis of patients with advanced gastric cancer (AGC) is still poor despite the availability of aggressive multimodal therapies [2]. Currently, the major issue in treating GC is how best to individualize therapy based on tumor characteristics as well as patient factors. For example, to improve quality of life and achieve early recovery, minimally invasive treatments such as endoscopic submucosal dissection (ESD) and laparoscopic gastrectomy have been proposed as suitable alternatives to conventional open surgery in patients with early gastric cancer (EGC) [3]. Radical surgery with extended lymph node dissection is a standard approach for AGC in Asian countries, whereas chemoradiation has an important role in the West [4]. Additionally, molecularly targeted therapies have been extensively studied to provide a more personalized treatment [5].

The ability to predict the precise prognosis of a patient is critical for selecting the optimal treatment plan and follow-up strategies. We have learned the following from our experience. First, the only reliable prognosticator is the extent of disease at the time of diagnosis, which is expressed as tumor, node, metastasis (TNM) stage. Second, even within the same tumor stage, heterogeneous clinical courses are frequently observed. Third, there is no consensus on the best treatment strategy for GC. Moreover, treatment results differ between the East and the West.

A better staging system is needed, one that considers patient and treatment factors as well as tumor characteristics. The nomogram is a statistics-based tool that provides the overall probability of a specific outcome [6]. Because it reflects not only the tumor characteristics but also the host status, the nomogram is able to incorporate more clinicopathological parameters than the TNM staging system. Thus, it can provide the clinician with a better estimate of the prognosis for a patient. Previous comparisons with risk grouping approaches in several diseases suggest improved predictive accuracy [7–9]. Another potential benefit of the nomogram is that, with a simple graphical representation of a statistical predictive model, it generates a numerical probability of a clinical event [6].

To date, two nomograms for GC have been developed and validated, one by the Memorial Sloan-Kettering Cancer Center (MSKCC) and another by an Italian group [10, 11]. However, direct application of the Western nomograms to Eastern patients with GC may be inappropriate because of differences in treatment strategies as well as possible ethnic variation.

The purpose of this study was to develop a prognostic nomogram for patients with GC in a high-volume center in Korea where radical surgery with extended lymph node dissection is the standard surgical procedure. We also compared the predictive accuracy of this nomogram with that achieved by the American Joint Committee on Cancer (AJCC) TNM staging system.

Methods

Patients

Between 1995 and 2005, 1,646 patients who underwent a curative-intent R0 resection for GC and had no other cancer history were included. Patient data were prospectively collected from The Gastric Cancer Patients Registry of the Seoul St. Mary’s Hospital, The Catholic University of Korea. Radical surgery with more than D2 lymph node dissection was carried out in 89 % of the patients, and fewer than 15 lymph nodes were retrieved in 3 % of the patients. Patients with one or more missing values were excluded from the study. The 1,614 eligible patients were randomly assigned to the test set (n = 805), which was used for model building of the nomogram as well as for internal validation, or to the validation set (n = 809), for external validation within each AJCC stage (Table 1).

Table 1 Patient characteristics

Follow-up evaluation consisted of the patient’s history, physical examination, laboratory tests including detection of tumor markers (carcinoembryonic antigen and carbohydrate antigen 19-9), chest radiography, endoscopy, computed tomography (CT), and bone scintigraphy. These assessments were repeated every 6 months until the third postoperative year, then every year thereafter for at least 5 years. Magnetic resonance imaging, CT of the brain or chest, barium enema, and positron emission tomography were performed only when indicated. Disease status at last follow-up was based on a retrospective review of medical records, telephone interviews, and the registration data at the Korean National Cancer Center. The institutional review board of Seoul St. Mary’s Hospital approved this study.

Statistical analysis

Model-building process: (1) The Kaplan–Meier method and log-rank test for categorized variables and univariate Cox proportional hazard regression model for continuous variables were used to determine the prognostic factors for overall patient survival. (2) We categorized some continuous variables that did not satisfy the linearity assumptions for further analysis. (3) The multivariate Cox proportional hazard regression model was used to derive the formula for constructing the nomogram. Model minimization was performed by stepwise backward elimination, and departure from proportionality in hazard was tested in all Cox models.
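As a minimal illustration of the Kaplan–Meier method named in step (1), the product-limit estimate can be sketched in plain Python. This is not the authors' R or SAS code; the function names kaplan_meier and survival_at and the toy follow-up data are assumptions introduced only for illustration.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  : follow-up time for each patient
    events : 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival) pairs, one per distinct death time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect every patient whose follow-up ends exactly at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        # Deaths and censored patients both leave the risk set after t.
        n_at_risk -= leaving
    return curve


def survival_at(curve, t):
    """Estimated survival probability at time t (step function)."""
    surv = 1.0
    for time, s in curve:
        if time > t:
            break
        surv = s
    return surv


# Hypothetical toy data (months); the patient at month 2 is censored.
toy_curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
print(toy_curve)
print(survival_at(toy_curve, 3.5))
```

Censored patients contribute to the number at risk up to their last follow-up but never count as deaths, which is why the estimate drops only at observed death times.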

Internal validation: Model performance was quantified with respect to discrimination and calibration. (1) For discrimination, we used Harrell’s c-index, which is similar to the area under the receiver operating characteristic curve but is appropriate for censored data. It gives the probability that, in a randomly selected pair of patients in which one patient dies before the other, the patient who died first had the worse predicted outcome from the nomogram. The bootstrap method was used to obtain a relatively unbiased estimate (200 repetitions). (2) For calibration, we presented the calibration curves for 3- and 5-year survival, which plot the predicted probabilities from the nomogram versus the actual probabilities for groups of patients defined as quartiles. Again, bootstrapping correction was used for this process.
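The pairwise definition of the c-index given above can be sketched as follows. This is a simplified illustration under stated assumptions, not the Hmisc implementation used in the study: pairs with tied follow-up times are skipped, tied predictions count as half concordant, and the function name harrell_c_index and the toy data are hypothetical.

```python
def harrell_c_index(times, events, risk_scores):
    """Simplified Harrell's c-index for right-censored data.

    A pair is usable when the patient with the shorter follow-up
    actually died (events == 1). The pair is concordant when that
    patient also has the higher predicted risk.
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot be the "died first" member
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Hypothetical toy data: the third patient is censored at month 6.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.7, 0.8, 0.1]
toy_c = harrell_c_index(toy_times, toy_events, toy_risk)
print(toy_c)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking; the bootstrap correction described above would repeat this calculation on resampled data to estimate and subtract the optimism.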

External validation: The same discrimination and calibration methods used for internal validation were applied to the validation set.

All analyses were performed using R package with the Design, Hmisc, and Lexis libraries (http://lib.stat.cmu.edu/R/CRAN/) [12–15] and SAS for Windows version 9.2 (SAS Institute, Cary, NC, USA).

Results

Disease-specific survival for the test set

For the 805 patients in the test set, the mean follow-up time was 68.1 ± 40.4 months (median, 65.4 months); at last follow-up, 126 (15.7 %) patients had died. Disease-specific survival (DSS) differed significantly (P < 0.0001) according to the 7th edition AJCC stage (Fig. 1). The 3-year survival according to each stage was 99.7 % in Ia, 98.9 % in Ib, 92.4 % in IIa, 80.8 % in IIb, 77.3 % in IIIa, 67.9 % in IIIb, and 52.7 % in IIIc; the 5-year survival was 98.8 % in Ia, 98.9 % in Ib, 84.9 % in IIa, 72.0 % in IIb, 69.2 % in IIIa, 55.2 % in IIIb, and 41.7 % in IIIc.

Fig. 1 Gastric cancer-specific survival by AJCC stage, 7th edition

Prognostic factors for survival (test set)

The results of the univariate and multivariate analyses are presented in Table 2. In the univariate analysis, tumor size, depth of invasion, lymph node status, gross type, and Lauren classification were associated with overall survival, whereas age and gender were not. However, we included both age and gender in the multivariate analysis to improve the performance of the nomogram. In the multivariate analysis, tumor size, depth of invasion, and lymph node status were significantly associated with overall survival (P < 0.0001).

Table 2 Univariate and multivariate Cox proportional hazards regression survival analysis

Formula and nomogram (test set)

From the multivariate Cox proportional hazard regression model, we obtained the formula for calculating the total point score as follows: I(age <30) × 44 + I(age 40–49) × 19 + I(age 50–59) × 17 + I(age 60–69) × 27 + I(age 70–79) × 41 + I(age ≥80) × 86 + I(gross type = 1–5) × 40 + I(tumor size = 5–10) × 11 + I(tumor size >10) × 26 + I(depth of invasion = T2) × 79 + I(depth of invasion = T3) × 75 + I(depth of invasion = T4) × 100 + I(Lauren = diffuse or mixed) × 3 + I(lymph node status = N1) × 34 + I(lymph node status = N2) × 58 + I(lymph node status = N3a) × 60 + I(lymph node status = N3b >15) × 86. The indicator function I(x) is equal to 1 if the statement in the parentheses is true and is equal to 0 otherwise. The total point scores ranged from 40 to 215 with a median of 92. The nomogram for 3- and 5-year survivals is shown in Fig. 2.
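The point formula above can be written as a short function. The sketch below is illustrative only and rests on several assumptions: categories that do not appear in the formula (age 30–39, tumor size below 5, depth T1, node status N0, and Lauren types other than diffuse or mixed) are treated as reference categories scoring 0; tumor size and stage are passed as category labels so that no boundary handling has to be guessed; and all names (total_points and the lookup tables) are hypothetical. It returns only the total point score, because the mapping from points to 3- and 5-year survival is given graphically in Fig. 2.

```python
# (lower age, upper age, points); ages 30-39 are assumed to be the
# reference group because they do not appear in the published formula.
AGE_POINTS = [
    (0, 29, 44),
    (30, 39, 0),
    (40, 49, 19),
    (50, 59, 17),
    (60, 69, 27),
    (70, 79, 41),
    (80, 150, 86),
]
SIZE_POINTS = {"<5": 0, "5-10": 11, ">10": 26}
DEPTH_POINTS = {"T1": 0, "T2": 79, "T3": 75, "T4": 100}
NODE_POINTS = {"N0": 0, "N1": 34, "N2": 58, "N3a": 60, "N3b": 86}


def total_points(age, gross_type_1_to_5, size_category, depth,
                 lauren_diffuse_or_mixed, node_status):
    """Sum the indicator terms of the published point formula.

    age                     : integer age in years
    gross_type_1_to_5       : True if the indicator I(gross type = 1-5) holds
    size_category           : "<5", "5-10" or ">10" (units as in the source)
    depth                   : "T1", "T2", "T3" or "T4"
    lauren_diffuse_or_mixed : True for diffuse or mixed Lauren type
    node_status             : "N0", "N1", "N2", "N3a" or "N3b"
    """
    age_points = next(p for lo, hi, p in AGE_POINTS if lo <= age <= hi)
    return (
        age_points
        + (40 if gross_type_1_to_5 else 0)
        + SIZE_POINTS[size_category]
        + DEPTH_POINTS[depth]
        + (3 if lauren_diffuse_or_mixed else 0)
        + NODE_POINTS[node_status]
    )


# Hypothetical patient: 55 years, gross type 1-5, size 5-10, T3, diffuse, N2.
example_score = total_points(55, True, "5-10", "T3", True, "N2")
print(example_score)  # 17 + 40 + 11 + 75 + 3 + 58
```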

Fig. 2 Nomogram to estimate the probability of overall survival at 3 and 5 years

Internal validation of the nomogram

We compared predictions from the nomogram with those obtained by using the AJCC staging system. Individual AJCC stage and nomogram predictions were compared for their ability to rank the patients using Harrell’s c-index. The nomogram discrimination was superior to that of AJCC stage [c-index, 0.87 (95 % CI, 0.76, 0.95) vs. 0.77 (95 % CI, 0.65, 0.88); P < 0.001]. The calibration curves for 3- and 5-year survivals are presented in Fig. 3. Nomogram-predicted probabilities versus actual probabilities for 3-year survival were 63.0 vs. 62.7 % in the 1st quartile range, 93.1 vs. 90.8 % in the 2nd quartile range, 97.9 vs. 98.2 % in the 3rd quartile range, and 99.0 vs. 99.1 % in the 4th quartile range. The corresponding probabilities for 5-year survival were 48.8 vs. 48.6 %, 88.8 vs. 84.2 %, 96.5 vs. 98.2 %, and 98.3 vs. 99.1 % in the 1st, 2nd, 3rd, and 4th quartile ranges, respectively.
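The quartile figures above can be summarized as absolute gaps between predicted and actual survival. The following sketch only does arithmetic on the percentages reported in this paragraph; the variable and function names are assumptions introduced for illustration.

```python
# (predicted %, actual %) per quartile, copied from the test-set results above.
test_set_calibration = {
    3: [(63.0, 62.7), (93.1, 90.8), (97.9, 98.2), (99.0, 99.1)],
    5: [(48.8, 48.6), (88.8, 84.2), (96.5, 98.2), (98.3, 99.1)],
}


def calibration_gaps(pairs):
    """Absolute difference, in percentage points, for each quartile."""
    return [round(abs(predicted - actual), 1) for predicted, actual in pairs]


gaps = {year: calibration_gaps(pairs)
        for year, pairs in test_set_calibration.items()}
for year, values in gaps.items():
    print(f"{year}-year gaps by quartile: {values}, largest: {max(values)}")
```

In the test set, every quartile gap stays below 5 percentage points, with the largest gap in the 2nd quartile of the 5-year predictions.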

Fig. 3 Calibration curves for 3- and 5-year nomogram predictions for the test set (n = 805). Diagonal line: ideal nomogram; vertical bars: 95 % CI

External validation of the nomogram

The performance of the nomogram for the validation set was somewhat lower than that for the test set. Nevertheless, the nomogram discrimination was still superior to that of the AJCC staging system [c-index, 0.84 (95 % CI, 0.73, 0.94) vs. 0.79 (95 % CI, 0.66, 0.89); P < 0.001]. As shown by the calibration curves for 3- and 5-year survivals, the nomogram slightly underestimated survival (that is, overestimated risk) for high-risk patients (Fig. 4). Nomogram-predicted probabilities versus actual probabilities for 3-year survival were 59.3 vs. 67.4 % in the 1st quartile range, 92.2 vs. 88.9 % in the 2nd quartile range, 97.5 vs. 98.7 % in the 3rd quartile range, and 98.7 vs. 100 % in the 4th quartile range. The corresponding probabilities for 5-year survival were 44.8 vs. 54.3 %, 87.1 vs. 82.6 %, 95.8 vs. 97.0 %, and 97.8 vs. 99.4 % in the 1st, 2nd, 3rd, and 4th quartile ranges, respectively.

Fig. 4 Calibration curves for 3- and 5-year nomogram predictions for the validation set (n = 809). Diagonal line: ideal nomogram; vertical bars: 95 % CI

Discussion

The ideal tumor staging system provides a correct prognosis of cancer, thus serving as an excellent basis for clinical decision making, planning of new clinical trials, and counseling patients with respect to clinical outcomes. Currently, the most widely accepted cancer staging is the TNM system, which has been updated periodically based on advances in understanding cancer outcomes. In the recently published AJCC 7th edition of the TNM staging system for GC, the T and N classifications, which were significantly modified, stratify patients into 20 subgroups [16]. Even though the new TNM system contains more subgroups than the 6th edition, limitations remain that compromise accurate prognosis for each patient. For example, some patients may have unexpected early recurrence and death, although most patients with same-stage disease are alive. Our analysis of the 7th TNM staging system detected a 5-year survival difference ranging from 11 to 34 % within each stage (data not shown). To obtain a better predictive outcome, some investigators have suggested using the ratio of positive lymph nodes among the retrieved nodes as the N classification [17].

Numerous studies have investigated the role of patient-related, tumor-related, or treatment-related factors affecting the prognosis of gastric cancer. Among them, the studies looking for molecular targets have been of great interest. Recently, it has been reported that positive status of human epidermal growth factor receptor 2 (HER2) is associated with poor outcomes [5]. However, molecular marker studies are costly, and long-term results are not yet available. Moreover, molecular markers still do not reflect treatment factors.

The prognostic nomogram is a statistics-based approach to predict patient outcome. It directly quantifies patient risk based on the proven prognostic factors without forming risk groups. Nomograms for various kinds of malignant disease have been shown to be useful to estimate individual survival compared with conventional prognosticators [7–9]. Until now, two typical nomograms have been proposed for gastric cancer. Kattan et al. [10] from Memorial Sloan-Kettering Cancer Center (MSKCC) developed a postoperative nomogram for disease-specific survival after curative surgery. In this nomogram, predictor variables were age, sex, primary site, Lauren histotype, number of positive lymph nodes resected, number of negative lymph nodes resected, and depth of invasion. The nomogram predicted DSS with a concordance index of 0.80. The predictive ability was superior to the TNM staging system, especially in stage II and IIIA. Another scoring system to predict recurrence after radical surgery for gastric cancer was developed by an Italian group [11]. The model correctly predicted recurrence with a sensitivity of 83.5 % and a specificity of 81.1 %, and the ability to predict recurrence was also superior to TNM staging.

Despite these two well-constructed scoring systems for GC, it is uncertain whether they can be used to predict patient outcome in the East. We previously reported significant differences in clinical characteristics as well as treatment results between the East and the West [18]. Although the reasons for these differences remain unclear, they could be explained by several factors, including differences in surgical procedures, environmental factors, tumor biology, and host genetic factors. To consider the possibility of applying the Western nomogram to Korean patients, we investigated the validity of the MSKCC nomogram with our patients. Kattan, who established the MSKCC nomogram, kindly provided the prediction equation. The concordance index was 0.63 (95 % CI, 0.55, 0.71), which was even lower than that of AJCC staging (0.79 in the validation set) (data not shown).

This lower concordance index is interesting and is quite different from the result of a validation study from the West. Novotny et al. [19] investigated the accuracy of the MSKCC nomogram when applied to patients at a German high-volume center. The concordance index was 0.77 and was superior when compared with the predictive ability of AJCC staging (P < 0.008). This discrepancy of c-indices between the East and the West may derive from differences in clinicopathological characteristics, such as high prevalence of advanced stage, proximal location, or lesser extent of lymph node dissection. Therefore, we think that it is necessary to develop a new nomogram based on the data from Eastern patients.

As expected, our nomogram achieved a high concordance index, and the calibration appeared to be accurate for predicting individual survival. The predictive power of the nomogram was superior to that of the AJCC staging system.

Our nomogram has several specific characteristics that differentiate it from previous nomograms. First, we used the N factor as well as the T factor from the 7th AJCC staging system. In our series, the retrieved lymph node count was more than 15 in almost all patients (99 %). Therefore, we can directly adopt the N classification from the AJCC staging system rather than use both numbers of positive and negative nodes, which was required for the nomogram from the MSKCC. Second, tumor location was not a variable in our nomogram because we could not detect survival differences according to the tumor location. In our previous report, we found that the location of tumors was significantly different between the Korean and the U.S. populations [18]. The proportion of cancers in the upper third of the stomach including gastroesophageal junction tumors in Eastern countries was significantly lower than that reported in Western countries [20]. We believe that this ethnic difference is reflected in our nomogram, thus rendering it more suitable for Eastern patients with GC. Third, a notable feature of our nomogram is the focus on 3- and 5-year survivals. In the test set, our nomogram predicted both 3- and 5-year survival with less than 5 % difference between predicted and actual probabilities for each quartile. Accurately predicting 3-year survival is particularly meaningful for clinical trials evaluating results of new treatments. Although the 3-year survival has not yet been formally validated as a surrogate measure, preliminary data from the Global Advanced/Adjuvant Stomach Tumor Research International Collaboration group indicate that 3-year disease-free survival is strongly correlated with 5-year overall survival, the benchmark for judging efficacy of adjuvant therapy in GC [21]. Recently, Korean researchers provided remarkable data (CLASSIC) showing a survival benefit for adjuvant chemotherapy in patients with GC [22]. The primary endpoint of this study was 3-year disease-free survival. As expected, the 3-year survival data provide an opportunity for earlier use of new treatment options. Given its accuracy, using our nomogram to predict 3-year survival may provide a great benefit in clinical trials as well as clinical practice.

Despite these strengths, this study has a critical limitation in terms of validation. We did not use completely independent data sets from other hospitals; instead, we used our own data for external validation, and strictly speaking, this validation should be described as internal. Because treatment outcomes and surgical policy would differ among countries, validation with data from another country could be the most stringent form of external validation. These aspects would also differ among institutions in the same country, so validation with data from another institution could be a less stringent form. For development and validation of this nomogram, we evenly divided our data, consisting of more than 1,600 cases, into two sets. Although the validation set also came from our own data, it lies outside the test set, and we believe it can be regarded as the least stringent form of external validation.

In summary, our nomogram improves the ability to predict individual patient survival compared with the TNM staging system. To the best of our knowledge, it is the first nomogram developed from a high-volume center in Asia where radical surgery with extended lymph node dissection is the standard surgical treatment for GC. External validation from other institutions may facilitate wider use of this prognostic nomogram.