The current study describes the impact of the changes made in the 7th edition of the TNM classification for stomach cancer by comparing stage-specific survival and predictive accuracy of the 6th and 7th edition staging systems in a combined data set of more than 2000 patients who underwent an R0 resection for gastric cancer.
Three earlier single-institution Asian studies compared the 6th and 7th TNM classifications for gastric cancer.11–14 The first study analyzed 9998 patients treated at a Korean university hospital and found a more detailed classification of prognosis in the 7th edition staging system, accompanied by increased homogeneity within stage groups.11 A Chinese study found better prognostic stratification in the 7th edition staging system.12 Another Korean study evaluated nodal classification in 295 patients and found that in multivariable analysis, N classification was an independent prognostic factor for survival in the 7th edition staging system, but not in the 6th edition.14
One strength of the current study is the use of data from multiple institutions, which reduces the risk that the outcomes reflect single-institution bias. However, both series are Western, and no Asian data set was used. Another advantage of the current study is the high quality of the data: all patients underwent an R0 resection, and disease-specific survival was used as the outcome measure. The three previously published studies used overall survival instead of disease-specific survival, and in one study, 14.5 % of the patients underwent an R1 resection.12
With the redefinition of nodal classification, the distribution of patients among the N1, N2, and N3 categories is more equal (Fig. 1b), while the disease of many patients is upstaged under the new staging system. A point of discussion on nodal staging in gastric cancer is that in the Western world, lymph node yield is generally low, certainly in comparison with Asian centers.15,16 This leads to the potential shifting of patients into a more advanced nodal classification simply by investigating more lymph nodes.17 Several groups have suggested the use of lymph node ratio (metastatic/total lymph nodes) instead of nodal status because of its higher prognostic accuracy and the elimination of the effect of this shift.18–20 In these studies, however, cutoff values for lymph node ratio intervals are often derived from the same data set in which they are evaluated. This gives lymph node ratio an advantage, because its intervals are fitted to the study's own data, whereas TNM nodal classification is part of an established system. Moreover, decreasing the threshold for N2 and N3 categories in the 7th edition staging system considerably reduces the shifting effect. Nevertheless, a minimum of 15 examined nodes remains the recommended threshold for adequate nodal staging.
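The redefinition of the nodal categories and the lymph node ratio discussed above can be illustrated with a minimal sketch. The node-count thresholds are the published AJCC/UICC definitions (6th edition: N1 1–6, N2 7–15, N3 more than 15 positive nodes; 7th edition: N1 1–2, N2 3–6, N3 7 or more, with N3a 7–15 and N3b 16 or more). The function names and the `split_n3` parameter are hypothetical and chosen only for illustration; this is not the code used in the study.

```python
def n_category_6th(positive_nodes: int) -> str:
    """N category under the 6th edition thresholds (assumed: 1-6, 7-15, >15)."""
    if positive_nodes < 0:
        raise ValueError("positive_nodes must be non-negative")
    if positive_nodes == 0:
        return "N0"
    if positive_nodes <= 6:
        return "N1"
    if positive_nodes <= 15:
        return "N2"
    return "N3"


def n_category_7th(positive_nodes: int, split_n3: bool = False) -> str:
    """N category under the 7th edition thresholds (assumed: 1-2, 3-6, >=7).

    split_n3 is a hypothetical switch that reports N3a (7-15) and N3b (>=16)
    separately instead of the combined N3 used in overall stage grouping.
    """
    if positive_nodes < 0:
        raise ValueError("positive_nodes must be non-negative")
    if positive_nodes == 0:
        return "N0"
    if positive_nodes <= 2:
        return "N1"
    if positive_nodes <= 6:
        return "N2"
    if not split_n3:
        return "N3"
    return "N3a" if positive_nodes <= 15 else "N3b"


def lymph_node_ratio(positive_nodes: int, examined_nodes: int) -> float:
    """Ratio of metastatic to total examined lymph nodes."""
    if examined_nodes <= 0:
        raise ValueError("examined_nodes must be positive")
    if not 0 <= positive_nodes <= examined_nodes:
        raise ValueError("positive_nodes must lie between 0 and examined_nodes")
    return positive_nodes / examined_nodes
```

Under these thresholds, a patient with five positive nodes is N1 in the 6th edition and N2 in the 7th edition, which is the kind of upstaging described above. The ratio, by contrast, depends on how many nodes were examined as well as how many were positive.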
A limitation of the stage groupings of the 7th edition staging system is that the N3a and N3b categories were combined as N3, so that overall stage grouping does not recognize the prognostic significance of having 7–15 positive nodes versus 16 or more positive nodes. Introducing N3a and N3b as separate categories in overall stage grouping would increase the complexity of the staging system, but it is unknown whether it would improve overall predictive accuracy. This issue needs to be addressed in future staging systems.
There are several benchmarks for comparing the performance of two staging systems. First, there should be homogeneity within stage groups; patients within the same stage group should have only small differences in survival. Second, there should be discrimination between stage groups; patients in different stage groups should have larger differences in survival. Third, a staging system should have good predictive accuracy; patients with a higher stage should have a worse survival. And fourth, a staging system should be as simple and intuitive as possible in clinical practice, because increased complexity impedes clinical utility.
Homogeneity Within Stage Groups
Establishing homogeneity within stage groups requires grouping of TNM combinations that have similar survival estimates (Table 2). For homogeneity testing, results are highly dependent on the size of the data set. Using a data set of nearly 10,000 patients, Ahn et al.11 reported improved homogeneity, with two homogeneous stage groups in the 7th edition compared with one homogeneous stage group in the 6th edition. In the current study, numbers are smaller, and therefore significant homogeneity within stage groups is hard to detect (results not shown).
Discrimination Between Stage Groups
Heterogeneity between stage groups can be assessed by comparing stage-specific survival estimates for significant differences. Whether differences between stage groups are significant is highly dependent on the size of the data set: small differences in survival estimates between stage groups are more likely to be statistically significant in a large data set. In the current study, stage-specific heterogeneity has decreased in the 7th edition when compared to the 6th edition. AJCC 6th edition stage II contained a highly heterogeneous population (Fig. 3a), and distributing these patients among stages IIA, IIB, and IIIA in the 7th edition has created three groups with significantly different prognoses. However, the distribution of 6th edition stage IIIA patients into AJCC 7th edition stages IIIA and IIIB has created two stage groups with almost identical stage-specific survival (Fig. 3b). Wang et al. showed decreased heterogeneity between stage groups in the 7th edition as well.12
Prognostic Accuracy for Individual Patients
Performance of a staging system can also be assessed on the individual patient level by comparing survival of patients with different stages. Several ways of comparing staging systems on an individual-patient level have been proposed, but there is no standard method.21 Commonly used methods include explained variation (or Brier score), the area under the receiver operating characteristic curve, the concordance index, and a summary measure of separation. We decided to use the concordance index and Brier score to measure the prognostic accuracy of the staging systems because they capture different, complementary aspects of performance. The concordance index measures whether the ranking of patients by stage is consistent with the ranking of their outcomes. Its advantages include ease of interpretation (because it is a probability), robustness (because it is based on ranks, it is not sensitive to small changes in the data), and availability of appropriate statistical methods for estimation. It also incorporates a built-in penalty for staging systems with a higher number of categories, so that with equally performing staging systems, the system with more categories will have a lower concordance probability. It does not, however, penalize possible shifts (miscalibrations) between predicted and observed survival. Therefore, we also used the Brier score because it looks at the actual difference (in months) between predicted and observed survival, taking possible shifts into account.
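The two measures described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the estimators used in the current study. The concordance function implements Harrell's C for right-censored data, which may differ from the concordance probability estimator the authors applied. The Brier function uses the common fixed-time-horizon form based on predicted survival probabilities, and for simplicity it drops patients censored before the horizon instead of applying inverse-probability-of-censoring weights. All function and parameter names are hypothetical.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times: follow-up times; events: 1 if the event was observed, 0 if censored;
    risk_scores: higher value means worse expected outcome (e.g. stage as a rank).
    """
    concordant = 0.0
    usable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: pairs tied on time are skipped
        short, long_ = (i, j) if times[i] < times[j] else (j, i)
        if not events[short]:
            continue  # shorter follow-up was censored, ordering is unknown
        usable += 1
        if risk_scores[short] > risk_scores[long_]:
            concordant += 1.0
        elif risk_scores[short] == risk_scores[long_]:
            concordant += 0.5  # tied predictions count as half-concordant
    if usable == 0:
        raise ValueError("no usable patient pairs")
    return concordant / usable


def brier_score_at(horizon, times, events, predicted_survival):
    """Mean squared error between predicted survival probability at `horizon`
    and observed status (1 = event-free beyond horizon, 0 = event by horizon).

    Assumption: patients censored at or before the horizon are excluded,
    which is a simplification that can bias the estimate.
    """
    squared_errors = []
    for t, e, p in zip(times, events, predicted_survival):
        if t > horizon:
            observed = 1.0
        elif e:
            observed = 0.0
        else:
            continue  # status at the horizon is unknown
        squared_errors.append((observed - p) ** 2)
    if not squared_errors:
        raise ValueError("no patients with known status at the horizon")
    return sum(squared_errors) / len(squared_errors)
```

In this sketch, a staging system with many patients sharing the same stage produces many tied predictions, each contributing only 0.5. The concordance index uses only the ordering of predictions, so a systematic shift in predicted survival leaves it unchanged, whereas the Brier score is affected by such a shift.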
In the current data set, concordance analysis showed no difference for T category, an improvement for N category, and a decline for stage grouping. Brier scores consistently showed no significant improvement from the 6th to the 7th edition. Therefore, it can be concluded that for individual patient outcome, no improvements were detected from the 6th to the 7th edition staging systems.
Only one of the previously published studies compared the two staging systems on an individual-patient level. It found increased predictive accuracy for the 7th edition staging system.12 A disadvantage of the method employed in that study is that the metric used for comparison, the Akaike information criterion, measures how well the staging system fits the data set used without assessing the actual prognostic accuracy.
Complexity of the Staging System
The larger number of stage group categories for the 7th edition of the staging system means that the system has become more complex. Increasing the number of categories of the staging system is not unique to gastric cancer.4 With the increasing availability of pathologic and molecular data, there is a trend toward incorporating more and more information into newer staging systems. Although these new categories might better reflect the natural history and prognosis of these diseases, there is a limit to the improvement of prognostic accuracy achievable with a categorical anatomic-based staging system like the TNM classification.22,23 At the same time, the goal of creating an intuitive, easy-to-use staging system disappears, and in daily clinical practice, cancer staging consists of using complex tables, if it is used at all.
Meanwhile, tools for individual patient prognostication have been developed that significantly outperform the TNM classification in prognostic accuracy. For gastric cancer, a nomogram has been developed based on a single US institution’s database.24,25 This nomogram has been validated in several international patient cohorts.26–28 The question is whether the TNM classification should aspire to the same goal of highly accurate individual patient prognostication as these nomograms. Prognostication is only one of the five goals of the TNM classification; all the other goals are directed toward a simple, intuitive international language: to aid the clinician in planning and evaluating treatment, to facilitate the exchange of information, and to contribute to research.1
In summary, the 7th edition of the AJCC staging system for gastric cancer has resulted in improved predictive accuracy for the N classification but decreased heterogeneity among stage groups. The increased complexity of the 7th edition staging system is not accompanied by an improvement in prognostic accuracy of stage grouping. Staging represents a compromise: it accounts for the most reproducible and prognostically relevant factors while aiming at a simple, intuitive, useful, common language to describe the natural history of a tumor. It should not be confused with more complex multivariable prognostication models, which may be useful in defining groups of patients at homogeneous risk of recurrence, regardless of anatomic TNM characteristics.