To date, the EAU UTUC guideline relies on, and recommends the use of, two different grading systems: the 1973 WHO and the 2004/2016 WHO grading systems. Which system should be used in everyday clinical practice is still under debate. We hypothesized that one of the two systems may be superior. To test this hypothesis, we examined the ability of the 1973 and of the 2004/2016 WHO grading systems to predict CSM in a cohort of non-metastatic UTUC patients treated with RNU. Our analyses yielded several noteworthy observations.
First, of all RNU patients examined in the current study (n = 4271), approximately 90% harbored the highest grade level, regardless of which grading system was used. Specifically, 86.7% harbored G3 according to the 1973 WHO grading system and 88.1% harbored high-grade disease according to the 2004/2016 WHO grading system. These elevated rates of high-grade UTUC may be explained by the nature of the study population. Specifically, all patients harbored stage T1 or higher, and all patients were treated with RNU. Consequently, a selection bias towards higher grade was operational, relative to studies that also included patients with non-invasive (Ta and Tis) UTUC treated with less definitive modalities than RNU [19,20,21,22]. However, even in those studies, non-invasive UTUC represented a marginal fraction of the overall population, and the vast majority of patients also harbored high-grade disease. For example, Singla et al. examined 753 UTUC patients treated with RNU or distal ureterectomy between 1998 and 2015. Of those, 78.8% harbored T1 or higher stages and 89.2% harbored high-grade UTUC. Moreover, Roupret et al. recorded T1 or higher stages in 66 (68.0%) and high grade in 50 (51.5%) of 97 UTUC patients, despite their treatment with ureteroscopy or percutaneous endoscopy.
Second, the current analyses demonstrated marginal discrimination between G1 and G2 with respect to CSM. Within the three-tier grading system, independent predictor status of G2 and G3, relative to G1, could not be established. These results were confirmed in subgroup analyses of RNU patients with T1 stage and of those with T2 or lower stages. Taken together, these observations suggest limited discrimination ability of the three-tier grading system. Nonetheless, the addition of the 1973 WHO grading system resulted in a 2% accuracy gain, relative to multivariable models that did not consider the three-tier grading system. However, a 2% gain may be considered marginal. Specifically, this figure implies that, within a cohort of 1000 individuals, the use of the three-tier grading system would improve CSM prediction in 20 patients. Such a gain may be relevant in large-scale prospective trials or in large-scale epidemiological analyses, but it may not be clinically meaningful in everyday clinical practice.
In the second part of the analyses, we focused on the two-tier WHO grading system. Here, we validated the independent predictor status of high grade relative to low grade. Specifically, high-grade UTUC was associated with a 1.70-fold, 1.76-fold, 1.65-fold, and 2.19-fold higher risk of CSM, relative to low-grade UTUC, in the overall population, in T1 patients, in patients with T2 or lower stages, and in G2 patients, respectively. Finally, we also recorded a 2% accuracy gain when the 2004/2016 WHO grading system was added to a multivariable model that did not previously consider grade. Consequently, based on accuracy, the added benefit of the 2004/2016 WHO grading system was exactly the same as that of the 1973 WHO grading system. However, discrimination of CSM rates appeared more practical with the two-tier grading system, where high-grade patients exhibited a nearly twofold higher CSM rate and high grade reached independent predictor status. Consequently, based on the statistical criteria used in the current analyses, the two-tier grading system appears to have a slight advantage over its three-tier counterpart.
Additional considerations may be required to decide which grading system should be retained in everyday clinical practice and which may be abandoned. Several investigators compared intra- and interobserver variability of the two- vs three-tier grading systems in bladder cancer [12, 23,24,25,26,27,28,29]. Unfortunately, comparable analyses have not been performed in UTUC. However, from a methodological standpoint, a system that relies on two tiers is inherently more likely to yield lower intra- and interobserver variability than a system with more than two levels. This notion rests on the effect of chance: with fewer categories, fewer boundaries exist at which two assessments of the same tumor can diverge, and the probability of agreement by chance alone is higher. Consequently, based on similar predictive accuracy, superior discrimination in univariable and multivariable models, and methodological considerations regarding intra- and interobserver variability, the two-tier grading system might represent a better alternative. However, dedicated expert intra- and interobserver variability testing in UTUC patients should ideally complement the findings of our study.
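The chance argument can be illustrated with a small Python sketch. It relies entirely on a hypothetical model and uses no study data: each tumor is assumed to carry a latent continuous "true grade" score, two pathologists are assumed to read that score with independent Gaussian error, and the three-tier scale is assumed to add one cutpoint to the two-tier scale. The function names `chance_agreement` and `simulated_agreement`, the cutpoints, and the noise level are illustrative assumptions; the actual 1973 and 2004/2016 WHO systems are defined by morphological criteria, not by nested cutpoints of a single score. The sketch also computes the probability of agreement by chance alone for equally sized tiers.

```python
import random
from fractions import Fraction

# Hypothetical illustration only: not based on SEER data or on actual WHO criteria.


def chance_agreement(proportions):
    """Probability that two independent raters agree purely by chance,
    assuming both assign categories with the same marginal proportions."""
    return sum(p * p for p in proportions)


def simulated_agreement(cutpoints, noise_sd=0.5, n=100_000, seed=1):
    """Share of simulated tumors on which two raters assign the same tier.

    Assumed model: latent score ~ N(0, 1); each rater observes it with
    independent N(0, noise_sd) error; the assigned tier is the number of
    cutpoints that the observed score exceeds.
    """
    rng = random.Random(seed)
    agree = 0
    for _ in range(n):
        latent = rng.gauss(0.0, 1.0)
        obs1 = latent + rng.gauss(0.0, noise_sd)
        obs2 = latent + rng.gauss(0.0, noise_sd)
        tier1 = sum(obs1 > c for c in cutpoints)
        tier2 = sum(obs2 > c for c in cutpoints)
        agree += tier1 == tier2
    return agree / n


if __name__ == "__main__":
    # Equally sized tiers: chance agreement is 1/2 for two tiers, 1/3 for three.
    print(chance_agreement([Fraction(1, 2)] * 2))  # 1/2
    print(chance_agreement([Fraction(1, 3)] * 3))  # 1/3

    # Nested cutpoints: the hypothetical three-tier scale adds one boundary (-1.0)
    # to the hypothetical two-tier scale (0.0); same seed, hence same random draws.
    two_tier = simulated_agreement([0.0])
    three_tier = simulated_agreement([-1.0, 0.0])
    print(f"two-tier agreement: {two_tier:.3f}, three-tier agreement: {three_tier:.3f}")
```

Under these assumptions, any three-tier grade can be collapsed into the two-tier grade, so two readers who agree on three tiers necessarily agree on two. With identical random draws, the simulated two-tier agreement therefore can never fall below the three-tier agreement; real-world differences between the two WHO systems may be smaller or larger than this toy model suggests.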
To the best of our knowledge, we are the first to compare the ability of the 1973 and 2004/2016 WHO grading classifications to predict CSM in UTUC patients identified within a large-scale population-based database. Only one group of investigators examined differences in grade assignment according to the 1973 vs. 2004/2016 grading systems, in a smaller cohort (n = 458) of UTUC patients treated with RNU at a single Chinese institution between 2008 and 2013. Unfortunately, the complexity of the methodology used by Guan et al. renders comparisons with our methodology practically impossible.
Our work is not devoid of limitations and should be interpreted in the context of its retrospective and population-based design. First, the SEER database focuses on invasive UTUC, since Tis and Ta patients are not included. Consequently, our observations are based on a more advanced stage and grade distribution and are not directly comparable with studies that used the entire UTUC population as reference. However, Tis and Ta patients should ideally not be treated with RNU. Therefore, their exclusion from the SEER database does not represent an important limitation for studies that focus on RNU. Second, disease progression and disease recurrence data are not available in the SEER database and therefore cannot be examined as endpoints. Third, the SEER database does not allow ascertainment of either the type or the duration of chemotherapy. Fourth, because of the short median follow-up, future studies with longer follow-up are needed to confirm or refute our results. Fifth, our study did not benefit from central pathology review. Sixth, our analyses could not assess intra- and interobserver variability, the assessment of which is essential in clinical practice. Finally, the SEER database represents only a proportion of the United States population. Consequently, our findings apply to patients from the United States and may not be generalizable to patients from other parts of the world. However, these limitations apply to this and to all other studies based on the SEER database.