In this study, we developed the VENUSS prognostic score for clinically non-metastatic PRCC, which is based on tumour size, T stage, N stage, presence of venous tumour thrombus and nuclear grade. Its performance was further evaluated in an independent cohort of 150 high-risk PRCC patients from the prospective adjuvant ASSURE clinical trial. We show that the VENUSS score and the corresponding VENUSS groups may be superior to UISS, TNM and the 2018 Leibovich prognostic groups. VENUSS may be used for patient counselling, follow-up planning and prognostic stratification in adjuvant trials.
There has been no general consensus on how best to risk-stratify patients with PRCC following curative surgery. Guidelines advocate the use of stratification systems such as UISS, which was developed in patients with all RCC subtypes; however, the majority of tumours were clear cell. Although PRCC and clear cell RCC share prognostic factors such as T stage and N stage, the individual contribution of each factor to the overall recurrence risk is different, and some factors such as tumour necrosis may not be prognostic in PRCC. Some researchers used the TNM group, which does not take into account additional prognostic factors such as venous tumour thrombus and only considers tumour size indirectly through T stage. Interestingly, prospective adjuvant trials such as ASSURE and SORCE used a modified UISS or the 2003 Leibovich score for defining inclusion and assessing baseline risk, neither of which, however, was validated in these patients.
Several PRCC prognostic models have been published in recent years. A nomogram predicting disease-specific survival was developed and validated in 2010, but included patients both with and without distant metastases and may therefore be of limited clinical utility. Buti et al. developed the GRade, Age, Nodes and Tumour (GRANT) score from the ASSURE trial cohort for both clear cell and non-clear cell RCC. Recently, Leibovich et al. published a prognostic model for PRCC, which is based on 607 surgically treated patients from the Mayo Clinic. Based on nuclear grade, fat invasion and the presence of venous tumour thrombus, the authors proposed three groups for recurrence and death from PRCC. The c index of this model was 77%, but neither calibration (i.e. comparing the predicted probability and the observed frequency) nor clinical net benefit was assessed. In the present study, we compared VENUSS with other prognostic models, including UISS, TNM and the 2018 Leibovich prognostic groups. While the c index of the Leibovich prognostic groups was comparable with that reported in the original publication, VENUSS showed better discrimination in both the development and the ASSURE cohort. Of note, UISS was found to be superior to TNM and the Leibovich prognostic groups. However, it is possible that both the UISS and the Leibovich prognostic groups performed more poorly than VENUSS because they were developed for different endpoints. Indeed, prognostic models are often applied in clinical practice to endpoints other than those for which they were developed. For example, the ASSURE trial used the UISS (outcome of interest: overall survival), but the primary endpoint of ASSURE was disease-free survival.
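For readers less familiar with the c index used to compare the models above, the following is a minimal sketch of Harrell's concordance index for right-censored data. It is an illustration only and not the analysis code of this study: the function name, the toy follow-up times, the event indicators and the risk scores are all hypothetical, and pairs with tied follow-up times are simply skipped, which is a simplification.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored data.

    times:       follow-up time per patient (e.g. months)
    events:      1 if recurrence was observed, 0 if censored
    risk_scores: higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that patient a has the shorter follow-up
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the shorter time is an observed event;
        # pairs with tied times are skipped (simplifying assumption)
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # earlier recurrence had the higher score
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: five patients, two of them censored
toy_times = [12, 20, 33, 47, 60]
toy_events = [1, 1, 0, 1, 0]
toy_scores = [9, 4, 6, 5, 1]
toy_c = harrell_c_index(toy_times, toy_events, toy_scores)  # 6 of 8 pairs: 0.75
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking of patients by time to recurrence; the index says nothing about whether the predicted probabilities themselves are accurate, which is what calibration addresses.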
Critically, our study included an independent cohort, comprising the PRCC patients of the prospective adjuvant ASSURE clinical trial. The dataset was available from Project Data Sphere, which provides researchers with the opportunity to conduct secondary analyses of prospectively collected trial data. In this analysis, discrimination and calibration were worse than in the development cohort, which is attributable to differences in cohort composition. Indeed, two thirds of patients in the development cohort had stage I disease, compared to 10% of patients in ASSURE. While the development cohort included consecutive patients, ASSURE recruited from pre-screened patients with a higher risk of recurrence. Thus, although both cohorts included the same subtype of RCC, they differed in terms of the risk of recurrence owing to the different distribution of prognostic factors. Consequently, differences between study cohorts led to substantial differences in c indices and calibration, which in turn depend critically on variation of predictors. As ASSURE included only patients at high risk of recurrence, there was little variation in predictors and thus lower discrimination and worse calibration, specifically in those with a lower risk of recurrence according to VENUSS. Thus, quality measures in the development and independent cohort cannot be compared directly, but VENUSS appeared to be superior to the other prognostic models.
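Calibration, as referred to above, compares the predicted probability with the observed frequency in each risk group. The sketch below shows one simple way this comparison could be made with censored follow-up, using a Kaplan-Meier estimate as the observed recurrence-free probability at a fixed horizon. It is not the method or the data of this study: the function names, the group labels, the predicted probabilities and the follow-up times are hypothetical values chosen only for illustration.

```python
def kaplan_meier_at(times, events, horizon):
    """Kaplan-Meier estimate of the event-free probability at `horizon`."""
    survival = 1.0
    # Distinct observed event times up to the horizon, in increasing order
    event_times = sorted({t for t, e in zip(times, events) if e and t <= horizon})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        failed = sum(1 for u, e in zip(times, events) if e and u == t)
        survival *= 1.0 - failed / at_risk
    return survival


def calibration_table(groups, predicted, times, events, horizon):
    """Per risk group: (mean predicted, observed) event-free probability."""
    table = {}
    for g in sorted(set(groups)):
        idx = [i for i, x in enumerate(groups) if x == g]
        mean_pred = sum(predicted[i] for i in idx) / len(idx)
        observed = kaplan_meier_at(
            [times[i] for i in idx], [events[i] for i in idx], horizon
        )
        table[g] = (mean_pred, observed)
    return table


# Hypothetical toy cohort: four patients per group, follow-up in months
toy_groups = ["low"] * 4 + ["high"] * 4
toy_predicted = [0.9] * 4 + [0.5] * 4          # predicted 5-year recurrence-free probability
toy_times = [60, 60, 60, 45, 6, 18, 30, 60]
toy_events = [0, 0, 0, 1, 1, 1, 0, 0]          # 1 = recurrence, 0 = censored
toy_table = calibration_table(toy_groups, toy_predicted, toy_times, toy_events, 60)
```

In this toy example the predicted and observed values agree in the "high" group (0.5 versus 0.5) but not in the "low" group (0.9 versus 0.75), which is the kind of group-specific miscalibration that a table or plot of predicted against observed probabilities is meant to reveal.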
An interesting observation was that the proportion of patients with oligometastatic recurrence was greater in high-risk than in intermediate-risk patients. This finding has to be interpreted with caution, as the number of patients in each subgroup is small. While further validation is required, our data emphasise that patients with high-risk disease may benefit from close follow-up, as a considerable proportion of patients with oligorecurrent disease may be amenable to potentially curative salvage procedures.
An important benefit of VENUSS is that it is based on routine pathology and does not include clinical variables such as performance status or symptoms, which may be more subjective. There is little extra work for the reporting pathologist to assign the score and group, which can then be used for patient counselling and planning of follow-up.
We analysed one of the largest cohorts of non-metastatic PRCC, followed established research guidelines for prognostic modelling and used an independent cohort to test the performance of VENUSS and to compare it with other risk group definitions. However, this study has a number of limitations, arising mainly from the retrospective character of the development cohort, missing candidate prognostic variables, and the possibility of not having picked up all recurrences. Firstly, the follow-up regimen was not standardised across centres, but generally followed international guidelines of the time. As the median follow-up was 53 months, it was not possible to present evidence beyond the 5-year landmark. Secondly, as the development cohort was retrospective, clinical and pathological data were reviewed locally rather than centrally. We believe that our results were not substantially affected by this approach, as only standard clinical and pathological variables were analysed; however, we cannot exclude underreporting of pathological features. Our study represents a real-world scenario in which a central review is rarely performed, making the conclusions more generally applicable. Additionally, VENUSS and other definitions were also evaluated in an independent cohort from prospectively documented trial data, which may be considered the gold standard. Thirdly, it was not possible to adjust for multiple non-measured confounders, such as patient preference for follow-up imaging, imaging modalities, co-morbidity, symptoms, laboratory values and performance status, which were not available. However, the aim of this study was to provide a simple score based on routine pathological parameters. Papillary type (1 or 2) was only available in a subgroup of patients. It has been suggested that nuclear grade may be used as a surrogate for type, but there is no high-level evidence at present to support this approach. Additionally, some centres do not routinely grade PRCC.
A proportion of type 2 PRCC may be hereditary leiomyomatosis and renal cell cancer (HLRCC), which may be another confounder given the highly aggressive nature of this disease. For this study, we only included patients with documented sporadic PRCC, but cannot exclude that some patients may have had undocumented or undiagnosed HLRCC. In line with other groups [3, 4], the current study did not identify papillary type as a significant prognostic factor on multivariable analysis, but this may be due to the lack of central pathology review. This is also true for the presence of tumour necrosis and sarcomatoid features. It may be that the mere presence of either pathological feature is not prognostic, but that a certain percentage is required to show statistical significance. Finally, we did not obtain data on treatment of recurrent disease, which was beyond the scope of this study. Instead, we focused on the time interval from surgery to detection of recurrence. Our proportion of patients with oligometastatic recurrent disease was comparable to that of other studies [19, 20], which supports the validity of our dataset. The current study reinforces the concept that, with routine follow-up imaging, oligometastatic and thus potentially curable disease is detected in a significant proportion of patients across all risk groups. Despite these limitations, our model may form the basis for follow-up risk stratification and inclusion criteria for adjuvant trials.