Alternative prostate cancer grading systems incorporating percent pattern 4/5 (IQ-Gleason) and cribriform architecture (cGrade) improve prediction of outcome after radical prostatectomy

Percentage Gleason pattern 4, invasive cribriform and/or intraductal carcinoma (IC/IDC) and minor pattern 5 are recognized as independent parameters for prostate cancer outcome, but are not incorporated in current grade groups (GGs). Two proof-of-principle studies have proposed alternative grading schemes based on percentage Gleason pattern 4/5 (integrated quantitative Gleason score; IQ-Gleason) and IC/IDC presence (cribriform grade; cGrade). Our objective was to compare the performance of GG, IQ-Gleason and cGrade for predicting biochemical recurrence and metastasis after radical prostatectomy (RP). RP specimens of 1064 patients were pathologically reviewed and graded according to the three schemes. Discriminative power for prediction of biochemical recurrence-free (BCRFS) and metastasis-free (MFS) survival was compared using Harrell’s c-index. The GG distribution at RP was 207 (19.4%) GG1, 472 (44.4%) GG2, 126 (11.8%) GG3, 140 (13.2%) GG4 and 119 (11.2%) GG5. Grading according to 5-tier IQ-Gleason and cGrade systems led to categorical shifts in 49.8% and 29.7% of cases, respectively. Continuous IQ-Gleason had the best performance for predicting BCRFS (c-index 0.743, 95% confidence interval (CI) 0.715–0.771), followed by cGrade (c-index 0.738, 95%CI 0.712–0.759), 5-tier categorical IQ-Gleason (c-index 0.723, 95%CI 0.695–0.750) and GG (c-index 0.718, 95%CI 0.691–0.744). Continuous IQ-Gleason (c-index 0.834, 95%CI 0.802–0.863) and cGrade (c-index 0.834, 95%CI 0.808–0.866) both had better predictive value for MFS than categorical IQ-Gleason (c-index 0.823, 95%CI 0.788–0.857) and GG (c-index 0.806, 95%CI 0.777–0.839). In conclusion, the performance of prostate cancer grading can be improved by alternative grading schemes incorporating percent Gleason pattern 4/5 and IC/IDC.


Introduction
The Gleason grading system is the cornerstone of risk assessment and prediction of clinical outcome in prostate cancer (PCa) patients. In radical prostatectomy (RP) specimens, the Gleason score (GS) is determined by adding the two most frequent growth patterns resulting in a final score of 2 to 10. Based on the work of Pierorazio et al. and Epstein et al., the GS is categorized into five grade groups (GGs) [6,13]. The International Society of Urological Pathology (ISUP), World Health Organization (WHO) and Genitourinary Pathology Society (GUPS) endorse reporting of GGs in conjunction with GS [4,5,9]. The advantages of the GG system are its simplicity, explicit distinction of GS 3 + 4 and 4 + 3 and categorization of GS ≤ 6 as GG1.
Although these parameters are of prognostic significance, it is unclear how percent pattern 4, IC/IDC and minor pattern 5 together translate to individual risk assessment and how they should be used in clinical practice. For instance, it is unknown whether patients with GG2 with 20% Gleason pattern 4, IDC and tertiary pattern 5 have worse outcome than those with GG3 with 60% pattern 4 but no cribriform carcinoma or tertiary pattern 5. A few groups have demonstrated that alternative grading schemes incorporating some of these pathological factors had significantly better discriminative value than current GGs [14,17,18]. On biopsy and RP specimens, Sauter et al. found that an integrated quantitative Gleason (IQ-Gleason) score, which is purely based on Gleason pattern 4 and 5 quantities, led to better risk stratification for biochemical recurrence-free survival (BCRFS) [14]. Alternatively, modification of the GG system for the presence of IC/IDC on biopsies, labelled as cribriform grade (cGrade), resulted in improved discriminative value for disease-specific and metastasis-free survival (MFS) [18].
Although these proof-of-principle studies reveal that optimization of the current GG system is possible, no studies have independently validated the prognostic value of these models. The objective of the current study was to compare the discriminative ability of GG, IQ-Gleason and cGrade for BCRFS and MFS in an RP cohort.

Patient selection
Patients who had undergone RP for prostatic adenocarcinoma at three medical centres in The Netherlands between 2000 and 2017 were included in this study; 854 patients were operated at Erasmus MC, University Medical Centre, Rotterdam; 96 at Leiden University Medical Centre (LUMC), Leiden; and 137 at Antoni van Leeuwenhoek Hospital, The Netherlands Cancer Institute (NKI), Amsterdam. While the RP specimens of Erasmus MC were unselected consecutive samples, those from LUMC and NKI were selected for having GG3-5 disease in their original pathology report to increase the number of high-grade tumours. Patients who had undergone hormonal, radiation and/or viral therapy (n = 23) before RP were excluded. RP specimens were fixed in neutral-buffered formalin, after which they were sectioned transversely and embedded entirely for diagnostic purposes. All slides were available for pathology review. The institutional Medical Research Ethics Committee approved this study (MEC-2018-1614).

Pathological evaluation
All 1064 RP specimens were reviewed in joint sessions by two investigators (EH, GvL), blinded to clinical outcome. In case of discordances, the assessment of the senior genito-urinary pathologist (GvL) was included in the study database. For each specimen, the following features were recorded: GS and GG according to the 2014 ISUP/2016 WHO guidelines, pT stage according to the American Joint Committee on Cancer (AJCC) TNM 8th edition, surgical margin status, presence of IC/IDC and percent Gleason 4 and 5 growth patterns. Invasive cribriform carcinoma and IDC were not distinguished and were grouped together for all analyses. In case of multifocality, we only recorded the characteristics of the index tumour, defined as the tumour with the highest grade, stage or volume. Tertiary patterns occupying < 5% of the tumour volume and IDC were not included in the GG. The GG concordance rate at revision was 88/135 (65.2%) for RP specimens from NKI and 39/94 (41.5%) for specimens from LUMC; this discordance was affected by the fact that the original tumour grading had been performed by a large number of general pathologists and that several samples were originally graded before the 2005 ISUP consensus meeting.

Clinical follow-up
Clinical follow-up after RP consisted of 6-monthly and later annual monitoring of serum prostate-specific antigen (PSA) levels. Biochemical recurrence was defined as PSA levels ≥ 0.2 ng/ml measured at two consecutive time points at least 3 months apart, following undetectable PSA levels after RP. Post-operative lymph node and distant metastases were confirmed by biopsy, imaging or multidisciplinary consensus.

IQ-Gleason and cGrade assessment
The IQ-Gleason score is calculated by summing Gleason pattern 4 and 5 percentages. Ten points are added if any Gleason pattern 5 is present and 7.5 points extra if it exceeds 20%. This results in a continuous IQ-Gleason score from 0 to 117.5 points. For comparison purposes, we a priori categorized IQ-Gleason into five ordinal groups as follows: 0-25, 26-50, 51-75, 76-100 and 101-117.5.
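The scoring rule described above can be written out as a short sketch. The function names are hypothetical, and the handling of scores that fall between two listed category bounds (for example 50.5) is an assumption: the upper bound of each category is treated as inclusive and anything above 100 is assigned to the fifth category.

```python
def iq_gleason(pct_pattern4: float, pct_pattern5: float) -> float:
    """Continuous IQ-Gleason score (0 to 117.5) from pattern 4 and 5 percentages."""
    if min(pct_pattern4, pct_pattern5) < 0 or pct_pattern4 + pct_pattern5 > 100:
        raise ValueError("percentages must be non-negative and sum to at most 100")
    score = pct_pattern4 + pct_pattern5   # sum of pattern 4 and 5 percentages
    if pct_pattern5 > 0:
        score += 10.0                     # any pattern 5 present
    if pct_pattern5 > 20:
        score += 7.5                      # pattern 5 exceeds 20%
    return score


def iq_gleason_tier(score: float) -> int:
    """Map a continuous score to the five ordinal groups (1 to 5).

    Assumption: upper bounds 25, 50, 75 and 100 are inclusive, so a
    fractional score such as 50.5 falls into the next group.
    """
    for tier, upper in enumerate((25, 50, 75, 100), start=1):
        if score <= upper:
            return tier
    return 5


if __name__ == "__main__":
    for p4, p5 in [(0, 0), (20, 0), (27, 3), (40, 25), (0, 100)]:
        s = iq_gleason(p4, p5)
        print(f"pattern 4 = {p4}%, pattern 5 = {p5}%: score {s}, group {iq_gleason_tier(s)}")
```

A tumour consisting entirely of pattern 5 reaches the maximum of 100 + 10 + 7.5 = 117.5 points, which matches the stated range.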
cGrade is based on the GG system, where the grade is decreased by 1 point in case no invasive cribriform and intraductal carcinoma is present in GG2-5 tumours. In the rare case of GG1 with IDC, 1 point is added leading to cGrade2 classification. Schematic descriptions and examples of IQ-Gleason and cGrade are depicted in Fig. 1.
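The cGrade rule can be sketched in the same way. The function name and the boolean flag are hypothetical; the flag stands for the presence of invasive cribriform and/or intraductal carcinoma, which this study grouped together.

```python
def c_grade(grade_group: int, has_ic_idc: bool) -> int:
    """Cribriform grade (1 to 5) from grade group and IC/IDC status."""
    if grade_group not in (1, 2, 3, 4, 5):
        raise ValueError("grade group must be an integer from 1 to 5")
    if grade_group == 1:
        # Rare case: GG1 with IDC is raised to cGrade2.
        return 2 if has_ic_idc else 1
    # GG2-5: the grade is lowered by one point when no IC/IDC is present.
    return grade_group if has_ic_idc else grade_group - 1


if __name__ == "__main__":
    for gg in range(1, 6):
        print(f"GG{gg}: cGrade{c_grade(gg, False)} without IC/IDC, "
              f"cGrade{c_grade(gg, True)} with IC/IDC")
```

Under this rule cGrade1 collects GG1 tumours without IC/IDC together with GG2 tumours without IC/IDC, and cGrade5 can only be reached by GG5 tumours with IC/IDC.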

Statistical analysis
Missing PSA values (n = 27) were imputed using the median PSA value. BCRFS and MFS were analysed using the Cox proportional hazards model and visualized by Kaplan-Meier curves. Hazard ratios (HR) for survival time were calculated using univariate Cox proportional hazard regression. For all models, Cox proportional hazard assumptions were met. Harrell's concordance index (c-index) was used to quantify the discriminative ability of the grading models. Bootstrapping was used to obtain unbiased estimates of the model's performance and 95% confidence interval (CI). Statistics were performed using SPSS version 25 (IBM, Chicago, IL, USA) and R version 4.0.4 (R, Vienna, Austria). Results were considered significant when the two-sided p value was < 0.05.
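To make the discrimination measure concrete, the following is a minimal sketch of Harrell's c-index with a bootstrap confidence interval. It is not the code used in this study, which was run in SPSS and R. The percentile interval, the exclusion of pairs with tied follow-up times and the toy data are assumptions made for illustration only.

```python
import random


def harrell_c(times, events, risk):
    """Harrell's c-index for right-censored data.

    A pair is usable when the subject with the shorter follow-up had an
    observed event. It is concordant when that subject also has the higher
    risk score; tied risk scores count as half. Pairs with tied times are
    skipped (a simplification).
    """
    concordant, usable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot anchor a pair
        for j in range(len(times)):
            if times[i] < times[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


def bootstrap_ci(times, events, risk, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the c-index (assumed method)."""
    rng = random.Random(seed)
    n = len(times)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample patients with replacement
        try:
            stats.append(harrell_c([times[k] for k in idx],
                                   [events[k] for k in idx],
                                   [risk[k] for k in idx]))
        except ValueError:
            continue  # resample contained no usable pairs
    if not stats:
        raise ValueError("no valid bootstrap resamples")
    stats.sort()
    lower = stats[int(alpha / 2 * len(stats))]
    upper = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return lower, upper


if __name__ == "__main__":
    # Hypothetical toy cohort: follow-up in months, event flag, grade as risk score.
    months = [5, 8, 12, 20, 30, 40]
    event = [1, 1, 0, 1, 0, 0]
    grade = [5, 4, 3, 2, 2, 1]
    print("c-index:", round(harrell_c(months, event, grade), 3))
    print("95% CI:", bootstrap_ci(months, event, grade, n_boot=500))
```

Because tied risk scores contribute only half a concordant pair, a grading scheme with few categories produces many ties, which is one reason a continuous score can reach a higher c-index than its categorized version.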

Discussion
Gleason pattern 4 percentage, presence of IC/IDC and minor/tertiary Gleason patterns have been well acknowledged as independent prognostic features of prostate cancer. Therefore, according to the latest ISUP and GUPS recommendations, these pathological factors should be included in pathology reports in conjunction with the GS and GG [4,5]. Albeit of clinical significance, it is yet unclear how to combine this pathological information into comprehensive risk stratification models for individual patients. Recently, two proof-of-principle studies have shown that the discriminative value of conventional GGs can significantly be improved by incorporating novel pathological characteristics in alternative grading systems [14,18]. In this study, we show that continuous IQ-Gleason and cGrade both outperformed GGs particularly in prediction of MFS and to a lesser extent of BCRFS. These findings demonstrate that prostate cancer grading can significantly be improved by incorporating Gleason pattern 4 percentage, tertiary patterns and IC/IDC in new grading schemes. This is the first study to independently validate IQ-Gleason for prediction of BCRFS after RP [14]. Furthermore, we show that the additive value of IQ-Gleason is even stronger for predicting MFS, which to our knowledge has not been reported yet.

(a)       IQ-Gleason1  IQ-Gleason2  IQ-Gleason3  IQ-Gleason4  IQ-Gleason5  Total
GG1       207          -            -            -            -            207
GG2       287          163          22           -            -            472
GG3       -            1            63           59           3            126
GG4       3            39           29           50           19           140
GG5       -            -            3            54           62           119
Total     497          203          117          163          84           1064

(b)       cGrade1      cGrade2      cGrade3      cGrade4      cGrade5      Total
GG1       198          9            -            -            -            207
GG2       219          253          -            -            -            472
GG3       -            8            118          -            -            126
GG4       -            -            52           88           -            140
GG5       -            -            -            17           102          119
Total     417          270          170          105          102          1064

(c)       IQ-Gleason1  IQ-Gleason2  IQ-Gleason3  IQ-Gleason4  IQ-Gleason5  Total
cGrade1   356          56           5            -            -            417
cGrade2   138          107          21           4            -            270
cGrade3   1            25           71           68           5            170
cGrade4   2            15           18           45           25           105
cGrade5   -            -            2            46           54           102
Total     497          203          117          163          84           1064
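The share of cases that keep the same category number under an alternative scheme can be read off the diagonal of cross-tabulations (a) and (b). The sketch below copies the counts from those panels, reading dashes as zero and assuming that the columns of panel (a) run from IQ-Gleason1 to IQ-Gleason5, an order that is consistent with the column totals of panel (c). The variable and function names are hypothetical.

```python
# Rows: GG1-GG5. Columns: IQ-Gleason1-5 (panel a) or cGrade1-5 (panel b).
GG_VS_IQ_GLEASON = [
    [207, 0, 0, 0, 0],
    [287, 163, 22, 0, 0],
    [0, 1, 63, 59, 3],
    [3, 39, 29, 50, 19],
    [0, 0, 3, 54, 62],
]
GG_VS_CGRADE = [
    [198, 9, 0, 0, 0],
    [219, 253, 0, 0, 0],
    [0, 8, 118, 0, 0],
    [0, 0, 52, 88, 0],
    [0, 0, 0, 17, 102],
]


def agreement(table):
    """Return (cases on the diagonal, all cases, diagonal fraction)."""
    total = sum(sum(row) for row in table)
    same = sum(table[i][i] for i in range(len(table)))
    return same, total, same / total


if __name__ == "__main__":
    for name, table in [("IQ-Gleason", GG_VS_IQ_GLEASON), ("cGrade", GG_VS_CGRADE)]:
        same, total, frac = agreement(table)
        print(f"{name}: {same}/{total} unchanged ({frac:.1%})")
        # Column 1 total versus the 207 GG1 cases shows the growth of the lowest category.
        print(f"  lowest category: {sum(row[0] for row in table)} versus 207 GG1")
```

The first column of each panel also shows where most of the reclassification comes from: 287 GG2 cases move to IQ-Gleason1 and 219 GG2 cases move to cGrade1.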
For comparison purposes, we also analysed IQ-Gleason as a 5-tier system, which outperformed GG for MFS but not for BCRFS, indicating that categorization led to significant loss of discriminative power. Finally, we confirm the findings of our previous biopsy study showing that cGrade has better discriminative ability for predicting MFS [18]. Both alternative grading systems led to considerable reclassification of original GGs, with only 51% of IQ-Gleason and 71% of cGrade categories being similar to the respective GG. For both alternative grading systems, the most prominent effect was the re-categorization of many GG2 patients with low-risk features as IQ-Gleason1 or cGrade1, respectively, doubling the number of men in the lowest risk category. While these men remained at very low risk of less than 1% for developing post-operative metastasis, BCR rates in IQ-Gleason1 and cGrade1 were higher than in GG1.

Fig. 3 Harrell's c-index for biochemical recurrence-free (a) and metastasis-free (b) survival of grade groups (GGs), categorical integrated quantitative Gleason (IQ-Gleason), cribriform grade (cGrade) and continuous IQ-Gleason

The GS/GG has been the global standard for prostate cancer grading for many years. Disadvantages of the current grading system are that it is prone to inter-observer variability and does not implement the new prognostic factors. The better discriminative value of cGrade is mostly related to the classification of the large group of GG2 men without IC/IDC as cGrade1. As cGrade is based on the GG system, it will still suffer from considerable inter-observer variability. At the same time, the IQ-Gleason assessment is time consuming and might result in delays in daily clinical practice. Yet, apart from its better performance, a strong point of continuous IQ-Gleason is that it is less susceptible to inter-observer variability related to the assessment of minor high-grade components. For instance, an RP with 70% GP3, 27% GP4 and 3% GP5 is graded as GG2 with tertiary pattern 5 according to ISUP, GUPS and WHO recommendations; however, if GP4 and GP5 quantities were assessed as 23% and 7%, respectively, GP5 would be regarded as a secondary component resulting in GG4. In both scenarios, the IQ-Gleason would, however, be 40 and remain unchanged. So, while subjective assessment can easily lead to significant alterations in GG categorization, the continuous IQ-Gleason score is more resistant to inter-observer variability.
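The two readings of the same specimen in this example can be checked with a small sketch. The grade group function is a deliberately simplified, hypothetical rule and not the full ISUP/WHO algorithm: the most extensive pattern is taken as primary, and when three patterns are present the highest one replaces the secondary pattern only if it occupies at least 5% of the tumour. The 5% threshold follows the cut-off stated in the pathological evaluation section. Other situations, such as very small lower-grade components, are not modelled.

```python
def grade_group_rp(gp3: float, gp4: float, gp5: float, cutoff: float = 5.0) -> int:
    """Simplified grade group for an RP specimen from pattern percentages."""
    pct = {3: gp3, 4: gp4, 5: gp5}
    # Patterns that are present, most extensive first.
    present = sorted((p for p in pct if pct[p] > 0), key=lambda p: pct[p], reverse=True)
    if not present:
        raise ValueError("at least one pattern must be present")
    primary = present[0]
    if len(present) == 1:
        secondary = primary
    elif len(present) == 2:
        secondary = present[1]
    else:
        highest = max(present)
        if highest == primary:
            secondary = present[1]        # second most extensive pattern
        elif pct[highest] >= cutoff:
            secondary = highest           # high-grade component counts as secondary
        else:
            # High-grade component below the cut-off stays a tertiary pattern.
            secondary = next(p for p in present if p not in (primary, highest))
    score = primary + secondary
    if score <= 6:
        return 1
    if score == 7:
        return 2 if primary == 3 else 3
    return 4 if score == 8 else 5


def iq_gleason_points(gp4: float, gp5: float) -> float:
    """IQ-Gleason: pattern 4 + 5 percentages, +10 for any pattern 5, +7.5 if above 20%."""
    return gp4 + gp5 + (10.0 if gp5 > 0 else 0.0) + (7.5 if gp5 > 20 else 0.0)


if __name__ == "__main__":
    for gp3, gp4, gp5 in [(70, 27, 3), (70, 23, 7)]:
        print(f"GP3 {gp3}%, GP4 {gp4}%, GP5 {gp5}%: "
              f"GG{grade_group_rp(gp3, gp4, gp5)}, IQ-Gleason {iq_gleason_points(gp4, gp5)}")
```

A shift of four percentage points between pattern 4 and pattern 5 moves the specimen across two grade groups in this sketch, while the summed IQ-Gleason score is identical for both readings.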
In the 2019 ISUP survey on prostate cancer grading, the majority of respondents indicated they were open to altering the current GS/GG system by incorporation of new pathological parameters, but most felt more validation was needed before actually changing the current system [19]. While IQ-Gleason and cGrade both outperform current prostate cancer grading, we believe there is still room for further optimization. To determine how pathological factors should be weighed in such a system, it is important to determine the mutual interaction and collinearity of the variables. Most additional factors have been investigated as a single variable without including the other relevant covariates. In a GG2 biopsy cohort including both IC/IDC and Gleason pattern 4 quantity, Kweldam et al. showed that IC/IDC occurred more frequently with incremental Gleason 4 percentage, and that IC/IDC was the only independent predictive factor for post-operative BCRFS [11]. Similarly, Seyrek et al. found that GG2 RP specimens with a higher percentage of pattern 4 had more frequent IC/IDC and tertiary pattern 5, but that IC/IDC was the only independent factor for BCRFS [16]. Further study on the interaction of these pathological factors by other groups is required to identify their independent contribution to prostate cancer outcome.
Prostate cancer grading on biopsies is an important factor for therapeutic decision-making. Alternative grading systems could have added value in comprehensive risk stratification after biopsy, for instance supporting identification of candidates for active surveillance among IQ-Gleason1 and cGrade1 patients. While tumour grading at radical prostatectomy is mostly prognostic, the new grading schemes could have impact on patient communication and follow-up in the large group of IQ-Gleason1 and cGrade1 men.
The strong points of this study were the detailed monitoring of Gleason pattern percentages and growth patterns. The retrospective design, short median follow-up period of 54 months and relatively small sample size were restrictions, limiting the power of the statistical analyses. Furthermore, the current RP cohort was specifically enriched for high-grade tumours from other centres to increase statistical power in high-grade patients, which might have introduced a bias.
In conclusion, this is the first study validating the clinical performance of two alternative prostate cancer grading systems. We show that both IQ-Gleason and cGrade outperformed GGs in having better discriminative ability for MFS and BCRFS. This study shows that improvement of current prostate cancer grading is possible and could result in comprehensive incorporation of new prognostic pathological parameters in clinical practice.

Funding This study was sponsored by a generous grant of the Jaap Schouten Foundation.

Declarations
Ethical approval The manuscript contains original unpublished work and is not being submitted for publication elsewhere. All authors agree with the results presented as well as the final draft of the manuscript. The institutional Medical Research Ethics Committee approved this study (MEC-2018-1614).

Conflict of interest The authors declare no competing interests.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.