Background

Non-muscle invasive papillary urothelial carcinomas (NMIPUC) of the urinary bladder, comprising non-invasive papillary tumors (Ta) and tumors invading the lamina propria/submucosa (T1), are known to recur frequently (in up to 70% of cases) and occasionally progress (in up to 40% of cases) [1]. The European Association of Urology (EAU) guidelines for NMIPUC (pTa/T1) of the bladder stratify the risk of progression into low-, intermediate-, and high-risk groups and state that patients in different groups should be managed with different strategies [2]. Besides stage (Ta vs. T1), size (< 3 cm vs. > 3 cm), number of papillary tumors (single vs. multiple), concurrent carcinoma in situ (CIS), and a history of recurrence, the best estimator of risk is the histological grade. The four existing grading systems (1973 World Health Organization [WHO], 1998 International Society of Urological Pathology [ISUP]/2004 WHO, Cheng et al. [3, 4], and 2016 WHO classifications) divide PUCs based on subjective morphological parameters. This has led to high interobserver and intraobserver variability in pathologists' diagnoses, as well as lower predictive power for urologists' management decisions [5]. Consequently, alternative grading systems have been sought to reduce grading discrepancies [6]. Many studies of immunohistochemical and molecular markers have attempted to reduce the subjectivity of histological grading, but none of the markers studied has been found to have a role in grading schemes [7,8,9,10,11,12].

The present study was conducted to identify more objective and reproducible histological predictors that correlate well with the clinical outcome and to compare them with the previous histological grading systems. Eleven uropathologists evaluated light microscopic histological parameters together in three rounds using a multihead microscope. We attempted to define every candidate histological parameter as a countable variable. Each parameter was evaluated using univariate and multivariate analyses to determine whether it had a statistically significant effect in predicting the clinical outcome.

Methods

Patient selection

Surgically removed NMIPUCs of the urinary bladder were collected from the surgical pathology archives of 11 institutions in South Korea. The inclusion criteria were as follows: (1) pTa or pT1 stage at the initial bladder biopsy; and (2) a minimum follow-up period of 5 years for no-event (NE) cases. The exclusion criteria were as follows: (1) a prior history or the concurrent presence of urothelial carcinoma in the ureter or renal pelvis; and (2) evidence of associated urothelial carcinoma in situ. A total of 296 cases were retrieved (Ta, 178; T1, 118). The numbers of cases contributed by the individual institutions were 95, 47, 38, 22, 21, 21, 17, 15, 14, 4, and 2.

Clinical parameters

The retrieved cases were classified into three clinical subgroups: no event (NE), recurrence (R), and progression (P). NE was defined as no evidence of tumor on follow-up imaging, urine cytology, or cystoscopy for at least 5 years. R was defined as the occurrence of a new tumor of the same or lower stage at least 3 months after the initial resection. P was defined as the development of a new tumor of a higher stage than the initial tumor, or metastasis to the lymph nodes or other organs. We collected the following clinical information from the medical records: 1) age, 2) sex, 3) site, number, and size/volume of the tumor at the first biopsy, 4) interval to the second event, 5) number of recurrences, 6) type of final operation, 7) survival, 8) cause of death, and 9) site of metastasis. All types of specimens (cystoscopic biopsy, cold-cup biopsy, and transurethral resection of bladder tumor) were included but were not analyzed separately. No partial or radical cystectomy specimen served as the initial biopsy in any of the 296 cases. The number of tumors was dichotomized as single vs. multiple, and tumor size as < 3 cm vs. > 3 cm. The distribution of each group is shown in Table 1.

Table 1 Clinical characteristics of the patients

Histological evaluation

For microscopic examination, hematoxylin and eosin (H&E)-stained slides of formalin-fixed, paraffin-embedded tumor tissue were retrieved. Interobserver discrepancies were resolved in several round-table multihead microscope sessions involving 11 pathologists from 11 institutions, during which a consensus was reached. The reviewers could not verify the presence of muscularis propria in every sample. However, the 11 contributors reviewed the original diagnosis and pathological stage using both the slides and the surgical records. A second biopsy was routinely performed 3 months later to check for incomplete resection (i.e., residual tumor). Cases initially diagnosed as NMIPUC (Ta, T1) whose T stage changed within this short interval were regarded as inaccurately diagnosed and were excluded. Each case was reviewed individually by the 11 pathologists from the 11 institutions according to the 2004 WHO criteria. Each case was then evaluated blindly by two participants using our proposed parameters. The histological parameters examined are shown in Table 2 and Figs. 1 and 2. To compare predictive performance, the previous grading systems were also applied [3, 4]:
- the 2004 WHO system: papillary urothelial neoplasm of low malignant potential (PUNLMP), low grade (LG), or high grade (HG);
- the 1973 WHO system: transitional cell carcinoma (TCC) grade 1, 2, or 3;
- the system of Cheng et al.: grade 1, 2, 3, or 4.

Table 2 Histologic parameters evaluated in this study
Fig. 1

Representative images of histological parameters evaluated in this study. a Schematic figure of papillary fusion. b Delicate papillae with no fusion. c Papillary fusion (the arrow marks an imaginary fusion line). d Confluent fusion of papillae. e Presence of umbrella cells (arrowhead). f Absence of umbrella cells. g Schematic figure for the estimation of cell density based on the distance between cells. h Cell density score 1. i Cell density score 2. j Cell density score 3. k Discohesiveness. l Nuclear pleomorphism category 4, based on an approximately 20-fold difference between the smallest and the largest tumor nuclei. m Multinucleated giant cells. n Mild nuclear hyperchromasia. o Moderate nuclear hyperchromasia. p Severe nuclear hyperchromasia. q Polarity loss score 2

Fig. 2

Further representative images of histological parameters evaluated in this study. a Example of a nuclear groove (arrow). b Prominent nucleoli. c Whorling pattern. d Single spotty necrosis (arrow). e Multifocal group necrosis (arrows). f Surface necrosis. g Confluent necrosis. h Glandular differentiation. i Squamous differentiation. j Micropapillary differentiation. k Mitosis level 1. l Mitosis level 2. m Mitosis level 3. n Apoptosis score 1. o Apoptosis score 3. p Capillary proliferation in the fibrovascular core of a papilla

Statistical analysis

All of the aforementioned parameters were compared between two pairs of groups (R vs. NE and P vs. NE) within each stage. Univariate and multivariate logistic regression analyses were performed to identify the factors influencing R and/or P. To investigate the diagnostic utility of the new grading system, it was compared with the previous grading systems using the area under the receiver operating characteristic (ROC) curve (AUC). All statistical analyses were performed in R (version 3.1.2; R Foundation for Statistical Computing, Vienna, Austria; http://www.R-project.org/).
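For an ordinal grading system, the AUC can be read as the probability that a randomly chosen event case (R or P) receives a higher grade than a randomly chosen NE case. Ties count as one half. The following Python sketch computes this unadjusted AUC directly from that definition. It is only an illustration: the study used R and adjusted for clinical covariates, and the grades below are hypothetical values, not study data.

```python
from itertools import product

def roc_auc(scores_event, scores_no_event):
    """Rank-based (Mann-Whitney) AUC: probability that an event case scores
    higher than a no-event case, counting ties as one half."""
    if not scores_event or not scores_no_event:
        raise ValueError("both groups need at least one case")
    wins = 0.0
    for e, n in product(scores_event, scores_no_event):
        if e > n:
            wins += 1.0
        elif e == n:
            wins += 0.5
    return wins / (len(scores_event) * len(scores_no_event))

# Hypothetical grades for tumors that recurred vs. tumors with no event
recurred = [3, 2, 3, 4, 2]
no_event = [1, 2, 1, 2, 3, 1]
print(round(roc_auc(recurred, no_event), 3))  # 0.833
```

An AUC of 0.5 means the grade discriminates no better than chance, and 1.0 means perfect separation. The 0.7 threshold used in the Results lies between these.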

Results

Univariate analysis

For PUC-Ta, the following clinical and morphologic variables were associated with tumor recurrence:
- tumor number (odds ratio [OR] 0.34 [95% confidence interval, CI: 0.17-0.67]; p = 0.002);
- tumor size (OR 2.27 [95% CI: 1.05-5.07]; p = 0.0399);
- mitotic count (OR 1.03 [95% CI: 1.00-1.07]; p = 0.0468);
- mitotic level (OR 1.09 [95% CI: 0.24-4.83]; p = 0.010);
- capillary proliferation in fibrovascular cores (OR 1.05 [95% CI: 1.01-1.10]; p = 0.0136).

Nuclear pleomorphism showed borderline significance for association with recurrence of PUC-Ta (Table 3). The factors associated with PUC-Ta progression included patient age, cell density, nuclear pleomorphism, hyperchromasia, nuclear groove, prominent nucleoli, necrosis, mitotic count, and mitotic level. Capillary proliferation and apoptosis showed borderline statistical significance (Table 4). For PUC-T1, the whorling pattern was associated with recurrence, and mitotic level showed a borderline significant association with recurrence. Divergent histology was associated only with progression (Additional file 1: Tables S1 and S2).

Table 3 Univariate analysis of factors associated with recurrence of PUC-Ta
Table 4 Univariate analysis of factors associated with progression of PUC-Ta
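In logistic regression, each OR is exp(β) for a coefficient β. Its Wald 95% CI is exp(β ± 1.96·SE). For a continuous predictor such as mitotic count, the OR refers to a one-unit increase, so the effect over a wider range compounds multiplicatively. The Python sketch below illustrates these relationships. It assumes that the reported mitotic count OR of 1.03 is per additional mitosis; the paper does not state the unit. The coefficient and standard error in the first example are hypothetical.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% standard-normal quantile

def odds_ratio_with_ci(beta, se, z=Z_95):
    """Return (OR, lower, upper) for a logistic-regression coefficient
    using the Wald approximation."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def wald_p_value(beta, se):
    """Two-sided Wald p-value for H0: beta = 0 (i.e., OR = 1)."""
    return math.erfc(abs(beta / se) / math.sqrt(2))

def rescale_or(or_per_unit, units):
    """OR for an increase of `units` in a continuous predictor, assuming the
    effect is linear on the log-odds scale."""
    return or_per_unit ** units

# Hypothetical coefficient: beta = ln(2), SE = 0.30
beta, se = math.log(2.0), 0.30
or_, lower, upper = odds_ratio_with_ci(beta, se)
print(f"OR {or_:.2f} [95% CI: {lower:.2f}-{upper:.2f}]; p = {wald_p_value(beta, se):.4f}")

# Reported mitotic count OR of 1.03 (assumed per mitosis) over 10 mitoses
print(round(rescale_or(1.03, 10), 2))  # 1.34
```

Because both quantities use the same standard error, a Wald 95% CI excludes 1 exactly when the Wald p-value is below 0.05.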

Proposal of a new grading system using fewer, more objective histological variables to predict clinical outcome

Based on the univariate analysis results, three grading schemes were designed to predict the biological behavior of PUC-Ta. For PUC-T1, the univariate analysis indicated that tumor stage itself, rather than the histological parameters, determined biological behavior. Once the tumor had invaded the lamina propria/submucosa, the histological parameters had little impact on the clinical outcome. Therefore, our new grading system focused on predicting the behavior of PUC-Ta tumors. To build a simpler system from more objective and reproducible parameters, we chose four key histological parameters based on the univariate analysis: mitotic level, mitotic count, capillary proliferation, and nuclear pleomorphism. All four parameters had a statistically significant influence on both recurrence and progression of PUC-Ta, and all four were also quantifiable. Divergent histology was also included as a parameter. It was not statistically significant for either recurrence or progression of PUC-Ta, but it was significantly associated with progression of PUC-T1 tumors.

Because mitotic level appeared to be the most important morphological parameter in the univariate analysis, it was set as the first step of our proposed grading system. Tumors with mitotic level 1, 2, and 3 were assigned grades 1, 2, and 3, respectively. Tumors were upgraded by one grade (grade 1 to grade 2, grade 2 to grade 3, and grade 3 to grade 4) if any of the following unfavorable histological features was present:
- increased mitotic count (> 10 per 10 high-power fields);
- significant nuclear pleomorphism (largest-to-smallest tumor nuclear size ratio > 20);
- divergent histology;
- significant capillary proliferation (> 20 capillary lumina per papillary core).

We designed three similar but slightly different versions of this grading scheme.
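The two-step logic described above can be sketched in Python as follows. This is one reading of the general scheme as stated in this paragraph. It is not the exact implementation of any of the three variants, whose differences are not specified here. The class and field names are hypothetical, and the Discussion describes a stricter rule for assigning grade 4.

```python
from dataclasses import dataclass

@dataclass
class TaTumorFindings:
    """Hypothetical container for the H&E findings used by the scheme."""
    mitotic_level: int           # 1, 2 or 3, as scored in this study
    mitoses_per_10_hpf: int      # mitotic count per 10 high-power fields
    nuclear_size_ratio: float    # largest / smallest tumor nucleus
    divergent_histology: bool    # any divergent (e.g., glandular, squamous) component
    max_capillary_lumina: int    # lumina in the most vasoproliferative papillary core

def proposed_grade(f: TaTumorFindings) -> int:
    """Step 1: base grade = mitotic level. Step 2: upgrade by one grade if any
    unfavorable feature is present, so grades range from 1 to 4."""
    if f.mitotic_level not in (1, 2, 3):
        raise ValueError("mitotic level must be 1, 2 or 3")
    unfavorable = (
        f.mitoses_per_10_hpf > 10
        or f.nuclear_size_ratio > 20
        or f.divergent_histology
        or f.max_capillary_lumina > 20
    )
    return f.mitotic_level + 1 if unfavorable else f.mitotic_level

# Hypothetical tumors (not study cases)
print(proposed_grade(TaTumorFindings(2, 4, 8.0, False, 12)))   # 2
print(proposed_grade(TaTumorFindings(3, 14, 6.0, False, 9)))   # 4
```

All thresholds are strict inequalities ("more than"), matching the wording above. A tumor with exactly 10 mitoses per 10 high-power fields is therefore not upgraded.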

Comparison of our proposed grading system with previous grading systems

To investigate the diagnostic and prognostic utility of our proposed grading system, we compared its AUC values with those of the previous grading systems. All analyses were adjusted for age, gender, tumor size, and number of tumors to exclude the influence of factors other than histological parameters. For the prediction of PUC-Ta recurrence, the AUCs of the three previous grading systems were less than 0.7. In contrast, the AUCs of our proposed grading systems exceeded 0.7 (p < 0.05). However, the differences in AUC between the previous and proposed systems were not statistically significant (Table 5). For the prediction of PUC-Ta progression, the AUCs of all previous and new grading systems exceeded 0.7 (p < 0.05) (Additional file 1: Figure S1).

Table 5 Comparison of AUC for predicting PUC-Ta tumor recurrence between previous grading systems and our proposed grading system
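The paper does not state which test was used to compare AUCs. DeLong's test for correlated ROC curves is a common choice. As a simpler, swapped-in alternative, the Python sketch below estimates a paired bootstrap percentile 95% CI for the difference between two systems' AUCs. Cases are resampled with replacement, and both systems are always scored on the same resample. Unlike the study's analysis, this sketch is unadjusted for age, gender, tumor size, and tumor number, and the data are hypothetical.

```python
import random

def auc(pos_scores, neg_scores):
    """Mann-Whitney AUC: P(event score > no-event score), ties counted as 0.5."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

def paired_bootstrap_auc_diff(labels, score_a, score_b, n_boot=2000, seed=0):
    """Percentile 95% CI for AUC(a) - AUC(b) from a paired bootstrap."""
    if not (len(labels) == len(score_a) == len(score_b)):
        raise ValueError("labels and scores must have the same length")
    if set(labels) != {0, 1}:
        raise ValueError("labels must contain both 0 (no event) and 1 (event)")
    rng = random.Random(seed)
    n = len(labels)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        pos = [i for i in idx if labels[i] == 1]
        neg = [i for i in idx if labels[i] == 0]
        if not pos or not neg:
            continue  # an AUC needs at least one case of each outcome
        auc_a = auc([score_a[i] for i in pos], [score_a[i] for i in neg])
        auc_b = auc([score_b[i] for i in pos], [score_b[i] for i in neg])
        diffs.append(auc_a - auc_b)
    diffs.sort()
    k = int(0.025 * n_boot)
    return diffs[k], diffs[n_boot - 1 - k]

# Hypothetical data: 1 = recurrence, 0 = no event; grades from two systems
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
new_sys = [3, 4, 2, 3, 1, 2, 1, 2, 1, 3]
old_sys = [2, 2, 2, 3, 1, 2, 2, 2, 1, 2]
low, high = paired_bootstrap_auc_diff(labels, new_sys, old_sys)
print(f"95% CI for AUC difference: {low:.3f} to {high:.3f}")
```

If the interval includes 0, as it may with small samples, the AUC difference would not be considered significant at roughly the 5% level.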

Discussion

In this study, we attempted to find objective and reproducible histologic predictors of NMIPUC that correlate well with the clinical outcome and to compare them with the previous histological grading systems. We found that the mitotic level at the initial bladder biopsy was an independent predictor of PUC-Ta outcome. The number of mitoses, nuclear pleomorphism, divergent histology, and capillary proliferation within the fibrovascular core were also significant factors.

The EAU guidelines propose a three-tier risk stratification. In addition to tumor stage, tumor size, number of tumors, and association with CIS, histological grade is an important parameter for predicting progression [2, 5]. The 2004 and 2016 WHO grading systems were modified from the 1973 WHO classification, and in 2012 Cheng et al. proposed a further modified system. These systems are similar but show slight variations. Their parameters are assessed without well-defined criteria, which has led to suboptimal reproducibility [13,14,15]. Each parameter is rated by severity (mild/moderate/severe) or frequency (rare/occasional/frequent). In routine practice, pathologists often encounter a PUC of the bladder with high mitotic activity but only mild nuclear atypia and minimal loss of polarity. Conversely, they may encounter one with moderate nuclear pleomorphism and mild to moderate loss of polarity but no discernible mitotic activity. Grading such cases is not straightforward because none of the many criteria is weighted or prioritized over the others. This complicates grade assignment and lowers reproducibility.

We attempted to develop a simple and reproducible grading system that could predict the clinical outcome of NMIPUC of the bladder. Only cases with available initial biopsy specimens and without concurrent CIS were included. Initially, all 11 uropathologists evaluated all histologic parameters using individual light microscopes over three rounds. Twenty-five histological features were selected, each scored with a numerical parameter (e.g., a categorized grade or an absolute count). These included features that had not been evaluated before this study, such as mitotic level, number of mitoses, apoptosis level, necrosis, whorling pattern, and capillary proliferation. They also included other histological factors reported in the literature. Thereafter, two pathologists blindly evaluated all 296 cases to determine interobserver reproducibility. Some parameters appeared to be influenced by fixation and staining conditions. Therefore, intranuclear grooves and nucleolar prominence, which may be produced by procedural artifacts, were considered low-priority parameters.

In the univariate analysis of T1-stage tumors, only divergent histology correlated with progression. We considered that pathological stage, i.e., the presence of stromal (lamina propria/submucosal) invasion, outweighed the histological factors in dictating biological behavior. This finding is in accordance with the WHO recommendation that grading be performed only for noninvasive PUC (PUC-Ta), and with other reports in the literature [16]. Therefore, in this study, the histological predictive model was constructed only for noninvasive (Ta) tumors.

Unlike T1 tumors, Ta tumors had many clinical and histologic parameters that influenced the clinical outcome. Among the clinical factors, the number and size of tumors correlated with recurrence, whereas patient age was associated with progression. Among the histological factors, mitotic count, mitotic level, and capillary proliferation correlated with recurrence. Cell density, nuclear pleomorphism, hyperchromasia, nuclear groove, prominent nucleoli, and necrosis, as well as mitotic count and level, correlated with progression. Apoptosis and capillary proliferation showed borderline significance for progression.

Notably, mitotic count showed the highest OR for predicting both recurrence and progression of PUC-Ta. In the early twenty-first century, many studies focused on proliferation indices (Ki-67, AgNOR) of Ta/T1 urothelial carcinomas and reported them to be associated with tumor recurrence [17,18,19]. However, the impact of mitoses has not been fully evaluated, and no detailed mitotic cut-off value has been applied in the grading of PUC. This contrasts with epithelial cancers of other organs, such as low- vs. high-grade serous carcinoma of the ovary and histological grading of breast cancer [20, 21]. Our results indicate that mitotic count should be integrated into the histological grading of PUC.

The importance of mitotic count in the histological grading of NMIPUC has been emphasized previously [22, 23]. Pich et al. showed that a high proliferative index is the most important predictor of recurrence among PUNLMP and low-grade tumors [24]. Akkalp et al. also emphasized that higher mitotic activity (> 5 per single high-power field) is a strong predictor of recurrence in Ta PUC [25]. These studies indicate that proliferative activity can play an adjunctive role in histologic grading (even in low-grade tumors) and in predicting recurrence or invasiveness, as also shown in this study. However, the criteria for proliferative activity varied. They included mitotic counts per one or per 10 high-power fields at any level of the neoplastic epithelium, as well as various cut-off values for AgNOR and Ki-67. Because urothelial neoplasms are often bulky, mitotic counts per high-power field may be inconsistent and discordant between observers.
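Published cut-offs are expressed per one or per ten high-power fields (HPFs), and the area of an HPF depends on the microscope optics. Converting counts to mitoses per mm² makes such thresholds comparable. The Python sketch below assumes an eyepiece field number of 22 mm and a 40x objective for both studies, with no additional tube-lens factor. Neither study's optics are reported here, so the numbers only illustrate the order of difference between the two cut-offs.

```python
import math

def hpf_area_mm2(field_number_mm=22.0, objective_mag=40.0):
    """Area of one high-power field in mm^2. The field diameter at the
    specimen is the eyepiece field number divided by the objective
    magnification (assumed optics, no tube-lens factor)."""
    diameter = field_number_mm / objective_mag
    return math.pi * (diameter / 2) ** 2

def mitoses_per_mm2(count, n_fields, field_number_mm=22.0, objective_mag=40.0):
    """Convert a mitotic count over n_fields HPFs into mitoses per mm^2."""
    return count / (n_fields * hpf_area_mm2(field_number_mm, objective_mag))

# Cut-offs expressed per mm^2, assuming the same hypothetical optics for both
this_study = mitoses_per_mm2(10, 10)   # > 10 per 10 HPFs
akkalp = mitoses_per_mm2(5, 1)         # > 5 per single HPF
print(f"{this_study:.2f} vs {akkalp:.2f} mitoses/mm^2")  # 4.21 vs 21.05 mitoses/mm^2
```

Under these assumptions, the single-HPF cut-off corresponds to a mitotic density about five times higher than the 10-HPF cut-off.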

Mitotic level has not received much attention either. In our cohort, upper-level (level 3) mitoses correlated with an increased mitotic count and a worse clinical outcome. When the mitotic level of a bulky tumor is evaluated, immunostaining for the mitosis-specific marker phosphohistone H3 (PHH3) can help detect mitoses rapidly. PHH3 has been used for grading upper-tract urothelial carcinoma [26]. Mitotic level and count were measurable and reproducible, and they were the most statistically significant parameters in our univariate analysis. We therefore strongly recommend including them as essential parameters in the histological grading of PUC, even though identifying mitoses throughout an entire specimen requires considerable effort.

HG tumors in the WHO 2004 and 2016 classifications cover a wide range, from tumors just above low grade to highly anaplastic tumors. Recently, Cheng and colleagues proposed a four-tiered grading system. It adds grade 4, an anaplastic group separated from the usual HG tumors [3, 4, 27]. Because we agreed with the assignment of such an upper grade, the second step of our proposed grading scheme focused on identifying a more aggressive group. This step used four additional histological parameters: mitotic count, nuclear pleomorphism, capillary proliferation, and divergent histology. We assigned grade 4 to tumors with high-level mitoses and more than 10 mitoses per 10 high-power fields, plus any of the following: divergent histology, nuclear pleomorphism of more than 20-fold, or more than 20 capillary lumina per papillary core. The other two upgrading steps (grade 1 to grade 2, and grade 2 to grade 3) were similar to, but slightly different from, this scheme.

Capillary proliferation has been evaluated as the number of capillary lumina per cross-sectioned papillary core, and microvessel density (MVD) has been studied as a prognostic factor in many solid tumors [28, 29]. MVD could not be determined in this study because endothelial marker immunostaining was performed in only a limited number of Ta tumors. Instead, we evaluated neovascularization by light microscopy, counting the capillary lumina in the most vasoproliferative papillary core. The presence of more than 20 capillary lumina correlated with a worse clinical outcome.
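The scoring rule described above takes the maximum count over the cross-sectioned cores and compares it with a strict threshold. It can be sketched in Python as follows. The function names are hypothetical, and the per-core counts are invented for illustration.

```python
def max_capillary_lumina(lumina_per_core):
    """Capillary lumina in the most vasoproliferative cross-sectioned papillary core."""
    if not lumina_per_core:
        raise ValueError("no cross-sectioned papillary cores were counted")
    return max(lumina_per_core)

def has_capillary_proliferation(lumina_per_core, threshold=20):
    """True when the most vasoproliferative core has MORE than `threshold` lumina."""
    return max_capillary_lumina(lumina_per_core) > threshold

# Hypothetical counts from five cross-sectioned cores of one tumor
cores = [6, 14, 21, 9, 17]
print(max_capillary_lumina(cores), has_capillary_proliferation(cores))  # 21 True
```

Using the maximum rather than the mean reflects the "most vasoproliferative core" wording. A single highly vascular core is therefore enough to meet the criterion.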

Divergent histology was defined as identifiable histological features that differ from those of usual urothelial carcinoma. A significant number of high-grade urothelial carcinomas show glandular or squamous differentiation. In this study, tumors with divergent histology had a worse clinical outcome than pure urothelial carcinomas. Divergent histology could represent dedifferentiation driven by molecular events that result in a gain of function. Cheng et al. classified the following tumors with divergent differentiation as grade 4 tumors [3]:
- the nested, micropapillary, and plasmacytoid variants;
- sarcomatoid carcinoma;
- small-cell carcinoma;
- large-cell undifferentiated carcinoma;
- pleomorphic giant cell carcinoma.

In our univariate analyses, divergent histology was associated with progression of PUC-T1 but was not statistically significant in PUC-Ta (p = 0.2). Apart from divergent histology, most histological parameters played no significant role in the clinical outcome of PUC-T1. This indicates that divergent differentiation should be considered particularly in invasive carcinoma. The reduced significance of divergent histology for predicting the outcome of PUC-Ta may reflect its low frequency in Ta-stage tumors. Aggressive tumors with divergent histology were more apparent at the invasive (T1) stage and were not usually detected at the Ta stage. Thus, we included divergent histology as one of the adverse histological parameters for upgrading. Large cohort studies of PUC-Ta that include a substantial number of tumors with divergent histologic differentiation may be needed to verify whether this parameter has a clear biological impact.

Necrosis and apoptosis can be detected easily at low power, but distinguishing between them was difficult. In addition, degeneration of papillary cores with dystrophic calcification could be confused with necrosis.

The newly proposed grading system was compared with the previous grading systems. The difference in AUCs was not statistically significant. Nevertheless, the AUCs of the new grading system were larger than those of the previous systems for predicting PUC-Ta recurrence. The former exceeded 0.7 (p < 0.05), whereas the latter were less than 0.7. In addition, our proposed system relies on only a few, but the most powerful, histological parameters. These parameters are quantifiable rather than descriptive or subjective and are therefore more reproducible in practice. Therefore, if its power to predict the clinical outcome of PUC-Ta is similar, our system may be the better practical option for grading.

Because this study was not prospectively designed with a controlled biopsy and treatment protocol, treatment could not be taken into account as a factor in the clinical outcome. Resection alone could not be compared with intravesical chemotherapy or bacillus Calmette-Guérin (BCG) treatment among tumors of the same grade and stage. Nevertheless, this study is valuable because it provides a comprehensive analysis of all histological parameters, including mitotic level and count, through a nationwide multicenter study involving experienced uropathologists.

Improvement in diagnostic agreement should be verified using kappa statistics, but we were unable to do so in this study. In the near future, we will collect “gray zone” tumors that received divergent designations from different pathologists. We will then apply the new grading system to determine whether it improves diagnosis.
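Cohen's kappa corrects the observed agreement between two raters for the agreement expected by chance: κ = (p_o - p_e) / (1 - p_e). Here, p_e is computed from each rater's marginal grade frequencies. A minimal Python sketch for two raters follows, using hypothetical grades. For ordinal grades, a weighted kappa may be more appropriate. For more than two raters, such as the 11 pathologists here, Fleiss' kappa is a common alternative.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters grading the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and paired")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    categories = set(freq_a) | set(freq_b)
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / n ** 2
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)

# Hypothetical grades assigned by two pathologists to the same eight tumors
rater_1 = [1, 2, 2, 3, 4, 1, 2, 3]
rater_2 = [1, 2, 3, 3, 4, 1, 2, 2]
print(round(cohens_kappa(rater_1, rater_2), 3))  # 0.652
```

In this example, raw agreement is 75%, but kappa is lower (about 0.65) because some agreement is expected by chance alone.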

Conclusion

Mitotic level in the initial biopsy appears to be an independent predictor of PUC-Ta outcome. This finding could help distinguish between low- and high-grade tumors among borderline lesions. It may therefore help guide the choice of therapeutic strategy based on the initial biopsy of NMIPUC of the bladder.