Introduction

According to the 1998 World Health Organization/International Society of Urological Pathology (WHO/ISUP) classification [7, 23], the 1999 WHO blue book [15], and the 2004 WHO blue book [12], papillary urothelial neoplasms (PUNs) are categorized as urothelial papilloma (UP), inverted papilloma (IP), papillary urothelial neoplasm of low malignant potential (PUNLMP), low-grade papillary urothelial carcinoma (LGPUC), and high-grade papillary urothelial carcinoma (HGPUC). The main aim of the 1998 WHO/ISUP classification was to develop a universally accepted and reproducible classification system. However, reproducible classification and diagnostic criteria for non-invasive superficial PUNs, a group of well-known papillary tumors with a low incidence of progression, remain in dispute. In particular, the PUNLMP category has been the main point of discussion in the field since the introduction of the 1998 WHO/ISUP classification system.

UP (exophytic) and IP (inverted) are well defined because they usually present as a small, single mass. PUNLMP is defined as a papillary urothelial tumor that resembles exophytic UP except for increased cellular proliferation exceeding normal urothelial thickness [7, 12, 15, 23]. LGPUC is defined by complex papillary fronds with easily recognizable architectural and cytological variation, and tumors with moderate-to-marked atypia are defined as HGPUC [7, 12, 15, 23]. From a pathologic viewpoint, however, the diagnostic boundary between stage 0 (Ta) LGPUC and the non-carcinoma group is vague. By definition, the critical point in distinguishing PUNLMP from UP is the identification of a urothelium exceeding the accepted thickness. This step can be very subjective and confusing, depending on the plane of section [6]. Moreover, the architectural and cytologic atypia that favor LGPUC over PUNLMP or UP are even less reproducible. In a consensus survey of papillary urothelial tumors, marked discordance in the diagnosis of PUNLMP was observed even among the most experienced urologic pathologists [16, 23]. Thus, because of ambiguous diagnostic criteria, the clinical significance and biological behavior of PUNLMP remain inconclusive [5, 17–19].

We designed a diagnostic algorithm based on a scoring system of several histopathological findings. To validate it, we surveyed the clinical manifestations and recent 5-year follow-up data of 175 papillary urothelial tumors. We also performed ancillary studies of the MIB and p53 indices together with analysis of the CK20 expression pattern. While cases at either end of the MIB and p53 index spectrum were easily categorized as UP or LGPUC, cases in the gray zone were reviewed clinically and reassessed with our scoring system.

Materials and methods

Materials

One hundred seventy-five cases were selected from the medical records and archival slides of a large collection of papillary urothelial tumors in the Urologic Oncology Files of Yonsei University College of Medicine, dating from 1990.

Demographic and clinical data, including age, sex, site involved, tumor number, size, time to recurrence, frequency of recurrence, progression, associated malignancy, and survival time, were obtained from each patient’s medical records. Time to recurrence was calculated as the interval from complete response (no evidence of disease) to the first evidence of tumor recurrence during follow-up. All patients underwent transurethral resection (TUR) of the bladder tumor. After the initial TUR, follow-up was performed at 6-month intervals for 2 years and yearly thereafter, and patients did not receive any additional intravesical treatment until the first recurrence.

In the first round of the consensus review, all cases were independently re-evaluated by seven pathologists specializing in urologic pathology. Each participant rendered a diagnosis after a training session based on the 2004 WHO classification. After the first round, all participants reviewed all cases again using our new scoring system.

Scoring system

Well-known parameters from different point systems were adopted for the new scoring scheme. The most critical and highly reproducible findings, namely thickness of layers, mitosis, and cytologic atypia, are depicted in Fig. 1. Each parameter was weighted according to the consensus rate among peer reviewers: up to 3 points were given for the most reproducible criteria, thickness of layers and mitosis; up to 2 points for cytologic atypia; and 1 point for papillary fusion, a parameter with a wide range of variation.

Fig. 1

Histopathological findings of papillary urothelial tumors. a Thickness of layers. Each number is the corresponding score: 0, fewer than seven layers with intact umbrella cells; 1, fewer than seven layers with loss of umbrella cells; 2, more than seven layers with intact umbrella cells; 3, more than seven layers with loss of umbrella cells. b Mitosis. 1, fewer than five mitoses per ten high power fields (HPF); 2, five to ten mitoses per ten HPF; 3, more than ten mitoses per ten HPF. c Cytologic atypia. 0, atypia restricted to the surface, which is not considered true atypia; 1, diffuse, randomly distributed atypia of mild degree; 2, diffuse, randomly distributed atypia of moderate to severe degree

Because increased cellular proliferation exceeding seven layers is the minimal criterion for a score indicating a more progressive disease than PUNLMP, this criterion should be emphasized. As a general rule, we evaluated areas where the papillae were sectioned perpendicular to the surface and basement membrane, thus including the central fibrovascular stalk, and avoided obliquely or longitudinally cut areas. However, loss of umbrella cells can leave even an unequivocal LGPUC thinner than seven layers. Cellular thickness was therefore scored as follows: 0 points if all fronds had fewer than seven layers with intact umbrella cells; 1 point if any frond had fewer than seven layers but no obvious umbrella cells; 2 points if more than one frond had more than seven layers with intact umbrella cells; and 3 points if any frond had more than seven layers and no umbrella cells (Fig. 1a).
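The thickness rules above can be expressed as a small scoring function. This is a minimal sketch, not the authors' implementation: the `Frond` record and `thickness_score` function are hypothetical, "more than seven layers" is read as more than 7, and the text's "more than one frond" requirement for a score of 2 is exposed as the assumed parameter `min_thick_fronds`, because the Fig. 1a caption does not state a frond count.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Frond:
    """One perpendicularly sectioned papillary frond (hypothetical input record)."""
    layers: int            # number of urothelial cell layers counted
    umbrella_cells: bool   # True if intact umbrella cells cover the frond


def thickness_score(fronds: List[Frond], min_thick_fronds: int = 2) -> int:
    """Thickness-of-layers component (0-3); the highest matching rule wins.

    Assumptions: 'more than seven layers' means layers > 7, and a score of 2
    needs at least `min_thick_fronds` thick fronds with intact umbrella cells.
    """
    if not fronds:
        raise ValueError("at least one evaluable frond is required")
    thick = [f for f in fronds if f.layers > 7]
    thin = [f for f in fronds if f.layers <= 7]
    if any(not f.umbrella_cells for f in thick):
        return 3  # any frond > 7 layers without umbrella cells
    # Every remaining thick frond has intact umbrella cells.
    if len(thick) >= min_thick_fronds:
        return 2
    if any(not f.umbrella_cells for f in thin):
        return 1  # any thin frond without obvious umbrella cells
    return 0      # thin fronds with intact umbrella cells (or too few thick fronds)
```

With the default, a single thick frond with intact umbrella cells does not reach 2 points; passing `min_thick_fronds=1` follows the caption's wording instead.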

Mitotic figures were assessed by recording the total number of mitoses in ten randomly selected high power (×400) fields (HPF) in each case (Fig. 1b). Three points were given for more than ten mitoses per ten HPF, 2 points for five to ten mitoses per ten HPF, and 1 point for fewer than five mitoses per ten HPF. Samples without any mitosis scored 0.
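As a sketch only, the mitosis thresholds can be written as below. The function name is hypothetical, and counts of exactly five or exactly ten mitoses per ten HPF are assumed to fall in the 2-point band ("between five and ten").

```python
from typing import Sequence


def mitosis_score(counts_per_hpf: Sequence[int]) -> int:
    """Mitosis component (0-3) from counts in ten randomly chosen HPF (x400).

    Assumption: 'between five and ten' is read as 5-10 inclusive.
    """
    if len(counts_per_hpf) != 10:
        raise ValueError("expected counts from exactly ten high power fields")
    if any(c < 0 for c in counts_per_hpf):
        raise ValueError("mitotic counts cannot be negative")
    total = sum(counts_per_hpf)
    if total > 10:
        return 3
    if total >= 5:
        return 2
    if total >= 1:
        return 1
    return 0
```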

Cellular atypia is another parameter that can distinguish benign from malignant tumors, but it should be interpreted with caution because its distribution is variable. Atypia restricted to superficial cells is less significant in determining whether a sample is benign or malignant (Fig. 2). In our scoring system, samples with randomly and frequently distributed atypia of moderate to severe degree were given 2 points, and samples with randomly distributed mild atypia were given 1 point. Samples with no pleomorphism were given 0 points (Fig. 1c). Moderate to severe atypia is defined as atypia easily recognized at low power, whereas mild atypia appears as slight variation in nuclear size and chromatin clumping at high power. Cell morphology could be another additive feature for distinguishing benign from malignant tumors but was not included in this category: spindle cells frequently appear in benign cases, with an occasional gradual transition to epithelioid cells as the tumor progresses.
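A hedged sketch of the atypia rules follows. The enum and function names are hypothetical labels for the categories described in the text; the key design point is that distribution is checked before degree, so atypia restricted to the surface scores 0 regardless of its severity.

```python
from enum import Enum
from typing import Optional


class Distribution(Enum):
    NONE = "no pleomorphism"
    SURFACE_ONLY = "restricted to superficial cells"
    RANDOM = "randomly distributed through the epithelium"


class Degree(Enum):
    MILD = "slight nuclear size variation, chromatin clumping at high power"
    MODERATE_TO_SEVERE = "easily recognized at low power"


def atypia_score(distribution: Distribution, degree: Optional[Degree] = None) -> int:
    """Cytologic atypia component (0-2)."""
    if distribution in (Distribution.NONE, Distribution.SURFACE_ONLY):
        return 0  # surface-only atypia is not counted as true atypia
    if degree is None:
        raise ValueError("randomly distributed atypia needs a degree")
    return 2 if degree is Degree.MODERATE_TO_SEVERE else 1
```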

Fig. 2

Atypia in surface cells of urothelial papilloma or inverted papilloma. a Polypoid mass with an inverted pattern of urothelial nests. b Atypia (asterisk) is noted at the surface (circle) of the inverted nest. c Atypical cells are not immunoreactive for MIB. d Syncytial changes of atypical cells (asterisk) are restricted to the surface (circle). e MIB-positive cells are scattered, but MIB is not expressed in the atypical cells (asterisk) of the surface (circle)

Architectural atypia is another means of distinguishing papillary urothelial tumors. Papillary fusion and branching are easily recognized and fairly reproducible findings, and a maximum of 1 point was given for them: unequivocal papillary fusion scored 1 point, and no branching or fused papillae scored 0 points. Although a palisading basal layer is another clue to preserved polarity, it is not definitively diagnostic. However, in our experience this feature strongly supports a specific diagnosis of PUNLMP.

The entire scoring system is summarized in Table 1. Although the system seems clear-cut, a diagnostic conundrum occasionally arises when a sample's score falls in the boundary zone between categories. In such cases, ancillary studies can be used for further clarification.
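The four histologic components described in this section can be summed as sketched below, giving a range of 0 to 9. This is an illustration under stated assumptions: the function and constant names are hypothetical; the Results describe Table 1 as having six graded components, of which only these four are detailed in the text; and the score ranges assigned to each diagnostic category (Table 2) are not reproduced here, so the sketch stops at the raw total.

```python
from typing import Dict

# Maximum points per histologic component, as described in the text.
COMPONENT_MAXIMA: Dict[str, int] = {"thickness": 3, "mitosis": 3, "atypia": 2, "fusion": 1}


def histologic_total(thickness: int, mitosis: int, atypia: int, papillary_fusion: bool) -> int:
    """Sum the four histologic components described in the text (range 0-9).

    Unequivocal papillary fusion/branching adds 1 point; otherwise 0.
    """
    scores = {"thickness": thickness, "mitosis": mitosis,
              "atypia": atypia, "fusion": int(papillary_fusion)}
    for name, value in scores.items():
        if not 0 <= value <= COMPONENT_MAXIMA[name]:
            raise ValueError(f"{name} score {value} is outside 0-{COMPONENT_MAXIMA[name]}")
    return sum(scores.values())
```

Validating each component against its maximum catches transcription slips, such as entering a 3 for atypia, before they distort the total.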

Table 1 Diagnostic scoring system for papillary urothelial tumors

Immunohistochemical analysis

Three consecutive serial sections from each case were immunostained for MIB, p53, and CK20. In brief, 4-μm-thick sections of paraffin-embedded tissue were deparaffinized and rehydrated. After treatment with 3% hydrogen peroxide for 10 min to block endogenous peroxidase activity, the sections were boiled in 10 mmol/L citrate buffer (pH 6.0) in a microwave oven for 20 min. The sections were then incubated overnight at 4°C with the primary antibodies. After thorough rinsing in phosphate-buffered saline, the sections were treated with the DAKO LSAB (labeled streptavidin–biotin) kit, developed with aminoethyl carbazole, and counterstained with Mayer’s hematoxylin.

Quantitative method

The immunohistochemical staining results were assessed with a quantitative immunoreactivity score [(number of stained urothelial cells / total number of tumor cells) × 100]. The MIB index was determined by counting MIB-positive cells among 1,000 epithelial cells in the most intensely stained area at high power magnification (×400). The p53 index was calculated as the proportion of p53-positive cells among 1,000 epithelial cells and then semi-quantitatively regrouped as 0, 1 (<15%), and 2 (>15%). In normal urothelium, CK20 focally stains the surface cells; this focal and patchy staining pattern restricted to the surface (periphery) was considered negative in our study, and CK20 staining was defined as positive only when the entire thickness of the epithelium was diffusely stained. Nuclear staining for MIB and p53 and cytoplasmic and membranous staining for CK20 were regarded as positive. The staining results were analyzed by two independent investigators.
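The index calculations and the CK20 rule can be sketched as follows. Several points are assumptions not stated in the text: p53 category 0 is taken to mean no positive cells, an index of exactly 15% is grouped with category 2, and the solid pattern of Fig. 5c is treated as negative because only diffuse full-thickness staining counts as positive. All names are hypothetical.

```python
from enum import Enum


def labeling_index(positive_cells: int, total_cells: int = 1000) -> float:
    """Percentage of positive nuclei among counted epithelial cells."""
    if total_cells <= 0 or not 0 <= positive_cells <= total_cells:
        raise ValueError("positive cells must lie between 0 and the total counted")
    return 100.0 * positive_cells / total_cells


def p53_category(index_percent: float) -> int:
    """Semi-quantitative p53 group: 0, 1 (<15%), 2 (>15%)."""
    if index_percent == 0:
        return 0  # assumption: category 0 means no positive cells
    return 1 if index_percent < 15 else 2  # assumption: exactly 15% goes to 2


class CK20Pattern(Enum):
    SURFACE = "restricted to the surface"
    PATCHY = "patchy"
    SOLID = "solid, compact area"
    DIFFUSE = "diffuse through the entire epithelium"


def ck20_positive(pattern: CK20Pattern) -> bool:
    # Only diffuse full-thickness staining counts as positive in this study;
    # treating SOLID as negative is an assumption.
    return pattern is CK20Pattern.DIFFUSE
```

For example, 83 MIB-positive nuclei among 1,000 counted cells gives an index of 8.3.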

Recurrence and progression in long-term follow-up

Individual follow-up times were calculated as the number of months from the date of the diagnostic surgical procedure, usually transurethral resection of the bladder tumor, to the date of the most recent cystoscopy. Follow-up for the whole series ranged from 15 to 195 months (median, 127 months). A recurrence was defined as a tumor that was either fulgurated (no histopathological material) or resected and confirmed to be malignant by histological examination. Stage progression was defined as a recurrence with lamina propria invasion (pT1) or the development of metastatic disease.
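A minimal sketch of the follow-up interval and progression definitions is shown below. Assumptions: follow-up is counted in completed calendar months, and invasion deeper than the lamina propria (pT2 to pT4) is also treated as progression, although the text names only pT1 and metastasis. Function and constant names are hypothetical.

```python
from datetime import date
from typing import Optional

# Assumption: any stage at or beyond lamina propria invasion counts.
INVASIVE_STAGES = {"pT1", "pT2", "pT3", "pT4"}


def follow_up_months(surgery: date, last_cystoscopy: date) -> int:
    """Completed months from the diagnostic procedure to the latest cystoscopy."""
    if last_cystoscopy < surgery:
        raise ValueError("last cystoscopy precedes the diagnostic procedure")
    months = (last_cystoscopy.year - surgery.year) * 12 + (last_cystoscopy.month - surgery.month)
    if last_cystoscopy.day < surgery.day:
        months -= 1  # the final month is not yet complete
    return months


def is_stage_progression(recurrence_stage: Optional[str], metastasis: bool) -> bool:
    """Progression: a recurrence invading the lamina propria, or metastasis."""
    return metastasis or recurrence_stage in INVASIVE_STAGES
```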

Statistical analysis

Statistical analysis was carried out using SPSS (Windows version 10.0). Student’s t test was used to compare continuous variables. Correlations between immunoreactive variables were analyzed with the Pearson correlation coefficient, and frequency distributions were generated using descriptive statistics (analysis of variance). Associations of histological categories with immunoreactivity and with tumor recurrence or progression were assessed with the chi-square test. A p value < 0.05 was considered statistically significant. Inter-observer agreement (kappa) was measured for each diagnostic category and for each score component, and was interpreted as follows: 0–0.20, slight; 0.21–0.40, fair; 0.41–0.60, moderate; 0.61–0.80, substantial; and 0.81–1.00, almost perfect.
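The text does not name the kappa statistic used for seven raters. Fleiss' kappa is a common choice for more than two raters, so the sketch below uses it as an assumption; it also maps a kappa value onto the interpretation bands listed above, labeling negative values separately because they fall outside that scale. The input is a hypothetical table with one row per case and one column per diagnostic category, each cell holding how many pathologists chose that category.

```python
from typing import List


def fleiss_kappa(counts: List[List[int]]) -> float:
    """Fleiss' kappa for a cases x categories table of rating counts."""
    if not counts:
        raise ValueError("no cases supplied")
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("at least two raters are required")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every case must be rated by the same number of raters")
    n_cases = len(counts)
    n_categories = len(counts[0])
    # Observed agreement per case, then averaged across cases.
    per_case = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
                for row in counts]
    p_observed = sum(per_case) / n_cases
    # Chance agreement from the overall share of ratings in each category.
    shares = [sum(row[j] for row in counts) / (n_cases * n_raters)
              for j in range(n_categories)]
    p_chance = sum(s * s for s in shares)
    if p_chance == 1:
        raise ValueError("kappa is undefined when every rating is in one category")
    return (p_observed - p_chance) / (1 - p_chance)


def kappa_label(kappa: float) -> str:
    """Map kappa onto the interpretation bands used in this study."""
    if kappa < 0:
        return "below chance (outside the study's scale)"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"
```

Applied to the values reported in the Results, `kappa_label` returns "moderate" for 0.587 and "almost perfect" for 0.832.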

Results

In the first round, the seven pathologists unanimously agreed on the diagnosis in 97 of the 175 cases (55.4%). Consensus was lowest for PUNLMP (27.3%) and highest for HGPUC (92.85%). Overall inter-observer agreement in the first round was moderate (kappa = 0.587). After the first round, all study participants reviewed all cases together and reached a consensus diagnosis. This consensus diagnosis served as the basis for the newly designed scoring system, combined with ancillary tools based mainly on the MIB index and clinical factors.

The scoring system, as shown in Tables 1 and 2, consists of six separately graded components, whose scores are summed to give a total score corresponding to a diagnostic category. Inter-observer agreement improved significantly with the adoption of the new scoring scheme in the second round (kappa = 0.832). Agreement on the individual components ranged from moderate to substantial: kappa was 0.754 for mitosis (substantial), 0.635 for cellular thickness (substantial), and 0.556 for cellular atypia (moderate). For PUNLMP, the category with the least consensus, inter-observer agreement improved substantially (kappa = 0.670). The most consistent low-power findings in PUNLMP were long, slender papillae that were partly fused and branched, without obvious cytologic atypia (Fig. 3). Basal palisading in each papilla also supports a diagnosis of PUNLMP (Fig. 3c, d). It should be pointed out that a thickness of more than seven layers with loss of surface cells in any papilla should never be seen in PUNLMP. Mitosis was the most reproducible criterion for PUNLMP, and randomly scattered mitoses were the most important finding against a diagnosis of PUNLMP.

Fig. 3

Visualization of PUNLMP with light microscopy. a Long, slender, branched papillae are a characteristic histological clue to a diagnosis of PUNLMP. b Branched papillae are fused. c A basal palisading feature is another indicator of PUNLMP. d Branched, fused papillae with basal palisading are a characteristic feature of PUNLMP

Table 2 Total score ranges in papillary urothelial tumors

Clinicopathological classification of PUN

The mean patient age at first diagnosis of UP, PUNLMP, LGPUC, and HGPUC was 62.6, 68.4, 67.1, and 69.3 years, respectively. The following clinical parameters were recorded: single vs. multiple, small (up to 2 cm) vs. large (more than 2 cm), no recurrence vs. recurrence, and no progression vs. progression. All UP cases were single and small, but one patient (14.3%) had a recurrence. No UP case progressed or was associated with LGPUC. IP was predominantly single, but 27.3% (6/22) of cases were multiple, and 13.6% (3/22) were associated with LGPUC. In these three cases, the IP itself was typically benign and easily differentiated from inverted urothelial carcinoma. No IP case progressed to LGPUC. PUNLMP usually presented as a single, small lesion; 13.6% (3/22) of PUNLMP cases had two or more lesions, all smaller than 1 cm. Of the PUNLMP patients, 18.2% (4/22) had a late recurrence (average, 21.5 months) and 9.1% (2/22) progressed to LGPUC (after 5 and 56 months, respectively), but none showed stage progression to muscularis propria invasion. LGPUC recurred significantly more often, in 70.4% (38/54) of patients, and 9.3% (5/54) progressed to a higher stage.
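The percentages in this section and in the first-round results can be checked against their stated fractions with a few lines of arithmetic. This sketch uses only numbers given in the text and tests agreement to within rounding to one decimal place; the dictionary name is hypothetical.

```python
import math

# (numerator, denominator, percentage as stated in the text)
REPORTED = {
    "first-round unanimous diagnoses": (97, 175, 55.4),
    "multiple IP": (6, 22, 27.3),
    "IP associated with LGPUC": (3, 22, 13.6),
    "PUNLMP with two or more lesions": (3, 22, 13.6),
    "PUNLMP late recurrence": (4, 22, 18.2),
    "PUNLMP progression to LGPUC": (2, 22, 9.1),
    "LGPUC recurrence": (38, 54, 70.4),
    "LGPUC stage progression": (5, 54, 9.3),
}


def percent(numerator: int, denominator: int) -> float:
    return 100 * numerator / denominator


# Flag any stated value that is not the one-decimal rounding of its fraction.
mismatches = [label for label, (num, den, stated) in REPORTED.items()
              if not math.isclose(percent(num, den), stated, abs_tol=0.05)]
print(mismatches or "all reported percentages match their fractions")
# prints: all reported percentages match their fractions
```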

MIB, p53 indices, and cytokeratin 20 expression according to papillary urothelial tumors

Immunohistochemical results for the papillary urothelial tumors classified with the new scoring system are summarized in Table 3. The MIB index varied and increased in proportion to disease progression (Fig. 4). The mean MIB index was 5.85, 8.29, 38.74, and 58.32 in UP, PUNLMP, LGPUC, and HGPUC, respectively (p < 0.001). The MIB index was significantly higher in recurrent than in non-recurrent PUNLMP (16.5 vs. 8.1, p < 0.001). Atypical surface cells occasionally seen in papillomas were not immunoreactive for MIB (Fig. 2c–e). p53 expression varied among urothelial tumors and showed no significant association with recurrence, despite an apparent tendency to increase with disease progression. CK20 was diffusely immunoreactive in all layers of some PUNs, whereas it was negative or focally reactive in the superficial cells of UP and IP (Fig. 5). However, CK20 staining had no significant association with clinicopathological variables.

Fig. 4

MIB index of LGPUC. a Areas of thick, trabeculated papillae (closed circle) contrast with areas of PUNLMP (semicircle). b Many scattered MIB-positive cells are seen in the thick trabeculae, which represent LGPUC. c Few MIB-positive cells are seen in areas of PUNLMP

Fig. 5

Cytokeratin 20 expression patterns in PUNs. a CK20 expression restricted to the surface. b Patchy areas of CK20 immunoreactivity. c Solid, compact area of positive expression. d Diffuse CK20 expression

Table 3 Immunohistochemical results in papillary urothelial tumors

Discussion

PUNs share an indolent course and cytologically banal features. Low-grade, non-invasive PUNs, including benign papillomas, PUNLMP, and LGPUC, are often difficult to separate with robust classification criteria. The PUNLMP category was introduced by the 1998 WHO/ISUP classification system [7, 23] to replace the previously designated 1973 WHO grade 1 urothelial carcinoma. Murphy [17] interpreted some 1973 WHO grade 1 tumors as PUNLMP and others as LGPUC; Bostwick and Mikuz [3] translated the 1973 WHO grade 1, 2, and 3 tumors as PUNLMP, LGPUC, and HGPUC, respectively; and Reuter and Melamed [20] interpreted the 1973 WHO grade 1 tumors as PUNLMP, grade 2 as LGPUC or HGPUC, and grade 3 as HGPUC.

Histopathologically, both architectural and cellular atypia are prime parameters, including thickness of papillae, fused papillae, cellular pleomorphism, and mitosis. However, these findings often overlap. In a multi-institutional survey by the Korean Urologic Pathology Society, diagnoses of PUNs showed little consensus among urologic pathologists, despite the defined criteria of the WHO guidelines. The substantial discrepancies we observed are similar to data reported from other institutions [16, 23]. Architectural atypia, such as a thickness of more than seven layers and fused papillae, is very useful in differentiating benign from more advanced lesions but is sometimes confusing because of tangential sectioning. In general, cytological atypia is much more reproducible than architectural atypia. Cellular atypia may be confined to the superficial layers or randomly distributed, and it may be mild or severe. We found that the former lacks Ki-67 (MIB) expression, which may indicate degenerative change rather than proliferation. Cytologic atypia restricted to the superficial layer can be found in benign urothelial tumors [4, 6]. Mitosis may be the most representative and highly reproducible variable when scored as the total number of mitoses in ten randomly selected HPF (×400). In the benign category of PUN, mitoses were rare but regularly located near the basal layer of the urothelium. The loss of umbrella cells, degree of nuclear groove preservation, cytologic spindling or epithelioid features, and palisading basal layers can be additional supportive findings, but they were not used as independent parameters in this algorithm because they appear too subjective and less reproducible. For instance, the loss of umbrella cells may be caused by manipulation or the biopsy procedure. Conversely, well-preserved umbrella cells can be observed in overt carcinoma.
However, basal palisading in each papilla can be useful in PUNLMP or IP, whereas overt epithelioid rather than spindled features can favor carcinoma over PUNLMP or papilloma.

To achieve a reproducible diagnosis of PUN, the Korean Urologic Pathology Society recently proposed a scoring method based on pathological parameters. The main principle of this scoring system is that each parameter is weighted according to its diagnostic implication. For instance, the more concordant parameters, mitosis and thickness of layers, were scored up to 3 points, atypia up to 2 points, and fused papillae 1 point. With this system, the diagnostic categories can be clearly separated, and PUNs were categorized with a high consensus rate. A useful tool for routine microscopy would be an algorithm derived from our peer-review data in the form of a flow diagram, starting with features seen at low power (modest thickness with focal fused papillae), progressing to medium power (no obvious atypia) and high power (scanty mitoses), and ending with a diagnosis of PUNLMP.
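The proposed low-to-high-power flow can be sketched as a sequence of checks. This is a hypothetical illustration of the flow diagram described above, not a validated tool: the observation flags are assumed inputs recorded by the pathologist, and any case that leaves the flow early is returned for full scoring rather than assigned another diagnosis.

```python
from dataclasses import dataclass


@dataclass
class StepwiseFindings:
    """Hypothetical observation flags, one group per magnification step."""
    modest_thickening: bool      # low power
    focal_fused_papillae: bool   # low power
    obvious_atypia: bool         # medium power
    scanty_mitoses: bool         # high power


def flow_toward_punlmp(f: StepwiseFindings) -> str:
    """Walk the low -> medium -> high power checks described in the text."""
    if not (f.modest_thickening and f.focal_fused_papillae):
        return "low power: pattern not typical of PUNLMP; apply the full score"
    if f.obvious_atypia:
        return "medium power: obvious atypia argues against PUNLMP; apply the full score"
    if not f.scanty_mitoses:
        return "high power: scattered mitoses argue against PUNLMP; apply the full score"
    return "consistent with PUNLMP"
```

Returning early at the first failed step mirrors the intended reading order at the microscope, so higher-power findings are only consulted once the low-power pattern fits.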

Useful ancillary findings for this new scoring system were the MIB index and the clinical findings of multiplicity, size, and recurrence. Because the MIB index is widely known as an independent prognostic predictor of recurrence [2, 10, 14], we used it to help distinguish among PUNs. Most authors have reported a cut-off of approximately 10% for the MIB index as a predictor of recurrence and poor prognosis. Even with the same method, however, the MIB index varies with fixation time [22]: the MIB index in specimens fixed for 50 h (over the weekend) was significantly lower than in specimens fixed overnight. High p53 and MIB expression have been reported as significant prognostic factors in univariate analysis of bladder cancer [8], and p53 as an independent prognostic factor [21].

CK20 showed diffuse immunoreactivity in all layers of certain PUNs but was negative or focally reactive in the superficial cells of UP and IP. Diffuse, strong expression of CK20 was correlated with tumor severity. Although the implications of CK20 expression loss remain highly controversial, Harnden et al. [9] found decreased CK20 expression to be an independent predictor (p < 0.0001) of tumor recurrence in papillary tumors. Unlike the findings reported by Alsheikh et al. [1], our data did not reveal a significant correlation between CK20 expression and tumor recurrence.

In our survey of clinicopathological correlations, UP recurred in 14.3% of patients but showed no progression or association with LGPUC. IP was associated with LGPUC in 13.6% of cases but never progressed to LGPUC. PUNLMP presented with two or more lesions in 13.6% of cases, but all lesions were smaller than 1 cm. Of the patients with PUNLMP, 18.2% had a late recurrence (average, 21.5 months) and 9.1% progressed to LGPUC (after 5 and 56 months). Of the patients with LGPUC, 70.4% (38/54) had recurrence, a significantly higher frequency, and 9.3% (5/54) progressed to a higher stage. Given the changing nomenclature and diagnostic criteria of PUNLMP, its biological behavior remains controversial.

Jordan et al. [13] concluded from a study of 91 grade 1 urothelial tumors that these low-grade lesions, usually designated transitional cell carcinoma grade 1, are benign and should be called papillomas rather than carcinomas. In contrast, Cheng et al. [6], in a study of 112 PUNLMP cases, found that patients with PUNLMP are at risk of cancer progression and death from bladder cancer. Thus, it is prudent not to regard this group of low-grade papillary urothelial neoplasms as benign. Pich et al. [18] studied 19 PUNLMP cases and found a relatively high recurrence rate (47.4%) with no progression, in accordance with the study of 95 PUNLMP cases by Holmang et al. [11]. In our series, PUNLMP showed late recurrence and progression at relatively low frequencies, a behavior consistent with low malignant potential. Statistical variations in recurrence and progression depend on many factors, including follow-up intervals, definitions of recurrence and progression, and sample size. Nevertheless, reaching an accurate diagnosis may be most important. We therefore developed an algorithm with which more uniform diagnoses with fewer discrepancies can be achieved.

In conclusion, the mitotic index and thickness of layers, together with cytologic atypia, could be useful and reproducible parameters for a scoring algorithm to discriminate among papillary urothelial tumors. Based on the new scoring algorithm, PUNLMP can be regarded clinicopathologically as an intermediate category between UP and LGPUC.