Diagnostic significance of reassessment of prostate biopsy specimens by experienced urological pathologists at a high-volume institution

In prostate cancer, accurate diagnosis and grade group (GG) assignment based on biopsy findings are essential for determining treatment strategies. Diagnosis by experienced urological pathologists is recommended; however, its contribution to patient benefit remains unknown. Therefore, we analyzed clinicopathological information to determine the significance of reassessment by experienced urological pathologists at a high-volume institution and to identify factors involved in the agreement or disagreement between biopsy and surgical GGs. In total, 1325 prostate adenocarcinomas were analyzed, and the GG was changed in 452/1325 (34.1%) cases (359 upgraded and 93 downgraded). We compared the highest GG based on biopsy specimens with the final GG based on surgical specimens in 210 cases. The agreement rate between the surgical GG assessed at our institute and the highest biopsy GG assessed by an outside pathologist was 34.8% (73/210); the agreement rate increased significantly to 50% (105/210) when the biopsy specimens were reevaluated at our institute (chi-square test, P < 0.01). Multivariate logistic regression analysis showed that only the length of the lesion in the positive core with the highest GG in the biopsy was a significant factor for agreement between biopsy GG and surgical GG, with an odds ratio of 1.136 (95% confidence interval: 1.057–1.221; P < 0.01). Thus, reassessment by experienced urological pathologists at a high-volume institution improved the agreement rate. However, it should be noted that discordance with the surgical GG is highly probable when the biopsy contains few or short lesions.


Introduction
The incidence of prostate cancer is rising worldwide [1,2]. Various treatment options are available, including hormonal therapy, radiation, carbon-ion radiotherapy, and radical prostatectomy [3][4][5]. Along with clinical information, including serum prostate-specific antigen (PSA) levels and imaging findings, histopathological diagnosis based on prostate needle core biopsy is important for selecting the most appropriate treatment [6][7][8]. A biopsy report requires the presence/absence of cancer, the histological type, and the Gleason score (GS) or grade group (GG). In particular, the highest GG among positive cores has a significant impact on treatment strategy [9][10][11]. For example, if the serum PSA level is low and the patient has GG1, active surveillance is recommended [12]. Patients with GG4 or higher are classified as high risk under the D'Amico classification on the basis of histology alone, and extended pelvic lymph node dissection is considered [13]. However, prostate needle biopsies are not always performed at the treating institution; they are sometimes performed at the referring institute. At smaller facilities, the diagnosis is not always made by experienced pathologists. Although experienced urological pathologists provide more accurate GG assessments than inexperienced pathologists [14][15][16], few studies have verified whether reassessment is truly beneficial for the patient [17]. As our institution is a high-volume institution with experienced urological pathologists, we often reassess specimens from other institutes. In this study, we sought to determine the significance of reassessment at a high-volume institution and the factors involved in the agreement or disagreement between the GG based on prostate needle core biopsy and that based on surgical specimens.

Study design and population
Patients who were referred to our institute between November 2018 and September 2021, and who underwent reassessment of prostate needle core biopsy results taken at another institute, were included in this retrospective study. All biopsy specimens were reassessed by our pathologists.

Data collection
In addition to reassessment, the following parameters were collected from the pathology request forms and electronic medical records and by contacting the referring institutes: age, serum PSA level, histological diagnosis, GG determined by outside pathologists, reassessed highest GG at our institute, number of biopsies obtained, number of positive cores, lesion length of the highest GG, and history of hormone therapy. Global scoring should be adopted only if it can be strictly determined that the biopsies were taken from the same area; in this study, however, biopsies were obtained at outside institutes, and specimen preparation methods vary between institutions. Although most sampling areas could be confirmed from the reports provided by outside institutions, this was not always the case. Further, few institutes took multiple biopsies from the same lesion, so we adopted the highest GG among the individual positive cores. For tertiary patterns, we followed the guidance of the 2019 International Society of Urological Pathology Consensus Conference [18] and included tertiary high-grade patterns in the GG regardless of their percentage (e.g., a needle biopsy with 70% Gleason pattern 4, 27% pattern 3, and 3% pattern 5 would be reported as GS 4 + 5 = 9; GG5). Unfortunately, many pathologists who do not specialize in urology are unfamiliar with the evaluation of intraductal carcinoma of the prostate (IDC-P), and most reports from outside institutes therefore contained no IDC-P findings. Following previous reviews [19], we excluded IDC-P lesions from the GG assessment in this study. We also confirmed whether the reassessed patients subsequently underwent radical prostatectomy at our institute. When a surgical specimen contained multiple lesions, the lesion with the highest GG was included in the analyses. We also collected data on changes in lymph node dissection criteria resulting from changes in GG.
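The per-core rule described above (the most extensive pattern as primary, any higher-grade pattern as secondary regardless of its amount, then the highest GG across individual positive cores) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' procedure: the function names are hypothetical, ties in pattern extent are not handled specially, and further ISUP refinements (for example, rules for minor lower-grade patterns) are deliberately omitted.

```python
from collections.abc import Iterable, Mapping


def biopsy_gleason(patterns: Mapping[int, float]) -> tuple[int, int]:
    """Return (primary, secondary) Gleason patterns for one positive core.

    `patterns` maps a Gleason pattern (3, 4 or 5) to its percentage of tumor.
    Assumed rule (sketch): the most extensive pattern is primary; the highest
    pattern above the primary is secondary regardless of its amount; otherwise
    the most extensive remaining pattern is secondary; one pattern is doubled.
    """
    present = {p: pct for p, pct in patterns.items() if pct > 0}
    if not present:
        raise ValueError("core has no tumor pattern")
    primary = max(present, key=present.get)
    others = [p for p in present if p != primary]
    higher = [p for p in others if p > primary]
    if higher:
        secondary = max(higher)
    elif others:
        secondary = max(others, key=present.get)
    else:
        secondary = primary
    return primary, secondary


def grade_group(primary: int, secondary: int) -> int:
    """Map a Gleason score to the ISUP grade group (GG1-GG5)."""
    total = primary + secondary
    if total <= 6:
        return 1
    if total == 7:
        return 2 if primary == 3 else 3
    return 4 if total == 8 else 5


def highest_core_gg(cores: Iterable[Mapping[int, float]]) -> int:
    """Highest GG among individual positive cores (no global scoring)."""
    return max(grade_group(*biopsy_gleason(core)) for core in cores)


# Worked example from the text: 70% pattern 4, 27% pattern 3, 3% pattern 5
example = {4: 70, 3: 27, 5: 3}
print(biopsy_gleason(example), grade_group(*biopsy_gleason(example)))  # (4, 5) 5
```

Taking the maximum over cores, rather than pooling all cores into one global score, mirrors the choice described above for biopsies whose sampling sites could not always be confirmed.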

Assessment of the biopsy and surgical specimens
Biopsy GGs were assigned independently by the pathologists (YO, EY, MS, and KW); in cases of uncertainty, two or more pathologists discussed their opinions. The GG was assigned strictly according to the World Health Organization Classification of Tumours of the Urinary System and Male Genital Organs [20], which reflects the consensus of the prostate cancer grading conference held by The International Society of Urological Pathology in Chicago in 2014 [21]. If disagreement persisted, priority was given to the decision of YO, who had diagnosed more prostate biopsies than the others (15 years of experience). For surgical specimens, the first pathologist (YO or SS) described the primary pathology findings, and the specimens were reviewed by a second pathologist (YM) using a multi-viewing biological microscope. In case of disagreement, the three pathologists discussed the diagnostic findings; when consensus was not reached, priority was given to the opinion of YM, who had the longest experience in prostate cancer diagnosis (over 30 years).

Statistical analyses
The chi-square test was used to compare the agreement between the highest GG based on preoperative biopsy findings (the original highest GG by the outside pathologist and the reassessed highest GG at our institute) and the final GG based on surgical specimen findings. Furthermore, the relationship between the reassessed highest GG at our institute and the final surgical GG was examined using adjusted residuals. Adjusted residuals with an absolute value of ≥ 1.96 were considered significant; values ≥ +1.96 were interpreted as a tendency toward higher agreement and values ≤ -1.96 as a tendency toward lower agreement. To statistically evaluate the differences between our diagnoses and outside diagnoses, we also used Cohen's weighted kappa coefficient with quadratic weights to analyze the agreement of GG between biopsy and surgery; values closer to 1 indicate higher agreement. Further, multivariate logistic regression analysis was performed to identify factors related to the agreement between the highest GG based on preoperative biopsy findings and the GG based on surgical specimen findings. The dependent variable was agreement or disagreement between these two GGs. Explanatory variables included age, serum PSA level, biopsy GG, number of biopsies obtained, number of positive cores, number of positive cores with the highest GG, and lesion length in the positive core with the highest GG (the longest lesion if more than one core had the highest GG). Statistical significance was set at P < 0.05. Statistical analyses were performed using IBM SPSS Statistics (IBM Corp., Armonk, NY, USA). Hormone-treated cases and cases other than adenocarcinoma were treated as missing values and excluded.
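For readers unfamiliar with the quadratic-weighted kappa, a minimal pure-Python sketch is shown below. It computes the point estimate only (confidence intervals are not reproduced), assumes both raters use GG1-GG5, and uses a hypothetical function name; it is an illustration of the statistic, not the SPSS procedure used in the study.

```python
from collections.abc import Sequence


def quadratic_weighted_kappa(a: Sequence[int], b: Sequence[int],
                             categories: Sequence[int] = (1, 2, 3, 4, 5)) -> float:
    """Cohen's kappa with quadratic disagreement weights (point estimate)."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty rating lists")
    k, n = len(categories), len(a)
    index = {c: i for i, c in enumerate(categories)}
    observed = [[0] * k for _ in range(k)]
    for x, y in zip(a, b):
        observed[index[x]][index[y]] += 1
    row = [sum(r) for r in observed]
    col = [sum(c) for c in zip(*observed)]
    obs_dis = exp_dis = 0.0
    for i in range(k):
        for j in range(k):
            w = (i - j) ** 2 / (k - 1) ** 2   # 0 on the diagonal, 1 at the corners
            obs_dis += w * observed[i][j] / n
            exp_dis += w * row[i] * col[j] / n ** 2
    # Undefined (ZeroDivisionError) if exp_dis == 0, e.g. only one category used
    return 1.0 - obs_dis / exp_dis
```

For example, `quadratic_weighted_kappa([1, 2, 3], [1, 2, 3])` returns 1.0 for perfect agreement. Because the weights grow with the square of the distance, a biopsy-surgery disagreement of two grade groups is penalized four times as heavily as a one-group disagreement.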

Overall findings
We reassessed 1334 prostate needle core biopsies obtained from outside institutes between January 2018 and September 2021. In four cases, the diagnosis was changed to atypical glands because adenocarcinoma could not be definitively identified; two cases were changed to small cell carcinoma, two to sarcoma, and one to prostatic invasion of urothelial carcinoma. Of the remaining 1325 cases, 248 (18.7%) underwent radical prostatectomy at our institution; of these, 36 had received preoperative hormone therapy after biopsy, and 2 had prostate sarcoma (Table 1). For the remaining 210 cases, we compared the highest GG based on preoperative biopsy findings (the original highest GG by outside pathologists and the reassessed highest GG at our institute) with the final GG based on surgical specimen findings. The agreement rate between the original highest GG by outside pathologists and the surgical GG was 34.8% (73/210), whereas that between the reassessed highest GG at our institute and the surgical GG was 50% (105/210), a significant increase (chi-square test, P < 0.01). In 79/1325 (6%) cases, a carcinoma lesion had been missed. The average length of the missed lesions was 0.71 mm (Table 1, Fig. 1A, B); the missed lesion measured < 1 mm in 63/79 (79.7%) cases and < 2 mm in 74/79 (93.7%) cases. A missed positive core changed the highest GG in only one case.
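The overall comparison (73/210 vs. 105/210) can be illustrated with a hand-rolled 2 × 2 Pearson chi-square test. This is a sketch under an assumption: the exact contingency table used in the study is not specified, so here the outside and reassessed readings are treated as two independent groups of 210 (agree vs. disagree). Because both readings concern the same 210 cases, this is a simplification, and the sketch is not expected to reproduce every P value reported in these Results. The function name is hypothetical.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int,
                   yates: bool = False) -> tuple[float, float]:
    """Pearson chi-square test (df = 1) for the table [[a, b], [c, d]].

    Returns (statistic, P). For df = 1 the upper-tail probability of the
    chi-square distribution equals erfc(sqrt(x / 2)).
    """
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    stat = 0.0
    for obs, i, j in ((a, 0, 0), (b, 0, 1), (c, 1, 0), (d, 1, 1)):
        expected = rows[i] * cols[j] / n
        diff = abs(obs - expected)
        if yates:                       # optional continuity correction
            diff = max(diff - 0.5, 0.0)
        stat += diff ** 2 / expected
    return stat, math.erfc(math.sqrt(stat / 2))


N = 210
AGREE_OUTSIDE, AGREE_REASSESSED = 73, 105   # counts reported above
stat, p = chi_square_2x2(AGREE_OUTSIDE, N - AGREE_OUTSIDE,
                         AGREE_REASSESSED, N - AGREE_REASSESSED)
print(f"{AGREE_OUTSIDE / N:.1%} vs {AGREE_REASSESSED / N:.1%}; "
      f"ratio {AGREE_REASSESSED / AGREE_OUTSIDE:.2f}; chi2 = {stat:.2f}; P = {p:.4f}")
```

With or without the continuity correction (`yates=True`), the statistic for these counts is roughly 9.4 to 10.0, corresponding to P < 0.01; the ratio 105/73 ≈ 1.44 is the approximately 1.5-fold improvement in agreement.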

Reassessment results
In 873/1325 (65.9%) cases, the original highest GG and the reassessed highest GG at our institute agreed. Of these 873 patients, 159 underwent radical prostatectomy at our institute, of whom 29 had received hormone therapy after biopsy, which precluded a comparison of the GGs. In the remaining 130 cases, the agreement between the highest biopsy GG and the surgical GG was 49.2% (64/130).
In 452/1325 (34.1%) cases, the original highest GG by outside pathologists and the reassessed highest GG at our institute disagreed. Of these, 359 (79.4%) were upgraded on reassessment at our institute: 79 (22.0%) from GG1 to GG2, 109 (30.4%) from GG2 to GG3, 89 (24.8%) from GG3 to GG4, 31 (8.6%) from GG4 to GG5, and 51 (14.2%) by two or more groups. Among these, 70 patients underwent radical prostatectomy at our institute, 5 of whom had received hormone therapy after biopsy. In the remaining 65 cases, agreement with the surgical GG was significantly higher for the reassessed highest GG at our institute (32/65, 49.2%) than for the original highest GG by outside pathologists (8/65, 12.3%) (chi-square test, P = 0.003). A representative case is shown in Fig. 1C, D. In contrast, 93 (20.6%) cases were downgraded on reassessment at our institute: 27 (29.0%) from GG5 to GG4, 38 (40.9%) from GG4 to GG3, 13 (14.0%) from GG3 to GG2, 3 (3.2%) from GG2 to GG1, and 12 (12.9%) by two or more groups. Among these, 17 patients underwent radical prostatectomy at our institute, 2 of whom had received hormone therapy after biopsy. In the remaining 15 cases, agreement with the surgical GG was higher for the reassessed highest GG at our institute (9/15, 60%) than for the original highest GG by outside pathologists (1/15, 6.7%); however, because of the small number of cases, the difference was not statistically significant (chi-square test, P = 0.205).
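Because the unchanged, upgraded, and downgraded subgroups partition the 210 cases compared with surgery, their counts can be cross-checked against the overall totals (73/210 for outside GGs and 105/210 for reassessed GGs). The short Python sketch below performs this bookkeeping; the variable names are illustrative only.

```python
# Overall counts reported for the 210 surgically treated cases
TOTAL = {"cases": 210, "agree_outside": 73, "agree_reassessed": 105}

# Subgroups as reported: (cases, agree_outside, agree_reassessed)
UNCHANGED = (130, 64, 64)   # outside and reassessed GG identical
UPGRADED = (65, 8, 32)

# The three subgroups partition the 210 cases, so the downgraded
# subgroup's counts follow by subtraction.
down_cases = TOTAL["cases"] - UNCHANGED[0] - UPGRADED[0]
down_outside = TOTAL["agree_outside"] - UNCHANGED[1] - UPGRADED[1]
down_reassessed = TOTAL["agree_reassessed"] - UNCHANGED[2] - UPGRADED[2]

print(down_cases, down_outside, down_reassessed,
      f"{down_reassessed / down_cases:.1%}")   # 15 1 9 60.0%
```

The derived downgraded counts (1/15 agreement for outside GGs and 9/15 = 60% for reassessed GGs) are those implied by the overall totals.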
Comparison using Cohen's weighted kappa coefficient with quadratic weights
The kappa coefficient between the original highest GG by outside pathologists and the surgical GG was 0.507 (95% confidence interval: 0.411-0.602; P < 0.01), whereas that between the reassessed highest GG at our institute and the surgical GG was 0.644 (95% confidence interval: 0.562-0.727; P < 0.01).
Fig. 1 Representative cases of missed lesions and grade group changes. A Low-power field view of a case with a missed lesion. A lesion of only 0.7 mm is identified, which at first sight resembled an inflammatory cell infiltrate (hematoxylin and eosin (HE) staining, × 40). B High-power field view shows fused glands with irregular nuclei and clear cytoplasm (HE staining, × 400). C Low-power field view of a case upgraded from grade group (GG)1 to GG2; most of the tumor area corresponds to Gleason pattern 3 (HE staining, × 40). D Low-power field view shows a few fused glands. In a needle core biopsy, a higher-grade pattern is adopted as the secondary pattern even if it occupies < 5% of the tumor; however, some diagnoses appeared to have been made without awareness of this rule (HE staining, × 400)

Multivariate logistic regression analysis results
Only the length of the lesion in the positive core with the highest GG based on preoperative biopsy findings was a significant factor in agreement between the GG based on preoperative biopsy findings and the GG based on surgical specimen findings. The odds ratio was 1.136 (95% confidence interval: 1.057-1.221; P < 0.01).
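To make the reported odds ratio more concrete, the sketch below converts it into the logistic-regression coefficient and scales it over several length differences. It assumes that lesion length entered the model in millimetres, per one-unit increase (the unit is not stated explicitly, although lesion lengths are reported in mm elsewhere), and that the effect is log-linear, as a standard logistic model implies. The multipliers apply to the odds of agreement, not to its probability; the function name is hypothetical.

```python
import math

OR_PER_MM = 1.136            # reported odds ratio (assumed per 1-mm increase)
CI_95 = (1.057, 1.221)       # reported 95% confidence interval

beta = math.log(OR_PER_MM)   # coefficient on the log-odds scale


def odds_multiplier(delta_mm: float, odds_ratio: float = OR_PER_MM) -> float:
    """Factor by which the odds of biopsy-surgery agreement change for a
    lesion that is `delta_mm` longer, other covariates held fixed."""
    return odds_ratio ** delta_mm


for delta in (1, 5, 10):
    low, high = (odds_multiplier(delta, bound) for bound in CI_95)
    print(f"+{delta} mm: odds x{odds_multiplier(delta):.2f} "
          f"(95% CI x{low:.2f}-x{high:.2f})")
```

Under these assumptions, a lesion 5 mm longer is associated with roughly 1.9-fold higher odds of agreement.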

Impact of the number of highest grade group cores and lesion length in preoperative biopsies
Stratified by the lesion length of the highest GG core on preoperative biopsy, the agreement rate with the surgical GG ranged from 3.3% at < 1 mm to 36.7% at < 10 mm. Agreement increased with increasing lesion length and showed no distinct plateau (Fig. 2). When a single positive core had the highest GG on preoperative biopsy, the agreement rate with the surgical GG was 23.3%, whereas with six such cores it was 45.2%. Agreement increased with the number of cores; however, beyond six positive cores, the increase was slow (Fig. 3).
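Length-stratified agreement rates of this kind can be tabulated with a small helper. The sketch below is illustrative only: the function name and the toy records are hypothetical, and it assumes non-overlapping bins [previous edge, edge), since the exact binning used for Fig. 2 is not specified and the study data are not reproduced here. Lesions at or beyond the last edge are ignored.

```python
import bisect
from collections.abc import Iterable, Sequence
from typing import Optional


def agreement_by_bin(records: Iterable[tuple[float, bool]],
                     edges: Sequence[float]) -> dict[float, tuple[int, int, Optional[float]]]:
    """Agreement rate per lesion-length bin.

    `records` holds (lesion length in mm, biopsy GG == surgical GG) pairs;
    `edges` are ascending upper bounds, and each bin is [previous edge, edge).
    Returns {upper edge: (agreements, cases, rate or None if empty)}.
    """
    counts = {edge: [0, 0] for edge in edges}
    for length, agreed in records:
        k = bisect.bisect_right(edges, length)
        if k == len(edges):
            continue                      # beyond the last edge: not tabulated here
        counts[edges[k]][0] += int(agreed)
        counts[edges[k]][1] += 1
    return {edge: (a, n, a / n if n else None) for edge, (a, n) in counts.items()}


# Hypothetical toy records, not study data
toy = [(0.5, False), (0.8, False), (0.9, True), (1.5, True), (3.0, True), (12.0, True)]
print(agreement_by_bin(toy, edges=[1, 2, 5, 10]))
```

The same helper applies to the core-count analysis by passing the number of highest-GG cores instead of lesion length.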

Impact of grade group change on lymph node dissection
At our institution, lymph node dissection is performed for patients at high risk according to the D'Amico classification or those with a predicted lymph node metastasis rate of ≥ 7% on the Briganti 2012 nomogram [22]. Following reassessment, an additional 54 of the 210 patients (25.7%) who underwent radical prostatectomy met these criteria, and lymph node metastasis was confirmed in five of them (5/54; 9.3%). Overall, 117 of 210 patients (55.7%) underwent lymph node dissection, and lymph node metastasis was found in 13.7% (16/117). Without reassessment, 63 patients would have undergone lymph node dissection, 11 of whom were diagnosed with lymph node metastasis (17.5%, 11/63). In three cases, the criteria were no longer met after reassessment, and lymph node dissection was not performed.
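The lymph node figures are internally consistent if the 54 patients are those who newly met the criteria after reassessment: 63 + 54 = 117 dissections and 11 + 5 = 16 node-positive patients. The sketch below recomputes the yields; the "dissections per detected case" figure is a derived quantity added here for illustration, not one reported by the study.

```python
# (patients meeting the dissection criteria, patients with nodal metastasis)
ORIGINAL_GG = (63, 11)     # criteria met using the outside pathologists' GG
ADDED_BY_REVIEW = (54, 5)  # additional patients meeting criteria after reassessment

overall = tuple(x + y for x, y in zip(ORIGINAL_GG, ADDED_BY_REVIEW))

for label, (n, positive) in [("original GG", ORIGINAL_GG),
                             ("added by reassessment", ADDED_BY_REVIEW),
                             ("overall", overall)]:
    print(f"{label}: {positive}/{n} = {positive / n:.1%}")

# Additional dissections performed per additional node-positive patient found
print(f"{ADDED_BY_REVIEW[0] / ADDED_BY_REVIEW[1]:.1f} dissections per detected case")
```

Roughly one node-positive patient was found for every 11 additional dissections prompted by reassessment.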

Discussion
Pre-treatment biopsy assessment is essential for determining appropriate treatment strategies for prostate cancer [23,24]. Although the GG is an excellent scoring system, it remains a subjective assessment by pathologists [25,26]; therefore, assessment by experienced urological pathologists is preferable for accurate diagnosis [15,27,28,29]. However, it is unclear whether reassessment by experienced urological pathologists benefits patients. This study examined the significance of this approach. Reassessment of pre-treatment biopsies showed that in approximately one-third of cases, the original highest GG by an outside pathologist disagreed with the highest GG reassessed at our institute. Approximately 80% of the disagreements were upgrades, with GG2 to GG3 being the most common, followed by GG3 to GG4 and GG1 to GG2. Overall, pathologists at other institutes tended to overcall atypical glands as Gleason pattern 3 and to underestimate lesions corresponding to Gleason pattern 4, which may have led to the disagreements noted in this study. In contrast, approximately 40% of downgraded cases were from GG4 to GG3, and 30% were from GG5 to GG4. Because downgraded cases accounted for only 20% of cases with a changed GG, the limited number of cases should be considered; however, we also found that some institutes tended to assign GG4 and GG5.
We compared the highest GG based on preoperative biopsy findings with the GG based on surgical specimen findings; overall, the agreement rate was only approximately one-third for the GG evaluated by outside pathologists, whereas it was approximately one-half for the GG evaluated by experienced urological pathologists at our institute. Thus, our reassessment improved the agreement rate by a factor of approximately 1.5. Notably, in cases where the reassessed GG was upgraded or downgraded, the agreement rate between our reassessed GG and the surgical GG was approximately 50%, whereas that between the outside GG and the surgical GG was only approximately 10%. Although most changed cases were upgraded following reassessment, the agreement rate with the surgical GG was almost the same for upgrades and downgrades (approximately 50%), suggesting that the reassessments were appropriate. The possibility of bias should be considered because biopsy and surgical assessments were performed within the same institution. However, the pathologists who assessed the biopsies and the surgical specimens were not always the same; therefore, a certain level of objectivity can be expected. Furthermore, Cohen's weighted kappa coefficient also increased with our reassessment (0.507 vs. 0.644).
Fig. 3 Agreement rate between grade groups (GGs) based on the prostate needle core biopsy and surgical specimen findings for each number of highest GG cores. A single positive core of the highest GG based on the preoperative biopsy finding has an agreement rate of 23.3% with the GG based on the surgical specimen finding, whereas six cores have an agreement rate of 45.2%. The higher the number of cores, the higher the agreement, but after six positive cores, the agreement increases slowly
Nevertheless, it should be noted that even at our high-volume institution, the agreement rate between the highest GG based on biopsy findings and the GG based on surgical specimen findings was only approximately one-half. All relevant staff (urologists, radiologists, and pathologists) should be aware of this rate. Missed lesions were found in approximately 6% of cases (79/1325); most were < 2 mm, and the highest GG was rarely changed. Although it depends on the pathologist's workload, focusing reassessment on the GG of positive cores rather than on negative cores would be more beneficial. However, this result also implies that experienced urological pathologists can detect minimal lesions, indicating that diagnosis by a urological pathologist may better prevent small lesions from being missed. Additionally, considering the burden on the pathologist performing the reassessment, it would be desirable to establish a system in which not only the request form but also the outside pathologist's report is provided, so that the reviewing pathologist can identify which cores contain lesions.
Among cases upgraded at surgery, biopsy GG3 became surgical GG4 in approximately 10% of cases; this rate was slightly lower than, but not markedly different from, those of the other categories and presumably reflects the heterogeneity of prostate cancer. In contrast, among cases downgraded at surgery, biopsy GG4 became surgical GG3 in approximately 60% of cases, which is a large proportion. Because this study was based on the highest biopsy GG, we presume that in these cases the positive core sampled only the Gleason pattern 4 component of the tumor.
Regarding the benefits of reassessment, if the lymph node dissection criteria had been applied using only the original GGs from outside pathologists, 63 of 210 cases would have met them, and lymph node metastasis was confirmed in 11/63 (17.5%). After our reassessment, however, 54 additional cases underwent lymph node dissection, and 5/54 (9.3%) had lymph node metastasis. Namely, lymph node metastasis was detected in approximately 10% of the additional lymph node dissections prompted by our reassessment. Further follow-up is required, but we believe that a lymph node metastasis rate of approximately 10% should not be ignored and supports the value of our reassessment. In contrast, only three cases no longer met the criteria for lymph node dissection after being downgraded on reassessment. Because serum PSA levels and imaging findings are also considered in risk classification, downgrading had little impact on lymph node dissection. However, many patients did not undergo surgery at our institute; hence, the clinical significance of downgraded cases requires careful follow-up.
We conducted a statistical analysis to identify predictors of the surgical GG based on preoperative biopsies. The chi-square test and adjusted residuals showed that cases with a highest biopsy GG of GG1 were significantly upgraded at surgery, whereas those with GG4 were significantly downgraded. It should be recognized that if the highest GG on preoperative biopsy is GG1 or GG4, the GG in the surgical specimen is likely to differ.
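The adjusted-residual criterion (|r| ≥ 1.96) can be illustrated with Haberman's adjusted standardized residual, r = (O - E) / sqrt(E (1 - R/N)(1 - C/N)), where O and E are the observed and expected counts of a cell and R, C, and N are its row total, column total, and grand total. The sketch below is a pure-Python illustration with hypothetical function names; the study's biopsy-by-surgery GG cross-table is not reproduced, so the usage example is an invented toy table, and rows or columns summing to zero are assumed absent.

```python
import math
from collections.abc import Sequence


def adjusted_residuals(table: Sequence[Sequence[int]]) -> list[list[float]]:
    """Haberman adjusted standardized residuals for an r x c table.

    Assumes no row or column total is zero (otherwise the variance is 0).
    """
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    result = []
    for i, row in enumerate(table):
        out = []
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            variance = expected * (1 - row_tot[i] / n) * (1 - col_tot[j] / n)
            out.append((observed - expected) / math.sqrt(variance))
        result.append(out)
    return result


def classify(r: float, critical: float = 1.96) -> str:
    """Label a residual against the +/-1.96 threshold used in the Methods."""
    if r >= critical:
        return "more than expected"
    if r <= -critical:
        return "fewer than expected"
    return "not significant"


# Toy 3 x 3 table (rows: biopsy GG, columns: surgical GG); invented counts
toy = [[10, 12, 3], [4, 30, 8], [1, 9, 20]]
for row in adjusted_residuals(toy):
    print([f"{r:+.2f} ({classify(r)})" for r in row])
```

In a biopsy-by-surgery cross-table, a significantly positive residual in an off-diagonal cell (for example, biopsy GG1 with surgical GG2) is the kind of signal interpreted above as a tendency toward upgrading.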
We conducted a multivariate analysis to determine which preoperative biopsy findings contributed to agreement with the surgical GG and found that only the lesion length in the positive core with the highest GG was an independent significant factor. However, whether lesion length is truly the only significant factor requires further investigation with a larger number of cases. We also investigated the relationships among the number of positive cores, lesion length, and agreement rate. Interestingly, agreement plateaued after 5-6 positive cores, whereas it did not reach a distinct plateau with increasing lesion length. One study suggested obtaining as many cores as possible during prostate biopsy [30]; however, our results suggest that excessive biopsy may be unnecessary if an adequate number of positive cores can be obtained efficiently. Notably, the agreement rate when only a single core has the highest GG is only approximately one-fourth, and it is only 3.3% when the lesion length of the highest GG is < 1 mm. All relevant staff should be aware that GG discrepancy rates are very high when the number of lesions is small or the lesion length is short, and treatment strategies should be determined with this in mind.

Limitations
This study had some limitations. First, although we reassessed specimens previously assessed by outside pathologists, we did not collect information on whether the outside institutes were high-volume institutions or whether the outside pathologists specialized in urology. Second, we could not confirm the biopsy methods, targeting, and preoperative image processing used at outside institutes. Nevertheless, as most outside institutes handle fewer prostate cancer cases than our institute, this study supports the significance of diagnosis by experienced urological pathologists at a high-volume institution for patient benefit.

Conclusions
Because reassessment by experienced urological pathologists at a high-volume institution increases the agreement between the highest GG on preoperative biopsy and the final GG on the surgical specimen by a factor of approximately 1.5, it is desirable to reassess the actual specimens unless doing so is excessively burdensome for pathologists. Moreover, it should be noted that discordance with the surgical GG is highly probable when the biopsy contains few or short lesions.
Author contribution YO gathered the required information from the database, reassessed the specimens, constructed the database, integrated the data, and wrote the manuscript; YY provided information to YO on radiological diagnoses, made inquiries to the referring institutes as appropriate, and provided YO with any missing information; SS, EY, MS, and KW participated as pathologists in reassessing some of the specimens; KO and TS reviewed, organized, and provided the clinical data to YO; TY revised parts of this manuscript regarding the pathological diagnosis from the perspective of a senior pathologist; TK, as head of the Department of Urology, provided detailed clinical information to YO and revised parts of the manuscript; YM reviewed and reassessed the specimens and revised the manuscript as a senior pathologist. All authors have read and approved the final version of the manuscript.
Funding This work was supported by JSPS KAKENHI (grant number: 17K08713 to YO; grant number: 20K09422 to SS; grant number: 18K15111 to MS; grant number: 20K16210 to KW; and grant number: 20K09093 to YM) from the Ministry of Education, Culture, Sports, Science, and Technology of Japan and by the Kanagawa Cancer Center and Research Institute/Kanagawa Prefectural Institute Organization (funding granted to YO, grant number: 2021-1).

Data availability
The datasets used and/or analyzed during the current study are available from the corresponding author upon reasonable request.
Code availability Not applicable.

Declarations
Ethics approval Informed consent was obtained from all patients for research and publication. This study was performed in accordance with the Declaration of Helsinki and approved by the Ethics Review Committee of the Kanagawa Cancer Center (approval number: 2019-36; June 26, 2019).

Competing interests The authors declare no competing interests.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.