Introduction

Interobserver variability in multiparametric (mp) prostate MRI interpretation for prostate cancer detection has been investigated since the introduction of PI-RADS [1, 2]. Although mpMRI remains the investigation of choice in detection and risk stratification of prostate cancer according to international guidelines [3], there is a demonstrated learning curve for mpMRI reporting [4, 5]. Increased reader experience has been shown to result in improved interobserver agreement and diagnostic accuracy for detection of clinically significant prostate cancer (csPCa) [6,7,8,9,10].

Although prior studies have examined the performance of PI-RADS in community centers [8, 11], there has been limited direct comparison of reporting between community and tertiary referral centers with access to subspecialized abdominal radiologists [12,13,14].

In our practice region, patients on active surveillance or with previous negative non-targeted biopsies are referred by urologists for multidisciplinary board review and consideration of transrectal ultrasound (TRUS)/MRI-guided fusion biopsy. These patients have mpMRIs performed at their referring community hospital, which are re-interpreted by subspecialized abdominal radiologists as part of multidisciplinary board review prior to biopsy. Studies evaluating subspecialty second reads in interstitial lung disease, screening mammography, and pediatric trauma imaging have all demonstrated clinically relevant discrepancies affecting patient management [15]. Thus, the aim of this study was to investigate how often second-opinion review of prostate MRIs by a multidisciplinary review board at a single tertiary care center differs from the initial community radiologist interpretation.

Methods

Study population

This single-institution retrospective study was approved by our institutional research ethics review board with a waiver of the requirement for written informed consent. Data on 303 consecutive patients were collected from multidisciplinary prostate MRI review rounds from January 2017 to August 2020 at a single tertiary care center. The review board comprised abdominal radiologists, oncologic urologists, abdominal imaging fellows, and oncology urology fellows.

Patients on active surveillance or with suspected prostate cancer due to elevated prostate-specific antigen (PSA), clinical symptoms, or prior high-risk biopsy were referred by urologists for consideration of transrectal ultrasound (TRUS)/MRI-fusion biopsy based on community-read prostate MRIs performed at regional hospitals and imaging facilities. Ninety-six patients were on active surveillance and 149 had a previous negative biopsy. A small subset of patients (17) had MRIs that were interpreted as negative in the community but were referred for second-opinion review due to high clinical suspicion.

Image acquisition

Patients underwent prostate mpMRI at 27 different regional hospitals or imaging facilities on either 1.5 T or 3.0 T MRI scanners using surface coils without an endorectal coil. Minimum MRI technical criteria were based on the Prostate Imaging Reporting and Data System (PI-RADS) version 2.0 [16]. Minimum sequence requirements were high-resolution axial T2-weighted images (T2WI) of the prostate, axial T1-weighted images (T1WI) of the pelvis, diffusion-weighted imaging (DWI) with a minimum of two b values and a high b value of at least 1400 s/mm², and dynamic contrast-enhanced (DCE) sequences. All scans had calculated apparent diffusion coefficient (ADC) maps.

Image interpretation

All mpMRIs were performed and first read at 27 community referral centers by different radiologists. Specific data on the referring radiologists’ experience level in reporting prostate mpMRI were not available, although only one radiologist held an abdominal radiology fellowship and the maximum level of experience reading prostate MRI was estimated at 5 years given the relatively recent availability of prostate MRI in our practice region. All mpMRIs were prospectively second read at a single tertiary center with all clinical parameters and external reports available. All second reads were performed by 2–3 fellowship-trained subspecialist abdominal radiologists, always including one abdominal radiologist with 17 years of experience in reading prostate MRI. T2WI, DWI, and DCE sequences were evaluated using probability for clinically significant prostate cancer as per PI-RADS version 2.0 structured scoring criteria. The Likert-based scoring system was defined as follows: 1 = csPCa highly unlikely, 2 = csPCa unlikely, 3 = equivocal for csPCa, 4 = csPCa likely, and 5 = csPCa highly likely. A final score was determined by combining the scores for the T2WI, DWI, and DCE sequences as delineated in PI-RADS v2.0. The second read was performed after initial MRI acquisition and prior to biopsy, and therefore prospectively affected biopsy targeting.

Biopsy

All biopsies were performed by an abdominal radiologist or abdominal radiology fellow. The DynaCAD/UroNav transrectal MRI/TRUS-fusion biopsy system (Philips, Amsterdam, Netherlands) was used for all biopsies. In patients with PI-RADS ≥ 3 lesions identified on second read, 2–4 cores were taken from each lesion using a spring-loaded biopsy gun with an 18-gauge needle. All patients had 8–12 systematic biopsies taken in addition to the targeted biopsy. Eight samples were taken for a prostate volume < 30 cc (one per side at the right and left apex and base and two at the mid gland), 10 samples for a volume of 30–60 cc (one per side at the right and left apex and two at the mid gland and base), and 12 samples for a volume > 60 cc (two per side at the right and left apex, mid gland and base).
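The volume-based systematic sampling scheme described above can be written as a small lookup. This is a minimal sketch for illustration only, not software used in the study: the function names `systematic_core_plan` and `total_cores` are hypothetical, and assigning volumes of exactly 30 cc and 60 cc to the middle tier is an assumption based on the "30–60 cc" wording.

```python
def systematic_core_plan(volume_cc: float) -> dict[str, int]:
    """Return the number of systematic cores per side at each gland level.

    Tiers follow the text: < 30 cc, 30-60 cc, > 60 cc.
    Boundary handling (30 and 60 cc fall in the middle tier) is an assumption.
    """
    if volume_cc <= 0:
        raise ValueError("prostate volume must be positive")
    if volume_cc < 30:
        # One core per side at apex and base, two per side at mid gland.
        return {"apex": 1, "mid": 2, "base": 1}
    if volume_cc <= 60:
        # One core per side at apex, two per side at mid gland and base.
        return {"apex": 1, "mid": 2, "base": 2}
    # Two cores per side at apex, mid gland, and base.
    return {"apex": 2, "mid": 2, "base": 2}


def total_cores(volume_cc: float) -> int:
    """Total systematic cores across the right and left sides."""
    return 2 * sum(systematic_core_plan(volume_cc).values())


if __name__ == "__main__":
    for volume in (25, 45, 80):
        print(volume, "cc:", total_cores(volume), "systematic cores")
```

Under these assumptions the three tiers yield 8, 10, and 12 systematic cores, matching the counts stated in the text; targeted cores (2–4 per lesion) are taken in addition.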

Histopathology

Biopsies were reviewed according to ISUP 2014 recommendations and the final Gleason score was used [17]. Any tumor within the targeted cores or within the nearest adjacent systematic core (e.g., “right base” or “left midgland”) was accepted as corresponding to the target MRI lesion.

Statistical analysis

Cohen kappa coefficients were used to quantify interobserver agreement. Levels of agreement were defined as: slight (0–0.20), fair (0.21–0.40), moderate (0.41–0.60), substantial (0.61–0.80), and excellent (0.81–1). Analysis was performed for each of the two reads on a patient and lesion level. Benign histopathology, prostatitis, and Gleason 3 + 3 = 6 were classified as negative histopathology. Positive predictive value (PPV) was used to determine accuracy of PI-RADS score for detection of csPCa, which was defined as any cancer with grade group ≥ 2 (corresponding to Gleason ≥ 3 + 4). All statistical analysis was performed using the SPSS Statistics software package (IBM, Armonk, USA).
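For readers who want to reproduce the agreement statistic outside SPSS, the following is a minimal sketch of an unweighted Cohen's kappa together with the agreement bands defined above. It is not the authors' SPSS procedure. Whether the study used weighted or unweighted kappa is not stated, so the unweighted form is an assumption, as are the function names, the toy labels, and the handling of values below zero or between bands.

```python
from collections import Counter


def cohen_kappa(read_a, read_b):
    """Unweighted Cohen's kappa for two equally long lists of category labels."""
    if len(read_a) != len(read_b) or not read_a:
        raise ValueError("reads must be non-empty and of equal length")
    n = len(read_a)
    # Observed proportion of cases on which the two reads agree.
    observed = sum(a == b for a, b in zip(read_a, read_b)) / n
    # Agreement expected by chance from each reader's marginal frequencies.
    freq_a, freq_b = Counter(read_a), Counter(read_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when both reads use one identical category")
    return (observed - expected) / (1 - expected)


def agreement_level(kappa):
    """Map kappa to the bands in the text (upper bounds inclusive, an assumption)."""
    if kappa < 0:
        return "below chance"  # not covered by the bands in the text
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "excellent"


if __name__ == "__main__":
    # Toy data, not study data: positive/negative calls by two readers.
    community = ["pos", "pos", "neg", "neg"]
    tertiary = ["pos", "neg", "neg", "neg"]
    k = cohen_kappa(community, tertiary)
    print(f"kappa = {k:.2f} ({agreement_level(k)})")
```

In the toy example, observed agreement is 0.75 and chance agreement is 0.50, giving κ = 0.50, which falls in the moderate band.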

Results

Demographic data including patient age, PSA (ng/mL), prostate gland volume, PSA density, and prior biopsy status are summarized in Table 1.

Table 1 Baseline patient and lesion characteristics

Following multidisciplinary review of the 303 patients, 42 were downgraded from at least one PI-RADS ≥ 3 lesion to negative (PI-RADS ≤ 2) (Fig. 1). Ten scans that were called negative on initial interpretation were confirmed as negative on second read. The remaining 251 patients were confirmed to have at least one lesion PI-RADS ≥ 3 (Fig. 2). An additional 36 PI-RADS ≥ 3 lesions in 26 patients were identified at the time of second read that were not called on the initial interpretation (Fig. 3). Therefore, a total of 332 PI-RADS ≥ 3 lesions were identified. Of these, 53.3% (177/332) were in the peripheral zone, 43.4% (144/332) were in the transition zone, 2.4% (8/332) straddled both transition and peripheral zone, and 0.9% (3/332) were in the anterior fibromuscular stroma (Table 2).

Fig. 1 False positive initial report. 64-year-old patient with an 11.3 ng/mL PSA. Initial MRI reported a 6 mm PI-RADS 3 lesion in the left mid posterior peripheral zone as an area of T2 hypointensity (arrow) (A), low ADC signal (B), and normal DWI signal (C) without suspicious enhancement (D). Second-opinion review determined the lesion to represent normal central zone and no additional lesions were identified. Ultrasound-guided systematic and cognitive targeted biopsy of this area performed at a community center 4 years later for rising PSA showed all cores to be benign

Fig. 2 True positive initial report. 61-year-old patient with a 4.8 ng/mL PSA. Initial MRI reported a 12 × 9 × 17 mm PI-RADS 5 lesion with decreased T2 signal (arrow) (A) and corresponding low ADC signal (B) and increased DWI (C). There was early focal dynamic enhancement (D). Second-opinion review was concordant. TRUS-fusion biopsy confirmed Gleason 3 + 4 prostate cancer

Fig. 3 False negative initial report. 70-year-old patient with a 6.0 ng/mL PSA. Initial MRI called nodular BPH, all PI-RADS 2. Second-opinion review found a 10 × 7 × 10 mm PI-RADS 3 (2 + 1) lesion in the left mid/apex anterior transition zone (arrow), with homogeneous mildly low T2 signal (A), low signal on ADC (B), markedly high signal on DWI (C), and contemporaneous enhancement (D). TRUS-fusion biopsy showed Gleason 5 + 4 prostate cancer

Table 2 Results by lesion

One hundred and ninety-eight eligible patients with 252 lesions proceeded to fusion biopsy (Fig. 4). Prevalence of csPCa in biopsied patients was 50.5% (100/198). Prevalence of Gleason 3 + 3 was 17.2% (34/198). Twelve patients had disease detected on the systematic biopsy but not the fusion biopsy; 7/12 were csPCa and 5/12 were Gleason 3 + 3 (Table 3). Overall prevalence of csPCa in biopsied lesions was 40.9% (103/252). Breakdown by lesion is shown in Table 4.

Fig. 4 Flow chart demonstrating the stratification of patients following multidisciplinary board re-interpretation and subsequent decision to biopsy

Table 3 Biopsy results by patient
Table 4 Biopsy results by lesion

Two patients proceeded directly to radical prostatectomy without fusion biopsy. The remaining 51 patients did not receive biopsy for a composite of reasons including preference for clinical or imaging surveillance, loss to follow-up, and delays and cancelations due to the COVID-19 pandemic.
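The patient flow and prevalence figures reported above can be reconciled with simple arithmetic. The sketch below only re-derives numbers already stated in the text; the variable names and the helper `pct` are illustrative and not part of the study's analysis.

```python
def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


# Patient-level flow after multidisciplinary review (counts from the text).
cohort = 303
downgraded_to_negative = 42
confirmed_negative = 10
confirmed_positive = cohort - downgraded_to_negative - confirmed_negative

# Disposition of patients with at least one confirmed PI-RADS >= 3 lesion.
fusion_biopsy, direct_prostatectomy, no_biopsy = 198, 2, 51
unaccounted = confirmed_positive - (fusion_biopsy + direct_prostatectomy + no_biopsy)

# Prevalence among biopsied patients and biopsied lesions.
cspca_per_patient = pct(100, fusion_biopsy)
gleason6_per_patient = pct(34, fusion_biopsy)
cspca_per_lesion = pct(103, 252)

if __name__ == "__main__":
    print(f"confirmed PI-RADS >= 3 patients: {confirmed_positive}")
    print(f"patients unaccounted for: {unaccounted}")
    print(f"csPCa per patient: {cspca_per_patient}%")
    print(f"Gleason 3 + 3 per patient: {gleason6_per_patient}%")
    print(f"csPCa per lesion: {cspca_per_lesion}%")
```

The counts close without remainder (251 = 198 + 2 + 51), and the recomputed percentages agree with the reported 50.5%, 17.2%, and 40.9%.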

Re-interpretation changed the decision to biopsy in 48 patients: 42 with positive MRIs on initial interpretation were re-read as negative and did not receive biopsy, and 6 with new PI-RADS ≥ 3 lesions detected on second read were upgraded to biopsy. Of the 36 lesions called only on re-interpretation, 25 were biopsied and 10/25 lesions were confirmed as clinically significant prostate cancer with the following breakdown: 4/15 PI-RADS 3 lesions and 6/9 PI-RADS 4 lesions. There was one PI-RADS 5 lesion with biopsy results revealing Gleason 3 + 3.
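As a worked example of the PPV definition used in this study, the counts for lesions called only on re-interpretation can be converted to per-score PPVs. This is a minimal sketch: the function name `ppv` and the dictionary layout are illustrative, and the single PI-RADS 5 lesion is counted as negative because Gleason 3 + 3 was classified as negative histopathology.

```python
def ppv(cspca_lesions: int, biopsied_lesions: int) -> float:
    """Positive predictive value: csPCa-positive lesions / biopsied lesions."""
    if biopsied_lesions <= 0:
        raise ValueError("at least one biopsied lesion is required")
    if not 0 <= cspca_lesions <= biopsied_lesions:
        raise ValueError("csPCa count must lie between 0 and the number biopsied")
    return cspca_lesions / biopsied_lesions


# Lesions identified only on second read: PI-RADS score -> (csPCa, biopsied).
second_read_only = {3: (4, 15), 4: (6, 9), 5: (0, 1)}

overall_cspca = sum(cs for cs, _ in second_read_only.values())
overall_biopsied = sum(n for _, n in second_read_only.values())

if __name__ == "__main__":
    for score, (cs, n) in second_read_only.items():
        print(f"PI-RADS {score}: {cs}/{n} = {ppv(cs, n):.1%}")
    print(f"Overall: {overall_cspca}/{overall_biopsied} = "
          f"{ppv(overall_cspca, overall_biopsied):.1%}")
```

With so few lesions per category, these proportions are illustrative of the calculation rather than stable estimates.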

Of the 42 patients who were downgraded from at least one PI-RADS ≥ 3 lesion to negative (PI-RADS ≤ 2) after second read, 34 did not receive biopsy. Eight patients underwent biopsy for rising PSA at a mean of 399 days after being reviewed at multidisciplinary rounds. Three patients had negative biopsies, 2 had Gleason 3 + 3, and 3 had clinically significant prostate cancer. Of the three patients with csPCa, two had fusion biopsies of the initially identified lesion and one had a systematic biopsy.

Interobserver agreement

Of the 332 PI-RADS ≥ 3 lesions, 201 (60.5%) were concordant, 26 (7.8%) were upgraded, 59 (17.8%) were downgraded and 46 (13.9%) were discordant for other reasons. Of these discordant lesions, 36 were not called on the initial interpretation, 7 had no PI-RADS score provided in the initial report, and 3 had an entirely different lesion called. Of the 7 reports with no PI-RADS score provided, 5 were concordant in identifying the lesion and expressing the degree of suspicion for prostate cancer without issuing a formal PI-RADS score, and 2 reports identified the lesion but were ambiguous in the degree of suspicion. Discordance between the initial and secondary interpretation of an identified lesion arose from characterization of the degree of suspicion for csPCa; there was no discrepancy in describing the anatomic location.
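The lesion-level comparison above can be sketched as a simple classification of paired scores followed by a tally. This is an illustration under stated assumptions, not the study's procedure: "upgraded" and "downgraded" are taken to mean a higher or lower score on second read relative to the initial read, a missing initial score (`None`) stands in for every form of "other" discordance, and the toy pairs are invented. Only the `reported` counts come from the text.

```python
from collections import Counter


def compare_reads(initial, second):
    """Classify one lesion from its initial and second-read PI-RADS scores.

    `initial` is None when the lesion was not called or not scored initially.
    """
    if initial is None:
        return "other discordance"
    if second > initial:
        return "upgraded"
    if second < initial:
        return "downgraded"
    return "concordant"


# Toy (initial, second) pairs for illustration only; not study data.
toy_pairs = [(4, 4), (3, 4), (5, 3), (None, 4), (3, 3)]
toy_tally = Counter(compare_reads(i, s) for i, s in toy_pairs)

# Counts reported in the text, converted to percentages of all lesions.
reported = {"concordant": 201, "upgraded": 26,
            "downgraded": 59, "other discordance": 46}
total = sum(reported.values())
shares = {label: round(100 * n / total, 1) for label, n in reported.items()}

if __name__ == "__main__":
    print(dict(toy_tally))
    print(total, shares)
```

The reported counts sum to 332 lesions, and the recomputed shares match the percentages given in the text.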

Overall agreement between community and tertiary center interpretation was fair (κ = 0.354), with greater agreement for PI-RADS ≥ 4 (κ = 0.523, moderate) than PI-RADS ≥ 3 (κ = 0.456, moderate) (Fig. 5). Agreement was greater for peripheral zone lesions (κ = 0.419, moderate) compared to transition zone lesions (κ = 0.251, fair). There was moderate agreement for biopsied lesions confirmed as clinically significant prostate cancer (grade group ≥ 2) (κ = 0.506).

Fig. 5 Bar graph demonstrating the level of interobserver agreement between community and tertiary care multidisciplinary review board interpretation of prostate MRI for cancer detection

Positive predictive values

The positive predictive value of PI-RADS 3 lesions was similar between community and tertiary center interpretation both overall (22.6% vs 16.8%) and stratified by lesion location (21.4% vs. 20.0% in the peripheral zone and 25.0% vs. 15.3% in the transition zone) (Table 5 and Fig. 6).

Table 5 Positive predictive values of equivocal (PI-RADS 3) multiparametric MRI for clinically significant prostate cancer using TRUS/MRI-fusion guided targeted and systematic biopsy as the reference test
Fig. 6 Bar graph comparing positive predictive value of PI-RADS score for clinically significant prostate cancer between community and tertiary care multidisciplinary review board interpretation

Community and tertiary center interpretations both achieved good PPVs for all PI-RADS 4 lesions (42.5% vs 55.0%) and peripheral zone PI-RADS 4 lesions (50.7% vs 54.2%). There was a lower PPV for community center interpretation of transition zone PI-RADS 4 lesions (33.3%) compared to tertiary center interpretation (56.5%) (Table 6).

Table 6 Positive predictive values of suspicious (PI-RADS 4) multiparametric MRI for clinically significant prostate cancer using TRUS/MRI-fusion guided targeted and systematic biopsy as the reference test

Community and tertiary center interpretations both achieved good PPVs for all PI-RADS 5 lesions (57.4% vs. 64.1%) and peripheral zone PI-RADS 5 (75.0% vs 91.7%). There was a slightly poorer PPV for community center interpretation of transition zone PI-RADS 5 lesions (42.3%) compared to tertiary center interpretation (51.9%) (Table 7).

Table 7 Positive predictive values of highly suspicious (PI-RADS 5) multiparametric MRI for clinically significant prostate cancer using TRUS/MRI-fusion guided targeted and systematic biopsy as the reference test

Discussion

Our study demonstrated that there is variability in community and tertiary center multidisciplinary interpretation of prostate MRI in cancer detection. To date, only two studies have examined second-opinion tertiary center reads in prostate MRI [12, 13]. In 2017, Hansen et al. reported an overall by-lesion concordance rate of 33%. Our study demonstrated a much higher concordance rate of 60.5%, which may reflect the interval introduction of PI-RADS v2.1 and overall increased experience with PI-RADS. More recently, Ecke et al. reported a comparable rate of overall interobserver agreement; κ = 0.32 vs κ = 0.35 in our study.

Our study was the first to stratify the concordance both by individual PI-RADS score and lesion location to better delineate common areas of discrepancy between community and tertiary center review. Kappa values and PPVs were greater for higher PI-RADS scores, peripheral zone lesions, and lesions representing csPCa. Concordance and PPVs were poorer for transition zone lesions. These findings are consistent with the existing literature examining interobserver variability in interpretation of PI-RADS, particularly in readers of differing experience levels [2, 6, 7, 18,19,20]. Furthermore, unlike prior studies investigating this topic, every second read was performed by a group of 2–3 subspecialty radiologists that included the same subspecialist uroradiologist, thereby avoiding the potential confounder of interobserver variability within the tertiary center second reads.

Our study demonstrated slightly higher overall PPVs for both community and tertiary center interpretation of PI-RADS 3 lesions (0.23 vs. 0.17 compared to 0.11 vs. 0.12 in Ecke and 0.11 vs. 0.12 in Hansen). We also demonstrated overall higher PPVs for both community and tertiary center interpretation in PI-RADS 4 and 5 lesions, ranging from 0.32 to 0.51 vs. 0.55 to 0.58 for PI-RADS 4 and 0.38 to 0.75 vs. 0.52 to 0.92 for PI-RADS 5 (pooled in Hansen as 0.23 vs. 0.43 and in Ecke as 0.35 vs. 0.61).

We posit that the higher PPVs for tertiary center interpretation were due to the advantages offered by a multidisciplinary board, including clinician input, clinical context, pooled expertise, and subspecialist interpretation. During the second read the subspecialist radiologist accesses clinical parameters such as PSA density, PSA trend, sites of prior targeted or systematic biopsy, and family history via the electronic medical record (EMR), which offer additional insight into the degree of clinical suspicion for csPCa. It is not clear if this is done with all the community interpretations, as only some of the reports include this information.

A recent ESUR consensus statement recommends that all radiologists who interpret prostate MRI participate in some form of multidisciplinary review [21]. Unfortunately, not all centers have the resources or personnel necessary for multidisciplinary rounds, and centralized second-opinion reporting for all studies would be impractical. Our center is attempting to mitigate this issue by extending virtual access to multidisciplinary rounds to radiologists at community hospital sites. Further proposed strategies to improve performance in smaller volume centers focus on structured training, with a composite of attendance at hands-on workshops, continuing professional development credits, online course learning or a period of supervised double-reading by an experienced reader, as well as in-practice assessments to ensure quality control [22, 23].

Limitations

Our study has several limitations. If a lesion initially scored as PI-RADS ≥ 3 on community interpretation was determined to be PI-RADS < 3 on second read, the patient did not proceed to biopsy and we were therefore unable to determine the false negative rate for second-opinion reads. Furthermore, the decision to refer for multidisciplinary review was contingent on the initial community center interpretation, which introduces selection bias. Prostate MRIs that were interpreted to be normal at community centers were not referred for consideration of fusion biopsy and were therefore not reviewed at our multidisciplinary rounds unless a second read was specifically requested by the referring physician. Thus, we cannot comment on the concordance of findings between community and tertiary centers with respect to normal scans on first read. The initial reads were performed by a wide range of radiologists with a variable level of experience reading prostate MRI. Furthermore, although our site had been performing fusion biopsies for 7–10 years during the study period and has extensive institutional experience, we did not control for the level of individual operator experience in performing the biopsies, which may have affected targeting accuracy.

Of the 251 patients eligible for biopsy, 51 did not receive biopsy for a composite of reasons including preference for surveillance, loss to follow-up, and delays and cancelations due to the COVID-19 pandemic. In particular, downgrading a lesion from PI-RADS 4 or 5 to PI-RADS 3 on second-opinion review changes the risk–benefit ratio of biopsy such that the patient and referring urologist may elect not to proceed.

In conclusion, there is variability in community and tertiary care center interpretation of prostate MRI in cancer detection. Overall concordance rates were improved for higher grade and peripheral zone lesions, with tertiary center reinterpretation demonstrating higher PPVs for transition zone lesions.

Variability in the interpretation of prostate MRI for cancer detection between community and tertiary care centers demonstrates the added value of multidisciplinary round review and highlights the need for ongoing education, quality assurance, and feedback. Potential avenues include virtual participation in multidisciplinary review boards for radiologists practicing outside of tertiary care centers.