Introduction

Multiparametric magnetic resonance imaging (mpMRI) has emerged as a vital tool in the diagnosis of prostate cancer (PCa). With the widespread adoption of prostate MRI, standardized interpretation and reporting of mpMRI findings have become necessary [1, 2]. For this purpose, the Prostate Imaging Reporting and Data System (PI-RADS) was developed, based on a synthesis of expert consensus and available evidence. The revised PI-RADS version 2 (v2) was released in December 2014 [3]. In March 2019, modifications to PI-RADS v2 were published, constituting an updated version termed PI-RADS v2.1 [4]. As PI-RADS is intended to be an evolving document, studies testing its efficacy have been encouraged.

Despite recent advances in quantitative radiological methods, the vast majority of prostate cancer diagnoses on MRI are still made in the traditional “radiologist reporting” setting. Thus, the vocabulary and subjective assessments of the radiologist are the cornerstones of the reports’ validity.

PI-RADS scoring is done by assessing lesions’ features on T2-weighted imaging (T2WI), diffusion-weighted imaging (DWI), and dynamic contrast-enhanced imaging (DCE). The assessed criteria include lesions’ signal intensity, shape, margins, size, and invasive behavior/extraprostatic extension. The PI-RADS v2 document provides a lexicon with defined descriptors in its appendix (Appendix III) that constitutes the foundation of this assessment [3]. Analyzing this foundation could enable us to identify descriptors with high diagnostic accuracy, allowing these to be incorporated more prominently into the scoring criteria while reducing the importance of descriptors with low accuracy.

Lexicon terms and their definitions remain largely unchanged in PI-RADS version 2.1, with the only changes being the redefinition of the term “negative DCE,” a new definition of the term “marked” and the introduction of the terms “typical” and “atypical BPH nodule.”

A number of studies have assessed the diagnostic value of the PI-RADS score [5,6,7,8,9], but to date, only sparse data are available concerning the underlying terminology. To the best of our knowledge, only a few studies with small study populations have addressed the discriminatory power of individual lexicon terms [10, 11]. Therefore, the objective of this study is to systematically assess the diagnostic value of individual descriptors as specified in the PI-RADS v2 lexicon in a large patient cohort.

Materials and methods

Patients

The inclusion criteria for this retrospective study were the availability of a prostate MRI between January 2012 and July 2015 and subsequent in-house targeted MRI/TRUS fusion biopsy (TB) in combination with a 10-core systematic biopsy in the same session. From a total of 526 eligible patients, 72 patients with incomplete or non-standard MRI or MRI performed at an external institution were excluded. These exclusions left a final cohort of 454 patients. Patient characteristics are summarized in Table 1. Figure 1 contains a STARD 2015-compliant patient flow diagram [12] for the study. The study protocol was approved by the institutional review board, and patient consent was waived due to the retrospective design of the study. Subgroups of the same cohort with various study endpoints have been included in earlier publications pertaining to the accuracy of prostate biopsies [13,14,15,16,17,18].

Table 1 Patient characteristics. Values are given as mean ± standard deviation [range] for continuous variables and absolute frequency (relative frequency) for biopsy results. PSA, prostate-specific antigen; PIN, prostatic intraepithelial neoplasia
Fig. 1 Flow chart with inclusion and exclusion criteria, as well as lesion localization and histopathological outcomes. All men received prostate biopsy including MRI/TRUS fusion-guided targeted biopsy and systematic 10-core biopsy. mpMRI, multiparametric magnetic resonance imaging; MRI/TRUS, MRI/transrectal ultrasound; GS, Gleason score

MR imaging

All imaging was performed on one of two identical 3-T MRI scanners (Skyra, Siemens Healthineers). The following imaging parameters were used in all patients: axial and coronal T2WI with a resolution of 3.0 × 0.47 × 0.47 mm, axial DWI with a resolution of 3 × 1.4 × 1.4 mm with measured b values of 0, 50, and 500 s/mm² and a high b value (800, 1000, or a calculated b value of 1400 s/mm²), and additional T1 axial and T2 axial and sagittal imaging of the whole pelvis. In 242 patients (54.6%), DCE imaging was additionally performed with a spatial resolution of 3 × 1.4 × 1.4 mm, a temporal resolution of 5 s, and an injection flow rate of 3 ml/s (Gadobutrol, Gadovist, Bayer Healthcare).

Imaging review and lexicon term assessment process

Four hundred fifty-four MRI datasets were divided into four similarly sized subgroups (113–114 each). Each group was evaluated by one of four readers (A.B., M.H., C.L., P.A.), all board-certified radiologists with more than 5 years of experience in prostate MRI. Each lesion was assessed once by a single reader using dedicated in-house-built reading software, which presented all imaging to the reader in a standardized way (Fig. 2). Readers were blinded to all patient-related data including the initial radiological report and histopathological results. Readers were instructed to mark the most suspicious lesion or lesions in an MRI and tag every marked lesion with matching lexicon terms complying with the definitions supplied by the PI-RADS v2 lexicon. Definitions of all lexicon terms were displayed in the reading software exactly as specified in the lexicon of the original PI-RADS v2 document [3]. Table 2 contains a full list of the used terms and their classification. All groupings of lexicon terms (DWI-related, shape-related, border-related terms, etc.) were tagged separately. Lesions were attributed to either the peripheral zone (PZ) or the transition zone (TZ) and localized according to the segmentation model used in PI-RADS v2 [3]. Lesions that extended through both the PZ and the TZ and lesions that were located in the anterior stroma (AS) or central zone (CZ) were assigned to either the PZ or the TZ group depending on the most probable zone of origin.

Fig. 2 Sample screenshot of the proprietary review software Prostate Lesion Analyzer used in the study. Readers were instructed to mark lesions within the MRI pictures (left, 3 × 2 panels depicting T2WI axial, T2WI coronal, DWI/ADC, T1WI native, and DCE). Matching lexicon terms and lesion localization were selected on the panel on the right side. Definitions of lexicon terms were displayed exactly as specified in the original PI-RADS version 2 document when hovering the cursor over a term. T2WI, T2-weighted imaging; DWI/ADC, diffusion-weighted imaging/apparent diffusion coefficient; T1WI, T1-weighted imaging; DCE, dynamic contrast-enhanced imaging

Table 2 Predictive values of lexicon terms used for lesions in the PZ and TZ. Values are given as PPV in percent (cancer-positive lesions with term marked/all lesions with term marked) and NPV in percent (cancer-negative lesions without term marked/all lesions without term marked). PZ, peripheral zone; TZ, transition zone; PPV, positive predictive value; NPV, negative predictive value; TP, true positives; DWI/ADC, diffusion-weighted imaging/apparent diffusion coefficient; DCE, dynamic contrast-enhanced imaging; DP, delayed phase

Reference standard

Prostate biopsies had been performed before this study by experienced urologists or interventional radiologists using one of two biopsy devices (Aplio 500, Toshiba or HI VISION Preirus, Hitachi Medical Systems) and consisted of TB and systematic 10-core biopsy. These biopsies were used as the reference standard. Histopathological findings were classified according to the Gleason grading system [19]. A Gleason score (GS) of 3 + 4 or higher on TB or in a matching segment on systematic biopsy was considered a positive finding for clinically significant prostate cancer (csPCa). Tumor size or volume was not taken into consideration since size analysis was outside the scope of this study. Histopathological findings that indicated no cancerous changes (no tumor cells, acute prostatitis, chronic prostatitis, prostatic intraepithelial neoplasia, or benign prostatic hyperplasia) and GS 3 + 3 tumors were considered non-csPCa.
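
As an illustration of the reference-standard rule described above, the following minimal Python sketch labels a lesion as csPCa-positive or not. It is not the code used in the study. The function names and the representation of a Gleason score as a (primary, secondary) pattern pair are assumptions made for this example, and the sum threshold assumes that only patterns 3 to 5 are reported on biopsy.

```python
from typing import Optional, Tuple

# Hypothetical representation: a Gleason score as (primary, secondary) patterns,
# or None for benign histology (no tumor cells, prostatitis, PIN, BPH).
Gleason = Optional[Tuple[int, int]]


def is_cspca(gleason: Gleason) -> bool:
    """Return True if the finding counts as csPCa, i.e. GS 3 + 4 or higher."""
    if gleason is None:
        return False
    primary, secondary = gleason
    # Assumes patterns 3-5 only: GS 3 + 3 (sum 6) is non-csPCa,
    # GS 3 + 4 and everything above (sum >= 7) is csPCa.
    return primary + secondary >= 7


def lesion_is_positive(targeted: Gleason, matching_systematic: Gleason) -> bool:
    """A lesion is positive if TB or the matching systematic segment shows csPCa."""
    return is_cspca(targeted) or is_cspca(matching_systematic)


print(lesion_is_positive((3, 3), None))    # GS 3 + 3 on TB only
print(lesion_is_positive((3, 3), (4, 3)))  # upgraded by the systematic core
```

The second call shows why the matching systematic segment matters: a lesion with GS 3 + 3 on TB is still counted as positive if the corresponding systematic core shows GS 4 + 3.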

Statistical evaluation

Positive and negative predictive values (PPVs, NPVs) as well as sensitivity and specificity for detection of csPCa were computed for each of the terms within each zone. For TZ lesions, PPVs of shape and border terms in combination with DWI/ADC terms were additionally analyzed. PPVs of term combinations were compared with PPVs of single terms using the generalized score test by Leisenring, Alonzo, and Pepe [20]. Results were declared to be significant if p < 0.05. Statistical evaluation was performed using R version 1.1.419 (www.r-project.org) and Microsoft Excel version 16.16.17.
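
For readers who want to reproduce the per-term measures, the sketch below computes PPV, NPV, sensitivity, and specificity from a 2 × 2 table in Python. It is an illustration only, not the R/Excel workflow used in the study, and the class name `TermCounts` is hypothetical. The example counts are those reported in the Results for the term “restricted diffusion” in the PZ; the sensitivity and specificity printed here are derived from those counts and are not quoted from the supplementary table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TermCounts:
    """2 x 2 counts for one lexicon term within one zone."""
    tp: int  # term marked, csPCa on biopsy
    fp: int  # term marked, no csPCa on biopsy
    tn: int  # term not marked, no csPCa on biopsy
    fn: int  # term not marked, csPCa on biopsy

    @staticmethod
    def _ratio(numerator: int, denominator: int) -> float:
        # Undefined (NaN) when the denominator is empty, e.g. a term never marked.
        return numerator / denominator if denominator else float("nan")

    @property
    def ppv(self) -> float:
        return self._ratio(self.tp, self.tp + self.fp)

    @property
    def npv(self) -> float:
        return self._ratio(self.tn, self.tn + self.fn)

    @property
    def sensitivity(self) -> float:
        return self._ratio(self.tp, self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self._ratio(self.tn, self.tn + self.fp)


# "Restricted diffusion" in the PZ: PPV 104/205 and NPV 86/95 as reported.
restricted_diffusion_pz = TermCounts(tp=104, fp=205 - 104, tn=86, fn=95 - 86)

print(f"PPV {restricted_diffusion_pz.ppv:.1%}")  # PPV 50.7%
print(f"NPV {restricted_diffusion_pz.npv:.1%}")  # NPV 90.5%
print(f"Sensitivity {restricted_diffusion_pz.sensitivity:.1%}")
print(f"Specificity {restricted_diffusion_pz.specificity:.1%}")
```

The generalized score test used to compare PPVs is not reproduced here, because it requires paired lesion-level data rather than summary counts.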

Results

Lesion characteristics

The readers marked 515 MRI lesions in the 454 MRI datasets. In 5 patients, no lesions were marked by the readers. Thirteen lesions were excluded due to unclear documentation of biopsy locations and one lesion was excluded due to its location in the seminal vesicles. This left a total of 501 lesions in 443 patients. CsPCa was detected in 175 (34.9%) of the lesions. As shown in Fig. 1, 300 lesions (59.9%) were found in the PZ, of which 113 (37.7%) harbored csPCa. Two hundred one lesions (40.1%) were located in the TZ, of which 62 (30.8%) harbored csPCa.
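
The lesion accounting above can be checked with a few lines of Python. This is only a consistency check on the numbers given in this paragraph; the variable names are chosen for illustration.

```python
# Lesion counts as reported in the paragraph above.
marked = 515
excluded_unclear_biopsy_location = 13
excluded_seminal_vesicle = 1
analyzed = marked - excluded_unclear_biopsy_location - excluded_seminal_vesicle

# Per zone: (number of lesions, number of csPCa-positive lesions).
zones = {"PZ": (300, 113), "TZ": (201, 62)}

total_lesions = sum(n for n, _ in zones.values())
total_cspca = sum(pos for _, pos in zones.values())

# The zone totals must add up to the analyzed lesions.
assert total_lesions == analyzed

print(f"Analyzed lesions: {analyzed}")
print(f"csPCa overall: {total_cspca} ({total_cspca / analyzed:.1%})")
for zone, (n, pos) in zones.items():
    print(f"{zone}: {n} lesions ({n / analyzed:.1%}), csPCa in {pos} ({pos / n:.1%})")
```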

Diagnostic performance of lexicon terms for PZ lesions

Table 2 and Fig. 3a show PPVs and NPVs of the analyzed lexicon terms. Sensitivity and specificity are presented in Table 4 in the supplementary material. Lexicon terms with the highest PPVs were the following: restricted diffusion (50.7% [104/205]), DW hyperintensity (52.0% [105/202]), early phase washin (56.5% [48/85]), washout delayed phase (68.2% [30/44]), positive DCE (55.4% [56/101]), T2W markedly hypointense (67.2% [43/64]), spiculated (56.7% [17/30]), lenticular (58.6% [17/29]), lobulated (56.8% [25/44]), water-drop-shaped (100.0% [6/6]), and invasion (76.4% [42/55]). Lexicon terms with the highest NPVs were the following: restricted diffusion (90.5% [86/95]), DW hyperintensity (91.8% [90/98]), and ADC hypointense (89.4% [42/47]). Terms indicating benignity showed low PPVs for detection of csPCa, with the term “linear” displaying the lowest PPV of 6.3% (1/16).

Fig. 3 Predictive values of PI-RADS v2 lexicon terms for peripheral zone (a) and transition zone (b). Current PI-RADS v2.1 assessment criteria with their respective predictive values as found in this study are shown in (c). PPV and NPV approximating 1 were considered favorable for terms indicating malignancy. PPV and NPV approximating 0 were considered favorable for terms indicating benignity. PPV, positive predictive value; NPV, negative predictive value; DCE, dynamic contrast-enhanced imaging; DWI, diffusion-weighted imaging

In a few cases, the readers also tagged PZ lesions with lexicon terms that are designed to describe TZ lesions (e.g., “erased charcoal sign” and “organized chaos”); these results are given in Table 2 and supplementary Table 4 for the sake of completeness.

Diagnostic performance of lexicon terms for TZ lesions

Table 2 and Fig. 3b show predictive values of lexicon terms used for TZ lesions. The highest PPVs were found for the following lexicon terms: T2W markedly hypointense (52.5% [21/40]), non-circumscribed (53.3% [16/30]), irregular border (50.0% [30/60]), spiculated (57.1% [8/14]), erased charcoal sign (61.0% [36/59]), water-drop-shaped (78.6% [11/14]), and invasion (58.9% [33/56]). Lexicon terms with the highest NPVs were the following: restricted diffusion (85.1% [63/74]), DW hyperintensity (89.7% [70/78]), and ADC hypointense (87.0% [47/54]). Terms with the lowest PPVs were the following: T2W hyperintense (3.1% [1/32]), T2W isointense (5.3% [1/19]), encapsulated (6.9% [4/58]), and organized chaos (5.4% [3/56]). Figure 3c summarizes predictive values of the most relevant lexicon terms that are currently used as assessment criteria in PI-RADS v2.1.

PPVs of border and shape terms combined with diffusion-related terms used for TZ lesions

Combining shape and border terms with the term “diffusion-weighted hyperintensity” yielded the most distinctive PPVs for TZ lesions. PPVs of these term combinations are shown in Table 3. In most cases, combining a positive finding of “DW hyperintensity” with a shape- or border-related term mildly increased the PPV compared with the border/shape term alone, but this increase was mostly not statistically significant. In contrast, combining border/shape terms with a negative finding for “DW hyperintensity” yielded significant changes in PPV in most cases. The following terms showed a statistically significant decrease in PPV when combined with a negative finding for “DW hyperintensity”: circumscribed, non-circumscribed, indistinct, obscured, irregular border, round, oval, lenticular, lobulated, water-drop-shaped, and irregular shape.
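
To make the notion of a term combination concrete, the sketch below computes the PPV of a border term alone and in combination with a positive or negative finding for “DW hyperintensity”. The lesion records are invented toy data, not study data, and the function `ppv` and the record layout are assumptions made for this example. The statistical comparison of the resulting PPVs (generalized score test) is not implemented here.

```python
from typing import List, Mapping, Optional

# Toy lesions (hypothetical, NOT study data): tagged terms plus biopsy outcome.
lesions: List[Mapping] = [
    {"terms": {"irregular border", "DW hyperintensity"}, "cspca": True},
    {"terms": {"irregular border", "DW hyperintensity"}, "cspca": True},
    {"terms": {"irregular border", "DW hyperintensity"}, "cspca": False},
    {"terms": {"irregular border"}, "cspca": False},
    {"terms": {"irregular border"}, "cspca": False},
    {"terms": {"round"}, "cspca": False},
]


def ppv(records, term: str, dw_hyperintense: Optional[bool] = None) -> Optional[float]:
    """PPV of `term`, optionally restricted to lesions with (True) or
    without (False) the term "DW hyperintensity"."""
    selected = [
        r for r in records
        if term in r["terms"]
        and (dw_hyperintense is None
             or ("DW hyperintensity" in r["terms"]) == dw_hyperintense)
    ]
    if not selected:
        return None  # PPV is undefined if no lesion carries the combination
    return sum(r["cspca"] for r in selected) / len(selected)


print(ppv(lesions, "irregular border"))                         # term alone
print(ppv(lesions, "irregular border", dw_hyperintense=True))   # with DW hyperintensity
print(ppv(lesions, "irregular border", dw_hyperintense=False))  # without DW hyperintensity
```

In this toy set, the PPV of the border term drops to zero when DW hyperintensity is absent, which mirrors the direction of the effect reported for the TZ but says nothing about its magnitude.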

Table 3 Positive predictive values of border- and shape-related terms describing TZ lesions combined with positive or negative findings for diffusion-weighted hyperintensity. Values are given as PPV in percent (cancer-positive lesions with term(s) marked/all lesions with term(s) marked). PPVs of term combinations were compared with PPVs of single terms using the generalized score test by Leisenring, Alonzo, and Pepe [20]. Differences in PPV were considered significant if p values were below 0.05; these values are set in italic. TZ, transition zone; PPV, positive predictive value; NPV, negative predictive value; DWI, diffusion-weighted imaging; TP, true positives; POS, all positives; TN, true negatives; NEG, all negatives

Discussion

In this study, the predictive power of PI-RADS v2 lexicon terms was analyzed with the aim of adding to the quantitative support of the PI-RADS guideline and identifying areas of improvement.

On the one hand, the presented data corroborate the use of many established assessment criteria in PI-RADS. Lexicon terms indicating restricted diffusion in PZ lesions showed favorable combinations of both relatively high PPVs and high NPVs (e.g., PPV of 50.7% and NPV of 90.5% for the term “restricted diffusion”). Our work therefore confirms the importance of diffusion-related findings in the PZ [21,22,23]. Conflicting data have been published regarding the value of DCE imaging in cancer detection. The current consensus is that the addition of DCE imaging to DW imaging increases cancer detection in the PZ, while it might not be useful in TZ lesions [5, 24,25,26,27]. Our results support this consensus, with positive DCE findings showing high PPVs of up to 68.2% in PZ lesions and poorer performance in TZ lesions, with a maximum PPV of 28.3%. Moreover, signs of invasive behavior or extraprostatic extension are considered highly suggestive of cancer [3]. In accordance with that, our study showed high PPVs for the term “invasion” (76.4% and 58.9% for PZ and TZ lesions, respectively). Features that are suggestive of benign findings, such as “encapsulated” and “organized chaos” for TZ lesions or “linear” for PZ lesions, were consistently associated with benign biopsy outcomes in this study.

On the other hand, our study showed areas of discriminatory potential that are currently not fully utilized in the PI-RADS v2 and v2.1 assessment. In PI-RADS v2, diffusion-related findings play a minor role in TZ scoring [3]. In PI-RADS v2.1, however, the DWI score has gained more importance and scores of 4 and 5 can now upgrade the overall score of a TZ lesion [4]. This adjustment is consistent with studies demonstrating lower ADC values in TZ cancers than in BPH nodules, albeit with a large overlap of ADC values [26, 28,29,30], and studies showing higher diagnostic accuracy when T2WI and DWI assessments were combined in TZ lesions [11, 31]. The presented data show another potential of DWI findings: the terms “restricted diffusion,” “diffusion-weighted hyperintensity,” and “ADC hypointense” had high NPVs of 85.1% to 89.7% for TZ lesions, whereas T2WI-related terms showed lower NPVs with a maximum of 66.0% for TZ lesions. Combining T2WI-related border and shape features with a finding of absent diffusion-weighted hyperintensity lowered the respective PPV markedly in most cases compared with the T2WI-related term alone. Means of integrating this negative predictive potential of DWI-related terms into the scoring system, in addition to the positive predictive potential of T2WI-related findings, could therefore be considered. Based on the data presented in this study, the absence of features related to restricted diffusion appears to have good potential to exclude csPCa in the TZ and could be used to downgrade TZ lesions.

Furthermore, we identified a number of border- and shape-related terms with high PPVs that are currently not explicitly included as criteria for PI-RADS v2 and v2.1 scores. For both PZ and TZ lesions, the descriptors “lenticular,” “lobulated,” and “spiculated” showed rather high PPVs between 44.7 and 58.6%. In TZ lesions, the descriptors “water-drop-shaped,” “irregular,” “non-circumscribed,” and “erased charcoal sign” also showed high PPVs between 50.0 and 78.6%. In a previous study with 14 included patients, Pokharel et al [11] found a similar PPV for TZ lesions with an irregular border (55%). The term “organized chaos” showed a favorable PPV of 5.4%, indicating benignity in TZ lesions. Including these highly discriminatory terms in the assessment criteria for PI-RADS categories should be considered.

There are a number of limitations to this study, potentially influencing the generalizability of the results. The standard of reference was histopathologic results of TB and a systematic 10-core biopsy taken ahead of the re-read conducted in this study. Although this method has been shown to have high rates of detection of malignant diseases [32], some malignant lesions may not have been targeted upon TB and may have been missed upon systematic biopsy. Rouvière et al [33] report in a large, prospective, multicenter study that GS ≥ 7 tumor would have been missed in 7.6% (95% CI, 4.6–11.6%) of patients, had TB not been done. The presence of false negatives could influence the reliability of diagnostic values referring to benign-appearing lesions especially, since these were not targeted upon TB. Furthermore, the fact that only patients that underwent biopsy were included may affect the results pertaining to benign-appearing lesions, as patients without suspicious lesions on MRI and low clinical risk factors did not undergo biopsy and were thus excluded from the study. Moreover, benign-appearing lesions in subjects with other suspicious areas may not have been identified as targets by the readers. A more reliable standard of reference would be histopathology after surgical prostatectomy, though this would bias the underlying collective towards medium-aggressive cancers.

Another limitation stems from interreader variability, which was not assessed in this study, as each MRI was read once by a single experienced reader, although the assignment of lexicon terms is a subjective process. Additionally, three of the four readers worked at the same institution. Several other studies have shown moderate interobserver agreement for the PI-RADS final assessment [8, 34], and a few studies have also demonstrated moderate interobserver agreement for the assignment of individual lexicon terms [10, 34]. Nevertheless, larger multicenter studies that assess interobserver agreement in the assignment of lexicon descriptors could improve the generalizability of their predictive values.

Conclusions

The present study identifies lexicon terms with high and low discriminatory power for the prediction of csPCa. The presented data corroborate the importance of DWI/ADC- and DCE-related findings in the PZ by showing favorable PPVs and NPVs in the respective lexicon terms. We identify T2WI-related terms with high PPVs: “markedly hypointense,” “water-drop-shaped,” and “spiculated” for the PZ and TZ; “lobulated” for the PZ; and “erased charcoal sign,” “non-circumscribed,” and “irregular border” for the TZ. Moreover, this study demonstrates that DWI/ADC-related lexicon terms can be useful for excluding csPCa in the TZ. We show that combining T2WI findings with findings of absent DW hyperintensity in TZ lesions significantly decreases PPVs. While the positive predictive potential of DWI findings in TZ lesions has been utilized more prominently in the new PI-RADS v2.1, means of incorporating the negative predictive potential of DWI-related terms for TZ lesions (e.g., by downgrading lesions) could further increase diagnostic accuracy.