Validation of the PI-RADS language: predictive values of PI-RADS lexicon descriptors for detection of prostate cancer

Objectives To assess the discriminatory power of lexicon terms used in PI-RADS version 2 to describe MRI features of prostate lesions.
Methods Four hundred fifty-four patients were included in this retrospective, institutional review board–approved study. Patients received multiparametric (mp) MRI and subsequent prostate biopsy including MRI/transrectal ultrasound fusion biopsy and 10-core systematic biopsy. PI-RADS lexicon terms describing lesion characteristics on mpMRI were assigned to lesions by experienced readers. Positive and negative predictive values (PPV, NPV) of each lexicon term were assessed using biopsy results as a reference standard.
Results From a total of 501 lesions, clinically significant prostate cancer (csPCa) was present in 175 lesions (34.9%). Terms related to findings of restricted diffusion showed PPVs of up to 52.0%/43.9% and NPVs of up to 91.8%/89.7% (peripheral zone or PZ/transition zone or TZ). T2-weighted imaging (T2WI)–related terms showed a wide range of predictive values. For PZ lesions, high PPVs were found for “markedly hypointense,” “lenticular,” “lobulated,” and “spiculated” (PPVs between 56.7 and 67.2%). For TZ lesions, high PPVs were found for “water-drop-shaped” and “erased charcoal sign” (78.6% and 61.0%). The terms “encapsulated,” “organized chaos,” and “linear” proved to be good predictors of benignity, with distinctively low PPVs between 5.4 and 6.9%. Most T2WI-related terms showed improved predictive values for TZ lesions when combined with DWI-related findings.
Conclusions Lexicon terms with high discriminatory power were identified (e.g., “markedly hypointense,” “water-drop-shaped,” “organized chaos”). DWI-related terms can be useful for excluding TZ cancer. Combining T2WI- with DWI-related findings in TZ lesions markedly improved predictive values.
Key Points
• Lexicon terms describing morphological and functional features of prostate lesions on MRI show a wide range of predictive values for prostate cancer.
• Some T2-related terms have favorable PPVs, e.g., “water-drop-shaped” and “organized chaos,” while others show less distinctive predictive values. DWI-related terms have notably high negative predictive values in TZ lesions, making DWI features a useful tool for exclusion of TZ cancer.
• Combining DWI- and T2-related lexicon terms for assessment of TZ lesions markedly improves PPVs. Most T2-related lexicon terms showed a significant decrease in PPV when combined with negative findings for “DW hyperintensity.”
Electronic supplementary material The online version of this article (10.1007/s00330-020-06773-1) contains supplementary material, which is available to authorized users.


Introduction
Multiparametric magnetic resonance imaging (mpMRI) has emerged as a vital tool in the diagnosis of prostate cancer (PCa). With the widespread adoption of prostate MRI, standardized interpretation and reporting of mpMRI findings have become necessary [1,2]. For this purpose, the Prostate Imaging Reporting and Data System (PI-RADS) was developed, based on a synthesis of expert consensus and available evidence. The revised PI-RADS version 2 (v2) was released in December 2014 [3]. In March 2019, modifications to PI-RADS v2 were published, constituting an updated version termed PI-RADS v2.1 [4]. As PI-RADS is intended to be a document in evolution, studies have been encouraged to test its efficacy.
Despite recent developments in improving quantitative radiological methods, the vast majority of prostate cancer diagnosis on MRI is performed in the traditional "radiologist reporting" setting. Thus, the vocabulary and subjective assessments of the radiologist are the cornerstones of the reports' validity.
PI-RADS scoring is done by assessing lesions' features on T2-weighted (T2WI), diffusion-weighted (DWI), and dynamic contrast-enhanced imaging (DCE). The assessed criteria include lesions' signal intensity, shape, margins, size, and invasive behavior/extraprostatic extension. The PI-RADS v2 document provides a lexicon with defined descriptors in its appendix (Appendix III) that constitutes the very foundation of this assessment [3]. Analyzing and understanding this very foundation of PI-RADS could enable us to identify descriptors with high diagnostic accuracy, thus allowing these to be incorporated more prominently into the scoring criteria, while reducing the importance of descriptors with low accuracy.
Lexicon terms and their definitions remain largely unchanged in PI-RADS version 2.1, with the only changes being the redefinition of the term "negative DCE," a new definition of the term "marked," and the introduction of the terms "typical" and "atypical BPH nodule." A number of studies have assessed the diagnostic value of the PI-RADS score [5][6][7][8][9], but to date, only sparse data are available concerning the underlying terminology. To the best of our knowledge, only a few studies with small study populations have addressed the discriminatory power of individual lexicon terms [10,11]. Therefore, the objective of this study is to systematically assess the diagnostic value of individual descriptors as specified in the PI-RADS v2 lexicon in a large patient cohort.

Patients
The inclusion criteria for this retrospective study were the availability of a prostate MRI between January 2012 and July 2015 and subsequent in-house targeted MRI/TRUS fusion biopsy (TB) in combination with a 10-core systematic biopsy in the same session. From a total of 526 eligible patients, 72 patients with incomplete or non-standard MRI or MRI performed at an external institution were excluded. These exclusions left a final cohort of 454 patients. Patient characteristics are summarized in Table 1. Figure 1 contains a STARD 2015-compliant patient flow diagram [12] for the study. The study protocol was approved by the institutional review board and patient consent was waived due to the retrospective design of the study. Subgroups of the same collective with various study endpoints have been included in earlier publications pertaining to the accuracy of prostate biopsies [13][14][15][16][17][18].

MR imaging
All imaging was performed on one of two identical 3-T MRI scanners (Skyra, Siemens Healthineers). The following imaging parameters were used in all patients: axial and coronal T2WI with a resolution of 3.0 × 0.47 × 0.47 mm, axial DWI with a resolution of 3 × 1.4 × 1.4 mm with measured b values of 0, 50, and 500 and a high b value (800, 1000, or a calculated b value of 1400 s/mm²), and additional T1 axial and T2 axial and sagittal imaging of the whole pelvis. In 242 patients (54.6%), DCE imaging was performed additionally with a spatial resolution of 3 × 1.4 × 1.4 mm, a temporal resolution of 5 s, and a 3 ml/s injection flow (Gadobutrol, Gadovist, Bayer Healthcare).

Imaging review and lexicon term assessment process
Four hundred fifty-four MRI imaging datasets were divided into four similarly sized subgroups (113-114 each). Each group was evaluated by one of four readers (A.B., M.H., C.L., P.A.), all board-certified radiologists with more than 5 years of experience in prostate MRI. Each lesion was assessed once by a single reader using a dedicated in-house built reading software. The software presents all imaging in a standardized way to the reader (Fig. 2). Readers were blinded to all patient-related data including the initial radiological report and histopathological results. Readers were instructed to mark the most suspicious lesion or lesions in an MRI and tag every marked lesion with matching lexicon terms complying with the definitions supplied by the PI-RADS v2 lexicon. Definitions of all lexicon terms were displayed in the reading software exactly as specified in the lexicon of the original PI-RADS v2 document [3]. Table 2 contains a full list of the used terms and their classification. All groupings of lexicon terms (DWI-related, shape-related, border-related terms, etc.) were tagged separately. Lesions were attributed to either the peripheral zone (PZ) or the transition zone (TZ) and localized according to the segmentation model used in PI-RADS v2 [3]. Lesions that extended through PZ as well as TZ and lesions that were located in the anterior stroma (AS) or central zone (CZ) were assigned to either the PZ or the TZ group depending on the most probable zone of origin.

Reference standard
Prostate biopsies taken ahead of this study were performed by experienced urologists or interventional radiologists using one of two biopsy devices (Aplio 500, Toshiba or HI VISION Preirus, Hitachi Medical Systems) and consisted of TB and systematic 10-core biopsy. These were used as a reference standard. Histopathological findings were classified according to the Gleason grading system [19]. A Gleason score (GS) of 3 + 4 or higher on TB or in a matching segment on systematic biopsy was considered a positive finding for clinically significant prostate cancer (csPCa). Tumor size or volume was not taken into consideration since size analysis was outside the scope of this study. Histopathological findings that indicated no cancerous changes (no tumor cells, acute prostatitis, chronic prostatitis, prostatic intraepithelial neoplasia, or benign prostatic hyperplasia) and GS 3 + 3 tumors were considered non-csPCa.
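The reference-standard rule described above can be written as a small decision function. The following Python sketch is illustrative only: the function names and the tuple representation of a Gleason score are assumptions made for this example, and "GS 3 + 4 or higher" is interpreted here as a Gleason sum of at least 7.

```python
def is_cspca(gleason):
    """Return True for a Gleason score of 3 + 4 or higher.

    `gleason` is a hypothetical (primary, secondary) tuple, or None when
    histopathology showed no tumor (e.g., prostatitis, PIN, or BPH).
    Assumption: "3 + 4 or higher" is treated as a Gleason sum >= 7.
    """
    if gleason is None:
        return False
    primary, secondary = gleason
    return primary + secondary >= 7


def lesion_is_positive(targeted, systematic_matching_segment):
    """A lesion counts as csPCa if either the targeted biopsy or the
    systematic biopsy core from the matching segment shows GS >= 3 + 4."""
    return is_cspca(targeted) or is_cspca(systematic_matching_segment)


# GS 3 + 3 on targeted biopsy, no tumor on systematic biopsy: non-csPCa
print(lesion_is_positive((3, 3), None))
# GS 3 + 3 on targeted biopsy, GS 3 + 4 in the matching segment: csPCa
print(lesion_is_positive((3, 3), (3, 4)))
```

Tumor size and volume are deliberately absent from the rule, mirroring the statement that size analysis was outside the scope of the study.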

Statistical evaluation
Positive and negative predictive values (PPVs, NPVs) as well as sensitivity and specificity for detection of csPCa were computed for each of the terms within each zone. For TZ lesions, PPVs of shape and border terms in combination with DWI/ADC terms were additionally analyzed. PPVs of term combinations were compared with PPVs of single terms using the generalized score test by Leisenring, Alonzo, and Pepe [20]. Results were declared to be significant if p < 0.05. Statistical evaluation was performed using R version 1.1.419 (www.r-project.org) and Microsoft Excel version 16.16.17.
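As an illustration of the per-term metrics, the following Python sketch derives PPV, NPV, sensitivity, and specificity from a 2 × 2 table of term assignment versus biopsy outcome. It is not the authors' R or Excel code; the data layout, the names `Counts`, `count_term`, and `diagnostic_metrics`, and the toy lesions are assumptions introduced for this example. The generalized score test used to compare PPVs is not implemented here.

```python
from dataclasses import dataclass
from math import nan


@dataclass
class Counts:
    """2 x 2 counts for one lexicon term within one zone (hypothetical container)."""
    tp: int  # term assigned, csPCa on biopsy
    fp: int  # term assigned, no csPCa on biopsy
    fn: int  # term not assigned, csPCa on biopsy
    tn: int  # term not assigned, no csPCa on biopsy


def count_term(lesions, term):
    """Tabulate one term over lesions given as (set_of_terms, has_cspca) pairs.

    The caller is assumed to have filtered the lesions to a single zone.
    """
    tp = fp = fn = tn = 0
    for terms, cspca in lesions:
        assigned = term in terms
        if assigned and cspca:
            tp += 1
        elif assigned:
            fp += 1
        elif cspca:
            fn += 1
        else:
            tn += 1
    return Counts(tp, fp, fn, tn)


def diagnostic_metrics(c):
    """Return PPV, NPV, sensitivity, and specificity (NaN if undefined)."""
    def ratio(num, den):
        return num / den if den else nan

    return {
        "ppv": ratio(c.tp, c.tp + c.fp),
        "npv": ratio(c.tn, c.tn + c.fn),
        "sensitivity": ratio(c.tp, c.tp + c.fn),
        "specificity": ratio(c.tn, c.tn + c.fp),
    }


# Synthetic toy lesions for demonstration only (not study data).
toy_lesions = [
    ({"restricted diffusion"}, True),
    ({"restricted diffusion"}, True),
    ({"restricted diffusion"}, False),
    (set(), True),
    (set(), False),
    (set(), False),
    (set(), False),
]
metrics = diagnostic_metrics(count_term(toy_lesions, "restricted diffusion"))
print(metrics)
```

Because PPV and NPV depend on the prevalence of csPCa among the analyzed lesions, values computed this way apply to a biopsied cohort and do not transfer directly to populations with a different prevalence.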

Lesion characteristics
The readers marked 515 MRI lesions in the 454 MRI datasets. In 5 patients, no lesions were marked by the readers. Thirteen lesions were excluded due to unclear documentation of biopsy locations and one lesion was excluded due to its location in the seminal vesicles. This left a total of 501 lesions in 443 patients. CsPCa was detected in 175 (34.9%) of the lesions. As shown in Fig. 1, 300 (59.9%) lesions were found in the PZ; prevalence of csPCa in PZ lesions was 113 (37.7%). Two hundred one lesions (40.1%) were located in the TZ; prevalence of csPCa in TZ lesions was 62 (30.8%). Table 2 and Fig. 3a show PPVs and NPVs of the analyzed lexicon terms. Sensitivity and specificity are presented in Table 4. In a few cases, the readers also tagged PZ lesions with lexicon terms that are designed to describe TZ lesions (e.g., "erased charcoal sign" and "organized chaos"); these results are given in Table 2 and supplementary Table 4 for the sake of completeness.
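The lesion counts and percentages reported in this paragraph can be cross-checked with a few lines of arithmetic. The Python sketch below uses only numbers stated in the text; the variable names and the helper `pct` are introduced here for illustration.

```python
# Lesion flow as reported in the text
marked = 515
excluded_unclear_biopsy = 13
excluded_seminal_vesicle = 1
analyzed = marked - excluded_unclear_biopsy - excluded_seminal_vesicle

# Zone distribution and csPCa counts as reported in the text
pz, tz = 300, 201
cspca_pz, cspca_tz = 113, 62
cspca_total = cspca_pz + cspca_tz


def pct(part, whole):
    """Percentage rounded to one decimal place, as reported in the paper."""
    return round(100 * part / whole, 1)


print(analyzed, pz + tz)             # lesions analyzed, sum over both zones
print(pct(cspca_total, analyzed))    # overall csPCa prevalence in %
print(pct(pz, analyzed), pct(cspca_pz, pz))  # PZ share, PZ prevalence
print(pct(tz, analyzed), pct(cspca_tz, tz))  # TZ share, TZ prevalence
```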

PPVs of border and shape terms combined with diffusion-related terms used for TZ lesions
Combining shape and border terms with the term "diffusion-weighted hyperintensity" yielded the most distinctive PPVs for TZ lesions. PPVs of these term combinations are shown in Table 3. In most cases, combining a positive finding of "DW hyperintensity" with a shape- or border-related term increased the PPV mildly compared with the border/shape term alone, but this increase was generally not statistically significant. On the other hand, combining border/shape terms with a negative finding for "DW hyperintensity" yielded significant changes in PPV in most cases. The following terms showed a statistically significant decrease of PPV when combined with a negative finding for "DW hyperintensity": circumscribed, non-circumscribed, indistinct, obscured, irregular border, round, oval, lenticular, lobulated, water-drop-shaped, and irregular shape.
Definitions of lexicon terms were displayed exactly as specified in the original PI-RADS version 2 document when hovering the cursor over a term. T2WI, T2-weighted imaging; DWI/ADC, diffusion-weighted imaging/apparent diffusion coefficient; T1WI, T1-weighted imaging; DCE, dynamic contrast-enhanced imaging
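The combination analysis in this section can be sketched as follows: for a given shape or border term, the PPV is computed for the term alone, for the term together with a positive "DW hyperintensity" finding, and for the term together with a negative finding. This Python sketch is an illustration under assumed data structures (lesions as pairs of a term set and a biopsy outcome); the function names and toy lesions are hypothetical, and the generalized score test used in the study to compare PPVs is not implemented.

```python
from math import nan


def ppv(lesions, predicate):
    """PPV among lesions whose term set satisfies `predicate` (NaN if none)."""
    outcomes = [cspca for terms, cspca in lesions if predicate(terms)]
    return sum(outcomes) / len(outcomes) if outcomes else nan


def combination_ppvs(lesions, term, dwi_term="DW hyperintensity"):
    """PPVs of a T2WI term alone and stratified by the DWI finding."""
    return {
        "term_alone": ppv(lesions, lambda t: term in t),
        "term_and_dwi_positive": ppv(
            lesions, lambda t: term in t and dwi_term in t
        ),
        "term_and_dwi_negative": ppv(
            lesions, lambda t: term in t and dwi_term not in t
        ),
    }


# Synthetic TZ lesions for demonstration only (not study data).
toy_tz_lesions = [
    ({"round", "DW hyperintensity"}, True),
    ({"round", "DW hyperintensity"}, False),
    ({"round"}, False),
    ({"round"}, False),
    ({"oval", "DW hyperintensity"}, True),
]
result = combination_ppvs(toy_tz_lesions, "round")
print(result)
```

Stratifying by the DWI finding splits the lesions carrying a given T2WI term into two disjoint subsets, so the PPV of the term alone is a weighted average of the two stratified PPVs.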

Discussion
In this study, the predictive power of PI-RADS v2 lexicon terms was analyzed with the aim of adding to the quantitative support of the PI-RADS guideline and identifying areas of improvement.
On the one hand, the presented data corroborate the use of many established assessment criteria in PI-RADS. Lexicon terms indicating restricted diffusion in PZ lesions showed favorable combinations of relatively high PPVs and high NPVs (e.g., PPV of 50.7% and NPV of 90.5% for the term "restricted diffusion"). Our work therefore confirms the importance of diffusion-related findings in the PZ [21][22][23]. Terms indicating benign findings, such as "encapsulated" and "organized chaos" for TZ lesions or "linear" for PZ lesions, were consistently associated with benign biopsy outcomes in this study.
Fig. 3 Predictive values of PI-RADS v2 lexicon terms for peripheral zone (a) and transition zone (b). Current PI-RADS v2.1 assessment criteria with their respective predictive values as found in this study are shown in (c). PPV and NPV approximating 1 were considered favorable for terms indicating malignancy. PPV and NPV approximating 0 were considered favorable for terms indicating benignity. PPV, positive predictive value; NPV, negative predictive value; DCE, dynamic contrast-enhanced imaging; DWI, diffusion-weighted imaging
On the other hand, our study showed areas of discriminatory potential that are currently not fully utilized in the PI-RADS v2 and v2.1 assessment. In PI-RADS v2, diffusion-related findings play a minor role for TZ scoring [3]. In PI-RADS v2.1, however, the DWI score has gained more importance and scores of 4 and 5 can now upgrade the overall score of a TZ lesion [4]. This adjustment is consistent with studies demonstrating lower ADC values in TZ cancers than in BPH nodules, albeit with a large overlap of ADC values [26][28][29][30], and studies showing higher diagnostic accuracy when T2WI and DWI assessments were combined in TZ lesions [11,31]. The presented data show another potential of DWI findings: the terms "restricted diffusion," "diffusion-weighted hyperintensity," and "ADC hypointense" had high NPVs of 89.4% to 91.8% for TZ lesions, whereas T2WI-related terms showed lower NPVs with a maximum of 66.0% for TZ lesions. Combining T2WI-related border and shape features with a finding of absent diffusion-weighted hyperintensity lowered the respective PPV markedly in most cases compared with the T2WI-related term alone. Means of integrating this negative predictive potential of DWI-related terms into the scoring system, in addition to the positive predictive potential of T2WI-related findings, could therefore be considered. From the data presented in this study, the absence of features related to restricted diffusion can be assumed to have a good potential to exclude csPCa in the TZ and could be used to downgrade TZ lesions.
Furthermore, we identified a number of border- and shape-related terms with high PPVs that are currently not explicitly included as criteria for PI-RADS v2 and v2.1 scores. For both PZ and TZ lesions, the descriptors "lenticular," "lobulated," and "spiculated" showed rather high PPVs between 44.7 and 58.6%. In TZ lesions, the descriptors "water-drop-shaped," "irregular," "non-circumscribed," and "erased charcoal sign" also showed high PPVs between 50.0 and 78.6%. In a previous study with 14 included patients, Pokharel et al [11] found a similar PPV for TZ lesions with an irregular border (55%). The term "organized chaos" showed a favorable PPV of 5.4%, indicating benignity in TZ lesions. Including these highly discriminatory terms in the assessment criteria for PI-RADS categories should be considered.
There are a number of limitations to this study, potentially influencing the generalizability of the results. The standard of reference was histopathologic results of TB and a systematic 10-core biopsy taken ahead of the re-read conducted in this study. Although this method has been shown to have high rates of detection of malignant diseases [32], some malignant lesions may not have been targeted upon TB and may have been missed upon systematic biopsy. Rouvière et al [33] report in a large, prospective, multicenter study that GS ≥ 7 tumor would have been missed in 7.6% (95% CI, 4.6-11.6%) of patients, had TB not been done. The presence of false negatives could influence the reliability of diagnostic values referring to benign-appearing lesions especially, since these were not targeted upon TB. Furthermore, the fact that only patients that underwent biopsy were included may affect the results pertaining to benign-appearing lesions, as patients without suspicious lesions on MRI and low clinical risk factors did not undergo biopsy and were thus excluded from the study. Moreover, benign-appearing lesions in subjects with other suspicious areas may not have been identified as targets by the readers. A more reliable standard of reference would be histopathology after surgical prostatectomy, though this would bias the underlying collective towards medium-aggressive cancers.
Another limitation stems from interreader variability, which was not assessed in this study, as each MRI was read once by a single experienced reader. Additionally, three of the four readers worked at the same institution. The assignment of lexicon terms is, however, a subjective process. Several other studies have shown moderate interobserver agreement for PI-RADS final assessment [8,34], and a few studies have also demonstrated moderate interobserver agreement for assignment of individual lexicon terms [10,34]. Nevertheless, larger multicenter studies that assess interobserver agreement in the assignment of lexicon descriptors could improve the generalizability of the descriptors' predictive values.

Conclusions
The present study identifies lexicon terms with high and low discriminatory power for the prediction of csPCa. The presented data corroborate the importance of DWI/ADC- and DCE-related findings in the PZ by showing favorable PPVs and NPVs in the respective lexicon terms. We identify T2WI-related terms with high PPVs: "markedly hypointense," "water-drop-shaped," and "spiculated" for the PZ and TZ; "lobulated" for the PZ; and "erased charcoal sign," "non-circumscribed," and "irregular border" for the TZ. Moreover, this study demonstrates that DWI/ADC-related lexicon terms can be useful for excluding csPCa in the TZ. We show that combining T2WI findings with findings of absent DW hyperintensity in TZ lesions significantly decreases PPVs. While with the new PI-RADS v2.1 the positive predictive potential of DWI findings in TZ lesions has been more prominently utilized, means of incorporating the negative predictive potential of DWI-related terms for TZ lesions (e.g., by downgrading lesions) could further increase diagnostic accuracy.
Funding information Open Access funding provided by Projekt DEAL. The authors state that this work has not received any funding.

Methodology
• Retrospective
• Diagnostic or prognostic study
• Performed at one institution
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.