Introduction

Prostate cancer (PCa) is one of the most common malignant cancer in man. It is estimated that approximately 1.3 million new cases worldwide every year, and currently about 10 million people are living with PCa [1, 2]. MRI plays an important role in localizing, diagnosis, and staging of PCa [3,4,5]; moreover, previous studies demonstrated that MRGB is superior to conventional standard transrectal ultrasonography (TRUS)–guided biopsy, which could considerably reduce the risk of upgrading Gleason Score (GS) 3 + 4 lesions as compared to standard biopsy [6]. In 2019, the American College of Radiology and the European Society of Urogenital Radiology updated the PI-RADS to version 2.1, which is a standardized scoring system for performing, interpreting, and reporting the PCa with multiparametric MRI (mpMRI) [7]. According to PI-RADS, the full mpMRI examination consists of at least two orthogonal planes T2WI images, axial T1WI images, axial diffusion-weighted images (DWI) and their apparent diffusion coefficient (ADC) values along with axial dynamic contrast-enhanced (DCE) images. However, several preliminary studies demonstrated that DCE plays only a secondary role in the transition zone (TZ) because it is not reliable for the differentiation between prostate cancer and benign prostatic hyperplasia. Moreover, it is also ignored in the peripheral zone (PZ) [8]. Therefore, bpMRI has been investigated and demonstrated equivalent diagnostic performance to mpMRI while using PI-RADS, in which DCE was omitted thereafter this could decrease image acquisition time and costs, while retaining sufficient accuracy [9]. Nevertheless, some studies showed that bpMRI had higher specificity but lower sensitivity as compared to mpMRI [9]. Therefore, the combination of bpMRI and other clinical variables and biomarkers should be considered to improve the overall diagnostic performance. Recently, several simplified or revised scorings based on bpMRI had been proposed; however, these scorings have not been compared directly up to date [8, 10,11,12]. In this study, we aimed to compare the diagnostic performance of 3 scorings that based on bpMRI with PI-RADS v2.1. Additionally, we constructed integrated nomograms combining scorings based on bpMRI and clinical variables using multivariable logistic regression to identify csPCa.

Materials and methods

Patient selection

This retrospective study was approved by our institutional review board and the requirement for written informed consent was waived. We reviewed the medical records of our institutional database to identify patients who had the suspicion of PCa (with PSA levels > 4 ng/mL). Those who underwent mpMRI and subsequent MRGB between February 2017 and October 2021 were considered potentially eligible. Initially, 401 consecutive patients were identified, of them 77 were excluded for reasons as follows: (1) with PCa diagnosis or treatment prior to MRI (n = 29); (2) incomplete systematic biopsy (n = 14); and (3) insufficient clinical information (n = 34). Finally, 324 patients with suspected PCa were included in this study, of which 217 were assigned to the training group and 107 were assigned to the validation group. Figure 1 presents the detailed patient selection process.

Fig. 1
figure 1

Flowchart of the study population with the exclusion criteria. csPCa, clinically significant prostate cancer

MRI acquisition and interpretation

The prostate MRI examination was performed with a 3.0 T MRI scanner (Philips Ingenia, The Netherlands), and a pelvic 32-channel phased array coil was used for all patients. According to recommendations of PI-RADS v2.1 guideline, the following imaging protocols were used: axial turbo spin echo (TSE) T2 weighted image (T2WI, repetition time [TR] = 3000ms; echo time [TE] = 100ms; slice thickness = 3.0 mm, the scanning field of view (FOV) = 220 mm×220 mm, matrix = 276 × 240), sagittal TSE T2WI (TR = 6000ms, TE = 77ms, slice thickness = 3.0 mm, FOV = 220 mm×220 mm, matrix = 104 × 125), and axial DWI (TR = 6000ms, TE = 77ms, layer thickness = 3.0 mm, FOV = 260 mm×260 mm, matrix = 104 × 125) and multiple b values (b = 0, 100, 1000, 2000 s/mm2).

All examinations were independently reviewed by two board-certified radiologists (H.B.B., with 3 years of experience and J.Y.M., with 7 years of experience), who were blinded to clinical information and pathologic results. Three scorings (Scoring 1, Scoring 2, and Scoring 3) derive from the PI-RADS to assess each lesion, in which the DCE sequence was omitted. Regarding Scoring 1, the final score was determined as follows: for PZ lesions, the DWI sequence is the dominant scoring sequence, and TZ lesions remain unchanged; for Scoring 2, T2WI replaces the DCE sequence to determine the final score for PZ lesions; however, when DWI = 3 and when T2WI ≥ 4 the final score is upgraded to 4, and TZ lesions are unchanged; with respect to Scoring 3, the total score is determined by combining the T2WI and the DWI regardless of PZ and TZ lesions, details regarding these 3 scorings are summarized in Table 1.

Table 1 Three Scorings Based on bpMRI

Biopsy and histopathology

All patients underwent systematic TRUS-guided biopsy after at least 4 weeks of the MRI examination, in which “10 + x” cores were obtained in a double sextant pattern, sampling the lateral and medial portions of the apex, mid, and base of each hemi-gland. After examination of mpMRI, patients with suspicious lesions underwent fusion biopsy by using an ESAOTE Mylab Twice color Doppler ultrasound device (with a 7.5-MHz transrectal end-fire probe), which equipped with a Real-time Virtual Sonography imaging fusion system.

All fusion biopsies were performed by a urologist with 7 years of experience in prostate biopsy. Specimens (biopsy or radical prostatectomy) were assigned Gleason score (GS) by an expert genitourinary pathologist with more than 15 years of experience who was blinded to the MRI findings. All GS were assigned in concordance with the 2014 International Society of Urological Pathology consensus guideline [13]. Clinically significant PCa was defined as GS ≥ 7 and tumor diameter ≥ 5 mm. Prostate volume (PV) was calculated according to the ellipsoid volume formula (transverse width × transverse length × longitudinal height × 0.52), and PSAD was calculated by dividing the tPSA level by PV (tPSA/PV), tumor size was primarily calculated from T2WI.

Statistical analysis

For each scoring, we calculated their sensitivity and specificity along with their 95% confidence intervals (CI). The best threshold for each scoring that balanced sensitivity and specificity was calculated using Youden’s index [14]. For Scoring 1 and Scoring 2, ≥ 4 was defined as positive, while for Scoring 3 positive was defined as ≥ 8. Subsequently, we used the univariable and multivariable logistic regression analysis to identify independent predictors associated with PCa, which includes age, total PSA (tPSA), free PSA (fPSA), PV, PSAD, and scorings. After conducting univariable logistic regression analysis for all potential clinical variables and biomarkers, we performed multivariable logistic regression analysis to investigate significant clinical factors for csPCa. For each scoring we constructed corresponding model (Model 1, Model 2, and Model 3) combined with clinical variables, overall diagnostic performance of AUCs were calculated and compared with DeLong’s test, the best was defined as the one with the largest AUC [15]. A nomogram for the best combination in the multiple logistic regression analyses was generated for the prediction of csPCa. All analysis was performed using R statistical software (version 3.6.1).

Results

Patient characteristics

The patient characteristics are presented in Table 2. In the training group, 68 of 217 patients (31.3%) were diagnosed with csPCa based on the results from both targeted and/or systemic biopsies, whereas in the validation group 36 of 107 patients (33.6%) were diagnosed with csPCa.

Table 2 Characteristics of Patients

Diagnostic performance of 3 scorings based on bpMRI

For the detection of csPCa, the sensitivity and specificity for Scoring 1 at cutoff ≥ 4 were 64.7% (95% CI 52.2%-75.9%) and 80.5% (95% CI 73.3%-86.6%); for Scoring 2 were 86.8% (95% CI 76.4%-93.8%) and 73.2% (95% CI 65.3%-80.1%) at cutoff ≥ 4; and for Scoring 3 were 61.8% (95% CI 49.2%-73.3%) and 80.5% (95% CI 73.3%-86.6%) at cutoff ≥ 8, respectively. The results showed that Scoring 2 had significantly higher sensitivity compared with Scoring 1 (P < 0.001) and Scoring 2 (P = 0.03) but with lower specificity (P < 0.01 for both). Figure 2 shows an example of one lesion categorized by 3 scorings.

Fig. 2
figure 2

A 63 years old man with elevated PSA of 11.1ng/ml and PSAD of 0.27ng/ml/ml. (a) A marked focal lesion (white arrow) of 11 mm in the right of the PZ with low signal intensity on T2WI; (b) high signal intensity on high b-value image of the DWI; (c) on the ADC map, which showed hypointensity; (d) histopathologic image, Gleason score of 4 + 4 at prostatectomy. This lesion was assigned as score 3, 4, and 7 according to Scoring 1, Scoring 2, and Scoring 3, respectively

Logistic regression analysis

Our univariate logistic regression analysis revealed that age (OR = 1.06, 95% CI 1.03–1.10), PSA (OR = 1.05, 95% CI 1.03–1.07), PV (OR = 0.98, 95% CI 0.97–0.99), and PSAD (OR = 34.27, 95% CI 9.87-119.02), and 3 scorings (OR = 2.87, 95% CI 2.13–3.87 for Scoring 1; OR = 3.51, 95% CI 2.49–4.95 for the Scoring 2; and OR = 1.76, 95% CI 1.49–2.08 for Scoring 3) were significant predictors for csPCa. Nevertheless, as PSA and PV were strongly correlated with PSAD, which were excluded from the results. Finally, age, PSAD, and scorings were included in the multivariable logistic regression analyses. Table 3 shows the details of univariate and multivariate logistic regression analyses. For 3 models, the AUCs were 0.88 (95% CI 0.83–0.93) for Model 1, 0.90 (95% CI 0.85–0.94) for Model 2, and 0.88 (95% CI 0.83–0.93) for Model 3 (Fig. 3). For all models, the AUCs were significantly higher than using the scorings alone. We performed comparisons between these 3 models based on bpMRI, and the results suggested that Model 2 had significantly higher performance that Model 1 (P = 0.04) and Model 3 (P = 0.02) with DeLong’s test.

Table 3 Univariate and Multivariate Logistic Regression Analysis
Fig. 3
figure 3

ROC for the comparison 3 models. A, area under the ROC curve for 3 scorings; B, area under the ROC curve for 3 models. AUC, area under the ROC curve

Internal validation

On the validation data set, the AUCs were 0.88 (95% CI 0.81–0.95) for Model 1, 0.90 (95% CI 0.84–0.96) for Model 2, and 0.88 (95% CI 0.82–0.95) for Model 3. Delong’s test showed that the prediction of Model 2 was significantly higher as compared with Model 1 (P = 0.04), but did not substantially different than Model 3 (P = 0.05). A nomogram based on the combination of Scoring 2, age, and PSAD was constructed, in which points from each variable are added and a straight line from the total point score shows the probability of harboring csPCa (Fig. 4).

Fig. 4
figure 4

Construction of a nomogram for predicting the probability of clinically significant prostate cancer. bpMRI, biparametric magnetic resonance imaging; PSAD, prostate-specific antigen density

Discussion

In this study, we evaluated 3 scoring systems based on bpMRI for the detection of csPCa. Our findings demonstrated that all three scorings exhibited high diagnostic performance, and the sensitivity and specificity for Scoring 1 were 64.7% and 80.5%, for Scoring 2 were 86.8% and 73.2%, and for Scoring 3 were 61.8% and 80.5%, respectively. Notably, with Scoring 1 significantly more csPCa were identified as compared with Scoring 2 (P < 0.001) and Scoring 3 (P = 0.03); however, the specificity was decreased substantially (P < 0.001). In light of limited performance that merely depends on scoring systems, we constructed integrated nomograms combining clinical variables of age, PSAD, and scorings to identify csPCa lesions.

Our analysis revealed that all 3 models were substantially outperformed using scoring systems alone; moreover, we noted that Model 2 (AUC 0.90) was significantly superior to Model 1 (AUC 0.88, P = 0.04) and Model 2 (AUC 0.88, P = 0.02). To validate these three nomograms, we analyzed 107 patients (36 pathologically proved csPCa) in the validation group. The results showed that the diagnostic performance was consistent with results from the training group, and the AUC for the three models were 0.88, 0.90, and 0.88, respectively. Furthermore, Model 2 was significantly superior to Model 1 while compared using the AUC (P = 0.04); however, the difference was not statistically significant as compared with Model 3 (P = 0.05). Previous studies had reported that combining PI-RADS and other clinical variables or biomarkers was beneficial for improving the diagnostic performance for the detection of PCa; however, most of the nomograms or models from these studies were based on mpMRI and have not been validated in another cohort set [11, 16, 17]. In this study, we independently validated our nomograms using a different dataset, and the results showed the models generalized well.

Given the inconvenience and limitation of mpMRI, several preliminary studies have investigated bpMRI that omit the DCE sequence, as which is not need intravenous contrast media and is faster (up to 15 min) [18]. Many studies reported that the bpMRI has comparable performance with mpMRI, and the DCE was not necessary or secondary, especially for PZ lesions; however, the role of DCE imaging in the prediction of PCa has been the topic of much discussion. The use of DCE imaging is increasingly controversial, and more and more studies suggested that it could be abandoned [8, 10, 19, 20]. A previous meta-analysis that included 11 head-to-head studies comparing bpMRI and mpMRI showed that the former had lower sensitivity but higher specificity [9]. Nevertheless, Greer et al. demonstrated that DCE could improve the probability of cancer detection of PI-RADS category 2, 3, and 4 for PZ lesions; however, this study reported that the high rate of DCE positivity may lead to higher false-positive findings [10]. In the current study, our results showed that without DCE, bpMRI still yielded high diagnostic performance. Furthermore, fewer MRI sequences led to the higher inter-reader agreement, and a simplified scoring system could benefit those radiologists with less experience.

One problem that deserves attention is that no widely accepted standard for how bpMRI addresses the omission of DCE. Some studies proposed that replace the DCE sequence with T2WI [21,22,23], while others recommend abandoning the DCE sequence [11]. In our study, three different modified bpMRI scorings were evaluated and have been compared with each other, and our results were consistent with the study of Boesen et al. [24]. In that study, scoring of lesions in the PZ only depended on DWI findings (dominant sequence), and an equivocal score of 3 was not potentially upgraded to a score of 4 for lacking positive DCE findings. In the study of De Visschere et al., the absence of DCE was resolved by using T2WI for PZ lesions while the DWI score was 3, and three different threshold scores were used for T2-positivity (≥ 3, ≥ 4, and 5).

Our study has some limitations. First, this was a single-center retrospective study and patient selection bias may limit the generalizability of our study, thus the results and conclusions need validation in prospective multi-center studies with a larger number of patients. Second, we did not report PZ and TZ lesions separately. Considering different dominant MRI sequences for these 2 anatomy zones, the detailed diagnostic for PZ and TZ lesions should be investigated in future study. Third, use of systematic and/or MRI-TRUS fusion targeted biopsy as the reference standard may lead to some lesions with positive pathology missed as which showed negative in MRI. Last, in this study we did not report the inter-reader agreement between radiologists, which is important for the standardized scoring system as it relates to reducing the variability of interpretation. However, because our main goal was to develop and validate nomogram for csPCa, investigation of the inter-reader agreement was beyond the scope of our study.

Conclusions

In this study, we compared three scorings that based on bpMRI and the results showed that Scoring 2 performed better than the remaining two scorings. Additionally, 3 models were constructed using 3 scorings with age and PASD and Model 2 that based on Scoring 2 yielded the highest performance.