Background

Prostate cancer (PCa) is the most common malignant tumor among men in Europe and America and the second leading cause of cancer-related death in men. Early detection of PCa is crucial for appropriate diagnosis and treatment. The 2017 European Association of Urology (EAU) Prostate Cancer Guidelines [1] recommend radical therapy as definitive treatment for intermediate- and high-risk patients with clinically significant prostate cancer (csPCa), defined as an International Society of Urological Pathology (ISUP) Gleason grade group (GGG) of 2 or higher. Patients with very low- or low-risk disease (GGG 1) are classified as having clinically insignificant prostate cancer (non-csPCa). These patients, together with selected patients with favorable intermediate-risk disease (GGG 2), are recommended to undergo active surveillance rather than definitive treatment. Therefore, accurately distinguishing csPCa from non-csPCa is critical when determining treatment options.

Currently, prostate cancer is typically diagnosed by biopsy in men with elevated serum prostate-specific antigen (PSA) levels and/or abnormal digital rectal examination (DRE) findings. The biopsy result is then used to assess tumor aggressiveness by assigning an ISUP grade group. However, because prostate cancer is multifocal and heterogeneous and random sampling leaves much of the gland unsampled, conventional 12-core systematic biopsy can have a false-negative rate of up to 30%. To improve the detection of csPCa, image fusion-guided biopsy has been introduced; this technique fuses multiparametric magnetic resonance imaging (mpMRI) with ultrasound images to guide needle placement. It has been shown to improve the detection of csPCa compared with conventional systematic biopsy. Nevertheless, csPCa can still be underestimated even when biopsy is guided by mpMRI.

Over the past few decades, the use of mpMRI in the detection and staging of prostate cancer has become increasingly common. It can help to determine which men with elevated PSA levels should undergo biopsy, which can reduce unnecessary biopsies and increase the sensitivity of detecting csPCa [2]. Additionally, mpMRI has shown potential for predicting the Gleason score with moderate to high accuracy, particularly for csPCa with a Gleason score of 3 + 4 or higher [3]. Therefore, it is reasonable to assume that incorporating mpMRI can improve the accuracy of detecting csPCa in patients initially diagnosed with non-csPCa based on biopsy results. However, interpreting mpMRI for the detection and characterization of PCa requires specialized training and expertise in radiology and prostate cancer.

Artificial intelligence (AI) methods, such as deep learning (DL), can improve the detection and classification of PCa on mpMRI images [4]. Previous studies have shown that AI can improve the accuracy and efficiency of PCa detection on mpMRI by automatically detecting and segmenting suspicious areas for further evaluation by a radiologist [5]. Several recent studies have indicated the potential of AI for predicting tumor invasiveness on biopsy pathology [6,7,8]. Some researchers have also proposed handcrafted or deep radiomics image features for predicting tumor invasiveness [9, 10]. However, few studies have specifically evaluated tumor invasiveness in the post-biopsy setting, taking into account both biopsy pathology results and mpMRI findings. Thus, in this study, we explored the feasibility of using mpMRI image features extracted by AI algorithms to predict csPCa, both in comparison and in combination with biopsy pathology.

Materials and methods

Data enrollment

This retrospective study was approved by the institutional review board (IRB number: 2022-LY-361), which waived the requirement for written informed consent. All data were collected retrospectively at our hospital.

Patients who underwent prostate mpMRI and subsequent radical prostatectomy (RP) between November 2017 and December 2022 were included. The mpMRI images and clinical information, including age, PSA, and the Gleason score (GS) of both the biopsy pathology and the RP pathology, were obtained from the picture archiving and communication system (PACS) and the electronic medical record (EMR) system. The exclusion criteria were (1) prior endocrine therapy, (2) benign prostatic hyperplasia on RP pathology, (3) missing PSA data, (4) incomplete biopsy pathology records, (5) incomplete MR images, (6) obvious image artifacts, and (7) prostate cancer volume < 0.5 cm3 on MR images.

Reference standard

All patients underwent biopsy and RP, and pathology samples were available for both. The pathology was reviewed and reported by experienced pathologists according to the ISUP grading system. The reference standard was the RP pathology, with GS 3 + 3 = 6 considered non-csPCa and GS ≥ 3 + 4 considered csPCa. The data enrollment process is illustrated in Fig. 1.

Fig. 1

Data enrollment and research process. Cases with complete clinical data and mpMRI images were collected. A pre-trained AI model was employed to identify the regions of interest (ROIs) corresponding to suspected lesions. Image features were then extracted from the ROIs based on predefined criteria. Finally, two prediction models, the MR model and the combined model, were trained using logistic regression.

MR scanning protocols

The mpMRI images were acquired on three MR scanners: 176 cases (55.9%) on a 1.5 T scanner, 135 cases (42.9%) on a 3.0 T scanner, and 4 cases (1.2%) on a 1.436 T scanner. Body coils were used for transmission and phased-array coils for reception; no endorectal coil was used. Table 1 provides details on the MR scanners and imaging parameters.

Table 1 MR scanning protocols

Lesion segmentation by AI algorithms

The MR images were anonymized using self-developed software written in C++. The patient information in the DICOM file header was replaced with anonymous information according to predefined rules. The software read the DICOM data, made the necessary modifications, and updated the original file to achieve complete anonymization.
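The rule-based header replacement described above can be illustrated with a minimal Python sketch. It is not the authors' C++ software: the header is modeled as a plain dictionary keyed by DICOM attribute keywords, and the rule table `ANONYMIZATION_RULES` and the function `anonymize_header` are hypothetical. Reading and writing real DICOM files would require a DICOM library, which is deliberately omitted, and unlike the study's software the sketch returns a modified copy instead of overwriting the file.

```python
# Hypothetical rule table: DICOM keyword -> replacement value.
# The study's actual rules are not published; these entries are illustrative only.
ANONYMIZATION_RULES = {
    "PatientName": "ANONYMOUS",
    "PatientID": None,              # None means "substitute a pseudonymous study ID"
    "PatientBirthDate": "",
    "InstitutionName": "",
    "ReferringPhysicianName": "",
}


def anonymize_header(header, study_id, rules=ANONYMIZATION_RULES):
    """Return a copy of a header dict with identifying fields replaced per the rules."""
    cleaned = dict(header)  # leave the caller's header untouched
    for keyword, replacement in rules.items():
        if keyword in cleaned:  # only rewrite attributes that are actually present
            cleaned[keyword] = study_id if replacement is None else replacement
    return cleaned


header = {"PatientName": "DOE^JOHN", "PatientID": "123456",
          "PatientBirthDate": "19500101", "Modality": "MR"}
anon = anonymize_header(header, "PCA-0001")
```

Non-identifying attributes such as `Modality` pass through unchanged, which keeps the sequence information needed by the downstream AI model.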

After anonymization, the DICOM files were converted to NIfTI format using the dicom2nii.py tool implemented in Python 3.5 and then input into our in-house deep learning-based AI model for the segmentation of suspicious PCa foci [4]. The AI model automatically selects the diffusion-weighted imaging (DWI) and apparent diffusion coefficient (ADC) images, segments the prostate gland, and then segments suspicious prostate cancer regions within the gland. The regions identified by the AI model as potentially cancerous were subsequently used for image feature extraction in the next step. Notably, none of the cases in this study had been used to train the AI model; this study therefore served as an independent (external) validation of the AI model.

Extraction of image features

First, among the suspicious lesion regions segmented by the AI model, the largest lesion was identified and defined as the index lesion. The following image features were then calculated from the index lesion and the surrounding prostate gland and used in the prediction models: (1) lesion volume, (2) mean apparent diffusion coefficient (ADC) value of the lesion (ADClesion), (3) ADC value of the prostate outside the lesion (ADCprostate), (4) ratio of ADC value between the lesion and the prostate outside the lesion (ADClesion/prostate), (5) mean signal intensity of the lesion on diffusion-weighted imaging (DWI) (DWIlesion), (6) DWI signal intensity of the prostate outside the lesion (DWIprostate), (7) signal intensity ratio of DWI between the lesion and the prostate outside the lesion (DWIlesion/prostate), (8) signal intensity ratio of the lesion between DWI and ADC (DWIlesion/ADClesion), (9) the ratio of DWIlesion/prostate to ADClesion/prostate, and (10) the volume of the prostate gland. In addition, the prostate gland volume was used to calculate the prostate-specific antigen density (PSAD = PSA / prostate volume). An example of the segmentation and feature extraction is shown in Fig. 2.
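The feature definitions above can be made concrete with a short sketch. It assumes, hypothetically, that the DWI image, ADC map, prostate mask, and AI lesion masks are flattened into equally long lists on a common voxel grid, that "prostate outside the lesion" means all prostate voxels not in the index lesion, that each intensity feature is a voxel mean, and that PSAD is PSA divided by prostate volume. The function `extract_features` and its parameters are illustrative names, not the study's code.

```python
def _mean(values, indices):
    """Mean of the voxel values at the given indices."""
    return sum(values[i] for i in indices) / len(indices)


def extract_features(dwi, adc, prostate_mask, lesion_masks, voxel_volume_cm3, psa):
    """Compute the ten image features plus PSAD for the index lesion.

    dwi, adc: flat lists of co-registered voxel values (assumed to share one grid).
    prostate_mask and each entry of lesion_masks: flat lists of 0/1 (or bool) labels.
    """
    # Index lesion: the AI-segmented lesion with the most voxels (largest volume).
    index_mask = max(lesion_masks, key=sum)
    lesion = [i for i, m in enumerate(index_mask) if m]
    # Assumption: background = prostate voxels that are not part of the index lesion.
    background = [i for i, (p, m) in enumerate(zip(prostate_mask, index_mask)) if p and not m]

    adc_lesion, adc_prostate = _mean(adc, lesion), _mean(adc, background)
    dwi_lesion, dwi_prostate = _mean(dwi, lesion), _mean(dwi, background)
    adc_ratio = adc_lesion / adc_prostate
    dwi_ratio = dwi_lesion / dwi_prostate
    prostate_volume = sum(prostate_mask) * voxel_volume_cm3

    return {
        "lesion_volume": len(lesion) * voxel_volume_cm3,            # (1) cm3
        "ADClesion": adc_lesion,                                     # (2)
        "ADCprostate": adc_prostate,                                 # (3)
        "ADClesion/prostate": adc_ratio,                             # (4)
        "DWIlesion": dwi_lesion,                                     # (5)
        "DWIprostate": dwi_prostate,                                 # (6)
        "DWIlesion/prostate": dwi_ratio,                             # (7)
        "DWIlesion/ADClesion": dwi_lesion / adc_lesion,              # (8)
        "DWIlesion/prostate/ADClesion/prostate": dwi_ratio / adc_ratio,  # (9)
        "prostate_volume": prostate_volume,                          # (10) cm3
        "PSAD": psa / prostate_volume,                               # ng/mL/cm3
    }
```

Working on flat lists keeps the sketch dependency-free; with real NIfTI volumes the same masked means would normally be computed on arrays.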

Fig. 2

An example of AI-segmented lesions and image feature extraction on mpMRI. A 74-year-old man with a serum PSA level of 7.93 ng/mL had mpMRI images showing multiple lesions on DWI (a) and the ADC map (b). The AI model automatically selected the DWI and ADC maps and then segmented the prostate gland (blue area in (c) and (d)). Subsequently, the AI model segmented suspicious csPCa lesions on the DWI and ADC maps (green areas in (e) and (f)). The largest lesion was designated the index lesion (red area in (g) and (h)), and image features were extracted from it; this lesion was classified as PI-RADS 4. Biopsy pathology revealed non-csPCa with a Gleason score of 3 + 3. Subsequent pathology of the radical prostatectomy specimen showed csPCa, with a Gleason score of 4 + 4 = 8 in the left lobe (approximately 10% of the gland) and 4 + 3 = 7 in the right lobe (approximately 7%).

Prediction model development

Two logistic regression models were developed to predict csPCa on RP pathology: an MR model and a combined model. The candidate predictors for the MR model were age, PSA, PSAD, and the ten MR image features. The combined model additionally included biopsy pathology as a candidate predictor. Univariable analysis was performed first, followed by bidirectional (forward and backward) stepwise selection based on the Akaike information criterion (AIC) to select the variables for the final multivariable models.
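The selection procedure can be illustrated with a self-contained sketch: logistic regression fitted by Newton-Raphson maximum likelihood, and bidirectional stepwise selection on AIC = 2k - 2 log L, where k counts the intercept and predictors. The study used R 4.1.3; this Python version is not the authors' code, and its details are assumptions: it starts from the intercept-only model, accepts at each step the single addition or removal that lowers AIC most, and does not guard against complete separation. Function names (`fit_logistic`, `aic`, `stepwise_aic`) are hypothetical.

```python
import math


def _solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back-substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_logistic(rows, y, max_iter=50):
    """Newton-Raphson logistic regression; rows include a leading 1 for the intercept.

    Returns (coefficients, log-likelihood).
    """
    p = len(rows[0])
    beta = [0.0] * p

    def prob(x):
        return 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, x))))

    for _ in range(max_iter):
        grad = [0.0] * p                      # score vector X'(y - mu)
        info = [[0.0] * p for _ in range(p)]  # Fisher information X'WX
        for x, yi in zip(rows, y):
            mu = prob(x)
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * x[j]
                for k in range(p):
                    info[j][k] += w * x[j] * x[k]
        step = _solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-10:
            break
    eps = 1e-12  # guards log(0) if a fitted probability reaches 0 or 1
    loglik = sum(yi * math.log(max(prob(x), eps)) + (1 - yi) * math.log(max(1.0 - prob(x), eps))
                 for x, yi in zip(rows, y))
    return beta, loglik


def aic(X, y, predictors):
    """AIC of a logistic model using the named predictors (plus intercept)."""
    rows = [[1.0] + [float(obs[name]) for name in predictors] for obs in X]
    _, loglik = fit_logistic(rows, y)
    return 2 * (len(predictors) + 1) - 2 * loglik


def stepwise_aic(X, y, candidates):
    """Bidirectional stepwise selection; stops when no single move lowers AIC."""
    selected = []
    best = aic(X, y, selected)
    while True:
        moves = [selected + [c] for c in candidates if c not in selected]
        moves += [[s for s in selected if s != c] for c in selected]
        if not moves:
            break
        score, move = min(((aic(X, y, m), m) for m in moves), key=lambda t: t[0])
        if score >= best:
            break
        best, selected = score, move
    return selected, best
```

In use, `X` would be a list of per-patient dictionaries (for example with keys such as "PSAD" or "ADClesion/prostate") and `y` the RP outcome coded 1 for csPCa and 0 for non-csPCa.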

Model evaluation

Three methods for predicting csPCa were evaluated: biopsy pathology, the MR model, and the combined model. Discrimination was assessed with receiver operating characteristic (ROC) analysis by calculating the area under the ROC curve (AUC). Decision curve analysis (DCA) was used to compare the clinical net benefit of each method. Finally, a nomogram was constructed to visualize the prediction model.
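Decision curve analysis rests on the net benefit at a threshold probability pt: NB(pt) = TP/n - (FP/n) × pt / (1 - pt). A minimal sketch follows. The function names are hypothetical, and treating a patient as positive when the predicted probability is at least pt is an assumed convention; for biopsy pathology, which is binary, the "probability" would simply be 0 or 1.

```python
def net_benefit(probs, labels, threshold):
    """Net benefit of treating patients whose predicted csPCa probability >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Reference 'all' curve: every patient is classified as csPCa."""
    return net_benefit([1.0] * len(labels), labels, threshold)
```

The "none" reference curve is zero at every threshold. Because the weight pt / (1 - pt) is undefined at pt = 1, decision curves are evaluated on thresholds strictly below 1.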

Statistical analysis

Statistical analysis was performed using R 4.1.3. Descriptive statistics were used to summarize the data: continuous variables were reported as mean (standard deviation) if normally distributed and as median [Q1, Q3] otherwise, and categorical variables were reported as frequency (%).

The Shapiro‒Wilk test was used to assess the normality of continuous variables. For normally distributed variables, homogeneity of variances was further examined using an F test. If the variances were homogeneous, Student's t test was used to compare features between the non-csPCa and csPCa groups, and one-way ANOVA was used to compare features among the ISUP groups of postoperative pathology; ADClesion fell into this category. If the variances were not homogeneous, the corrected (Welch) t test was used to compare features between the non-csPCa and csPCa groups, and the Kruskal‒Wallis test was used to compare features among the ISUP groups of postoperative pathology; ADCprostate and ADClesion/prostate fell into this category. For continuous variables that were not normally distributed, the Mann‒Whitney test was used to compare features between the non-csPCa and csPCa groups, and the Kruskal‒Wallis test was used to compare features among the ISUP groups of postoperative pathology. The following variables fell into this category: age, PSA, PSAD, prostate volume, lesion volume, DWIlesion, DWIprostate, DWIlesion/prostate, DWIlesion/ADClesion, and DWIlesion/prostate/ADClesion/prostate.
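The test-selection rules above form a small decision tree, sketched below. The function `choose_tests` is hypothetical and only encodes the branching; it assumes one normality p-value and one variance-homogeneity p-value per variable, computed elsewhere (for example with `scipy.stats.shapiro` for normality; the corresponding group tests exist as `scipy.stats.ttest_ind`, with `equal_var=False` for the Welch version, `scipy.stats.f_oneway`, `scipy.stats.mannwhitneyu`, and `scipy.stats.kruskal`).

```python
def choose_tests(normality_p, variance_p=None, alpha=0.05):
    """Return (two-group test, multi-group test) following the rules described above.

    normality_p: Shapiro-Wilk p-value; variance_p: F-test p-value for equal variances.
    """
    if normality_p < alpha:  # normality rejected
        return ("Mann-Whitney U test", "Kruskal-Wallis test")
    if variance_p is None:
        raise ValueError("variance_p is required for normally distributed variables")
    if variance_p >= alpha:  # normal with homogeneous variances
        return ("Student t test", "one-way ANOVA")
    # Normal but heteroscedastic: Welch-corrected t test, and Kruskal-Wallis as stated.
    return ("Welch-corrected t test", "Kruskal-Wallis test")
```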

Nagelkerke's pseudo-R2 was calculated to quantify the goodness of fit of the multivariable regression models. The DeLong test was used to compare the AUCs of biopsy pathology, the MR model, and the combined model. A P value less than 0.05 was considered statistically significant.
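The AUC comparison can be sketched with the empirical (Mann-Whitney) AUC and DeLong's variance estimate for two correlated AUCs measured on the same patients, in which each csPCa patient contributes a structural component V10 and each non-csPCa patient a component V01. This plain-Python version is an illustration, not the authors' R code; it assumes labels coded 1 for csPCa and 0 for non-csPCa and that higher scores indicate csPCa. Function names are hypothetical.

```python
import math


def _psi(pos_score, neg_score):
    """Mann-Whitney kernel: 1 if correctly ordered, 0.5 if tied, 0 otherwise."""
    if pos_score > neg_score:
        return 1.0
    return 0.5 if pos_score == neg_score else 0.0


def _placements(scores, labels):
    """DeLong structural components: V10 per positive, V01 per negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    v10 = [sum(_psi(p, n) for n in neg) / len(neg) for p in pos]
    v01 = [sum(_psi(p, n) for p in pos) / len(pos) for n in neg]
    return v10, v01


def _cov(a, b):
    """Sample covariance with an (n - 1) denominator."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def auc(scores, labels):
    """Empirical AUC (equivalent to the normalized Mann-Whitney U statistic)."""
    v10, _ = _placements(scores, labels)
    return sum(v10) / len(v10)


def delong_test(scores_a, scores_b, labels):
    """Two-sided DeLong test; returns (AUC_a, AUC_b, z, p)."""
    v10a, v01a = _placements(scores_a, labels)
    v10b, v01b = _placements(scores_b, labels)
    auc_a, auc_b = sum(v10a) / len(v10a), sum(v10b) / len(v10b)
    var = ((_cov(v10a, v10a) + _cov(v10b, v10b) - 2 * _cov(v10a, v10b)) / len(v10a)
           + (_cov(v01a, v01a) + _cov(v01b, v01b) - 2 * _cov(v01a, v01b)) / len(v01a))
    z = (auc_a - auc_b) / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return auc_a, auc_b, z, p_value
```

Because both AUCs come from the same patients, the covariance terms matter; treating the two AUCs as independent would generally misstate the variance of their difference.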

Results

Clinical characteristics

A total of 315 patients were enrolled in this study. The mean age of the patients was 70.8 ± 5.9 years. Among them, 42 (13.3%) patients underwent MRI after biopsy, with a median interval of 15 [3, 17] days, whereas 273 (86.7%) underwent MRI before biopsy, with a median interval of 6 [4, 9] days. Of the 315 patients, 59 (18.7%) were diagnosed with non-csPCa and 256 (81.3%) with csPCa by biopsy pathology. However, based on RP pathology, only 18 (5.7%) patients had non-csPCa, whereas 297 (94.3%) had csPCa.

Table 2 provides a summary of the patient characteristics stratified by biopsy pathology and RP pathology. The median PSA level was 8.5 [5.7, 14.2] ng/mL, and the median PSAD was 0.2 [0.1, 0.3] ng/mL/cm3. The median volume of the prostate gland was 35.6 [29.3, 42.3] cm3. In terms of biopsy pathology, the median number of biopsy cores was 12 [10, 14], and the median percentage of biopsy cores positive for cancer was 30% [10%, 60%]. Among the 256 patients diagnosed with csPCa by biopsy pathology, the majority had a Gleason score of 7 (n = 173, 67.6%), followed by Gleason score 6 (n = 75, 29.3%) and Gleason score 8–10 (n = 8, 3.1%).

Table 2 Clinical characteristics of the enrolled patients

Statistically significant differences were observed among the five RP ISUP groups (Table 2) in terms of PSA, PSAD, biopsy pathology, and ADCprostate (all P < 0.05). However, no significant differences were observed in the other clinical and image features (all P > 0.05). In the comparison between the csPCa and non-csPCa groups (Table 3), significant differences were observed in terms of PSAD, PI-RADS score, biopsy pathology, and ADClesion/prostate (all P < 0.05). There were no significant differences observed in the other clinical and imaging features between the csPCa and non-csPCa groups (all P > 0.05).

Table 3 Odds ratios and their significance in the models

Model development metrics

Table 3 summarizes the results of univariable and multivariable logistic regression analyses to identify the variables associated with csPCa. The predictor variables that were independently associated with csPCa and included in the MR model were PSAD, ADClesion/prostate, DWIlesion/ADClesion, and DWIlesion/prostate/ADClesion/prostate. In the combined model, biopsy pathology, ADClesion/prostate, DWIlesion, DWIprostate, and DWIlesion/prostate were included.

Figure 3 shows the response curves of the logistic regression models. The MR model had a Nagelkerke R2 of 0.219, whereas the combined model had a Nagelkerke R2 of 0.411, indicating a considerably better fit for the combined model. These results suggest that both the MR model and the combined model can be useful in predicting csPCa. The combined model, which includes both biopsy pathology and imaging features, had the higher R2 and may therefore provide more accurate predictions.
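Nagelkerke's R2 is computed from the log-likelihoods of the intercept-only model (log L0) and the fitted model (log L1) over n patients: the Cox-Snell value 1 - (L0 / L1)^(2/n) is divided by its maximum attainable value 1 - L0^(2/n), so that a perfect fit reaches 1. A minimal sketch of this definition follows; the function name is hypothetical.

```python
import math


def nagelkerke_r2(loglik_null, loglik_model, n):
    """Nagelkerke pseudo-R2 from null and fitted-model log-likelihoods."""
    # Cox-Snell R2 = 1 - (L0 / L1)^(2/n), written on the log scale.
    cox_snell = 1.0 - math.exp(2.0 * (loglik_null - loglik_model) / n)
    # Maximum Cox-Snell value, reached when L1 = 1 (log L1 = 0).
    max_cox_snell = 1.0 - math.exp(2.0 * loglik_null / n)
    return cox_snell / max_cox_snell
```

Because this rescaled value is a pseudo-R2, it is best read as a relative measure of fit between models fitted to the same outcome, as in the comparison of the MR and combined models.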

Fig. 3

Visualization of the regression models. This plot shows the response curves of the logistic regression (generalized linear) models for the MR model (a) and the combined model (b). The x-axis represents the linear predictor on the logit scale. The y-axis represents the predicted probability that the response variable (Y) falls into the "success" category (here, csPCa) given the predictor variables in the model. The plot helps visualize the relationship between the predictor variables and the predicted probability, enabling a better understanding of each model's behavior and predictions. The Nagelkerke R2 was 0.219 for the MR model and 0.411 for the combined model.

Model evaluation

The predictive performance of biopsy pathology, the MR model, and the combined model was evaluated using ROC analysis, and the results are displayed in Fig. 4. To compare the classification ability of each method across prostate zones, separate analyses were conducted for lesions located in the peripheral zone (PZ, n = 59), the transition zone (TZ, n = 76), and across both the TZ and PZ (n = 180), in addition to the entire cohort (n = 315). The evaluation metrics, including AUC, accuracy, sensitivity, specificity, positive predictive value, and negative predictive value, are presented in Table 4.
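The threshold-dependent metrics in Table 4 all derive from a 2 × 2 confusion matrix at a chosen probability cut-off. The text does not state which cut-off was used, so the sketch below assumes, hypothetically, a cut-off chosen by maximizing the Youden index (sensitivity + specificity - 1) over the observed probabilities. Function names are illustrative.

```python
def classification_metrics(probs, labels, threshold=0.5):
    """Accuracy, sensitivity, specificity, PPV and NPV at a probability cut-off."""
    tp = fp = tn = fn = 0
    for p, y in zip(probs, labels):
        predicted_cspca = p >= threshold
        if predicted_cspca and y == 1:
            tp += 1
        elif predicted_cspca:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        return num / den if den else float("nan")  # undefined when the denominator is 0

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def youden_threshold(probs, labels):
    """Observed probability that maximizes sensitivity + specificity - 1."""
    def youden(t):
        m = classification_metrics(probs, labels, t)
        return m["sensitivity"] + m["specificity"] - 1.0
    return max(sorted(set(probs)), key=youden)
```

Unlike the AUC, these metrics change with the cut-off, so any reported values are only comparable across methods if the cut-off rule is stated.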

Fig. 4

ROC curves of the models. This plot illustrates the ROC curves of the three methods, with the red curve representing biopsy pathology, the blue curve representing the MR model, and the green curve representing the combined model. The AUC of biopsy pathology was 0.820 (95% CI 0.728, 0.912), and the MR model had an AUC of 0.830 (95% CI 0.743, 0.916), with no significant difference between the two methods (P = 0.884). However, the AUC of the combined model (0.915, 95% CI 0.849, 0.980) was significantly higher than that of biopsy pathology (P = 0.042) and the MR model (P = 0.031).

Table 4 Evaluation metrics of the different methods

For each method, no statistically significant differences in AUC were observed among lesions located in the PZ (AUCpz), TZ (AUCtz), and PZ + TZ (AUCpz+tz) groups (all P > 0.05). The detailed comparison of AUC values is presented in Table 5.

Table 5 Comparison of the AUCs in PZ, TZ, and PZ&TZ lesions in different methods

In terms of overall performance, the AUC of biopsy pathology was 0.820 (95% CI 0.728, 0.912), and the MR model had an AUC of 0.830 (95% CI 0.743, 0.916), with no significant difference between the two methods (P = 0.884). However, the AUC of the combined model (0.915, 95% CI 0.849, 0.980) was significantly higher than that of biopsy pathology (P = 0.042) and the MR model (P = 0.031). The DCA results, presented in Fig. 5, indicated that the combined model was superior to biopsy pathology and the MR model for all risk thresholds from 0.5 to 1. To further illustrate the predictive efficacy of the best model, a nomogram was created and is shown in Fig. 6.

Fig. 5

DCA curves of the models. This plot depicts the decision curve analysis of biopsy pathology (red), the MR model (blue), and the combined model (green), which assesses the clinical utility of each method by estimating the net benefit of its use across a range of threshold probabilities. The "all" curve corresponds to classifying all patients as positive (csPCa), irrespective of their actual diagnosis, whereas the "none" curve corresponds to classifying no patients as csPCa. The x-axis represents the threshold probability, that is, the predicted probability of csPCa above which a patient would be considered positive and treated accordingly. The y-axis represents the net benefit of using each method. The decision curves show that the combined model outperformed both biopsy pathology and the MR model for all risk thresholds from 0.5 to 1, indicating superior clinical utility.

Fig. 6

Nomogram of the combined model. This plot provides a visual representation of the combined model, which serves as a graphical tool for predicting the probability of csPCa based on the predictor variables. It allows for the estimation of an individual’s csPCa probability by assigning numerical values to each predictor variable and summing up the total points. This nomogram utilizes the uppermost line as a reference for scoring points ranging from 0 to 100, corresponding to each predictor. Predictor variables, including biopsy pathology and AI-extracted image features (DWIlesion/prostate, ADClesion/prostate, DWIlesion, DWIprostate), are displayed below with bars indicating their relative weight. The sum of points can be checked on the “Points” line, and the corresponding probability of csPCa can be ascertained from the lowermost line
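The point scale of a logistic-regression nomogram can be sketched as follows, assuming the common construction in which the predictor whose contribution (coefficient × value) spans the widest range on the logit scale is mapped to 0-100 points and every other predictor is scaled proportionally. Total points are then converted back to the linear predictor and to a probability. This is not the software used to draw the figure, the names are hypothetical, and every predictor is treated as continuous; a categorical predictor such as biopsy pathology would need its levels mapped to points separately.

```python
import math


def build_nomogram(intercept, coefs, ranges):
    """Precompute the scaling that converts predictor values into points.

    intercept: model intercept (logit scale); coefs: {predictor: coefficient};
    ranges: {predictor: (min, max)} of the observed predictor values.
    """
    # Logit-scale span of each predictor's contribution over its observed range.
    spans = {k: abs(b) * (ranges[k][1] - ranges[k][0]) for k, b in coefs.items()}
    max_span = max(spans.values())  # this span is mapped to 100 points
    # Smallest possible contribution for each predictor (corresponds to 0 points).
    lows = {k: min(b * ranges[k][0], b * ranges[k][1]) for k, b in coefs.items()}
    return {"intercept": intercept, "coefs": coefs, "lows": lows, "max_span": max_span}


def points(nomo, name, value):
    """Points assigned to one predictor value."""
    return 100.0 * (nomo["coefs"][name] * value - nomo["lows"][name]) / nomo["max_span"]


def probability_from_points(nomo, total_points):
    """Convert total points back to the linear predictor, then to a probability."""
    lp = nomo["intercept"] + sum(nomo["lows"].values()) + total_points * nomo["max_span"] / 100.0
    return 1.0 / (1.0 + math.exp(-lp))
```

By construction, the probability read from total points equals the probability computed directly from the logistic model, which is why a nomogram is simply a graphical form of the fitted equation.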

Discussion

The risk of PCa is stratified according to the pathological ISUP grade group. Preoperative pathology is typically obtained through biopsy. However, discrepancies between biopsy pathology and postoperative pathology can result in under- or overestimation of prostate cancer risk [11]. In this study, we proposed that mpMRI can aid in the identification of csPCa in patients initially diagnosed with non-csPCa by biopsy. An MR model and a combined model were developed using mpMRI image features to predict the presence of csPCa on postoperative pathology, and the performance of both models was compared with that of biopsy pathology alone. The combined model had a significantly higher AUC than both biopsy pathology and the MR model.

Currently, biopsy is considered the gold standard for diagnosing prostate cancer. The EAU guidelines on prostate cancer recommend combining targeted biopsy (TB) with systematic biopsy (SB) as the first-line biopsy approach in patients with suspected PCa and an abnormal MRI [12]. Prostate MRI uses the Prostate Imaging Reporting and Data System (PI-RADS) to score patients who are candidates for biopsy on a 1-to-5 scale reflecting the likelihood of csPCa. The main objective of PI-RADS is to establish a standardized and consistent approach to evaluating prostate mpMRI for the detection of csPCa. As research on PI-RADS has advanced, it has become evident that higher PI-RADS scores correspond to a higher likelihood of csPCa. Nonetheless, little research has reassessed PI-RADS scores in conjunction with pathological findings after biopsy. Further investigation is required to determine the effectiveness of PI-RADS scoring for assessing csPCa in patients after biopsy.

Prostate MRI and MRI-directed biopsy have been shown to be at least as effective as systematic biopsy alone in diagnosing clinically significant cancer. A previous study showed that the concordance between biopsy and prostatectomy grading was highest for combined biopsy (CB), yet csPCa was still misclassified in 25% of men [13]. We therefore suggest that mpMRI be re-evaluated after biopsy to compensate for the limitations of biopsy pathology. The two models proposed in this study may have three potential future applications. First, after an initial biopsy with negative results, the MR model could help determine whether a repeat biopsy is necessary: if the MR model predicts a low likelihood of csPCa, observation may be an option, whereas a high predicted likelihood would support a repeat biopsy. Second, when the biopsy result is non-csPCa, the combined model could serve as a reference: a low predicted likelihood of csPCa may support conservative management, whereas a high predicted likelihood may warrant more aggressive treatment. Third, during active surveillance, the results of the MR model or the combined model could be used to guide patient monitoring.

In the PI-RADS system, DWI and ADC image features are used to detect csPCa, with the typical findings of markedly high DWI signal and markedly low ADC values. In this study, we converted these descriptive features into computed image feature values. For example, a higher DWI signal intensity (DWIlesion) and a higher ratio between the DWI signal intensity of the lesion and the background (DWIlesion/prostate) indicate that the lesion is more conspicuous on the DWI image. Similarly, a lower ADC value (ADClesion) and a lower ratio between the ADC value of the lesion and the background (ADClesion/prostate) indicate that the lesion is more conspicuous on the ADC map. Features of this type have been widely used in previous studies and have demonstrated value in indicating tumor invasiveness. Previous studies have found that ADC values are negatively correlated and DWI signal intensities positively correlated with the Gleason score; that is, lower ADC values and higher DWI signal intensities are associated with higher Gleason scores and more aggressive prostate cancer [14,15,16]. These studies suggest that ADC values can serve as a non-invasive biomarker to aid in the diagnosis and management of prostate cancer.

However, manually measured ADC values can vary because of several factors, such as differences in region of interest (ROI) placement, in the b-values used for calculation, and in the software used for calculation [17]. Furthermore, there is no agreed ADC cutoff value that can reliably identify abnormally low ADC within a lesion [14, 15]. Consequently, the use of DWI and ADC for evaluating the aggressiveness of PCa has remained largely theoretical rather than practical in clinical settings. Consistent with previous research, our study confirms the relationship between ADC and DWI image features and the aggressiveness of PCa, with three additional advantages. First, we employed an AI model to automatically segment suspected PCa regions, eliminating manual ROI delineation; this reduces the burden on physicians and improves the consistency of feature extraction [18]. Second, whereas AI models in previous studies were mainly used for pre-biopsy diagnosis, the AI model in this study was used for post-biopsy re-evaluation. Third, we developed an objective prediction model, presented as a nomogram, that outputs the probability of csPCa and provides physicians with an intuitive reference. Once fully validated, this model has the potential to become a valuable tool for urologists' decision-making [19].

Our study has some limitations. First, the data were collected retrospectively from a single institution. This limits the generalizability of the findings to other settings and populations, and the retrospective design may introduce bias and confounding. Thus, the results should be interpreted with caution, and further studies are needed to validate the findings in larger and more diverse populations. Second, only patients who underwent radical prostatectomy were enrolled, because RP pathology was required for the analysis. In the broader clinical context, however, many patients are not candidates for RP for various reasons, such as advanced prostate cancer for which curative surgery is not possible. Therefore, the generalizability of the study's conclusions should be further assessed in patient populations that do not undergo RP. Future studies could include patients who undergo alternative treatments or active surveillance to assess the performance of the AI model in these populations. Third, only a limited number of variables were included, namely age, PSA, and mpMRI features. Other important clinical and imaging data, such as digital rectal examination, ultrasound, PET-CT, and the prostate health index (PHI), were not taken into account. Incorporating a broader range of relevant data in future studies is therefore necessary to improve the precision and reliability of the prediction model.

In summary, AI-extracted image features from mpMRI can predict the aggressiveness of prostate cancer with accuracy comparable to that of biopsy pathology. Combining the AI-extracted mpMRI image features with biopsy pathology further improves the prediction, and the resulting combined model outperforms biopsy pathology alone. After further validation, this prediction model could be used for the re-evaluation of biopsy pathology and for active surveillance of prostate cancer.