Background

Extended pelvic lymph node dissection (ePLND) or pelvic lymph node dissection (PLND) is recommended for patients with intermediate- and high-risk prostate cancer (PCa) when the estimated risk of positive lymph nodes (LNs) exceeds 5%, according to the European Association of Urology (EAU) guidelines [1]. To date, PLND remains the most accurate staging procedure for detecting pelvic lymph node metastasis (PLNM) in PCa patients [2]. However, PLND may be associated with a higher risk of complications, and PLND at radical prostatectomy (RP) is currently performed blindly, without knowledge of whether metastases are present [3, 4].

Studies have shown that the incidence of LN involvement is relatively low among PCa patients who undergo PLND at RP. Yaxley et al. reported a PLNM-positive rate of only 5.5% in a series of 1,180 patients treated with ePLND [5]. In another cohort of 19,633 patients who had undergone PLND, 505 positive LNs were reported, corresponding to 2.5% of the cohort having nodal metastases [6]. These findings indicate that many patients are overtreated and bear substantial adverse effects, morbidity, and health care costs. Therefore, accurate noninvasive, preoperative assessment of LN status is essential to help clinicians decide whether to perform PLND and which postoperative adjuvant therapy to use.

Pelvic multiparametric magnetic resonance imaging (mpMRI) has been accepted as the first-choice prebiopsy imaging modality for PCa, and it allows simultaneous local nodal staging (N0/N1) [7, 8]. Diffusion-weighted imaging (DWI) is widely used to differentiate metastatic from nonmetastatic LNs in patients with PCa, and the apparent diffusion coefficient (ADC) is a key parameter for identifying PLNM [9, 10]. However, both have clear limitations. The traditional size criterion for metastatic LNs on DWI, a short-axis diameter greater than 8–10 mm, performs unsatisfactorily because hyperplastic benign LNs may be enlarged while micrometastatic LNs may remain normal-sized [11]. The suboptimal accuracy of DWI and ADC in diagnosing nodal metastasis in PCa presents both a crucial challenge and an opportunity to develop more accurate diagnostic methods.

Radiomics is a rapidly evolving field of research concerned with extracting quantitative imaging features that can predict nodule and tumor behavior noninvasively, potentially overcoming some human limitations in diagnostic accuracy [12, 13]. However, to date, studies of MRI-based radiomics specifically for PLNM prediction remain limited.

Additionally, traditional radiomics models require radiologists to manually delineate volumes of interest (VOIs), a time-consuming and challenging process [14]. Advances in deep learning have enabled automated and accurate lesion segmentation on MRI images, and the resulting segmentations can serve as the input for radiomics models [15, 16].

Therefore, in this study, we aimed (1) to establish and validate an LN radiomics model based on automatically segmented VOIs of pelvic LNs on DWI images for preoperative PLNM prediction and (2) to further explore the clinical value of the radiomics model compared with quantitative radiological features and with the visual assessment of radiologists.

Materials and methods

This retrospective study was approved by our institutional review board, and the requirement for informed consent was waived (2021-060).

Study sample

A total of 537 consecutive patients with pathologically confirmed PCa between January 2017 and June 2021 were identified. The inclusion criteria were as follows: (1) ePLND/PLND at RP; (2) at least one pathologically confirmed PLNM; (3) no preoperative treatment and no other coexisting malignancies; and (4) an MRI examination including DWI performed less than 30 days before RP. The exclusion criteria were as follows: (1) unavailable clinical and pathological characteristics; and (2) poor image quality (motion or susceptibility artifacts).

Finally, the preoperative MR images of 1116 pathologically confirmed LNs from 84 patients with PCa were included (Fig. 1). Based on the MRI examination date, the patients were divided into a primary cohort (January 2017 to December 2020) and a held-out cohort (January 2021 to June 2021) at a ratio of approximately 4:1. The primary cohort included 67 patients [median age (interquartile range): 68 (62, 74) years] with 908 pelvic LNs (positive LNM, n = 192; negative LNM, n = 716), and the held-out cohort included 17 patients [70 (65, 72) years] with 208 LNs (positive LNM, n = 43; negative LNM, n = 165).

Fig. 1

The flow chart of patient enrollment. LNM: lymph node metastases

The clinicopathologic characteristics of the patients were obtained from the medical records, including age, the Gleason score, prostate-specific antigen (PSA) level, and clinical and pathological T stage. The preoperative radiological features of the pelvic LNs included the ADC value, LN size (the shortest and longest diameters), and LN volume. All patient data were anonymized.

MRI acquisition

To reduce bowel contents, patients self-administered a cleansing enema (Folium Sennae) the day before their scheduled mpMRI. All patients underwent pretreatment mpMRI on one of two 3.0 T scanners (Achieva, Philips Healthcare, The Netherlands; Discovery HD 750, GE Healthcare, USA) with a 16-channel matrix torso coil. The standard mpMRI protocol at our institution included transverse T1-weighted images (T1WI), transverse fat-suppressed T2-weighted images (fT2WI), dynamic contrast-enhanced (DCE) images, and axial DWI with reconstructed ADC maps. DWI was acquired with two b values (0 and 800 s/mm²), and the ADC maps were calculated from these two b values. Detailed DWI parameters are presented in Table 1.
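The two-point ADC calculation described above follows from the monoexponential diffusion model S_b = S_0·exp(−b·ADC), which gives ADC = ln(S_0/S_b)/b. The Python sketch below is illustrative only; the function names and the signal floor are assumptions, not the scanner vendors' reconstruction code.

```python
import math

def adc_two_point(s0, sb, b=800.0, b0=0.0, floor=1e-6):
    """Two-point monoexponential ADC in mm^2/s.

    Assumes S_b = S_0 * exp(-(b - b0) * ADC) with b in s/mm^2. Signals are
    clipped at `floor` so the logarithm stays defined (an illustrative choice).
    """
    s0 = max(s0, floor)
    sb = max(sb, floor)
    return math.log(s0 / sb) / (b - b0)

def adc_map(s0_img, sb_img, b=800.0):
    """Apply the two-point formula voxel by voxel to two same-sized 2D lists."""
    return [[adc_two_point(p0, pb, b) for p0, pb in zip(row0, rowb)]
            for row0, rowb in zip(s0_img, sb_img)]

# Usage: a voxel whose signal decays by a factor exp(-800 * 1.15e-3).
s0 = 1000.0
s800 = s0 * math.exp(-800 * 1.15e-3)
print(adc_two_point(s0, s800))  # approximately 1.15e-3 mm^2/s
```

Clinical scanners may fit more b values or apply additional corrections; with only two b values, however, the log-ratio above is the whole calculation.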

Table 1 The detailed imaging parameters for diffusion-weighted imaging

Pelvic LN segmentation

Once an LN is visualized in a patient with PCa, many potentially useful features can help determine whether it harbors metastases [17]. To obtain these features without manual delineation, we used a previously trained deep-learning-based 3D U-Net segmentation model to automatically segment the visible pelvic LNs on DWI images [18]. The segmented LNs located in the presacral, common iliac, internal iliac, external iliac, and obturator fossa sites were selected and used for PLNM prediction. The quantitative radiological features of the automatically segmented pelvic LNs, including the ADC value, volume, short diameter, long diameter, and short-to-long diameter ratio, were measured automatically.
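As a rough illustration of how such quantitative features can be derived from a segmentation mask, the sketch below computes volume, mean ADC, long and short diameters, and their ratio for one node. The measurement rules are assumptions, since this section does not state them: diameters are taken on the axial slice with the largest cross-section, the long diameter is the largest distance between voxel centres, and the short diameter is the extent perpendicular to it. `node_features` is a hypothetical helper.

```python
import math
from collections import defaultdict

def node_features(voxels, adc, spacing):
    """Quantitative features of one segmented LN (hypothetical measurement rules).

    voxels  : list of (z, y, x) integer indices inside the LN mask
    adc     : dict mapping (z, y, x) -> ADC value of that voxel
    spacing : (dz, dy, dx) voxel size in mm
    """
    dz, dy, dx = spacing
    volume = len(voxels) * dz * dy * dx                    # mm^3
    mean_adc = sum(adc[v] for v in voxels) / len(voxels)   # mean ADC in the VOI

    # Group voxel centres by axial slice, in physical (mm) coordinates.
    by_slice = defaultdict(list)
    for z, y, x in voxels:
        by_slice[z].append((y * dy, x * dx))
    pts = max(by_slice.values(), key=len)                  # largest cross-section

    # Long diameter: largest distance between any two voxel centres (O(n^2)).
    long_d, axis = 0.0, (0.0, 1.0)
    for i, (y1, x1) in enumerate(pts):
        for y2, x2 in pts[i + 1:]:
            d = math.hypot(y2 - y1, x2 - x1)
            if d > long_d:
                long_d, axis = d, ((y2 - y1) / d, (x2 - x1) / d)

    # Short diameter: extent of the cross-section perpendicular to the long axis.
    perp = (-axis[1], axis[0])
    proj = [y * perp[0] + x * perp[1] for y, x in pts]
    short_d = max(proj) - min(proj)

    ratio = short_d / long_d if long_d > 0 else float("nan")
    return {"volume": volume, "adc": mean_adc, "short": short_d,
            "long": long_d, "ratio": ratio}

# Usage: a 3 x 7 voxel rectangle on one 5 mm slice with 1 mm in-plane spacing.
vox = [(0, y, x) for y in range(3) for x in range(7)]
print(node_features(vox, {v: 1.2e-3 for v in vox}, (5.0, 1.0, 1.0)))
```

Measuring between voxel centres slightly underestimates physical diameters by up to about one voxel; a production tool might use mask boundaries or sub-voxel contours instead.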

Standard reference

The reference standard for pelvic LN status was node-by-node correspondence between DWI and the final pathological assessment of the ePLND/PLND specimens. When a site contained both metastatic and nonmetastatic LNs, the location of the metastatic LNs was determined through discussion between a dedicated uropathologist and an expert radiologist. LNs without a confirmed pathological status were excluded.

Radiomics analysis

The radiomics analysis workflow included the following four steps: (1) feature extraction, (2) feature selection, (3) cross-validation, and (4) predictive performance evaluation.

The radiomics features were extracted using the PyRadiomics package in Python. In total, 1070 radiomics features were extracted from each VOI, containing 840 texture features, 216 first-order statistical features, and 14 shape-based features (Additional file 1: S1 and Table S1).
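For intuition about the first-order group, the sketch below computes a few first-order statistics from the voxel intensities inside one VOI, using standard definitions (population moments, non-excess kurtosis, and entropy on fixed-width intensity bins with a bin width of 25). It is not PyRadiomics itself and omits the image filters that account for most of the 1070 features; treat it as an approximation for illustration.

```python
import math
from collections import Counter

def first_order(values, bin_width=25.0, eps=2.2e-16):
    """A few first-order statistics of the voxel intensities inside one VOI."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n   # population variance
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    # Discretize intensities into fixed-width bins, then take the bin probabilities.
    counts = Counter(math.floor(v / bin_width) for v in values)
    probs = [c / n for c in counts.values()]
    return {
        "Energy": sum(v * v for v in values),
        "Mean": mean,
        "Variance": m2,
        "Skewness": m3 / m2 ** 1.5 if m2 > 0 else 0.0,
        "Kurtosis": m4 / m2 ** 2 if m2 > 0 else 0.0,   # not excess kurtosis
        "Entropy": -sum(p * math.log2(p + eps) for p in probs),
    }

# Usage: four intensities split evenly across two 25-unit bins (entropy of 1 bit).
print(first_order([10, 20, 30, 40]))
```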

The radiomics models were developed for the primary cohort according to the following 5 steps: (i) data balance (3 methods); (ii) data normalization (4 methods); (iii) dimension reduction (2 methods); (iv) feature selection (3 methods, each with the top 1–20 features); (v) classification (6 methods) (Additional file 1: Table S2). Finally, 8640 (3 × 4 × 2 × 3 × 20 × 6) models were built. Fivefold cross-validation was applied to the primary cohort for model training and validation, and the held-out cohort was used to investigate the predictive power of the radiomics models in metastasis prediction for each LN. All the work related to radiomics model building and testing was completed using Feature Explorer Pro (FAEPro, v0.3.4) in Python (v3.6.0) [19].
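The size of the search space and the selection of the best configuration can be sketched as a full factorial grid. Only the method names that appear in the text (downsampling, Z score, PCC, ANOVA, LASSO) are real labels; the remaining entries are hypothetical placeholders for options listed in Table S2, and `cv_auc` stands in for the FAEPro training and scoring step. The fold splitter assigns individual samples; whether folds were formed per LN or per patient is not stated here.

```python
import random
from itertools import product

# Names from the text are real labels; every "other_*" entry is a hypothetical
# placeholder for an option listed only in Table S2.
BALANCE = ["downsampling", "other_balance_1", "other_balance_2"]             # 3
NORMALIZATION = ["z_score", "other_norm_1", "other_norm_2", "other_norm_3"]  # 4
DIM_REDUCTION = ["pcc", "other_dimred_1"]                                    # 2
SELECTION = ["anova", "other_select_1", "other_select_2"]                    # 3
N_FEATURES = range(1, 21)                                                    # top 1-20
CLASSIFIERS = ["lasso"] + [f"other_clf_{i}" for i in range(1, 6)]            # 6

def pipeline_grid():
    """Yield every pipeline configuration in the full factorial grid."""
    for bal, norm, dr, sel, k, clf in product(BALANCE, NORMALIZATION, DIM_REDUCTION,
                                              SELECTION, N_FEATURES, CLASSIFIERS):
        yield {"balance": bal, "normalization": norm, "dim_reduction": dr,
               "selection": sel, "n_features": k, "classifier": clf}

def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds with roughly equal class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)        # deal shuffled indices round-robin
    return folds

def best_pipeline(cv_auc):
    """Return the configuration with the highest score.

    `cv_auc` is a caller-supplied (hypothetical) function that trains a pipeline on
    four folds, scores the fifth, and returns the mean AUC over the five folds.
    """
    return max(pipeline_grid(), key=cv_auc)

print(len(list(pipeline_grid())))  # 8640 = 3 x 4 x 2 x 3 x 20 x 6
```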

Development of PLNM prediction models

Model 1: quantitative radiological features of the LNs on DWI images

The difference in radiological features between PLNM and non-PLNM was first assessed by univariate analyses. Next, features with P < 0.05 in the univariate analyses were entered into multivariate analyses to build Model 1 using forward selection logistic regression for PLNM prediction.
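The statistical analysis section names the Mann–Whitney U test for LN comparisons, so a minimal sketch of the univariate screening step might look like the following. It uses the large-sample normal approximation with a tie correction and no continuity correction (assumptions; SPSS may report exact or corrected P values), and it does not reproduce the subsequent forward-selection logistic regression. All helper names are hypothetical.

```python
import math

def rankdata(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test, normal approximation with tie correction.

    Returns (U statistic for x, P value).
    """
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    n = n1 + n2
    u1 = sum(rankdata(pooled)[:n1]) - n1 * (n1 + 1) / 2
    ties = {}
    for v in pooled:
        ties[v] = ties.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in ties.values()) / (n * (n - 1))
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term))
    if sigma == 0:
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / sigma
    return u1, math.erfc(abs(z) / math.sqrt(2))   # two-sided normal tail

def univariate_filter(features, labels, alpha=0.05):
    """Names of features that differ between metastatic (1) and nonmetastatic (0) LNs."""
    kept = []
    for name, vals in features.items():
        pos = [v for v, lab in zip(vals, labels) if lab == 1]
        neg = [v for v, lab in zip(vals, labels) if lab == 0]
        if mann_whitney_u(pos, neg)[1] < alpha:
            kept.append(name)
    return kept
```

The retained feature names would then be passed to a multivariable logistic model, as described in the text.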

Model 2: radiomics and radiological features based on DWI images

The radiomics model achieving the best performance among the 8640 built models in the fivefold cross-validation was finally selected as the best model for PLNM prediction. A combined radiomics model (Model 2) was developed based on the incorporation of radiomics features and quantitative radiological features using multivariable logistic regression analysis in the primary cohort.

Models 3 and 4: visual assessment of the radiologists

Two junior radiologists (with 5 and 6 years of reading experience, respectively) and two senior radiologists (with 10 and 15 years of reading experience, respectively) read the mpMR images. The consensus reached by the two junior radiologists on the metastatic status of the LNs was recorded as the result of Model 3. Similarly, the consensus reached by the two senior radiologists was regarded as the result of Model 4. The four radiologists had access to all available imaging and clinical information.

Evaluation of the PLNM prediction models

The performance of the models in the held-out cohort was evaluated in three respects: discrimination ability, clinical usefulness, and clinical benefit. Discrimination was assessed using receiver operating characteristic (ROC) curves, and the area under the curve (AUC), accuracy, sensitivity, specificity, positive likelihood ratio (PLR), and negative likelihood ratio (NLR) were calculated. Additionally, for subgroup analysis of PLNM discrimination, the LNs of the held-out cohort were divided into two subgroups according to short diameter: (a) ≤ 10 mm and (b) > 10 mm.
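The discrimination metrics listed above reduce to simple counts. The sketch below, with hypothetical helper names, computes them from binary predictions, computes the AUC as the probability that a metastatic LN receives a higher score than a nonmetastatic one (ties count one half), and splits LNs at the 10 mm short-diameter cutoff used for the subgroup analysis.

```python
def diagnostic_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, PLR and NLR (labels: 1 = metastatic)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sens,
        "specificity": spec,
        "PLR": sens / (1 - spec) if spec < 1 else float("inf"),
        "NLR": (1 - sens) / spec if spec > 0 else float("inf"),
    }

def auc(y_true, scores):
    """Rank-based AUC: P(score of a positive > score of a negative), ties = 1/2."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def split_by_short_diameter(short_diameters, cutoff=10.0):
    """Indices of LNs with short diameter <= cutoff and > cutoff (mm)."""
    small = [i for i, d in enumerate(short_diameters) if d <= cutoff]
    large = [i for i, d in enumerate(short_diameters) if d > cutoff]
    return small, large
```

Subgroup AUCs follow by passing the labels and scores at the indices returned by `split_by_short_diameter` to `auc`.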

To provide a visual, individualized tool for predicting the probability of PLNM, nomograms were constructed based on Models 1 and 2. The C-index was calculated to assess the discrimination performance of the nomograms. Calibration curves, accompanied by Hosmer–Lemeshow tests, were plotted to assess the agreement between the nomogram-predicted probability of PLNM and the observed outcomes. Decision curve analysis (DCA) was used to assess the clinical benefit of the four models across a range of threshold probabilities [20].
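Decision curve analysis compares the net benefit of each model with treat-all and treat-none strategies. A minimal sketch of the standard net-benefit formula, NB(p_t) = TP/n − (FP/n)·p_t/(1 − p_t), is shown below; treating a prediction as positive when the probability is at least p_t is an assumption, the helper names are hypothetical, and this is not the R implementation used in the study.

```python
def net_benefit(y_true, probs, pt):
    """Net benefit of treating LNs whose predicted probability is >= pt."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, probs) if p >= pt and y == 1)
    fp = sum(1 for y, p in zip(y_true, probs) if p >= pt and y == 0)
    # False positives are penalized by the odds of the threshold probability.
    return tp / n - fp / n * pt / (1 - pt)

def net_benefit_treat_all(y_true, pt):
    """Net benefit when every LN is treated as metastatic."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * pt / (1 - pt)

def decision_curve(y_true, probs, thresholds):
    """Rows of (threshold, model, treat-all, treat-none) net benefit."""
    return [(pt, net_benefit(y_true, probs, pt), net_benefit_treat_all(y_true, pt), 0.0)
            for pt in thresholds]

# Usage: net benefit over thresholds from 0.05 to 0.95.
curve = decision_curve([1, 0, 1, 0], [0.9, 0.8, 0.2, 0.1], [i / 20 for i in range(1, 20)])
```

A model is clinically useful at a given threshold when its net benefit exceeds both the treat-all and treat-none (zero) curves.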

Statistical analysis

After normality testing, the Mann–Whitney U test was used to compare characteristics between patients in the primary and held-out cohorts (age, free PSA [F-PSA], total PSA [T-PSA], and F/T PSA ratio) and between LNs in the LNM and non-LNM groups (ADC value, LN volume, short diameter, long diameter, and short-to-long diameter ratio) using the Statistical Package for the Social Sciences (SPSS 19.0; SPSS Inc., Chicago, IL, USA). ROC analyses were performed using MedCalc statistical software version 15.2.2 (MedCalc Software bvba, Ostend, Belgium), and multiple and pairwise comparisons of AUCs were performed using the DeLong nonparametric approach. The nomograms and DCA were generated using R 3.5.1 (Comprehensive R Archive Network, www.r-project.org). For the calibration curves, a significant Hosmer–Lemeshow test statistic indicates that the model's predictions deviate from the observed outcomes [21]. The level of statistical significance was set at P < 0.05.
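For reference, the DeLong comparison of two correlated AUCs measured on the same LNs can be sketched in pure Python as below. This is an illustrative implementation of the structural-components method with hypothetical helper names, not MedCalc's code, and it reports a two-sided P value from the normal approximation.

```python
import math

def _components(pos, neg):
    """DeLong structural components (V10 per positive, V01 per negative)."""
    def psi(x, y):
        return 1.0 if x > y else 0.5 if x == y else 0.0
    v10 = [sum(psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01

def _cov(a, b):
    """Sample covariance of two equally long lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def delong_test(y_true, scores1, scores2):
    """Two-sided DeLong test for two correlated AUCs on the same cases.

    Returns (auc1, auc2, z, p).
    """
    pos_idx = [i for i, y in enumerate(y_true) if y == 1]
    neg_idx = [i for i, y in enumerate(y_true) if y == 0]
    m, n = len(pos_idx), len(neg_idx)
    (v10a, v01a), (v10b, v01b) = [
        _components([s[i] for i in pos_idx], [s[i] for i in neg_idx])
        for s in (scores1, scores2)]
    auc1, auc2 = sum(v10a) / m, sum(v10b) / m
    # Variance of the AUC difference from the positive- and negative-case components.
    var = ((_cov(v10a, v10a) + _cov(v10b, v10b) - 2 * _cov(v10a, v10b)) / m
           + (_cov(v01a, v01a) + _cov(v01b, v01b) - 2 * _cov(v01a, v01b)) / n)
    if var <= 0:
        return auc1, auc2, 0.0, 1.0
    z = (auc1 - auc2) / math.sqrt(var)
    return auc1, auc2, z, math.erfc(abs(z) / math.sqrt(2))
```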

Results

Patients and LN characteristics

The clinical characteristics of the patients and pelvic LNs are summarized in Table 2. The ratio of metastatic to nonmetastatic LNs was 26.82% (192/716) in the primary cohort and 26.06% (43/165) in the held-out cohort, corresponding to a PLNM prevalence of 21.15% (192/908) and 20.67% (43/208), respectively. No significant differences were found between the primary and held-out cohorts in age or PSA level (T-PSA, F-PSA, and F/T PSA). The median ADC value of metastatic LNs was significantly lower than that of nonmetastatic LNs in both cohorts (primary cohort: 1.15 × 10⁻³ mm²/s vs. 1.54 × 10⁻³ mm²/s, P < 0.001; held-out cohort: 1.04 × 10⁻³ mm²/s vs. 1.38 × 10⁻³ mm²/s, P < 0.001). The short-to-long diameter ratio did not differ significantly between metastatic and nonmetastatic LNs (primary cohort: 0.54 vs. 0.57, P = 0.632; held-out cohort: 0.56 vs. 0.57, P = 0.660).

Table 2 Characteristics of patients in the primary and held-out cohorts

Performance of pelvic LN segmentation

The Dice scores between manual and automated LN segmentation were 0.85 ± 0.09 in the primary cohort and 0.84 ± 0.07 in the held-out cohort (P = 0.08). The Dice score distributions in both cohorts are shown in Fig. 2. According to the notched box plots, the Dice scores of metastatic and nonmetastatic LNs did not differ significantly in either the primary or the held-out cohort (P = 0.124 and P = 0.09, respectively).
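The Dice score quantifies the voxel overlap between automated and manual masks as 2|A ∩ B| / (|A| + |B|). The sketch below represents masks as sets of voxel indices (an assumption for brevity) and summarizes per-node scores as mean ± SD, the format reported above; returning 1.0 for two empty masks is a convention chosen here, and the helper names are hypothetical.

```python
import statistics

def dice(mask_a, mask_b):
    """Dice similarity 2|A & B| / (|A| + |B|) for masks given as sets of voxel indices."""
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        return 1.0                        # convention: two empty masks agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))

def summarize(scores):
    """Mean and sample SD of per-node Dice scores, reported as mean +/- SD."""
    return statistics.mean(scores), statistics.stdev(scores)

# Usage: two 4-voxel masks that share 2 voxels give a Dice score of 0.5.
auto = {(0, 0, k) for k in range(4)}
manual = {(0, 0, k) for k in range(2, 6)}
print(dice(auto, manual))  # 0.5
```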

Fig. 2

Notched box plots of the Dice scores in the primary and held-out cohort. a Dice scores in the primary cohort; b Dice scores in the held-out cohort

Agreement of quantitative radiological characteristics

The agreement between automatically and manually segmented LNs for the quantitative radiological characteristics (ADC value, LN volume, short diameter, long diameter, and short-to-long diameter ratio) is shown in Fig. 3. Bland–Altman analysis showed good agreement between automated segmentation and manual annotation in the held-out cohort, with most values falling within the 95% limits of agreement.
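Bland–Altman analysis plots the difference between paired measurements against their mean and summarizes agreement by the bias and the 95% limits of agreement (bias ± 1.96 SD of the differences). A minimal sketch, with a hypothetical `bland_altman` helper, is shown below.

```python
import statistics

def bland_altman(auto, manual, z=1.96):
    """Bias and 95% limits of agreement between paired measurements.

    Differences are automated minus manual; the limits are bias +/- z * SD,
    using the sample SD of the differences.
    """
    diffs = [a - m for a, m in zip(auto, manual)]
    means = [(a + m) / 2 for a, m in zip(auto, manual)]  # x-axis of the plot
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    lower, upper = bias - z * sd, bias + z * sd
    within = sum(lower <= d <= upper for d in diffs) / len(diffs)
    return {"bias": bias, "lower": lower, "upper": upper,
            "fraction_within": within, "means": means, "diffs": diffs}

# Usage: short diameters (mm) from automated and manual segmentation of four LNs.
print(bland_altman([10, 12, 14, 16], [9, 12, 15, 16]))
```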

Fig. 3

Agreement between the automatically segmented and manually segmented lymph nodes. a ADC values; b volume; c short diameter; d long diameter; e short-to-long diameter ratio

Construction of Model 1 based on the quantitative radiological features

On univariate analysis, the radiological features of the pelvic LNs were compared and are summarized in Table 2. The ADC value, LN volume, short diameter and long diameter showed significant differences between the metastatic and nonmetastatic LNs and were then used to establish a radiological model to discriminate the status of the LNs using multivariate logistic analysis (Table 3). The equation to calculate the probability of metastases was generated as follows:

Table 3 Independent predictors of metastases in the Model 1 and Model 2
$$x = -1.962 - 0.01 \times \text{ADC value} + 0.002 \times \text{LN volume} + 0.264 \times \text{LN short diameter} - 0.137 \times \text{LN long diameter}, \qquad p = \frac{1}{1 + e^{-x}}$$
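Because Model 1 is a logistic regression, the linear predictor x is converted to a probability with the logistic function. The sketch below hard-codes the reported coefficients; the input units (for example, the scale on which the ADC value was entered) are not stated in this section, so inputs must match whatever units were used when fitting the model. `model1_probability` is a hypothetical name.

```python
import math

def model1_probability(adc, volume, short_diameter, long_diameter):
    """Probability of metastasis from the reported Model 1 logit.

    Units must match those used to fit the model; they are not restated here.
    """
    x = (-1.962 - 0.01 * adc + 0.002 * volume
         + 0.264 * short_diameter - 0.137 * long_diameter)
    return 1.0 / (1.0 + math.exp(-x))   # logistic link
```

With the reported signs, the predicted probability rises with short diameter and volume and falls with ADC value and long diameter, holding the other inputs fixed.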

Construction of Model 2 based on the radiomics and radiological features

The construction pipeline of the best model in the primary cohort was as follows: data balance: downsampling; data normalization: Z score; dimension reduction: Pearson correlation coefficient (PCC); feature selection: analysis of variance (ANOVA); classification: least absolute shrinkage and selection operator (LASSO). The detailed interpretation of the pipeline is shown in Additional file 1: S2.
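Two steps of this pipeline, Z-score normalization and PCC-based dimension reduction, can be sketched as follows. Normalization statistics are estimated on the training data only, and the PCC step greedily drops any feature whose absolute correlation with an already retained feature exceeds a threshold; the 0.99 threshold, the greedy order, and the helper names are assumptions, not FAEPro's documented settings.

```python
import math
import statistics

def zscore_fit(columns):
    """Per-feature (mean, population SD), estimated on training data only."""
    return {name: (statistics.mean(v), statistics.pstdev(v)) for name, v in columns.items()}

def zscore_apply(columns, stats):
    """Standardize each feature with the training statistics; constant features map to 0."""
    out = {}
    for name, v in columns.items():
        mu, sd = stats[name]
        out[name] = [(x - mu) / sd if sd > 0 else 0.0 for x in v]
    return out

def pearson(a, b):
    """Pearson correlation coefficient of two equally long lists."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den > 0 else 0.0

def pcc_reduce(columns, threshold=0.99):
    """Keep a feature only if |r| <= threshold against every feature already kept;
    the iteration order decides which member of a redundant pair survives."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Usage: "b" is a rescaled copy of "a" and is dropped as redundant.
cols = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [4, 1, 3, 2]}
print(pcc_reduce(zscore_apply(cols, zscore_fit(cols))))  # ['a', 'c']
```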

Among the 1070 extracted radiomics features, the 10 top-ranked features remaining after dimension reduction and feature selection were used for modeling (Table 4). The LN Rad-score was calculated as the sum of the selected features weighted by their coefficients. Multivariate logistic regression analysis of the combined radiomics model showed that short diameter and LN Rad-score were independent predictors of PLNM (Table 3). The equation to calculate the probability of metastases was generated as follows:

Table 4 Key radiomics features and their coefficients
$$x = -1.439 - 0.405 \times \text{LN Rad-score} + 0.215 \times \text{LN short diameter}, \qquad p = \frac{1}{1 + e^{-x}}$$
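The Rad-score and the Model 2 probability can be computed as below. Feature names and coefficients from Table 4 are not reproduced in this section, so the dictionary keys are hypothetical; the Rad-score follows the text's description as a coefficient-weighted sum (the optional intercept defaults to zero), and the Model 2 coefficients are those reported above. Both function names are hypothetical.

```python
import math

def rad_score(features, coefficients, intercept=0.0):
    """Coefficient-weighted sum of the selected radiomics features.

    `features` and `coefficients` map feature names to values; the names are
    hypothetical stand-ins for the ten features listed in Table 4.
    """
    return intercept + sum(coef * features[name] for name, coef in coefficients.items())

def model2_probability(rad, short_diameter):
    """Probability of metastasis from the reported Model 2 logit."""
    x = -1.439 - 0.405 * rad + 0.215 * short_diameter
    return 1.0 / (1.0 + math.exp(-x))

# Usage with two hypothetical, already normalized features.
rs = rad_score({"feat_1": 1.0, "feat_2": 2.0}, {"feat_1": 0.5, "feat_2": -0.25})
print(model2_probability(rs, 8.0))
```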

PLNM discrimination performance of the prediction models

Models 1–4 yielded AUCs of 0.89 (95% CI 0.85–0.94), 0.90 (95% CI 0.86–0.94), 0.71 (95% CI 0.69–0.78), and 0.78 (95% CI 0.77–0.88), respectively, for PLNM prediction in the held-out cohort (Table 5, Fig. 4a). In the subgroup analysis (Fig. 4b, c), Model 2 achieved the highest AUC among the models for LNs with short diameters ≤ 10 mm (0.83; 95% CI 0.76–0.89). For LNs with short diameters > 10 mm, Models 1 and 2 both achieved high AUCs, which were significantly higher than that of Model 3 according to the DeLong test (Model 1 vs. Model 3: 0.91 vs. 0.73, P = 0.001; Model 2 vs. Model 3: 0.92 vs. 0.73, P = 0.001) (Fig. 4d–f). Examples of PLNM prediction by the different models are shown in Fig. 5.

Table 5 The discrimination performance of models for PLNM prediction in the held-out cohort
Fig. 4

ROC curves and DeLong tests of the four models. a ROC curves of the four models in the held-out cohort, b LNs with short diameter ≤ 10 mm of the held-out cohort, and c LNs with short diameter > 10 mm of the held-out cohort. d DeLong test of the four models in the held-out cohort, e LNs with short diameter ≤ 10 mm of the held-out cohort, and f LNs with short diameter > 10 mm of the held-out cohort

Fig. 5

Examples for comparison of discrimination ability on PLNM. The red arrows point to the target LNs. LN: lymph node; PLND: pelvic lymph node dissection; PLNM: pelvic lymph node metastasis

In addition, at the patient level, Models 1, 2, and 4 classified all 17 patients in the held-out cohort as PLNM-positive (sensitivity: 100%), whereas Model 3 misclassified two PLNM-positive patients as negative (sensitivity: 88.24%, 15/17).

Clinical usefulness of Model 1 and Model 2

The nomograms of Models 1 and 2 yielded C-index values of 0.804 and 0.910, respectively, in the held-out cohort. The nomograms and calibration curves of Models 1 and 2 are shown in Fig. 6. The Hosmer–Lemeshow test showed a nonsignificant statistic (P = 0.075 and P = 0.088, respectively) for Models 1 and 2, demonstrating no significant deviation between the calibration curve and a perfect fit for PLNM prediction.

Fig. 6

Nomograms and calibration curves in the held-out cohort. a The radiological nomogram of Model 1 integrated only quantitative radiological factors. b The radiomics nomogram of Model 2 integrated radiological factors with the radiomics signature. c The calibration curve of Model 1 based on the quantitative radiological features. d The calibration curve of Model 2 based on the radiomics and radiological features

Clinical benefit of the prediction models

The DCA curves of the four models for PLNM prediction are presented in Fig. 7. All models provided higher net benefit than the treat-all (PLNM-all) and treat-none (PLNM-none) strategies over various ranges of threshold probabilities. At threshold probabilities above 35%, Models 1 and 2 provided greater net benefit than Models 3 and 4.

Fig. 7

Decision curve analysis comparing the net benefits of the four models

Discussion

In this retrospective study, we used multiple logistic regression to construct two PLNM prediction models based on automatic LN segmentation: one using quantitative radiological LN features alone (Model 1: ADC value, LN volume, short diameter, and long diameter) and one combining quantitative radiological features with the radiomics signature (Model 2: short diameter and LN Rad-score). The AUCs of Models 1 and 2 did not differ significantly in the held-out cohort [0.89 (95% CI 0.85, 0.94) vs. 0.90 (95% CI 0.85, 0.94), P = 0.573]. Because LN size, particularly the short diameter, strongly influences the performance of LN prediction models, the LNs were divided into two subgroups using a short-diameter threshold of 10 mm [17, 22]. In the held-out subgroup of LNs with short diameters ≤ 10 mm, Model 2 showed a higher AUC than Model 1 [0.83 (95% CI 0.76, 0.89) vs. 0.78 (95% CI 0.70, 0.84), P = 0.048]. Therefore, combining the radiomics signature with radiological features yielded better predictive efficacy than radiological features alone for normal-sized LNs, suggesting that the combined model could be a better tool for predicting PLNM in PCa patients.

MRI has been widely used as a noninvasive modality to evaluate LN status, with mixed results, because its diagnostic accuracy for LN metastases depends largely on the skill and experience of the radiologist [23]. As a functional imaging technique, DWI enables noninvasive characterization of biological tissues based on the random translational motion of water molecules. The degree of diffusion restriction can be quantified by calculating ADC maps, allowing tissue characterization [10, 24]. Several studies have reported significant differences in ADC values between benign and malignant LNs, with high accuracy for differentiating them [25, 26]. Other studies have shown less promising results. For example, Thoeny et al. reported no significant difference between the ADC values of metastatic and benign LNs, which were (0.94 ± 0.18) × 10⁻³ mm²/s and (1.01 ± 0.28) × 10⁻³ mm²/s, respectively [27]. Therefore, although metastatic LNs tend to have lower mean ADC values, the role of ADC in assessing nodal status remains debatable.

Size is usually considered a fundamental criterion for diagnosing nodal metastases. However, the sizes of benign LNs can substantially overlap with those of metastatic nodes, resulting in suboptimal sensitivity and specificity of size criteria [28]. The shape of LNs can also be a helpful diagnostic feature. A normal LN is an oblong, kidney-bean-shaped structure with a fatty hilum, and an LN with a higher short-to-long-axis ratio (round rather than oblong) is more likely to be malignant [17]. However, in a study of patients with cervical cancer, the short-to-long ratio was not a significant factor for differentiating metastatic from nonmetastatic LNs [29]. Given the limited accuracy of each of these features alone, combining size, shape, and ADC criteria seems prudent.

In this study, we built a multivariate model based on quantitative radiological features for PLNM prediction. The LN short diameter, long diameter, ADC value, and volume were retained in the multivariate logistic analysis. The AUC of this model was 0.89 (95% CI 0.85, 0.94) in the held-out cohort but was lower in the subgroup of LNs with short diameters ≤ 10 mm [0.78 (95% CI 0.70, 0.84)]. Such normal-sized LNs represent the most challenging problem in clinical practice, so diagnostic performance for these LNs needs to be improved.

Because radiomics features contain detailed quantitative information in multiple dimensions and could reflect the heterogeneity and biological behavior of metastases [30], we hypothesized that radiomics could provide additional information to identify metastases that are challenging for traditional radiologic interpretation. In this study, ten radiomics features were included: 2 shape features, 5 gray-level co-occurrence matrix (GLCM) features, 1 gray-level size zone matrix (GLSZM) feature, and 2 gray-level dependence matrix (GLDM) features. The widely used LASSO algorithm shrinks regression coefficients to eliminate redundant features, so that only features closely associated with LN status are retained. This algorithm can capture broad associations within the extracted radiomics data and construct a robust radiomics signature from a panel of stable selected variables. This approach not only identifies important radiomics features effectively but also reduces the risk of overfitting in the classification task [31, 32].
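To make the shrinkage mechanism concrete, the sketch below fits the classic squared-error LASSO, minimizing (1/2n)·||y − b0 − Xb||² + λ·||b||₁, by cyclic coordinate descent with soft thresholding. It assumes the columns of X are centred (for example, Z-scored), so the intercept is the mean of y; whether the FAEPro classifier uses exactly this loss is not stated here, and the helper names are hypothetical.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; values within [-gamma, gamma] become 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def lasso_cd(X, y, lam, n_iter=1000, tol=1e-10):
    """Cyclic coordinate descent for (1/2n)||y - b0 - Xb||^2 + lam * ||b||_1.

    X is a list of n rows with p centred columns; returns (b0, [b_1..b_p]).
    """
    n, p = len(X), len(X[0])
    b0 = sum(y) / n                       # optimal intercept when columns are centred
    b = [0.0] * p
    resid = [yi - b0 for yi in y]         # residuals for the current coefficients
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue                  # constant column carries no information
            # Correlation of feature j with the partial residual (feature j added back).
            rho = sum(X[i][j] * (resid[i] + X[i][j] * b[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            step = new - b[j]
            if step != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * step
                b[j] = new
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    return b0, b

# Usage: y depends only on the first feature; the second coefficient is driven to 0.
X = [[-1.5, 1.0], [-0.5, -1.0], [0.5, -1.0], [1.5, 1.0]]
print(lasso_cd(X, [-3.0, -1.0, 1.0, 3.0], lam=0.5))
```

Larger values of λ shrink more coefficients exactly to zero, which is how LASSO performs feature selection and limits overfitting.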

In this research, the model with the highest AUC for PLNM prediction was selected as the best model. Downsampling was selected as the data-balancing method in the pipeline of the best model; it mitigates the effect of class imbalance on model development by randomly discarding samples from the majority class until the classes are balanced [33]. PCC and ANOVA were used to reduce the dimensions of the feature matrix and to select the best features, respectively [34]. LASSO is a popular penalized regression method that minimizes the residual sum of squares subject to a bound on the sum of the absolute values of the coefficients [35].
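Random downsampling and ANOVA-based ranking can be sketched as follows. Downsampling keeps every minority-class sample and an equally sized random subset of the majority class; the ANOVA step ranks features by the one-way F statistic between classes. Helper names are hypothetical, and this is an illustration rather than FAEPro's implementation.

```python
import random

def random_downsample(labels, seed=0):
    """Indices of a class-balanced subset: every minority sample plus an equally
    sized random draw (without replacement) from each larger class."""
    rng = random.Random(seed)
    by_class = {}
    for i, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(i)
    n_min = min(len(idx) for idx in by_class.values())
    keep = []
    for idx in by_class.values():
        keep.extend(rng.sample(idx, n_min))
    return sorted(keep)

def anova_f(values, labels):
    """One-way ANOVA F statistic of a single feature across the label groups."""
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    n, k = len(values), len(groups)
    grand = sum(values) / n
    means = {lab: sum(g) / len(g) for lab, g in groups.items()}
    ss_between = sum(len(g) * (means[lab] - grand) ** 2 for lab, g in groups.items())
    ss_within = sum((v - means[lab]) ** 2 for lab, g in groups.items() for v in g)
    if ss_within == 0:
        return float("inf")
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def top_k_by_f(columns, labels, k):
    """Names of the k features with the largest ANOVA F statistic."""
    return sorted(columns, key=lambda name: anova_f(columns[name], labels), reverse=True)[:k]

# Usage: balance the primary-cohort LN labels (192 positive vs. 716 negative).
idx = random_downsample([1] * 192 + [0] * 716)
print(len(idx))  # 384
```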

Previous studies on PLNM prediction have shown that combining radiomics features with clinical and/or radiological features improves predictive ability [29, 36]. Therefore, in this study, Model 2 was constructed by combining the radiological features and the LN Rad-score using multivariate logistic regression. Because some of the information carried by the radiological features may already be captured by the radiomics signature, the ADC value and LN volume were no longer independent predictors in the multivariable regression.

Our results showed that Model 2 improved the discrimination of PLNM compared with Model 1, particularly for normal-sized LNs. One explanation might be that normal-sized metastatic LNs are more likely to be at an early stage of involvement, with very small or even cellular-level metastatic foci. Such changes may be difficult to detect visually or by size criteria on MRI, whereas radiomics features may capture subtle alterations in signal heterogeneity. Additionally, the high C-index of the Model 2 nomogram further supports its reliability and clinical utility. This radiomics nomogram is a promising visual, easy-to-use tool for preoperative PLNM prediction and could help clinicians make individualized treatment decisions for patients with PCa.

Several studies have demonstrated the potential benefit of radiomics nomograms for predicting PLNM in PCa [37, 38]. However, almost all of these studies were conducted at the patient level, mainly because node-by-node correspondence between DWI and histopathology is challenging to obtain. In this study, we used carefully matched, one-to-one MR–pathology-confirmed LNs from the ePLND/PLND results as the reference standard for PLNM prediction. All labeled LNs were within the dissection region, and LNs with an uncertain status were excluded.

Additionally, compared with the visual assessments of the junior radiologists (Model 3) and senior radiologists (Model 4), Model 2 achieved a higher AUC than the junior radiologists [0.90 (95% CI 0.85, 0.94) vs. 0.71 (95% CI 0.69, 0.78), P = 0.001] and an AUC not significantly different from that of the senior radiologists [0.90 (95% CI 0.85, 0.94) vs. 0.78 (95% CI 0.77, 0.88), P = 0.061] in the held-out cohort. For the subgroup of LNs with short diameters ≤ 10 mm in the held-out cohort, Model 2 achieved significantly higher discrimination than the senior radiologists [AUC: 0.83 (95% CI 0.76, 0.89) vs. 0.74 (95% CI 0.66, 0.88), P = 0.048], suggesting that the proposed radiomics approach could outperform radiologists' visual assessment for PLNM prediction in normal-sized LNs.

Practical issues such as time and labor should be considered when applying radiomics in the clinic. VOI delineation is a critical but time-consuming task that often hinders radiologists from performing radiomics analysis [39]. In this study, we applied a pretrained deep learning model that enables rapid and accurate detection and segmentation of LNs on DWI images for PCa N-staging. Based on this automatic LN segmentation, Model 2 achieved excellent PLNM prediction performance in terms of accuracy, sensitivity, and specificity.

This study has several limitations. First, it was retrospective with a limited sample size; extending the primary cohort to more patients might further improve the performance of the radiomics model for clinical application. Second, all data in the primary and held-out cohorts were collected from a single institution, so the model's performance at other institutions remains unknown; multi-institution validation is vital before the model is applied in practice. Third, we did not compare or combine our results with radiomics analysis of other MRI sequences, such as T2WI or DCE, which might improve diagnostic efficiency; we will address this in future research. Fourth, the numbers of positive and negative LNs were imbalanced in the primary cohort (192:716). Although downsampling was performed for data balance, its influence on diagnostic performance remains uncertain. Finally, deep learning models have been shown to perform well in many tasks and might outperform radiomics models [40]; developing a deep learning prediction model and comparing its performance with that of the current model may be worthwhile in the future.

In conclusion, the noninvasive LN radiomics model combining quantitative radiological features with a radiomics signature enabled accurate preoperative PLNM prediction on DWI. Notably, the combined radiomics model showed greater predictive efficacy than senior radiologists in differentiating metastatic from nonmetastatic normal-sized LNs, a finding that could help optimize decision-making and adjuvant treatment for patients with PCa.