Introduction

Accurate primary staging of prostate cancer (PCa) is important for individualized treatment planning. Current guidelines recommend a bone scan and abdominopelvic computed tomography (CT) or magnetic resonance imaging (MRI) for non-invasive initial staging [1, 2]. However, novel and potentially more reliable diagnostic procedures are evolving rapidly [3]. [68Ga]Gallium-prostate-specific membrane antigen-11 positron emission tomography (PET) combined with CT or MRI (hereafter referred to as “PSMA PET”) has shown promising diagnostic accuracy for primary staging [4-9].

Despite recent advances in imaging, pelvic lymph node dissection (PLND) during radical prostatectomy (RP) remains the gold standard for nodal staging in PCa. However, the therapeutic and prognostic benefits of PLND and extended PLND (ePLND) remain controversial [10-13]. PSMA PET might influence the indication for PLND and its extent, but the oncologic benefit is not yet known [14]. In 2022, the European Association of Urology (EAU) recommended ePLND in patients with a risk of lymph node invasion (LNI) ≥ 7% according to the Briganti 2019 nomogram [15]. However, since up to 20% of patients suffer a complication after PLND, there is a strong need to improve patient selection for PLND [16].

More recently, incorporation of quantitative imaging data from multiparametric MRI (mpMRI) or PSMA PET has been proposed to further improve LNI prediction [17-20]. However, data on the added value of PSMA PET for LNI prediction in particular remain limited.

We aimed to develop and externally evaluate a prediction model using a combination of clinical and qualitative/quantitative information from PSMA PET/CT for prediction of LNI at RP in patients with intermediate to high-risk PCa.

Material and methods

Study design

This is a retrospective, dual-center study reported according to the current guidelines [21]. We developed three LNI prediction models in a training cohort, selected the best-performing model (including a clinically meaningful risk threshold), and applied it to a validation cohort. The performance of the model was compared to the performance of available LNI prediction models [17, 18, 22-25].

Source of data and study population

Patient data from two tertiary referral centers served as data source for the training cohort (Klinikum Rechts der Isar, Technical University of Munich, Munich, Germany) and the validation cohort (University Hospital Zurich, University of Zurich, Zurich, Switzerland).

For the training cohort, the retrospective analysis was approved by the Ethics Committee of the Technical University Munich (permit 5665/13). For the validation cohort, all patients gave a general written informed consent for retrospective use of their data (Ethics Commission of the Canton of Zurich, Switzerland, BASEC Nr. 2018-01284).

We used pre-existing cohorts at both centers, which had been collected for studies on T and N staging in PCa (training cohort) and on patient selection for ePLND (validation cohort), and extended them with consecutive new patients. In both cohorts, consecutive patients with histologically proven intermediate- or high-risk PCa (D’Amico criteria) who underwent PSMA PET for primary staging followed by RP and ePLND were retrospectively identified (training cohort n = 192 between January 2013 and June 2017, validation cohort n = 96 between April 2016 and July 2019). Patients with missing biopsy data (training cohort n = 19) and patients without written consent for retrospective use of their data (validation cohort n = 6) were excluded, leading to a final training cohort of 173 patients and a validation cohort of 90 patients. Ninety-four of the 173 patients in the training cohort were part of published works on T and N staging in PCa patients [6, 8]. Sixty of the 90 patients in the validation cohort were part of a published work on patient selection for ePLND in PCa [19].

In the training cohort, all included patients underwent mpMRI-targeted/standard 12-core biopsy followed by PSMA PET and RP with ePLND. ePLND was performed according to a predefined template including bilateral, separate dissection of the obturator fossa and the external iliac, internal iliac, and common iliac vessels, with the aortic bifurcation and the femoral canal as proximal and distal limits, respectively.

In the validation cohort, all patients with intermediate and high-risk PCa underwent a combined mpMRI-targeted/saturation biopsy (min. 40 cores) followed by PSMA PET and RP with ePLND. ePLND was performed as previously reported [26].

Because of the different biopsy approaches, difference in pathological upgrading at RP between the two cohorts was assessed.

In both cohorts, patients underwent PSMA PET according to standard procedure guidelines, and no therapy was initiated between PET and RP [8, 19].

Outcome

The predicted outcome was LNI at RP with ePLND. In the training cohort, RP was performed mainly as an open procedure and in < 10% of cases robotically, with ePLND following a template of 8 predefined anatomical fields. In the validation cohort, all surgical procedures were performed as robot-assisted transperitoneal laparoscopic RP with ePLND as described earlier [26]. In both cohorts, the removed lymph nodes were assessed for LNI by specialized uropathologists.

Predictors

The following data were collected for each patient. Clinical parameters: age [years], PSA value at the time of PSMA PET [µg/l], and highest WHO/International Society of Urological Pathology (WHO/ISUP) grade group (grade groups 1–5) [27] at systematic/targeted biopsy. Quantitative 68Ga-PSMA-11 PET parameters of the primary prostate tumor: maximum standardized uptake value (SUVmax) and volume-based PSMA PET parameters, which were assessed using an absolute cut-off of SUV ≥ 4, yielding the PSMA-positive volume (PSMAvol, [cm3]) and the total PSMA accumulation (PSMAtotal = PSMAvol × SUVmean), as described earlier [19]. Qualitative PET information: the conclusion of the interpreting physician regarding LNI, i.e., PSMA PET report N-status (0, LNI negative; 0.5, equivocal for LNI; 1, LNI positive; unitless) according to Fanti et al. [28]. Since PET reporting was not standardized during the inclusion period, all PSMA PET scans at both centers were reassessed by two nuclear medicine physicians in consensus, blinded to the outcome.

Data for comparison with published models

Additional data was extracted to compare the model’s performance with published models in the validation cohort as listed in Supplemental Table S1.

Missing data

Cases with missing data were omitted (i.e., complete-case analysis).

Model development and selection

We developed three models for LNI prediction. In the first model, we combined all clinical and quantitative PET information (Model_Clinical_PET). In the second model, we added the PSMA PET report N-status as an additional variable to the first model (Model_Clinical_PET_Report). In the third model, we added the PSMA PET report N-status to the first model as a combined (ensemble) model (Model_Clinical_PET/Report) [29]. WHO/ISUP grade groups (i.e., grade groups 1–5) and the PSMA PET report N-status were treated as continuous predictors. In an internal validation, we assessed each model’s discrimination ability. We chose the model with the highest internal discrimination ability for external validation. A probability threshold for clinical application was estimated from the training cohort and applied to the validation cohort.
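The ensemble idea behind Model_Clinical_PET/Report can be illustrated with a minimal sketch: two logistic models each return a probability, and the final risk is their weighted average. All coefficients and the weight below are hypothetical placeholders, not the published values from Supplemental Table S4, and the split of predictors between the two sub-models (clinical/quantitative PET in one, report N-status in the other) is an assumption made for illustration.

```python
import math

def logistic(z: float) -> float:
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients for illustration only (NOT the published values).
CLINICAL_PET = {"intercept": -4.0, "isup": 0.5, "psa": 0.02, "psma_vol": 0.03}
REPORT = {"intercept": -2.5, "n_status": 4.0}
# Hypothetical ensemble weight; the report sub-model receives 1 - weight.
WEIGHT_CLINICAL_PET = 0.4

def p_clinical_pet(isup: int, psa: float, psma_vol: float) -> float:
    """Sub-model 1: ISUP grade group, PSA [ug/l], PSMAvol [cm3]."""
    c = CLINICAL_PET
    return logistic(c["intercept"] + c["isup"] * isup
                    + c["psa"] * psa + c["psma_vol"] * psma_vol)

def p_report(n_status: float) -> float:
    """Sub-model 2: PSMA PET report N-status (0, 0.5, or 1)."""
    c = REPORT
    return logistic(c["intercept"] + c["n_status"] * n_status)

def p_ensemble(isup: int, psa: float, psma_vol: float, n_status: float,
               w: float = WEIGHT_CLINICAL_PET) -> float:
    """Weighted average of the two sub-model probabilities."""
    return w * p_clinical_pet(isup, psa, psma_vol) + (1.0 - w) * p_report(n_status)

# Example: ISUP 3, PSA 10 ug/l, PSMAvol 5 cm3, equivocal PET report.
risk = p_ensemble(isup=3, psa=10.0, psma_vol=5.0, n_status=0.5)
```

Because the result is a convex combination, the ensemble probability always lies between the two sub-model probabilities; in the study the weight was optimized rather than fixed by hand.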

External validation

In the validation cohort, we assessed the model’s performance in terms of model calibration, model discrimination ability, and clinical application.

Model comparison with published prediction models

In the validation cohort, the selected PSMA PET model was compared with six prediction models in clinical use (mpMRI-based models, 2019 Briganti nomogram [17], Draulans et al. nomogram [18]; conventional models, MSKCC Pre-Radical Prostatectomy nomogram [22], the updated Partin tables (v.2016) [23], the Roach formula [24], and the Winter nomogram [25]). For the probability of LNI of the MSKCC Pre-Radical Prostatectomy nomogram, we used the model properties published online (https://www.mskcc.org/nomograms/prostate/pre_op/coefficients, Model N 6599/11816, updated 01/2020). For all other prediction models, the probability for LNI was calculated using the published model formulas. The final selected model was compared to the published prediction models regarding calibration, discrimination, and clinical application.
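As an example of how a published formula yields an LNI probability, the Roach formula is commonly stated as risk (%) = 2/3 × PSA + (Gleason score − 6) × 10. The sketch below implements that commonly cited form; the function name and the clamping to the 0–100% range are assumptions added for illustration and may differ from the exact implementation used in this study.

```python
def roach_lni_risk_percent(psa: float, gleason_score: int) -> float:
    """Roach formula for lymph node involvement risk, in percent.

    psa: serum PSA in ng/ml (numerically equal to ug/l).
    gleason_score: biopsy Gleason sum score (e.g., 6 to 10).
    """
    risk = psa * 2.0 / 3.0 + (gleason_score - 6) * 10.0
    # Clamping is an illustrative assumption, not part of the original formula.
    return min(max(risk, 0.0), 100.0)

# Example: PSA 15 ng/ml and Gleason score 7 give an estimated risk of about 20%.
example_risk = roach_lni_risk_percent(15.0, 7)
```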

Statistical analysis

Patient characteristics were summarized using the mean, median, standard deviation, and interquartile range (IQR), as appropriate. Patient characteristics were compared using a two-sample t-test or Mann-Whitney U test for continuous variables and a χ2 test for categorical variables.

Predictors were investigated for linearity and multicollinearity using scatter plots and generalized variance inflation calculations, respectively. We used a multivariable logistic regression model for Model_Clinical_PET and Model_Clinical_PET_Report, and two separate logistic regression models for the ensemble model (Model_Clinical_PET/Report), which were averaged using weights optimized by nonlinear optimization [29, 30]. Model calibration was assessed at the mean, weak, and moderate levels, including the Brier score and Spiegelhalter’s z [31]. Discrimination ability was assessed using the AUC, and clinical application using decision curve analysis (DCA). Combined model calibration and discrimination was assessed using the index of prediction accuracy (IPA) [32]. AUCs were compared using the DeLong method [33]. For the internal validation, we applied a 10-times repeated tenfold cross-validation.
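Three of the metrics named above can be written down compactly. The following is a minimal sketch using standard definitions: the Brier score as the mean squared difference between predicted probability and outcome, the IPA as 1 minus the ratio of the model's Brier score to that of a null model predicting the overall event rate, and the AUC as the probability that a randomly chosen case with LNI receives a higher prediction than a randomly chosen case without. The function names and the toy data are illustrative assumptions, not the study's R code or data.

```python
from typing import Sequence

def brier_score(y: Sequence[int], p: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome (0/1)."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)

def ipa(y: Sequence[int], p: Sequence[float]) -> float:
    """Index of prediction accuracy: 1 - Brier(model) / Brier(null model).

    The null model predicts the overall event rate for every patient.
    """
    prevalence = sum(y) / len(y)
    null_brier = brier_score(y, [prevalence] * len(y))
    return 1.0 - brier_score(y, p) / null_brier

def auc(y: Sequence[int], p: Sequence[float]) -> float:
    """Fraction of (event, non-event) pairs ranked correctly; ties count 0.5."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

# Toy data (hypothetical): outcomes and predicted LNI probabilities.
y_toy = [0, 0, 1, 1]
p_toy = [0.10, 0.40, 0.35, 0.80]
toy_metrics = (brier_score(y_toy, p_toy), ipa(y_toy, p_toy), auc(y_toy, p_toy))
```

An IPA of 0 means the model is no better than predicting the event rate for everyone, which is why it captures calibration and discrimination together.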

The probability threshold for the final model was selected using a 10-times repeated tenfold cross-validated DCA in the training cohort and by trying to match the reported spared ePLND (65.5%) and missed LNI (12.2%) for the 5% threshold for the 2012 Briganti model development [34].
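The matching step can be pictured as a scan over candidate thresholds. The sketch below computes, for each threshold, the share of patients who would be spared ePLND and the share of LNI cases that would be missed, then picks the candidate closest to the 2012 Briganti reference values (65.5% and 12.2%). The distance measure (sum of absolute differences), the function names, and the toy data are assumptions for illustration; the text does not specify the exact matching criterion, and the cross-validation step is omitted here.

```python
from typing import Sequence, Tuple

def spared_and_missed(y: Sequence[int], p: Sequence[float],
                      threshold: float) -> Tuple[float, float]:
    """Return (fraction of patients spared ePLND, fraction of LNI cases missed).

    Patients with predicted risk below the threshold are spared ePLND.
    """
    below = [yi for yi, pi in zip(y, p) if pi < threshold]
    spared = len(below) / len(y)
    missed = sum(below) / sum(y)
    return spared, missed

def closest_threshold(y: Sequence[int], p: Sequence[float],
                      candidates: Sequence[float],
                      target_spared: float = 0.655,
                      target_missed: float = 0.122) -> float:
    """Pick the candidate whose spared/missed rates are closest to the targets."""
    def distance(t: float) -> float:
        spared, missed = spared_and_missed(y, p, t)
        return abs(spared - target_spared) + abs(missed - target_missed)
    return min(candidates, key=distance)

# Toy data (hypothetical): 10 patients, 3 with LNI.
y_toy = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
p_toy = [0.02, 0.05, 0.08, 0.10, 0.12, 0.15, 0.16, 0.30, 0.40, 0.90]
best = closest_threshold(y_toy, p_toy, [0.05, 0.10, 0.16, 0.17, 0.30])
```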

For the external validation, the final model was trained on the training cohort and was applied to the validation cohort.

A 2-tailed P value of < 0.05 was considered statistically significant. We performed all statistical analyses in R version 4.0.5 (R Core Team (2021) R: A language and environment for statistical computing, Vienna, Austria).

Results

Patients’ characteristics and qualitative PSMA-11 PET performance

The data assembly process is shown in Fig. 1. A total of 173 patients were available for the training cohort and 90 patients for the validation cohort. Table 1 lists all patient characteristics. Patients were significantly older in the training cohort (mean age 71.2 vs. 64.7 years, t = -7.34, df = 212.84, P < 0.001). Furthermore, the biopsy WHO/ISUP grade group distribution differed between the two cohorts, with more ISUP grade group 1 biopsies in the training cohort (ISUP grade group 1, 8 vs. 0%, χ2 = 11.196, P = 0.02). There was no significant difference regarding pathological upgrading after RP (17 vs. 16%, χ2 = 0.0376, P = 0.85). The number of lymph nodes removed during ePLND did not differ between the training and validation cohorts (mean 24.1 vs. 23.7, P = 0.76).

Fig. 1

Flow-chart of the ensemble (combined) model (Model_Clinical_PET/Report). The input (predictors) is processed within two separate logistic regression models and combined using averaged weights that were optimized using nonlinear optimization

Table 1 Patient characteristics

Eighteen patients in the validation cohort had missing data for the 2019 Briganti model and 4 patients for the Draulans et al. model. Patient characteristics of the complete-case cohorts for these models are shown in Supplementary Table S2.

Supplementary Table S3 lists the qualitative PSMA PET results.

Model development and selection

Age and SUVmax showed a non-linear relationship with the logit of the outcome, and PSMAtotal showed the highest collinearity. Therefore, we excluded these predictors. The AUC for predicting LNI was consistently high with all three models during internal validation: 0.721 (CI 0.694–0.747) for Model_Clinical_PET (ISUP, PSA, and PSMAvol as predictors), 0.816 (CI 0.791–0.841) for Model_Clinical_PET_Report (ISUP, PSA, PSMAvol, and PSMA PET report N-status as predictors), and 0.842 (CI 0.82–0.865) for Model_Clinical_PET/Report (ISUP, PSA, PSMAvol, and PSMA PET report N-status as predictors, combined in two models). Supplemental Table S4 lists the full model specifications. Model_Clinical_PET/Report (Fig. 1, Supplemental Table S4) showed the highest internally validated AUC (0.842, CI 0.82–0.865) and was selected for further analysis. The internally cross-validated DCA of this model showed a better net benefit (NB) than either the treat-all or the treat-none strategy when the threshold probability was ≥ 0.15 (Supplementary Fig. 1). A threshold probability of ≥ 17%, with an estimated 54.3% of ePLND spared and 19.1% of LNI missed, best matched the corresponding values reported for the 5% threshold of the 2012 Briganti model (65.5% of ePLND spared and 12.2% of LNI missed) and was chosen as the threshold for external validation.

External validation

In the external validation, the ensemble model Model_Clinical_PET/Report showed good calibration-in-the-large (event rate = 0.30, average predicted risk = 0.28), and there was no evidence of systematic over- or underfitting (Intercept = -0.297, Slope = 1.095, P = 0.41; Brier score 0.12, Spiegelhalter’s z -0.89). However, the calibration curve showed that the model overestimated the risk of LNI among patients with an observed LNI probability below 0.22 and underestimated it among patients with an observed LNI probability above 0.22 (Fig. 2). The model showed a high discrimination ability for LNI (AUC 0.923, 95% CI 0.863–0.984) (Table 2).

Fig. 2

Model calibration plots of predicted probability versus observed probability of lymph node involvement for (A) the proposed model (Model_Clinical_PET/Report), (B) the MSKCC Pre-Radical Prostatectomy nomogram, (C) the updated Partin tables (v.2016), (D) the Roach formula, and (E) the Winter nomogram. The dotted lines represent the LOESS fit, the dashed lines represent a straight fit. The black bars denote the distribution of predicted probabilities

Table 2 Area under the curve of the proposed model and other lymph node invasion prediction models in the validation cohort

Table 3 lists the results of applying the model to the validation cohort at thresholds between a predicted LNI probability of 0 and 0.30. Using the previously estimated cut-off of ≥ 0.17, 45 ePLNDs (45/90, 50%) would have been avoided, 44 of them in patients without LNI (44/69, 63.8%) and one in a patient with LNI (1/21, 4.8%).
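The counts reported for the ≥ 0.17 cut-off translate directly into the usual diagnostic measures. The sketch below redoes that arithmetic from the numbers in the text (90 patients, 21 with LNI, 44 spared without LNI, 1 spared with LNI); the variable names are illustrative, and the negative predictive value is derived here from those counts rather than quoted from the paper.

```python
# Counts reported for the validation cohort at the >= 0.17 cut-off.
n_total = 90          # all patients
n_lni = 21            # patients with LNI at ePLND
spared_no_lni = 44    # below threshold, no LNI (true negatives)
spared_lni = 1        # below threshold, LNI present (false negatives)

n_no_lni = n_total - n_lni                 # 69 patients without LNI
n_spared = spared_no_lni + spared_lni      # 45 ePLNDs avoided

spared_fraction = n_spared / n_total               # share of ePLNDs avoided
missed_fraction = spared_lni / n_lni               # share of LNI cases missed
sensitivity = (n_lni - spared_lni) / n_lni         # LNI cases flagged for ePLND
specificity = spared_no_lni / n_no_lni             # non-LNI cases spared
npv = spared_no_lni / n_spared                     # derived, not from the paper

print(f"spared {spared_fraction:.1%}, missed {missed_fraction:.1%}, "
      f"sensitivity {sensitivity:.2f}, specificity {specificity:.2f}")
```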

Table 3 Analyses of the proposed model-derived (Model_Clinical_PET/Report) cut-offs used to discriminate between patients with or without lymph node involvement confirmed at extended pelvic lymph node dissection

Model comparison with published prediction models

The proposed model showed similar calibration compared to the conventional and mpMRI-based models. Calibration curves/calibration characteristics are depicted in Fig. 2A-E/Supplementary Table S3 and Supplementary Fig. 2A-D/Supplementary Table S5/6 for comparison with conventional and mpMRI-based LNI prediction models, respectively.

The proposed model showed significantly higher discrimination (AUC 0.923, 95% CI 0.863–0.984) than all conventional prediction models except the MSKCC model (AUC 0.824, 95% CI 0.710–0.938), and a non-significantly higher AUC than the mpMRI-based LNI prediction models (Table 2).

The proposed model showed higher combined discrimination and calibration (IPA 0.35) compared to the conventional models and a combined discrimination and calibration higher than the Draulans et al. model (IPA 0.31) and lower than the Briganti 2019 model (IPA 0.37). All IPA values are depicted in Table S5/6.

DCA revealed a high NB (0.165) of the proposed model compared to the treat-all strategy at the proposed threshold of ≥ 17% (Fig. 3). Of the conventional models, only the MSKCC Pre-Radical Prostatectomy nomogram showed a higher NB than the treat-all strategy at the recommended threshold of 5% (Fig. 3A). Of the mpMRI-based models, the 2019 Briganti nomogram showed a lower NB than the treat-all strategy at the recommended threshold of 7%, whereas the Draulans et al. nomogram (no recommended threshold available) showed a consistently better NB than the treat-all strategy at a threshold of 7% (NB 0.189) and above (Fig. 3B, C).
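The net benefit values can be reproduced from the counts reported for the validation cohort with the standard decision-curve formula NB = TP/n − FP/n × pt/(1 − pt), where pt is the threshold probability. In the sketch below, the true-positive count (20) and false-positive count (25) are derived from the reported numbers (21 LNI cases with 1 missed; 69 patients without LNI with 44 spared); the function names are illustrative.

```python
def net_benefit(tp: int, fp: int, n: int, threshold: float) -> float:
    """Net benefit of a strategy that treats tp + fp of n patients."""
    return tp / n - fp / n * threshold / (1.0 - threshold)

def net_benefit_treat_all(n_events: int, n: int, threshold: float) -> float:
    """Treat-all: every event is a true positive, every non-event a false positive."""
    return net_benefit(n_events, n - n_events, n, threshold)

n, n_lni = 90, 21
tp = n_lni - 1            # 20 LNI cases at or above the 0.17 threshold
fp = (n - n_lni) - 44     # 25 patients without LNI at or above the threshold

nb_model = net_benefit(tp, fp, n, 0.17)
nb_all = net_benefit_treat_all(n_lni, n, 0.17)
print(round(nb_model, 3), round(nb_all, 3))
```

Multiplying the difference between the two values by 1000 gives the number of additional net beneficial ePLNDs per 1000 patients, which is how the figures in the decision-curve discussion are read.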

Fig. 3

Decision curve analysis (DCA) of the proposed model (Model_Clinical_PET/Report) compared to (A) conventional nomograms, (B) the 2019 Briganti nomogram, and (C) the Draulans et al. nomogram. The DCA depicts the net benefit (NB, y-axis) of a model or a strategy (treat-all or treat-none with ePLND) according to a risk threshold (x-axis). Of the conventional models (A), only the MSKCC model showed a higher NB than the treat-all strategy at the recommended threshold of 5%. However, at this threshold, the net benefit of the MSKCC nomogram (0.195) was only slightly above that of the treat-all strategy (0.193), meaning that one can perform 195-192 = 3 more beneficial ePLND (out of 1000 patients) when using the MSKCC nomogram. At a threshold of ≥ 17%, the net benefit was 0.165 for the proposed model and 0.076 for the treat-all strategy, meaning that one can perform 165-76 = 89 more beneficial ePLND (out of 1000 patients) when using the proposed model. Of the mpMRI-based models (B, C), the 2019 Briganti nomogram (B) showed a lower net benefit than the treat-all strategy at the recommended threshold of 7%, whereas the Draulans et al. nomogram (C, no recommended threshold available) showed a consistently better net benefit (0.189) than the treat-all strategy at a threshold of 7% and above

Discussion

In this study, we developed and externally evaluated a multivariable prediction model including quantitative and qualitative information from PSMA PET for predicting LNI at RP, with ePLND as the reference, and assessed its performance against conventional and mpMRI-based prediction models. Our results demonstrate that a model including imaging parameters from PSMA PET might improve on models that are based solely on clinical parameters. This is consistent with previous reports that assessed the inclusion of mpMRI parameters in prediction models for LNI [17, 18]. Our study introduces an innovative approach to predicting LNI by combining PSMA PET reporting by a nuclear medicine physician with readily available quantitative PSMA PET parameters and clinical parameters. A web calculator to determine the LNI probability according to the proposed model is available at https://psma-pet.com/predict (this calculator should only be used for research purposes).

Our ensemble model incorporating PSA, highest biopsy ISUP grade group, PSMAvol, and the PSMA PET report N-status yielded high sensitivity (0.95) and moderate specificity (0.64) for LNI detection at the proposed threshold of ≥ 0.17. These results are comparable with the external validation of the 2019 Briganti nomogram (sensitivity of 0.97 and specificity of 0.61) [35]. Our results suggest that incorporating the LNI status from the imaging report in particular may improve prediction models, which is in line with previous reports regarding esophageal cancer [36]. We suppose that a combination of predictors from both mpMRI and PSMA PET might be of value in LNI prediction.

The proposed threshold (≥ 17%) appears rather high compared to the recommended cut-offs of 5% for conventional nomograms and 7% for the 2019 Briganti nomogram. However, the external validation shows that this threshold led to reliable results (50% of ePLND spared, with a risk of missing only 4.8% of LNI) despite very probable differences between the training and validation cohorts (e.g., calibration of the PET scanner). Moreover, a threshold of ≥ 19% in the validation cohort would have yielded an even higher NB (0.185) and would have spared more ePLND (67.8%), with a risk of missing 9.5% of LNI. However, the optimal recommended threshold should also be based on clinical reasoning, and it is questionable whether a potentially higher number of missed LNI is clinically acceptable. Therefore, we suppose that the model’s calibration and the threshold should be further investigated in a larger external cohort.

The potential of imaging variables for predicting LNI in PCa was reported almost 20 years ago using neural networks [37]. Recently, advanced machine learning algorithms have been reported for LNI prediction in PCa [38, 39]. Cysouw et al. reported an internally validated AUC of 0.86 for LNI in intermediate- to high-risk PCa using PSMA PET radiomics [38]. We think that the sophistication of more complex models hinders their transition to clinical practice.

Our study has several limitations. Cases in which ePLND was not performed were not included, causing a selection bias. However, all LNI prediction models that we used for comparison are also based on PLND. Our proposed model is based on [68Ga]Ga-PSMA-11 PET, which is costly, is not yet a standard procedure at many institutions, and must be interpreted with care to avoid false positive findings. However, it may be included in widescale practice soon. Moreover, we did not assess the model’s performance with other PSMA tracers. We also did not investigate potential bias introduced by different PET scanners, and both the training and the validation cohort were of relatively small sample size, which might have led to bias and may limit the generalizability of our results. The two cohorts differed regarding preoperative biopsy (mpMRI-targeted with saturation biopsy versus mpMRI-targeted/standard 12-core biopsy) and surgical approach for RP; however, we did not find a significant difference regarding pathological upgrading after RP, number of removed lymph nodes, or LNI rate. Lastly, because of missing data, we could not directly compare the two mpMRI-based models.

Our results indicate that combining clinical and qualitative/quantitative 68Ga-PSMA-11 PET information may improve LNI prediction in intermediate- to high-risk PCa patients undergoing primary staging. The proposed model with a ≥ 17% threshold performed well compared to conventional and mpMRI-based models, sparing half of all ePLNDs with a risk of missing fewer than 5% of LNI. Future research should investigate the combination of information from both PSMA PET and mpMRI for LNI prediction in larger cohorts of patients with PCa before RP.