Introduction

Prostate cancer (PCa) is one of the leading cancer types in terms of estimated new cases and deaths in men worldwide [1]. Proper management of PCa patients requires accurate assessment of the presence and severity of the disease, thereby avoiding under- or over-estimation of patient risk [2]. Prostate-specific antigen (PSA) is a commonly used clinical biomarker for screening and diagnosis of PCa, but its high false-positive rate has been questioned [3]. In clinical practice, multiparametric MRI (mpMRI) techniques are promising for the detection and characterization of PCa [4]. However, mpMRI is still limited by benign confounding appearances and substantial intra- and inter-reader variability. Systematic prostate biopsy is commonly performed for cancer detection but has relatively low sensitivity and specificity, which can lead to delayed diagnosis as well as over-diagnosis with unnecessary discomfort and cost [5, 6]. Urologists are therefore looking for a novel, non-invasive way to improve the accuracy of PCa detection, staging, and risk stratification.

Minimally invasive blood- or urine-based approaches (“liquid biopsies”) are increasingly being used for cancer detection, enabling a precision oncology approach [7]. Information about tumors (e.g., circulating tumor cells, cell-free DNA and RNA) and immune responses (e.g., immune cell subsets, cytokines and exosome expression profiles) provides potential diagnostic, prognostic and therapeutic targets in PCa [8, 9]. Inflammation and immune response contribute to tumorigenesis [10]. Many peripheral blood markers of inflammation and immune response are diagnostic and prognostic indicators of PCa [11,12,13]. Lymphocyte subsets, including T cells, B cells, and innate lymphoid cells, can distinguish between benign prostate disease (BPD) and PCa and predict clinical risk (low-/intermediate-risk disease versus high-risk disease) in asymptomatic men [9, 13]. Clinically significant PCa (CSPCa) refers to intermediate- and high-risk PCa that still requires treatment in clinical practice according to the EAU guidelines [14]. Therefore, in PCa screening, “indolent cancers” (low-risk PCa) are more appropriately grouped with BPD than with intermediate-risk PCa. Furthermore, treatment options for intermediate-risk patients range from focal therapy and radical prostatectomy to various radiotherapy approaches, whereas high-risk PCa is a candidate for systemic therapy, indicating that a distinction should be made between intermediate-risk and high-risk disease [14, 15]. Unfortunately, few studies have examined the ability of lymphocyte subsets to distinguish among low-, intermediate-, and high-risk PCa [9, 13]. In addition, the functional status of lymphocytes has rarely been studied in terms of diagnostic performance.

Automated methods to detect PCa and distinguish indolent from aggressive disease based on clinical records can assist in early diagnosis and treatment planning. Machine learning (ML), which employs computational algorithms that can accurately extract features without explicit pre-instructions, has been introduced as an advanced technique for aiding in the detection and characterization of PCa [9, 16,17,18,19,20]. ML approaches based on peripheral blood lymphocyte subsets can distinguish BPD from PCa, or low-/intermediate-risk from high-risk PCa, although this has been shown only with small sample sizes in hospital-based studies [9, 13]. Thus, despite the success of existing studies, these ML approaches do not yet meet the clinical need, owing to poor interpretability and low generalizability.

To address these challenges, this study included subjects spanning BPD and low-, intermediate-, and high-risk PCa, with clinical characteristics collected from two campuses of Wuhan Tongji Hospital, forming the largest sample to date regarding functional subsets of peripheral lymphocytes for the diagnosis of PCa. We aimed to develop an easy-to-use and robust clinic-ML nomogram to aid in the non-invasive diagnosis and tripartite risk stratification of PCa.

Methods

Patient data collection

The study was approved by the Research Ethics Commission of Tongji Hospital, and the requirement for informed consent was waived by the Ethics Commission (IRB ID: TJ-IRB20211246). The study screened 2039 patients with PCa or BPD who were admitted to Wuhan Tongji Hospital (China) from August 1st, 2020 to October 20th, 2022. Patients with missing laboratory, radiological or pathological data, or with poor-quality MRI images, were excluded from the study. Ultimately, 197 patients, including 56 with BPD, were enrolled in the study (Fig. 1). To maximize the utilization of the collected data, both non-clinically significant PCa (nCSPCa) and BPD were grouped into the low-risk PCa category. All enrolled patients had records of 42 clinic characteristics in functional subsets of peripheral lymphocytes (Table 1). The subsets of peripheral lymphocytes were detected by flow cytometry. The serum concentrations of interleukins were measured using the electrochemiluminescence immunoassay method (Cobas E602, Roche). The procedures for flow cytometry and interleukin detection used by the clinical laboratory of Wuhan Tongji Hospital have been previously described [21].

Fig. 1 The flowchart of patient enrollment and data preprocessing

Procedures

The workflow of this study is depicted in Fig. 2. Figure 3 illustrates the construction pipeline of the clinic nomogram and the proposed clinic-machine learning nomogram.

Data preprocessing and feature selection

The clinical records of the patients were manually inspected for quality control to identify any missing or abnormal values. Each clinic characteristic was visualized through boxplots (Additional file 1: Fig. S1) during this inspection process. To address uncertainty in the input data, a few recorded values were truncated. For example, if the Prostate-Specific Antigen (PSA) values exceeded 1000, they were re-processed and recorded as 1000. Similarly, in the case of ATL, Interleukin-6, Interleukin-1β, and Interleukin-10, certain characteristic values below a specific threshold cannot be accurately recorded due to machine measurement precision. Consequently, all these values for ATL, Interleukin-6, and Interleukin-1β were uniformly truncated to 5, 1.5 and 5, respectively. Additionally, Interleukin-10 was removed from the records due to too many duplicate values. As a result, a total of 41 clinic characteristics in functional subsets were used for the subsequent analysis.
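The truncation rules described above can be sketched as follows. This is a minimal illustration, not the authors' code: the dictionary keys are hypothetical column names, and it assumes that the detection threshold for each assay equals the value to which low readings were set.

```python
# Hypothetical column names; the limits are the values quoted in the text.
UPPER_LIMITS = {"PSA": 1000.0}
# Assumption: readings below the detection threshold are raised to that threshold.
LOWER_LIMITS = {"ATL": 5.0, "Interleukin-6": 1.5, "Interleukin-1β": 5.0}


def truncate_record(record):
    """Return a copy of one patient record with out-of-range values clipped."""
    cleaned = dict(record)
    for name, upper in UPPER_LIMITS.items():
        if name in cleaned:
            cleaned[name] = min(cleaned[name], upper)
    for name, lower in LOWER_LIMITS.items():
        if name in cleaned:
            cleaned[name] = max(cleaned[name], lower)
    return cleaned


# Toy record (made-up numbers, not study data).
example = {"PSA": 1840.0, "ATL": 3.2, "Interleukin-6": 7.9, "Interleukin-1β": 5.0}
truncated = truncate_record(example)
```

Returning a copy rather than modifying the record in place keeps the raw values available for the boxplot inspection described above.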

After manual inspection, the clinical records were normalized using a min-max normalization scheme (Fig. 2A). The risk stratification of each patient was then manually assigned in accordance with the EAU guideline [14], resulting in 59 low-risk, 48 intermediate-risk, and 90 high-risk PCa patients.
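For reference, min-max normalization rescales each characteristic to the interval [0, 1] using its observed minimum and maximum. The sketch below shows the calculation for a single column; the function name and the handling of constant columns are assumptions made for illustration.

```python
def min_max_normalize(values):
    """Rescale a list of numbers to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a column with no spread carries no information.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Toy PSA column (made-up numbers, not study data).
psa = [4.0, 10.0, 20.0, 1000.0]
scaled = min_max_normalize(psa)
```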

These preprocessed clinic records, along with the corresponding risk stratification assignments, were fed into a Lasso regression algorithm, which selected the most significant features, generating the dataset used for the subsequent analysis (Fig. 2B). The Lasso-selected clinical records were randomly split into a training set and a test set in a 4:1 ratio. Consequently, a total of 157 records were used to train the machine learning (ML) models and construct the nomograms, and 40 records were reserved for performance evaluation.
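The 157/40 partition follows from a 4:1 random split of 197 records when the test-set size is rounded up. A minimal sketch is given below; the seed, the rounding rule, and the absence of stratification by risk group are assumptions, since the text does not specify them.

```python
import math
import random


def split_indices(n_records, test_fraction=0.2, seed=0):
    """Randomly partition record indices into (train, test) lists."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    # Assumption: the test-set size is rounded up, so 197 * 0.2 gives 40.
    n_test = math.ceil(n_records * test_fraction)
    return sorted(indices[n_test:]), sorted(indices[:n_test])


train_idx, test_idx = split_indices(197)
```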

Machine learning models

Five commonly used ML algorithms were employed in this study for the task of predicting the risk stratification of PCa, including Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), XGBoost and AdaBoost. These ML models were trained using a 10-fold cross-validation approach on the training set (Fig. 2B). The optimal ML model was then selected based on its performance evaluated in the test set (Additional file 1: Table S1) and served as the performance baseline for comparison with nomograms.
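To make the 10-fold scheme concrete, the sketch below partitions the 157 training records into ten validation folds, each used once while the remaining records are used for fitting. The contiguous, unshuffled fold layout is an assumption chosen for simplicity; the text does not state how the folds were drawn.

```python
def k_fold_indices(n_records, n_folds=10):
    """Yield (train_idx, val_idx) pairs over contiguous folds."""
    base, extra = divmod(n_records, n_folds)
    start = 0
    for fold in range(n_folds):
        # The first `extra` folds receive one additional record.
        size = base + (1 if fold < extra else 0)
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_records))
        yield train, val
        start += size


folds = list(k_fold_indices(157, 10))
fold_sizes = [len(val) for _, val in folds]
```

With 157 records, this layout gives seven folds of 16 records and three folds of 15.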

Development and validation of the clinic-machine learning nomogram

First, a clinic nomogram was created using a multivariable Ordinal Logistic Regression (OLR) algorithm on the clinic data from the training set (Fig. 2C). Second, an ML nomogram was built by applying a multivariable OLR algorithm to the probabilistic predictions of the five trained ML models. Third, to fully leverage the interpretability of the nomogram, a feature mapping algorithm (FMA) was developed to convert the ML nomogram into a clinic-ML nomogram, using clinic features as variables (Fig. 3). Finally, the performance of the clinic nomogram and the proposed clinic-ML nomogram was evaluated on the test set using the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) and the calibration curve, and the clinical utility was measured through Decision Curve Analysis (DCA) (Fig. 2D).

The FMA generates the values of the clinic features (CF) for the clinic-ML nomogram as

$$\mathrm{CF}_{i} = \sum_{j=1}^{N} \mathrm{FI}_{i,j} \times \mathrm{MV}_{j} \tag{1}$$

where $\mathrm{FI}_{i,j}$ is the feature importance of the $i$th clinic feature in the $j$th trained ML model and $\mathrm{MV}_{j}$ is the value of the $j$th ML model in the ML nomogram, with $i \in \{1, \dots, M\}$ and $j \in \{1, \dots, N\}$, where $M$ is the number of clinic features and $N$ is the number of ML models. With the help of the FMA, the ML nomogram can be conveniently converted into a new clinic-ML nomogram whose variables are clinic features. The conversion enhances interpretability while retaining the efficiency and power of the ML models.
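Equation (1) is a weighted sum over models, i.e., a matrix-vector product between the feature-importance matrix and the vector of model values. The sketch below works through it with toy numbers; the function name is hypothetical, and treating each model's value in the ML nomogram as a single number is an assumption made for illustration.

```python
def feature_mapping(feature_importance, model_values):
    """Compute CF_i = sum_j FI[i][j] * MV[j] for every clinic feature i.

    feature_importance: M rows (clinic features) by N columns (ML models).
    model_values: N values, one per ML model in the ML nomogram.
    """
    n_models = len(model_values)
    clinic_values = []
    for row in feature_importance:
        if len(row) != n_models:
            raise ValueError("each feature needs one importance per model")
        clinic_values.append(sum(fi * mv for fi, mv in zip(row, model_values)))
    return clinic_values


# Toy example with M = 3 clinic features and N = 2 models (made-up numbers).
FI = [[0.6, 0.1],
      [0.3, 0.4],
      [0.1, 0.5]]
MV = [2.0, 4.0]
CF = feature_mapping(FI, MV)
```

In this toy case the first feature receives 0.6 × 2.0 + 0.1 × 4.0, so a feature that matters in a heavily weighted model obtains a correspondingly larger value in the clinic-ML nomogram.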

Fig. 2 Workflow for development and validation of the proposed clinic-ML nomogram for predictions of the risk stratification of PCa based on functional subsets of peripheral lymphocyte

Fig. 3 Diagram of the clinic-ML nomogram and the clinic nomogram. The clinic-ML nomogram (g) is converted from the ML nomogram (e) via FMA (f) which extracts the feature importance (d) from ML models (b) trained on patients’ records with clinical features (a). The clinic nomogram (h) is constructed directly based on clinical features (a)

Statistical analysis

The t-test or the Mann-Whitney U-test was used for continuous variables, as appropriate given normality of distribution and homogeneity of variance. The Kruskal-Wallis H-test was used for other continuous and categorical variables. The ML algorithms, Lasso regression and ROC analysis were implemented using the Scikit-learn package in Python 3.6. All other statistical analyses were performed using R version 3.4.1. The “rms” package was used for the univariate, multivariate, and ordinal logistic regression analyses. The calibration plots and DCA were produced using the “rms” and “dca” packages, respectively. Differences between the AUCs of two ROC curves were tested using the DeLong test. A two-sided p value of less than 0.05 was considered statistically significant.

Results

Characteristics of patients

There were no significant differences in most clinic features between patients in the training and test sets (Table 1). However, significant differences were detected among low-, intermediate- and high-risk PCa patients in twelve clinic features in the training set, including Age, PSA, Neutrophil percentage, Neutrophils, Hemoglobin, Alkaline phosphatase, Lactate dehydrogenase, Th/Ts, Activated Ts cells, Interleukin-1β, Interleukin-2R, and Interleukin-6 (p < 0.05) (Table 2).

Table 1 Clinical characteristics of patients
Table 2 Clinical characteristics of the training and test sets of PCa with risk stratifications

Selection of clinic features for ML models and the clinic nomogram

Lasso regression was applied to determine the optimal subset of the clinic features (Fig. 4), yielding a total of nine features, i.e., Age, Alkaline phosphatase, B cells (CD3−CD19+), Interleukin-1β, Interleukin-2R, Lactate dehydrogenase, Neutrophil percentage, PSA and Th/Ts. These nine features were then used to construct both the ML models and the clinic nomogram.

Fig. 4 Lasso regression to generate the selected clinic features with iterative fitting using 5-fold cross-validation. Variation of the hyperparameter λ in Lasso regression is plotted vs. MSE (mean-squared-error) (A) and the coefficient profiles of clinic features (B). The light-blue vertical lines in (A) were drawn at the optimal values with the one-standard-deviation criterion. The vertical dashed line was drawn at the value selected at the logarithmic scale (λ), and nine features with non-zero coefficients are indicated

Performance assessment of ML algorithms

The data with the nine Lasso-selected features were fed into the five ML algorithms with 10-fold cross-validation. All ML algorithms showed competitive performance in discriminating the risk stratifications (Fig. 5). The best performance was achieved by XGBoost, which showed favorable predictive efficacy in both the training and test sets, with AUC values of 0.989 and 0.842, sensitivity of 0.930 and 0.700, and specificity of 0.965 and 0.850, respectively (Table 5).

Fig. 5 ROC of the five ML algorithms in the training set (A) and the test set (B)

Development and performance assessment of the clinic-ML nomogram

Results of the univariate and multivariate logistic regression analysis (Table 3) suggested that the predictions of four ML models, i.e., AdaBoost, Decision Tree, Random Forest, and XGBoost, were independent predictors of the risk stratifications of PCa. Therefore, a multivariate OLR using the probabilistic predictions of these four ML models was employed to construct the ML nomogram, which was then converted to a clinic-ML nomogram through the proposed FMA (Fig. 6B). The variance inflation factors (VIFs) of the variables in the ML nomogram were found to be within acceptable limits, at 5.13, 1.92, 5.08, and 2.39, respectively.

Fig. 6 (upper) The clinic nomogram and (lower) the clinic-ML nomogram

The predictive scores of the clinic-ML nomogram were strongly correlated with the risk stratifications of PCa in both the training and test sets (Fig. 7A). Using cutoff values of 2.24 and 6.00 for the clinic-ML nomogram predictive scores, the patients were classified into three risk stratification groups, and the results indicated that the distribution of PCa patients was substantially different among the low-, intermediate- and high-risk stratification groups (Fig. 7B). For instance, in the test set, the probability of PCa patients was found to be significantly higher in the low-risk group compared to those in the intermediate- and high-risk groups (p < 0.05).
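The two cutoffs define a simple three-way rule on the predictive score, sketched below. The direction (higher score means higher risk) and the handling of scores that fall exactly on a cutoff are assumptions; the text gives only the cutoff values.

```python
from bisect import bisect_right

CUTOFFS = (2.24, 6.00)  # cutoff values quoted in the text
GROUPS = ("low-risk", "intermediate-risk", "high-risk")


def risk_group(score):
    """Map a clinic-ML nomogram predictive score to a risk group.

    Assumption: a score equal to a cutoff is assigned to the higher group.
    """
    return GROUPS[bisect_right(CUTOFFS, score)]


# Toy scores (made-up numbers, not study data).
assigned = [risk_group(s) for s in (1.0, 2.24, 5.99, 6.0, 9.3)]
```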

Fig. 7 A Box plots indicating patterns of correlation between risk stratifications and the clinic-ML nomogram predictive scores in the training (upper left) and test set (upper right). B Number of PCa patients in low-, intermediate- and high-risk groups according to the clinic-ML nomogram predictive scores in the training (lower left) and test set (lower right)

Meanwhile, for the purpose of performance comparison, the Lasso-selected clinic features were used to construct the clinic nomogram (Fig. 3). Univariate and multivariate logistic regression analyses revealed that five clinic variables, i.e., Age, B cells (CD3−CD19+), Neutrophil percentage, PSA and Th/Ts, were independent predictors of risk stratifications (Table 4). Subsequently, the corresponding clinic nomogram was constructed (Fig. 6A).

The performance of the clinic-ML nomogram and the clinic nomogram was assessed using ROC analysis, which showed that the clinic-ML nomogram outperformed the clinic nomogram, with AUC values of 0.998 vs. 0.897 in the training set and 0.864 vs. 0.837 in the test set (Fig. 8; Table 5). The DeLong test indicated a significant difference between the AUC values of the two nomograms in both the training and test sets (p < 0.05). In addition, the performance of the clinic-ML nomogram was also superior to that of the optimal ML model, i.e., XGBoost (Table 5). The calibration curve demonstrated improved prediction performance of the clinic-ML nomogram compared to the other models (Fig. 9), which was further validated by the DCA, showing improved net benefits of the clinic-ML nomogram over both XGBoost and the clinic nomogram in both the training and test sets (Fig. 10).
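The net benefit plotted in a decision curve is conventionally computed, at a threshold probability p_t, as TP/n − (FP/n) × p_t/(1 − p_t). The sketch below applies this standard formula to a binary outcome (intermediate- and high-risk coded as 1, low-risk as 0); it is not the “dca” package used in the study, and the function name and toy data are illustrative assumptions.

```python
def net_benefit(y_true, y_prob, threshold):
    """Standard DCA net benefit: TP/n - FP/n * pt / (1 - pt)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    # A patient is treated as positive when the predicted probability
    # reaches the threshold.
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


# Toy data (made-up numbers): 1 = intermediate-/high-risk, 0 = low-risk.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1, 0.7, 0.3]
nb_model = net_benefit(y_true, y_prob, 0.5)
# "Treat all" reference strategy: every patient is predicted positive.
nb_treat_all = net_benefit(y_true, [1.0] * len(y_true), 0.5)
```

A model is considered clinically useful at a given threshold when its net benefit exceeds both the treat-all reference and zero (the treat-none reference).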

Table 3 Logistic regression for predicting risk stratifications of PCa based on predictions of five ML algorithms
Table 4 Logistic regression for predicting risk stratifications of PCa based on clinic features
Fig. 8 ROC of the clinic-ML nomogram, the clinic nomogram and XGBoost in the training set (A) and the test set (B)

Table 5 Performance evaluation of XGBoost, the clinic nomogram and the clinic-ML nomogram in the training (first line in each cell) and test set (second line in each cell)
Fig. 9 Calibration curve of the clinic-ML nomogram in the training set (A) and the test set (B). Dashed lines indicate the reference line where an ideal nomogram would be. Red solid lines indicate the performance of the nomogram, while green solid lines indicate bias correction in the nomogram

Fig. 10 DCA for predicting risk stratifications (low-risk vs. intermediate- and high-risk) of PCa using XGBoost, the clinic nomogram, and the clinic-ML nomograms in the training (A) and test set (B)

Discussion

This retrospective study aimed to develop a clinic-ML nomogram for predicting the risk stratification of PCa patients based on functional subsets of peripheral lymphocytes. A total of 197 PCa patients were included and 41 clinic characteristics were collected, forming the largest sample used in a study of its kind. After Lasso regression, an optimal subset of nine clinic features, i.e., Age, Alkaline phosphatase, B cells (CD3−CD19+), Interleukin-1β, Interleukin-2R, Lactate dehydrogenase, Neutrophil percentage, PSA and Th/Ts, was selected, and the prognostic validity of the proposed clinic-ML nomogram was explored by comparing it with a conventional clinic nomogram and various ML models, both of which were constructed directly from clinic characteristics. The results demonstrated that the clinic-ML nomogram fully leveraged the predictive capability of ML algorithms and outperformed the conventional nomogram and the best ML model in terms of accuracy and clinical utility. Meanwhile, the clinic-ML nomogram separated the three risk stratifications more clearly and was easier to use than the clinic nomogram (Fig. 6), and it provided strong guidance for active surveillance of low-risk PCa patients (Fig. 7). Thus, the clinic-ML nomogram can serve as an insightful tool for preoperative assessment of the risk stratification of PCa, combining the interpretability and simplicity of a nomogram with the efficacy and robustness of ML algorithms.

This study divided PCa patients into three risk groups, which is more closely aligned with clinical treatment. However, few studies have used lymphocyte subsets with a nomogram to predict three-level risk stratification of PCa. Our study combined the nomogram and the ML models to further improve diagnostic efficiency. Meanwhile, some other studies have used imaging data (such as PSMA PET/CT, MRI, TRUS) together with other clinic indicators to establish nomograms for the prediction of PCa risk stratification [22,23,24,25]. Despite the additional imaging data, those studies achieved comparable, if not slightly inferior, results compared with the present study (Additional file 1: Table S2). In addition, the use of “scores” calculated by sophisticated algorithms as variables in the nomogram may help improve prediction accuracy, but may also increase the complexity of the nomogram and make it more difficult to interpret [17, 26]. The approach taken in this study, which used the most significant examination features as variables in the clinic-ML nomogram, may provide a more direct and simple method for assessing patient risk stratification.

This study has several limitations that should be acknowledged. Firstly, all the data were collected exclusively from one medical center with two campuses located in the same city. Therefore, the generalizability of the proposed clinic-ML nomogram to other populations and settings remains unknown and requires further evaluation in other cohorts. To address this issue, a multi-center study is planned to assess the external validity and robustness of the clinic-ML nomogram. Secondly, the number of ML algorithms used in the development of the clinic-ML nomogram was limited, and future studies may benefit from the inclusion of additional ML algorithms to enhance the performance of the nomogram. Thirdly, imaging data play a crucial role in the diagnosis and staging of PCa, and their integration into the clinic-ML nomogram could further improve its diagnostic efficiency and predictive power.

The application of nomograms in clinical diagnosis has gained popularity in recent years due to their simplicity, intuitiveness, and interpretability [27]. The integration of nomograms with powerful ML algorithms, to improve performance while maintaining the interpretability of the nomogram, is an active research topic [28,29,30]. The proposed clinic-ML nomogram is an easy-to-use and powerful tool for accurately predicting the risk stratification of PCa patients, which could provide essential information for individual diagnosis and treatment in PCa.