Background

Bladder cancer (BC) is one of the most common urinary carcinomas; in 2020 it was the tenth most common malignancy worldwide in both sexes combined and the sixth most common in men [1]. In patients with muscle-invasive bladder cancer (MIBC) or high-grade non-muscle-invasive bladder cancer (NMIBC), bladder-conserving therapies often result in early recurrence and progression. Therefore, radical cystectomy (RC) with pelvic lymph node dissection (PLND) is the standard recommendation for these patients [2, 3].

Even though most patients treated with RC have negative surgical margins, approximately 50% of them may experience recurrence, which suggests that extravesical tumor deposits already exist at the time of surgery [4, 5]. Lymph nodes are the most common site of BC metastasis, and lymph node metastasis (LNM) has been reported in 24 to 29% of patients undergoing RC [4, 6]. BC patients with LNM were reported to have a 5-year overall survival rate of only 19% when treated with RC alone. Even when RC was combined with neoadjuvant or adjuvant chemotherapy, the 5-year overall survival rate was only around 30% [7]. Hence, preoperative prediction of LNM in patients with BC is both necessary and beneficial.

Machine learning (ML) is a branch of artificial intelligence in which models learn from data and improve their performance automatically, without the explicit programming that traditional methods require [8]. ML algorithms can fit different configurations of data, assign weights, and calculate the predictive power of each combination of variables in order to assess diagnostic and prognostic factors [9]. Several ML models for preoperatively predicting LNM in prostate cancer and renal cell carcinoma have been established and validated, and some of them performed better than traditional logistic regression models [10, 11].

Over 90% of BC cases are pathologically diagnosed as bladder urothelial carcinoma (BUC). We previously established a traditional nomogram for predicting LNM in BUC, which was highly accurate, reliable, and clinically applicable in both internal and external validation [12]. However, no study that preoperatively predicts LNM in BC using ML has been reported. Therefore, we aimed to use ML algorithms to construct and validate a model for preoperatively predicting LNM in BUC using demographic information, imaging data, pathologic characteristics from transurethral resection of the bladder tumor (TURBT) specimens, and laboratory measurements.

Methods

Patient selection

This study was approved by the Medical Ethics Committee of the Affiliated Hospital of Qingdao University (approval number QYFYWZLL28026) and was carried out in accordance with the Declaration of Helsinki of the World Medical Association. We retrospectively collected the clinical data of patients who underwent RC and bilateral lymphadenectomy in the urology department of the Affiliated Hospital of Qingdao University between January 2013 and April 2022. Patients were divided into a training set (80%) and a testing set (20%) by stratified random sampling using the Stratified Shuffle-Split function in Python. LNM was defined as pathologically confirmed lymph node metastasis in the RC specimen. Patients were excluded based on the following criteria: (a) age < 18 years; (b) no TURBT before RC, or no muscle in the TURBT specimen; (c) incomplete imaging examination data before RC; (d) tumor originating from sites other than the bladder; (e) distant metastasis; (f) incomplete laboratory measurements within one month before RC; (g) non-urothelial carcinoma diagnosed on RC pathology; (h) preoperative radiotherapy; (i) severe or end-stage chronic kidney disease.
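The 80/20 stratified split described above can be illustrated with a short sketch. The study names the Stratified Shuffle-Split function; the version below is a standard-library stand-in written for illustration, not the authors' code, and the function name `stratified_split` and the seed are assumptions. The class counts (550 patients without LNM, 105 with LNM) follow from the cohort sizes given in the Results.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) that preserve the class ratio of `labels`."""
    rng = random.Random(seed)  # hypothetical seed, only for reproducibility
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Take the same fraction from every class so the LNM rate is preserved.
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# 0 = no LNM, 1 = LNM; counts derived from the cohort described in the Results.
labels = [0] * 550 + [1] * 105
train_idx, test_idx = stratified_split(labels)
train_pos = sum(labels[i] for i in train_idx)
test_pos = sum(labels[i] for i in test_idx)
print(len(train_idx), train_pos, len(test_idx), test_pos)
```

Splitting each class separately is what keeps the LNM rate the same in both sets (84/524 and 21/131, both about 16%).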

Data collection

The following preoperative data of the included patients were recorded: age, sex, body mass index (BMI), tumor grade on TURBT, papillary tumor presence on TURBT, urothelial variants on TURBT, muscle invasion on TURBT, infiltration on TURBT, hydronephrosis on imaging, extravesical invasion on imaging, positive LN on imaging, tumor size on imaging, neutrophil count, monocyte count, basophil count, eosinophil count, lymphocyte count, erythrocyte count, platelet count, hemoglobin, fibrinogen, urea nitrogen, creatinine, and albumin. We used the respective cell counts to calculate the neutrophil to lymphocyte ratio (NLR), platelet to lymphocyte ratio (PLR), monocyte to lymphocyte ratio (MLR), and neutrophil to platelet ratio (NPR). In addition, the systemic immune-inflammation index (SII) was calculated as the platelet count multiplied by the neutrophil count and divided by the lymphocyte count.
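The ratio definitions above translate directly into code. The following is a minimal sketch; the function name `inflammatory_indices` and the example counts are hypothetical, and all counts are assumed to be expressed in the same unit (for example 10^9/L).

```python
def inflammatory_indices(neutrophils, lymphocytes, monocytes, platelets):
    """Compute NLR, PLR, MLR, NPR and SII from peripheral blood cell counts."""
    if lymphocytes <= 0 or platelets <= 0:
        raise ValueError("lymphocyte and platelet counts must be positive")
    return {
        "NLR": neutrophils / lymphocytes,
        "PLR": platelets / lymphocytes,
        "MLR": monocytes / lymphocytes,
        "NPR": neutrophils / platelets,
        # SII = platelet count x neutrophil count / lymphocyte count
        "SII": platelets * neutrophils / lymphocytes,
    }


# Hypothetical patient, counts in 10^9/L (not data from the study).
example = inflammatory_indices(
    neutrophils=4.0, lymphocytes=2.0, monocytes=0.5, platelets=250.0
)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```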

When a patient had undergone several TURBT procedures, the pathological characteristics (including tumor grade, papillary tumor presence, differentiation, muscle invasion, and infiltration) corresponding to the highest tumor grade or cancer stage were recorded. If the latest TURBT was performed more than one month before RC, the pre-RC laboratory measurements were recorded. Otherwise, the measurements taken before TURBT were used, in order to reduce the impact of surgery on the results.

Feature selection and model building

First, we conducted univariate analysis of the recorded clinical variables to identify potential preoperative risk factors for LNM in BUC. Second, Spearman correlation analysis was performed to reduce collinearity among features. Finally, to reduce the risk of overfitting, the least absolute shrinkage and selection operator (LASSO) algorithm was applied, and features with non-zero coefficients were selected.
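The collinearity step can be sketched as a greedy filter on pairwise Spearman correlations. This is an illustration under stated assumptions, not the procedure used in the study: the cut-off of 0.9, the helper names, and the toy values are all hypothetical, and the text does not report which threshold was applied.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant feature carries no rank information
    return cov / (var_x * var_y) ** 0.5


def drop_collinear(features, threshold=0.9):
    """Keep a feature only if |rho| with every already kept feature is below threshold."""
    kept = []
    for name in features:
        if all(abs(spearman(features[name], features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy values for five hypothetical patients (not data from the study).
features = {
    "neutrophils": [3.1, 4.5, 2.8, 6.0, 5.2],
    "NLR": [1.5, 2.4, 1.3, 3.9, 3.0],  # same ordering as neutrophils
    "albumin": [41, 38, 40, 35, 44],
}
print(drop_collinear(features))
```

In this toy example NLR is dropped because its ranks coincide with those of the neutrophil count, while albumin is retained. The features that survive such a filter would then be passed to LASSO.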

The prediction models for LNM in BUC patients after RC were established using five ML algorithms: support vector machine (SVM), light gradient boosting machine (LightGBM), eXtreme gradient boosting (XGBoost), random forest (RF), and extra-trees classifier. All patients were randomly assigned to a training set (80%) or a testing set (20%). The training set was used to establish the prediction models with five-fold cross-validation, whereas the testing set was used to validate them using the area under the curve (AUC) of the receiver operating characteristic (ROC) curve and the corresponding 95% confidence interval (95%CI). We considered the model with the highest AUC to be the best model. We calculated the correlation coefficients between features and drew a Shapley additive explanations (SHAP) summary plot to visualize the relative importance ranking of each feature for the model predictions. Decision curve analysis (DCA) was performed to show the net benefit at each risk threshold probability, as well as the clinical application value of the best model.
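For readers unfamiliar with the evaluation metric, the AUC equals the probability that a randomly chosen patient with LNM receives a higher predicted score than a randomly chosen patient without LNM. The sketch below computes it from that definition and attaches a percentile bootstrap interval. The text does not state how the 95%CI was obtained, so the bootstrap is an assumption made for illustration, and the function names and toy scores are hypothetical.

```python
import random


def auc_score(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute the AUC")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC (assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc_score(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy predictions for eight hypothetical patients (1 = LNM).
labels = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.70, 0.90]
auc = auc_score(labels, scores)
lower, upper = bootstrap_ci(labels, scores)
print(f"AUC = {auc:.3f} (95%CI: {lower:.3f}-{upper:.3f})")
```

In the toy data 14 of the 15 positive-negative pairs are ordered correctly, so the AUC is 14/15. The interval is very wide with only eight patients, which is the expected behaviour for such a small sample.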

Statistical analyses

The Stratified Shuffle-Split function was run in Python (version 3.7). Normally distributed continuous variables were described as means and standard deviations, non-normally distributed continuous variables as medians and interquartile ranges, and categorical variables as frequencies and proportions. In univariate analysis, these three types of variables were compared using Student’s t-test, the Mann-Whitney U test, and the Chi-squared test, respectively. Univariate statistical analyses were performed using SPSS (version 24.0). The other statistical analyses, the correlation analysis, and the LASSO algorithm were implemented with the “scipy”, “numpy”, and “sklearn” packages in Python (version 3.7) on the “One-key AI” platform (http://www.medai.icu/), which is based on Python (version 3.7). The code used in this study was derived from: https://gitee.com/wangqingbaidu/OnekeyCompo. A two-sided P value < 0.05 was considered statistically significant.

Results

Patient characteristics

A total of 805 patients with BUC treated at the Affiliated Hospital of Qingdao University between January 2013 and April 2022 were potentially eligible. After the selection process, 655 patients were enrolled in our study, of whom 105 had LNM. The training set included 440 patients without LNM and 84 patients with LNM, while the testing set included 110 patients without LNM and 21 patients with LNM (Fig. 1). The baseline data of the included patients are shown in Table 1. In univariate analyses, grade, papillary, infiltration, hydronephrosis, extravesical invasion, positive lymph node, tumor size, neutrophil count, monocyte count, erythrocyte count, platelet count, hemoglobin, fibrinogen, creatinine, albumin, NLR, PLR, MLR, and SII differed significantly between patients with and without LNM.

Fig. 1 Flow chart of the patient selection process

Table 1 Baseline characteristics of the patients

The comparison of baseline characteristics between the training and testing sets, with the corresponding P values, is shown in Table 2. The rate of LNM did not differ significantly between the training set and the testing set (p = 1.000). None of the baseline characteristics differed significantly between the two sets except diabetes. Because the rate of diabetes did not differ significantly between patients with and without LNM (p = 0.227), the baseline characteristics of the two sets were considered balanced. In addition, the baseline characteristics of the two sets by lymph node status are shown in Supplementary Table 1.

Table 2 Comparison of baseline characteristics between the two sets

Feature selection and model evaluation

We performed Spearman correlation analysis and the LASSO algorithm with five-fold cross-validation (Fig. 2a and b) to select predictors. Fourteen potential predictors of LNM after RC were ultimately identified (Supplementary Table 2) and incorporated into the prediction models in our study.

Fig. 2 (a) The process of feature selection. The tuning parameter (λ) of the LASSO regression model was selected by five-fold cross-validation according to the minimum criteria. The vertical dotted line is plotted at the optimal value λ = 0.0072. (b) The vertical line is plotted where 14 features were selected. LASSO, least absolute shrinkage and selection operator

Five machine learning algorithms using the 14 selected features as inputs were used to establish the prediction models in the training set. Model performance was evaluated in the testing set and expressed as AUC, accuracy, sensitivity, and specificity. The performance of the prediction models in the training and testing sets is shown in Table 3, and the ROC curves and AUCs of the models in the testing set are shown in Fig. 3a. The SVM model showed the best predictive performance, with an AUC of 0.934 (95%CI: 0.903–0.964) and an accuracy of 0.916 in the training set, and an AUC of 0.855 (95%CI: 0.777–0.933) and an accuracy of 0.809 in the testing set (Fig. 3b). The RF model had the lowest AUC, 0.686 (95%CI: 0.563–0.810), with an accuracy of 0.611 in the testing set.

Fig. 3 (a) Performance of the machine learning models in the testing set based on the AUC of the ROC curve. (b) ROC curves and AUCs of the SVM model in the training set and the testing set. AUC, area under the curve; ROC, receiver operating characteristic; SVM, support vector machine

Table 3 Comparison of the performance of machine learning models in the training and testing sets

Importance of features of the best model

Coefficients were used to interpret the best prediction model by evaluating the contribution of each variable. We focused on the SVM model because it was the best prediction model, and visualized the variable coefficients in Fig. 4a. A SHAP summary plot was also used to show the contribution of each predictor of LNM in the SVM model (Fig. 4b). The results revealed that positive lymph node on imaging contributed the most to the prediction of the outcome, followed by tumor size, extravesical invasion, infiltration, grade, hydronephrosis, papillary, age, fibrinogen, NPR, creatinine, albumin, hemoglobin, and erythrocyte count. The DCA of the SVM model in the testing set showed that the model offered a clinical benefit at threshold probabilities between 0.10 and 0.50 (Supplementary Fig. 1).
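The quantity plotted in a decision curve is the net benefit, which at threshold probability p_t is TP/n − (FP/n) × p_t/(1 − p_t), where TP and FP are counted among patients whose predicted risk is at least p_t. The following sketch shows the calculation for a model and for the "treat all" reference strategy; the function names and the toy predictions are hypothetical and are not taken from the study.

```python
def net_benefit(labels, probabilities, threshold):
    """Net benefit of acting on every patient whose predicted risk >= threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probabilities) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probabilities) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Reference strategy: every patient is treated as if LNM were present."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy predictions for ten hypothetical patients (1 = LNM).
labels = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
probabilities = [0.80, 0.60, 0.55, 0.20, 0.10, 0.05, 0.30, 0.15, 0.25, 0.40]

for threshold in (0.10, 0.30, 0.50):
    model_nb = net_benefit(labels, probabilities, threshold)
    all_nb = net_benefit_treat_all(labels, threshold)
    print(f"p_t={threshold:.2f}  model={model_nb:.3f}  treat-all={all_nb:.3f}")
```

A model offers clinical benefit at a given threshold when its net benefit exceeds both the treat-all value and zero, the net benefit of treating no one.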

Fig. 4 (a) Top 14 selected features and the corresponding variable coefficients of the SVM model. The Y-axis shows the top 14 variables, and the X-axis shows their impact on the model. (b) SHAP summary plot of the top 14 selected features of the SVM model. SVM, support vector machine; LN, lymph node; NPR, neutrophil-to-platelet ratio; SHAP, Shapley additive explanations

Discussion

RC plus bilateral PLND and neoadjuvant cisplatin-based combination chemotherapy are recognized as standard treatments for MIBC and some very high-risk NMIBC patients [2, 3]. However, the high likelihood of occult micrometastases results in a high recurrence rate of BUC after surgery [13]. Because BUC patients with LNM were reported to have tumor tissue in LNs outside the region of standard PLND, extended PLND was proposed [14]. One study even developed a nomogram for LNM prediction in BC patients treated with extended PLND [15]. However, one prospective randomized trial demonstrated that extended LND offered no significant survival advantage over standard PLND [16]. Thus, preoperatively predicting LNM in BUC patients treated with RC is of high clinical value. Here, we developed and validated models for preoperatively predicting LNM in BUC using ML, and demonstrated that the SVM model had the best predictive performance.

Several articles have explored the independent predictors of LNM in BC after RC using various features. Two articles developed nomogram models containing gene signatures. Cao et al. established an epithelial-mesenchymal transition-LN signature containing 19 candidate genes and identified it as a predictor of LNM in BC [17]. Wu et al. selected 5 LN-status-related mRNAs and developed a five-mRNA-based classifier, which was incorporated in a nomogram as an independent risk factor for LNM in BUC [18]. In addition, two studies used a CT-based radiomics signature and an MRI-based radiomics signature, respectively, as independent variables to predict LNM in BC patients. Both the nomogram containing the CT-based radiomics signature and the one containing the MRI-based radiomics signature showed good calibration and discrimination in the training and validation sets [19, 20]. However, the genomic and clinical data used in the gene-based nomograms came from online databases such as The Cancer Genome Atlas, which could introduce selection bias and restrict the range of analyses. Although the radiomics features were collected at the authors’ own institutions, the small sample sizes and the lack of external validation limited the validation and application of these nomogram models. Moreover, genomic information and radiomics signatures are difficult to collect and apply in clinical practice, which restricts the clinical value of these models.

One study based on the Surveillance, Epidemiology, and End Results database identified age, tumor grade, tumor size, and tumor T stage as independent risk factors for LNM in BUC patients [21], which is similar to the predictors selected in our final ML model. Although the sample size was large, the AUC was only 0.69 in the training dataset and 0.704 in the testing dataset, indicating the limited accuracy of that model. Another limitation was that the tumor grade in the database came from RC pathology, which cannot be obtained preoperatively. Ou et al. constructed a nomogram containing MLR and fibrinogen for predicting LNM in T1 high-grade BUC [22]. Another study demonstrated that systemic inflammatory biomarkers such as NLR were independent risk factors for LNM in BUC [13]. Therefore, we collected laboratory measurements and calculated systemic inflammatory biomarkers. Although several articles reported that T stage was a risk factor for LNM in BC [13, 21], the staging accuracy of imaging tools such as CT is low. Therefore, we analyzed the “presence or absence of extravesical invasion” parameter instead of “T stage”. Although TURBT is typically recommended before RC and the resection specimen should contain bladder muscle tissue, the staging accuracy of TURBT is low, as evidenced by the finding that around 25–51% of patients diagnosed with NMIBC on TURBT were upstaged to MIBC at RC [23,24,25]. The presence of lymphovascular invasion (LVI) was not accurately reported in our institution because immunohistochemistry is a nonessential tool in the diagnosis of BC [26]. Thus, we collected pathological information from TURBT except for LVI. Finally, we developed the ML model based on preoperative demographic, pathological, imaging, and laboratory data, which are comprehensive and easy to collect in clinical practice.

We analyzed the relationship between positive LN on imaging and LNM in both the training set and the testing set using the Chi-squared test (Supplementary Table 1). Positive LN on imaging differed significantly between patients with and without LNM in both the training set (p < 0.001) and the testing set (p = 0.002). The accuracy of LN on imaging for diagnosing LNM was 0.821 in the training set and 0.832 in the testing set. However, the sensitivity was only 34.5% and 33.3%, respectively, which was much lower than that of our model. Although CT and MRI are the most common imaging modalities in BUC, both were reported to have limited sensitivity in assessing LNM [3]. The sensitivity of CT and MRI for diagnosing LNM ranged only from 14 to 30% [27, 28]. Even the most advanced imaging techniques, such as PET-CT, showed low sensitivity in predicting LNM [27, 29, 30]. Our machine learning model showed high accuracy, sensitivity, and specificity, which could aid clinical diagnosis. Positive LN on imaging still had the highest weight in our model (Fig. 4a and b), which might be due to the high accuracy and specificity of this variable.
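The coexistence of high accuracy and low sensitivity for positive LN on imaging follows from how each metric is computed when LNM is uncommon. The sketch below derives all three metrics from a 2x2 table. The cell counts are a reconstruction chosen to be consistent with the testing-set figures reported in the text (131 patients, 21 with LNM, accuracy 0.832, sensitivity 33.3%); they are not values taken from the study tables, and the function name is hypothetical.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity and specificity from the cells of a 2x2 table."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # share of LNM patients flagged
        "specificity": tn / (tn + fp),  # share of LNM-free patients cleared
    }


# Reconstructed (not reported) counts for positive LN on imaging, testing set:
# 21 patients with LNM (tp + fn) and 110 without (tn + fp).
imaging = classification_metrics(tp=7, fp=8, tn=102, fn=14)
for name, value in imaging.items():
    print(f"{name}: {value:.3f}")
```

With these counts the accuracy is 0.832 although only one third of the patients with LNM are detected, because the 110 patients without LNM dominate the total. This is why sensitivity and specificity are reported alongside accuracy.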

Precision medicine is commonly defined as the stratification of patients using clinical, lifestyle, genetic, and further biomarker information from large-scale data [31]. With the large amount of electronic clinical and genomic patient data now available, ML is an accurate and new approach to estimating individualized outcomes and supporting better decision-making [32]. We previously established a nomogram model for predicting LNM in BUC using traditional multivariate logistic regression. We identified tumor grade, infiltration, extravesical invasion, positive LN on imaging, tumor size, and serum creatinine level as independent preoperative risk factors, and the AUCs of 0.817 in the training set and 0.805 in the testing set demonstrated its accuracy and stability [12]. One study identified LNM-related genes in prostate cancer using an ML algorithm that performed well in validation, indicating the excellent data handling capacity of ML methods [33]. Sabbagh et al. constructed an ML model to predict LNM in prostate cancer using standard clinicopathologic variables, and showed that the ML model outperformed traditional tools in terms of AUC and decision curve analysis [10]. The SVM model in our study, containing 14 variables, had AUCs of 0.934 and 0.855 in the training set and testing set, respectively, which were higher than those of the traditional logistic regression model. Therefore, we concluded that the SVM model predicts LNM in BUC patients better than the traditional model.

To our knowledge, this is the first study to develop an ML model to preoperatively predict LNM in BUC patients using demographic, pathological, imaging, and laboratory data. The high AUCs in both the training set and the testing set demonstrated the high accuracy and discrimination ability of our model, and the large sample size supported the stability of our results. In addition, the preoperative variables we selected are easy to obtain, which facilitates the application of our model in clinical practice.

Nevertheless, this study has some limitations. First, inaccuracies in data selection and other potential confounders could not be ruled out because of the retrospective study design. Second, we conducted only internal validation of the ML models when identifying the SVM model as the best model; external validation with a large sample size should be conducted in the future. Third, several clinical trials and retrospective studies have shown a survival benefit of neoadjuvant cisplatin-based chemotherapy in BUC patients [34,35,36]. Although neoadjuvant cisplatin-based chemotherapy has become the standard of care for cN0M0 patients with MIBC, we did not analyze the influence of neoadjuvant chemotherapy on LNM because of a lack of data. The relationship between neoadjuvant chemotherapy and LNM in patients undergoing radical cystectomy should be explored with a large sample size in the future. Finally, we collected only traditional pathological features from TURBT and traditional imaging data. One recently published study extracted new pathological characteristics and radiomics features using a deep learning algorithm and constructed a deep learning model to predict LNM in prostate cancer [37]. Thus, microscopic pathological and radiomics variables could be included to construct prediction models using deep learning in the future.

Conclusions

We developed and validated ML models to preoperatively predict LNM in BUC patients who underwent RC, and found that the SVM model with 14 variables had the best performance. The SVM model displayed high accuracy and clinical applicability on internal validation.