Background

Lobectomy with mediastinal systematic lymph node dissection (SND) is the standard surgical strategy for lung cancer [1, 2]. Nevertheless, the significance of SND is controversial. The American College of Surgeons Oncology Group Z0030 trial revealed no survival difference between patients with non-small cell lung cancer (NSCLC) who underwent SND and those who underwent systematic sampling, with 5-year disease-free survival rates of 68% and 69%, respectively (p > 0.05) [3]. Ishiguro et al. [4] and Ray et al. [5] reported similar findings: SND did not provide additional survival benefit. An accurate evaluation of node status at the mediastinal and hilar levels, especially negative status, is central to avoiding “overtreatment” (i.e., unnecessary SND) and to providing a more precise and individualized lymph node (LN) dissection strategy [6].

The negative predictive value (NPV) of invasive endoscopic techniques remains unsatisfactory because suspected LNs are difficult to select, owing to the anatomical complexity of the mediastinum, the location and size of the LNs, and poor repeatability [7, 8]. Non-invasive imaging techniques, including chest computed tomography (CT) and [18F]-fluorodeoxyglucose (FDG) positron emission tomography (PET)/CT, are commonly used for LN staging [9]. PET/CT has significantly higher accuracy than CT, with a notably superior NPV of greater than 85% [10]. However, the main problem of PET/CT evaluation is false-positive (FP) findings caused by non-specific FDG uptake in non-neoplastic processes such as granulomas or other inflammatory diseases, especially when intrapulmonary lesions and mediastinal–hilar LNs are both FDG-positive, with a false positive rate (FPR) of 19–22% [11, 12]. The resulting overestimation of LN involvement can have a major impact on the patient’s subsequent treatment strategy, including unnecessary resection of benign nodules and inappropriate exclusion from surgical treatment. Thus, studies addressing FP FDG findings in LN staging are needed.

Previous radiomics analyses based on PET/CT have demonstrated great potential for assessing lymph node metastasis (LNM) in lung cancer, using machine learning (ML) algorithms to exploit the underlying information in non-invasive medical images [13,14,15]. However, only a few radiomics studies have focused on evaluating the status of hypermetabolic mediastinal–hilar LNs [16]. The results of our previous study also indicated the feasibility of PET/CT radiomics for achieving a “pathology-like” diagnosis non-invasively in lung cancer. Furthermore, we found that clinico-biological-radiomics (CBR) data could evaluate tumor heterogeneity more comprehensively because they combine multi-scale characteristics of tumors [17]. Multi-scale, high-dimensional features require appropriate filter strategies to reduce redundancy while preserving model effectiveness [18]. Our previous study successfully screened features and established well-performing models using the least absolute shrinkage and selection operator (Lasso) algorithm. In this study, we added the minimum-redundancy maximum-relevance (mRMR) algorithm before Lasso to first narrow down redundant and irrelevant features, which contributed to the robustness of the research.

Thus, the purpose of this study was to seek a more reliable, scalable, and non-invasive biomarker based on CBR data, using ML algorithms, to reduce the FPR and improve the accuracy of predicting the status of hypermetabolic mediastinal–hilar LNs in lung cancer.

Methods

Patients

This retrospective study reviewed the charts of 1280 patients with a single pulmonary nodule who underwent [18F]FDG-PET/CT scanning less than 30 days before curative surgery between January 2018 and December 2022, and finally identified 260 patients with resectable T1–4 lung cancer and complete baseline clinico-biological information. The study was approved by the Ethics Committee of Shanghai Proton and Heavy Ion Center, and informed consent was waived.

The specific inclusion criteria were as follows: (1) both the single intrapulmonary lesion and the mediastinal–hilar LNs were FDG-positive, with a maximum standardized uptake value (SUVmax) ≥ 2.5 [19, 20], a lesion size > 1.0 cm, and a short axis diameter of the target LNs > 0.5 cm, to ensure the quality of the image and radiomics data; (2) a first pathological diagnosis of primary lung cancer [21]; (3) postoperative pathological (p) N staging determined by SND as pN0-2 (N0: no regional LN involvement; N1: ipsilateral peribronchial, interlobar, or hilar LN involvement; N2: ipsilateral mediastinal LN involvement) [22]. The exclusion criteria were as follows: (1) anti-tumor therapy before PET/CT examination or surgery; (2) lobectomy without SND; (3) distant metastasis; (4) poor image quality. The patient recruitment process is presented in Fig. 1.
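As an illustration only, the imaging-related inclusion criteria and the exclusion criteria above can be written as a simple eligibility check. The Python sketch below is not the code used in this study: the `Candidate` record and all of its field names are hypothetical, and pathological criteria (2) and (3) are omitted because they depend on the histology report rather than on simple thresholds.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical per-patient record; field names are illustrative."""
    tumor_suvmax: float
    ln_suvmax: float
    tumor_size_cm: float
    ln_short_axis_cm: float
    days_pet_to_surgery: int
    prior_antitumor_therapy: bool
    had_snd: bool
    distant_metastasis: bool
    adequate_image_quality: bool


def is_eligible(c: Candidate) -> bool:
    # Inclusion (1): tumor and LNs both FDG-positive (SUVmax >= 2.5),
    # lesion > 1.0 cm, target LN short axis > 0.5 cm.
    fdg_positive = c.tumor_suvmax >= 2.5 and c.ln_suvmax >= 2.5
    measurable = c.tumor_size_cm > 1.0 and c.ln_short_axis_cm > 0.5
    # PET/CT performed less than 30 days before curative surgery.
    timely = c.days_pet_to_surgery < 30
    # Exclusion criteria (1)-(4).
    excluded = (
        c.prior_antitumor_therapy
        or not c.had_snd
        or c.distant_metastasis
        or not c.adequate_image_quality
    )
    return fdg_positive and measurable and timely and not excluded
```

A record that fails any single condition, for example an LN SUVmax of 2.4 or a lobectomy without SND, is rejected by this check.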

Fig. 1 Flowchart showing the patient selection and exclusion

In total, 260 consecutive lung cancer patients were enrolled in this study, comprising 205 males and 55 females (mean age, 62.15 ± 8.62 years; range, 27–81 years), as summarized in Table 1. Among the included patients, the most common histologic subtype was adenocarcinoma (n = 145, 55.77%), followed by squamous cell carcinoma (n = 96, 36.92%). Rarer subtypes were small cell lung cancer (n = 11, 4.23%), large cell carcinoma (n = 6, 2.31%), and sarcomatoid carcinoma (n = 2, 0.77%). The patients were pathologically divided into LN negative (LN−, pN0, n = 109) and LN positive (LN+, pN1-2, n = 151) groups, and assigned to training (n = 182) and test (n = 78) sets by the random split-sample (7:3) method. Baseline clinico-biological data of each patient were reviewed and recorded.
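The 7:3 random split-sample assignment can be sketched as follows. This is a minimal illustration, not the procedure actually run in this study: the function name `split_sample`, the seed value, and the use of a simple unstratified shuffle are assumptions, since the text does not state whether the split was stratified by LN status.

```python
import random


def split_sample(patient_ids, train_fraction=0.7, seed=0):
    """Randomly split patient IDs into training and test lists.

    The seed is an arbitrary illustrative choice made for reproducibility.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


# With 260 patients, a 7:3 split gives 182 training and 78 test patients.
train_ids, test_ids = split_sample(range(260))
```

With 260 patients, this split reproduces the reported set sizes of 182 and 78.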

Table 1 Clinical and demographic characteristics of lung cancer patients

[18F]FDG-PET/CT image protocol

All included patients had blood glucose levels < 8.7 mmol/L and fasted for at least 6 h before the [18F]FDG-PET/CT scan. The scanning protocols of this single-center retrospective study were consistent with those of our previous study [17] and complied with the standard clinical scanning protocols [23]. The details of the image acquisition process are given in Additional file 1 and “Methods” section.

Tumor segmentation and analysis

The target lesions in this study were the single hypermetabolic primary tumor and the hypermetabolic mediastinal–hilar LNs. The volume of interest (VOI) of each primary tumor was segmented on the PET images by two experienced nuclear medicine physicians working independently and blinded to the pathology, using the gradient-based semi-automatic contouring algorithm “PET_Edge” in the Medical Image Merge software (MIM, version 6.5.4, https://www.mimsoftware.com); the final VOI was determined by consensus. PET_Edge has been shown to be more accurate and consistent for tumor segmentation than manual and constant-threshold methods [24, 25]. Then, six metabolic parameters, including minimum SUV (SUVmin), SUVmax, SUVmean, metabolic tumor volume (MTV), and total lesion glycolysis (TLG), were automatically measured from each VOI.
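For readers unfamiliar with these parameters, the sketch below shows how they relate to the voxels inside a VOI, using the standard definitions MTV = number of voxels × voxel volume and TLG = SUVmean × MTV. It is an illustration under stated assumptions and not the MIM implementation: the function name, the flat list of voxel SUVs, and the voxel volume are hypothetical inputs, and the PET_Edge segmentation itself is not reproduced.

```python
def metabolic_parameters(voi_suv, voxel_volume_ml):
    """Summarize the SUVs of all voxels inside a segmented VOI.

    voi_suv: SUV of each voxel in the VOI (hypothetical input).
    voxel_volume_ml: volume of a single voxel in millilitres.
    """
    n_voxels = len(voi_suv)
    suv_mean = sum(voi_suv) / n_voxels
    mtv = n_voxels * voxel_volume_ml  # metabolic tumor volume (mL)
    return {
        "SUVmin": min(voi_suv),
        "SUVmax": max(voi_suv),
        "SUVmean": suv_mean,
        "MTV": mtv,
        "TLG": suv_mean * mtv,  # total lesion glycolysis
    }


# Toy VOI of four voxels, each 0.5 mL.
params = metabolic_parameters([2.5, 3.0, 4.5, 6.0], voxel_volume_ml=0.5)
```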

The highest SUVmax among the LNs was also recorded for each patient. The size of the LN with the highest SUVmax was measured on a transverse CT image of the fused PET/CT, with nodal enlargement defined as a short axis diameter greater than 1.0 cm [10].

Quantitative radiomics feature extraction

Subsequently, a total of 1702 quantitative radiomics features for each VOI were automatically extracted and calculated from the PET (n = 851) and CT (n = 851) images using the “PyRadiomics” module [26]. The radiomics features were divided into four groups: (1) shape (n = 14); (2) intensity (n = 18); (3) texture (n = 75: 24 gray level co-occurrence matrix (GLCM), 14 gray level dependence matrix (GLDM), 16 gray level run length matrix (GLRLM), 16 gray level size zone matrix (GLSZM), and 5 neighboring gray tone difference matrix (NGTDM) features); and (4) wavelet-based (W) features obtained from the filters (H: high pass filter, L: low pass filter) applied in the x, y, and z directions (n = 744). The feature extraction and definitions were in accordance with the Imaging Biomarker Standardization Initiative [27], and the details are described in Additional file 1: Table S1.
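The feature counts above can be checked with a short calculation. The sketch assumes that the wavelet group consists of the intensity and texture features recomputed on each of the eight high/low-pass sub-bands (LLL to HHH), with shape features computed only once on the original image; this assumption is consistent with the reported total of 744 wavelet features but is not spelled out in the text.

```python
def feature_counts():
    """Reproduce the per-modality and total radiomics feature counts."""
    shape, intensity = 14, 18
    texture_classes = {"GLCM": 24, "GLDM": 14, "GLRLM": 16,
                       "GLSZM": 16, "NGTDM": 5}
    texture = sum(texture_classes.values())

    # Eight wavelet sub-bands: one filter (L or H) per axis x, y, z.
    sub_bands = [x + y + z for x in "LH" for y in "LH" for z in "LH"]

    original = shape + intensity + texture
    # Assumption: shape features are not recomputed on filtered images.
    wavelet = len(sub_bands) * (intensity + texture)
    per_modality = original + wavelet
    return {
        "texture": texture,
        "original": original,
        "wavelet": wavelet,
        "per_modality": per_modality,
        "pet_plus_ct": 2 * per_modality,
    }


counts = feature_counts()
```

Under this assumption the counts come to 75 texture, 107 original, and 744 wavelet features, that is, 851 per modality and 1702 for PET and CT combined.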

Features dimension reduction and selection

At this point, we had constructed a CBR dataset containing 1738 multi-scale features (25 clinico-biological features, 11 conventional image features, and 1702 radiomics features) for all included patients. Feature dimension reduction and selection were performed in the training set using classical supervised ML algorithms. Firstly, features with intra- or inter-class correlation coefficients (ICCs) < 0.8 were excluded because of poor consistency and reproducibility. Then, we applied the mRMR algorithm to preliminarily narrow down redundant and irrelevant features, and selected the top 50 features. Finally, the Lasso algorithm with tenfold cross-validation was applied to further screen the optimal features for prediction model development.
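To make the mRMR step concrete, the sketch below implements a simple greedy variant: at each step it picks the feature whose relevance to the label, minus its mean redundancy with the features already chosen, is largest. This is only an illustration of the idea, not the “mRMRe” implementation used in this study. In particular, absolute Pearson correlation is used here as a stand-in for both relevance and redundancy, which is an assumption, and the function names and toy data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 for a constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def mrmr_select(features, target, k):
    """Greedy mRMR: features maps name -> values; returns k names in order."""
    relevance = {name: abs(pearson(vals, target))
                 for name, vals in features.items()}
    selected = []
    remaining = sorted(features)  # sorted for deterministic tie-breaking

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(abs(pearson(features[name], features[s]))
                         for s in selected) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: "a_dup" is a shifted copy of "a" and so adds no new information.
y = [0, 0, 0, 0, 1, 1, 1, 1]
toy = {
    "a": [1, 2, 3, 5, 4, 6, 7, 8],
    "a_dup": [11, 12, 13, 15, 14, 16, 17, 18],
    "c": [0, 1, 0, 1, 1, 2, 1, 2],
}
top2 = mrmr_select(toy, y, k=2)
```

In the toy data, the duplicated feature is passed over in the second step in favour of the less redundant feature `c`, which is the behaviour that motivates placing mRMR before Lasso.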

Prediction models and individualized nomogram development and evaluation

The models for predicting the status of hypermetabolic mediastinal–hilar LNs in lung cancer were developed by multivariable regression with Akaike’s information criterion (AIC). The prediction score (Pre-score) of each model was calculated for each patient as a linear combination of the selected features with non-zero coefficients, weighted by those coefficients. The performance and clinical utility of these models were evaluated and compared by receiver operating characteristic (ROC) curve analysis, the DeLong test, and decision curve analysis (DCA) in both the training and test sets. The area under the curve (AUC) with 95% confidence interval (CI), sensitivity, specificity, accuracy, positive predictive value (PPV), negative predictive value (NPV), FPR, and false negative rate (FNR) were calculated for each model.
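The performance measures listed above follow directly from the confusion matrix, and the AUC can be computed as the probability that a randomly chosen LN+ patient receives a higher score than a randomly chosen LN− patient. The sketch below illustrates these standard definitions; it is not the “pROC” code used in this study, the function names and toy data are hypothetical, and confidence intervals and the DeLong test are not included.

```python
def diagnostic_metrics(y_true, y_pred):
    """Confusion-matrix metrics for binary labels (1 = LN+, 0 = LN-)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
        "FPR": fp / (fp + tn),  # equals 1 - specificity
        "FNR": fn / (fn + tp),  # equals 1 - sensitivity
    }


def auc(y_true, scores):
    """AUC as the fraction of (LN+, LN-) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example with three LN+ and four LN- patients.
labels = [1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0]
m = diagnostic_metrics(labels, predicted)
```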

For models with similar overall AUC and accuracy, a lower FPR is more clinically relevant for this study. Thus, we developed an individualized nomogram, based on the prediction model that best satisfied this criterion, to visually quantify the risk of metastasis in hypermetabolic mediastinal–hilar LNs. Calibration curves were plotted by bootstrapping (1000 bootstrap resamples) to assess the agreement between the actual probability and the probability predicted by the nomogram in both the training and test sets.

Statistical analysis

All data analyses in this study were performed with R software (version 4.2, http://www.r-project.org). The packages “mRMRe”, “glmnet”, “pROC”, and “rmda” were used for the mRMR, Lasso, ROC, and DCA analyses, respectively. The “rms” package was used to construct the nomogram and calibration curves. Numerical data with a normal distribution were expressed as mean ± standard deviation (SD) and compared using an independent t-test, while those with a non-normal distribution were expressed as median (interquartile range) and compared using a Mann–Whitney U test. Categorical data were described as counts and percentages, and compared using Fisher’s exact test or the χ2 test. A two-sided p value < 0.05 was considered statistically significant. The study process was systematically evaluated using the radiomics quality score (RQS, range −8 to +36 points, https://www.radiomics.world/rqs) [28].

Results

The quality of this study was good, with an RQS of 20 (55.56%) (Additional file 1), which was above the average of PET/CT radiomics-based lung cancer studies, all of which scored below 50% [29].

Clinico-biological and conventional image characteristics of patients

In total, 260 lung cancer patients with both a hypermetabolic primary tumor and hypermetabolic mediastinal–hilar LNs were eventually enrolled in this study, including 109 LN− and 151 LN+ patients. The statistically significant clinico-biological-image (CBI) features in the training set are presented in Table 2, while the comparisons of all 36 CBI features between LN− and LN+ patients in the total, training, and test sets are provided in Additional file 1: Table S2.

Table 2 Statistically significant clinico-biological-image features of lung cancer patients

LN− patients were more likely to be older with lower body weight, while LN+ patients were more likely to be younger with higher body weight (p < 0.05). LN+ patients also generally had a higher level of carbohydrate antigen (CA) 153 and a higher positive rate of carcinoembryonic antigen (CEA) (cut-off value: 5.2 ng/ml) than LN− patients (p < 0.05). The SUVmax and size of the LN were significantly related to LN status in both the training and test sets (p < 0.05). According to the univariate analysis, there were no significant differences between LN− and LN+ patients in other clinical characteristics (such as gender and smoking status), biological factors (such as the levels and status of other conventional lung cancer tumor markers, and tumor histological types), or PET/CT image features (such as the size, location, and all metabolic parameters of the primary tumor) (p > 0.05).

Features selection and prediction models development

Two independent prediction models were established based on the SUVmax and the size of the hypermetabolic mediastinal–hilar LNs, respectively. Their combination (LN_PET/CT) was taken to represent the diagnostic efficacy of PET/CT. The CBI Model was developed from 4 valuable clinical and biological features selected using the Lasso algorithm alone, because of the low dimensionality of the CBI sub-dataset with 34 features (n = 25 + 11 − 2) (Fig. 2a). Subsequently, 19 PET/CT radiomics features were selected from the radiomics sub-dataset (n = 1702) by sequentially applying the ICC rule, mRMR, and Lasso algorithms in the training set (Fig. 2b), and 10 of these radiomics features were then retained by multivariable regression with the AIC to establish the Radiomics (Rad) Model. Similarly, the CBR Model was built using the 7 most valuable clinical, biological, image, and radiomics features for predicting the status of hypermetabolic mediastinal–hilar LNs in the training set (Fig. 2c). The Pre-scores of each model for each patient were calculated using the following formulas:

Fig. 2 Feature selection for the prediction models using the Lasso algorithm with tenfold cross-validation in the training set. The X-axis shows log (λ), and the Y-axis shows the model misclassification rate. The dotted vertical lines are drawn at the optimal values according to the minimum criterion and the 1-se criterion, respectively. The 4, 19, and 7 features with non-zero coefficients were initially identified for the CBI Model (a), Rad Model (b), and CBR Model (c), respectively, according to the 1-se criterion

Pre-score (LN SUVmax) = −2.79 + 0.57*LN SUVmax.

Pre-score (LN Enlarged) = −0.62 + 1.84*LN Enlarged (Negative: 0, Positive: 1).

Pre-score (LN_PET/CT) = −2.68 + 0.50*LN SUVmax + 0.56*LN Enlarged.

Pre-score (CBI Model) = 1.70 − 0.07*Age + 0.03*Weight (Kg) + 0.04*CA153 (U/mL) + 0.65*CEA status (Negative: 0, Positive: 1).

Pre-score (Rad Model) = −381.80 + 4.71e−09*PET_WLLH_GLCM_Cluster Shade + 312.10*PET_WHLL_GLRLM_Short Run Emphasis + 5.31*PET_WHLH_GLDM_Large Dependence Low Gray Level Emphasis − 2.23e−10*PET_WHHL_GLCM_Cluster Prominence + 71.66*PET_WHHL_GLCM_Informational Measure of Correlation 2 (Imc2) − 4.26*CT_shape_Surface Volume Ratio (SVR) + 0.03*CT_first order_90 Percentile − 0.04*CT_GLDM_Large Dependence Low Gray Level Emphasis + 0.33*CT_WLHH_first order_Median − 3.00*CT_WHHL_GLCM_MCC.

Pre-score (CBR Model) = −88.38 + 0.57*LN SUVmax + 0.57*LN Enlarged − 0.12*Age + 0.91*CEA status + 90.95*PET_WHHL_GLCM_Imc2 − 2.06*CT_shape_SVR + 0.38*CT_WLLH_GLDM_Dependence Entropy.

LN+ patients generally had higher Pre-scores than LN− patients in all prediction models (p < 0.05, Fig. 3).
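As a worked illustration of the formulas above, the CBR Model Pre-score can be evaluated as a weighted sum of the seven features. The sketch below copies the coefficients from the CBR formula, while the dictionary keys and function names are hypothetical. Converting the Pre-score to a probability with the logistic function is an assumption made here for illustration; it agrees with the score and probability thresholds of 0.19 and 0.55 reported for the nomogram, but the link function is not stated explicitly in the text. The feature values must be on the same scale as those used to fit the model.

```python
from math import exp

# Coefficients copied from the CBR Model formula; key names are illustrative.
CBR_INTERCEPT = -88.38
CBR_COEFS = {
    "ln_suvmax": 0.57,
    "ln_enlarged": 0.57,   # 0 = negative, 1 = positive
    "age": -0.12,
    "cea_status": 0.91,    # 0 = negative, 1 = positive
    "pet_whhl_glcm_imc2": 90.95,
    "ct_shape_svr": -2.06,
    "ct_wllh_gldm_dependence_entropy": 0.38,
}


def cbr_pre_score(features):
    """Linear combination of the seven CBR features."""
    return CBR_INTERCEPT + sum(coef * features[name]
                               for name, coef in CBR_COEFS.items())


def logistic(score):
    """Assumed link between Pre-score and predicted probability."""
    return 1.0 / (1.0 + exp(-score))


# A Pre-score of 0.19 maps to a probability of about 0.55.
threshold_probability = round(logistic(0.19), 2)
```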

Fig. 3 Violin plot of the 6 prediction models for LN− (blue) and LN+ (red) patients in the training set (a). The black line running vertically through each violin represents the range from the smallest to the largest non-outlier value. The waterfall plot of the CBR Model visualizes the distribution of the Pre-scores of individual LN− and LN+ patients (b)

Prediction models evaluation and comparison

The performance of these 6 prediction models in discriminating LN− from LN+ patients is shown in Fig. 4a, b. All the prediction models were significantly associated with the status of hypermetabolic mediastinal–hilar LNs. The CBR Model consisted of 1 clinical factor, 1 biological marker, 2 conventional PET/CT image features, 1 PET radiomics parameter, and 2 CT radiomics parameters. The DeLong test showed that it presented the lowest FPR and the best discrimination among these models in both the training set (FPR of 12.82%, AUC of 0.90, and accuracy of 84.07%) and the test set (FPR of 6.45%, AUC of 0.89, and accuracy of 82.05%) (both p < 0.05) (Table 3).

Fig. 4 Receiver operating characteristic analysis of the models for predicting LN status in the training (a) and test (b) sets, respectively. Decision curve analysis of the prediction models in the training set (c). The X-axis represents the threshold probability, at which the expected benefit of treatment equals the expected benefit of avoiding treatment. The Y-axis represents the net benefit. The grey and black lines represent the hypotheses that all lung cancer patients were LN+ and LN−, respectively

Table 3 Performance of models for predicting hypermetabolic mediastinal–hilar LNs status in lung cancer

Compared with PET/CT, the FPR of the CBR Model decreased by 9.08%, while its AUC and accuracy increased by 8.43% and 11.69%, respectively, in the training set (all relative changes). In the test set, the FPR of the CBR Model was the same as that of PET/CT, but its AUC and accuracy were significantly higher, with relative increases of 17.11% and 16.37%, respectively.

The DCA also showed that the CBR Model was the most clinically useful tool for predicting LN status in lung cancer when the threshold probability was greater than 18% (Fig. 4c).
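For reference, the net benefit plotted in a decision curve is conventionally defined as TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold probability. The sketch below illustrates this standard definition together with the “treat all” and “treat none” reference strategies; it is not the “rmda” code used in this study, and the function names and numbers in the example are hypothetical.

```python
def net_benefit(tp, fp, n, threshold):
    """Net benefit of a model at a given threshold probability."""
    weight = threshold / (1.0 - threshold)
    return tp / n - (fp / n) * weight


def net_benefit_treat_all(prevalence, threshold):
    """Reference strategy: every patient is assumed to be LN+."""
    weight = threshold / (1.0 - threshold)
    return prevalence - (1.0 - prevalence) * weight


def net_benefit_treat_none():
    """Reference strategy: every patient is assumed to be LN-."""
    return 0.0


# Hypothetical example: 100 patients, 40 true positives and 10 false
# positives at a threshold probability of 0.20.
nb_model = net_benefit(tp=40, fp=10, n=100, threshold=0.20)
```

A model is considered clinically useful at a given threshold when its net benefit exceeds that of both reference strategies.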

Individualized nomogram development and evaluation

Based on the above results, an individualized nomogram was developed from the risk factors of the CBR Model to visualize the prediction. The nomogram’s score and probability thresholds for predicting LNM were 0.19 and 0.55, respectively (Fig. 5a). The calibration curves demonstrated good agreement between the LNM probability predicted by the nomogram and the actual observations in both the training and test sets (Fig. 5b, c). Physicians could thus perform a pretherapeutic individualized prediction of the LNM risk to develop more reasonable and effective treatment plans for patients (Fig. 6).

Fig. 5 The nomogram was developed using the risk factors of the CBR Model in the training set (a). The value of each predictor can be converted into a score according to the first scale at the top of the nomogram. After the scores are added up, the corresponding predicted probability at the bottom of the nomogram gives the risk of LNM. The nomogram’s score and probability thresholds for predicting LNM were 0.19 and 0.55, respectively. Calibration curves showed that the actual probability corresponded closely to the nomogram prediction in the training (b) and test (c) sets, respectively

Fig. 6 Example of clinical use of the nomogram. The preoperative whole-body PET/CT of this 61-year-old female with negative CEA status indicated that the primary tumor was located in the right upper lobe (purple circle), with hypermetabolic mediastinal–hilar LNs (red square and arrows) and without distant metastasis (a–c). After completing the radiomics process (b, c) and applying the nomogram (d), the LNM probability of this patient was 0.12 (< 0.55), indicating a low risk of LNM. The pathological result of lobectomy with SND confirmed the negative status of the mediastinal–hilar LNs (0/7). The nomogram could improve the accuracy of hypermetabolic mediastinal–hilar LN evaluation in lung cancer

Discussion

In this study, we developed a CBR nomogram incorporating multi-scale features, which outperformed conventional PET/CT in non-invasive N staging of lung cancer patients with hypermetabolic mediastinal–hilar LNs, thereby reducing the risk of overestimation and supporting precision treatment.

Growing evidence suggests that radiomics integrated with general CBI features achieves higher diagnostic efficacy than either used alone [30,31,32]. Thus, the clinico-biological factors of patients, PET/CT radiomics data of primary tumors, and image features of hypermetabolic mediastinal–hilar LNs were all used to develop the prediction model in this study. Furthermore, whereas our previous study screened features and established well-performing models using a single ML algorithm (Lasso) [17], the present study applied a combination of ML algorithms (mRMR + Lasso) to maintain the predictive performance of the model while minimizing the number of selected features and thereby improving model interpretability. The prediction performance of the CBR Model established in this study with only 7 features was comparable to that of the Combined Model with 14 features established in the previous study. This result confirmed the feasibility of the approach.

Accurately identifying FP LNs is more challenging than assessing all LNs in lung cancer. To the best of our knowledge, only Ouyang et al. have attempted a similar study using an ML strategy [16]. They found that PET radiomics features extracted from hypermetabolic mediastinal–hilar LNs, integrated with CT image features, could distinguish true from false positives of LNM in patients with non-small cell lung cancer, with a highest AUC of 0.87. However, they focused mainly on the role of the LNs and did not consider the effect of the primary tumor. In this study, the CBR Model was developed using characteristics of both the tumor and the LNs, and was validated to have greater potential for differentiating LN− (pN0) from LN+ (pN1-2) patients with lung cancer (AUCs of 0.90 and 0.89 in the training and test sets, respectively). Moreover, the PET radiomics feature “WHHL_GLCM_Imc2”, which characterizes tumor texture heterogeneity, and the CT radiomics feature “shape_SVR”, which measures tumor shape, were incorporated in the CBR Model and were also selected in the Radiomics Model, indicating the robustness of these two features, whose high repeatability and reproducibility have also been confirmed in previous studies [33,34,35]. LN+ patients generally had higher WHHL_GLCM_Imc2 and lower shape_SVR values than LN− patients (p < 0.05), suggesting that these two features were related to tumor invasiveness and thus to a higher risk of LNM.

The accuracy of histologic staging of hypermetabolic LNs is also related to clinico-biological-image factors. Age has been shown to be an independent risk factor, with younger patients more prone to LNM, as has positive pretherapeutic serum CEA status [36]. Compared with conventional imaging tools, PET/CT is a significantly more accurate non-invasive diagnostic procedure for LN staging in lung cancer, although it also shows FP FDG uptake in benign LNs [37]. Metastatic LNs generally have higher FDG uptake and larger size than FP LNs (p < 0.05). However, these conventional image parameters alone did not achieve satisfactory prediction performance, despite a relatively high AUC of 0.83. Applying the CBR Model increased the AUC of non-invasive LN prediction by a relative 8.43%. The FPR of the CBR Model for hypermetabolic mediastinal–hilar LN evaluation was also markedly lower, a relative decrease of 32.53–41.73% compared with the FPR of 19–22% reported in previous clinical trials [11, 12].
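The percentages in this paragraph are relative changes, which can be verified from the values reported in the text: an AUC of 0.83 for the conventional image parameters versus 0.90 for the CBR Model, and a previously reported FPR of 19–22% versus 12.82% for the CBR Model in the training set. The short check below is illustrative only, and the function name is hypothetical.

```python
def relative_change_percent(new, old):
    """Relative change from old to new, in percent (negative = decrease)."""
    return (new - old) / old * 100.0


# AUC: 0.83 (conventional image parameters) -> 0.90 (CBR Model).
auc_gain = round(relative_change_percent(0.90, 0.83), 2)

# FPR: 19-22% (previous trials) -> 12.82% (CBR Model, training set).
fpr_drop_low = round(-relative_change_percent(12.82, 19.0), 2)
fpr_drop_high = round(-relative_change_percent(12.82, 22.0), 2)
```

These evaluate to a relative AUC gain of 8.43% and relative FPR reductions of 32.53% and 41.73%, matching the figures quoted above.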

Furthermore, we generated an integrated nomogram based on the CBR Model to facilitate its use in clinical practice. With this easy-to-use scoring tool, physicians could perform a preoperative individualized prediction of the LNM risk, which could provide a non-invasive and accurate approach for patients who are unwilling or unable to undergo biopsy and help to develop more reasonable and effective treatment plans. The DCA also showed that the nomogram added more benefit than either treating all patients as LN− or treating all patients as LN+, which is valuable given the current trend toward personalized medicine [38, 39].

This study has some limitations. Firstly, this retrospective study was conducted in a single center, which was the main reason for the lower RQS and may also have introduced patient selection bias. A prospective, multi-center, large-cohort study is needed to further validate the performance and generalizability of the CBR Model in real-world clinical settings [40]. Secondly, there was no statistically significant difference in primary tumor size, histologic type, or metabolic parameters between LN− and LN+ patients, consistent with a previous report [41]. This may reflect the weakened role of the primary tumor when all included patients have both a hypermetabolic tumor and hypermetabolic LNs, which is equivalent to a subgroup analysis. The sample will be expanded in future work, including patients with FDG-negative LNs, to verify this hypothesis. Thirdly, the radiomics analysis in this study was applied only to the primary tumor with semi-automatic segmentation, not to the LNs. This is because larger tumors are more suitable for VOI segmentation, which contributes to the robustness of the study. Automatic segmentation approaches [42, 43] suitable for full-volume VOIs will continue to be explored in future work.

In conclusion, an integrated CBR nomogram was developed and validated in our study. Compared with conventional PET/CT, it could further reduce the FPR and improve the accuracy of hypermetabolic mediastinal–hilar LN evaluation in lung cancer, thereby reducing the risk of overestimation and supporting precision treatment.