Introduction

Lung cancer is the leading cause of cancer death worldwide. In patients with non-small cell lung cancer (NSCLC), nodal staging (N staging) is closely associated with the prediction of prognosis [1,2,3,4] and with clinical therapeutic planning [5, 6]. Previous studies [7,8,9,10] have found that N staging after induction therapy for stage IIIA lung cancer may determine patient survival. For patients with advanced, N2-positive disease, preoperative downstaging from N2 to N0 can raise the 5-year survival rate to as much as 35.8%, while those with persistent tumor in their lymph nodes (LNs) (N1 and N2) tend to have a 5-year survival of 9% [8]. Therefore, accurate preoperative clinical N staging and restaging are key to individualized clinical decision making, which may enable patients to receive more effective treatment and improve prognosis.

18F-Fluorodeoxyglucose positron-emission tomography/computed tomography (18F-FDG PET/CT) is considered the most reliable functional imaging method for assessing the status of mediastinal and hilar LNs [11, 12]. This non-invasive method provides metabolic and anatomical information simultaneously. A metastatic LN usually shows active 18F-FDG metabolism by tumor cells, reflected in an increased standardized uptake value (SUV), together with lymphadenopathy [13]. It often appears as a bright spot on PET images and as an enlarged, round lesion on CT images, with the enlargement mainly in the short-axis diameter (SAD) [14]. 18F-FDG PET has become a standard imaging procedure for N and M staging in NSCLC. However, because of confounding conditions such as inflammatory, granulomatous, and infectious diseases, the highly sensitive PET may detect false-positive LNs [15,16,17,18]. In addition, the low spatial and tissue resolution of PET images, the small size of LNs, and the inevitable cardiac and breathing motion artifacts during the long PET acquisition can result in missed diagnosis of small metastatic LNs. CT is the most widely used clinical examination, and SAD ≥ 1 cm is often used as the diagnostic criterion. However, normal, inflammatory proliferative, and metastatic LNs partly overlap in size, and normal LNs vary in size and shape [19, 20], which results in missed diagnosis of micro-metastatic LNs and misdiagnosis of inflammatory hyperplastic LNs, increasing the false-negative and false-positive rates, respectively. Therefore, there is an urgent need to develop a more effective method to predict LN status preoperatively.

Radiomics is a medical image analysis method (applicable to PET, CT, and MRI) based on high-throughput extraction of quantitative image features, which is used to predict underlying tumor biology and behavior [21, 22]. Over the last decade, a variety of prediction models [23,24,25,26] have been established using the primary tumor as the region of interest (ROI), based on the changes in the tumor-induced microenvironment [27,28,29]. These models can classify LN metastasis as N− or N+, thus providing a qualitative diagnosis of LN metastasis in cancers including lung cancer, colorectal cancer, and breast cancer, among others. However, with the promotion of precision medicine, qualitative diagnosis of LN metastasis still does not meet the requirements of individualized treatment; thus, quantitative N staging has become urgently important in clinical practice.

To the best of our knowledge, few studies so far have investigated high-dimensional radiomics features and maximum standardized uptake value (SUVmax) based on LNs for the prediction of N staging in NSCLC. Therefore, the objective of our study was to establish and validate a PET/CT nomogram that incorporates the metabolic information (SUVmax) and structural information (radiomics features) of LNs for preoperative quantitative estimation of LN metastasis, which could further help clinicians make individualized therapeutic decisions and predict prognosis.

Materials and methods

Study design

The institutional review board has approved this single-center study. A comprehensive workflow diagram of this study is presented in Fig. 1.

Fig. 1

Workflow diagram of the study. In the first row, the blue solid-border figures represent the SUVmax of LNs measured on PET images; the blue dashed-border plot represents the SUVmax single-factor prediction model established with a cutoff value of 2.5. In the second row, the green solid-border figures represent the short-axis diameter (SAD) of LNs measured on CT images; the green dashed-border plot represents the SAD single-factor prediction model established with a cutoff value of 10.00 mm. In the third row, the red solid-border figures represent the VOIs segmented based on LNs; the red dashed-border plot at the bottom of the second column represents the radiomics features of LNs, including first-order, shape, and high-order features, and the red dashed-border plot in the third column represents the radiomics model (Rad-Score) built from LN status-related radiomics features using multivariable logistic regression analysis. The black dashed-border plot at the bottom of the third column represents the PET/CT nomogram (Int-Score), which incorporates SUVmax and Rad-Score by multivariable logistic regression analysis. In the last column, the three plots represent, from top to bottom, the receiver operating characteristic (ROC) curves, calibration curves, and decision curve analysis (DCA) curves in the testing cohort

Patients

A total of 124 NSCLC patients (male:female = 60:64; average age, 58.5 ± 0.73 years; range, 33–74 years) were retrospectively enrolled in this study between January 2017 and May 2019. The inclusion criteria were as follows: (1) NSCLC pathologically confirmed by surgery or biopsy; (2) lymphadenectomy or lymph node biopsy was performed, and the pathological reports were obtained; (3) preoperative 18F-FDG PET/CT scans were performed (the interval between the PET/CT scan and surgery or biopsy was less than 2 weeks). The exclusion criteria were as follows: (1) preoperative history of malignancies other than lung cancer; (2) poor image quality.

The patients who satisfied the inclusion criteria formed the whole patient cohort, as shown in Fig. 2a. A total of 110 patients underwent surgical resection; in the other 14 patients, LN pathology results were obtained by supra-clavicular LN biopsy. There were 93 (75.0%) patients with adenocarcinoma, followed by squamous cell carcinoma (n = 29, 23.4%) and a few other histological subtypes of lung cancer (n = 2, 1.6%). There were 63 patients with N0 stage, 27 patients with N1 stage, 20 patients with N2 stage, and 14 patients with N3 stage. These patients were divided into the LN(+) group and the LN(−) group based on pathological reports.

Fig. 2

Flow diagram of the selection of patients (a) and LNs (b)

The flowchart of LN enrollment is shown in Fig. 2b. The patients who underwent surgery had relatively early TNM pathological stages (Supplemental Table S1). The inclusion criteria were as follows: (1) the surgical or supra-clavicular biopsy LN stations were located and selected according to the pathology report, following the N staging standard of the eighth edition [30]; (2) the 1–3 LNs with the highest SUVmax were selected in each LN station; (3) the number and size of LNs selected in each LN station were limited by the number and size reported in the actual pathological results. The exclusion criterion was as follows: the negative LN stations of LN(+) patients were excluded to ensure, as far as possible, that the selected LNs were truly pathologically positive.

18F-FDG PET/CT acquisition and reconstruction

A Biograph 16 HR PET/CT scanner (Siemens Healthineers) was used in this single-center study. Briefly, patients fasted for more than 6 h, and their blood glucose levels were required to be less than 8 mmol/L before imaging. 18F-FDG (Sumitomo HM-12, pH 4–8, radioactive purity > 95%, radioactive concentration > 370 MBq/ml) was then intravenously injected at a dose of 3.7 MBq/kg, and image acquisition was started 1 h later. The CT scan was performed (tube voltage 120 kV, tube current 50 mAs, rotational speed 0.5 s/r, FOV 812 mm × 812 mm, 512 × 512 matrix, slice thickness 4 mm) from the vertex to the proximal legs. Low-dose CT was used for attenuation correction, and a standard B19f soft-tissue reconstruction kernel was used for CT images. Subsequently, PET scans (8 bed positions at 2.5 min each, FOV 812 mm × 812 mm, 144 × 144 matrix, slice thickness 4 mm) were acquired from the vertex to the proximal legs with correction for dead time, scatter, and decay. PET images were iteratively reconstructed in 3D mode using ordered subset expectation maximization (2 iterations, 24 subsets, and Gaussian filtering). CT and PET images were reconstructed at a slice thickness of 2.0 mm and an increment of 1.0 mm.

Measurement of CT radiomics features, SAD, and SUVmax

The VOIs were semi-automatically segmented using imaging post-processing software (Intellispace Discovery, Philips Healthcare) (Supplemental Appendix A-1). We segmented the LNs and measured SAD on the largest cross-sectional area of LNs. 18F-FDG uptake was evaluated using SUVmax.

The radiomics features were extracted using a plug-in (pyradiomics 2.1) on the IntelliSpace Discovery platform. A total of 1472 features, including shape-based features, first-order histogram features, high-order textural features, and transformed features, were obtained based on the 3D VOI (Supplemental Appendix B). The image biomarker standardization initiative (IBSI) was regarded as reference, and was taken into consideration in the radiomics features extraction and selection procedure [31].

In our department, the clinical standard for evaluating LNs on PET imaging was to classify them as PET positive when the SUVmax is ≥ 2.5, and PET negative when the SUVmax is < 2.5 [32].

Radiomics model

Given the exceptionally high dimensionality of the data, the Minimum-Redundancy Maximum-Relevance (mRMR) algorithm [33] and the Least Absolute Shrinkage and Selection Operator (LASSO) method [34] were used to select the most useful features in the training cohort. The principle of the mRMR algorithm is to identify features that are highly correlated with LN status but minimally correlated with one another, in order to reduce redundancy and overfitting of the model. The LASSO method is suitable for the regression of high-dimensional data and selects useful features by retaining only those with non-zero coefficients. A radiomics score (Rad-Score) was calculated for each LN via a linear combination of the selected features weighted by their respective coefficients. The performance of the radiomics model (Rad-Score) was evaluated in terms of discrimination, calibration, and clinical application in both training and testing cohorts.
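The selection principle and the Rad-Score formula described above can be sketched in a few lines. This is a simplified illustration, not the study's pipeline: it uses absolute Pearson correlation as a stand-in for the relevance and redundancy measures of the "mRMRe" package, it omits the LASSO step, and the function names, feature names, and coefficients in the usage note are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def mrmr_select(features, labels, k):
    """Greedy mRMR sketch: features maps a feature name to its list of values.

    Assumption: score = |corr(feature, label)| - mean |corr(feature, selected)|.
    """
    relevance = {name: abs(pearson(vals, labels)) for name, vals in features.items()}
    selected, remaining = [], set(features)
    while remaining and len(selected) < k:
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(
                abs(pearson(features[name], features[s])) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        # sorted() makes tie-breaking deterministic
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


def rad_score(values, coefficients, intercept=0.0):
    """Linear combination of the selected features weighted by their coefficients."""
    return intercept + sum(coefficients[name] * values[name] for name in coefficients)
```

For instance, `mrmr_select(features, labels, k=2)` keeps the most relevant feature first and then prefers a moderately relevant but less redundant feature over a near-duplicate of the first one.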

PET/CT nomogram

SUVmax and Rad-Score were used to develop an integrated estimative model (PET/CT nomogram) for LN status in the training cohort using multivariable logistic regression analysis. Similarly, an integrated score (Int-Score) was calculated for each LN as a linear combination of the selected predictors weighted by their coefficients. The performance of the PET/CT nomogram (Int-Score) was evaluated in terms of discrimination, calibration, and clinical application in both training and testing cohorts.

Statistical analysis

All statistical analyses were performed using RStudio software (version 1.2.1335). The mRMR algorithm was performed using the “mRMRe” package. LASSO regression was performed using the “glmnet” package. p values < 0.05 (two-sided) were considered statistically significant. The differences in LN status-related features between the malignant group and the benign group in both training and testing cohorts were assessed by the independent t test or the Mann-Whitney U test, according to the distribution type of the data. The chi-squared test was used to compare differences between categorical variables. The performance of the models was evaluated in terms of discrimination, calibration, and clinical application.

Discrimination

Receiver operating characteristic (ROC) curves were plotted to assess the diagnostic performance of SUVmax, SAD, the Rad-Score, and the Int-Score in discriminating malignant from benign LNs in the training and testing cohorts. The optimal cutoff of each biomarker calculated in the training cohort was applied in the testing cohort. A bar chart was plotted to display the discrimination performance intuitively. The DeLong test was used to compare the area under the ROC curve (AUC) between training and testing cohorts.
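As an illustration of the discrimination step, the sketch below computes an AUC and an optimal cutoff for a single biomarker. The text does not state how the optimal cutoff was chosen; maximizing the Youden index (sensitivity + specificity − 1) is assumed here as one common choice, and the function names are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a malignant LN scores higher than a benign LN.

    labels: 1 = malignant, 0 = benign. Tied scores count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def youden_cutoff(scores, labels):
    """Return (cutoff, J) maximizing Youden's J, calling a LN positive if score >= cutoff.

    Assumption: the Youden index defines 'optimal'; the source does not say so.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_cut, best_j = None, -1.0
    for cut in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < cut and y == 0)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:  # strict '>' keeps the lowest cutoff among ties
            best_cut, best_j = cut, j
    return best_cut, best_j
```

In the workflow described above, `youden_cutoff` would be run on the training cohort only, and the resulting cutoff would then be applied unchanged to the testing cohort.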

Calibration

Calibration curves were plotted in both training and testing cohorts to explore the agreement between the observed outcome frequencies and the predicted probabilities of the model. The Hosmer-Lemeshow test was used to determine the goodness of fit of the models, and models with p values greater than 0.05 were considered well calibrated.
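A minimal sketch of the Hosmer-Lemeshow statistic follows. The number of groups is not given in the text; ten equal-sized groups and g − 2 degrees of freedom are assumed here as the conventional choice. The p-value helper uses the closed-form chi-squared survival function, which is valid only for even degrees of freedom (for example, 8 with ten groups). Function names are hypothetical.

```python
import math


def hosmer_lemeshow_statistic(probs, outcomes, groups=10):
    """Sum over risk groups of (observed - expected)^2 / (n * p_mean * (1 - p_mean))."""
    # Sort by predicted probability only, so tied predictions keep their input order
    pairs = sorted(zip(probs, outcomes), key=lambda t: t[0])
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        expected = sum(p for p, _ in chunk)
        observed = sum(y for _, y in chunk)
        mean_p = expected / size
        denom = size * mean_p * (1.0 - mean_p)
        if denom > 0:
            stat += (observed - expected) ** 2 / denom
    return stat


def chi2_sf_even_df(x, df):
    """Chi-squared survival function P(X > x) for even df (closed form)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive, even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total
```

With ten groups, `chi2_sf_even_df(hosmer_lemeshow_statistic(probs, outcomes), 8)` gives the p value; under the criterion above, a value greater than 0.05 would be read as adequate calibration.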

Clinical application

Decision curve analysis (DCA) was used to assess the clinical usefulness of the built models by quantifying the net benefits at different threshold probabilities in the testing cohort.
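The net benefit at a given threshold probability can be computed directly from the predictions. The sketch below uses the standard definition, net benefit = TP/N − (FP/N) × pt/(1 − pt), where pt is the threshold probability; the function names are hypothetical, and the treat-none strategy has a net benefit of zero by definition.

```python
def net_benefit(probs, outcomes, threshold):
    """Net benefit of treating every LN whose predicted probability is >= threshold."""
    n = len(outcomes)
    tp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 0)
    weight = threshold / (1.0 - threshold)  # relative harm of a false positive
    return tp / n - fp / n * weight


def net_benefit_treat_all(outcomes, threshold):
    """Net benefit of the treat-all strategy (every LN assumed metastatic)."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)
```

Evaluating `net_benefit` over a grid of thresholds (for example 0.05 to 0.95) for each model, alongside `net_benefit_treat_all` and the zero line, reproduces the layout of a decision curve.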

Results

Clinical characteristics

A total of 263 LNs from 124 patients were identified in the present study and were further assigned to either the training cohort or testing cohort. Of the 263 LNs, 70% (N = 185) were assigned to the training cohort by stratified sampling; 89 LNs were malignant, and 96 were benign. The remaining 30% (N = 78) were selected for the testing cohort; 38 were malignant and 40 were benign. There was no statistically significant difference in the clinical characteristics between the LN(+) group and LN(−) group. Besides, there was no significant difference in the clinical characteristics between both cohorts, as shown in Table 1.
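The stratified 70/30 assignment can be sketched as follows. The random seed and the per-class rounding rule are assumptions, so the resulting counts need not match the reported split exactly: with 127 malignant and 136 benign LNs, this rounding gives 184 training and 79 testing LNs rather than 185 and 78.

```python
import random


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Split indices into train/test so each class keeps roughly the same proportion.

    Assumptions: per-class rounding with round(); seed chosen arbitrarily.
    """
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        indices = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(indices)
        cut = round(train_fraction * len(indices))
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)
```

Splitting within each class, rather than over the pooled LNs, keeps the malignant-to-benign ratio nearly identical in the two cohorts.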

Table 1 Summary of characteristics in training and testing cohorts

SUVmax and SAD

The SUVmax (p < 0.001) and SAD (p < 0.001) were statistically different between the LN(+) group and LN(−) group. There was no significant difference between both cohorts (Table 1).

Int-Score

A multivariable logistic regression analysis was conducted to integrate the Rad-Score (Supplemental Appendix C), SUVmax, and SAD. There was a significant difference in Rad-Score, SUVmax, and SAD between the LN(+) and LN(−) groups (p < 0.001; Table 1). Because SAD was redundant, only SUVmax and Rad-Score were retained in the integrated model (named Int-Score). The Int-Score was calculated using the formula: Int-Score = 5.40 × Rad-Score + 3.64 × SUVmax − 3.31. To ensure that the model was easy to use, we presented it as a nomogram (Fig. S4, Appendix A-2). The Int-Score was significantly higher in the LN(+) group (Table 1).
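The Int-Score formula can be evaluated directly, as in the sketch below. Two points are assumptions rather than statements from the text: that the Int-Score is the linear predictor (logit) of the logistic model, so that a probability follows from the logistic function, and that the inputs are on the same scale as those used to fit the coefficients (any normalization of Rad-Score or SUVmax is not described here). Function names are hypothetical.

```python
import math


def int_score(rad_score, suvmax):
    """Int-Score = 5.40 x Rad-Score + 3.64 x SUVmax - 3.31 (coefficients from the text)."""
    return 5.40 * rad_score + 3.64 * suvmax - 3.31


def logistic(score):
    """Map a linear predictor to a probability (assumes Int-Score is the logit)."""
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    # Algebraically equivalent form that avoids overflow for very negative scores
    exp_s = math.exp(score)
    return exp_s / (1.0 + exp_s)
```

Under these assumptions, `logistic(int_score(rad, suv))` corresponds to the probability that would be read off the nomogram for a given LN.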

Performance of the model

Discrimination

ROC curves of SUVmax, SAD, Rad-Score, and Int-Score were plotted to assess the diagnostic performance, as shown in Fig. S5 (Appendix A-2). The diagnostic efficiency is shown in Table 2. SUVmax had an AUC of 0.828 with a 95% confidence interval (CI) of 0.739–0.917. SAD had an AUC of 0.729 with a 95% CI of 0.619–0.839. The AUC (95% CI), sensitivity, and specificity of the Int-Score were 0.872 (0.797–0.946), 0.895, and 0.625, respectively. A bar chart was used to display the discrimination performance of the Int-Score intuitively, as shown in Fig. S6 (Appendix A-2). There was no significant difference in the ROC curves of the Int-Score between the training and testing cohorts (DeLong test, p = 0.836).
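As a worked check of the reported Int-Score sensitivity and specificity, the sketch below recomputes them from confusion-matrix counts. The counts are not given in the text; they are back-calculated here on the assumption that the values refer to the testing cohort of 38 malignant and 40 benign LNs, for which 34 true positives and 25 true negatives are the counts consistent with 0.895 and 0.625.

```python
def sensitivity(tp, fn):
    """Fraction of malignant LNs correctly called positive."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of benign LNs correctly called negative."""
    return tn / (tn + fp)


# Back-calculated (assumed) counts for the testing cohort: 38 malignant, 40 benign
int_score_sensitivity = sensitivity(tp=34, fn=4)   # 34 / 38
int_score_specificity = specificity(tn=25, fp=15)  # 25 / 40
```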

Table 2 Performance of models

Calibration

The calibration curve is presented in Fig. S8 (Appendix A-2). The Hosmer-Lemeshow test showed no significant difference (p > 0.05) in the training cohort, demonstrating a good fit.

Clinical application

DCA is presented in Fig. 3. DCA showed that using SUVmax, SAD, Rad-Score, or Int-Score provided more net benefit than the treat-all or the treat-none strategy when the threshold probability of a patient or doctor was > 10%. Compared with the other methods, Int-Score had a higher net benefit: apart from a few small overlaps, its curve lay above the others across the threshold range from 0.1 to 1.0.

Fig. 3

DCA for SUVmax, SAD, Rad-Score, and Int-Score in training (left) and testing (right) cohorts. The y-axis measures the net benefit. The x-axis measures the threshold probability. The solid red curve represents Int-Score. The blue dotted curve represents Rad-Score. The yellow dotted curve represents SUVmax. The green dotted curve represents short-axis diameter (SAD). The gray curve represents the assumption that all LNs are metastatic. The black line represents the assumption that no LNs are metastatic. The net benefit [40] was calculated by subtracting the proportion of all LNs that are false positive from the proportion that are true positive, weighting by the relative harm of forgoing treatment compared with the negative consequences of unnecessary treatment

Discussion

In the present study, a CT-based radiomics model (Rad-Score) that aimed to predict LN status noninvasively was established, compared with, and combined with SUVmax (functional information) and SAD (measurable anatomical information), after which a PET/CT nomogram was built and validated. The PET/CT nomogram, which we call Int-Score, provides a more accurate preoperative estimation of LN status in patients with NSCLC and, in turn, may guide N staging and further clinical decisions.

PET/CT is a widely used non-invasive imaging tool for evaluating the TNM stage in NSCLC patients. With traditional diagnostic procedures, LN status is judged comprehensively by measuring SAD and attenuation, observing morphology and margins on CT images, and measuring SUV on PET images. Yet when changes in density and the morphological margins of a lesion are judged by eye, subjective outcomes inevitably appear, and much information that cannot be visually perceived may be missed. In this study, we used radiomics to quantitatively extract information about LNs that is difficult to perceive visually, and accurately described the attenuation variation and shape differences of LNs [21, 22, 35], thus avoiding the one-sidedness of subjective judgments and visual differences to some extent. By using a nomogram (Int-Score) that incorporates functional information (SUVmax) and structural information (radiomics features), we provided a scale for the preoperative estimation of LN status. Previously, Billé A et al [16] reported AUCs of 0.709, 0.590, and 0.673 for the ratio of LN to primary tumor SUVmax multiplied by the maximal diameter of the tumor, the ratio of LN to primary tumor SUVmax, and SUVmax, respectively. In this study, the Int-Score showed a moderate improvement in diagnostic efficiency, with an AUC of 0.881 (95% CI, 0.834–0.928) in the training cohort and an AUC of 0.872 (95% CI, 0.797–0.946) in the testing cohort. Moreover, compared with another study [35], which examined texture features and multi-resolution histograms from 18F-FDG PET/CT images, Int-Score had better diagnostic efficacy. In addition, more detailed radiomics features were extracted to avoid the deviation of results caused by incomplete information. Previously, Bayanati H et al [36] combined three texture features and three shape-based features to predict LN metastasis with an accuracy of 0.71 and an AUC of 0.87. By contrast, we examined the generalization and diagnostic efficacy of the model with an independent validation group, incorporated metabolic information, and assessed the global morphology of the lesion using a 3D VOI.

In this study, 20 NSCLC patients with N2 disease had been evaluated as clinical N1 or even N0 and underwent surgical resection. The sensitivity of the PET/CT nomogram for N2 disease was 85% (17/20). Endobronchial ultrasound-guided transbronchial needle aspiration (EBUS-TBNA) is an invasive diagnostic tool for N2 disease that can also help improve the preoperative staging of these patients [37], but the puncture path is often restricted by anatomical location. Therefore, the PET/CT nomogram, being non-invasive, seems to be more flexible in clinical use than minimally invasive EBUS-TBNA staging for patients with uncertain N2 status.

This study has a few limitations. Firstly, the sample size was relatively small for radiomics analysis; however, we used a sample size estimation method [38] to show that the testing cohort exceeded the minimum sample size required. A future multicenter study with an external validation dataset is necessary to improve and generalize our model. Secondly, there was some bias in the selection of LNs: we may have chosen false-positive LNs with higher SUVmax, leading to a high false-positive rate in the overall data. Nevertheless, no significant differences were found in the distribution of SUVmax and SAD values between the training and testing cohorts. Thirdly, our study did not include clinical information, such as CEA, CA125, and CA153 levels. Yang X et al [39] found that these clinical parameters were not independent predictors of LN metastasis; yet, a comprehensive assessment is still needed for clinical decisions.

Conclusion

This study analyzed radiomics features extracted from CT images of LNs, and developed and validated a PET/CT nomogram that incorporates Rad-Score and SUVmax for preoperative estimation of N staging in NSCLC. The PET/CT nomogram is a non-invasive predictive tool that improves the diagnostic accuracy, specificity, and sensitivity of N staging compared with SUVmax or SAD alone, and it can be used to better assist clinicians in making individual treatment decisions.