Development and validation of a preoperative CT-based radiomic nomogram to predict pathology invasiveness in patients with a solitary pulmonary nodule: a machine learning approach, multicenter, diagnostic study

Objectives
To develop and validate a preoperative CT-based nomogram that combines radiomic and clinical–radiological signatures to distinguish preinvasive lesions from pulmonary invasive lesions.

Methods
This was a retrospective, diagnostic study conducted from August 1, 2018, to May 1, 2020, at three centers. Patients with a solitary pulmonary nodule were enrolled in the GDPH center and were randomly divided into two groups (7:3): development (n = 149) and internal validation (n = 54). The SYSMH center and the ZSLC center formed an external validation cohort of 170 patients. The least absolute shrinkage and selection operator (LASSO) algorithm and logistic regression analysis were used to select features and build the signatures and models.

Results
The study comprised 373 individuals from three independent centers (female: 225/373, 60.3%; median [IQR] age, 57.0 [48.0–65.0] years). The AUCs for the combined radiomic signature selected from the nodular area and the perinodular area were 0.93, 0.91, and 0.90 in the three cohorts. The nomogram combining the clinical and combined radiomic signatures could accurately predict interstitial invasion in patients with a solitary pulmonary nodule (AUC, 0.94, 0.90, and 0.92 in the three cohorts, respectively). The radiomic nomogram outperformed any clinical or radiomic signature in terms of clinical predictive ability, according to a decision curve analysis and the Akaike information criterion.

Conclusions
This study demonstrated that a nomogram constructed from the identified clinical–radiological signature and the combined radiomic signature has the potential to precisely predict pathology invasiveness.

Key Points
• The radiomic signature from the perinodular area has the potential to predict pathology invasiveness of the solitary pulmonary nodule.
• The new radiomic nomogram was useful in clinical decision-making associated with personalized surgical intervention and therapeutic regimen selection in patients with early-stage non-small-cell lung cancer.

Supplementary Information
The online version contains supplementary material available at 10.1007/s00330-021-08268-z.


Introduction
Low-dose computed tomography (LDCT) screening has been shown to reduce lung cancer mortality in a high-risk group [1]. In lung cancer screening trials, the nodule prevalence is 33% on average, while only 1.4% of the detected nodules are diagnosed as lung cancer. Selecting patients with malignant pulmonary nodules for timely intervention has become a major challenge.
Numerous follow-up protocols are used to manage pulmonary nodules detected by CT screening. For indeterminate nodules, the Fleischner Society guidelines [2] and the Lung CT Screening Reporting and Data System (Lung-RADS) prescribe a repeat CT scan after a particular time interval based on nodule size. In contrast, the British Thoracic Society (BTS) guidelines [3] reduce the need for follow-up imaging for patients with nodules of < 5 mm diameter or < 80 mm³ and shorten the follow-up period to 1 year for solid pulmonary nodules (SPN). However, awareness of these recommendations and the management choices made in clinical practice vary between radiologists and pulmonologists [4].
Pulmonary precancerous lesions (i.e., atypical adenomatous hyperplasia and adenocarcinoma in situ (AAH/AIS)) and early-stage invasive adenocarcinoma (IAC) have vastly divergent prognoses after standard thoracic surgery, which makes differentiating these pathology types important [5][6][7]. Minimally invasive adenocarcinoma (MIA) is a small, solitary adenocarcinoma with mostly lepidic growth and an invasion smaller than 5 mm at its largest dimension at any one point, whereas invasive adenocarcinoma involves growths larger than 5 mm [5]. It is challenging to identify the pathological nature of suspected malignant pulmonary nodules through visual assessment of CT scans because of the considerable overlap in morphologic features between them, such as pleural tags, spiculation, and lobulation [8].
Models including machine learning and artificial neural networks have been applied in lung cancer diagnosis [9][10][11][12], and excellent identification efficiency and accuracy have been achieved according to internal data. However, these tools suffer from limited external validity, overfitting, and unexplainable results [13]. Radiomics provides a noninvasive approach and is more promising in its sensitivity, selectivity, and experimental feasibility for disease diagnosis, tumor staging, and patient prognosis [14][15][16][17].
Tumor-infiltrating lymphocytes and tumor-associated macrophages were observed to be distributed at the edge of the invasive lesions (ILs) in the pathological map [18] and to be associated with the likelihood of metastasis [19]. The perinodular parenchyma may be considered to represent the tumor microenvironment and has biological importance in defining tumor behavior, including cell migration, stromal inflammation, immune infiltration, and vascularization [20,21]. We assumed that radiomic signatures from perinodular areas might provide a preoperative reference for the accurate prediction of pathological invasiveness in solitary pulmonary nodules and for guiding surgical methods and the extent of resection.
Herein, we developed and validated a nomogram based on clinical-radiological and radiomic signatures from nodular and perinodular areas for preoperative prediction of pathological invasiveness in patients with a solitary pulmonary nodule using data from a multicenter study.

Study design and patients
In this multicenter, retrospective, diagnostic study, patients with a solitary pulmonary nodule were recruited from three independent centers (Guangdong Provincial People's Hospital, Guangdong Province, China, named as the GDPH center; Sun Yat-sen Memorial Hospital of Sun Yat-sen University, Guangdong Province, China, named as the SYSMH center; Zhoushan Lung Cancer Institution, Zhejiang Province, China, named as the ZSLC center) during the period of August 1, 2018, to May 1, 2020. Information about the three institutions that participated in this study is shown in eTable 1.
The inclusion and exclusion criteria were applied at the three centers, and 373 of the 571 recruited patients were finally enrolled. The patients (N = 203) enrolled at the GDPH center from March 1, 2015, to December 31, 2019, were randomly divided by a computer algorithm in a ratio of 7:3 into two cohorts: the development cohort comprised 149 patients (73.4%), and the internal validation cohort comprised 54 patients (26.6%). The SYSMH center (N = 63) and the ZSLC center (N = 107) together formed an external validation cohort of 170 patients, enrolled from December 18, 2012, to July 30, 2019, and from January 1, 2019, to December 30, 2019, respectively. Figure 1 presents the exclusion criteria and the patient recruitment process.
The inclusion criteria were as follows: (I) patients ≥ 18 years of age who underwent CT screening and were diagnosed with SPN for the first time, (II) patients who underwent preoperative enhanced chest CT scans (within 3 months), (III) pathologically confirmed precancerous lesions (AAH/AIS) or early-stage lung adenocarcinoma (MIA/IAC), and (IV) lesions smaller than 30 mm without distant metastases or lymph node involvement.
The exclusion criteria were (I) preoperative therapy (neoadjuvant chemotherapy or radiotherapy), (II) a history of previous lung tumor diseases, (III) a past or present history of other malignant tumors, and (IV) incomplete clinical information or unavailable standard enhanced chest imaging data.
Because of the retrospective nature of this study, the institutional review board waived informed patient consent. The study protocol was approved by academic ethics committees and conducted according to the Declaration of Helsinki and Good Clinical Practice guidelines and was registered with ClinicalTrials.gov (registration number NCT04452058).

Image review and feature extraction
The Picture Archiving and Communication System (PACS) was used to retrieve preoperative CT images from three centers, and all researchers assessed the initial screening of image data. The CT protocol is described in detail in eTable 2.
The regions of interest (ROIs) in the pulmonary nodular area (ROI-1) were all manually refined by one researcher (W.H.L.) slice by slice in three orthogonal planes (axial, coronal, and sagittal) under the guidance of two senior radiologists with 13 years (S.Y.W.) and 17 years (G.Y.W.) of experience in chest CT interpretation. Irrelevant components, such as air, peripheral vessels, normal tissue, ribs, pleura, and surrounding organs, were removed by the researchers to avoid interference. The 3D Slicer program (https://www.slicer.org/, version 4.10.2) [22] was used to semi-automatically segment the perinodular area (ROI-2, the perinodular parenchyma representing a 5-mm extension outward). Disagreements were resolved by discussion among senior researchers, including two radiologists and three thoracic surgeons (Q.G.B., Z.H.Y., and Z.D.K.).
All assessors were blinded to the final pathological diagnoses, which were reviewed by a senior pathologist (S.Y.) using the 2017 8th TNM staging system and the 2011 International Association for the Study of Lung Cancer/American Thoracic Society/European Respiratory Society (IASLC/ATS/ERS) classification for pathological staging and pathological grading after thoracic surgery, respectively [5,23].
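The 5-mm outward extension that defines the perinodular region can be illustrated with a small sketch. This is not the 3D Slicer procedure used in the study; it is a brute-force toy version that assumes the nodule is given as a list of voxel indices and that distances are measured between voxel centers. The function name `perinodular_shell` and its input format are hypothetical, and the removal of vessels, air, and pleura described above is not modeled.

```python
from itertools import product


def perinodular_shell(nodule_voxels, grid_shape, spacing_mm, margin_mm=5.0):
    """Return voxels outside the nodule that lie within margin_mm of it.

    nodule_voxels: iterable of (z, y, x) index tuples (hypothetical format).
    grid_shape:    (nz, ny, nx) size of the image grid.
    spacing_mm:    (sz, sy, sx) physical voxel size in millimetres.
    """
    nodule = set(nodule_voxels)
    limit_sq = margin_mm ** 2
    shell = set()
    for voxel in product(*(range(n) for n in grid_shape)):
        if voxel in nodule:
            continue  # the shell excludes the nodule itself
        for core in nodule:
            # Squared physical distance between voxel centers.
            dist_sq = sum(((a - b) * s) ** 2
                          for a, b, s in zip(voxel, core, spacing_mm))
            if dist_sq <= limit_sq:
                shell.add(voxel)
                break
    return shell


# Toy example: a single-voxel "nodule" in one 13 x 13 slice with 1-mm voxels.
shell = perinodular_shell([(0, 6, 6)], grid_shape=(1, 13, 13),
                          spacing_mm=(1.0, 1.0, 1.0))
```

Because the margin is expressed in millimetres, the number of voxels in the shell depends on the voxel spacing, which is one reason slice thickness matters when comparing perinodular features across scanners.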
After the ROI-1 and ROI-2 were segmented and reconstructed, the volume of interest (VOI-1 and VOI-2) images (DICOM format) were transferred to the SlicerRadiomics code using an in-house texture extraction platform based on the Python package PyRadiomics.
In all, 1722 quantitative radiomic features were extracted from the two segmented regions (VOI-1 and VOI-2), including first-order statistics, shape, gray-level co-occurrence matrix (GLCM), gray-level size zone matrix (GLSZM), gray-level dependence matrix (GLDM), and neighborhood gray-tone difference matrix (NGTDM) features. These features were used for further analysis and regression modeling. The same image segmentation and feature extraction process was repeated on 30 SPNs from the cohorts after 3 months. More information about the standard radiomic workflow and model construction is shown in Fig. 2.

Development of the radiomic signatures
Features were selected from the high-dimensional imaging data of the two VOIs using the LASSO algorithm (eFigure 1 in the Supplementary materials). The most predictive features were linearly combined to create two radiomic signatures (RS1 for VOI-1 and RS2 for VOI-2).
The final radiomic signature was obtained by combining the two radiomic signatures using logistic regression. Tenfold cross-validation was implemented to avoid overfitting. Based on the combined radiomic signature (RS-C), the radiomic score was calculated and presented in the development and two validation cohorts.
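To make the selection step concrete, the following is a minimal sketch of how an L1 penalty drives uninformative coefficients to exactly zero. It is a simplification and not the study's implementation: it uses a squared-error loss solved by coordinate descent, whereas the authors report using the glmnet package in R, and the penalty value `lam` here is an arbitrary illustration rather than a cross-validated choice. The function names are hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent.

    X is a list of rows (features assumed already standardised), y a list
    of responses. No intercept is fitted in this sketch.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0    # correlation of feature j with the partial residual
            norm = 0.0   # squared norm of feature j
            for i in range(n):
                fitted_without_j = sum(X[i][k] * beta[k]
                                       for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - fitted_without_j)
                norm += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (norm / n) if norm else 0.0
    return beta


# Toy data: feature 0 tracks y closely, feature 1 is almost unrelated.
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [2.1, 1.9, -1.9, -2.1]
beta = lasso_coordinate_descent(X, y, lam=0.5)
```

In this toy case the weakly related feature receives a coefficient of exactly zero while the informative one is kept but shrunk, which is the behavior that reduces 1722 candidate features to a handful per signature.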

Development of the clinical-radiological signature and nomogram
Baseline clinical data were obtained from medical records. The researchers also recorded several radiological feature descriptors of each pulmonary nodule, such as the size, number, location, border, and internal characteristics (e.g., density and consolidation tumor ratio (CTR)), and any disagreement was resolved through consultation. The densities of pulmonary nodules were described using terminology derived from the BTS guidelines [3].
After the analysis, significant risk factors were used to build a clinical-radiological signature. This signature was combined with the final radiomic signature (RS-C) to form a nomogram using logistic regression.

Statistical analysis
Normalization was performed on radiomic features using a z-score transformation. To investigate differences in categorical variables, the chi-square test was used. The differences in continuous variables between preinvasive lesions (PILs) and early-stage pulmonary interstitium ILs were compared using a two-sample t test.
Univariate logistic regression analysis was used to select the independent clinical and radiological prognostic factors in the internal cohort. Significant risk factors were then introduced into stepwise logistic regression analyses to build a clinical-radiological signature. To visualize the results of the multivariable logistic regression analysis for risk stratification of pathological invasiveness, a nomogram based on both the clinical-radiological signature and the combined radiomic signature was created. Intrarater agreement in radiomic features between the two rounds of ROI segmentation was assessed using the two-way random ICC model.
To evaluate the performance of the models, a receiver operating characteristic (ROC) analysis was done, and the accuracy, sensitivity, specificity, negative predictive value (NPV), and positive predictive value (PPV) were calculated. The DeLong test was used to compare the nomogram with the other models in the development cohort in terms of the area under the ROC curve (AUC).
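As an illustration of the z-score transformation, the sketch below standardizes one feature to zero mean and unit standard deviation. Two choices here are assumptions, not details reported in the paper: the sample standard deviation (n - 1 denominator) is used, and the mean and standard deviation are estimated on one set of values and then reused, as one would typically do when applying development-cohort parameters to validation cohorts.

```python
from statistics import mean, stdev


def fit_z_score(reference_values):
    """Estimate the mean and sample SD of one feature on a reference cohort."""
    return mean(reference_values), stdev(reference_values)


def apply_z_score(values, mu, sd):
    """Standardise values with previously estimated parameters."""
    return [(v - mu) / sd for v in values]


# Toy feature values (hypothetical numbers, for illustration only).
mu, sd = fit_z_score([1.0, 2.0, 3.0])
scaled = apply_z_score([1.0, 2.0, 3.0], mu, sd)
```

Putting every feature on the same scale matters for LASSO in particular, because the L1 penalty treats all coefficients alike and would otherwise favor features with large numeric ranges.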
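The performance measures named above follow directly from the 2 x 2 confusion table, and the AUC can be read as the probability that a randomly chosen invasive lesion receives a higher predicted score than a randomly chosen preinvasive one. The sketch below computes both; the 0.5 cut-off and the toy labels are illustrative assumptions, and the function names are hypothetical. It assumes both classes are present, so no denominator is zero.

```python
def diagnostic_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, sensitivity, specificity, PPV and NPV at a given cut-off."""
    tp = fp = tn = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def auc(y_true, y_prob):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [p for label, p in zip(y_true, y_prob) if label == 1]
    neg = [p for label, p in zip(y_true, y_prob) if label == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 1 = invasive lesion, 0 = preinvasive lesion.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
metrics = diagnostic_metrics(labels, scores)
area = auc(labels, scores)
```

Unlike the AUC, the other five measures depend on the chosen cut-off, so they should always be reported together with the threshold used.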
The Akaike information criterion (AIC) [24] was used to compare and rank multiple competing models and emphasize the comparison of the goodness of fit of the competing models while considering the principle of parsimony. We chose the model with the lowest AIC value (representing the "best-approximating model") in this study.
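For reference, the AIC is defined as 2k - 2 ln(L), where k is the number of estimated parameters and L is the maximized likelihood, so a better fit lowers the value and each extra parameter raises it. The sketch below ranks candidate models this way. The model names, log-likelihoods, and parameter counts are hypothetical placeholders, not values from this study.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2*ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def rank_by_aic(models):
    """Sort models by AIC and report each model's gap to the best one.

    models maps a name to (maximised log-likelihood, number of parameters).
    Returns a list of (name, aic, delta_aic) tuples, best model first.
    """
    scores = {name: aic(ll, k) for name, (ll, k) in models.items()}
    best = min(scores.values())
    return sorted(((name, score, score - best)
                   for name, score in scores.items()),
                  key=lambda item: item[1])


# Hypothetical candidates: model_b fits better but uses more parameters.
candidates = {"model_a": (-60.0, 2), "model_b": (-55.0, 5)}
ranking = rank_by_aic(candidates)
```

The difference to the best model is often more informative than the raw AIC, since only relative values are meaningful when models are fitted to the same data.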
The utility and clinical value of models can be evaluated using decision curve analysis (DCA) [25], which determines the net benefit for patients at each threshold probability. The calibration of the nomogram was assessed using the Hosmer-Lemeshow test and calibration curves in the three cohorts. Two-sided p values < 0.05 indicated statistical significance. The glmnet package was used, and statistical analysis was performed using R software (version 3.6.2; http://www.Rproject.org).

PILs (AAH/AIS) were diagnosed in 35.6% (53/149) of the patients in the development cohort, 38.9% (21/54) of the patients in the internal validation cohort, and 37.6% (64/170) of the patients in the external validation cohort. Table 1 shows the baseline characteristics of the patients in the development and two validation cohorts.

Validation of the radiomic signatures
In total, 1722 radiomic features were extracted from the two VOIs, and the LASSO algorithm selected four features for VOI-1 (RS1) and eight features for VOI-2 (RS2). Moreover, RS1 and RS2 were combined into a final radiomic signature (RS-C) using logistic regression, and the radiomic score calculation formula is presented in eTable 3 in the Supplementary materials.

Fig. 2 Radiomic workflow and model construction. b Two regions of interest (ROIs) were constructed into volumes of interest (VOIs), and radiomic features were extracted from the two VOIs. c Radiomic features were selected by the LASSO algorithm and constructed into a radiomic signature. d Discrimination and calibration of the nomogram, which was formed by the clinical-radiological and combined radiomic signatures

The radiomic score for each patient was significantly different between the PIL group (AAH/AIS) and the IL group (MIA/IAC) in the three cohorts (p < 0.001 for eFigure 2A; p = 0.003 for eFigure 2B; p < 0.001 for eFigure 2C, eTable 4 in the Supplementary materials). The mean radiomic score for patients in the IL group (MIA/IAC) was significantly higher in the development and two validation cohorts (7.28, 7.45, and 8.34, respectively) compared with the PIL group. Among all radiomic-related signatures in the development and two validation cohorts, the RS-C had the greatest AUCs of 0.93 (95% CI, 0.89 to 0.97), 0.91 (95% CI, 0.83 to 0.98), and 0.90 (95% CI, 0.85 to 0.94) (eFigure 3C in the Supplementary materials).
The two-way random ICC model was applied to measure the reliability of the radiomic features between the two rounds of image segmentation and feature extraction. The agreement levels were defined by ICC value: excellent (ICC ≥ 0.81), good (0.61 < ICC < 0.8), moderate (0.41 < ICC < 0.60), and poor (ICC ≤ 0.40). eTable 5 summarizes the results of the intrarater agreement analysis. Radiomic features in RS1 showed excellent intrarater reliability (ICC = 0.92 to 0.99), and the ICC values for the radiomic features in RS2 ranged from good (ICC = 0.74; 95% CI, 0.521 to 0.872) to excellent (ICC = 0.98; 95% CI, 0.949 to 0.989).
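The repeatability analysis above can be sketched as follows. The paper states only that a two-way random ICC model was used; this example assumes the single-measurement, absolute-agreement form, often written ICC(2,1), which may differ from the exact variant the authors applied. The category boundaries are also treated inclusively (0.61 and 0.41 as lower bounds) so that every value falls into a class, which is an assumption because the published ranges leave small gaps. Function names are hypothetical.

```python
def icc_two_way_random(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    ratings is a list of rows, one per nodule, each holding the feature
    value obtained in every segmentation round.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n)


def agreement_level(icc):
    """Map an ICC value to the agreement categories used in the text."""
    if icc >= 0.81:
        return "excellent"
    if icc >= 0.61:
        return "good"
    if icc >= 0.41:
        return "moderate"
    return "poor"


# Toy example: three nodules measured twice with identical results.
perfect = icc_two_way_random([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
```

With two segmentation rounds per nodule, each row has two entries, and a systematic shift between rounds lowers this form of the ICC because it penalizes absolute disagreement as well as inconsistency.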

Validation of the clinical-radiological signature
Clinical-radiological characteristics, including density (part-solid nodule (PSN)/solid nodule), pleural retraction, irregular shape, lobulated borders, CTR ≥ 0.5, and blurred margins, were significantly associated with pathology invasiveness after the univariate analysis (p < 0.05; eTable 6 in the Supplementary materials), and four of these characteristics (PSN/solid nodule, irregular shape, pleural retraction, and blurred margins) were selected using the stepwise logistic regression model to form the clinical-radiological signature (eTable 3 in the Supplementary materials).

Validation, calibration, and discrimination of the nomogram
To develop a clinically applicable approach that could predict pathological invasiveness in patients with a solitary pulmonary nodule, we constructed a radiomic nomogram that considers the clinical-radiological and radiomic signatures (Fig. 3). The multivariate logistic regression analysis showed that the clinical-radiological signature (odds ratio (OR) = 1.60; 95% CI, 1.03 to 2.59; p = 0.04) and the radiomic signature (OR = 2.43; 95% CI, 1.76 to 3.66; p < 0.001) represented independent predictors in the nomogram (eTable 7 in the Supplementary materials).
As shown in the nomogram (Fig. 3), the radiomic signature accounted for a larger proportion than the clinical-radiological signature, making it the cardinal biomarker for distinguishing PILs from early-stage ILs. Based on the obtained features, the clinical-radiological signature and combined radiomic signature could be calculated using the formula. The value assigned to each signature was scored on a point scale from 0 to 10. By adding the scores for each signature, one can obtain a total score. The risk of a solitary pulmonary nodule having pulmonary interstitial invasion can be predicted by projecting the total score onto the bottom risk axis. The nomogram formed by the clinical-radiological and combined radiomic signatures performed better than any isolated signature. The nomogram achieved an excellent predictive value, with an AUC of 0.94 (95% CI, 0.90 to 0.97) in the development cohort, which was better discriminatory performance than that of the radiomic signatures and the clinical-radiological signature (Fig. 4b). Similar findings of model comparisons were also observed in the two validation cohorts (Fig. 4c, d). In the two validation cohorts, the nomogram also yielded high AUCs of 0.90 (95% CI, 0.81 to 0.98) and 0.92 (95% CI, 0.88 to 0.92) (Fig. 4a). Moreover, the accuracy, specificity, and PPV of the nomogram were higher than 80.0% in the three cohorts (Table 2).
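A nomogram of this kind is a graphical form of a logistic regression model, so the predicted risk can also be written out directly. The sketch below uses the two odds ratios reported above (1.60 and 2.43) as per-unit effects of the two signatures; the intercept is not given in the text, so the value used here is a hypothetical placeholder, and the real coefficients should be taken from the supplementary tables. The function name is likewise hypothetical.

```python
import math

# Odds ratios reported for the two signatures in the multivariate model.
OR_CLINICAL_RADIOLOGICAL = 1.60
OR_RADIOMIC = 2.43


def predicted_risk(clinical_signature, radiomic_signature, intercept):
    """Probability of invasiveness from a two-predictor logistic model.

    Each coefficient is the natural log of the corresponding odds ratio.
    The intercept must be supplied by the caller; it is not reported in
    the main text.
    """
    linear_predictor = (
        intercept
        + math.log(OR_CLINICAL_RADIOLOGICAL) * clinical_signature
        + math.log(OR_RADIOMIC) * radiomic_signature
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))


HYPOTHETICAL_INTERCEPT = -3.0  # placeholder, not a value from the study
risk_low = predicted_risk(1.0, 1.0, HYPOTHETICAL_INTERCEPT)
risk_high = predicted_risk(1.0, 4.0, HYPOTHETICAL_INTERCEPT)
```

The point scale on a printed nomogram is a rescaling of these coefficient-weighted contributions, which is why the signature with the larger effect spans a longer axis.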
To show that the nomogram also has good discriminatory performance among SPNs with different densities (pure ground-glass nodule (pGGN), PSN, solid), we conducted a subgroup analysis. The DeLong test was used to compare the AUCs of the five models in the development cohort. The differences were statistically significant between the nomogram and RS1 and between the nomogram and the clinical-radiological signature, with p = 0.005 and p < 0.001, respectively (eTable 9 in the Supplementary materials). The clinical-radiological signature had the highest AIC value, at 159.36, among all prediction models in eTable 9. The AIC of the nomogram (121.68) was similar to those of the radiomic signature (121.0) and RS2 (117.62), all of which were lower than the AIC of RS1 (149.83). Based on the overall consideration of the AIC and ROC curves, the nomogram model proved to have excellent goodness of fit and parsimony.

Fig. 4 ROC curves of the nomogram and models in the development and validation cohorts. a ROC curves of the nomogram in the development and validation cohorts; b ROC curves of five models in the development cohort; c ROC curves of five models in the internal validation cohort; d ROC curves of five models in the external validation cohort. ROC, receiver operating characteristic; RS1, radiomic signature selected from the nodular area; RS2, radiomic signature selected from the perinodular area; RS-C, combined radiomic signature selected from the nodular area and perinodular area; C-R, clinical-radiological

Moreover, the nomogram calibration curves in the cohorts indicated good agreement between the nomogram prediction and actual observation. The Hosmer-Lemeshow test revealed that the nomogram was well fitting, with a nonsignificant difference (p > 0.05) (eFigure 5 in the Supplementary materials).
DCAs (Fig. 5) were used to assess the utility of the three predictive models by calculating the net benefit at various probability thresholds. According to the decision curves, the radiomic signature showed more benefit than the clinical-radiological signature in predicting the risk of interstitial invasion when the probability threshold in the clinical decision of a patient or physician was above 0.2 in the development cohort. The nomogram achieved the highest clinical net benefit across the entire range of threshold probabilities in the three cohorts, which indicated that the nomogram was a reliable clinical tool to predict pathology invasiveness.
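The net benefit plotted in a decision curve has a simple closed form: at threshold probability pt, it is the true-positive rate per patient minus the false-positive rate per patient weighted by pt / (1 - pt). The sketch below computes it for a model and for the "treat all" reference strategy. The toy labels and probabilities are illustrative only, and the function names are hypothetical.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    weight = threshold / (1.0 - threshold)  # harm of a false positive
    return tp / n - (fp / n) * weight


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the reference strategy that treats every patient."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy example: 1 = invasive lesion, 0 = preinvasive lesion.
labels = [1, 1, 0, 0]
probs = [0.9, 0.4, 0.6, 0.1]
nb_model = net_benefit(labels, probs, threshold=0.2)
nb_all = net_benefit_treat_all(labels, threshold=0.2)
```

Evaluating these functions over a grid of thresholds and plotting the results gives the decision curve; a model is clinically useful at a given threshold when its net benefit exceeds both the treat-all value and zero (the treat-none strategy).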

Discussion
In this multicenter study, we built and validated a radiomic nomogram to distinguish PILs from early-stage pulmonary interstitium ILs preoperatively. The nomogram incorporated radiomic signatures selected from the nodular and perinodular areas and the clinical-radiological signature and performed well in the development and validation cohorts. The low AIC in the nomogram demonstrated the good quality of this available tool. The DCAs indicated that the nomogram is a reliable clinical treatment decision support tool to predict pulmonary interstitial invasion for patients with a solitary pulmonary nodule.
This research describes some important radiological characteristics that contribute to the differential diagnosis of SPN. Nodules with part-solid/solid density, pleural retraction, irregular shape, and blurred margins had a higher risk for malignancy, consistent with the radiologists' experience. Previous studies [26,27] have used nodule size and CTR to distinguish PILs from ILs. However, they were not included in the final predictive nomogram in our study.
We found considerable reliability of the radiomic features in the repeatability study. Overall, more than 83% (10/12) of the radiomic features in the RS-C achieved excellent intrarater reliability (ICC ≥ 0.81). To a certain extent, this reflects the stability and generalization of the nomogram, which is mainly composed of the combined radiomic signature. Through commonly used and simple metrics, first-order statistics describe the distribution of voxel intensities inside the image region defined by the mask. GLCM is a statistical texture analysis method that evaluates the spatial relationship between pixels and determines how frequently a particular combination of pixels appears in an image. The radiomic features, including the sum average in the GLCM category and uniformity in the first-order category, were also reported to differentiate invasive pulmonary adenocarcinoma from PILs in a previous study [28]. The GLSZM provides information on the size of homogeneous zones for each gray level in three dimensions. Nearly half (5/12) of the features in the combined radiomic signature were related to the GLCM and GLSZM categories and were stable under changes in the ROIs [29]. Although the difference was not statistically significant according to the DeLong test, RS2 can numerically distinguish PILs from early-stage pulmonary interstitium ILs and showed better performance than RS1 in all cohorts. Moreover, the lower AIC of RS2 indicates that its model quality is better than that of RS1. These findings may be due to more unstable features in the focal area than in the perinodular area [29].
We constructed the first radiomic model combined with a 5-mm perinodular radiomic signature for the early diagnosis of pathological invasiveness. The predictive reproducibility of the models was evaluated in this multicentric study. Previous studies classified MIA as a benign or preinvasive pulmonary nodule because, like AIS, it has a good prognosis after surgical treatment [5]. Unlike the studies conducted by She et al [30] and Xu et al [31], we prefer to regard MIA as an early-stage pulmonary malignant lesion because the difference between AIS and MIA lies in the microenvironment [30], especially in the expression level of laminin-5 [32][33][34] and the frequency of tumor protein p53 gene (TP53) mutations [35][36][37].
It is difficult for radiologists or thoracic surgeons to differentiate PILs from ILs in pGGNs and PSNs. On the one hand, the surgical treatment strategies for patients with high-risk pulmonary nodules remain a massive challenge because the histopathologic definition is difficult to make before an operation. On the other hand, histologic sampling of the entire tumor is required to diagnose AIS or MIA, which may prolong the procedure time and lead to inappropriate surgical decision-making. In our study, the subgroup analysis shows good discrimination of the nomogram in the pGGNs and PSNs. A nomogram may help determine whether surgical treatment or conservative surveillance is required and recommend management strategies for patients with nodules diagnosed as ILs.
The biological importance of the distribution of immune cells in the perinodular area has already been demonstrated [18]. The combination of intratumoral and peritumoral features proved useful in predicting the complete pathological response and lymph node metastasis, and in identifying molecular subtypes [38][39][40]. Several studies demonstrated the added value of using radiomic features from the perinodular parenchyma to differentiate nodules in terms of potential malignancy, although the definition of the perinodular area differs between studies [41,42]. Beig et al conducted a study [19] that used 30-mm perinodular radiomic features to distinguish IAC from benign granulomas, and the most predictive features were within a 5-mm perinodular area. The same distance of the perinodular area was also used in the study conducted by Wu et al [43]. As Wu et al indicated, adding perinodular features did not improve the radiomic model performance. However, the radiomic model achieved a better predictive value in our study after being combined with the radiomic signature selected from the perinodular area.
In this study, pathological findings were used as the gold standard rather than the consensus malignancy rating of each nodule used in other studies [44]. This retrospective study was restricted to only suspected malignant nodules (PILs and ILs) and excluded some benign pulmonary lesions (tuberculosis or granulomas) to simulate the most likely clinical situation. Additionally, to improve the reliability of the results and enhance the homogeneity of the population, this

This study has several limitations. First, its retrospective nature and the variation in the research period among the multiple centers prevented some clinical factors from being obtained, and a certain bias and heterogeneity may have existed in the study. Second, the CT acquisition protocol (i.e., image thickness) was not unified among all patients in the three centers. A standardization process was applied to the radiomic features to alleviate this problem, and the nomogram finally performed well in the external validation group. This finding indicated that the nomogram has good generalizability and is worthy of clinical application. Third, owing to the limitation of data, we did not have transcriptomics and mutation-sequencing data. Therefore, we could not further explain the relevant mechanism between radiomics and the tumor microenvironment. In the future, prospective, high-quality research with a larger population is still required to verify our results further.

Conclusion
The perinodular radiomic signature improved the distinction between pulmonary interstitium ILs and PILs when combined with the nodular radiomic signature. This study demonstrated that a nomogram constructed by identifying the clinical-radiological signature and the combined radiomic signature has the potential to be an easy-to-use, non-invasive preoperative biomarker to precisely predict pathological invasiveness and add diagnostic value to clinical decisions for optimal intervention benefit.