Multi-classification model incorporating radiomics and clinic-radiological features for predicting invasiveness and differentiation of pulmonary adenocarcinoma nodules

Sun, Haitao; Zhang, Chunling; Ouyang, Aimei; Dai, Zhengjun; Song, Peiji; Yao, Jian

doi:10.1186/s12938-023-01180-1

Research
Open access
Published: 30 November 2023

Volume 22, article number 112, (2023)
BioMedical Engineering OnLine

Haitao Sun¹,
Chunling Zhang¹,
Aimei Ouyang¹,
Zhengjun Dai²,
Peiji Song¹ &
…
Jian Yao ORCID: orcid.org/0009-0000-4682-4433³

Abstract

Purpose

To develop a comprehensive multi-classification model that combines radiomics and clinic-radiological features to accurately predict the invasiveness and differentiation of pulmonary adenocarcinoma nodules.

Methods

A retrospective analysis was conducted on a cohort comprising 500 patients diagnosed with lung adenocarcinoma between January 2020 and December 2022. The dataset included preoperative CT images and histological reports of adenocarcinoma in situ (AIS, n = 97), minimally invasive adenocarcinoma (MIA, n = 139), and invasive adenocarcinoma (IAC, n = 264) with well-differentiated (WIAC, n = 99), moderately differentiated (MIAC, n = 84), and poorly differentiated IAC (PIAC, n = 81). The patients were classified into two groups (IAC and non-IAC) for binary classification and further divided into three and five groups for multi-classification. Feature selection was performed using the least absolute shrinkage and selection operator (LASSO) algorithm to identify the most informative radiomics and clinic-radiological features. Eight machine learning (ML) models were developed using these features, and their performance was evaluated using accuracy (ACC) and the area under the receiver-operating characteristic curve (AUC).

Results

The combined model, utilizing the support vector machine (SVM) algorithm, demonstrated improved performance in the testing cohort, achieving an AUC of 0.942 and an ACC of 0.894 for the two-classification task. For the three- and five-classification tasks, the combined model employing the one versus one strategy of SVM (SVM-OVO) outperformed other models, with ACC values of 0.767 and 0.607, respectively. The AUC values for histological subtypes ranged from 0.787 to 0.929 in the testing cohort, while the Macro-AUC and Micro-AUC of the multi-classification models ranged from 0.858 to 0.896.

Conclusions

A multi-classification radiomics model combined with clinic-radiological features, using the SVM-OVO algorithm, holds promise for accurately predicting the histological characteristics of pulmonary adenocarcinoma nodules, which contributes to personalized treatment strategies for patients with lung adenocarcinoma.

Introduction

Pulmonary nodules are commonly found during CT screening, with at least one nodule detected in up to 51% of initial screenings [1]. Although over 95% of nodules are ultimately determined to be benign, a significant number of malignant pulmonary nodules are still detected because of the vast number of cases screened [2, 3]. Lung adenocarcinoma (LAC) is the dominant histological subtype of malignant pulmonary nodules [4, 5]. According to the World Health Organization pathological classification, LAC is categorized into three types based on the level of invasiveness: adenocarcinoma in situ (AIS), minimally invasive adenocarcinoma (MIA), and invasive adenocarcinoma (IAC) [6]. In 2020, the International Association for the Study of Lung Cancer (IASLC) reclassified IAC into three grades based on the level of differentiation, namely well-differentiated IAC (WIAC), moderately differentiated IAC (MIAC), and poorly differentiated IAC (PIAC); this grading proved superior to models incorporating nuclear or cytologic grade [7]. Each LAC subtype exhibits distinct biological characteristics and prognosis. As a preinvasive lesion, AIS is typically managed through follow-up surveillance, but some nodules may progress to MIA or IAC [8]. AIS and MIA have an excellent prognosis after sub-lobar resection [9], while poorly and moderately differentiated IAC exhibit higher postoperative recurrence rates than well-differentiated IAC [7, 10]. Therefore, accurately identifying the invasiveness and differentiation of LAC on preoperative CT images may provide guidance for surveillance, surgical strategy, and prognosis.

Traditionally, the classification of LAC subtypes relies on visual assessment and verbal description of radiological features. Previous investigations have highlighted the significance of nodule characteristics, including size, type, margin, pleural indentation, vacuole sign, and vascular convergence sign, in determining the pathological nature of pulmonary nodules [11,12,13,14]. However, accurate classification heavily relies on the expertise and diagnostic proficiency of radiologists. Li et al. found that senior radiologists exhibited superior predictive capabilities in discerning the grading of IAC compared to their junior counterparts [15]. Presently, histological classification necessitates invasive tissue sampling through surgery or needle biopsy, which can be burdensome. Therefore, the development of a non-invasive and convenient approach to anticipate the histological subtypes of pulmonary nodules holds significant clinical implications.

Radiomics, a highly promising methodology, involves the extraction of numerous high-dimensional, retrievable features from medical imaging data, either independently or in conjunction with clinical features [16,17,18]. This approach has shown utility in distinguishing between benign and malignant lung nodules, predicting the invasiveness of lung adenocarcinoma, and identifying the preoperative IASLC grade of IAC [19,20,21,22]. Several studies have developed radiomics models to categorize and predict the pathological attributes of specific nodules, such as pure ground-glass nodules (pGGNs), mixed ground-glass nodules (mGGNs), and solid nodules (SNs), demonstrating robust predictive capabilities [23,24,25]. However, these studies have primarily focused on two or three-classification radiomics, thus lacking coverage of the majority of nodule types and new pathological gradings. Therefore, the development of a multi-classification radiomics approach that can predict the pathological invasiveness and differentiation of pulmonary nodules holds greater clinical value and practicality [26].

With this foundation in mind, the objective of this investigation is to construct a five-classification radiomics model integrating clinic-radiological features for the prediction of invasiveness and differentiation of adenocarcinoma nodules, encompassing AIS, MIA, WIAC, MIAC, and PIAC. The ultimate aim is to establish a non-invasive approach that enables comprehensive assessment of the histological classifications of pulmonary nodules.

Materials and methods

Patients

A total of 951 patients who underwent complete resection for suspected lung cancer were included in this study, with clinical data and preoperative CT images collected between January 2020 and December 2022. To enhance the homogeneity of the patient cohorts, specific exclusion criteria were applied, which encompassed: (1) patients with confirmed non-adenocarcinoma histology, such as squamous carcinoma, mucinous adenocarcinoma, metastases, and others (n = 187); (2) lung nodules larger than 3 cm in diameter (n = 192); (3) patients who received clinical treatment and needle biopsy prior to the CT examination (n = 76); (4) CT images of inadequate quality (n = 27); and (5) patients with lymph-node metastases (n = 45). Ultimately, a total of 500 patients (201 men and 299 women) were retrospectively enrolled in this study, with a median age of 59 years (age range: 19–83 years). Among these patients, there were 97 cases of AIS, 139 cases of MIA, 99 cases of WIAC, 84 cases of MIAC, and 81 cases of PIAC. The detailed process of patient recruitment is presented in Fig. 1.

Histopathological evaluation

All surgically resected lung specimens were examined according to the 2021 WHO classification of thoracic tumors and the grading system newly proposed by the IASLC [6, 7]. The diagnosis of LAC was based on comprehensive histologic patterns, which encompassed lepidic, acinar, papillary, micropapillary, solid, cribriform, and complex glandular patterns [27]. The proportion of each histologic pattern was recorded in 5% increments to determine the predominant histologic subtype and quantify any patterns for tumor grading. Adenocarcinoma was categorized into two groups based on the degree of invasion: non-IAC (including AIS and MIA) and IAC (including WIAC, MIAC, and PIAC). Within the IAC group, WIAC and MIAC were treated as low-grade subtypes and PIAC as the high-grade subtype. The grading scheme is shown in Table 1 [7]. According to the pathological invasiveness and differentiation of pulmonary nodules, the two-, three-, and five-classification tasks were defined as follows: two-classification was (AIS, MIA) vs (WIAC, MIAC, and PIAC), three-classification was (AIS, MIA) vs (WIAC, MIAC) vs (PIAC), and five-classification was (AIS) vs (MIA) vs (WIAC) vs (MIAC) vs (PIAC).
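The three task definitions can be written down as a small label mapping. The sketch below is only illustrative: the function and variable names are assumptions, and the per-subtype counts are the cohort sizes reported in this article.

```python
from collections import Counter

# Cohort sizes per histological subtype, as reported in the article.
COHORT_COUNTS = {"AIS": 97, "MIA": 139, "WIAC": 99, "MIAC": 84, "PIAC": 81}

# Three-classification grouping: non-IAC, low-grade IAC, high-grade IAC.
THREE_CLASS = {
    "AIS": "non-IAC",
    "MIA": "non-IAC",
    "WIAC": "low-grade IAC",
    "MIAC": "low-grade IAC",
    "PIAC": "high-grade IAC",
}


def task_labels(subtype):
    """Return the label of one subtype under each of the three tasks."""
    if subtype not in THREE_CLASS:
        raise ValueError(f"unknown subtype: {subtype}")
    two = "non-IAC" if THREE_CLASS[subtype] == "non-IAC" else "IAC"
    return {"two": two, "three": THREE_CLASS[subtype], "five": subtype}


def group_counts(task):
    """Aggregate the reported cohort sizes into the groups of one task."""
    totals = Counter()
    for subtype, n in COHORT_COUNTS.items():
        totals[task_labels(subtype)[task]] += n
    return dict(totals)
```

Calling `group_counts("two")` and `group_counts("three")` reproduces the group sizes of the two- and three-classification systems from the five subtype counts.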

Table 1 Grading scheme for invasive adenocarcinomas of pulmonary nodules

CT acquisition

The patients underwent non-contrast chest CT on one of three CT systems: Somatom Definition AS 64 (Siemens Healthcare, Germany), Somatom Definition Flash (Siemens Healthcare, Germany), or GE Discovery CT750 HD (GE Medical Systems, USA). Patients were scanned in the supine position at full inspiration with breath-holding, from the apex of the lung to the diaphragm. Scans used a tube voltage of 120 kV, automatic tube current, and a reconstruction slice thickness and interval of 1 or 1.25 mm. Images were reconstructed with a matrix size of 512 × 512 and a lung window with a level of −500 HU and a width of 1500 HU.

Clinical and radiological features

The clinical data, including gender, age, smoking history, hypertension, diabetes, and neoplasia history, were retrieved from the hospital information system. The radiological characteristics were assessed by two thoracic specialists, and a consensus was reached. These characteristics encompassed the involved lobe (right upper lobe, right middle lobe, right lower lobe, left upper lobe, and left lower lobe), nodule shape (regular or irregular), nodule type (pGGNs, mGGNs, or SNs), boundary (clear or blurred border definition), lobulation (indentation at the edge of a round or oval lesion), spiculation (linear strands extending into the lung parenchyma but not touching the pleural surface), vacuole (small focal areas of low attenuation within the nodule), air bronchogram (tubelike or branched air structure within the nodule), vascular convergence (multiple supplying vessels converging toward the lesion), pleural retraction (linear strands extending toward the pleura or major/minor fissure from the mass, causing pleural distortion), bronchial cut-off (sudden truncation of a bronchus due to obstruction within the nodule), and presence of abnormal vessels within the nodules (distorted, dilated, and complicated vessels within the lesions) [14, 28,29,30].

Pulmonary nodules’ segmentation

A single radiologist (with 10 years of experience in chest imaging), who was blinded to the pathological results, conducted semi-automated lesion segmentation on CT images with lung window settings using the Radcloud Platform [31] (version 7.5; Huiying Medical Technology Co., Ltd., Beijing). The segmentation process involved delineating the pulmonary nodules on a section-by-section basis to generate a three-dimensional region of interest (ROI). After a month, 50 cases were randomly selected, and the same radiologist repeated the segmentation to assess intra- and interobserver reproducibility. The aforementioned segmentation results were further validated by an experienced radiologist (with 15 years of experience in chest imaging).

Radiomics and clinic-radiological features extraction and selection

The extraction of image features plays a fundamental and crucial role in radiomics analysis, as it enables the identification of relevant features that effectively capture the biological characteristics of lesions and tumor heterogeneity. In this study, the Radcloud platform was utilized for the extraction of radiomics features. Specifically, the platform employed PyRadiomics [32] (version 3.1.0, https://pyradiomics.readthedocs.io/), a Python-based library, to extract a comprehensive set of radiomics features from the medical images. To enhance the reproducibility of the radiomics analyses, pre-processing steps were meticulously addressed. Prior to feature extraction, z-score normalization of CT images was conducted using PyRadiomics. Additionally, grayscale discretization used a fixed bin width of 25 HU, and voxel size resampling was executed at 1 × 1 × 1 mm³ using PyRadiomics [33].
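To make the discretization step concrete, here is a minimal plain-Python sketch of fixed-bin-width discretization. It follows the formula given in the PyRadiomics documentation, bin = floor(x / W) − floor(min(x) / W) + 1, but it does not call PyRadiomics and is not the Radcloud implementation; the function name and the sample intensities are assumptions.

```python
import math


def discretize_fixed_bin_width(intensities, bin_width=25.0):
    """Map intensities to 1-based bin indices using a fixed bin width.

    Bin edges are multiples of bin_width, and the lowest intensity in
    the region always falls in bin 1.
    """
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    if not intensities:
        return []
    offset = math.floor(min(intensities) / bin_width)
    return [math.floor(v / bin_width) - offset + 1 for v in intensities]


# Hypothetical voxel intensities (HU) inside a nodule ROI.
roi_values = [-800, -790, -776, -775, -500]
bins = discretize_fixed_bin_width(roi_values, bin_width=25.0)
```

With a fixed width, the number of bins grows with the intensity range of the lesion, which keeps the meaning of one bin (here 25 HU) comparable across patients.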

In our study, we extracted a total of 1688 image features belonging to seven major categories from the ROI of each patient. These categories include first-order statistics, 3D shape features, gray-level co-occurrence matrix (GLCM) features, gray-level run length matrix (GLRL) features, gray-level size zone matrix (GLSZM) features, neighboring gray tone difference matrix (NGTDM) features, and gray-level dependence matrix (GLDM) features. Notably, shape features were solely derived from the original images, while the remaining features were obtained from the original images and from images filtered with wavelet, square, square root, gradient, logarithm, exponential, local binary pattern in 2D (LBP-2D), and local binary pattern in 3D (LBP-3D) filters. For the extraction of texture features, preprocessed CT images underwent wavelet filtering, which involved transforming the volume of interest (VOI) into the wavelet domain while preserving low-pass (LLL) and high-pass (HHH) subbands and assigning different weights to other subbands (LHL, LHH, LLH, HLL, HHL, and HLH). Additionally, the LBP-3D image type comprised three subcategories, including the kurtosis map (LBP-3D-k), as well as two categories calculated using different levels of spherical harmonics, namely LBP-3D-m1 and LBP-3D-m2. All the aforementioned radiomics features adhere to the Image Biomarker Standardization Initiative (IBSI, https://theibsi.github.io).

Prior to selecting radiomics features, Z-score normalization was applied to all features. Each patient possessed a total of 1688 features, resulting in a significant amount of redundancy. To avoid diminishing the predictive performance of the model and to reduce computational time, feature selection was performed on the training set before model development. First, the reproducibility of the radiomics features was evaluated using the intraclass correlation coefficient (ICC). Specifically, ICC values falling below 0.5, between 0.5 and 0.75, between 0.75 and 0.9, and exceeding 0.90 are indicative of poor, moderate, good, and excellent reliability, respectively [34]. Consequently, features with ICC values surpassing 0.75 were retained for subsequent stages of feature selection. Subsequently, a variance threshold of 0.8 was employed to refine the feature selection process. Furthermore, the univariate analysis method, SelectKBest, was utilized to identify features with a p value less than 0.05 for further analysis. Finally, the least absolute shrinkage and selection operator (LASSO) regression method with tenfold cross-validation was employed to assist in feature selection, aiming to identify relevant and informative features associated with lung cancer classification.
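The ICC screening step can be sketched as a simple filter. The code assumes the ICC of each feature has already been computed from the repeated segmentations; the feature names and ICC values are hypothetical, and the handling of values that fall exactly on a boundary (0.5, 0.75, 0.9) is an assumption, since the text does not specify it. The retention rule uses an ICC of at least 0.75, the form reported in the Results.

```python
def icc_reliability(icc):
    """Translate an ICC value into the reliability categories used here."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.9:
        return "good"
    return "excellent"


def retain_reproducible(icc_by_feature, threshold=0.75):
    """Keep the names of features whose ICC reaches the threshold."""
    return [name for name, icc in icc_by_feature.items() if icc >= threshold]


# Hypothetical ICC values for three illustrative feature names.
example_icc = {
    "original_firstorder_Mean": 0.93,
    "wavelet-LLH_glcm_Contrast": 0.62,
    "original_shape_Sphericity": 0.81,
}
kept = retain_reproducible(example_icc)
```

Only the features in `kept` would move on to variance thresholding, univariate selection, and LASSO.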

LASSO regression was also applied to the selection of clinic-radiological features [35], integrating radiological scores with independent clinical risk factor scores to establish a predictive model for lung cancer classification. Specifically, LASSO shrinks all regression coefficients toward zero according to a regularization parameter λ and sets the coefficients of many irrelevant features exactly to zero. To determine the optimal value of λ, we employed tenfold cross-validation with the minimum criterion, selecting the λ value that yielded the lowest cross-validation error. The features retained with non-zero coefficients were used to fit the regression model and were combined into a radiomics and clinic-radiological features model. The predicted value of the model for each patient was computed as a linear combination of the selected features weighted by their regression coefficients.
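For reference, the LASSO estimate and the resulting per-patient score can be written in standard notation. This is the textbook least-squares form, given here as an assumption because the article does not state the loss function used; the symbols n (patients), p (candidate features), x_ij (standardized value of feature j for patient i), y_i (label), and S (set of retained features) are introduced for illustration only.

```latex
\hat{\beta}(\lambda)
  = \arg\min_{\beta_0,\,\beta}\;
    \frac{1}{2n}\sum_{i=1}^{n}\Bigl(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\,\beta_j\Bigr)^{2}
    + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert ,
\qquad
\lambda^{*} = \arg\min_{\lambda}\ \mathrm{CV}_{10}(\lambda),

\mathrm{score}_i = \hat{\beta}_0 + \sum_{j \in S} \hat{\beta}_j\, x_{ij},
\qquad
S = \{\, j : \hat{\beta}_j(\lambda^{*}) \neq 0 \,\}.
```

The L1 penalty is what drives coefficients exactly to zero, so a larger λ yields a smaller retained set S.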

Given the involvement of two-classification, three-classification, and five-classification tasks in our study, it is essential to note that the LASSO labels employed during feature selection are tailored to the specific categorization requirements of each respective task. In other words, the labels used in the LASSO regularization process correspond uniquely to the distinct classification schemes associated with the two-classification, three-classification, and five-classification tasks. This tailored approach ensures the relevance and appropriateness of the selected features for each specific classification task within our study framework.

Development of machine learning (ML) models

The features extracted from the segmented lesions were used for data analysis and model construction with machine learning techniques, so that the models reflect lesion information and predict the lesion class. The radiomics dataset comprised a training set for model training and a testing set for model testing. Commonly used radiomics models include logistic regression (LR), support vector machine (SVM), K-nearest neighbors (KNN), decision tree (DT), random forest (RF), and gradient boosting decision tree (GBDT), among others. One versus rest (OVR) and one versus one (OVO) are two well-known strategies that decompose a multi-class classification problem into multiple binary classification problems. Since LR and SVM are inherently binary classifiers, this study applied them as LR-OVR, SVM-OVR, LR-OVO, and SVM-OVO for the three-classification and five-classification tasks and compared them with natively multi-class models such as KNN, DT, RF, and GBDT.
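The difference between the two decomposition strategies is easiest to see by enumerating the binary problems each one creates. The sketch below is a generic illustration, not the study's implementation: the helper names are assumptions, and the tie-breaking rule in the voting function is a simplification.

```python
from collections import Counter
from itertools import combinations

CLASSES = ["AIS", "MIA", "WIAC", "MIAC", "PIAC"]


def ovo_problems(classes):
    """One binary problem per unordered pair of classes: k * (k - 1) / 2."""
    return list(combinations(classes, 2))


def ovr_problems(classes):
    """One binary problem per class against all remaining classes: k."""
    return [(c, [other for other in classes if other != c]) for c in classes]


def ovo_vote(pairwise_winners):
    """Pick the class that wins the most pairwise contests.

    pairwise_winners maps each (class_a, class_b) pair to the class the
    corresponding binary classifier predicted. Ties are resolved by the
    order in which winners were first seen, which is a simplification.
    """
    votes = Counter(pairwise_winners.values())
    return votes.most_common(1)[0][0]


# Hypothetical outcome for one nodule: MIA wins every contest it is in,
# and otherwise the first class of each pair wins.
winners = {pair: ("MIA" if "MIA" in pair else pair[0])
           for pair in ovo_problems(CLASSES)}
predicted = ovo_vote(winners)
```

For the five-classification task, OVO trains 10 pairwise classifiers while OVR trains 5; for the three-classification task both strategies train 3.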

Validation of the optimized ML models

In the evaluation of the testing cohort, the performance of the two-classification problem was assessed using quantitative measures, such as accuracy (ACC), sensitivity, specificity, and the area under the receiver-operating characteristic curve (AUC). Furthermore, considering the inherent characteristics of the multi-classification problem, the predictive capabilities of the models designed for multiple classes were examined by computing macro- and micro-averaged AUCs [36]. Additionally, macro-average accuracy, F1-score, recall, and precision were calculated to evaluate the classification performance of the multi-classification models. Notably, in addition to the radiomics model, the same methodology was employed to develop the clinic-radiological model and combined model, with the objective of verifying whether the inclusion of clinical variables enhances the classification performance of the machine learning models for pulmonary nodules. Our study flow diagram is shown in Fig. 2.
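Macro- and micro-averaged AUCs differ only in where the averaging happens. The following plain-Python sketch shows one common definition: macro averages the one-vs-rest AUC of each class, whereas micro pools all (sample, class) pairs into a single binary problem. It is not the code used in the study, and the toy labels and scores are hypothetical.

```python
def binary_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative.

    Ties between a positive and a negative score count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_micro_auc(y_true, y_score, n_classes):
    """Return (per-class AUCs, macro-AUC, micro-AUC).

    y_true holds one class index per sample; y_score holds one list of
    per-class scores per sample.
    """
    per_class, flat_labels, flat_scores = [], [], []
    for k in range(n_classes):
        labels_k = [1 if y == k else 0 for y in y_true]
        scores_k = [row[k] for row in y_score]
        per_class.append(binary_auc(labels_k, scores_k))
        flat_labels.extend(labels_k)
        flat_scores.extend(scores_k)
    macro = sum(per_class) / n_classes
    micro = binary_auc(flat_labels, flat_scores)
    return per_class, macro, micro


# Hypothetical three-class example with four samples.
y_true = [0, 1, 2, 2]
y_score = [[0.7, 0.2, 0.1],
           [0.2, 0.5, 0.3],
           [0.1, 0.3, 0.6],
           [0.3, 0.4, 0.3]]
per_class, macro, micro = macro_micro_auc(y_true, y_score, n_classes=3)
```

Macro-averaging weights every class equally, so a small class such as PIAC counts as much as a large one; micro-averaging weights every sample equally.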

Statistical analyses

The statistical analyses were conducted using R software (version 4.2.1; https://www.r-project.org/) to compare the differences in clinical and radiological data among the five groups. For categorical variables, the Chi-square test was employed, and for quantitative variables, the Mann–Whitney U test was applied. The overall performance of the multi-classification models in the development and testing cohorts was evaluated through receiver-operating characteristic (ROC) curve analysis and the calculation of micro- and macro-AUC. All statistical tests were two-sided, and P < 0.05 was considered statistically significant.

Results

Patient characteristics

Table 2 displays the clinical and radiological features observed in the cohort of 500 patients. The enrolled patients were randomly allocated to a training set (n = 349) and a testing set (n = 151) at a ratio of 7:3. Statistical analysis revealed significant differences among the five histological subtypes for various clinical and radiological features, including gender, smoking history, hypertension, age, nodule boundary, lobulation sign, bronchial cut-off sign, spiculation sign, vacuole sign, air bronchogram sign, vascular convergence sign, pleural retraction, abnormal vessels within nodules, and nodule type (p < 0.05).

Table 2 The clinical and radiological features of patients in the histological classifications

Consequently, the two-classification system comprised non-IAC (AIS, MIA, n = 236) and IAC (WIAC, MIAC, and PIAC, n = 264). The three-classification system comprised non-IAC (AIS, MIA, n = 236), low-grade IAC (WIAC, MIAC, n = 183), and high-grade IAC (PIAC, n = 81). In the five-classification system, the subtypes included AIS (n = 97), MIA (n = 139), WIAC (n = 99), MIAC (n = 84), and PIAC (n = 81).

The selected radiomics and clinic-radiological features in varying classification models

After extracting a total of 1688 radiomics features, only those demonstrating good feature consistency (ICC ≥ 0.75) were selected for further analysis. For the two-classification model, a combination of variance thresholding, SelectKBest, and LASSO regression methods was utilized to identify 31 non-zero coefficient radiomics features from the CT sequences. These selected features were then used to calculate the 2-Rad-score for each patient in both the training and testing cohorts. The calculation of the 2-Rad-score involved summing the products of the corresponding feature values and their respective weights. Similarly, for the 17 clinic-radiological features, LASSO regression was employed to extract 11 non-zero coefficient clinic-radiological features. This process resulted in the computation of the 2-clinic-radiological-score using the same methodology.

Following the same approach, the 3-Rad-score and 3-clinic-radiological-score for the three-classification model were calculated using 39 retained radiomics features and 14 retained clinic-radiological features, respectively. Additionally, for the five-classification model, 26 radiomics features and 11 clinic-radiological features were preserved to calculate the 5-Rad-score and 5-clinic-radiological-score, respectively.
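Each of these scores is a weighted sum of the retained feature values. A minimal sketch of that computation follows; the feature names, weights, and patient values are hypothetical, and the optional intercept is an assumption, since the text describes only the sum of products.

```python
def linear_score(feature_values, weights, intercept=0.0):
    """Sum of feature value times weight over the retained features.

    Features present in feature_values but absent from weights (that is,
    features dropped by LASSO) do not contribute to the score.
    """
    missing = set(weights) - set(feature_values)
    if missing:
        raise KeyError(f"missing feature values: {sorted(missing)}")
    return intercept + sum(w * feature_values[name]
                           for name, w in weights.items())


# Hypothetical LASSO weights for two retained features.
weights = {"feature_a": 0.5, "feature_b": -0.25}
# Hypothetical standardized feature values for one patient;
# "feature_c" was not retained and is ignored.
patient = {"feature_a": 2.0, "feature_b": 1.0, "feature_c": 9.0}
rad_score = linear_score(patient, weights)
```

The same function would serve for the clinic-radiological score by passing the weights of the retained clinic-radiological features instead.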

Performance of the ML models in different classifications

Clinic-radiological, radiomics, and combined models were constructed for the two-classification task using LR, SVM, KNN, DT, RF, and GBDT algorithms. The performance of these ML models in the testing cohort was evaluated, and the results are presented in Table 3. All ML models exhibited satisfactory performance in predicting non-IAC and IAC. The SVM model showed the best overall performance, achieving the highest AUC and ACC values. Notably, the combined model demonstrated improved performance compared to the clinic-radiological and radiomics models, achieving an AUC of 0.942 and an ACC of 0.894, whereas the clinic-radiological and radiomics models achieved AUC values of 0.905 and 0.938, and ACC values of 0.848 and 0.868, respectively.

Table 3 The performance of ML models in testing cohort of two-classification model

Similarly, for the three-classification task, clinic-radiological, radiomics, and combined models were constructed using LR-OVR, SVM-OVR, LR-OVO, SVM-OVO, KNN, DT, RF, and GBDT algorithms. The performance of these models in predicting non-IAC, low-grade IAC, and high-grade IAC was evaluated, and the results are presented in Table 4. The SVM-OVO model demonstrated the best overall performance among all ML models in the testing cohort, regardless of the clinic-radiological, radiomics, or combined model. However, the combined model achieved a higher ACC compared to the other two models, with a value of 0.767 versus 0.740 and 0.753, respectively.

Table 4 The performance of ML models in testing cohort of three-classification model

The results of the five-classification task, as presented in Table 5, were consistent with the three-classification results. The SVM-OVO model exhibited the best overall performance in predicting AIS, MIA, WIAC, MIAC, and PIAC. Once again, the combined model outperformed the other two models in terms of ACC, achieving a value of 0.607 compared to 0.513 and 0.553.

Table 5 The performance of ML models in testing cohort of five-classification model

Optimal ML model classification evaluation

The SVM-OVO ML model, combined with radiomics and clinic-radiological features, exhibited excellent performance in classifying pulmonary adenocarcinoma nodules, as presented in Table 6. To visualize the correlation between radiomics features, clinic-radiological scores, and histological types, a cluster graph was constructed using representative patient data, as shown in Fig. 3.

Table 6 The performance of multi-classification using SVM-OVO for predicting histological classification of pulmonary adenocarcinoma nodules

Figure 4 displays the confusion matrix of the combined model using SVM and SVM-OVO in both the training and testing cohorts. The matrix illustrates that the selected models were not prone to making errors and effectively captured the relationships among histological subtypes. The ACC in the testing cohort exceeded 0.6, even for the challenging five-classification task. All histological subtypes in the three and five-classification tasks were accurately identified. The macro-AUC and micro-AUC values of the three-classification model in the testing cohort were 0.884 and 0.896, respectively. Similarly, the macro-AUC and micro-AUC values of the five-classification model were 0.858 and 0.866, respectively. The AUC values of the histological subtypes ranged from 0.787 to 0.942, with the lowest AUC observed for MIAC in the testing cohort, as depicted in Fig. 5.
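Accuracy and the macro-averaged precision, recall, and F1-score can all be read off a confusion matrix. The sketch below shows one conventional way to do so, with macro-F1 taken as the mean of per-class F1 values; the example matrix is hypothetical and is not taken from Fig. 4.

```python
def metrics_from_confusion(matrix):
    """Compute accuracy and macro-averaged precision, recall, and F1.

    matrix[i][j] is the number of samples of true class i that were
    predicted as class j.
    """
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(k))
    precisions, recalls, f1s = [], [], []
    for i in range(k):
        tp = matrix[i][i]
        actual = sum(matrix[i])                          # row sum
        predicted = sum(matrix[r][i] for r in range(k))  # column sum
        recall = tp / actual if actual else 0.0
        precision = tp / predicted if predicted else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    return {
        "accuracy": correct / total,
        "macro_precision": sum(precisions) / k,
        "macro_recall": sum(recalls) / k,
        "macro_f1": sum(f1s) / k,
    }


# Hypothetical three-class confusion matrix (rows: true, columns: predicted).
example = [[8, 2, 0],
           [1, 6, 3],
           [0, 2, 8]]
summary = metrics_from_confusion(example)
```

In this toy matrix most errors fall between adjacent classes, which is the pattern one would expect when neighboring histological grades are the hardest to separate.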

Discussion

In this research endeavor, our objective was to devise classification models for two, three, and five histological stratifications of pulmonary adenocarcinoma nodules by integrating radiomics features with clinic-radiological characteristics. Additionally, we conducted a comprehensive comparison of various machine learning techniques, including LR-OVR, SVM-OVR, LR-OVO, SVM-OVO, DT, KNN, RF, and GBDT, to identify the most suitable model for multi-classification tasks in predicting the histological subtypes of pulmonary nodules. Consequently, the SVM-OVO model emerged as the optimal choice, exhibiting superior overall performance in accurately predicting the histological subtypes of pulmonary nodules.

Undoubtedly, the accurate histological classification of LAC, which evaluates invasiveness and differentiation, is pivotal in determining appropriate treatment strategies. Typically, this classification relies on postoperative pathological examination following complete surgical resection [9, 37]. Consequently, the development of a non-invasive and convenient method to predict the histological classification of pulmonary nodules based on preoperative CT images holds substantial clinical significance. Radiomics, as a burgeoning field, has shown great promise in the diagnosis, treatment, and monitoring of pulmonary nodules, surpassing the capabilities of radiologist-based assessments, as evidenced by a growing body of literature [15, 21, 22, 38].

In our present investigation, we put forth multiple common machine learning models encompassing the diverse histopathologic stratifications of pulmonary adenocarcinoma nodules, as per the fifth WHO classification of lung tumors. This encompassed the subtypes of AIS, MIA, WIAC, MIAC, and PIAC. Upon evaluating their discriminatory abilities, we observed that the SVM model yielded the most favorable outcomes in the testing cohort for the two-classification task, while SVM-OVO demonstrated superior performance among the ML models in the testing cohort for the three- and five-classification tasks. The AUC ranged from 0.787 to 0.942, and the ACC ranged from 0.607 to 0.894. Notably, our findings align with previous research that highlighted the satisfactory performance of classifying pulmonary adenocarcinoma nodules in the two-classification task [22, 25, 39]. Furthermore, our results indicate that the ACC achieved in the five-classification task surpassed that of the previous multi-classification of the predominant histologic pattern [40].

However, previous studies focusing on the classification of pulmonary nodules have generally overlooked the crucial aspect of selecting appropriate machine learning models. In light of this gap, our study sought to compare various models and assess their performance in classifying pulmonary nodules. Encouragingly, our findings aligned with prior research, demonstrating that the SVM outperformed other machine learning models in the testing cohort. SVM, widely employed for classification and predictive modeling tasks, has established itself as a reliable choice, even when confronted with limited data availability [41,42,43].

In the realm of multi-class classification, two widely adopted strategies are the OVO and OVR approaches. Park et al. utilized the OVO method to predict the three primary subtypes of lung adenocarcinoma, as described in their study [21]. Similarly, Chen et al. employed the OVR method to construct a three-classification model for the preoperative prediction of risk stratification in gastrointestinal stromal tumors [36]. In another investigation by Liu et al., a four-classification OVO model was established to differentiate subtypes of non-small cell lung cancer [44]. When comparing the performance of the ML models for multi-classification, it was consistently observed that the OVO approach outperformed the OVR method in the testing cohort, which aligns with the findings reported by Liu et al.

Previous studies have also highlighted the correlation between clinic-radiological features and the histological classification of pulmonary nodules. Consequently, experienced radiologists can utilize clinic-radiological features to classify the pathologic subtypes of pulmonary nodules [2, 21, 45]. Certain unique or characteristic features, such as the presence of minute airspaces or dilated vessels within the lesions, play a significant role in nodule classification. However, these specific features may not exhibit a correlation with the radiomics feature category [46]. To enhance the predictive capability of the radiomics model for the classification of pulmonary nodules, we developed a combined model that incorporates both radiomics and clinic-radiological features. Our study demonstrated that the predictive power of the combined model surpassed that of the radiomics and clinic-radiological models, regardless of whether the classification task involved two classes or multiple classes.

In the testing cohort of the three-classification combined models, predictive performance for high-grade IAC was lower than that for non-IAC and low-grade IAC, in contrast to the results obtained from the five-classification model. This discrepancy may be attributed to the relatively small number of high-grade IAC patients, resulting in data bias. Additionally, the predictive power for MIAC in the five-classification combined model was lower than that for the other subtypes. This may reflect the insufficient sample size, which introduces uncertainty into the analysis, as well as slight variations in the proportion of invasive components among some MIAC patients.

This study has several limitations. First, its retrospective design, use of a single-center dataset, and absence of external validation may limit the generalizability of the developed models. Second, the relatively small sample size may affect the statistical power of the analysis. Third, deep learning should be incorporated and the machine learning models improved as more data become available. Fourth, spread through air spaces, which is commonly observed in patients with invasive adenocarcinoma, is closely linked to patient prognosis. Further investigations are therefore warranted to establish additional radiomics classifications for predicting such pathological characteristics.

In conclusion, this study underscores the importance of selecting appropriate machine learning models and demonstrates the utility of multi-classification radiomics combined with clinic-radiological features in predicting the invasiveness and differentiation of pulmonary adenocarcinoma nodules. The SVM-OVO model for the multi-classification task exhibited the best overall performance and successfully predicted the histological stratification of non-invasive subtypes.

Data availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Abbreviations

AIS: Adenocarcinoma in situ
MIA: Minimally invasive adenocarcinoma
IAC: Invasive adenocarcinoma
WIAC: Well-differentiated invasive adenocarcinoma
MIAC: Moderately differentiated invasive adenocarcinoma
PIAC: Poorly differentiated invasive adenocarcinoma
LASSO: Least absolute shrinkage and selection operator
ML: Machine learning
ACC: Accuracy
AUC: Area under the curve
SVM: Support vector machine
SVM-OVO: One versus one strategy of SVM
LAC: Lung adenocarcinoma
GGN: Ground glass nodule
SN: Solid nodule
ROI: Region of interest
GLCM: Gray-level co-occurrence matrix
GLRL: Gray-level run length matrix
GLSZM: Gray-level size zone matrix
NGTDM: Neighboring gray tone difference matrix
GLDM: Gray-level dependence matrix
LR: Logistic regression
KNN: K-nearest neighbors
DT: Decision tree
RF: Random forest
GBDT: Gradient boosting decision tree
SVM-OVR: One versus rest strategy of SVM
ICC: Intraclass correlation coefficient

References

McWilliams A, Tammemagi MC, Mayo JR, Roberts H, Liu G, Soghrati K, Yasufuku K, Martel S, Laberge F, Gingras M, et al. Probability of cancer in pulmonary nodules detected on first screening CT. N Engl J Med. 2013;369(10):910–9.
Mazzone PJ, Lam L. Evaluating the patient with a pulmonary nodule: a review. JAMA. 2022;327(3):264–73.
National Lung Screening Trial Research Team, Aberle DR, Adams AM, Berg CD, Black WC, Clapp JD, Fagerstrom RM, Gareen IF, Gatsonis C, Marcus PM, Sicks JD. Reduced lung-cancer mortality with low-dose computed tomographic screening. N Engl J Med. 2011;365(5):395–409.
Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, Bray F. Global Cancer Statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2021;71(3):209–49.
Kuhn E, Morbini P, Cancellieri A, Damiani S, Cavazza A, Comin CE. Adenocarcinoma classification: patterns and prognosis. Pathologica. 2018;110(1):5–11.
Nicholson AG, Tsao MS, Beasley MB, Borczuk AC, Brambilla E, Cooper WA, Dacic S, Jain D, Kerr KM, Lantuejoul S, et al. The 2021 WHO classification of lung tumors: impact of advances since 2015. J Thorac Oncol. 2022;17(3):362–87.
Moreira AL, Ocampo PSS, Xia Y, Zhong H, Russell PA, Minami Y, Cooper WA, Yoshida A, Bubendorf L, Papotti M, et al. A grading system for invasive pulmonary adenocarcinoma: a proposal from the international association for the study of lung cancer pathology committee. J Thorac Oncol. 2020;15(10):1599–610.
Kakinuma R, Muramatsu Y, Kusumoto M, Tsuchida T, Tsuta K, Maeshima AM, Asamura H, Moriyama N. Solitary pure ground-glass nodules 5 mm or smaller: frequency of growth. Radiology. 2015;276(3):873–82.
Kadota K, Villena-Vargas J, Yoshizawa A, Motoi N, Sima CS, Riely GJ, Rusch VW, Adusumilli PS, Travis WD. Prognostic significance of adenocarcinoma in situ, minimally invasive adenocarcinoma, and nonmucinous lepidic predominant invasive adenocarcinoma of the lung in patients with stage I disease. Am J Surg Pathol. 2014;38(4):448–60.
Haoran E, Wu J, Ren Y, Xia L, Xu L, Li S, Zhao Y, Li C, She Y, Su C, et al. The IASLC grading system for invasive pulmonary adenocarcinoma: a potential prognosticator for patients receiving neoadjuvant therapy. Ther Adv Med Oncol. 2023;15:17588359221148028.
Honda T, Kondo T, Murakami S, Saito H, Oshita F, Ito H, Tsuboi M, Nakayama H, Yokose T, Kameda Y, et al. Radiographic and pathological analysis of small lung adenocarcinoma using the new IASLC classification. Clin Radiol. 2013;68(1):e21-26.
Bartholmai BJ, Koo CW, Johnson GB, White DB, Raghunath SM, Rajagopalan S, Moynagh MR, Lindell RM, Hartman TE. Pulmonary nodule characterization, including computer analysis and quantitative features. J Thorac Imaging. 2015;30(2):139–56.
Wu Z, Wang F, Cao W, Qin C, Dong X, Yang Z, Zheng Y, Luo Z, Zhao L, Yu Y, et al. Lung cancer risk prediction models based on pulmonary nodules: a systematic review. Thorac Cancer. 2022;13(5):664–77.
Gao F, Li M, Ge X, Zheng X, Ren Q, Chen Y, Lv F, Hua Y. Multi-detector spiral CT study of the relationships between pulmonary ground-glass nodules and blood vessels. Eur Radiol. 2013;23(12):3271–7.
Li Y, Liu J, Yang X, Wang A, Zang C, Wang L, He C, Lin L, Qing H, Ren J, Zhou P. An ordinal radiomic model to predict the differentiation grade of invasive non-mucinous pulmonary adenocarcinoma based on low-dose computed tomography in lung cancer screening. Eur Radiol. 2023. https://doi.org/10.2139/ssrn.4164143.
Lambin P, Rios-Velazquez E, Leijenaar R, Carvalho S, van Stiphout RG, Granton P, Zegers CM, Gillies R, Boellard R, Dekker A, Aerts HJ. Radiomics: extracting more information from medical images using advanced feature analysis. Eur J Cancer. 2012;48(4):441–6.
Aerts HJ, Velazquez ER, Leijenaar RT, Parmar C, Grossmann P, Carvalho S, Bussink J, Monshouwer R, Haibe-Kains B, Rietveld D, et al. Decoding tumour phenotype by noninvasive imaging using a quantitative radiomics approach. Nat Commun. 2014;5:4006.
Lee G, Park H, Bak SH, Lee HY. Radiomics in lung cancer from basic to advanced: current status and future directions. Korean J Radiol. 2020;21(2):159–71.
Beig N, Khorrami M, Alilou M, Prasanna P, Braman N, Orooji M, Rakshit S, Bera K, Rajiah P, Ginsberg J, et al. Perinodular and intranodular radiomic features on lung CT images distinguish adenocarcinomas from granulomas. Radiology. 2019;290(3):783–92.
Xu Y, Ji W, Hou L, Lin S, Shi Y, Zhou C, Meng Y, Wang W, Chen X, Wang M, Yang H. Enhanced CT-based radiomics to predict micropapillary pattern within lung invasive adenocarcinoma. Front Oncol. 2021;11: 704994.
Park S, Lee SM, Noh HN, Hwang HJ, Kim S, Do KH, Seo JB. Differentiation of predominant subtypes of lung adenocarcinoma using a quantitative radiomics approach on CT. Eur Radiol. 2020;30(9):4883–92.
Yang Z, Cai Y, Chen Y, Ai Z, Chen F, Wang H, Han Q, Feng Q, Xiang Z. A CT-based radiomics nomogram combined with clinic-radiological characteristics for preoperative prediction of the novel IASLC grading of invasive pulmonary adenocarcinoma. Acad Radiol. 2022. https://doi.org/10.1016/j.acra.2022.12.006.
Zhang R, Wei Y, Shi F, Ren J, Zhou Q, Li W, Chen B. The diagnostic and prognostic value of radiomics and deep learning technologies for patients with solid pulmonary nodules in chest CT images. BMC Cancer. 2022;22(1):1118.
Zhang R, Sun H, Chen B, Xu R, Li W. Developing of risk models for small solid and subsolid pulmonary nodules based on clinical and quantitative radiomics features. J Thorac Dis. 2021;13(7):4156–68.
Xu F, Zhu W, Shen Y, Wang J, Xu R, Qutesh C, Song L, Gan Y, Pu C, Hu H. Radiomic-based quantitative CT analysis of pure ground-glass nodules to predict the invasiveness of lung adenocarcinoma. Front Oncol. 2020;10:872.
Pasini G, Stefano A, Russo G, Comelli A, Marinozzi F, Bini F. Phenotyping the histopathological subtypes of non-small-cell lung carcinoma: how beneficial is radiomics? Diagnostics (Basel). 2023;13(6):1167.
Nishio M, Nishio M, Jimbo N, Nakane K. Homology-based image processing for automatic classification of histopathological images of lung tissue. Cancers (Basel). 2021. https://doi.org/10.3390/cancers13061192.
Ji H, Liu Q, Chen Y, Gu M, Chen Q, Guo S, Ning S, Zhang J, Li WH. Combined model of radiomics and clinical features for differentiating pneumonic-type mucinous adenocarcinoma from lobar pneumonia: an exploratory study. Front Endocrinol (Lausanne). 2022;13: 997921.
Erasmus JJ, Connolly JE, McAdams HP, Roggli VL. Solitary pulmonary nodules: Part I. Morphologic evaluation for differentiation of benign and malignant lesions. Radiographics. 2000;20(1):43–58.
Sun H, Zhou P, Chen G, Dai Z, Song P, Yao J. Radiomics nomogram for the prediction of Ki-67 index in advanced non-small cell lung cancer based on dual-phase enhanced computed tomography. J Cancer Res Clin Oncol. 2023. https://doi.org/10.1007/s00432-023-04856-2.
Xiangfei C, Panli Z, Xunhong Y, Fang W, Yuwei X, Ying C, Ran Z, Fengyan Z, Yayuan G. RadCloud—an artificial intelligence-based research platform integrating machine learning-based radiomics, deep learning, and data management. J Artif Intell Med Sci. 2021;2(1–2):97–102.
van Griethuysen JJM, Fedorov A, Parmar C, Hosny A, Aucoin N, Narayan V, Beets-Tan RGH, Fillion-Robin JC, Pieper S, Aerts H. Computational radiomics system to decode the radiographic phenotype. Cancer Res. 2017;77(21):e104–7.
Shin J, Seo N, Baek SE, Son NH, Lim JS, Kim NK, Koom WS, Kim S. MRI radiomics model predicts pathologic complete response of rectal cancer following chemoradiotherapy. Radiology. 2022;303(2):351–8.
Prayer F, Watzenbock ML, Heidinger BH, Rainer J, Schmidbauer V, Prosch H, Ulm B, Rubesova E, Prayer D, Kasprian G. Fetal MRI radiomics: non-invasive and reproducible quantification of human lung maturity. Eur Radiol. 2023;33(6):4205–13.
Pripp AH, Stanisic M. Association between biomarkers and clinical characteristics in chronic subdural hematoma patients assessed with lasso regression. PLoS ONE. 2017;12(11): e0186838.
Chen Z, Xu L, Zhang C, Huang C, Wang M, Feng Z, Xiong Y. CT radiomics model for discriminating the risk stratification of gastrointestinal stromal tumors: a multi-class classification and multi-center study. Front Oncol. 2021;11: 654114.
Da Cruz V, Yvorel V, Casteillo F, Tissot C, Luchez A, Bayle-Bleuez S, Fournel P, Tiffet O, Peoc’h M, Forest F. Histopathological subtyping is a prognostic factor in stage IV lung adenocarcinoma. Lung Cancer. 2020;147:77–82.
He L, Huang Y, Ma Z, Liang C, Liang C, Liu Z. Effects of contrast-enhancement, reconstruction slice thickness and convolution kernel on the diagnostic performance of radiomics signature in solitary pulmonary nodule. Sci Rep. 2016;6:34921.
Yang Y, Tan M, Ma W, Duan S, Huang X, Jin L, Tang L, Li M. Preoperative prediction of the degree of differentiation of lung adenocarcinoma presenting as sub-solid or solid nodules with a radiomics nomogram. Clin Radiol. 2022;77(9):e680–8.
Wang C, Shao J, Lv J, Cao Y, Zhu C, Li J, Shen W, Shi L, Liu D, Li W. Deep learning for predicting subtype classification and survival of lung adenocarcinoma on computed tomography. Transl Oncol. 2021;14(8): 101141.
Erickson BJ, Korfiatis P, Akkus Z, Kline TL. Machine learning for medical imaging. Radiographics. 2017;37(2):505–15.
Sidey-Gibbons JAM, Sidey-Gibbons CJ. Machine learning in medicine: a practical introduction. BMC Med Res Methodol. 2019;19(1):64.
Uddin S, Khan A, Hossain ME, Moni MA. Comparing different supervised machine learning algorithms for disease prediction. BMC Med Inform Decis Mak. 2019;19(1):281.
Liu J, Cui J, Liu F, Yuan Y, Guo F, Zhang G. Multi-subtype classification model for non-small cell lung cancer based on radiomics: SLS model. Med Phys. 2019;46(7):3091–100.
Nam JG, Goo JM. Evaluation and management of indeterminate pulmonary nodules on chest computed tomography in asymptomatic subjects: the principles of nodule guidelines. Semin Respir Crit Care Med. 2022;43(6):851–61.
Kalpathy-Cramer J, Mamomov A, Zhao B, Lu L, Cherezov D, Napel S, Echegaray S, Rubin D, McNitt-Gray M, Lo P, et al. Radiomics of lung nodules: a multi-institutional study of robustness and agreement of quantitative imaging features. Tomography. 2016;2(4):430–7.

Funding

This work was supported by National Funds for Guiding Local Scientific and Technological Development under Grant No. (YDZX2021012).

Author information

Authors and Affiliations

Medical Imaging Center, Central Hospital Affiliated to Shandong First Medical University, 105 Jiefang Road, Lixia District, Jinan, 250013, Shandong Province, China
Haitao Sun, Chunling Zhang, Aimei Ouyang & Peiji Song
Scientific Research Department of Huiying Medical Technology Co., Ltd, 66 Xixiaokou Road, Haidian District, Beijing, 100192, China
Zhengjun Dai
Medical Imaging Center, Shandong Provincial Hospital Affiliated to Shandong First Medical University, 324 Jingwuweiqi Road, Huaiyin District, Jinan, 250021, Shandong Province, China
Jian Yao

Contributions

HS: conceptualization, methodology, investigation, visualization, and writing (original draft). CZ and AO: methodology, resources, and data curation. ZD: software, visualization, and formal analysis. PS: resources and data curation. JY: conceptualization, investigation, supervision, project administration, and writing (review and editing). All authors read and approved the final manuscript.

Corresponding author

Correspondence to Jian Yao.

Ethics declarations

Ethics approval and consent to participate

Approval was obtained from the Ethics Committee of Central Hospital Affiliated to Shandong First Medical University. The procedures used in this study adhere to the tenets of the Declaration of Helsinki. Because this was a retrospective study, the Ethics Committee of Central Hospital Affiliated to Shandong First Medical University waived the requirement for informed consent from the patients.

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Sun, H., Zhang, C., Ouyang, A. et al. Multi-classification model incorporating radiomics and clinic-radiological features for predicting invasiveness and differentiation of pulmonary adenocarcinoma nodules. BioMed Eng OnLine 22, 112 (2023). https://doi.org/10.1186/s12938-023-01180-1

Received: 28 August 2023
Accepted: 23 November 2023
Published: 30 November 2023
DOI: https://doi.org/10.1186/s12938-023-01180-1
