Background

Lung cancer is the most commonly diagnosed cancer and the leading cause of cancer death for both men and women [1,2,3,4,5]. As the most common type of lung cancer, non-small-cell lung cancer (NSCLC) comprises 85% of primary lung malignancies, and its 5-year survival rate is less than 20% [1,2,3, 5, 6]. According to the New England Journal of Medicine [5], NSCLC can be divided into three major histologic subtypes, namely squamous cell carcinoma (LUSC), adenocarcinoma (LUAD), and large-cell lung cancer. All of these subtypes are malignant, and LUSC and LUAD constitute approximately 35% and 60% of primary NSCLC cases, respectively [1, 2, 4, 5, 7]. LUSC and LUAD differ in tissue characteristics, anatomical site and location, and glucose metabolism, which implies different optimal treatment decisions for improving clinical outcomes [4,5,6,7]. Therefore, it is crucial to accurately confirm the histological subtype of NSCLC prior to treatment decisions [6].

Clinically, histopathological analysis of tumor tissue obtained by biopsy is the first-line reference for identifying NSCLC subtypes [4,5,6,7,8]. However, biopsy is an invasive procedure that carries risk in practice [6]. Moreover, given the spatial and temporal heterogeneity of tumors, a biopsy samples only a very limited portion of the target tissue and cannot completely characterize the tumor [1, 7]. Hence, a non-invasive approach for the accurate preoperative identification of LUSC and LUAD based on the whole tumor is required.

In recent years, computed tomography (CT) and magnetic resonance imaging (MRI) have been widely used for preoperative detection and diagnosis of lung cancer [1, 7, 9,10,11,12]. Compared with CT, MRI has excellent soft tissue contrast and does not involve ionizing radiation. However, discriminating between LUSC and LUAD by visual judgment of MRI data is a real challenge for radiologists. Besides, the performance and consistency of previous studies varied dramatically [2, 13, 14]. Extraction of quantitative features from MRI data, i.e., radiomics [15,16,17,18,19], for the objective and accurate discrimination between LUSC and LUAD deserves more attention.

Currently, the radiomics strategies based on multimodal MRI data, including the T2-weighted images (T2WI), diffusion-weighted images (DWI) and the corresponding apparent diffusion coefficient (ADC) images, have been widely used for breast cancer, bladder cancer, nasopharyngeal carcinoma and glioblastoma subtypes discrimination and outcomes prediction [15,16,17,18,19,20,21,22,23,24,25,26]. Whether the radiomics features extracted from multimodal MRI could reflect the significant differences of tissue distribution patterns between LUSC and LUAD, remains inconclusive up to now.

Therefore, the first aim of this study was to investigate whether the radiomics features extracted from multimodal MRI could significantly reflect the tissue distribution differences between LUSC and LUAD, and explore a feasible way for preoperative discrimination of LUSC and LUAD. To achieve this goal, five feature categories were employed in this study, including the histogram features, the Haralick features of co-occurrence matrices (CM features hereafter) [27], and features derived from the run length matrix (RLM features hereafter) [28], the neighborhood gray-tone difference matrix (NGTDM features hereafter) [29], and the gray-level size zone matrix (GLSZM features hereafter) [30], to fully characterize the global, local and regional heterogeneity differences of tumor tissues between LUSC and LUAD [19].

Considering that semantic clinical features such as age, sex, smoking history, size, location, the longest diameter (LD) and the longest perpendicular diameter (LPD) of the target lesion, and carcinoembryonic antigen (CEA) are closely related to lung cancer [4, 5], the second aim of this study was to investigate whether integrating the radiomics features with these clinical features could further improve the diagnostic performance.

Results

Clinical characteristics of the patients

The baseline demographics and clinical information of the patients in both the training and the validation cohorts were collected from the archival medical documents, as shown in Table 1. The statistical analyses showed no significant differences between the training and validation cohorts in terms of any of these factors.

Table 1 Baseline demographics of the patients involved in this research

Performance of the optimal features selected for the discrimination between LUSC and LUAD

After Student’s t tests on all 1404 radiomics features in the training cohort, 534 features showed significant differences between LUSC and LUAD, indicating that the multimodal MRI radiomics features could reflect the tissue distribution differences between LUSC and LUAD.

A non-linear support vector machine (SVM)-based recursive feature elimination (SVM-RFE) approach was then applied to these significant features, and 13 features were finally selected as the optimal features, as shown in Fig. 1a. The discrimination performance of the optimal features in both the training and validation cohorts was then evaluated using a radial basis function-based non-linear SVM classifier with the LUSC patients labeled as “1” and the LUAD patients labeled as “−1”. The results, shown in Fig. 1b, c, indicate a favorable prediction performance.

Fig. 1

Optimal features selection process and their classification performance with both cohorts: a features selection process (AUC indicates the area under the curve of the receiver operating characteristic); b the performance of the selected features in the training cohort; c the performance of the features with the validation cohort

The Radscore calculation

To further simplify the prediction model, a Radscore formula based on these optimal features was generated using a logistic regression algorithm, and the coefficient for each feature is listed in Fig. 2a. The intercept of the formula was 2.975. Figure 2b shows the sum of the absolute coefficients of these features grouped by image modality or feature category, from which we noticed that (1) the RLM feature category had the highest weight in the Radscore formula, and (2) the features derived from ADC maps contributed most to the Radscore calculation. Using the formula, the Radscore of each patient in both cohorts was calculated, and it exhibited significant differences between the LUSC and LUAD patients (p value < 0.01), as shown in Fig. 2c.
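As a rough illustration of how a logistic-regression-based Radscore of this kind is typically evaluated, the sketch below computes the linear predictor from normalized feature values and maps it to a probability. Only the intercept (2.975) comes from the text; the coefficients, the three-feature input, and the function names are hypothetical placeholders, not the values in Fig. 2a, and treating the Radscore as the linear predictor is itself an assumption.

```python
import math

# Intercept reported in the text; the coefficients below are hypothetical
# placeholders for illustration, NOT the values shown in Fig. 2a.
INTERCEPT = 2.975
HYPOTHETICAL_COEFS = [0.8, -1.2, 0.5]


def radscore(features, coefs=HYPOTHETICAL_COEFS, intercept=INTERCEPT):
    """Linear predictor of a logistic regression: intercept + sum(coef * feature)."""
    if len(features) != len(coefs):
        raise ValueError("one coefficient is needed per feature")
    return intercept + sum(c * x for c, x in zip(coefs, features))


def logistic(score):
    """Map a score on the log-odds scale to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-score))


# Feature values are assumed to be normalized to [-1, 1], as in the Methods.
example_features = [0.1, 0.9, -0.4]
example_score = radscore(example_features)
print(round(example_score, 3), round(logistic(example_score), 3))
```

With the 13 selected features, the same function would simply take 13 coefficients and 13 feature values.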

Fig. 2

Radscore generation and its inter-group distribution (ADC, DWI, T2WI, CM, RLM, GLSZM and GL represent the apparent diffusion coefficient, the diffusion-weighted images, the T2-weighted images, the co-occurrence matrices, the run length matrix, the gray-level size zone matrix, the gray level, respectively): a coefficient map of the 13 features; b sum absolute coefficients of the features with different modalities or categories; c the distribution and inter-group analyses of the Radscore

Performance of the radiomics–clinical nomogram in the discrimination task

In order to further improve the discrimination performance, the clinical features and the Radscore were jointly considered. After the univariate and multivariable analyses of the Radscore and the clinical features, age, smoking, location, LD, LPD, and Radscore were identified as independent predictors for the discrimination task, as shown in Table 2.

Table 2 Univariate and multivariable regression analyses of the Radscore with primary clinical features for the histological subtype prediction of NSCLC in the training cohort

Then, the radiomics–clinical nomogram was developed by integrating these five predictors, as shown in Fig. 3a. Based on the nomogram, the risk of each patient being identified with LUSC was quantitatively calculated. Figure 3b shows the significant differences in the risk distribution between the LUSC and LUAD patients in both cohorts (p value ≪ 0.01). With this nomogram, the discriminative performance was greatly improved, as shown in Fig. 3c and Table 3. With a risk threshold of 0.450, the prediction accuracy and AUC were improved to 83.0% and 0.901 in the training cohort and 79.2% and 0.872 in the validation cohort, respectively. Besides, compared with the existing technique based on the t test and SVM (Fig. 1b), the proposed approach (Fig. 3c) achieved a much higher predictive precision in terms of accuracy and AUC.

Fig. 3

Construction and validation of the nomogram: a development of the nomogram based on the Radscore and independent clinical predictors (LD and LPD represent the longest diameter and the longest perpendicular diameter, respectively); b the risk calculated and its statistical inter-group distribution differences; c performance verification (AUC indicates the area under the curve of the receiver operating characteristic)

Table 3 Performance of the radiomics–clinical nomogram in discriminating between lung squamous cell carcinoma (LUSC) and lung adenocarcinoma (LUAD) in both training and validation cohorts

Additionally, the Hosmer–Lemeshow test yielded a non-significant p value of 0.893, suggesting a favorable agreement between the predicted and observed results using this nomogram model. Clinical usefulness was assessed by decision curve analysis, as shown in Fig. 4, which indicated a greater net benefit than using the clinical model or the radiomics model individually when the risk threshold was larger than 0.1.

Fig. 4

Clinical usefulness assessed by using the decision curve analysis indicating a greater net benefit than individually using the clinical model or the radiomics model

Discussion

In this study, we developed and validated a radiomics–clinical nomogram incorporating multimodal MRI-based radiomics features and primary clinical features for the preoperative individualized discrimination and risk stratification of patients with LUSC and LUAD. The results of using the nomogram in both the training and the validation cohorts demonstrate a favorable discriminative power and clinical usefulness, suggesting that the proposed nomogram could be an effective, non-invasive and safe means for the preoperative identification of histological subtypes of NSCLC.

In recent years, MRI has been widely used for the diagnosis of a variety of cancers, such as glioblastoma, nasopharyngeal carcinoma, lung cancer, bladder cancer, and prostate cancer [13, 17, 31,32,33,34,35,36,37,38]. Most of these diagnoses were based on visual interpretation by experts. With the rapid development of multimodal MRI and image analysis techniques, radiomics approaches based on multimodal MRI data have recently drawn great attention for the preoperative prediction of cancer properties, subtypes and prognosis [15,16,17,18,19,20,21,22]. However, for NSCLC histological subtype discrimination, the feasibility and performance of the multimodal MRI-based radiomics approach remain largely unknown. Therefore, in this study we aimed to (i) investigate whether the radiomics features extracted from multimodal MRI could significantly reflect the tissue distribution differences between LUSC and LUAD, exploring a feasible way for preoperative discrimination between LUSC and LUAD, and (ii) verify whether integrating the radiomics features with the clinical features would further improve the discriminative power.

Due to the different original grayscales of the T2WI, DWI and ADC images, grayscale standardization was indispensable prior to the calculation of the CM, RLM, NGTDM and GLSZM features during radiomics feature extraction. In this study we implemented a multi-grayscale normalization strategy with five commonly used grayscales based on previous research [18, 19], to extract more features potentially useful for the discrimination task. With the Student’s t test and SVM-RFE approaches jointly used for feature selection, 13 features with significant inter-group differences were determined as the optimal features, and their classification results in both the training and validation cohorts demonstrated the feasibility and fairly good performance of the multimodal MRI-based radiomics strategy for the preoperative discrimination of patients with LUSC or LUAD.

Although the SVM classifier has several drawbacks, including increased complexity and long computation times for large databases [39, 40], its merits are also apparent. Specifically, for small samples such as those in this study, SVM can usually obtain favorable results from the limited data in the training set [39,40,41]. Besides, the generalizability of the SVM classifier is also remarkable for small and limited datasets [39, 40].

Among the optimal features selected, the sum of the absolute coefficients of the RLM features was the highest, potentially indicating that features reflecting the regional heterogeneity of tumor tissues could better characterize the heterogeneity differences between LUSC and LUAD. In addition, the sum of the absolute coefficients of the features extracted from ADC maps was by far the highest among the modalities, indicating that the ADC maps could well reflect the histological differences between LUSC and LUAD.

Considering that primary clinical features such as age, sex, smoking, side, location, LD and LPD of the target lesion, and CEA are commonly used for the clinical diagnosis of patients with lung cancer, whether combining these factors with the Radscore generated from the 13 optimal features would improve the discriminative performance remained to be answered. The univariate and multivariable analyses showed that age, smoking, the longest diameter of the target lesion, the longest perpendicular diameter of the target lesion and the Radscore were independent predictors for the discrimination task. Based on these predictors, a nomogram was then generated. The discriminative performance of the nomogram was evidently better than that of the radiomics model, demonstrating that integrating the radiomics features with the primary clinical features could further improve the discriminative power. Besides, the Hosmer–Lemeshow test and the decision curve analysis results further demonstrate the good predictive precision and clinical usefulness of the nomogram.

In recent years, only a few CT-based studies have investigated the performance of the radiomics strategy for preoperatively differentiating LUSC from LUAD. Zhu et al. [7] used 485 radiomics features extracted from 81 patients’ CT images to generate a radiomics signature for the discrimination task, which achieved an AUC of 0.893 in the validation cohort (48 patients) [7]. In another study, Linning et al. [42] applied the radiomics strategy to preoperative non-enhanced CT images and dual-phase chest contrast-enhanced CT images acquired from 90 LUAD and 84 LUSC patients to generate two predictive models, whose AUCs were 0.801 and 0.806, respectively. In a more recent study [2], Bashir et al. employed 115 radiomics features extracted from 106 patients’ CT images with the random forest classifier to develop the predictive model. Its performance in the validation cohort (100 patients) was poor, with an AUC of only 0.56 [2]. Compared with the results of these studies, the proposed algorithm in our study achieved a favorable and comparable performance, with AUCs of 0.901 and 0.872 in the training cohort and the validation cohort, respectively. Besides, the proposed approach could also provide quantitative estimation and risk stratification for patients with LUSC and LUAD, and may serve as an effective and complementary tool to help clinicians make appropriate treatment decisions.

Apart from the current study, the development of quantitative image-based diagnostic models for disease definition has received unprecedented attention in recent years [2, 7, 18, 25, 38, 43,44,45,46,47], not only in the field of cancer, but also in broader research fields, such as retinal diseases [43, 44], diabetes [46], calcaneal fracture [45], and mental disorders [36]. These models have achieved favorable performance in disease diagnosis and understanding, demonstrating the great power and promising application of these approaches for clinical practice. However, the results of this study should be interpreted carefully due to several limitations. First, inherent bias might exist because of the retrospective nature of this study and its relatively limited patient cohorts. A larger number of participants from two or more clinical centers is needed to further validate the overall performance of the proposed approach. Moreover, other potentially relevant clinical factors, such as gene mutations and key molecular biomarkers, were not included in the current study because of incomplete data in the archival database, and should be analyzed in future work.

Conclusions

The proposed multimodal MRI-based radiomics signature could be an effective tool for the quantitative description and discrimination of NSCLC subtypes. Additional integration of the significant clinical factors with the signature further improves the discriminatory power. Extensive multicenter validation of the proposed approach is required prior to real clinical application.

Methods

The institutional ethics review board of Xijing Hospital approved this retrospective study and waived the requirement for informed consent. The overall methodology of this study is shown in Fig. 5.

Fig. 5

The overall schematic outline of this study for the preoperative discrimination between squamous cell carcinoma (LUSC) and adenocarcinoma (LUAD)

Patients

This study included 148 eligible patients from a single clinical center between January 2015 and December 2018, all of whose lesions were postoperatively confirmed as LUSC or LUAD. Their preoperative imaging datasets were enrolled and used for model development. Healthy subjects were not considered, because without an observable tumor mass no region can be delineated for feature extraction, which makes prediction with the model both impossible and unnecessary. Following previous studies [21, 37, 48,49,50], we randomly allocated the entire dataset into a training cohort and a validation cohort. Accordingly, 100 patients (79 males and 21 females) with postoperative pathologically confirmed LUSC (n = 50) or LUAD (n = 50) were allocated to the training cohort for model development, and 48 patients (33 males and 15 females) were allocated to the independent validation cohort. The overall inclusion and exclusion criteria are illustrated in Fig. 6. Only lesions greater than 8 mm were included in this study to ensure sufficient counting statistics and consistent region of interest analysis. In addition, the primary clinical features including age, sex, smoking, side, location, LD, LPD, and CEA were obtained from the archival medical records. Postoperative histological subtypes were used as the true labels of the NSCLC patients.

Fig. 6

Inclusion and exclusion criteria of this study (LUSC and LUAD represent the lung squamous cell carcinoma and lung adenocarcinoma, respectively)

Image acquisition and region of interest delineation

All patients underwent MRI using a 1.5 T scanner (MAGNETOM Aera, Siemens Medical Solutions, Erlangen, Germany) with an 8-channel phased-array torso coil. T2-weighted and diffusion-weighted MRI sequences were performed to obtain the corresponding images. The ADC maps were derived automatically from the DWI using a biexponential model with b values of 50 and 800 s/mm². The primary parameters of these sequences are described in Additional file 1.

Before tumor region of interest (ROI) delineation, the axial image slice showing the largest tumor area in each patient’s lung region was selected for each MRI modality. Then, a manually drawn polygonal ROI was used to segment the tumor area on the selected images. Two radiologists with 9 and 5 years of experience in MRI interpretation of lung cancer independently performed tumor delineation with a custom-developed package. Discrepancies between their delineation results were then resolved by consensus. Considering that the ADC maps were calculated from the DWI using the biexponential model, the tumor ROIs obtained from the DWI were mapped onto the ADC maps to extract the corresponding tumor regions. Examples of the ROI delineation results are illustrated in Fig. 7.

Fig. 7

Examples of the delineated lung squamous cell carcinoma (LUSC) and lung adenocarcinoma (LUAD) on the multimodal MRI data

Feature extraction

The image features, including 8 histogram features, 39 CM features, 33 RLM features, five NGTDM features [29], and 15 GLSZM features [30], were extracted from the tumor ROIs of the MRI data to fully characterize the local, regional and global tissue distribution variations of the tumor [18, 50]. Detailed feature information is shown in Additional file 1: Table S1. Because the original T2W, DW and ADC images have different grayscales, a multi-grayscale standardization strategy using 8, 16, 32, 64, and 128 gray levels was applied to all tumor ROIs delineated from the three MRI modalities prior to the extraction of the second-order (CM) and higher-order (RLM, NGTDM, and GLSZM) texture features [15, 17, 22]. In total, 1404 features were obtained, and their values were linearly normalized to the range of −1 to 1 to reduce the computational burden. Feature extraction was performed using a publicly shared MATLAB package available online [18, 19, 51].
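The total of 1404 features can be reproduced from the counts given above under one plausible reading, namely that histogram features are computed once per modality while texture features are recomputed at each of the five gray levels. The sketch below checks that arithmetic and shows one common way to rescale a feature linearly to [−1, 1]. It is written in Python purely for illustration (the study itself used a MATLAB package), and the per-feature min-max scaling and the function name `scale_to_unit_range` are assumptions.

```python
# Feature counts per category, as stated in the text.
HISTOGRAM = 8
TEXTURE = {"CM": 39, "RLM": 33, "NGTDM": 5, "GLSZM": 15}
GRAY_LEVELS = [8, 16, 32, 64, 128]
MODALITIES = ["T2WI", "DWI", "ADC"]

# Assumption: histogram features are computed once per modality, while
# texture features are recomputed at every gray-level quantization.
per_modality = HISTOGRAM + sum(TEXTURE.values()) * len(GRAY_LEVELS)
total_features = per_modality * len(MODALITIES)
print(per_modality, total_features)


def scale_to_unit_range(values):
    """Linearly map one feature's values across patients onto [-1, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature: nothing to scale
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]
```

In a training/validation setting, the minimum and maximum would normally be taken from the training cohort only and reused for the validation cohort; the text does not state which convention was followed.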

Feature selection, predictive performance evaluation, and Radscore generation

Several methods were used in combination to select the optimal features for the discrimination between LUSC and LUAD in patients with NSCLC. First, the Student’s t test was employed to select the features with statistically significant differences between the two groups in the training cohort. Subsequently, the SVM-RFE approach was adopted to select an optimal feature subset from these features in the training cohort [37], and the differentiation performance of this subset was evaluated in both the training and the validation cohorts. A detailed description of SVM-RFE is given in Refs. [18, 19]. After that, a logistic regression algorithm was applied to these optimal features in the training cohort to obtain the coefficient of each feature and the intercept for the Radscore formula [18, 19, 37]. Based on the formula, the Radscore of each patient in the two cohorts was then computed for further analysis [18, 19, 37].
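The first screening step can be sketched as follows. This is a minimal standard-library illustration of the pooled-variance Student's t statistic, not the authors' R or MATLAB code. To stay dependency-free it compares |t| against a user-supplied critical value instead of computing a p value, and it omits the SVM-RFE stage entirely; the function names and the row-per-patient data layout are assumptions.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(group_a, group_b):
    """Student's two-sample t statistic with a pooled variance estimate."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    std_err = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    diff = mean(group_a) - mean(group_b)
    if std_err == 0.0:
        # Both groups are constant: no evidence if equal, infinite t otherwise.
        return 0.0 if diff == 0.0 else math.copysign(math.inf, diff)
    return diff / std_err


def screen_features(lusc_rows, luad_rows, t_critical):
    """Return indices of features whose |t| exceeds the critical value.

    Each row is one patient; each column is one radiomics feature.
    """
    kept = []
    for j in range(len(lusc_rows[0])):
        lusc_values = [row[j] for row in lusc_rows]
        luad_values = [row[j] for row in luad_rows]
        if abs(pooled_t_statistic(lusc_values, luad_values)) > t_critical:
            kept.append(j)
    return kept
```

For two groups of 50 patients (98 degrees of freedom), the two-sided 5% critical value is roughly 1.98; in practice a statistics library would be used to obtain exact p values.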

Radiomics–clinical nomogram development and its predictive performance assessment

After exploring the feasibility and evaluating the performance of the radiomics model for the discrimination between LUSC and LUAD, we further investigated whether combining the radiomics features with the primary clinical features could improve the diagnostic accuracy. First, univariate and multivariable regression analyses were performed with the Radscore and the clinical features in the training cohort to determine the independent predictors for the discrimination between LUSC and LUAD [19, 49, 52]. Then, a nomogram based on these independent predictors was developed using the training cohort [19, 49, 52], and its predictive performance was quantitatively assessed in terms of the sensitivity, specificity, accuracy, and area under the receiver operating characteristic (ROC) curve (AUC) using both the training and validation cohorts [19, 49, 52]. Among these metrics, sensitivity measures the percentage of positive samples that are correctly identified, specificity measures the proportion of negative samples that are correctly identified, and accuracy is the proportion of all samples that are correctly identified, as shown in Eq. 1, where TP, TN, FP, and FN denote true positive, true negative, false positive and false negative, respectively [52,53,54]. The AUC assesses the general performance of the predictive model [53,54,55]. The Hosmer–Lemeshow test and decision curve analysis were performed to verify the precision and net benefit of the nomogram in clinical applications [56].

$$ \left\{ \begin{aligned} \text{Sensitivity} &= \frac{\text{TP}}{\text{TP} + \text{FN}} \\ \text{Specificity} &= \frac{\text{TN}}{\text{TN} + \text{FP}} \\ \text{Accuracy} &= \frac{\text{TP} + \text{TN}}{\text{All samples}} \end{aligned} \right. \tag{1} $$
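The three ratios in Eq. 1, together with the AUC, can be computed along the following lines. This is a minimal sketch assuming the labeling used in the Results (LUSC as 1, LUAD as −1) and taking "all samples" to be TP + TN + FP + FN. The AUC is computed in its pairwise-ranking form, which is equivalent to the area under the ROC curve; the function names are illustrative.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP and FN for a binary labeling."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, tn, fp, fn


def sensitivity_specificity_accuracy(y_true, y_pred, positive=1):
    """Evaluate the three ratios of Eq. 1 from predicted labels."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return sensitivity, specificity, accuracy


def auc(y_true, scores, positive=1):
    """Probability that a positive case outscores a negative one (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The predicted labels would come from thresholding the nomogram risk (0.450 in the Results), whereas the AUC uses the continuous risk values directly.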

Statistical analysis

All statistical analyses were performed using R statistical software (version 3.4.4, x64; R Foundation for Statistical Computing, Vienna, Austria; http://www.R-project.org/). Two-sided p values less than 0.05 were considered significant [19, 49, 52]. Univariate and multivariable regression analyses were applied to identify independent predictors for the discrimination task [19, 49, 52]. The Hosmer–Lemeshow test was performed to quantitatively assess the calibration, i.e., the agreement between the predicted and observed results, and decision curve analysis was employed to evaluate the clinical usefulness of the proposed nomogram model.
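For readers unfamiliar with decision curve analysis, the quantity plotted on such a curve is the net benefit at each risk threshold. The sketch below uses the standard definition, net benefit = TP/N − (FP/N) × pt/(1 − pt), together with the "treat all" reference strategy. It is an illustration in Python, not the R code used in the study, and the function names are hypothetical.

```python
def net_benefit(y_true, risks, threshold, positive=1):
    """Net benefit of acting on every patient whose risk is >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    pairs = list(zip(y_true, risks))
    tp = sum(1 for t, r in pairs if r >= threshold and t == positive)
    fp = sum(1 for t, r in pairs if r >= threshold and t != positive)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - fp / n * threshold / (1.0 - threshold)


def treat_all_net_benefit(y_true, threshold, positive=1):
    """Reference curve: every patient is classified as positive."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    prevalence = sum(1 for t in y_true if t == positive) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)
```

Evaluating `net_benefit` over a grid of thresholds for the nomogram, the clinical model and the radiomics model, and plotting the results against the "treat all" and "treat none" (net benefit of zero) references, gives a decision curve of the kind shown in Fig. 4.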

Data statement

The datasets in this study are currently not freely available to the public owing to patient privacy concerns, but may be obtained from the corresponding authors upon reasonable request approved by the institutional review boards.