Background

Lung cancer remains the leading cause of cancer incidence and death worldwide in 2018, when it was estimated to comprise 2.1 million new cases and cause 1.8 million deaths [1]. Among all patients with lung cancer, lung adenocarcinoma accounts for the largest proportion, and its incidence continues to increase [2]. The literature has shown that phenotypes of cancers are dependent on the tumor microenvironment, especially tumor-infiltrating immune cells [3, 4]. The immune response is characterized by many types of cells, and differences in their potential prognostic value depend on the cancer type [5]. Computational approaches were adopted to characterize immune-interactions in lung adenocarcinoma, mainly focused on B cell and CD8+ T cell [6]. Meta-analysis also showed that CD8+, CD3+ and FOXP3+ T cells infiltration had significantly prognostic value in lung adenocarcinoma [7]. Furthermore, the prognostic significance of integrated immune infiltration in some cancers has been validated, and this has been proposed to supplement the tumor, node, and metastasis (TNM) staging system for predicting patient survival [8,9,10,11,12,13]. However, very few studies reported integrated immune infiltration in lung adenocarcinoma.

Most previous studies have adopted an immunohistochemistry method to evaluate the landscape of tumor-infiltrating immune cell subtypes. However, a bioinformatics tool, known as the CIBERSORT algorithm, has been previously developed for computational enumeration of immune cell subtypes. The landscape and proportions of tumor-infiltrating immune cells are evaluated on the basis of the changes in expression of immune-relative and other genes in one sample. This meta-gene approach has been well designed and validated in previous studies [9, 14,15,16].

In the present study, CIBERSORT was used to estimate the fraction of immune cell types based on different lung adenocarcinoma gene expression profiles. Least absolute shrinkage and selection operator (LASSO) Cox regression analysis was utilized to establish an immune cell infiltrating score model in both training and validation cohorts. We believe that the immune infiltrating score could assist in predicting survival, along with clinical factors, in lung adenocarcinoma.

Methods

Study cohort and data processing

We adopted the public datasets from the Gene Expression Omnibus (GEO) (https://www.ncbi.nlm.nih.gov/geo/) as the training cohort. Four GEO datasets (GSE30219, GSE37745, GSE50081 and GSE68465, the latter also known as the Director’s Challenge Consortium) were included in this study. All of the four datasets were derived from the genechips of Affymetrix® (Santa Clara, California, USA). GSE30219, GSE37745 and GSE50081 were based on the GPL570 genechip, while GSE68465 were from the GPL96 microarray. Then, raw data and genechip files were downloaded directly. A total of 85, 106, 127 and 443 samples of lung adenocarcinoma from GSE30219, GSE37745, GSE50081 and GSE68465 datasets were enrolled. We used a robust multichip average method via RMAExpress for background adjustment, quantile normalization and summary to process the gene profiles [17,18,19].

Level 3 RNA sequence data from lung adenocarcinoma samples were also downloaded from The Cancer Genome Atlas (TCGA) before June 5, 2018 (https://portal.gdc.cancer.gov/). A total of 594 samples were obtained, comprising 535 adenocarcinoma and 59 normal lung samples. Baseline clinicopathological factors and treatment information were also acquired from TCGA. Primary lung adenocarcinoma samples with complete follow-up and baseline information (age, sex and pathological stage) were included as the validation cohort in this study. Due to the original records of all included datasets, the seventh edition of the TNM staging method was utilized in our study. Information regarding histologic subtypes of lung adenocarcinoma was intergraded with records from the University of California Santa Cruz Xena database (http://xena.ucsc.edu/) and previous literature [20,21,22]. According to previous studies, tumors were classified into three subgroups (low-, intermediate- and high-grade adenocarcinoma) regarding the survival data for each predominant lung adenocarcinoma subtype [23, 24].

Estimation of tumor-infiltrating immune cell types

The CIBERSORT method was adopted to quantify the proportions of the immune cell in both training and validation cohorts [25, 26]. Common methods, like flow cytometry and immunohistochemistry, rely on few phenotype markers and tissue disaggregation prior to cytometry, may cause damaged or lost cells, altering analyses [26]. The CIBERSORT was developed to accurately evaluate the relative levels of the 22 immune cell phenotypes, especially closely related types, using signature gens within complex expression mixtures [26]. Such mixtures could derive from both patients’ solid tissues and blood profiled by array or RNA-sequencing [26]. The 22 immune cells are mainly composed of B cells, T cells, macrophages, dendritic cells, plasma cells, natural killer cells and mast cells. To process data by the CIBERSORT, the targeted gene profile was uploaded to its website (https://cibersort.stanford.edu). Then, a heat-map table with levels of 22 immune cell subtypes returned as the final results. As previously described, we set the threshold P-value < 0.05 and excluded samples without immune infiltration in training (5 patients) and validation (78 patients) cohorts [25]. The optimal cut-off values of the proportions of different immune cells in the training cohort were calculated on the basis of the prognostic significance using X-Tile software [27]. Each immune cell fraction level of samples from both cohorts was then divided into two groups based on the cut-off point. The immune cell fraction level divided by the cut-off value was valued as 0 or 1 in the subsequent scoring formula.

Statistical analysis

All statistical analyses in this study were performed using R version 3.5.0 (R Foundation for Statistical Computing, Vienna, Austria) and IBM SPSS Statistics 22.0 (IBM, Inc., Armonk, NY, USA). We used the LASSO Cox regression model to select the ideal prognostic model among the estimated immune cell subtypes using the glmnet package in R [28]. The optimal model parameter λ and corresponding coefficients were determined by tenfold cross validations. We chose λ via 1 − SE (standard error) criteria as previously recommended [8, 9]. Based on the final immune infiltrating score model and corresponding coefficients, a formula for the score was constructed using the training cohort. The sensitivity and specificity of the prognosis prediction of the immune cell infiltrating score were evaluated by a time-dependent receiver operating characteristic (ROC) curve in R software. The nearest neighbor estimation method in the ROC curve was adopted with cut-off times of 1, 3 and 5 years. Survival curves were estimated using the Kaplan–Meier method and the log-rank test was utilized for comparing survival curves. Univariate and multivariate Cox proportional hazard models were adopted to identify all significant prognostic factors. The prognostic value of the included variables was evaluated by Harrell’s concordance index (c-index) [29]. The score stratified by subgroups were compared using the one-way ANOVA method, if normal distribution and homogeneity of variance were followed. Correlation analysis was also conducted to explore the potential relationship between the score and the expression of immune checkpoint regulators. In this study, a two-tailed P-value of < 0.05 was considered statistically significant.

Results

Construction of the immune cell infiltrating score model

After the initial selection process, 751 and 418 patients were enrolled as the training and validation cohorts, respectively. The characteristics of the patients are listed in Table 1 and Additional file 1: Table S1. Optimal cut-off values for the fraction of each immune cell type in the training cohort were generated as described in “Methods” section (Additional file 2: Table S2). LASSO Cox regression analysis was performed, based on the fraction of immune cells in the training cohort (Fig. 1a, b). Finally, the formula of the immune cell infiltrating score for lung adenocarcinoma in the training cohort was built (Additional file 3). The prognostic accuracy of the scoring model was evaluated using time-dependent ROC analysis in the training cohort. The predictive time points were set at 1, 3 and 5 years, respectively (Fig. 2a). The area under the ROC curve for the immune infiltrating score at each time point was 0.674, 0.684 and 0.675, respectively.

Table 1 Baseline characteristics of patients with lung adenocarcinoma in the training cohort
Fig. 1
figure 1

a Coefficient profiles of immune cell types in the LASSO Cox regression model. b Tenfold cross-validation for turning parameter selection in the LASSO Cox regression model. λ is the turning parameter. The partial likelihood deviance is plotted in log(λ), in which vertical lines are shown at the optimal values by minimum criteria and 1 − SE criteria

Fig. 2
figure 2

a The immune cell infiltrating score evaluated by time-dependent ROC curves at 1, 3 and 5 years in the training cohort. The area under the curve at each time point was 0.674, 0.684 and 0.675, respectively. b Survival analysis stratified by immune cell infiltrating score in the training cohort (P < 0.001). c The immune cell infiltrating score evaluated by time-dependent ROC curves at 1, 3 and 5 years in the validation cohort. The area under the curve at each time point was 0.650, 0.616 and 0.567, respectively. d Survival analysis stratified by immune cell infiltrating score in the validation cohort (P = 0.013)

Survival analyses were performed stratified by the score, in the training cohort (Fig. 2b). Patients were separated into low- and high-score groups by the median score of the cohort. We observed that patients with a high score had significantly worse survival than those with a low score (P < 0.001, Fig. 2b). A high score was also associated with significantly shorter progression-free survival (P < 0.001, Additional file 4: Figure S1).

As a continuous variable, the immune cell infiltrating score was found to have significantly prognostic value using univariate Cox regression analysis (P < 0.001, Table 2). Multivariate analysis confirmed that the immune infiltrating score was an independent prognostic factor in the training cohort, along with the TNM staging system (P < 0.001, Table 2). According to the corresponding c-index, the combination of the scoring model and TNM stage showed a significantly better predictive value than the TNM stage alone in this cohort (P < 0.001, Table 3).

Table 2 Results of univariate and multivariate Cox regression of overall survival in the training cohort
Table 3 Comparison of the accuracy of survival prediction between the TNM stage with and without the immune infiltrating score

Validation of the immune cell infiltrating score model

We adopted a validation group based on the dataset from TCGA to evaluate the prognostic value of the proposed scoring model. The same formula for the immune cell infiltrating score and optimal cut-off point for each immune cells were applied to the validation group. Likewise, ROC analysis was adopted to assess the prognostic value of the scoring model. The area under the curve was 0.650, 0.616 and 0.567 at the predictive time of 1, 3 and 5 years, respectively (Fig. 2c). In the validation group, a high score was associated with significantly worse prognosis (P = 0.013, Fig. 2d).

Consistent with the previous findings, the score was an independent prognostic factor in the validation cohort, based on the univariate and multivariate Cox regression model (P = 0.001 and 0.002, Additional file 5: Table S3). Furthermore, the combination of the scoring model and TNM stage improved the prognostic model of lung adenocarcinoma in this cohort (P < 0.001, Table 3).

Clinical significance and bioinformatics analyses based on the immune cell infiltrating score

In the training group, data regarding adjuvant chemotherapy were documented only in the GSE68465 and GSE37745 datasets. Patients were divided into low and high score groups based on the score median value. Given the differences in staging methods and initial study design, patients with stage I disease were excluded from this analysis. The survival advantage of the low-score group was observed in patients with or without adjuvant chemotherapy (P = 0.004 and 0.002, Additional file 6: Figure S2A, B). Moreover, there was a trend for better prognosis in patients with a low score who received adjuvant chemotherapy, although the difference was not statistically significant (P = 0.909, Fig. 3a). No significant prognostic difference regarding adjuvant chemotherapy was observed in high score group (P = 0.764, Fig. 3b). We also extracted data on adjuvant chemotherapy from the validation group. There was a trend for better survival in low score group who received chemotherapy (P = 0.084, Additional file 6: Figure S2C). No difference was found in patients without chemotherapy (P = 0.761, Additional file 6: Figure S2D). Furthermore, patients with a low score may benefit from adjuvant chemotherapy (P = 0.010, Fig. 3c), while no difference was observed in high-score group (P = 0.213, Fig. 3d).

Fig. 3
figure 3

a Survival analysis for patients (stage II–IV) with low scores stratified by the receipt of adjuvant chemotherapy in the training cohort (P = 0.909). b Survival analysis for patients (stage II–IV) with high scores stratified by the receipt of adjuvant chemotherapy in the training cohort (P = 0.764). c Survival analysis for patients (stage II–IV) with low scores stratified by the receipt of adjuvant chemotherapy in the validation cohort (P = 0.010). d Survival analysis for patients (stage II–IV) with high scores stratified by the receipt of adjuvant chemotherapy in the validation cohort (P = 0.213)

Smoking status was included in two datasets (GSE50081 and GSE68465) for the training cohort. Patients with unknown smoking status were excluded. We observed that patients who are current smokers had significantly higher score than those who do not smoke (P < 0.001, Fig. 4a). In the validation cohort, patients who are current or reformed smokers (< 15 years) had higher scores than non-smokers or reformed smokers (> 15 years), although the difference was not statistically significant (P = 0.932, Fig. 4b).

Fig. 4
figure 4

a Histograms of the immune cell infiltrating score stratified by the patient smoking status in the training cohort (P < 0.001). b Histograms of the immune cell infiltrating score stratified by the patient smoking status in the validation cohort (P = 0.932). c Histograms of the immune cell infiltrating score stratified by the histologic subtype of lung adenocarcinoma in the validation cohort (P = 0.003). High-grade: micropapillary or solid predominant lung adenocarcinoma; intermediate-grade: acinar or papillary predominant lung adenocarcinoma; low-grade: lepidic predominant lung adenocarcinoma

Furthermore, we explored potential relationships between the immune cell infiltrating score and histologic subtypes of lung adenocarcinoma. In the validation cohort, 186 patients with information on histologic subtypes were included. We found that high-grade adenocarcinoma subtypes were associated with significantly higher score than low-grade tumors (P = 0.003, micropapillary/solid vs acinar/papillary vs lepidic predominant lung adenocarcinoma, Fig. 4c).

Owing to the difference of the microarray platforms used for the training cohort, correlation analyses between the immune infiltrating score and the expression of immune checkpoint regulators or inflammatory mediators were performed in the validation cohort. We found that there was a significant positive correlation between the score and some of the regulators and mediators, including PD-L1 (P = 0.002), IDO1 (P = 0.003) and LAG3 (P = 0.017), IFNB1 (P = 0.002), IL-1A (P = 0.005), TNFA (P = 0.010) and IL-6 (P = 0.007) (Additional file 7: Table S4).

Discussion

In this study, we adopted the CIBERSORT method to calculate the fraction of 22 immune cell in lung adenocarcinoma. This newly developed algorithm can work accurately for bulk tumor samples profiled by microarray or RNA-Sequencing. And unlike flow or mass cytometry-based methods, it is applicable to archived RNA and cellular specimens [26]. Previous studies have validated the efficacy of the CIBERSORT method [14,15,16]. On the basis of the estimated fractions of signature immune cells, LASSO regression was utilized to construct an immune cell infiltrating score model, of which the predictive accuracy has been demonstrated previously [14, 28, 30, 31]. In both cohorts, the prognostic value of the scoring model was confirmed. A lower score was associated with significantly better prognosis, which may be due to more immunity-activating lymphocyte infiltrations. Further analyses also implied that the score could significantly improve the accuracy of survival prediction when combined with the TNM staging system. Significant correlation between the score and the expression of common immune checkpoint regulators or inflammatory mediators was also observed, including PD-L1, IDO1 and LAG3, which further confirmed its potential values. Previous studies also indicated that the expression of PD-L1 and LAG-3 was related with early recurrence and poor prognosis [32, 33]. In a word, we hope that our work will provide new insights into the construction of staging or prognostic models in lung adenocarcinoma.

The role of different tumor-infiltrating lymphocytes in lung adenocarcinoma has been explored separately [7, 34]. Kinoshita et al. [35] found that CD8+ T-cells accumulation was identified as a negative prognostic factor in lung adenocarcinoma, but not in lung squamous carcinoma. Comprehensive immune profiling of lung adenocarcinoma revealed plasma cell infiltration was related to worse prognosis [36]. Mast cell exosomes were also reported to promote lung adenocarcinoma cell proliferation [37]. The function of mast cells against or in favor of the cancers still requires investigations [38, 39]. Previous studies also observed that the prognostic implication differs according to histological types and smoking status [35, 40]. The potential role of tumor-infiltrating lymphocytes may require exploration comprehensively according to immune microenvironment. The functionality of CD8+ tumor-infiltrating lymphocytes was reported to be affected by competition between antitumor and exhaustion programs [41]. Choi and Na investigated the relationship between the immune landscape and tumor glucose metabolism in lung adenocarcinoma [42]. Varn et al. adopted computational approaches to characterize tumor-immune interactions, mainly focused on six immune cell subtypes [6]. In our study, we adopted the CIBERSORT method with 22 cell phenotypes in lung adenocarcinoma like the study of Gentles et al. [5]. The final formula of the immune cell infiltrating score was composed of 13 types of immune cells using LASSO regression, which may better characterize the potential internal interactions and microenvironment.

Several studies have focused on the relationships between tumor-infiltrating lymphocytes and the efficacy of adjuvant chemotherapy [8, 43,44,45]. In this study, the potential associations with chemotherapy were not all statistically significant. The weak effect may be due to small sample size in this part of analyses and the difficulty in finding the optimal cut-off value for the score. Previous studies showed that the chemotherapy sensitivity may depend on the lymphocytes infiltrating the tumor [8, 9]. One potential mechanism is that the interferon secreted by lymphocytes could sensitize cells to chemotherapy [46]. The innate and adaptive immune responses may also be activated by immunogenic tumor cell death [47]. More investigations may be required for the underlying mechanisms [48]. Our study also indicated that the score was significantly associated with patient smoking status and histologic subtype, which may help us to better characterize patient subgroups and choose the optimal treatment.

The strength of this study is the large cohort derived from several institutions, which outnumbered than most similar researches of lung adenocarcinoma. We adopted a newly developed CIBERSORT method to estimate the level of tumor-infiltrating lymphocytes, which outperformed other methods (like immunohistochemistry) regarding noise, unknown mixture content and closely related cell types. There are also some limitations that should be noted. First, the public datasets in this study were based on different genechips, but we adopted the RMAExpress method which designed for Affymetrix® microarrays and merged the raw data. All the datasets were collected with different study purposes, which may lead to heterogeneities in baseline features and therapies. Also, only retrospective analyses were made. Future prospective studies will be needed to confirm the findings. Second, some clinicopathologic factors were missing or incomplete, especially histological subtype, smoking status and adjuvant chemotherapy. Further investigations should collect more clinical factors and endpoints, along with possible alternative methods for testing and validation, which may provide more evidence for lung adenocarcinoma [13]. In addition, optimal cut-off value and practical approaches are essential for future clinical application.

Conclusion

An immune cell infiltrating score model was established based on the immune cells in lung adenocarcinoma. A high score was associated with significantly worse survival. The scoring model could improve the accuracy of predictions of survival when combined with the staging system. It was also shown that the score was closely associated with the smoking status, histologic subtype and the expression of some immune checkpoint regulators or inflammatory mediators. This study may provide new insights into the construction of staging or prognostic models in lung adenocarcinoma. The close associations between the immune infiltrating score and clinical factors may help us to better identify patient subgroups and choose the optimal treatment for patients with lung adenocarcinoma.