Comparison of the tumor immune microenvironment and checkpoint blockade biomarkers between stage III and IV non-small cell lung cancer

Background Adjuvant immune checkpoint blockade (ICB) following chemoradiotherapy and adding ICB to chemotherapy have been key advances for stages III-IV non-small cell lung cancer (NSCLC) treatment. However, known biomarkers like PD-L1 are not consistently indicative of ICB response. Other markers within the tumor immune microenvironment (TIME) may better reflect ICB response and/or resistance mechanisms, but an understanding of how TIMEs differ between stage III and IV NSCLC has not been explored. Methods Real-world data from unresectable, stage III-IV, non-squamous, pretreatment NSCLCs (stage III n = 106, stage IV n = 285) were retrospectively analyzed. PD-L1 immunohistochemistry (IHC) was compared to CD274 gene expression. Then, differential gene expression levels, pathway enrichment, and immune infiltrate between stages were calculated from whole-transcriptome RNA-seq. Analyses were stratified by EGFR status. Results PD-L1 IHC and CD274 expression in tumor cells were highly correlated (n = 295, P < 2.2e-16, ⍴ = 0.74). CTLA4 expression was significantly increased in stage III tumors (P = 1.32e-04), while no differences were observed for other ICB-related genes. Metabolic pathway activity was significantly enriched in stage IV tumors (P = 0.004), whereas several immune-related KEGG pathways were enriched in stage III. Stage IV tumors had significantly increased macrophage infiltration (P = 0.0214), and stage III tumors had a significantly higher proportion of CD4 + T cells (P = 0.017). CD4 + T cells were also relatively more abundant in EGFR-mutant tumors vs. wild-type (P = 0.0081). Conclusion Directly comparing the TIMEs of stage III and IV NSCLC, these results carry implications for further studies of ICB response in non-resectable stage III NSCLC and guide further research of prognostic biomarkers and therapeutic targets. Supplementary Information The online version contains supplementary material available at 10.1007/s00262-022-03252-y.


Introduction
The addition of adjuvant or consolidative immune checkpoint blockade (ICB) following concurrent chemoradiation was the key practice-changing development for locally advanced non-small cell lung cancer (NSCLC) in the last decade. Recently published 5-year survival results from the PACIFIC study demonstrate patients who initiate ICB early carry a significant risk reduction for 5-year overall survival [1]. Additionally, the IMpower010 and CheckMate 816 trials have shown survival benefit in even earlier-stage patients, where ICB plus chemotherapy was favorable compared to chemotherapy alone in patients with stage IB-IIIA NSCLC [2,3]. These data suggest an approach leveraging both ICB and chemotherapy in the first line may be more efficacious. Importantly, this combination strategy has significantly improved survival rates of patients with stage III or IV NSCLC. However, across both stages, we do not have a strong understanding of which tumor immune microenvironment (TIME) signatures are associated with response or resistance to ICBs. Furthermore, although ICB is a promising avenue for improving the treatment of stage III NSCLC, some clinical trials have shown negative results for primary endpoints and/or reported higher than expected rates of grade 3/4 adverse events, such as pneumonitis [4][5][6][7].
Several TIME mechanisms may contribute to response, resistance, or immune-related toxicities following ICB administration and help identify patient populations likely to experience durable response. The current standard in assessing eligibility for ICB treatment in NSCLC is levels of PD-L1 protein in the tumor, as commonly prescribed ICBs directly target PD-L1 or its receptor PD-1. While PD-L1 scores from immunohistochemistry (IHC) may enrich prediction of response to ICB monotherapy or combination therapy in advanced NSCLC, some evidence suggests responders are not limited to those with high PD-L1 expression [8]. For instance, a recent meta-analysis of 7,617 patients from ICB clinical trials concluded that a subset of patients defined as PD-L1 negative still benefit from ICBs [9]. Other FDA-approved biomarkers such as tumor mutational burden (TMB) and microsatellite instability (MSI) have shown only modest predictive ability, illustrating the clear, ongoing need for better biomarkers of ICB response in NSCLC [10]. TIME-related markers and genetic signatures increasingly have been associated with response to ICB and may offer more comprehensive biomarkers to account for the complexity of ICB treatment [8,[10][11][12][13].
An understanding of how TIMEs differ between locally advanced and metastatic NSCLC has not been thoroughly explored, and knowledge of those differences will be crucial to optimizing therapeutic efficacy. Furthermore, studies including large-scale, real-world data are needed to assess the molecular landscape of patients who may be ineligible for controlled trials due to restrictive inclusion criteria [14].
Combining RNA-seq, DNA-seq, IHC, and real-world clinical data from the Tempus Database, we retrospectively analyzed a cohort of pretreatment tumors from patients with unresectable stage III and IV non-squamous NSCLC. We provide a comparison of the TIME in stage III-IV NSCLC by using whole-transcriptome RNA-seq to measure relative gene expression levels, pathway enrichment, and immune infiltrate proportions. To our knowledge, this direct comparison of RNA data between locally advanced and metastatic NSCLC has yet to be described in the literature. In addition, we presented an evaluation of PD-L1 (CD274) levels from both IHC and RNA-seq data among the cohort. Together, this study supplies a comprehensive characterization of the immune landscape of unresectable stage III and stage IV NSCLC tumors and may guide future research of ICB biomarkers and novel drug discovery.

Cohort overview
A representative sample of 400 de-identified health records from patients with non-squamous, unresectable stage III-IV NSCLC was selected from the Tempus Database. The cohort was restricted to pretreatment tumors to reduce medicationassociated transcriptional variation and only included nonsquamous histology to limit molecular heterogeneity within stages. After applying inclusion criteria based on histology and biopsy procedure, where biopsies from the lung and/or airway were considered as primary and included (see Methods: Cohort Selection), the final stage III and IV cohorts contained 106 and 285 samples, respectively. Oncogenic driver frequencies, demographics, and clinical characteristics of the cohort are provided in Fig. 1 and Table 1. There were no significant differences in age at biopsy collection, sex, or smoking history between stage III and IV patients. Likewise, the frequencies of common oncogenic driver mutations were similar between stages. EGFR mutations were observed in 11% (n = 11) of stage III and 15% (n = 42) of stage IV tumors, similar to previous large-scale evaluations of real-world NSCLC data [15]. While TMB and tumor purity did not significantly differ between the two stages, EGFR-mutant tumors had a significantly lower TMB than EGFR wild-type (WT) tumors, reflecting previous reports [15,16]. To account for this difference and other anticipated EGFR-related effects, the cohorts were further divided by EGFR status for comparisons between EGFR-mutant (stage III n = 11 and stage IV n = 41) and WT tumors in addition to non-EGFR-stratified analyses. Other targetable alterations such as ALK fusions were not subset for analyses as they represented smaller portions of the cohort and could not be adequately powered for statistical comparisons.
After establishing concordance between PD-L1 TPS and CD274 expression in the combined cohort, we next compared PD-L1 TPS and CD274 expression by stage. There was no significant difference in PD-L1 TPS (Fig. 3a) or CD274 expression (Fig. 3b) between stage III and IV tumors. No significant differences by stage in either PD-L1 TPS or TMB were observed after stratifying by EGFR, STK11, KEAP1, or KRAS mutation statuses ( Supplementary Fig. 1).

Transcriptome-wide differential expression analysis between stage III and IV NSCLC tumors
We next characterized gene expression differences by stage across the transcriptome using a multivariable linear model. At a false discovery rate (FDR) of 5%, 205 genes were differentially expressed between stage III and IV tumors (Fig. 4a Table 4). Among the 1,526 differentially expressed genes identified, only 12 also significantly differed by stage (C1orf111, CCR4, POU5F1, CDCA7, CYP4A11, CD83, BTN2A2, PBXIP1, OSGEP, DYNLL1, CD1A, and PIM3), suggesting the expression differences identified between stage III and stage IV NSCLC tumors were not driven by EGFR-mutated tumors.

Immune checkpoint biomarker expression between stage III and IV NSCLC tumors
While CTLA4 was significantly differentially expressed by stage, no other selected ICB genes were (Supplementary  Table 3), nor was CTLA4 differentially expressed by EGFR (n = 53), KRAS (n = 160), STK11 (n = 59), or KEAP1 (n = 39) mutation status (Supplementary Table 4). The analysis of ICB gene expression by driver mutation status revealed mostly driver-specific differences. While CD276 was

Gene set enrichment analysis (GSEA) of differentially expressed genes in stage III and IV NSCLC
To further characterize the gene expression differences between stage III and stage IV tumors, we performed gene set enrichment analysis (GSEA) on all genes with significant differential expression using the Kyoto Encyclopedia of Genes and Genomes (KEGG), Hallmark, and Gene Ontology (GO) gene sets. In stage IV tumors, the activity of metabolic pathways was significantly enriched (P = 0.004) ( Fig. 5a-b). The herpes simplex virus 1 (HSV-1) pathway was also enriched in stage IV tumors (P = 0.014), which was mostly driven by expression of genes from the zinc finger protein (ZNF) family (Fig. 5c).
In contrast, several immune-related KEGG pathways were enriched in stage III tumors, including TNF signaling (P = 0.017), cytokine receptor interaction (P = 0.02),  (Fig. 5a). Underlying these pathways is a shared enrichment of several core pathway genes. Apart from the cytokine-cytokine receptor interaction pathway, all KEGG pathways enriched in stage III tumors shared relatively increased expression of genes from the JUN and FOS family. Meanwhile, IL-17 signaling, rheumatoid arthritis, and TNF signaling KEGG pathways were all linked to the cytokine-cytokine receptor interaction pathway through expression of CXCL2 (Fig. 5c). A similar enrichment of immune regulatory pathways in stage III tumors was observed following Hallmark and GO GSEA ( Supplementary Fig. 5 and Supplementary Fig. 6, respectively).

Differences in immune cell proportions estimated from RNA-seq data
As differences in the immune cell composition of the TIME have been previously shown to correlate with survival and ICB treatment efficacy in patients with NSCLC, we next estimated the relative immune cell proportions from RNAseq data in each tumor and compared between subsets of NSCLC tumors in the cohort [10,[17][18][19]. While overall estimated immune proportion did not differ by stage (Fig. 6) or EGFR status ( Supplementary Fig. 7), there were differences observed in the proportions of individual cell types. Stage IV tumors had a significantly increased proportion of macrophages relative to stage III (P = 0.0214). Besides macrophage infiltration, differences were measured in CD4 + T cell quantities present in the TIME of each clinical subset. The proportion of CD4 + T cells was significantly higher in stage III tumors compared with stage IV (P = 0.017) (Fig. 6), and CD4 + T cells were relatively more abundant in EGFRmutant tumors compared with WT (P = 0.0081) (Supplementary Fig. 7).

Discussion
As concurrent treatment with ICBs in the first line has become standard care for stage III and IV NSCLC, clarifying distinctions between the TIMEs of these two groups may provide a new perspective on earlier-stage disease characteristics and spur opportunities to improve therapeutic strategies for both stages. Here, we assessed the transcriptomic landscape of stage III and IV NSCLC to establish a basic understanding of differences between these two populations and observed a variety of distinctions in gene expression, pathway activation, and estimated immune infiltration. To our knowledge, this direct comparison of RNA data between stage III and IV NSCLC has yet to be described in the literature.
While transcriptomics is of interest for future clinical applications, the current standard ICB biomarker is PD-L1 IHC. In this dataset, CD274 gene expression from where a positive fold change (red) represents genes with increased expression in stage IV tumors, and negative fold change (blue) represents increased expression in stage III tumors. Dot size represents the number of genes evaluated in the pathway RNA-seq was highly correlated with PD-L1 TPS. Along with previously reported concordance between PD-L1 IHC and RNA-seq [20], these results suggest the viability of CD274 RNA assessment as a proxy biomarker for PD-L1 IHC. Gauging PD-L1 levels initially from RNA-seq may be beneficial for patients as sequencing can simultaneously measure immunological signals and molecular characteristics often obtained through multiple approaches [21], as illustrated in our subsequent analyses of gene expression, pathway signaling, and estimated immune cell proportions. Although companion diagnostic IHC tests currently do not incorporate PD-L1 expression on TILs, it should be noted that TIL PD-L1 IHC was not correlated with overall CD274 expression in this cohort. This is likely because the bulk RNA-seq data generated here were from a mixture of cells that were mainly tumor in origin. Future explorations of PD-L1 concordance with CD274 could incorporate spatial transcriptomics to ameliorate the issues with RNA-seq of mixed cell populations [22,23]. While the broad scope of whole-transcriptome RNA-seq is somewhat limiting in that sense [24], the same attribute is beneficial for unbiased large-scale assessments, which was highlighted in our differential expression analyses. Interestingly, CD274 (and, accordingly, PD-L1 IHC) expression was not significantly different between stage III and IV cancers, nor was the expression of PDCD1 or TIGIT. Despite these findings and the substantial risk associated with inoperable stage III NSCLC, the recent breakthrough designation for dual blockade of TIGIT/PD-L1 was restricted to stage IV patients [25]. Along the same lines, combination of PD-L1 blockade with the CTLA4 inhibitor ipilimumab is currently not approved for stage III NSCLC patients, but CTLA4 expression in this cohort was significantly higher in stage III tumors, although we acknowledge that expression of targets does not always correlate with more favorable responses and the potential for increased toxicity with dual CTLA4 blockade is a consideration. In addition to the differential expression observed in the RNAbased estimates of those clinically relevant biomarkers, we also identified differences in less substantiated therapeutic targets and prognostic markers. Stage III cancers had significantly higher expression of CXCL2, CD69, CCR4, and TNFRsf13c, all of which have been implicated as potential therapeutic targets in other malignancies [26][27][28][29][30], or in the case of CCR4, poor prognosis in lung cancer [31]. Conversely, TLR4 expression was significantly higher in stage IV samples, consistent with a previous finding that TLR4 overexpression was independently prognostic of poor overall and disease-free survival in NSCLC [32]. Considering that the differential expression patterns by stage described above were largely not observed in EGFR mutation status comparisons, these effects are likely independent from one another.
Along with differentially expressed genes identified, our evaluation of pathway activities also revealed distinctions between stage III and IV TIMEs. First, we observed significant enrichment of metabolic pathways in stage IV compared to stage III tumors. Increases in GLUL and ELOVL signaling were the most pronounced, which reflect increased metabolism of glutamine and lipids, respectively. These metabolic changes have been associated with cancer progression and suggest NSCLC metastasis is accompanied by an increased uptake in nutrients [33]. Stage III tumors had relatively increased activity in immune response pathways, on the other hand, especially those involved in T cell response. Again, many of the pathways enriched in stage III samples from this cohort have been implicated as potential therapeutic targets or prognostic indicators. IL-17, for example, is linked to NSCLC tumor progression via STAT3/NF-κB/Notch1 signaling in Th17 cells, and inhibition of the pathway was found to slow metastases [34,35]. MAPK activity, which was also enriched in stage III tumors here, has been implicated as a therapeutic target and linked to the promotion of metastasis by IL-17 [36][37][38]. Accordingly, we found that the MAPK and IL-17 signaling pathways were connected through interaction with Jun and Fos genes in stage III tumors, which were also significantly upregulated. Moreover, previous studies have observed that exhausted T cells have lower expression of Fos, Fosb, and Junb [39,40], supporting the conclusion that stage III tumors may have a more inflammatory (or at least less exhausted) TIME.
Lastly, estimates of immune cell proportions from RNAseq data in the two stages showed a similar level of immune infiltration overall but a few key differences between individual cell type quantities. Most notably, CD4 + T cell quantities, which are crucial for anti-tumor immunity, were estimated as significantly higher in stage III tumors [41]. In fact, a recent study found that CD4 + T cells can enhance the activity of CD8 + cells and are correlated with response to ICB treatment in NSCLC [42].
Due to the broad, retrospective nature of this study, there are limitations in the findings. While age and sex were included as covariates in the differential expression analyses, we could not account for similar features that may have affected PD-L1 levels in the PD-L1 IHC correlation analyses. Race and ethnicity, for example, were not included due to sparsity of metadata from patient records. Germline and somatic variation, along with epigenetic effects, may have also influenced correlation between CD274 expression and PD-L1 protein levels but were not considered here.
There are also several ICB-related factors to explore with next-generation sequencing that were not broached in these analyses. Relationships between circulating tumor cells [43], T cell receptor repertoires [44], and/or homologous recombination deficiency with ICB treatment have all been described [45], and further comparisons between stage III and IV TIMEs should consider these molecular characteristics. Regarding clinical characteristics, the data presented here are from pretreatment tumors. This selection was intentionally designed to control for treatment effects; however, many stage III-IV patients who receive ICB in real clinical scenarios have already undergone chemoradiotherapy. Given that neoadjuvant chemotherapy has been shown to activate immune response mechanisms in NSCLC [46], the same analysis in post-treatment tumors would be of benefit to the field. We also plan to extend the analyses conducted here to earlier stages, considering the recently published results in stage IA-IIIB patients [2,3]. Specifically, we intend to integrate our findings with outcomes and treatment data to develop predictive transcriptomic signatures of response and/or resistance to ICBs in earlier-stage NSCLC.
Although the ultimate goal of future analyses is to inform clinical practice, the findings here are intended to serve as a base for ICB biomarker investigations and demonstrate the scope of possible TIME assessments through RNA-seq. These results add to the literature by demonstrating a direct comparison between the TIME in stage III and IV NSCLC, carry implications for further studies of ICB response in non-resectable stage III NSCLC, and provide data to guide further research of prognostic biomarkers and potential therapeutic targets in both stages.

Cohort selection
From the Tempus Database, 400 de-identified stage III or stage IV NSCLC patient health records and tumor samples were selected. Patients with neuroendocrine or pseudosarcomatous carcinoma histologies were excluded. Biopsies originating from the lung and/or airway were considered primary and included in the cohort. To reduce transcriptional variation due to medication and/or inclusion of unwanted tissue, samples were excluded if patients received prior therapy or if the biopsy was from a fine-needle aspirate procedure. Tumor stage was determined using information recorded within 30 days of the biopsy collection date, leaving a total of 391 samples for further analysis. Patient demographics and characteristics are presented in Table 1.

Imputed smoking history
Recorded smoking status from clinical notes was missing in approximately 10% of patient records. To impute smoking status in these records, a transcriptome-based support vector machine (SVM) model was applied. The model was trained using 320 NSCLC patients with known smoking history. Using normalized counts of 18,570 genes as features, we selected the 500 most differentially expressed genes between known smokers (current smokers or ex-smokers) and nonsmokers. 'SelectKBest' from the 'sklearn' package in python was applied, using an ANOVA F-value as the metric. Our model was trained on an SVM with a linear kernel, and recursive feature elimination was applied to simplify. Cross-validation during recursive feature elimination was applied using the 'RFECV' function to select the best number of features. The final model achieved a cross-validation area under the curve (AUC) of 0.881 and a final validation AUC of 0.85.

PD-L1 IHC
PD-L1 percentages in tumor cells (tumor proportion scores) and TILs were assessed by the 22C3 IHC assay, either through internal Tempus testing or results abstracted from de-identified patient records.
Correlations between PD-L1 IHC and CD274 gene expression from RNA-seq were evaluated by the Spearman rank test. Stage-wise differences in mean values of PD-L1 IHC were assessed by the Mann-Whitney U test.

Tempus xT panel sample processing and nucleic acid extraction
Expert pathologist assessment of overall tumor content and percent tumor cellularity as a ratio of tumor to normal nuclei verified specimens met a 20% threshold. Macrodissection was utilized as required to enrich specimens below the 20% threshold. Solid-tumor total nucleic acid was extracted from FFPE tissue sections using Chemagic 360 sample-specific extraction kits (PerkinElmer) and digested by proteinase K. RNA was purified from the total nucleic acid by DNase-I digestion.

Tempus xT panel DNA and RNA library construction and sequencing
DNA sequencing of 648 genes and whole-transcriptome RNA sequencing were performed as previously described [47,48]. Briefly, 50-300 nanograms (ng) of DNA for each tumor sample was mechanically sheared to an average size of 200 base pairs (bp) using a Covaris Ultrasonicator. DNA libraries were prepared using the KAPA Hyper Prep Kit, hybridized to the xT probe set, captured using Streptavidincoated beads, and amplified with the KAPA HiFi Library Amplification Kit. For each tumor sample,100 ng of RNA was heat fragmented in the presence of magnesium to an average size of 200 bp. Library preps were hybridized to the xGEN Exome Research Panel v1.0 (Integrated DNA Technologies), and target recovery was performed using Streptavidin-coated beads, followed by amplification with the KAPA HiFi Library Amplification Kit. The amplified, target-captured DNA tumor library was sequenced using 2 × 126 bp paired-end reads to an average unique on-target depth of 500x (tumor) and 150x (normal) on an Illumina NovaSeq 6000. The amplified target-captured RNA tumor library was sequenced using 2 × 75 bp paired-end reads to an average of 50 million reads on an Illumina Novaseq 6000. Samples were further assessed for uniformity, with each sample required to have 95% of all targeted bp sequenced to a minimum depth of 300x.

Tumor mutational burden
TMB was calculated by dividing the number of nonsynonymous mutations by the megabase size of the panel. All nonsilent somatic coding mutations, including missense, indel and stop-loss variants with coverage greater than 100 × and an allelic fraction greater than 5%, were counted as nonsynonymous mutations.

RNA processing and analysis
Reads were pseudoaligned to hg19 using kallisto [49], and transcript counts were summed to genes for analysis. Genes located in mitochondrial DNA or the Y chromosome were removed, as were genes with < 1 counts per million in > 50% of patients, leaving 14,395 genes for downstream analysis. Data underwent trimmed means of M-value normalization (TMM) and voom transformation using the R packages edgeR and limma [50][51][52]. Principal components analysis (PCA) and Uniform Manifold Approximation and Projection (UMAP) using the R package umap were used on the normalized data to visualize the data and assess for potential confounders in the differential expression analysis.
Differentially expressed genes by stage were identified using limma, with age, sex, histology, tissue source, smoking history, and tumor purity included as covariates in a multivariable linear model. Significance was assessed using the methods of Benjamini and Hochberg after empirical Bayes moderation (function eBayes()) to better estimate gene-wise variability, and genes with a False Discovery Rate (FDR) < 5% were considered differentially expressed [53]. ICB gene list highlighted in supplementary tables is taken from supplementary table 1 from Auslander et al. [54].
The relative proportions of immune cell subtypes were estimated using a support vector regression (SVR) model, as previously described [48].

Statistical analysis
Differences in means were assessed using Student's t test, unless noted. Correlations between PD-L1 IHC and other features were calculated using Pearson's correlation. All differential expression results were determined using the R package limma in a linear regression approach (see Methods above). Figure 1 is generated using CoMut [57], while all other figures were generated with R package ggplot2 [56]. Consent for publication All analyses were performed using de-identified data and, as such, no patient personal information is presented in the study.
Ethical approval All analyses were performed using de-identified data; IRB exemption Pro00042950 was obtained from Advarra on April 15, 2020.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/.