Background

Combined targeted therapies are novel therapeutic approaches that act simultaneously on several specific molecular pathways in cancer and have a number of advantages over standard single-targeted agents [1, 2].

Several trials have shown the beneficial effect of epidermal growth factor receptor tyrosine kinase inhibitors (EGFR-TKIs) in patients with advanced non-small cell lung cancer (NSCLC) harboring activating EGFR mutations, leading to the adoption of EGFR-TKIs as standard treatment in this population [3, 4]. Preclinical studies suggested that combining an EGFR-TKI with an angiogenesis inhibitor (e.g. one targeting the vascular endothelial growth factor, VEGF) can have a synergistic effect [5, 6]. Recent clinical trials showed superior efficacy of the anti-angiogenic agent bevacizumab (B) combined with the TKI erlotinib (E), compared with E alone, in patients with EGFR-mutated tumors [7, 8]. More specifically, these trials showed that first-line treatment with BE improved progression-free survival (PFS), but not overall survival (OS), in patients harboring an EGFR driver mutation compared with E alone [7, 9].

In unselected patients with recurrent NSCLC, BE improved PFS compared with E alone but did not improve survival, suggesting moderate activity of BE [6]. As first-line therapy in unselected patients, BE achieved an overall response rate of 12% and a PFS of 3.5 months, again indicating moderate activity [10]. Despite the favorable toxicity profile of BE, these results are inferior to those of first-line chemotherapy or second-line immunotherapy. The SAKK 19/05 trial from the Swiss Group for Clinical Cancer Research showed that first-line BE followed by a chemotherapy regimen is feasible, with acceptable toxicity and activity, in an unselected advanced non-squamous NSCLC population [11]. On the other hand, the phase II TASK study did not show a PFS benefit of first-line BE compared with chemotherapy plus B in unselected patients with advanced non-squamous NSCLC [10, 12].

Although the presence of EGFR mutations is the strongest predictor of response to EGFR-TKIs, a recent meta-analysis showed that patients with wild-type EGFR can also benefit from these agents, with improved OS compared with placebo or standard chemotherapy [hazard ratio = 0.780 (95% CI 0.654–0.930)] [13]. Therefore, identifying very early predictive markers of multiple targeted therapies and understanding their mechanisms of action in advanced non-squamous NSCLC are of paramount importance in order to better identify subsets of patients who may still benefit from these treatments.

Blood-based biomarkers in NSCLC are of particular interest as they can be easily and non-invasively accessed [14]. Whole-genome exon arrays provide an ideal platform for the discovery of novel putative biomarkers by investigating expression variations at an exon-level resolution [15]. More specifically, exon arrays allow analyses both at the gene and at the exon level. Exon-level analyses are usually performed to detect alternative splicing events [16].

The aim of the current study was to analyze exon array profiling data obtained from the whole blood of unselected patients with advanced non-squamous NSCLC before and 24h after initiation of the combined targeted therapy BE. The specific objectives were twofold: (1) to identify which genes, measured from circulating RNA in whole blood, are immediately affected by the combined therapy BE, and (2) to assess the predictive value of these dysregulations.

Methods

Lung cancer dataset

The gene expression data set originated from a translational substudy of the phase II SAKK 19/05 trial from the Swiss Group for Clinical Cancer Research (ClinicalTrials.gov: NCT00354549). In the original study, 103 unselected patients with advanced non-squamous NSCLC were enrolled, of whom 101 were evaluable. The experimental design of the SAKK 19/05 trial is summarized in Fig. 1. Patients were treated with the combined targeted therapy BE until disease progression or unacceptable toxicity. At progression, standard platinum-based chemotherapy (CT) was given. The primary endpoint of this trial was disease stabilization 12 weeks after initiation of therapy (DS12). Secondary endpoints included tumor shrinkage at 12 weeks (TS12), time to progression under BE (TTPBE), time to progression under CT (TTPCT) and OS. Further details about this trial can be found in previous publications [11, 17, 18]. As part of a translational substudy, blood samples were taken at baseline and 24h after initiation of treatment for gene expression analysis in a subset of 49 patients. The current study was approved by the ethics committee of the canton of St. Gallen (EKSG 06/012).

Fig. 1

Treatment scheme of the SAKK 19/05 trial. Patients received BE until progression or unacceptable toxicity. Upon disease progression, patients received standard chemotherapy with cisplatin and gemcitabine

Exon array analysis

RNA was extracted from whole blood samples and quality checked. Six sample pairs had to be excluded from the analysis because of low RNA quality; the RNA extracts of the remaining 43 of 49 pairs were of sufficient quality for microarray hybridization. Messenger RNAs were hybridized on Affymetrix Human Exon 1.0 ST arrays (Affymetrix, Santa Clara, CA, USA) following the manufacturer’s standard recommendations. This microarray platform measures genome-wide exon-level expression in over 1.4 million probe sets and allows expression variations to be investigated both at the gene and at the exon level. For this analysis, 439,778 exonic probe sets (within 38,900 genes) were retained after filtering out intronic, intergenic and unreliable probe sets (according to the nomenclature defined in the R package annmap [19]). Raw data (Affymetrix CEL files) have been deposited in NCBI’s Gene Expression Omnibus (GEO) and are accessible through GEO Series accession number GSE61676. The exon-level probe sets were pre-processed, quality checked and normalized using the RMA procedure (including background correction, quantile normalization and median-polish summarization) as implemented in the R package oligo [20, 21].
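The normalization and summarization steps of RMA can be illustrated with a minimal pure-Python sketch. It is not the oligo implementation: background correction is omitted, ties in quantile normalization are broken by position, and the function names `quantile_normalize` and `median_polish` are hypothetical helpers. Quantile normalization gives every array the same intensity distribution (the mean of the sorted arrays); median polish then fits additive probe and array effects to the log2 intensities of one probe set and reports overall effect plus array effect as the summarized expression value.

```python
from statistics import median


def quantile_normalize(arrays):
    """Force every array (list of probe intensities) onto a common distribution.

    The reference distribution is the mean, across arrays, of the k-th smallest
    value; each probe then receives the reference value for its rank in its array.
    Ties are broken by position (a simplification).
    """
    n_probes = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    reference = [sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n_probes)]
    normalized = []
    for a in arrays:
        ranks = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for k, i in enumerate(ranks):
            out[i] = reference[k]
        normalized.append(out)
    return normalized


def median_polish(z, n_iter=10):
    """Tukey median polish of a probes x arrays matrix of log2 intensities.

    Fits z[i][j] ~ overall + row[i] + col[j] + residual and returns the
    per-array summary overall + col[j].
    """
    z = [row[:] for row in z]                    # work on a copy
    n_rows, n_cols = len(z), len(z[0])
    overall, row_eff, col_eff = 0.0, [0.0] * n_rows, [0.0] * n_cols
    for _ in range(n_iter):
        for i in range(n_rows):                  # sweep out row (probe) medians
            m = median(z[i])
            z[i] = [v - m for v in z[i]]
            row_eff[i] += m
        m = median(col_eff)                      # move median column effect to overall
        col_eff = [c - m for c in col_eff]
        overall += m
        for j in range(n_cols):                  # sweep out column (array) medians
            m = median(z[i][j] for i in range(n_rows))
            for i in range(n_rows):
                z[i][j] -= m
            col_eff[j] += m
        m = median(row_eff)                      # move median row effect to overall
        row_eff = [r - m for r in row_eff]
        overall += m
    return [overall + c for c in col_eff]
```

In RMA, the normalized probe intensities are log2-transformed before summarization, so `median_polish` here expects values that are already on the log2 scale.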

Statistical considerations

The current experimental design includes repeated measurements of genome-wide exon-level expression in 43 patients at two time points (baseline and 24h after initiation of therapy). The exon expression data are stored in a pair of fully matched tables. Dually constrained correspondence analysis (DCCA), an extension of the multivariate technique correspondence analysis, was used to investigate the 24h change in exon expression levels [16]. DCCA uses observation- and variable-wise linear constraints to take into account complex experimental designs, including within-patient repeated measurements and variables grouped into hierarchical levels (within-gene exonic structures). The theoretical scheme of DCCA as applied to our experimental design is summarized in Fig. 2. The mathematical underpinnings of DCCA are further described in Additional file 1. In the current work, DCCA was parametrized as follows: (1) the two fully matched tables of exon-level expression intensities at baseline and 24h were stacked observation-wise; (2) an observation-wise between-time constraint (baseline vs. 24h) was applied after partialling out the within-patient effect; (3) a variable-wise constraint indicating the within-gene exonic structure was applied.
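To make the three parametrization steps concrete, the following NumPy sketch chains them as weighted orthogonal projections of the correspondence analysis (CA) residual matrix. It is a simplified reading of the description above, not the algorithm of the dcca package described in Additional file 1: the function name `dcca_sketch`, the use of the standardized CA residuals, the mass-weighted projectors and the dense projection matrices (which would not scale to hundreds of thousands of probe sets) are all assumptions made for illustration.

```python
import numpy as np


def _projector(A):
    """Orthogonal projector onto the column space of A (A assumed full column rank)."""
    return A @ np.linalg.pinv(A)


def dcca_sketch(X, Y, genes):
    """Hypothetical, simplified dually constrained CA of two matched tables.

    X, Y  : (n_patients, n_exons) non-negative intensities at baseline and 24h,
            rows matched by patient.
    genes : gene label of each exon (column).
    Returns the gene names and their mean exon coordinate on the first axis.
    """
    n = X.shape[0]
    Z = np.vstack([X, Y])                          # (1) stack tables observation-wise
    P = Z / Z.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)            # row and column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))  # CA standardized residuals

    sr = np.sqrt(r)[:, None]
    patient = np.vstack([np.eye(n), np.eye(n)])    # rows i and n + i: same patient
    after = np.repeat([0.0, 1.0], n)[:, None]      # 24h indicator (baseline = 0)
    within = np.eye(2 * n) - _projector(sr * patient)
    S_within = within @ S                          # (2a) partial out the patient effect
    b = within @ (sr * after)                      # time contrast, patient effect removed
    S_time = _projector(b) @ S_within              # (2b) between-time constraint

    names, idx = np.unique(np.asarray(genes), return_inverse=True)
    C = np.sqrt(c)[:, None] * np.eye(len(names))[idx]   # mass-weighted gene indicator
    S_dual = S_time @ _projector(C)                # (3) within-gene (column) constraint

    _, s, Vt = np.linalg.svd(S_dual, full_matrices=False)
    exon_coord = Vt[0] * s[0] / np.sqrt(c)         # column principal coordinates, axis 1
    gene_score = np.array([exon_coord[idx == k].mean() for k in range(len(names))])
    return names, gene_score
```

On a toy pair of tables in which only the exons of one gene triple between baseline and 24h, that gene receives the largest absolute score on the first constrained axis; as with any singular value decomposition, the sign of the axis is arbitrary.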

Fig. 2

Design of experiment and scheme of dually constrained correspondence analysis. Two matched tables \(\mathbf {X}\) and \(\mathbf {Y}\) are analyzed by DCCA. The two tables are rearranged into one stacked table. Additional external information on both rows and columns is used as positive and/or negative constraints

Generalized linear (mixed effects) models were used to test the predictive value of the identified biomarkers. Binary endpoints were tested using logistic regression, time-to-event data were modeled using Cox proportional hazards regression, and continuous variables were modeled using (multiple) linear regression. Mixed effects modeling was used when testing associations under the within-patient repeated measurement design. Adjustment for the patients’ EGFR mutational status was done by including the mutational status as a covariate in the predictive models. The significance level for the discovery of putative predictive markers was set to 0.001. Gene signatures combining the information of the best predictive candidates were built using the metagene approach [22]. In this approach, a linear combination of several genes is calculated as follows: (1) the matrix of normalized gene expression intensities for all patients at a given time point (rows) and all candidate genes (columns) is analyzed using unscaled principal component analysis (PCA); (2) the row coordinates (scores) on the first PCA axis, which summarizes the largest amount of variance, are extracted (metagene score); (3) patients are dichotomized at the median metagene score into low- versus high-risk groups.
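The three metagene steps map onto a few lines of NumPy. This is a minimal sketch under stated assumptions: the function name `metagene` is hypothetical, patients are rows and candidate genes are columns, “unscaled” is read as centering each gene without scaling it to unit variance, and the orientation of the first axis (which side of the median counts as high risk) is not specified in the text and is therefore left to the caller.

```python
import numpy as np


def metagene(expr):
    """Metagene score and median split for a (patients x candidate genes) matrix.

    expr holds normalized log2 intensities at one time point. Unscaled PCA:
    genes are centered but not standardized, so high-variance genes weigh more.
    """
    centered = expr - expr.mean(axis=0)
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    score = centered @ Vt[0]             # row scores on the first PCA axis
    # The sign of a principal axis is arbitrary; deciding which side of the
    # median is "high risk" (e.g. from outcome data) is left to the caller.
    above_median = score > np.median(score)
    return score, above_median
```

The two groups defined by `above_median` are the ones that would then be compared with a Cox model or a log-rank test.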

Computational considerations

All analyses were implemented in the R statistical software [23], including the extension package ade4 [24] and dedicated packages from the Bioconductor project [25], such as oligo for microarray preprocessing [21] and the annotation package annmap (Homo sapiens database version 86). In principle, DCCA can be carried out with functions available in ade4 (including dudi.coa, wca, bca, pcaiv and pcaivortho). However, because of the large size of exon array datasets, the DCCA algorithm was substantially optimized for computational efficiency and made available as a new R extension package, dcca (see Additional file 1). Hypothesis testing for the identification of predictive biomarkers was carried out using the R packages lme4, coxme and multcomp.

Gene functional annotations and validations

Gene set enrichment analysis was performed by interrogating the molecular knowledge databases Kyoto Encyclopedia of Genes and Genomes (KEGG) [26] and WikiPathways [1] using the functional annotation web service WebGestalt [27]. Enrichment analyses were based on the list of the 100 most dysregulated genes (as identified by DCCA), as well as on the lists of genes that significantly predicted patient outcome for each investigated endpoint. The significance of enrichment was assessed using hypergeometric tests. Validation was carried out using the lung cancer data set of the Kaplan–Meier Plotter (KMplotter) web tool [28]. KMplotter is a manually curated database including gene expression information for more than 50,000 Affymetrix probe set IDs together with associated clinical information; the prognostic value of single or multiple genes can be assessed with regard to relapse-free and overall survival. A further independent lung cancer dataset was used for external validation. This gene expression microarray dataset includes 85 lung adenocarcinoma tumor samples and is part of the program “Carte d’Identité des Tumeurs” (CIT) of the French National League Against Cancer [29]. Samples were profiled using the Affymetrix Human Genome U133 Plus 2.0 Array, and raw data are available in NCBI’s Gene Expression Omnibus through GEO Series accession number GSE30219. Finally, the results were discussed in light of the published literature.
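The hypergeometric test asks how surprising the overlap between a gene list and a pathway is under random sampling without replacement. The following Python sketch computes the one-sided p-value exactly with `math.comb`; the function name and arguments are hypothetical, and WebGestalt’s choice of reference gene set and any multiple-testing correction are not described in the text and are not reproduced here.

```python
from math import comb


def enrichment_p(hits, list_size, pathway_size, universe_size):
    """One-sided hypergeometric p-value P(X >= hits).

    X is the number of pathway genes in a list of `list_size` genes drawn
    without replacement from `universe_size` annotated genes, of which
    `pathway_size` belong to the pathway.
    """
    total = comb(universe_size, list_size)
    upper = min(list_size, pathway_size)
    tail = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, list_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total              # exact integer arithmetic until this division
```

For example, `enrichment_p(5, 100, 80, 20000)` would give the probability of observing at least 5 genes of an 80-gene pathway among 100 genes drawn from a 20,000-gene universe (illustrative numbers, not taken from this study).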

Results

Patient characteristics

The characteristics of the 43 patients are reported in Table 1. All patients had late-stage non-squamous NSCLC (91% stage IV, 9% stage IIIb). Five of the 43 patients had demonstrable EGFR mutations: one in exon 18 (E709A-G719S), three in exon 19 (Del L747-G749, Del E746-A750 and R748-S752) and one in exon 21 (L858R). The median age was 61 years (IQR 54–66), and 44% of the patients were male (19 males/24 females). Disease stabilization at 12 weeks was reached in 53% of patients. The median tumor shrinkage at 12 weeks was 15.8%. The median overall survival was 11.1 (95% CI 10.1–17.9) months. The median time to progression under BE was 4.0 (95% CI 2.8–6.0) months, whereas the median time to progression under CT was 2.6 (95% CI 1.7–5.7) months.

Table 1 Patient characteristics

24h gene dysregulation

The 100 genes most strongly dysregulated (54 up-regulated vs. 46 down-regulated) by the 24h effect of BE are summarized in Table 2. These genes were involved in a broad range of aspects of tumor biology. They include genes involved in mitosis and cell cycle processes, such as cancer susceptibility candidate 5 (CASC5), encoding a protein that influences the spindle assembly checkpoint during the eukaryotic cell cycle; centromere-associated protein E (CENPE), which accumulates and plays a key stabilizing role during mitosis; protein furry homolog (FRY), which plays a crucial role in the structural integrity of mitotic centrosomes; kinetochore associated 1 (KNTC1), encoding a protein that ensures proper chromosome segregation during cell division; and phospholipase D1 (PLD1), involved in cancer progression [30] and in the regulation of mitosis in relation to the “Ras signaling pathway” and “Pathways in cancer”. Other dysregulated genes were involved in energy-dependent metabolism (ATPase/GTPase activity). These include ATPase family AAA domain containing 2 (ATAD2), which is related to the gastric cancer network and may play an important role in cell proliferation and cell cycle progression of breast cancer cells; guanylate binding protein 4 (GBP4), related to GTPase activity and associated with the interferon signaling pathway; and dedicator of cytokinesis 10 (DOCK10), acting on GTPase and related to hemostasis and to the regulation of cell division cycle 42 (CDC42) activity.
Mechanisms of cell migration are also controlled by a series of dysregulated genes, such as ADAM metallopeptidase domain 19 (ADAM19), which regulates cell migration, cell adhesion and cell-cell/cell-matrix interactions and is thought to play an important role in pathological processes including cancer; BMX non-receptor tyrosine kinase (BMX), encoding a protein implicated in several signal transduction pathways that regulate the tumorigenicity of cancer cells; epidermal growth factor receptor pathway substrate 8 (EPS8), encoding a protein that functions in the EGFR pathway and is related to the Tyrosine Kinases/Adaptors and Development FGFR signaling pathways; Fms-related tyrosine kinase 3 (FLT3), encoding a class III receptor tyrosine kinase that regulates hematopoiesis and whose action is related to apoptosis, proliferation and differentiation processes; interleukin 1 receptor, type II (IL1R2), controlling many cellular functions including proliferation, differentiation and cell survival/apoptosis; and matrix metallopeptidase 9 (MMP9), involved in tissue/matrix remodeling and playing a central role in cell proliferation, migration and differentiation, whose altered expression in a number of human cancers is associated with poor prognosis. Dysregulated genes involved in apoptosis included baculoviral IAP repeat containing 3 (BIRC3), encoding an inhibitor of apoptosis protein that modulates tumor cell death; death-associated protein kinase 2 (DAPK2), whose overexpression was shown to induce apoptosis; retinoblastoma-like 1 (RBL1), encoding a tumor suppressor protein involved in cell cycle regulation; and insulin-like growth factor 1 receptor (IGF1R), encoding a growth factor receptor with tyrosine kinase activity that has an anti-apoptotic effect and is highly overexpressed in most malignant tissues [31].

Table 2 List of the 100 most dysregulated genes due to the 24h effect of BE

Based on the 100 best candidates identified by DCCA, three main pathways (according to the KEGG functional annotation database) were significantly altered by the 24h effect of the combined therapy BE: Hematopoietic cell lineage (KEGG pathway hsa04640; \(p=0.0094\)); ABC transporters (KEGG pathway hsa02010; \(p=0.0085\)); Pathways in cancer (KEGG pathway hsa05200; \(p=0.0204\)).

Figure 3 displays the expression levels (baseline and 24h) of each gene within these dysregulated pathways. Figure 4 shows the detailed KEGG map of “Pathways in cancer”, highlighting the down-regulated genes within the sub-pathways “Cytokine–cytokine receptor interaction” and “MAPK signaling pathway”. The down-regulation of these sub-pathways contributes to limiting cell proliferation.

Fig. 3

Boxplot representation of the gene expression levels (logarithm base 2 normalized intensity) before (B) and 24h after initiation of bevacizumab/erlotinib in the KEGG pathways “Hematopoietic cell lineage” (hsa04640), “ABC transporters” (hsa02010), and “Pathways in cancer” (hsa05200). The genes depicted in this representation belong to the list of the 100 most dysregulated genes

Fig. 4

KEGG pathway hsa05200 “Pathways in cancer”. Genes highlighted in red and green were up-regulated and down-regulated due to the 24h action of bevacizumab/erlotinib, respectively

These pathway dysregulations are consistent with the expected effects of both erlotinib (“Pathways in cancer”; “ABC transporters”) and bevacizumab (“Hematopoietic cell lineage”). “Pathways in cancer” is a generic pathway that includes genes involved in various aspects of tumorigenesis, such as proliferation, invasion, resistance and apoptosis. Several diseases, including non-small cell lung cancer, are linked to this pathway. Most genes from “Pathways in cancer” (FLT3, IGF1R, DAPK2, PLD1 and MMP9) were down-regulated by BE, resulting in an anti-proliferative and pro-apoptotic action of the combined therapy, with the notable exception of the up-regulation of the apoptosis inhibitor BIRC3. All genes of the “Hematopoietic cell lineage” pathway found among the 100 most dysregulated genes (FLT3, IL1R1, IL1R2, MME) were down-regulated by the combined action of BE. Hematopoietic stem cells play important roles in angiogenesis [32], and the down-regulation of genes within the “Hematopoietic cell lineage” pathway may be related to the specific anti-angiogenic action of bevacizumab [33, 34]. On the other hand, the dysregulation of genes belonging to the “ABC transporters” pathway is probably associated with the energy-related mechanisms of action of erlotinib [35, 36].

Gene set enrichment analysis based on the WikiPathways database identified additional cancer- and energy-related pathways, including “gastric cancer network 2” (WP2363, \(p=0.010\)), “IL1 megakaryocytes in obesity” (WP2865, \(p=0.006\)), “apoptosis modulation and signaling” (WP1772, \(p=0.011\)) and “gastric cancer network 1” (WP2361, \(p=0.009\)).

Gene expression predictive value

The predictive value of the genes most strongly dysregulated by the 24h effect of BE was investigated for the following endpoints: disease stabilization at 12 weeks, tumor shrinkage at 12 weeks, time to progression under BE, time to progression under chemotherapy, and overall survival. After adjustment for the patients’ EGFR mutational status, the magnitude of the 24h change in expression of the 100 most dysregulated genes was not significantly associated with any of the investigated endpoints.

Table 3 summarizes the predictive value of blood gene expression at baseline and 24h after initiation of BE. Putative baseline markers predicting overall survival included cancer susceptibility candidate 1 (CASC1). At baseline, 142 genes were identified as putative predictive markers of tumor shrinkage at 12 weeks; among those genes, the KEGG pathways “Phagosome” (hsa04145) and “Protein digestion and absorption” (hsa04974) were significantly enriched.

Table 3 Putative predictive markers of the patient’s response to bevacizumab/erlotinib

Putative predictive markers of TTPBE based on blood gene expression 24h after initiation of treatment included four genes from the pathway “Pathways in cancer”: E2F transcription factor 1 (E2F1), RAD51 recombinase (RAD51), junction plakoglobin (JUP) and microphthalmia-associated transcription factor (MITF). E2F1 plays a critical role in the control of the cell cycle and acts as a tumor suppressor; it has been associated with resistance to targeted therapy in breast cancer [37]. “Pathways in cancer” was also significantly enriched among the 24h predictors of TS12, which included frizzled class receptor 5 (FZD5) and epidermal growth factor (EGF). EGF encodes a protein that plays an important role in cell growth, proliferation and differentiation and binds the epidermal growth factor receptor with high affinity; its dysregulation has been associated with cancer progression [38]. Other pathways associated with the 24h predictive markers of TTPBE included the cancer-related pathways “Melanoma”, “Pancreatic cancer” and “PPAR signaling pathway”, as well as the metabolism-related pathways “Arginine and proline metabolism” and “Pyrimidine metabolism”.

All putative predictive markers of TTPBE at 24h were combined into a 91-gene metagene. Dichotomizing patients at the median metagene score significantly separated low-risk from high-risk patients (HR 4.93 [95% CI 2.34 to 10.39], log-rank test: \(p < 0.001\)) (Fig. 5, left panel). The median TTPBE was 2.46 (95% CI 1.54–3.22) months in the high-risk population vs. 6.87 (95% CI 4.14–13.31) months in the low-risk population. This finding was validated using the KMplotter web tool (Fig. 5, central panel) and the external CIT validation dataset (Fig. 5, right panel). The predictive value of the metagene remained significant after adjustment for the patients’ mutational status (Cox proportional hazards regression: HR 2.63 [95% CI 1.87–3.70], \(p < 0.001\)). An illustration of responder selection based on the metagene score is provided in the Results section of Additional file 1.
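The log-rank comparison of the two metagene groups can be written out in a few lines. The following Python sketch implements the standard two-sample log-rank statistic; the function name is hypothetical, it does not reproduce the reported hazard ratios (which come from Cox regression), and the exact software options used in the study are not stated in the text.

```python
from math import erfc, sqrt


def logrank_test(times, events, group):
    """Two-sample log-rank test.

    times  : follow-up times (e.g. TTPBE in months)
    events : 1 if progression was observed, 0 if censored
    group  : 1 for the high-risk group, 0 for the low-risk group
    Returns (chi-square statistic with 1 df, p-value).
    """
    data = list(zip(times, events, group))
    observed_minus_expected, variance = 0.0, 0.0
    for t in sorted({ti for ti, ei, _ in data if ei}):
        at_risk = [g for ti, _, g in data if ti >= t]
        n, n1 = len(at_risk), sum(at_risk)
        d = sum(1 for ti, ei, _ in data if ti == t and ei)
        d1 = sum(1 for ti, ei, g in data if ti == t and ei and g == 1)
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:  # hypergeometric variance of d1 at this event time
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = observed_minus_expected ** 2 / variance
    return chi2, erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
```

The p-value uses the identity P(χ²₁ > x) = erfc(√(x/2)), which avoids a dependency on a statistics library.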

Fig. 5

Metagene classifier of time-to-progression under BE. The left panel displays the classification of low- versus high-risk patients based on the 91-gene metagene. The central panel shows the classification obtained by the KMplotter online validation tool using a multigene classifier. The right panel shows the classification obtained in the external CIT validation dataset. Hazard ratios and log-rank test p values are reported in the upper right corner of each panel

Discussion

The analysis of the immediate effect of BE in late-stage non-squamous NSCLC revealed a series of important mechanisms dysregulated by the combined action of both therapies, involving processes such as apoptosis evasion, anti-proliferation and anti-angiogenesis. Interestingly, these dysregulations could be detected directly in the blood, showing that potential biomarkers can be identified at the blood level. The changes measured in the blood over a short period (24h) were small in magnitude, yet consistent among patients. The within-patient design, with two time points before and after treatment, helped characterize these gene expression variations despite the relatively small sample size.

The choice of the multivariate method DCCA over more common gene-by-gene approaches was driven by the fact that DCCA addresses the identification of differentially expressed genes (in within-patient repeated measures designs) in a multivariate manner. This is more satisfactory since it accounts for potential gene correlations/interactions in a single, computationally efficient procedure. DCCA is an exploratory method appropriate for the purpose of the current hypothesis-generating translational study. On the other hand, gene-by-gene approaches are simple and flexible and could be preferred for more complex designs or for confirmatory analyses.
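For comparison, the simplest gene-by-gene alternative for this within-patient design is a paired test per gene. The Python sketch below computes a paired t statistic on log2 expression values and ranks genes by its absolute value; the function names are hypothetical, and p-values (t distribution with n - 1 degrees of freedom, e.g. via scipy.stats.ttest_rel) and multiple-testing correction are omitted.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(baseline, after):
    """Paired t statistic for one gene measured twice in the same patients."""
    diffs = [a - b for a, b in zip(after, baseline)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def rank_genes(baseline_by_gene, after_by_gene):
    """Rank genes by |t|; inputs map gene name -> per-patient log2 values."""
    stats = {g: paired_t(baseline_by_gene[g], after_by_gene[g]) for g in baseline_by_gene}
    return sorted(stats, key=lambda g: abs(stats[g]), reverse=True)
```

Each gene is tested in isolation here, which is precisely the loss of gene-gene correlation information that motivated the multivariate DCCA approach.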

Although the magnitude of change of the most dysregulated genes over 24h was not predictive of patient outcome, gene expression levels at both baseline and 24h revealed a series of putative predictive genes. While DS12 was defined as the primary clinical endpoint of the original SAKK 19/05 trial, the current translational substudy focused more specifically on endpoints reflecting the activity of the treatment on the disease. TTPBE and TS12 are two endpoints directly related to the effect of BE. For both, a series of key predictive markers at 24h were enriched within the KEGG pathway “Pathways in cancer”. This pathway therefore appears to play an important role both in the immediate effect of BE, as measured in the blood, and in the prediction of the response to BE.

Our findings were validated using two independent datasets (the meta-analysis from the KMplotter web tool and the external CIT validation dataset). Combining the key 24h predictive markers of TTPBE into a metagene produced a gene signature that significantly separated patients into high- and low-risk populations. This validated signature could be used independently of the patient’s EGFR mutational status for proper patient selection.

Because our gene signature is independent of the patient’s mutational status, it can be used as a predictive marker in both EGFR-mutated and wild-type populations. BE has the potential to become a standard therapy in NSCLC patients with EGFR mutations, and our signature may help identify patients who may not respond to the therapy despite the presence of the mutation. Conversely, our signature may be useful for selecting BE responders among patients without activating EGFR mutations.

Our exon array-based findings are exploratory in nature, and prospective confirmatory studies are needed to further validate their clinical relevance.

Conclusion

The 24h effect of BE could be accurately monitored in peripheral blood using exon array technology. Genes affected by the immediate effect of BE belonged to key signaling pathways, consistent with the expected mechanisms of action of both bevacizumab and erlotinib. Although the magnitude of change over 24h had no predictive value for the investigated endpoints, the blood gene expression level measured 24h after initiation of BE could be used to predict TTPBE independently of the patient’s mutational status. Blood-level gene expression could therefore support the selection of responders to the combined targeted therapy BE.