Background

Lung cancer is the leading cause of cancer death [1]. Approximately 85 % of diagnosed patients have non-small cell lung cancer (NSCLC), and adenocarcinoma is the most frequent histological type. Despite innovations and progress in diagnosis and treatment, 5-year overall survival remains approximately 15 %, reflecting high mortality [2]. Tobacco smoking is the main risk factor for lung cancer; other factors include pulmonary tuberculosis, genetic susceptibility and exposure to secondhand smoke, asbestos and radon [3]. In Mexico, the crude mortality rate of lung cancer is 6.68 per 100,000 individuals, representing nearly 9000 cases per year, most of them with metastatic disease at diagnosis [4, 5].

Currently, about 15 % of lung cancers in men and 53 % in women are not attributable to smoking [6]. Moreover, as a result of tobacco control policies, the proportion of non-smoking patients with lung cancer is expected to increase in the coming years. According to US cancer statistics, lung cancer death rates declined by 36 % among males from 1990 to 2011 and by 11 % among females from 2002 to 2011, owing to reduced tobacco use resulting from increased awareness of the health hazards of smoking and the implementation of comprehensive tobacco control [2]. A doubling in the annual incidence of lung cancer in never smokers has been reported, and non-smoker NSCLC patients tend to be female and young [7, 8]. Regarding mortality, lung cancer death rates among never smokers are higher in men than in women, and a large fraction of cases have no identified risk factor [9]. Meanwhile, former smokers have an increased risk of lung cancer, but their cumulative risk decreases with earlier smoking cessation compared with smokers who continue smoking [10].

Chronic wood smoke exposure (WSE) is related to obstructive pulmonary disease in developing, European and American countries [11, 12]. Wood dust has also been identified as a human carcinogen and a risk factor for lung cancer [3, 13]. Wood smoke byproducts such as benzene, 1,3-butadiene, formaldehyde and acetaldehyde are well-known carcinogens [14]. WSE for more than 50 years has been associated with an increased risk of lung cancer compared with pulmonary tuberculosis, interstitial lung disease and various other pulmonary conditions (OR: 1.9; 95 % confidence interval (CI): 1.1–3.5) after adjustment for age, education, socioeconomic status and tobacco smoke exposure [13]. In Mexico, approximately 16 % of the population is exposed long term to wood smoke from residential heating and/or cooking, and 30 % of lung cancers are associated with WSE [5, 15]. Molecular assays have shown up-regulation and phosphorylation of p53 in WSE-related lung cancer [16]. Moreover, WSE is associated with macrophage dysfunction and increased activity of matrix metalloproteinases such as MMP-2 and MMP-9, which could be related to lung injury in chronic obstructive pulmonary disease and may play a role in the pathophysiology of lung cancer [17].

Ethnic origin and different risk factors for lung cancer might explain the distinct mutation profiles observed, for example, for epidermal growth factor receptor (EGFR) and KRAS among Asian, Caucasian and Latin American patients [18–20]. Our group previously reported a high treatment response rate and a better outcome in patients with WSE-related lung cancer treated with EGFR tyrosine kinase inhibitors (TKIs) [21]. We have further described that, compared with lung cancer in patients with a smoking history, WSE-related lung cancer is associated with an older age at diagnosis, adenocarcinoma histology, pleural effusion, a high prevalence of EGFR mutations (55.4 %) and a low prevalence of KRAS mutations (6 %) [15]. These findings indicate clear differences in the molecular and clinical evolution of WSE-related lung cancer compared with tobacco-associated lung cancer.

To further analyze the molecular differences observed in WSE-related lung cancer, the objective of our work was to compare the gene expression profiles of lung adenocarcinoma in patients with WSE or a smoking history.

Methods

Experimental design

This was a clinical, prospective, longitudinal, observational and analytical cohort study with non-probabilistic sampling. The protocol was approved by the Scientific and Bioethics Committees of the Instituto Nacional de Cancerología (INCan, 008102510M1, CB451).

Patients and tissue samples

Patients admitted to INCan with a pulmonary lesion suggestive of primary lung cancer were prospectively biopsied from January 2008 to June 2011. After informed consent was obtained, tissue was collected from the clinically suspected primary tumor by computed tomography-guided Tru-Cut needle biopsy (CareFusion, San Diego, CA, USA). Data were excluded from the analysis if there was no histological diagnosis, if a different type of primary cancer was present, or if the pathology report indicated a histology other than lung adenocarcinoma. Patients with histologically confirmed advanced lung adenocarcinoma (stages IIIB and IV) were eligible for inclusion in the study (Fig. 1).

Fig. 1

CONSORT flow diagram

A complete medical history, including a detailed history of smoking and wood smoke exposure, was obtained, and a physical examination was performed. Tumor specimens were collected at the time of diagnosis. WSE was defined as exposure to fumes from burning wood in fireplaces and wood stoves for ≥ 4 h per day for ≥ 5 years. The WSE index was calculated as the average number of hours spent cooking daily multiplied by the total number of years spent cooking [22]. A smoker was defined as someone who had smoked more than 100 cigarettes in their lifetime [6]; the tobacco-smoking index was calculated by multiplying the number of cigarette packs consumed per day by the number of years spent smoking [15].
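The exposure definitions and indices above reduce to simple arithmetic. The sketch below assumes that the WSE index is a product of daily hours and years (hour-years), by analogy with the pack-year construction of the tobacco index; the record layout, field names and the `exposure_group` helper that assigns patients to the exclusive-exposure groups are hypothetical illustrations, not the software used in this study.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExposureHistory:
    """Hypothetical per-patient exposure record (illustrative field names)."""
    wood_hours_per_day: float = 0.0   # average daily hours of wood smoke exposure
    wood_years: float = 0.0           # total years of wood smoke exposure
    packs_per_day: float = 0.0        # cigarette packs smoked per day
    smoking_years: float = 0.0        # total years of smoking
    lifetime_cigarettes: int = 0      # cigarettes smoked over the lifetime


def wse_index(h: ExposureHistory) -> float:
    """WSE index in hour-years: daily hours x years (assumed reading of the definition)."""
    return h.wood_hours_per_day * h.wood_years


def tobacco_index(h: ExposureHistory) -> float:
    """Tobacco-smoking index in pack-years: packs per day x years smoked."""
    return h.packs_per_day * h.smoking_years


def has_wse(h: ExposureHistory) -> bool:
    """WSE definition from the Methods: >= 4 h per day for >= 5 years."""
    return h.wood_hours_per_day >= 4 and h.wood_years >= 5


def is_smoker(h: ExposureHistory) -> bool:
    """Smoker definition from the Methods: more than 100 lifetime cigarettes."""
    return h.lifetime_cigarettes > 100


def exposure_group(h: ExposureHistory) -> Optional[str]:
    """Hypothetical helper: 'WSE' or 'tobacco' only for exclusive exposure, else None."""
    wse, smoker = has_wse(h), is_smoker(h)
    if wse and not smoker:
        return "WSE"
    if smoker and not wse:
        return "tobacco"
    return None  # mixed exposure or no qualifying exposure


if __name__ == "__main__":
    cook = ExposureHistory(wood_hours_per_day=5.0, wood_years=30.0)
    smoker = ExposureHistory(packs_per_day=1.5, smoking_years=20.0,
                             lifetime_cigarettes=219000)
    print(wse_index(cook), exposure_group(cook))          # 150.0 WSE
    print(tobacco_index(smoker), exposure_group(smoker))  # 30.0 tobacco
```

Returning `None` for mixed exposure mirrors the study design, in which only patients with an exclusive history of either exposure were analyzed.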

RNA isolation and RNA preparation for microarrays

Core biopsies of the primary tumor were obtained before any treatment and snap-frozen in liquid nitrogen for RNA extraction. A trained pathologist confirmed the histological diagnosis and quantified the percentage of tumor cells.

Total RNA was extracted and purified from tissue (up to 5 mg) using the RNeasy Micro Kit (QIAGEN, Germany; cat. 217084). RNA integrity was evaluated by capillary electrophoresis on an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA). Samples with an RNA integrity number (RIN) of six or higher were included in the microarray analysis.

RNA amplification and expression microarray analysis

Gene expression was analyzed with the Affymetrix GeneChip® Human Gene 1.0 ST Array System, which measures the expression of 28,869 genes. Samples were processed according to the manufacturer’s instructions.

Strategy for microarray gene-expression analysis

Statistical analysis of differential expression was conducted using R and Bioconductor. Background correction for non-specific hybridization was performed with the Robust Multiarray Average (RMA) method [23], which uses a statistical model that assumes both additive and multiplicative noise components. After background correction of the individual probes, quantile normalization [24] was applied; both steps are implemented in the oligo package [25]. The normalized and corrected probes were then summarized into probe sets using the median polish algorithm, a robust two-way analysis of variance in which one factor is the array and the other is the probe. The algorithm is robust to outliers, so that single probes with extreme values are down-weighted. Batch correction was also applied to all samples using ComBat from the sva package [26]. Differential expression was identified through the linear models implemented in the limma package [27], and genes were considered significant according to two summary statistics: a p-value < 10⁻² and an absolute fold change larger than 1.2.
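To make the normalization and summarization steps concrete, the NumPy sketch below re-implements two of them on a toy scale: quantile normalization of a probes × arrays intensity matrix, and Tukey's median polish of one probe set on the log2 scale, following the same sweep order as R's `medpolish`. It is an illustration under stated simplifications, not the oligo implementation: RMA background correction, ComBat and limma are omitted, ties in quantile normalization are broken by position, and the function names (`quantile_normalize`, `median_polish`, `rma_like_summary`) are our own.

```python
import numpy as np


def quantile_normalize(x: np.ndarray) -> np.ndarray:
    """Give every column (array) of a probes x arrays matrix the same distribution.

    Each value is replaced by the mean, across arrays, of the values with the
    same rank; ties are broken by position (a simplification).
    """
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, axis=0)                 # per column: row indices in ascending order
    reference = np.sort(x, axis=0).mean(axis=1)   # mean of the k-th smallest values
    out = np.empty_like(x)
    for j in range(x.shape[1]):
        out[order[:, j], j] = reference           # put reference values back in rank order
    return out


def median_polish(y: np.ndarray, max_iter: int = 10, eps: float = 0.01) -> np.ndarray:
    """Tukey's median polish of one probe set (probes x arrays, log2 scale).

    Fits y[i, j] = overall + probe[i] + array[j] + residual[i, j] by alternately
    sweeping out row (probe) and column (array) medians, and returns the
    per-array expression summaries overall + array[j]. Using medians makes the
    fit resistant to single outlying probes.
    """
    resid = np.asarray(y, dtype=float).copy()
    n_probes, n_arrays = resid.shape
    overall, probe, array = 0.0, np.zeros(n_probes), np.zeros(n_arrays)
    old_sum = 0.0
    for _ in range(max_iter):
        row_med = np.median(resid, axis=1)        # sweep probe effects
        resid -= row_med[:, None]
        probe += row_med
        delta = np.median(array)
        array -= delta
        overall += delta
        col_med = np.median(resid, axis=0)        # sweep array effects
        resid -= col_med[None, :]
        array += col_med
        delta = np.median(probe)
        probe -= delta
        overall += delta
        new_sum = np.abs(resid).sum()
        if new_sum == 0 or abs(new_sum - old_sum) < eps * new_sum:
            break                                 # converged
        old_sum = new_sum
    return overall + array


def rma_like_summary(raw: np.ndarray, probeset_ids) -> dict:
    """Quantile-normalize raw intensities, log2-transform, then median-polish each probe set."""
    log_norm = np.log2(quantile_normalize(raw))
    ids = np.asarray(probeset_ids)
    return {pid: median_polish(log_norm[ids == pid]) for pid in np.unique(ids)}


if __name__ == "__main__":
    probe_eff = np.array([0.0, 1.0, -1.0])
    array_eff = np.array([0.0, 0.5, 1.0, 2.0])
    y = 5.0 + probe_eff[:, None] + array_eff[None, :]         # purely additive toy probe set
    print(np.allclose(median_polish(y), 5.0 + array_eff))    # True
```

On purely additive data the polish recovers the array effects exactly; with real data, a single aberrant probe shifts a median far less than it would shift a mean, which is the robustness property described above.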

For network and functional analysis, we used QIAGEN’s Ingenuity Pathway Analysis (IPA®, QIAGEN Redwood City, http://www.ingenuity.com/). IPA was used to identify the signaling pathways involved in the biological processes that distinguish WSE from tobacco exposure. Networks of these genes were algorithmically generated based on their connectivity and assigned a score. The score ranks networks according to their relevance to the input genes, not necessarily according to the quality or significance of the network. Each network depicts the molecular relationships between genes or gene products. Node color indicates up- or down-regulation, and color intensity reflects the degree of regulation. Uncolored genes were not identified as differentially expressed in our experiment; IPA nevertheless integrates them into the computationally generated networks based on collected evidence indicating their relevance to the network.

Identification of differentially expressed genes

Significant changes in gene expression were selected according to p-values < 10⁻² and absolute fold changes larger than 1.2. Differences in gene expression are shown in volcano plots, with fold changes on a log2 scale along the x-axis and statistical significance, expressed as –log10(p-value), along the y-axis. A value of 2 on the y-axis corresponds to our cut-off of 10⁻² (top right and left corners). Heat maps show the results of unsupervised hierarchical clustering of genes that differed significantly (p < 0.01) between patients with different risk factors.
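The selection rule and the volcano-plot coordinates can be written down directly. The sketch assumes that fold changes are available on the log2 scale (as limma reports them for log2 data), so that an absolute fold change above 1.2 becomes |log2 FC| > log2(1.2) ≈ 0.263, and that the y-axis is –log10(p), where the 10⁻² cut-off maps to 2. The function names and the toy gene labels are hypothetical, and no study data are used.

```python
import math
from typing import Dict, Tuple

P_CUTOFF = 1e-2    # p-value threshold from the Methods
FC_CUTOFF = 1.2    # absolute (linear-scale) fold-change threshold from the Methods


def volcano_point(log2_fc: float, p_value: float) -> Tuple[float, float]:
    """Volcano-plot coordinates: x = log2 fold change, y = -log10(p-value)."""
    return log2_fc, -math.log10(p_value)


def call_gene(log2_fc: float, p_value: float,
              p_cutoff: float = P_CUTOFF, fc_cutoff: float = FC_CUTOFF) -> str:
    """Return 'up', 'down' or 'ns' (not selected); both criteria must hold."""
    if p_value < p_cutoff and abs(log2_fc) > math.log2(fc_cutoff):
        return "up" if log2_fc > 0 else "down"
    return "ns"


def count_calls(results: Dict[str, Tuple[float, float]]) -> Dict[str, int]:
    """Count calls for a mapping gene -> (log2 fold change, p-value)."""
    counts = {"up": 0, "down": 0, "ns": 0}
    for log2_fc, p_value in results.values():
        counts[call_gene(log2_fc, p_value)] += 1
    return counts


if __name__ == "__main__":
    print(round(math.log2(FC_CUTOFF), 3))   # 0.263
    print(volcano_point(0.5, 0.01))         # y = -log10(0.01), i.e. 2 on the plot
    toy = {"gene_a": (0.8, 0.001), "gene_b": (-0.4, 0.004),
           "gene_c": (0.1, 0.0001), "gene_d": (1.5, 0.2)}
    print(count_calls(toy))                 # {'up': 1, 'down': 1, 'ns': 2}
```

Note that "gene_c" is highly significant but is not selected because its fold change falls below the threshold, which is the situation the volcano plot shows as a point above the y cut-off but close to the centre of the x-axis.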

Results

Overall, 53 tumor samples were collected, of which 29 were suitable for gene expression analysis. Two samples with suitable material were excluded because of the patients’ history of asbestos exposure and thoracic radiotherapy. The remaining 27 samples included 12 patients with an exclusive history of tobacco exposure and 15 patients with an exclusive history of WSE (Fig. 1).

Clinical and molecular results

The clinical characteristics of the 27 patients included in the microarray analysis were as follows: mean age 62.9 ± 11.7 years; 55.6 % (15/27) were female; 55.6 % (15/27) had WSE and 44.4 % (12/27) had tobacco exposure (Table 1). EGFR mutation status differed significantly between exposure groups (p = 0.003); 53 % (8/15) of patients in the WSE group had EGFR mutation-positive tumors (Table 2).

Table 1 Baseline clinical, pathological and molecular characteristics (N = 27)
Table 2 Clinical and molecular characteristics related to exposure history in all patients

Differentially expressed genes in WSE compared with tobacco smoke exposure

Figures 2 and 3 show the differences in gene expression; the clustered heat map displays the genes that differed significantly (p < 0.01) between patients with WSE and patients with a tobacco smoking history.

Fig. 2

Volcano plot showing differential gene expression in patients with adenocarcinoma and a history of wood smoke exposure vs. tobacco smoke exposure. Fold changes are represented on a log2 scale along the x-axis and statistical significance as –log10(p-value) along the y-axis. The cut-off of 10⁻² corresponds to 2 on the y-axis (top right and left corners). Red and green dots represent up- and down-regulated genes, respectively: fold change ≥ 1.2 and significance level p < 0.001. Other colors: yellow indicates significant changes with low fold change, magenta shows changes that are not statistically robust, and blue shows low fold change with low statistical significance.

Fig. 3

Heat map resulting from unsupervised hierarchical clustering of genes that differed significantly (p < 0.01) between NSCLC adenocarcinomas from patients exposed to wood smoke versus tobacco smoke. Each column represents a patient and each row a gene; color indicates the level of gene expression (red: high expression; green: low expression).

The comparison of adenocarcinomas from patients exposed to tobacco smoke versus WSE revealed that the two groups can be separated based on the differential expression of 57 genes (p < 0.01): 35 up-regulated and 22 down-regulated (Fig. 3).

Enrichment and functional network analysis was conducted using IPA software. The top functional networks were related to five biological categories: lipid metabolism, small molecule biochemistry, molecular transport, cellular morphology, and cellular function and maintenance. Table 3 lists the up- and down-regulated genes with significant changes in expression (p < 0.01) between patients with NSCLC exposed to tobacco smoke and those with WSE. The two categories showing the greatest differences were molecular transport and cellular function and maintenance.

Table 3 List of genes with significantly (p < 0.01) up- or down-regulated expression in patients with NSCLC exposed to wood smoke versus tobacco (five IPA biological function categories)

Three functional networks are shown. These networks involve the majority of the differentially expressed genes (37/57), and their main hubs are ubiquitin C (UBC, score 28, Fig. 4), GABA(A) receptor-associated protein-like 1 (GABARAPL1, score 28, Fig. 5) and the PI3K/AKT and MEK/ERK genes (score 26, Fig. 6). Moreover, when the networks were overlapped, all up- and down-regulated genes clustered around the PI3K/AKT and MEK/ERK signaling pathways (Fig. 7).

Fig. 4

Network 1 (Score: 28/Ratio: 0.353/p value: 1.48E-24). Differentially expressed genes (p < 0.01) in NSCLC with wood smoke exposure, centered on the UBC gene. Red: up-regulated genes. Green: down-regulated genes. Node shapes denote molecule type: enzymes, phosphatases, kinases, peptidases, G-protein coupled receptors, transmembrane receptors, cytokines, growth factors, ion channels, transporters, translation factors, nuclear receptors, transcription factors and other.

Fig. 5

Network 2 (Score: 28/Ratio: 0.375/p value: 6.11E-25). Differentially expressed genes (p < 0.01) in NSCLC with wood smoke exposure, centered on the GABARAPL1 gene. Red: up-regulated genes. Green: down-regulated genes. Node shapes denote molecule type: enzymes, phosphatases, kinases, peptidases, G-protein coupled receptors, transmembrane receptors, cytokines, growth factors, ion channels, transporters, translation factors, nuclear receptors, transcription factors and other.

Fig. 6

Network 3 (Score: 26/Ratio: 0.134/p value: 2.09E-17). Differentially expressed genes (p < 0.01) in NSCLC with wood smoke exposure, centered on the PI3K/AKT and MEK/ERK signaling pathways. Red: up-regulated genes. Green: down-regulated genes. Node shapes denote molecule type: enzymes, phosphatases, kinases, peptidases, G-protein coupled receptors, transmembrane receptors, cytokines, growth factors, ion channels, transporters, translation factors, nuclear receptors, transcription factors and other.

Fig. 7

Overlapping networks and related genes (p < 0.01) in NSCLC with wood smoke exposure. Red: up-regulated genes. Green: down-regulated genes. Purple lines mark the intersections between our study genes (p < 0.01) and the main canonical networks associated with NSCLC (PI3K/AKT and MEK/ERK). Node shapes denote molecule type: enzymes (♦), phosphatases, kinases, peptidases, G-protein coupled receptors, transmembrane receptors, cytokines, growth factors, ion channels, transporters, translation factors, nuclear receptors, transcription factors and other.

Discussion

Although the majority of lung cancers occur in smokers, 25 % of lung cancers worldwide occur in lifelong never smokers [28], and lung cancer in never smokers, considered separately, would rank as the seventh leading cause of cancer-related mortality [29]. Its incidence varies widely by geography, and its risk factors include asbestos, air pollution, radon, arsenic compounds, cadmium, chromium, ionizing radiation and WSE [30]. Additionally, the molecular profiles of lung cancer differ critically between smokers and non-smokers, particularly in genes such as EGFR, KRAS, P53 and ALK [31]. WSE has been associated with NSCLC, adenocarcinoma histology, EGFR mutations and a higher response to EGFR-TKIs, and inversely associated with KRAS mutations [15]. Therefore, lung cancer associated with WSE presents unique characteristics that make it a distinctive disease entity within the group of never smokers, and thus a good candidate for personalized diagnostic and therapeutic approaches.

There is evidence of differential expression profiles in the bronchial epithelium of tobacco smokers that sustain carcinogenesis [32], and tobacco smoke-induced transcriptional changes in oncogenes and tumor suppressor genes have been identified [33]. Our study shows that the gene expression profile of tumors from patients with WSE differs from that of patients with a smoking history.

Our group has previously reported that lung cancers related to tobacco smoke and to WSE exhibit different clinical and pathological characteristics, which may reflect different mechanisms and are mirrored in the response rate and overall survival of NSCLC patients [15]. In the present report, we show a specific gene expression profile for WSE involving 57 genes. Using biological network analysis, 37 of these genes were grouped around UBC, GABARAPL1 and the PI3K/AKT and MEK/ERK signaling pathways.

The UBC hub in Network 1 (Fig. 4) is involved in cellular homeostasis and signaling. Ubiquitin was originally characterized for its role in the degradation of misfolded or unneeded proteins, but it has since been associated with the cell cycle, DNA repair, endocytosis, antigen processing and apoptosis [34]. Tang et al. recently demonstrated that inhibition of the ubiquitin system decreased proliferation and radioresistance in the H1299 NSCLC cell line [35]. In this regard, a clinically relevant observation is the approval of bortezomib, a proteasome inhibitor that blocks this protein degradation system, for the treatment of human cancer [32].

The GABARAPL1 hub in Network 2 (Fig. 5) is a protein highly conserved throughout evolution that is related to autophagy and intracellular vesicle transport [36]. Its role in cancer is still unclear, but lower levels of this transcript have been reported to correlate with decreased survival in patients with neuroblastoma [37] and with increased metastasis in breast cancer [36]. On the other hand, ectopic over-expression of GABARAPL1 inhibits cancer cell proliferation and tumor growth in mice [38]. Other reports have described low expression of this gene in several cancer cell lines [39].

Regarding the last network, several reports show that the PI3K/AKT and MEK/ERK signaling pathways are altered in NSCLC and that their activation is associated with malignant transformation and drug resistance (Figs. 6 and 7). MEK and PI3K inhibitors can inhibit cell proliferation in NSCLC; however, apoptosis is activated only when both signaling pathways are inhibited simultaneously [40, 41], a situation directly related to the frequently observed EGFR-TKI resistance in this tumor. Other reports show that EGFR mutations confer sensitivity to TKIs through the PI3K/AKT and MEK/ERK signaling pathways [41–45]. It has also been shown that tumors with EGFR mutations are more sensitive to EGFR-TKIs when PI3K/AKT and MEK/ERK inhibitors are also used [41–45]. On a clinical note, our group has previously reported an association between EGFR mutation-positive NSCLC adenocarcinoma and a history of WSE, compared with tobacco smoke exposure.

WSE is also related to gene promoter methylation, which synergistically increases the risk of reduced lung function in cigarette smokers [46]. A recent report describing the toxicological characteristics of wood smoke in the A549 cell line, including high levels of polycyclic aromatic hydrocarbons (PAHs) and low levels of water-soluble metals, showed increased levels of free radicals, DNA damage and higher expression of inflammatory/oxidative stress genes [47]. There is evidence that some potential molecular targets, such as EGFR and other ErbB family receptors, are commonly altered in epithelial tumors [48]. EGFR mediates cell proliferation, differentiation, survival, angiogenesis and migration, and is overexpressed in approximately 40–80 % of NSCLC tumors [49–51].

Clinically, EGFR inhibitors are known to extend survival after first-line or second-line therapy in NSCLC patients with EGFR mutations [52]. These mutations are more frequent in specific populations, including women, patients of Asian and Hispanic ethnicity, never smokers and patients with adenocarcinoma histology [18, 53, 54]. Activating mutations in EGFR lead to constitutive tyrosine kinase activation and oncogenic transformation of lung epithelial cells [12, 13]. Accordingly, the presence of these common activating EGFR mutations is tightly associated with sensitivity to reversible EGFR-specific tyrosine kinase inhibitors (e.g., erlotinib or gefitinib). Patients with these mutations display EGFR-TKI response rates of approximately 70 %, a median progression-free survival (PFS) of approximately 9–12 months and overall survival that may exceed 20–32 months [55]. Most patients will eventually experience disease progression and drug resistance, attributed to the development of secondary mutations or to the presence of other uncommon EGFR mutations [56]. Therapeutically relevant interactions in NSCLC include the oncogenic protein KRAS-GTP, which links EGFR to the PI3K/AKT and MEK/ERK pathways [56]. KRAS mutations lead to increased signaling through the MEK/MAPK transduction pathway [56]. MEK mutations have rarely been reported in NSCLC [57]. Preclinical studies in both the KPC mouse model and patient-derived xenografts have shown that blocking the MAPK pathway at MEK decreases cell proliferation and subsequently halts tumor growth [58]. Activated EGFR recruits PI3K to the cell membrane, where PI3K phosphorylates phosphatidylinositol 4,5-bisphosphate (PIP2) to phosphatidylinositol 3,4,5-trisphosphate (PIP3), which in turn activates AKT and several downstream effectors [59].
Inhibitors of both PI3K and AKT have been developed [60], although inhibition of PI3K is complicated by the existence of multiple isoforms of the protein [61]. KRAS can also directly activate PI3K through its catalytic subunit, encoded by PIK3CA [62]. Unlike most oncogenic driver mutations in NSCLC, PIK3CA mutations may co-occur with EGFR or KRAS mutations [63]. Although rare, PI3K/AKT/mTOR pathway activation may also occur through AKT mutations in NSCLC [64]. Reversible EGFR-TKIs are considered the frontline treatment for advanced NSCLC patients harboring EGFR mutations [65]. Emerging evidence suggests that the anti-tumor activity of EGFR-TKIs in resistant NSCLC cell lines can be enhanced by combination with other regimens; early studies showed that cetuximab produced synergistic anti-proliferative effects when combined with gefitinib or erlotinib [66]. Our analyses reveal network relationships between 37 genes and PI3K/AKT and MEK signaling that may help explain the biological effects of WSE as a carcinogenic factor in NSCLC. They also provide common pathway maps that may support a better understanding of the disease and the development of new therapeutic targets.

Although the differences in gene expression patterns between WSE- and tobacco-related lung cancer identified in this study provide important insight into the molecular basis of the clinical and biological differences between these two tumors, the study is limited by its small sample size. This limitation is partly offset by the thorough characterization of the samples, the detailed clinical history and the close follow-up of all patients. Further studies are needed to validate the potential biological and clinical implications of our findings.

Conclusion

In conclusion, our results suggest that WSE- and tobacco-related lung adenocarcinomas have different gene expression profiles, pointing to different carcinogenic mechanisms for the two risk factors and helping to explain the distinct clinicopathological and mutational profiles of the two groups.