Introduction

Cigarette smoking is an established risk factor for colorectal cancer incidence and mortality [1,2,3]. Cigarette smoke contains hundreds of carcinogens, some of which can cause epigenetic alterations [4]. While there is strong evidence linking smoking to epigenetic changes, including findings from epigenome-wide association studies, less is known about epigenetic changes in former smokers, particularly long-term quitters [5, 6].

Accumulating evidence suggests that DNA hypomethylation may play an important role in colorectal cancer progression [7, 8]. Long interspersed nucleotide element-1 (LINE-1) hypomethylation is a surrogate marker for genome-wide DNA hypomethylation [9], and LINE-1 hypomethylated colorectal cancer has been associated with worse prognosis and non-response to certain chemotherapies, suggesting its potential use as a prognostic biomarker [10, 11]. Moreover, LINE-1 hypomethylation is an emerging biomarker for early-onset colorectal cancer (diagnosed before age 50) [12], whose incidence has been increasing in many parts of the world since the 1980s [13].

To test whether the association of smoking status at diagnosis with colorectal cancer mortality might differ by LINE-1 methylation levels in tumors, we leveraged data from two large prospective cohorts with 4420 incident colorectal cancer cases, including 1208 cases with available tumor tissue data.

Methods

Study population and design

The Nurses’ Health Study (NHS, N = 121,701) was established in 1976 and the Health Professionals Follow-up Study (HPFS, N = 51,529) was established in 1986 [14, 15]. Self-administered questionnaires were mailed to participants at baseline and then biennially to update smoking status, lifestyle, and medical history. Semi-quantitative food frequency questionnaires were administered every 4 years to assess participants’ diet.

Deaths were identified through next-of-kin reports or the National Death Index, and cause of death was determined by study physicians after review of medical records or death certificates. A pathologist (S.O.), blinded to other information, conducted a centralized review of hematoxylin and eosin (H&E)-stained tissue sections of all colorectal cancer cases and recorded pathological features, including tumor differentiation. The study protocol was approved by the institutional review boards of Brigham and Women’s Hospital and the Harvard T. H. Chan School of Public Health, and by those of participating registries as required. We also obtained signed consent from patients (or next of kin for deceased patients) to use colorectal cancer tissue specimens for molecular pathological analyses.

Assessment of smoking behavior at diagnosis

Detailed information on smoking was obtained as previously described [1, 2, 15]. Smoking status at the time of diagnosis was divided into three categories: never, past, and current smoking. Because most current smokers (86%) had a smoking history of ≥20 packyears at diagnosis, we did not examine associations by packyears among current smokers. Of the 599 past smokers, 285 (47.6%) had a smoking history of 1–19 packyears and 314 (52.4%) had a history of ≥20 packyears. Past smokers who had quit ≥10 years before diagnosis were more likely to have a smoking history of 1–19 packyears than those who had quit <10 years before diagnosis (201 of 322 [62%] vs. 84 of 277 [30%]); because time since quitting was thus closely related to packyears, we did not further stratify past smokers by time since quitting (Additional file 1: Supplementary Table 1). Smoking status, diet, and lifestyle at the time of diagnosis were defined using each participant’s most recent available questionnaire before the cancer diagnosis.
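
The ordinal smoking variable used in the statistical analyses (never = 0, past with 1–19 packyears = 1, past with ≥20 packyears = 2, current = 3) could be derived from questionnaire data along the lines below. This is a minimal sketch, not the cohorts’ processing code: it assumes the conventional definition of pack-years (cigarettes per day divided by 20, multiplied by years smoked), and the function names are hypothetical. It also re-derives the percentages quoted for past smokers as an arithmetic check.

```python
from typing import Optional


def pack_years(cigarettes_per_day: float, years_smoked: float) -> float:
    """Conventional pack-years: packs per day (20 cigarettes per pack) times years smoked."""
    return cigarettes_per_day / 20.0 * years_smoked


def smoking_category(status: str, packyears: Optional[float] = None) -> int:
    """Ordinal coding: 0 never, 1 past with <20 packyears, 2 past with >=20, 3 current.

    Assumption: the text describes the lower past-smoker group as 1-19 packyears;
    this sketch does not special-case past smokers with <1 packyear.
    """
    if status == "never":
        return 0
    if status == "current":
        return 3
    if status == "past":
        if packyears is None:
            raise ValueError("past smokers need a pack-year value")
        return 2 if packyears >= 20 else 1
    raise ValueError(f"unknown smoking status: {status!r}")


def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


print(smoking_category("past", pack_years(10, 30)))  # 15 packyears -> 1
print(pct(285, 599), pct(314, 599))  # 47.6 52.4
print(pct(201, 322), pct(84, 277))   # 62.4 30.3
```

Rounding 62.4% and 30.3% to whole numbers gives the 62% and 30% reported in the text, and 201 + 84 = 285 and 322 + 277 = 599 reconcile the two breakdowns of past smokers.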

Analyses of LINE-1 methylation, microsatellite instability (MSI), CpG island methylator phenotype (CIMP), and KRAS, BRAF, and PIK3CA mutations

Details of our methods and pyrosequencing results have been described in our previous publications [11, 16]. In brief, DNA was extracted from tumor areas of formalin-fixed paraffin-embedded tissue blocks. We performed bisulfite DNA treatment, polymerase chain reaction (PCR), and pyrosequencing using the PyroMark kit (Qiagen), and quantified LINE-1 methylation levels by amplifying a region of the LINE-1 element (positions 305 to 331 in accession No. X58075) that includes 4 CpG sites. We used the average of the proportions of C nucleotides at the 4 CpG sites (on a scale of 0 to 100) as the LINE-1 methylation level of each case. The LINE-1 methylation level was normally distributed [16] and was used both as a continuous variable (0–100%) and as a categorical variable: “high” (≥68% methylation), “intermediate” (≥60% and <68%), and “low” (<60%). The precision of the pyrosequencing assay was validated in our previous study using approximately 500 cancer cells from 5 anonymized colorectal cancer cases collected by laser capture microdissection [17].
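
The per-case summary and the three-level categorization described above amount to a few lines of arithmetic. A minimal sketch, assuming the pyrosequencer has already reported a percent-C value (0–100) at each of the 4 CpG sites; the function names and example values are hypothetical.

```python
from statistics import mean
from typing import Sequence


def line1_methylation(cpg_percent: Sequence[float]) -> float:
    """Average percent methylation (proportion of C, 0-100) across the 4 LINE-1 CpG sites."""
    if len(cpg_percent) != 4:
        raise ValueError("expected values for exactly 4 CpG sites")
    if any(not 0 <= value <= 100 for value in cpg_percent):
        raise ValueError("CpG methylation values must lie on a 0-100 scale")
    return mean(cpg_percent)


def line1_category(level: float) -> str:
    """Cut-points from the text: low <60, intermediate >=60 and <68, high >=68."""
    if level >= 68:
        return "high"
    if level >= 60:
        return "intermediate"
    return "low"


# Hypothetical per-CpG values for one tumor.
level = line1_methylation([62.0, 70.5, 66.0, 69.5])
print(level, line1_category(level))  # 67.0 intermediate
```

The continuous value returned by `line1_methylation` is what enters the interaction model; the categories are used for stratified reporting.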

MSI analysis was performed using a panel of 10 microsatellite markers, as previously described [18]. MSI-high was defined as instability in ≥30% of the markers. We quantified DNA methylation at eight CIMP-specific promoters (CACNA1G, CDKN2A, CRABP1, IGF2, MLH1, NEUROG1, RUNX3, and SOCS1) using bisulfite DNA treatment and real-time PCR (MethyLight), as previously described [19, 20]. PCR and pyrosequencing targeting KRAS (codons 12, 13, 61, and 146) [21, 22], BRAF (codon 600) [18], and PIK3CA (exons 9 and 20) were performed as previously described [23].
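
The MSI-high rule above (instability in ≥30% of markers) reduces to a threshold on the fraction of unstable markers. A minimal sketch follows; the marker names are placeholders rather than the actual panel, and excluding non-evaluable markers from the denominator is an assumption the text does not specify. Only the MSI-high cut-point is stated, so everything else is labeled simply as non-MSI-high.

```python
from typing import Mapping, Optional


def msi_status(calls: Mapping[str, Optional[bool]], threshold: float = 0.30) -> str:
    """Classify a tumor from per-marker calls.

    calls maps a marker name to True (unstable), False (stable), or None (not evaluable).
    Assumption: the fraction is computed over evaluable markers only.
    """
    evaluable = [unstable for unstable in calls.values() if unstable is not None]
    if not evaluable:
        raise ValueError("no evaluable microsatellite markers")
    fraction_unstable = sum(evaluable) / len(evaluable)
    return "MSI-high" if fraction_unstable >= threshold else "non-MSI-high"


# Hypothetical 10-marker panel with 3 unstable markers (3/10 = 30%).
panel = {f"marker{i}": i <= 3 for i in range(1, 11)}
print(msi_status(panel))  # MSI-high
```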

Statistical analyses

Our primary hypothesis test assessed a statistical interaction between smoking status at diagnosis [ordinal categories: never smoker (0), past smoker with 1–19 packyears (1), past smoker with ≥20 packyears (2), and current smoker (3)] and tumor LINE-1 methylation level (continuous) using Cox proportional hazards regression. We used the Wald test to assess the statistical significance of the interaction term and reported the resulting P value as P for interaction. Our main endpoints were all-cause and colorectal cancer-specific mortality. Survival time was defined as the time from colorectal cancer diagnosis until death or the end of follow-up (January 1, 2016, for the HPFS; May 31, 2016, for the NHS), whichever came first. For analyses of colorectal cancer-specific mortality, deaths from other causes were censored. We calculated hazard ratios (HRs) and 95% confidence intervals (CIs) by re-parameterizing the interaction term in a single regression model. Tests for trend used the ordinal smoking variable.
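
The endpoint definitions and the quantities reported from the Cox models can be sketched as follows. This is an illustrative sketch under stated assumptions, not the analysis code: fitting the (IPW-weighted) Cox model itself is left to a survival-analysis package, and the functions below only (1) build follow-up time and event indicators from the dates given above, converting days to years with 365.25, and (2) turn a fitted interaction model’s coefficients and (co)variances into a two-sided Wald P value and an HR with 95% CI at a chosen LINE-1 level. All function names and example coefficient values are hypothetical.

```python
import math
from datetime import date
from typing import Optional, Tuple

# End-of-follow-up dates stated in the text.
END_OF_FOLLOW_UP = {"HPFS": date(2016, 1, 1), "NHS": date(2016, 5, 31)}


def survival_record(
    cohort: str,
    diagnosis: date,
    death: Optional[date] = None,
    death_from_crc: bool = False,
) -> Tuple[float, int, int]:
    """Return (years from diagnosis, all-cause event, CRC-specific event).

    For the CRC-specific endpoint, deaths from other causes are censored:
    the same follow-up time is used with an event indicator of 0.
    """
    end = END_OF_FOLLOW_UP[cohort]
    died = death is not None and death <= end
    stop = death if died else end
    years = (stop - diagnosis).days / 365.25
    return years, int(died), int(died and death_from_crc)


def wald_p(beta: float, se: float) -> float:
    """Two-sided Wald P value for one coefficient: 2 * (1 - Phi(|beta / se|))."""
    return math.erfc(abs(beta / se) / math.sqrt(2))


def hr_at_level(
    b_smoking: float,
    b_interaction: float,
    var_smoking: float,
    var_interaction: float,
    cov: float,
    line1_level: float,
    z: float = 1.959964,
) -> Tuple[float, float, float]:
    """HR (95% CI) per one-step increase in the ordinal smoking variable at LINE-1 level x,
    from the linear combination b_smoking + x * b_interaction. This equals the smoking
    coefficient obtained after re-centering LINE-1 at x in the same model."""
    x = line1_level
    est = b_smoking + x * b_interaction
    se = math.sqrt(var_smoking + x * x * var_interaction + 2 * x * cov)
    return math.exp(est), math.exp(est - z * se), math.exp(est + z * se)


# Hypothetical coefficient estimates, for illustration only (not study results).
print(wald_p(-0.02, 0.01))  # approx. 0.0455 (|z| = 2)
print(hr_at_level(0.9, -0.012, 0.09, 0.000025, -0.0014, line1_level=55.0))
```

Whether the authors’ re-parameterization took exactly this linear-combination form, or used indicator terms for LINE-1 categories as reported in Table 1, is not stated in the text; the two approaches share the idea of reading a stratum-specific HR off a single fitted model.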

To reduce potential selection bias due to tumor tissue data availability, we applied the inverse probability weighting (IPW) method using all 4420 cases as reported previously [24, 25].
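
Conceptually, IPW up-weights cases with tissue data that resemble cases without it, so that the 1208 analyzed cases stand in for all 4420. The sketch below shows only the weighting step, assuming the probability of tissue-data availability is estimated as the observed proportion within covariate strata; the published method [24, 25] may estimate this probability differently (for example, with a regression model), and the strata and data are hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def ipw_weights(cases: Sequence[Tuple[Hashable, bool]]) -> List[float]:
    """Inverse probability weights for cases with tissue data.

    cases: one (stratum, has_tissue_data) pair per incident case.
    P(has tissue data | stratum) is estimated as the observed proportion in that
    stratum; this stratum-proportion estimator is a stand-in, not the published model.
    Returns one weight per case with tissue data, in input order.
    """
    totals = Counter(stratum for stratum, _ in cases)
    with_data = Counter(stratum for stratum, has_data in cases if has_data)
    return [
        totals[stratum] / with_data[stratum]  # equals 1 / estimated probability
        for stratum, has_data in cases
        if has_data
    ]


# Hypothetical strata; not study data.
cases = [("early", True)] * 2 + [("early", False)] * 2 + [("late", True)] * 3 + [("late", False)]
weights = ipw_weights(cases)
print(weights)       # [2.0, 2.0, 1.3333333333333333, 1.3333333333333333, 1.3333333333333333]
print(sum(weights))  # close to 8, the total number of cases
```

Within each stratum the weights sum to that stratum’s total case count, which is what lets the weighted analytic sample reproduce the covariate distribution of all cases. In a Cox model, such weights are typically supplied as case weights together with a robust variance estimator.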

We used multivariable IPW-adjusted Cox proportional hazards regression models to adjust for potential confounders, initially including age at diagnosis, year of diagnosis, family history of colorectal cancer in first-degree relatives, body mass index (BMI) at diagnosis, alcohol consumption at diagnosis, empirical dietary inflammatory pattern (EDIP) score at diagnosis, dietary fiber intake at diagnosis, folate intake at diagnosis, regular aspirin use at diagnosis, physical activity at diagnosis, tumor location and differentiation, MSI and CIMP status, and KRAS, BRAF, and PIK3CA mutations. We conducted backward elimination with a threshold of P = 0.05 to select variables for the final models. The proportional hazards assumption was generally satisfied, as assessed by Schoenfeld residual plots and by adding interaction terms between smoking and survival time to the multivariable models. All P values were two-sided, and P < 0.005 was considered statistically significant, as recommended by an expert panel [26].
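
The variable-selection step can be sketched as a generic backward-elimination loop. Assumptions, since the text does not specify them: `fit_pvalues` is a hypothetical callback that refits the model on the current covariates and returns one P value per covariate; a covariate is dropped when its P value is ≥0.05; and the `forced` argument, for terms one may wish to retain regardless of significance (for example, the exposure and effect modifier), is an illustrative option rather than a documented feature of the authors’ procedure.

```python
from typing import Callable, Dict, Iterable, List


def backward_elimination(
    candidates: Iterable[str],
    fit_pvalues: Callable[[List[str]], Dict[str, float]],
    threshold: float = 0.05,
    forced: Iterable[str] = (),
) -> List[str]:
    """Drop the least significant non-forced covariate, refit, and repeat
    until every remaining non-forced covariate has P < threshold."""
    keep_always = set(forced)
    selected = list(candidates)
    while True:
        pvalues = fit_pvalues(selected)  # hypothetical: refit model, return P values
        droppable = {
            name: p
            for name, p in pvalues.items()
            if name in selected and name not in keep_always
        }
        if not droppable:
            return selected
        worst = max(droppable, key=lambda name: droppable[name])
        if droppable[worst] < threshold:
            return selected
        selected.remove(worst)


# Toy fitter with fixed, made-up P values, for illustration only.
FAKE_P = {"smoking": 0.20, "age": 0.001, "bmi": 0.30, "alcohol": 0.04, "aspirin": 0.60}
final = backward_elimination(
    FAKE_P, lambda covs: {c: FAKE_P[c] for c in covs}, forced=["smoking"]
)
print(final)  # ['smoking', 'age', 'alcohol']
```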

Results

Patients’ characteristics

During follow-up through 2012, we documented 4420 incident colorectal cancer cases, including 1208 patients with data available on both smoking status at diagnosis and tumor LINE-1 methylation level (Additional file 1: Supplementary Table 2). During a median follow-up of 16 years (interquartile range, 11.9–20.3 years) for censored cases, we identified 776 all-cause deaths, including 343 colorectal cancer-specific deaths.

Smoking status and mortality

Compared with never smokers, HRs (95% CIs) for past smokers with 1–19 packyears, past smokers with ≥20 packyears, and current smokers were 0.81 (0.65–1.00), 1.01 (0.85–1.21), and 1.31 (0.99–1.71), respectively, for all-cause mortality and 0.79 (0.58–1.08), 0.86 (0.65–1.13), and 1.07 (0.75–1.53), respectively, for colorectal cancer-specific mortality (Additional file 1: Supplementary Table 3).

Tumor LINE-1 methylation levels and mortality

Patients whose tumors had high LINE-1 methylation levels (≥68%) appeared to have a lower risk of colorectal cancer-specific mortality (HR 0.74, 95% CI 0.55–0.99) than those whose tumors had low (<60%) LINE-1 methylation levels. However, this lower risk was not observed for all-cause mortality (Additional file 1: Supplementary Table 4).

Smoking and mortality in relation to tumor LINE-1 methylation levels

In our primary hypothesis testing, the association between smoking status and mortality appeared to be stronger in cases whose tumors had low LINE-1 methylation levels than in those whose tumors had high LINE-1 methylation levels (P for interaction = 0.050 for all-cause mortality and 0.017 for colorectal cancer-specific mortality; Table 1). Among cases with low LINE-1 methylation tumors (<60%), HRs for current versus never smoking were 1.55 (95% CI, 0.94–2.56) for colorectal cancer-specific mortality and 1.80 (95% CI, 1.19–2.73) for all-cause mortality. In contrast, among cases with high LINE-1 methylation tumors (≥68%), the corresponding HRs (current vs. never smoking) were 0.93 (95% CI, 0.50–1.73) for colorectal cancer-specific mortality and 1.33 (95% CI, 0.85–2.08) for all-cause mortality. Results did not change substantially when we repeated the analyses without IPW adjustment or restricted them to cases with stage I–III colorectal cancer (Additional file 1: Supplementary Tables 5 and 6).

Table 1 Smoking status at diagnosis and colorectal cancer mortality stratified by tumor LINE-1 methylation levels in the Nurses’ Health Study and Health Professionals Follow-up Study

Discussion

Previous studies have suggested that smoking may be associated with worse survival in colorectal cancer, possibly in certain tumor molecular subtypes, although findings have been inconsistent [27,28,29,30]. Unlike the tumor markers examined in those studies (with the exception of MSI status) [27,28,29], tumor LINE-1 hypomethylation has consistently been shown to be a strong prognostic indicator in various cancer types, including colorectal cancer [12, 31, 32]. Our findings suggest that the positive association between smoking status at diagnosis and mortality may be more pronounced in cases with low tumor LINE-1 methylation levels than in those with high levels.

Experimental studies have shown that DNMT1 (DNA methyltransferase 1) mutation can lead to global DNA hypomethylation [7] and that global DNA hypomethylation promotes tumor development through chromosomal instability, including loss of heterozygosity of TP53, a gene whose product can induce cell cycle arrest for DNA repair or apoptosis of damaged cells [33]. If tumors with high genomic instability are more prone to somatic mutations induced by exogenous factors such as smoking, patients who later developed tumors with LINE-1 hypomethylation may have been more susceptible to the mutagenic effects of smoking than those who developed tumors with high LINE-1 methylation. The stronger association between smoking and mortality observed for LINE-1 hypomethylated tumors may therefore be explained, at least in part, by greater accumulation of genomic instability over time. In addition, tumor LINE-1 hypomethylation has been associated with a weaker T cell immune response to colorectal cancer, suggesting an immunosuppressive effect [34]. Furthermore, smoking has been associated with the incidence of colorectal cancer subtypes containing fewer T cells and macrophages, implying a suppressive effect on effector immune cells [35, 36].

Together with these previous findings, our results raise the possibility that smoking and tumor LINE-1 hypomethylation interact to influence the tumor-immune interaction, leading to a stronger prognostic role of smoking status in tumors with LINE-1 hypomethylation.

Strengths and limitations

One major strength of this study was our molecular pathological epidemiology [37,38,39] database of colorectal cancer cases, with diet and lifestyle information collected prospectively and repeatedly. This rich database enabled us to examine the prognostic interaction between smoking behavior and tumor LINE-1 methylation levels while adjusting for multiple potential confounders [40, 41] and for selection bias due to tumor molecular data availability [25].

There are several limitations to our study. First, we did not examine the effect of postdiagnosis smoking status because 40% of current smokers (44 of 110) quit smoking after diagnosis and the number of postdiagnosis current smokers was small. Second, data on cancer treatment were limited in this dataset. However, it is unlikely that the proportion of patients who underwent chemotherapy differed substantially by smoking status. Additionally, treatment protocols for colorectal cancer are generally similar across the USA, and adjusting for AJCC stage should have limited potential confounding by treatment. Third, data on cancer recurrence were unavailable. However, given a follow-up of >10 years, colorectal cancer-specific mortality can be considered a reasonable measure of colorectal cancer outcome. Fourth, our main results did not meet our stringent, multiple comparison-adjusted significance level of P < 0.005. However, we selected all risk factors and statistical comparisons on the basis of previous data and specific hypotheses, and we interpreted our results prioritizing biological plausibility, coherence, and consistency rather than statistical significance alone. Fifth, there is evidence that LINE-1 hypomethylation is inversely associated with MSI-high, CIMP-high, and BRAF-mutated colorectal cancer [11, 16]. Although we adjusted for these molecular markers in our multivariable models, the limited sample size did not allow further stratification by these markers; our findings therefore warrant investigation in larger studies with sufficient power for such stratified analyses. Sixth, we cannot exclude the possibility of biases related to tumor heterogeneity and contamination by normal cells. In addition, a previous study reported cell-type heterogeneity in LINE-1 methylation levels [42], which might affect our results. However, an experienced pathologist, Dr. Shuji Ogino, carefully reviewed H&E-stained slides of all cases and identified tumor areas in each section, which should have reduced the possibility of these biases.

Future prospects

Our study suggests a prognostic interaction between smoking and tumor LINE-1 methylation levels measured by bisulfite-PCR-pyrosequencing. We used the average of the proportions of C nucleotides at the 4 CpG sites as the LINE-1 methylation level, but DNA methylation may vary across specific repetitive elements of the genome. Several novel approaches have recently been developed for detailed epigenetic profiling. Bock et al. showed that locus-specific DNA methylation assays combined with machine learning algorithms can predict global DNA methylation levels more accurately [43]. Subsequently, Zhang et al. reported that a random forest algorithm could accurately predict genome-wide repetitive element methylation using microarray data [44]. Furthermore, nanopore sequencing enables direct, real-time electronic analysis of long DNA fragments, which eliminates amplification bias and allows more efficient assembly than short-read sequencing [45]. These methods could be used in future studies to measure DNA methylation levels more accurately.

Conclusion

Our findings suggest that the association of smoking status at diagnosis with colorectal cancer mortality may be stronger in cases with low LINE-1 methylation level tumors than in those with intermediate or high LINE-1 methylation level tumors. Given the need for more accurate colorectal cancer prognostication, larger future studies are warranted to confirm our findings and to guide further exploration of the underlying pathways.