Background

Smoking is known to cause many forms of cancer that affect the respiratory, digestive, and urinary tracts [1]. Cigarette smoke contains more than 60 different carcinogenic compounds, including polycyclic aromatic hydrocarbons (PAH), nitrosamines, and aromatic amines, which can form DNA adducts after metabolic activation. This can lead to mutations in tumor-suppressor genes and oncogenes, as well as cell damage, resulting in tumor development [2]. Carcinogens from smoking can also reach the colorectal mucosa and affect the expression of cancer-related genes [3], and the International Agency for Research on Cancer has concluded that there is sufficient evidence that cigarette smoking is associated with an increased risk of colorectal cancer (CRC) [4].

Previous studies have reported that cigarette smoke interacts with genetic factors, suggesting that risk estimates differ by genetic predisposition [5]. However, research on gene-smoking interactions in CRC remains limited because most studies have focused on genetic polymorphisms of tobacco-metabolizing enzymes, and a meta-analysis found only a weak mEH3-smoking interaction [5]. Genome-wide association studies (GWAS) have identified a number of common low-penetrance genetic loci involved in the etiology and progression of CRC [6], but few gene-environment interaction studies have examined GWAS-identified SNPs and smoking [7]. Genome-wide interaction analyses between genetic variants and smoking have also been conducted, but no statistically significant interactions were observed [8, 9].

Although none of the GWAS-identified SNPs are directly relevant to tobacco-metabolizing enzymes, indirect gene-environment interactions are possible, because smoking is one of the environmental exposures most often implicated in gene-environment interactions in cancer [10] and both GWAS-identified SNPs and smoking are established risk factors for CRC. In this case-control study, we hypothesized that smoking could modify associations between common genetic variants and CRC risk. To test this hypothesis, we examined the associations of smoking behaviors, and of 30 susceptibility SNPs previously identified by GWAS, with CRC risk. Interactions between smoking behaviors and the genotypes of the susceptibility SNPs were also investigated.

Methods

Study population

Eligible cases were CRC patients who were newly diagnosed and underwent surgical treatment between August 2010 and August 2013 at the National Cancer Center (NCC) in Korea. Among 1427 eligible CRC patients, we were able to contact 1259 patients, and 1070 patients agreed to participate in this study. Of these, 367 patients who did not complete our questionnaire or had insufficient blood samples for genotyping were excluded. Accordingly, a total of 703 CRC patients were included in the analysis. Healthy controls were recruited from a cancer-screening center at the NCC among people who visited for a health check-up program supported by the National Health Insurance Corporation between October 2007 and December 2014. After selecting individuals who completed the questionnaire and had sufficient blood samples, the remaining control subjects were frequency-matched 2:1 to the 703 CRC patients by age (5-year intervals) and sex. Thus, a total of 703 cases and 1406 controls were included in the analysis. The study was approved by the institutional review board of the NCC (IRB No. NCCNCS-10-350 and NCC 2015–0202).

Data collection

The CRC patients were interviewed face-to-face by trained interviewers using a structured written questionnaire (Additional file 1), which was also used in previous studies [11,12,13]. The original questionnaire, written in Korean, was developed based on questionnaires of the Korean National Health and Nutrition Examination Survey (KNHANES); the quality assurance and control of the national survey have been described elsewhere [14]. From the questionnaire, we obtained general information on age, sex, family history of CRC, body mass index, and education level, as well as lifestyle information, including alcohol drinking and smoking behavior. The control participants completed self-administered questionnaires on general and lifestyle information, and trained interviewers then called them to validate their responses.

Smoking behaviors consisted of smoking status, smoking duration, amount of smoking, and pack-years of smoking. Smoking status was classified as never or ever smoker, with ever smokers defined as those who had smoked ≥5 packs of cigarettes during their lifetime. Pack-years of smoking were calculated by multiplying the amount of smoking (number of cigarettes per day) by the duration (number of years smoked) and dividing by 20. For the gene-environment interaction analyses, duration, amount, and pack-years of smoking were each divided into two groups at the median value among ever smokers.
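The pack-year calculation and the median split described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the smoker records are hypothetical, and assigning values equal to the median to the upper group is an assumption.

```python
from statistics import median


def pack_years(cigarettes_per_day: float, years_smoked: float) -> float:
    """Pack-years = (cigarettes per day x years smoked) / 20."""
    return cigarettes_per_day * years_smoked / 20


def split_at_median(values):
    """Return the median and a 'low'/'high' label for each value.

    Assumption: values equal to the median go to the 'high' group.
    """
    cutoff = median(values)
    return cutoff, ["high" if v >= cutoff else "low" for v in values]


# Hypothetical ever smokers: (cigarettes per day, years smoked)
smokers = [(10, 20), (20, 30), (20, 21), (5, 10), (30, 40)]
py = [pack_years(cigs, years) for cigs, years in smokers]
cutoff, groups = split_at_median(py)

for (cigs, years), value, group in zip(smokers, py, groups):
    print(f"{cigs}/day for {years} y -> {value:.1f} pack-years ({group})")
```

In a sex-stratified analysis such as this one, the split would be applied separately to male and female ever smokers, so the cutoff can differ by sex.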

Genotyping

From the National Human Genome Research Institute (NHGRI) GWAS Catalog [6], we extracted 41 CRC-associated SNPs with p-values < 5 × 10^−8 reported before 2015. Of these, 14 imputed SNPs were excluded, and 9 SNPs were additionally identified through a review of the references. The resulting 36 susceptibility SNPs, located in 27 loci that previous GWAS have associated with CRC risk, were selected for genotyping (Additional file 2: Table S1) [15,16,17,18,19,20,21,22,23,24,25]. From the subjects’ blood samples, genomic DNA was extracted using a MagAttract DNA Blood M48 kit and BioRobot M48 automatic extraction equipment (Qiagen, Hilden, Germany) according to the manufacturer’s instructions. Genotyping was performed using an Agenabio MassArray iPLEX® gold assay (Agena Bioscience, Inc., San Diego, CA, US). Because of genotyping failure for 4 SNPs and a monomorphic genotype for 2 SNPs, 30 of the originally selected 36 SNPs were included in the final analysis (Additional file 2: Table S2).

Statistical analysis

Hardy-Weinberg equilibrium (HWE) was tested for the genotypes of each SNP among the controls using a chi-square test. To compare characteristics between the cases and the controls, a t-test was used for continuous variables, specifically age and body mass index (BMI), and a chi-square test was used for categorical variables, specifically family history of CRC, education level, alcohol drinking, and smoking status. The associations of smoking behaviors and of SNPs (additive model) with CRC risk were examined using a logistic regression model adjusted for age, family history of CRC, BMI, and education level. Interactions were assessed by including additional interaction terms (genotype of each SNP × smoking behavior) in the logistic models. For statistically significant interactions, we also assessed associations between SNPs and CRC risk after stratification by smoking behavior. To account for multiple comparisons across the 30 SNPs, false discovery rate (FDR) and Bonferroni corrections were additionally applied. For all association tests, odds ratios (ORs) and 95% confidence intervals (95% CIs) were calculated, and p-values less than 0.05 were considered statistically significant. All analyses were stratified by sex, and all tests were two-sided; analyses were performed using SAS version 9.3 (SAS Institute, Inc., Cary, NC, US).
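As an illustration of the HWE check described above, the following sketch computes a one-degree-of-freedom chi-square goodness-of-fit test from genotype counts. It is not the SAS procedure used in the study; the function name and the genotype counts are hypothetical.

```python
import math


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int):
    """Chi-square test (1 df) of Hardy-Weinberg equilibrium.

    Takes the observed counts of the three genotypes (AA, AB, BB) and
    returns (chi-square statistic, p-value).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    if p == 0 or q == 0:
        raise ValueError("monomorphic SNP: HWE cannot be tested")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical genotype counts among controls
chi2, p_value = hwe_chi_square(360, 480, 160)
print(f"chi-square = {chi2:.3f}, p = {p_value:.3f}")
```

A small p-value among controls suggests departure from HWE, which is commonly read as a warning sign of genotyping error or population structure.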

Results

The characteristics of the study subjects are summarized by sex in Table 1. Because the cases and the controls were frequency-matched by age and sex, there was no significant difference in age. Among the men, the CRC patients more often had a family history of CRC (P < 0.01), had a higher BMI (P < 0.01), and had a lower education level (P < 0.01) than the healthy controls. However, alcohol drinking and smoking status were similar between the male CRC patients and the male control subjects. Among the women, there were no differences in family history of CRC, BMI, or alcohol drinking status, but the CRC patients were more likely than the controls to have a lower education level (P < 0.01) and to have ever smoked (P < 0.01).

Table 1 Characteristics of colorectal cancer cases and controls from National Cancer Center in Korea, 2010–2013

Table 2 shows the adjusted associations between smoking behaviors and risk of CRC. In men, smoking for more than 28 years (OR = 1.49, 95% CI = 1.11–1.98, P < 0.01), smoking 20 or more cigarettes per day (OR = 2.12, 95% CI = 1.61–2.79, P < 0.01), and 21 or more pack-years of smoking (OR = 1.78, 95% CI = 1.35–2.35, P < 0.01) were each significantly associated with an increased risk of CRC. In women, ever smoking (OR = 2.23, 95% CI = 1.15–4.34, P = 0.02) and 5 or more pack-years of smoking (OR = 6.11, 95% CI = 1.10–34.00, P = 0.04) were associated with an increased risk of CRC.

Table 2 Association between smoking behaviors and risk of colorectal cancer

The associations between the previously identified common SNPs and CRC risk, stratified by sex, are provided in Additional file 2: Table S2. We found 5 significant interactions between the common SNPs and the various smoking behaviors for risk of CRC (Table 3). In men, there was an interaction between smoking status and the polymorphism rs1957636 at 14q22.3 in LOC105370507. The risk allele (C) was associated with a decreased risk among never smokers (OR for CC vs. TT = 0.36, 95% CI = 0.16–0.79; OR for CT + CC vs. TT = 0.54, 95% CI = 0.32–0.92) and an increased risk among ever smokers (OR for CC vs. TT = 1.51, 95% CI = 1.02–2.24; OR for CT + CC vs. TT = 1.33, 95% CI = 1.00–1.77; P for interaction = 5.5 × 10^−4 in the additive model and 1.9 × 10^−3 in the dominant model), and the interaction remained statistically significant after FDR correction (FDR-adjusted P for interaction in the additive model = 1.8 × 10^−3). A significant interaction was also observed in men between smoking status and the polymorphism rs4813802 at 20p12.3: the G allele was associated with a lower risk of CRC among ever smokers (OR for GG vs. TT = 0.41, 95% CI = 0.18–0.94, P for interaction in the additive model = 0.04). In women, significant interactions were observed between smoking status and the polymorphism rs6687758 at 1q41 (P for interaction in the additive model = 0.03), between smoking duration and the polymorphism rs174537 at 11q12.2 in MYRF (P for interaction in the additive model = 0.05), and between pack-years of smoking and the polymorphism rs4813802 (P for interaction in the additive model = 0.04), but the stratum-specific associations between these SNPs and CRC risk were not statistically significant.
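To show how stratum-specific ORs relate to an interaction test, the sketch below uses a deliberately simplified, unadjusted setting with a binary genotype (carrier vs. non-carrier, as in a dominant model) and binary smoking status. In that setting, the interaction coefficient of a logistic model equals the log of the ratio of the two stratum-specific ORs. The counts are hypothetical, and the calculation does not reproduce the adjusted, additive-model estimates reported above.

```python
import math


def log_odds_ratio(case_carrier, case_noncarrier, ctrl_carrier, ctrl_noncarrier):
    """Crude log OR and its standard error (Woolf method) from a 2x2 table."""
    log_or = math.log((case_carrier * ctrl_noncarrier) /
                      (case_noncarrier * ctrl_carrier))
    se = math.sqrt(1 / case_carrier + 1 / case_noncarrier +
                   1 / ctrl_carrier + 1 / ctrl_noncarrier)
    return log_or, se


def or_with_ci(log_or, se, z=1.96):
    """Odds ratio with a Wald 95% confidence interval."""
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


def interaction_test(never_table, ever_table):
    """Wald test of the ratio of stratum-specific ORs (multiplicative scale)."""
    log_or_never, se_never = log_odds_ratio(*never_table)
    log_or_ever, se_ever = log_odds_ratio(*ever_table)
    diff = log_or_ever - log_or_never        # interaction coefficient
    se = math.sqrt(se_never ** 2 + se_ever ** 2)
    z = diff / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return math.exp(diff), p_value


# Hypothetical counts:
# (carrier cases, non-carrier cases, carrier controls, non-carrier controls)
never = (30, 40, 120, 80)
ever = (150, 80, 200, 160)

or_never = or_with_ci(*log_odds_ratio(*never))
or_ever = or_with_ci(*log_odds_ratio(*ever))
ratio, p_interaction = interaction_test(never, ever)
print("never smokers: OR = %.2f (%.2f-%.2f)" % or_never)
print("ever smokers:  OR = %.2f (%.2f-%.2f)" % or_ever)
print(f"ratio of ORs = {ratio:.2f}, P for interaction = {p_interaction:.4f}")
```

A ratio of ORs far from 1 with a small p-value corresponds to the pattern reported for rs1957636, where the direction of the association differs between never and ever smokers.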

Table 3 Association of GWAS-identified single-nucleotide polymorphisms on risk of colorectal cancer by smoking behaviors

Discussion

In this case-control study, we found that various smoking behaviors, including smoking status, smoking duration, amount of smoking, and pack-years of smoking, were associated with risk of CRC. Additionally, we found that associations between several common susceptibility SNPs, including rs1957636 at 14q22.3, rs4813802 at 20p12.3, rs6687758 at 1q41, and rs174537 at 11q12.2, and risk of CRC were modified by smoking behaviors according to sex.

In this study, longer duration, greater amount, and more pack-years of smoking in men, and ever smoking and more pack-years of smoking in women, were all associated with an increased risk of CRC. A previous meta-analysis also showed an association between smoking and CRC risk in both men and women [26]. In contrast, several studies have reported that associations between smoking and CRC risk were attenuated in women, which was attributed to small sample sizes or the anti-estrogenic effect of smoking [27, 28].

Biological evidence on the association between smoking and CRC has suggested that carcinogenic compounds absorbed from cigarette smoking could cause mutations in the APC or KRAS genes that are known to be related to early stages of colorectal carcinogenesis [29]. It was reported that APC and KRAS mutations in colorectal polyps were more frequent among smokers compared to non-smokers [30]. However, there were also inconsistent results on the roles of APC and KRAS mutations induced by cigarette smoking in CRC [31] as well as a lack of similar studies. Therefore, more studies on the molecular mechanisms that cause genetic damage induced by cigarette smoking in CRC are needed.

Previous studies on gene-smoking interactions in CRC have been based on candidate genes such as CYP1A1 [32], CYP1A2 [32], GPX1 [33], GSTM1 [32, 34,35,36,37,38], GSTT1 [35,36,37,38], LEPR [39], MAD1L1 [40], mEH3 [41], mEH4 [41], NAT1 [36], NAT2 [32, 42, 43], NQO1 [44], OGG1 [33], PTEN [45], SMAD7 [46], and TGFBR1 [46]. A meta-analysis reported no evidence of gene-smoking interactions for the GSTM1, GSTT1, mEH3, mEH4, and NAT2 genes in CRC. However, that meta-analysis suggested a potential negative interaction between smoking and mEH3 in colorectal adenoma (CRA). It also suggested a potential positive interaction between smoking and GSTT1, because smoking was associated with risk of CRA only among GSTT1-null carriers [5].

In this study, we identified novel interactions between smoking behaviors and common susceptibility SNPs, specifically rs1957636, rs4813802, rs6687758, and rs174537, in CRC according to sex. The most significant interaction was between smoking status and rs1957636, and its effect differed in direction: the C allele was associated with a decreased risk of CRC among never smokers and an increased risk among ever smokers. The SNP rs1957636 is located at 14q22.3 (LOC105370507), close to the transcription start site of the BMP4 gene, which is involved in bone morphogenetic protein (BMP) signaling. A similar positive interaction between rs17563 in BMP4 and smoking for CRC risk was observed in a previous study [47], despite the weak linkage disequilibrium between rs1957636 and rs17563 (r² = 0.12 in HapMap3 JPT + CHB + CHD individuals). Biologically, BMP signaling has been suggested to play a role in human cancer through its tumor-suppressor properties, but colon cancer cells were resistant to the growth suppression and differentiation induced by BMP4 [48]. Experiments in a rat model showed that BMP4 was up-regulated by chronic cigarette smoking [49]. Thus, an interaction between BMP4 and smoking might explain the variable effects of BMP4 on the risk of CRC.

Among the men, the G allele of the SNP rs4813802 tended to be associated with a lower risk of CRC in ever smokers, while no association with the SNP was observed in never smokers. A possible interaction between the SNP rs4813802 and smoking for CRC risk was also observed in women. The SNP rs4813802 is located upstream of the BMP2 gene. Previous experiments have found that higher nicotine concentrations in smokers decreased the expression of BMP2 [50], which could mediate intestinal cell growth [51]. Furthermore, BMP2 belongs to the transforming growth factor-β (TGFβ) superfamily and plays a role in cell apoptosis, differentiation, and proliferation [52]. However, no previous studies have reported interactions between SNPs in BMP2 and smoking behaviors for CRC risk. More studies on BMP pathway loci, including BMP4 and BMP2, should be conducted to help explain the missing heritability of CRC [53].

Smoking behaviors may also have interacted with the polymorphisms rs6687758 at 1q41 (intergenic) and rs174537 at 11q12.2 (MYRF) in women, despite the lack of significant associations with CRC risk. Of these SNPs, rs6687758 is near the DUSP10 gene, which encodes dual specificity phosphatase 10 (DUSP10). DUSP10 regulates intestinal epithelial cell proliferation through the mitogen-activated protein kinase (MAPK) signaling pathway, thereby acting as a suppressor of CRC [54]. The polymorphism rs174537 is known as an expression quantitative trait locus (eQTL) for the FADS1 and FADS2 genes [22], which encode enzymes involved in the metabolism of polyunsaturated fatty acids and mediate the effects of cyclooxygenase-2 (COX-2) in colorectal carcinogenesis. Benzo[a]pyrene, one of the carcinogenic compounds in cigarette smoke, up-regulated COX-2 in mouse cells [55], which in turn could either activate or be dependent on the MAPK pathway, suggesting a possible effect resulting from a gene-smoking interaction [55, 56].

One of the strengths of this study is that we found novel interactions between genes and smoking behaviors that affected CRC risk, which may account for part of the heritability left unexplained by previous GWAS. In particular, the novel interaction between smoking status and the additive genotypes of the polymorphism rs1957636 (P for interaction = 5.5 × 10^−4) remained significant after FDR adjustment (adjusted P for interaction = 1.8 × 10^−3) and met the Bonferroni-corrected threshold (P for interaction < 1.67 × 10^−3). Although several gene-environment interactions involving susceptibility loci identified in GWAS have been evaluated [7, 9, 57,58,59,60], no significant gene-smoking interactions were observed in those studies. In addition, this study considered several aspects of smoking behavior, namely status, duration, amount, and pack-years of smoking, whereas most previous gene-smoking interaction studies have dealt only with smoking status.
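The Bonferroni threshold quoted above is 0.05 divided by the 30 SNPs tested. The sketch below shows that calculation together with a Benjamini-Hochberg adjustment, one common FDR procedure. The text does not state which FDR procedure was used or how many tests entered the adjustment, so the Benjamini-Hochberg choice is an assumption, the p-values are hypothetical, and the reported FDR-adjusted value is not reproduced here.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


threshold = bonferroni_threshold(0.05, 30)
print(f"Bonferroni threshold for 30 SNPs: {threshold:.2e}")
print("5.5e-4 passes:", 5.5e-4 < threshold)

# Hypothetical interaction p-values for four SNPs
raw = [0.01, 0.04, 0.03, 0.005]
print("BH-adjusted:", [round(p, 4) for p in benjamini_hochberg(raw)])
```

Bonferroni controls the chance of any false positive and is the stricter of the two, while FDR procedures control the expected share of false positives among the findings declared significant.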

A limitation of this study is the insufficient sample size, which led to relatively low statistical power for detecting gene-smoking interactions; the power was 0.66 for the additive and dominant models of the SNP rs1957636 at α = 0.05 in men. To obtain a power over 0.80 under the same conditions, a minimum male sample size of 2025 would be recommended. In our analyses of ever smokers, the median values of duration, amount, and pack-years of smoking were defined separately for each sex. When we analyzed the data using median values common to men and women, the associations between smoking behaviors and CRC risk in women could not support further calculations because of the small number of female ever smokers; smoking prevalence among women is very low in Korea [61]. Accordingly, even though we used female-specific median values for smoking behaviors, several associations between combinations of genotype and smoking behavior and CRC risk could not be calculated.

Another limitation is that this hospital-based case-control study might have had selection bias because the control subjects were recruited from among individuals who underwent a health examination. However, the control subjects were from the same hospital as the cases, and random sampling and matching with the cases were conducted to reduce the effect of selection bias. Nevertheless, several GWAS-identified SNPs had a higher proportion of risk alleles in controls than in cases. This may be due to ethnic differences in the allele frequencies of the SNPs and a potential lack of representativeness of controls who visited the hospital for a medical check-up. However, a family history of CRC was not frequent among the controls, and if the controls had in fact been at higher risk of CRC than the general population, the results would have been biased towards the null.

Moreover, other potential confounders, such as dietary factors, were not adjusted for in the analyses because adjusting for them made very little difference to the results. Lastly, because we examined only SNPs previously identified in GWAS, we did not cover or represent all polymorphisms related to CRC risk. GWAS are likely to identify genetic variants that are statistically associated with CRC development rather than variants with a direct disease-causing function. Accordingly, additional fine mapping and functional studies on possible gene-environment interactions should be conducted.

Conclusions

In conclusion, this study provided evidence that smoking is associated with CRC risk and that the associations between several common susceptibility SNPs, namely rs1957636 at 14q22.3, rs4813802 at 20p12.3, rs6687758 at 1q41, and rs174537 at 11q12.2, and CRC risk may be modified by smoking. Further gene-smoking interaction studies with large sample sizes are warranted to confirm our findings.