Background

Ten chromosomal loci have thus far been shown to modestly increase colorectal cancer (CRC) risk, based on genome-wide association studies (GWASs) [1-9]. The tagging single-nucleotide polymorphisms (SNPs) with the strongest association signal in each locus were rs6983267 at 8q24 [1-3], rs4939827 at 18q21 [4], rs4779584 at 15q13 [5], rs16892766 at 8q23 [6], rs10795668 at 10p14 [6], rs3802842 at 11q23 [7, 8], rs4444235 at 14q22 [9], rs9929218 at 16q22 [9], rs10411210 at 19q13 [9], and rs961253 at 20p12 [9]. Each of the ten loci independently predisposes to CRC with allelic odds ratios (ORs) of <1.3, and risk allele frequencies range from 7% to 90% in the general population [10].

GWASs are based on genotyping SNPs that tag linkage disequilibrium (LD) blocks in the genome, thus capturing a high proportion of common genetic variation. Hence, the associating tag SNPs are usually not themselves causal but rather are in LD with disease-causing variants. We have previously demonstrated that the tag SNP rs6983267 at 8q24 directly disrupts a TCF-4 transcription factor binding site and enhances Wnt signalling in the colon [11]. This was supported by a simultaneous study showing a physical interaction between the rs6983267 region and the MYC proto-oncogene [12]. Furthermore, allelic imbalance (AI) at 8q24 seen in colorectal tumours favours the risk allele G of rs6983267, suggesting that the locus is somatically selected for in tumourigenesis [13]. At 18q21, a novel variant that correlates with rs4939827 reduces SMAD7 expression, leading to aberrant TGFβ (transforming growth factor beta) signalling [14]. Finally, rs16888589 at the 8q23 CRC locus was shown to influence EIF3H expression by physically interacting with the promoter [15]. Unlike at 8q24, no significant difference in the alleles targeted by imbalance was detected in rs4939827 at 18q21 [4, 14] or in rs16892766 at 8q23 [15]. It is therefore likely that many low-penetrance cancer susceptibility loci are explained by subtle changes in distant regulatory elements, and these changes can also play a role in somatic tumour development. Based on the genes located inside or near the CRC-associating regions, including GREM1 at 15q13 and BMP4 at 14q22, alterations in TGFβ-superfamily signalling appear to underlie several loci [10].

The biological mechanisms underlying CRC predisposition are yet to be discovered at the seven low-penetrance loci at 15q13, 10p14, 11q23, 14q22, 16q22, 19q13, and 20p12, which are the focus of this study. First, possible somatic selection of the risk alleles was evaluated in heterozygous individuals by examining the tagging SNPs in tumour and corresponding normal tissues. The location of the SNPs at predicted enhancer elements was then investigated with an in silico tool. As none of the tagging SNPs were predicted to lie at transcription factor binding sites, the analysis was extended to the given LD regions. Putative functional variants were sought by genotyping all the known SNPs inside the associating LD regions that were predicted to disrupt transcription factor binding sites.

Methods

Study population

A population-based series of 1 042 CRC samples collected since 1994 from nine Finnish central hospitals was used in this study [16, 17]. Both germline DNA extracted from blood or normal colonic tissue and corresponding fresh-frozen tumour DNA were available. Information on histological tumour grade and Dukes' stage was obtained from pathology reports. The 837 control DNA samples used in this study were from anonymous healthy blood donors of the Finnish Red Cross Blood Transfusion Service. Samples and clinicopathologic data were obtained with informed consent and ethical review board approval in accordance with the Declaration of Helsinki.

Allelic imbalance (AI) analysis

Allelic ratios were compared by sequencing tumour and respective normal tissue DNA in heterozygous patients, as described previously [13, 18, 19]. In brief, the allele peak height ratio was calculated as Tumour(Allele1/Allele2)/Normal(Allele1/Allele2), and values of <0.6 or >1.67 were considered to indicate imbalance. Tumour and normal samples were sequenced using Applied Biosystems BigDye v3.1 sequencing chemistry and an ABI3730 Automatic DNA sequencer (Applied Biosystems, Foster City, CA, USA). Peak heights were measured manually from sequencing chromatograms using Chromas (http://www.technelysium.com.au) and Sequence Scanner (Applied Biosystems) software, and the allelic ratio was calculated from these measurements. First, 90 tumour-normal pairs were analyzed. If any trend towards imbalance was observed, all available heterozygous samples were analyzed. All tumours were microscopically evaluated by a pathologist, and at least 64% of the analyzed tumours contained ≥70% carcinoma tissue.
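The ratio and thresholds described above can be illustrated with a minimal Python sketch. The function names, the result labels, and the peak heights in the example are hypothetical and are not taken from the study; only the formula and the cut-offs of 0.6 and 1.67 come from the text.

```python
def allelic_ratio(tumour_a1, tumour_a2, normal_a1, normal_a2):
    """Tumour(Allele1/Allele2) / Normal(Allele1/Allele2) from peak heights."""
    return (tumour_a1 / tumour_a2) / (normal_a1 / normal_a2)


def classify_ai(ratio, low=0.6, high=1.67):
    """Label a tumour-normal pair using the cut-offs given in the text.

    The labels describe relative under-representation only; sequencing
    peak heights alone cannot distinguish loss of one allele from gain
    of the other.
    """
    if ratio < low:
        return "imbalance: allele 1 under-represented in tumour"
    if ratio > high:
        return "imbalance: allele 2 under-represented in tumour"
    return "no imbalance"


# Hypothetical peak heights (arbitrary units), for illustration only.
example_ratio = allelic_ratio(tumour_a1=300, tumour_a2=1000,
                              normal_a1=900, normal_a2=1000)
example_call = classify_ai(example_ratio)
```

Because 1/0.6 is approximately 1.67, the two cut-offs are symmetric: swapping which allele is called Allele1 inverts the ratio but leaves the imbalance call unchanged.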

Identification of SNPs at transcription factor binding sites

A computational tool called Enhancer Element Locator (EEL) [20, 21], which aligns genomic sequence from two species and predicts the location of putative transcription factor binding sequences and enhancer elements, was used. In the output, a score is given to each element based on conservation, clustering, and predicted affinity of the binding sites. One Mb of human and corresponding mouse sequence surrounding each SNP (500 kb of flanking sequence up- and downstream) was exported from the Ensembl database, version 54 (http://www.ensembl.org/index.html). Transcription factor binding-affinity matrices from the publicly available Jaspar database (http://jaspar.genereg.net/) and those published elsewhere [22-24] were used in the alignment, which was performed with default parameters. All the known SNPs that were predicted to lie directly at transcription factor binding sites were selected from enhancers that were inside the CRC-associating LD regions. We defined LD blocks using HapMap data (http://hapmap.ncbi.nlm.nih.gov/): chr15: 30 782 050 - 30 841 010 bp (59 kb; human genome build 36) for rs4779584 [5], chr10: 8 730 000 - 8 810 000 bp (80 kb) for rs10795668 [6], chr11: 110 640 000 - 110 690 000 bp (50 kb) for rs3802842 [8], chr14: 53 477 192 - 53 494 200 bp (17 kb) for rs4444235 [9], chr16: 67 286 613 - 67 396 803 bp (110 kb) for rs9929218 [9], chr19: 38 203 614 - 38 300 573 bp (97 kb) for rs10411210 [9], and chr20: 6 316 089 - 6 354 440 bp (38 kb) for rs961253 [9]. The analysis was restricted to SNPs for which the EEL score of the given element was ≥ 300.
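The selection criteria above (inside an LD block, directly at a predicted binding site, element score ≥ 300) can be sketched in Python as a simple filter. The LD block coordinates are those listed in the text (build 36); the record layout, the function name, and the example records `snpA` to `snpC` are hypothetical and do not reflect actual EEL output.

```python
# LD blocks as listed in the text: tag SNP -> (chromosome, start, end), build 36.
LD_BLOCKS = {
    "rs4779584": ("chr15", 30_782_050, 30_841_010),
    "rs10795668": ("chr10", 8_730_000, 8_810_000),
    "rs3802842": ("chr11", 110_640_000, 110_690_000),
    "rs4444235": ("chr14", 53_477_192, 53_494_200),
    "rs9929218": ("chr16", 67_286_613, 67_396_803),
    "rs10411210": ("chr19", 38_203_614, 38_300_573),
    "rs961253": ("chr20", 6_316_089, 6_354_440),
}
MIN_EEL_SCORE = 300  # element score threshold from the text


def select_candidates(snps):
    """Return (snp_id, tag_snp) pairs for SNPs meeting all three criteria.

    Each record is a dict with hypothetical keys: id, chrom, pos,
    element_score, in_binding_site.
    """
    selected = []
    for snp in snps:
        if not snp["in_binding_site"] or snp["element_score"] < MIN_EEL_SCORE:
            continue
        for tag, (chrom, start, end) in LD_BLOCKS.items():
            if snp["chrom"] == chrom and start <= snp["pos"] <= end:
                selected.append((snp["id"], tag))
    return selected


# Hypothetical records, for illustration only.
examples = [
    {"id": "snpA", "chrom": "chr15", "pos": 30_800_000,
     "element_score": 350, "in_binding_site": True},
    {"id": "snpB", "chrom": "chr15", "pos": 30_800_500,
     "element_score": 250, "in_binding_site": True},   # score too low
    {"id": "snpC", "chrom": "chr10", "pos": 9_000_000,
     "element_score": 400, "in_binding_site": True},   # outside LD block
]
candidates = select_candidates(examples)
```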

Genotyping of SNPs in cases and controls

Genotyping was carried out using Sequenom MassArray iPlex Gold (Sequenom, San Diego, CA, USA) and was performed by the Institute for Molecular Medicine Finland (FIMM) Technology Centre, University of Helsinki. Each 96-well sample plate contained two negative water controls and two positive CEPH controls. The concordance between duplicate controls was 99.79% (479/480 genotypes). Twelve SNPs (rs11631292, rs62002613, rs17485426, ENSSNP10169878, rs1999638, rs12273224, rs45615536, rs10505287, rs57897735, rs10505283, rs2761880, and rs12893484) were successfully genotyped with MassArray. Three remaining SNPs (rs28768389, rs12899808, and rs34812868) were genotyped by direct genomic sequencing using Applied Biosystems BigDye v3.1 sequencing chemistry and an ABI3730 Automatic DNA sequencer (Applied Biosystems).

Statistical analysis

All analyses were performed with R software. The exact binomial test was used in the allelic imbalance analysis. Allelic odds ratios, 95% confidence intervals, and P-values were calculated with Pearson's Chi-squared test. To adjust for multiple testing, we applied a Bonferroni correction (not shown in Table 1). Fisher's exact test was used in the analysis of clinicopathological characteristics.
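As an illustration of the allelic association statistics, the following Python sketch computes an allelic odds ratio, a 95% confidence interval, and a Pearson chi-squared P-value from a 2x2 table of allele counts. The study itself used R; the confidence interval method (log odds ratio with a normal approximation) and the absence of a continuity correction are assumptions, since the text does not specify them, and the allele counts in the example are hypothetical.

```python
import math


def allelic_association(case_risk, case_other, ctrl_risk, ctrl_other):
    """Allelic OR, 95% CI (log-OR normal approximation, an assumption),
    and Pearson chi-squared test (1 df, no continuity correction)."""
    a, b, c, d = case_risk, case_other, ctrl_risk, ctrl_other
    odds_ratio = (a * d) / (b * c)

    # Standard error of log(OR) and the resulting 95% interval.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci_low = math.exp(math.log(odds_ratio) - 1.96 * se)
    ci_high = math.exp(math.log(odds_ratio) + 1.96 * se)

    # Pearson chi-squared statistic for a 2x2 table.
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

    # Upper tail of the chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, (ci_low, ci_high), chi2, p_value


# Hypothetical allele counts, for illustration only.
or_, ci, chi2, p = allelic_association(60, 40, 50, 50)
```

Using only the standard library keeps the sketch self-contained; with one degree of freedom the chi-squared tail probability reduces to the complementary error function, so no statistics package is needed.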

Table 1 Genotyping of 12 SNPs that locate in transcription factor binding sites in low-penetrance CRC loci

Results

The tag SNPs of the seven low-penetrance loci were sequenced in heterozygous tumour-normal pairs in order to detect possible AI occurring during neoplastic progression. The risk alleles were not significantly targeted by AI in any of the seven SNPs (Table 2). The frequency of overall imbalance (loss of either the risk or the neutral allele) ranged between 9 and 31% at the seven loci (Table 2). In rs10411210, ten tumours showed loss of the neutral allele and five tumours loss of the risk allele; however, AI occurred altogether in only 9% of the tumours (Table 2). No significant differences were observed between the two AI groups in terms of Dukes' stage (P = 0.3) or histological grade (P = 1.0).
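To illustrate how an exact binomial test applies to such counts, the sketch below tests the rs10411210 split (ten tumours with loss of the neutral allele versus five with loss of the risk allele) against an equal-probability null. This is a Python re-implementation written for illustration; the two-sided convention (summing all outcomes no more likely than the observed one) is an assumption about how the test was run, and the resulting value is not a figure reported in the study.

```python
from math import comb


def binom_test_two_sided(k, n, p=0.5):
    """Exact two-sided binomial test.

    Sums the probabilities of all outcomes that are no more likely than
    the observed count k (with a small relative tolerance for ties).
    """
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    total = sum(q for q in probs if q <= observed * (1 + 1e-7))
    return min(1.0, total)


# Counts from the text: 10 neutral-allele losses out of 15 imbalanced tumours.
p_rs10411210 = binom_test_two_sided(10, 15)
```

Under these assumptions the P-value is about 0.30, in line with the absence of significant allele-specific targeting at this locus.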

Table 2 Results of allelic imbalance analysis in low-penetrance CRC loci

The LD regions containing the seven tag SNPs were further analyzed with EEL. None of the seven tag SNPs were located in predicted transcription factor binding sites. Thirteen other SNPs in the LD regions were located at predicted binding sites in elements with a score ≥ 300. Three out of seven loci (16q22, 19q13, and 20p12) contained no SNPs at transcription factor binding sites in elements with a score ≥ 300. One of the SNPs, rs11853552 at 15q13, had previously been genotyped by Jaeger et al. (2008) [5] and was therefore excluded from the analysis. The remaining 12 SNPs were genotyped in the same Finnish case-control series as the tag SNPs in previous studies (Table 1) [6, 8, 9]. None of the SNPs showed association with CRC (Table 1).

Three of the 12 SNPs (rs28768389, rs12899808, and rs34812868 in 15q13) were genotyped by sequencing, and four additional polymorphisms that did not lie in any predicted binding sites were observed in the sequencing fragments. One of these SNPs, rs35614970 (A6/A3), showed significant association with CRC (OR 1.2, 95% CI 1.03-1.38, P = 0.02). The frequency of A6 was 0.705 in controls and 0.741 in cases. This association did not remain significant after correction for multiple testing (P = 0.21, Bonferroni correction for 13 SNPs). None of the other additional SNPs (rs11071928, rs34944927, and a novel C to T change at chr15: 30 806 922) showed association with CRC.
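The reported odds ratio can be checked against the reported allele frequencies, and the Bonferroni adjustment can be illustrated, with a short Python sketch. The function names are hypothetical. Note that the Bonferroni input here is the rounded P-value from the text, so the result differs slightly from the reported 0.21, which was presumably computed from the unrounded P-value.

```python
def odds_ratio_from_freqs(freq_cases, freq_controls):
    """Allelic odds ratio from the allele frequency in cases and controls."""
    odds_cases = freq_cases / (1 - freq_cases)
    odds_controls = freq_controls / (1 - freq_controls)
    return odds_cases / odds_controls


def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted P-value, capped at 1."""
    return min(1.0, p_value * n_tests)


# A6 allele frequencies of rs35614970 as given in the text.
or_a6 = odds_ratio_from_freqs(freq_cases=0.741, freq_controls=0.705)

# Rounded P = 0.02 corrected for 13 SNPs (assumption: rounded input).
p_adjusted = bonferroni(0.02, 13)
```

The frequencies give an odds ratio of roughly 1.2, consistent with the reported value, and the adjusted P-value of 0.26 from the rounded input remains well above 0.05.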

Discussion

In this study, we used the same approach as for 8q24 to systematically analyze the molecular basis of seven low-penetrance CRC loci where the cancer-causing variants have not yet been identified. This is the first time, to our knowledge, that rs4779584, rs10795668, rs4444235, rs9929218, rs10411210, and rs961253 have been analyzed for possible AI in colorectal tumours. Sequencing of fresh-frozen tumour material provides accurate data on possible loss of the neutral allele or gain of the risk allele. In the case of selective imbalance, subsequent copy number analyses can reveal whether the role of a variant resembles that of a tumour suppressor or an oncogene, and guide further functional efforts. The 8q24 locus, where gain of the risk allele was observed in rs6983267 [11, 13], currently seems to be the only low-penetrance risk locus for CRC that is somatically enriched during tumourigenesis. Lack of imbalance favouring the risk allele does not, however, preclude involvement in germline predisposition. It is therefore possible that some of the seven susceptibility variants analyzed in this study play a role in the early stages of neoplastic development, without providing further selective advantage in somatic cancer progression.

Interestingly, 19q13 gain is common in both primary and metastatic CRC [25]. We did not, however, observe any significant difference in the alleles targeted by AI in rs10411210, nor association of neutral allele loss with more advanced disease stage. Loss of heterozygosity involving 10p14 has also been reported to occur in CRC [26] but we found no evidence of risk allele selection in rs10795668. Furthermore, deletion of 11q23-q24 is a frequent event in CRC, among other tumour types [27]. However, Tenesa et al. (2008) found no AI in favour of the risk allele in rs3802842 based on up to 43 CRCs [7], which is now confirmed by our analysis of 89 CRCs. Finally, although 18q loss is common in CRC [28], Broderick et al. (2007) observed no selection of the risk allele at rs4939827 in 248 CRC cases [4].

The location of rs6983267 at a TCF-4 binding sequence was recently discovered using EEL [11], which prompted us to apply this tool to the seven susceptibility loci as well. Transcription factor tissue specificities are incompletely understood, supporting the rationale of our unbiased approach of not restricting the analysis to colon-specific factors. Given that the tagging SNPs at the seven loci lie in noncoding regions, the most likely underlying mechanism is differential gene expression through enhancers or repressors. Although regulatory SNPs have been identified in the loci successfully fine-mapped so far, our study underscores the importance of also considering other mechanisms. For instance, sequence variation affecting noncoding regulatory RNAs, many of which have been linked to cancer-associated pathways, could explain some of the predisposition loci devoid of coding genes.

Conclusion

While successful in the 8q24 locus, the approach used in this study was unable to pinpoint causal variants in the seven low-penetrance CRC loci analyzed. Finding the underlying functional changes in the GWAS loci is a challenging, yet important, task in order to fully understand the biology behind common CRC susceptibility.