Meta-analysis of new genome-wide association studies of colorectal cancer risk
- Cite this article as:
- Peters, U., Hutter, C.M., Hsu, L. et al. Hum Genet (2012) 131: 217. doi:10.1007/s00439-011-1055-0
Colorectal cancer is the second leading cause of cancer death in developed countries. Genome-wide association studies (GWAS) have successfully identified novel susceptibility loci for colorectal cancer. To follow up on these findings, and try to identify novel colorectal cancer susceptibility loci, we present results for GWAS of colorectal cancer (2,906 cases, 3,416 controls) that have not previously published main associations. Specifically, we calculated odds ratios and 95% confidence intervals using log-additive models for each study. In order to improve our power to detect novel colorectal cancer susceptibility loci, we performed a meta-analysis combining the results across studies. We selected the most statistically significant single nucleotide polymorphisms (SNPs) for replication using ten independent studies (8,161 cases and 9,101 controls). We again used a meta-analysis to summarize results for the replication studies alone, and for a combined analysis of GWAS and replication studies. We measured ten SNPs previously identified in colorectal cancer susceptibility loci and found eight to be associated with colorectal cancer (p value range 0.02 to 1.8 × 10−8). When we excluded studies that have previously published on these SNPs, five SNPs remained significant at p < 0.05 in the combined analysis. No novel susceptibility loci were significant in the replication study after adjustment for multiple testing, and none reached genome-wide significance from a combined analysis of GWAS and replication. We observed marginally significant evidence for a second independent SNP in the BMP2 region at chromosomal location 20p12 (rs4813802; replication p value 0.03; combined p value 7.3 × 10−5). 
In a region on 5p15.33, which includes the coding regions of the TERT-CLPTM1L genes and has been identified in GWAS to be associated with susceptibility to at least seven other cancers, we observed a marginally significant association with rs2853668 (replication p value 0.03; combined p value 1.9 × 10−4). Our study suggests a complex nature of the contribution of common genetic variants to risk for colorectal cancer.
Colorectal cancer is the second leading cause of cancer death in developed countries, with the lifetime risk estimated to be 5–6% (Ries et al. 2007). Linkage studies have identified important rare germline mutations, such as those in the APC gene and DNA mismatch repair genes, leading to severe syndromes, e.g. familial adenomatous polyposis and Lynch syndrome (also called hereditary non-polyposis colorectal cancer) (de la Chapelle 2004). However, these high-penetrance mutations explain only a small fraction of the genetic risk. To date, genome-wide association studies (GWAS) have identified 14 low-penetrance genetic variants that, together, explain approximately 8% of the familial association of this disease (Broderick et al. 2007; Gruber et al. 2007; Houlston et al. 2008, 2010; Tenesa et al. 2008; Tomlinson et al. 2007, 2008; Zanke et al. 2007). Based on a recent method by Chatterjee and Park (Park et al. 2010) that estimates the amount of familial association explained by common genetic variants, we estimate that about 60–70 common variants [95% confidence interval (CI) 31–173] would explain approximately 17% (95% CI 11.6–35.8%) of the familial association in colorectal cancer. Accordingly, we hypothesize that additional common colorectal cancer susceptibility loci exist that have yet to be identified, and that these loci can be identified through a genome-wide analysis of single nucleotide polymorphism (SNP) data.
Table 1 Studies participating in the genome-wide association study (GWAS) and replication meta-analyses
GWAS: Colon Cancer Family Registry (CCFR); Women’s Health Initiative (WHI); Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial (PLCO); Diet, Activity and Lifestyle Survey (DALS)
Replication: Assessment of Risk in Colorectal Tumors in Canada (ARCTIC); Colon Cancer Family Registry Set II (CCFR II); Darmkrebs: Chancen der Verhütung durch Screening (DACHS); Diet, Activity and Lifestyle Survey Set II (DALS Set II); French Case–Control Study (French); Health Professionals Follow-up Study (HPFS); Molecular Epidemiology of Colorectal Cancer Study (MECC); Newfoundland Familial Colon Cancer Registry (NFCCR); Nurses’ Health Study (NHS); Physicians’ Health Study (PHS)
From the fixed-effects meta-analysis of GWAS scans, the inflation factor λ was 1.008, indicating little evidence of residual population substructure, cryptic relatedness, or differential genotyping between cases and controls (Supplemental Figure 1). When analyzed separately, λ was similarly low for each scan (range 1.005 to 1.01).
Table 2 Risk estimates for colorectal cancer for previously reported genome-wide association study (GWAS) hits
Column headings: GWAS + replication; OR (95% CI) (three columns)
p values: 8.7 × 10−5; 2.1 × 10−4; 1.1 × 10−7; 5.2 × 10−3; 8.1 × 10−7; 1.8 × 10−8; 6.8 × 10−7; 9.3 × 10−3; 4.0 × 10−7; 4.1 × 10−3; 2.7 × 10−5; 3.8 × 10−7; 9.0 × 10−4; 5.2 × 10−5; 6.4 × 10−2; 7.5 × 10−4; 3.3 × 10−3; 1.4 × 10−5
Table 3 Risk estimates for colorectal cancer for loci with p < 5 × 10−4 in combined genome-wide association study (GWAS) and replication analysis
Column headings: GWAS (2,906 cases, 3,416 controls); Replication (8,161 cases, 9,101 controls); GWAS + replication; each with OR (95% CI)
SNPs in regions of prior GWAS hits for colorectal cancer, p values: 5.0 × 10−4; 4.1 × 10−3; 1.7 × 10−5; 5.5 × 10−5; 7.3 × 10−5
SNPs in new regions, p values: 1.2 × 10−3; 5.6 × 10−6; 2.9 × 10−4; 8.7 × 10−5; 8.8 × 10−4; 1.4 × 10−4; 4.5 × 10−4; 1.9 × 10−4; 2.5 × 10−3; 2.1 × 10−4; 3.6 × 10−4; 3.6 × 10−4
For all SNPs in Table 3, we tested if the risk estimates of these variants may vary by mode of inheritance or sex. While for some variants (rs4813802/BMP2, rs275454/POLS, rs2373859/SLC8A1, and rs2853668/CLPTM1L), the recessive model tended to provide stronger risk estimates and slightly lower p values than the log-additive or dominant model, the AIC value was >2 in all cases, indicating no statistical evidence for improvement over the log-additive model (Supplemental Table 5). We also explored if results vary by sex and found that for rs16888522/EIF3H the statistical evidence for association was stronger in men (OR = 1.25; p value = 0.002) than in women (OR = 1.10; p value = 0.25), although the effect estimates were in the same direction and similar in magnitude for both men and women (Supplemental Table 6).
As a sensitivity analysis, we reran the combined fixed-effects meta-analysis leaving out one study at a time for all SNPs in Table 2. In no case did the point estimate change by >3%. Further, all pfixed remained <5 × 10−3 except when we removed the French study from the analysis of rs4925386. In that case, the OR remained similar (OR = 0.94), but the p value was slightly attenuated (pfixed = 8.2 × 10−3).
From the analysis of GWAS and replication, including a total of up to 11,067 cases and 12,517 controls, we found that SNPs in eight out of ten previously identified colorectal cancer susceptibility loci were associated with the disease in our replication study at p < 0.05. We found evidence that a second SNP (rs4813802) near the BMP2 gene could be associated with colorectal cancer, independent of the association with the previously identified susceptibility SNP in that region (rs961253). Furthermore, our study reports for the first time a potential new association of a variant in the TERT-CLPTM1L region with colorectal cancer risk.
Our results provide further support for eight of ten previously identified GWAS hits. When excluding studies that have previously published results on these known loci, five loci showed evidence of replication in this independent subsample. The 8q24 SNP rs6983267 has already been heavily studied, including published reports for many of the studies included in this paper (Figueiredo et al. 2011; Hutter et al. 2010), so we were not able to examine independent replication of this SNP in this study. Among the remaining four loci that did not show a significant association at p < 0.05, three showed a trend toward replication (with p < 0.2 and an OR in the same direction as the original GWAS report). However, one SNP, rs10795668, did not show any evidence for association with disease (OR = 1.00; 95% CI 0.93–1.08; p = 0.96; Supplemental Figure 2). Several papers have reviewed potential reasons for the lack of replication of GWAS findings (Chanock et al. 2007; Kraft et al. 2009). As in any observational study, it is possible this represents either a false positive in the initial report or a false negative in this replication; although that seems unlikely since both the discovery GWAS and this report are based on large, well-powered studies. We used the same genetic model and similar trait definitions as the discovery GWAS. Further, all studies were restricted to non-Hispanic Whites, limiting the possibility of differences in LD patterns. It is possible that there may be differences in the distribution of a key effect modifier between the studies used to identify rs10795668 and the studies presented in this paper. A full exploration of underlying gene–gene or gene–environment interactions is beyond the scope of the current paper, but we did explore if the effect of rs10795668 varied by sex. 
Although the results were not significant for either sex, and the 95% CIs overlap, we do note an interesting pattern where the ORs are in opposite directions for women and men, with men showing a trend in the direction of the discovery GWAS. Specifically, we found ORwomen = 1.07 (95% CI 0.92–1.26; pfixed = 0.38) and ORmen = 0.95 (95% CI 0.84–1.07; pfixed = 0.40).
The rs4925386/LAMA5 SNP was also recently identified in another GWAS meta-analysis (Houlston et al. 2010). Although it was not a known colorectal cancer susceptibility locus at the time we selected SNPs for replication, this SNP met our criteria for selection and showed evidence for association in our replication sample. The rs4925386 variant lies in an intron of the large gene encoding the laminin A5 protein. As previously reported, the variant is in LD (r2 > 0.5) with four nonsynonymous SNPs in LAMA5 (Houlston et al. 2010). However, each of these amino acid changes is predicted to be benign. Overall, our finding provides additional independent support that this variant is associated with susceptibility to colorectal cancer.
None of the novel loci were significantly associated with colorectal cancer in our replication study after adjusting for multiple testing (0.05/321 = 1.6 × 10−4), and none of the loci reached “genome-wide significance” at the suggested p value threshold of 1.6 × 10−7 after accounting for the two-stage design (for details, see “Materials and methods”) (Dudbridge and Gusnanto 2008; Hoggart et al. 2008; International HapMap Consortium 2005; Pe’er et al. 2008; Risch and Merikangas 1996; Wellcome Trust Case Control Consortium 2007). However, for some of the variants with p < 0.05 in our replication and combined p value <10−4, additional lines of evidence provide support for the hypothesis that we may have identified genomic regions harboring causal variants for colorectal cancer susceptibility. The variant rs4813802 is about 295.3 kb centromeric to the previously identified rs961253/BMP2 GWAS hit (Houlston et al. 2008); both statistical models and LD data support the idea that these are independent signals. The closest gene is bone morphogenetic protein 2 (BMP2). The new variant of interest, rs4813802, is closer to BMP2 (49.2 kb upstream) than the previously identified SNP rs961253 (344.5 kb upstream of BMP2). Interestingly, rs4813802 lies within an ENCODE Digital DNaseI Hypersensitivity Cluster; it is also within an ENCODE region showing H3K4Me1 enhancer-associated histone marks (Rosenbloom et al. 2010), and the flanking 15 bp shows strong placental mammal conservation by PhastCons (Siepel et al. 2005). While not conclusive, all of these are consistent with the region flanking the SNP acting as a long-range enhancer element, plausibly for BMP2. The BMP2 gene belongs to the transforming growth factor-β (TGFβ) superfamily, which plays an important role in cell proliferation, differentiation, and apoptosis (Massague 2000). Five of the ten known colorectal cancer susceptibility SNPs lie in or near TGFβ superfamily genes (Tenesa and Dunlop 2009). 
Furthermore, loss in BMP signaling has been reported at the transition from advanced adenoma to early cancer stage, compatible with a role in tumor progression (Hardwick et al. 2008). Support for a role for BMP signaling in colorectal cancer comes from the identification of mutations in the bone morphogenetic protein receptor, type IA protein (BMPR1A) in juvenile polyposis (Howe et al. 2001). Individuals with familial juvenile polyposis have a 20% risk of colon cancer by age 35 and 68% by age 60 (Schreibman et al. 2005). Our finding supports the possibility of allelic heterogeneity at the BMP2 locus, which is consistent with findings for the 8q24 cancer locus (Al Olama et al. 2009; Witte 2007) and recent findings for height showing evidence for allelic heterogeneity at as many as 19 loci (Lango et al. 2010). Similar to our finding, these 19 secondary signals in height were rather distant (on average 177 kb) from the initial index SNP that was found to be associated through GWAS (Lango et al. 2010). Accordingly, a comprehensive exploration of already discovered colorectal cancer loci may uncover additional independent variants. However, this example demonstrates that defining the boundaries of a susceptibility locus may be challenging, because the SNP we identified (rs4813802) would not have been included if we had defined the region around the initial index SNP (rs961253) by LD.
The SNP rs7315438, which showed the most statistically significant association both in the replication study alone and in the combined meta-analysis of GWAS and replication studies, is located on chromosome 12q24 about 76.9 kb upstream of the T-box 3 protein gene (TBX3). The SNP is also located 50.4 kb downstream of MED13L, which encodes a subunit of the mediator complex, a large complex of proteins that functions as a transcriptional coactivator for most RNA polymerase II-transcribed genes. Since it has been implicated in transcription, this gene is a plausible candidate for further study. However, this SNP is in a large LD region containing numerous other potential candidate genes, including the kinase suppressor of RAS2 (KSR2).
Other SNPs identified as potentially associated with colorectal cancer in this study are rs27545 (POLS), rs2373859 (SLC8A1) and rs1525461 (LOC643308/TPK1). The gene closest to rs27545 is POLS (59 kb downstream), a DNA polymerase that is likely involved in DNA repair and, hence, provides a potentially interesting candidate gene (Hubscher et al. 2002). Other genes close to rs27545 are SRD5A1 (146 kb upstream), which converts testosterone into the more potent dihydrotestosterone, and the methyltransferase NSUN2 (183 kb downstream), which methylates tRNA (Brzezicha et al. 2006). The SNP rs2373859 resides in an intronic region of SLC8A1 (also known as NCX1), which encodes a cell membrane protein involved in rapid Ca2+ transport (Annunziato et al. 2004). It is in a gene-rich region including other interesting candidates, such as MAP4K3 (954 kb upstream), a member of the mitogen-activated protein kinases, which is involved in regulating both cell growth and death and has altered gene expression in many cancer types (Cuadrado and Nebreda 2010), and SOS1, which may act as a positive regulator of RAS (Freedman et al. 2006). The closest gene to rs1525461 is TPK1 (195 kb upstream). TPK1 is involved in the regulation of thiamine metabolism (Timm et al. 2001). TPK1 flanks a gene-rich region, including several olfactory receptors, but none of the genes has an obvious link to colorectal cancer development. However, the assignment of SNPs to candidate genes should be done with caution, as recently shown by additional fine mapping and in silico analysis of the previously identified colorectal cancer loci 8q23.3 (EIF3H) and 16q22.1 (CDH1/CDH3), which suggested functional variation in unexpected candidate target genes (Carvajal-Carmona et al. 2011).
Overall, our study suggests a complex nature of the contribution of common genetic variants to risk for colorectal cancer, and suggests the need for additional studies to identify variants with marginal effects, as well as studies to examine potential sources and role of heterogeneity, including gene–gene and gene–environment interactions. We note that this study focused on the log-additive model. Although we present results for other genetic models for our top findings, our results may have been biased for SNPs that do not follow this assumed log-additive model (Minelli et al. 2005). Further, this study was not set up to investigate less frequent (allele frequency 1–5%) and rare variants (allele frequency < 1%), which have the potential to contribute substantially to the genetic susceptibility of colorectal cancer (Bodmer and Bonilla 2008; Cirulli and Goldstein 2010; Manolio et al. 2009).
In summary, we replicated the majority of SNPs that have previously been found to be associated with colorectal cancer in GWAS. We also report suggestive evidence for an additional independent signal for colorectal cancer risk in the BMP2 locus and a possible new association of colorectal cancer with a variant in the multi-cancer susceptibility locus around TERT-CLPTM1L. Future studies are needed to try to replicate these findings and, if successful, to identify the underlying variants directly responsible for the association and to study the underlying molecular mechanisms.
Materials and methods
The studies and their abbreviations are listed in Table 1, and each study is described in detail in the Supplemental Note. In brief, all cases were defined as colorectal adenocarcinoma (International Classification of Disease Code 153-154) and confirmed by medical records, pathologic reports, or death certificate. All cases and controls were self-reported as White, which was confirmed in GWAS samples based on genotype data. All participants gave written informed consent and studies were approved by the Institutional Review Board.
The GWAS meta-analysis results are based on two scans. One GWAS was conducted within the CCFR, including population-based cases and unrelated population-based controls from three sites: USA, Canada, and Australia (Figueiredo et al. 2011). In total, 1,191 cases and 999 controls were successfully genotyped on the Illumina 1M/1M Duo platform and passed all quality-control (QC) steps. The second scan was conducted across three US studies: the WHI and PLCO cohorts and the DALS population-based case–control study. A total of 1,715 colon cancer cases and 2,417 controls were successfully genotyped on the Illumina HumanHap 550K, 610K or combined Illumina 300K and 240K platforms and passed all QC steps. After applying rigorous genotyping QC filters (see below), a total of 378,739 directly genotyped SNPs commonly shared among the scans were included in the GWAS meta-analysis. To further boost the power and inform the ranking of SNPs, we included summary statistics from a previously published colorectal cancer GWAS (Colorectal Tumour Gene Identification Consortium, CORGI) in the meta-analysis (The Institute of Cancer Research 2008; Tomlinson et al. 2008). However, to ensure independence of results from prior published scans, we did not include any CORGI results in any of the presented ORs or p values.
Fixed-effects p values from the GWAS meta-analysis were used to select SNPs for replication. We rank ordered the top SNPs. We used LD information in our controls to prune out “redundant” signals (defined as r2 > 0.5 for SNPs ≥ 100 kb apart and r2 > 0.1 for SNPs < 100 kb apart). For the top five SNPs, with p < 10−5, we selected two other SNPs with r2 > 0.9 to ensure against potential genotyping failure. We then went down the ranked list until we filled our SNP platform (total number of SNPs selected for this project = 343). SNPs were excluded based on p value for heterogeneity < 0.001 (n = 1) and poor clustering in visual inspection of cluster plots (n = 3). If SNPs had a low design score, we replaced them with an alternative SNP with r2 > 0.9. The lowest ranked SNP had p value 1.2 × 10−3. Our platform also included SNPs for the ten known colorectal cancer susceptibility loci published in previous GWAS at the time we designed the platform. These 343 SNPs were genotyped in samples from DACHS, DALS, French, HPFS, NHS and PHS studies (N = 4,062 cases and 4,718 controls) (Table 1; Supplemental Note) and 306 SNPs were successfully genotyped in all studies (see details below). After we selected SNPs for replication, the ARCTIC genome-wide scan became available (769 cases and 665 controls), and we used imputed data from that study for analysis of the 343 SNPs (12 SNPs were not included due to low imputation quality or low HWE p values). As of April 2010, we had genotyped and analyzed the GWAS data and replication data from ARCTIC, DACHS, DALS and the French case–control study. We selected 32 SNPs with p < 0.1 in this replication set and/or a pfixed < 10−4 in the combined replication and GWAS for further genotyping in 2,550 cases and 3,539 controls, including additional samples from NHS, PHS, and HPFS, and samples from MECC and NFCCR. The top SNPs were also analyzed in a second set of data from the CCFR (780 cases and 780 controls). 
We present results for the total replication sample of 8,161 cases and 9,101 controls.
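The ranking and pruning of SNPs for replication described above can be sketched in code. This is a minimal illustration under stated assumptions, not the procedure actually run: it assumes a greedy pass down the p value ranking, SNPs on a single chromosome, and a lookup table of pairwise r2 values. The function names, the SNP labels, and the example values are hypothetical.

```python
def is_redundant(r2, distance_bp):
    """Redundancy rule from the text: r2 > 0.1 if < 100 kb apart, else r2 > 0.5."""
    threshold = 0.1 if distance_bp < 100_000 else 0.5
    return r2 > threshold


def prune_ranked_snps(snps, r2_lookup, max_snps):
    """Greedy selection (an assumption): walk the p value ranking and keep a SNP
    only if it is not redundant with any SNP already kept.

    snps      -- list of (snp_id, position_bp, p_value) on one chromosome
    r2_lookup -- dict mapping frozenset({id1, id2}) to r2; missing pairs count as 0
    max_snps  -- capacity of the genotyping platform
    """
    kept = []
    for snp_id, pos, p_value in sorted(snps, key=lambda s: s[2]):
        if len(kept) >= max_snps:
            break
        redundant = any(
            is_redundant(r2_lookup.get(frozenset((snp_id, kept_id)), 0.0),
                         abs(pos - kept_pos))
            for kept_id, kept_pos, _ in kept
        )
        if not redundant:
            kept.append((snp_id, pos, p_value))
    return kept


# Hypothetical input: snpB lies 50 kb from snpA with r2 = 0.3, so it is pruned;
# snpC lies 400 kb from snpA with r2 = 0.2, so it is kept.
example_snps = [
    ("snpA", 1_000_000, 1e-6),
    ("snpB", 1_050_000, 5e-6),
    ("snpC", 1_400_000, 2e-5),
]
example_r2 = {
    frozenset(("snpA", "snpB")): 0.3,
    frozenset(("snpA", "snpC")): 0.2,
}
selected = [s[0] for s in prune_ranked_snps(example_snps, example_r2, max_snps=343)]
```

The sketch leaves out the backup SNPs (r2 > 0.9) chosen for the top five signals and the replacement of SNPs with low design scores.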
Genomic DNA was extracted from blood samples or, in the case of a subset of PLCO samples, from buccal cells using conventional methods.
GWAS for CCFR
Genotyping was completed on the Illumina Human1M and Human1M-Duo Bead Array in accordance with the manufacturer’s protocol.
Sample exclusions: The following sample exclusion criteria were applied: call rate < 95% (n = 75), any stripe (physical/analytical location on BeadChip) call rate < 80% (n = 9), discordance with prior genotyping (n = 3), non-White (n = 29), samples that showed admixture identified using the program STRUCTURE (n = 33) (Falush et al. 2003; Pritchard et al. 2000), high identity by descent using PLINK (n = 2), and mismatch between called and phenotypic sex (n = 4). The final analysis was based on 1,191 cases and 999 controls.
SNP exclusions: SNPs were excluded if they did not overlap between the Illumina Human1M and Human1M-Duo (n = 190,301), were annotated as “Intensity Only” (n = 8,263), or had call rates < 90% on either the Illumina Human1M or Human1M-Duo (n = 9,229) or within any study center or case–control group (n = 12,695). After further restricting the analysis to SNPs with Hardy–Weinberg equilibrium (HWE) p > 0.0001, minor allele frequency (MAF) > 0.05, and SNP call rate > 0.98, a total of 739,733 SNPs remained in the analysis.
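The three SNP-level thresholds above (HWE p > 0.0001, MAF > 0.05, call rate > 0.98) can be illustrated with a short filter. This is a sketch under assumptions: the text does not state which HWE test was used, so a 1-degree-of-freedom chi-square goodness-of-fit test is assumed here, and the function names and genotype counts are hypothetical.

```python
import math


def hwe_chi2_p(n_aa, n_ab, n_bb):
    """p value of a 1-df chi-square goodness-of-fit test for HWE (assumed test)."""
    n = n_aa + n_ab + n_bb
    freq_a = (2 * n_aa + n_ab) / (2 * n)
    freq_b = 1.0 - freq_a
    expected = (n * freq_a ** 2, 2 * n * freq_a * freq_b, n * freq_b ** 2)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_snp_qc(n_aa, n_ab, n_bb, n_missing,
                  min_call_rate=0.98, min_maf=0.05, min_hwe_p=1e-4):
    """Apply the call rate, MAF, and HWE thresholds to one SNP's genotype counts."""
    n_called = n_aa + n_ab + n_bb
    if n_called == 0:
        return False
    call_rate = n_called / (n_called + n_missing)
    freq_a = (2 * n_aa + n_ab) / (2 * n_called)
    maf = min(freq_a, 1.0 - freq_a)
    return (call_rate > min_call_rate
            and maf > min_maf
            and hwe_chi2_p(n_aa, n_ab, n_bb) > min_hwe_p)


# Hypothetical counts: genotypes in exact HWE proportions with no missing calls.
well_behaved = passes_snp_qc(360, 480, 160, n_missing=0)
# Hypothetical counts: no heterozygotes at all, a gross HWE violation.
hwe_violation = passes_snp_qc(500, 0, 500, n_missing=0)
```

In practice these filters are usually applied with dedicated software such as PLINK, which the authors list among their analysis tools.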
Average sample call rate was equal to 98.6% with >94% of samples having a call rate > 98%. Intra- and interplate replicate concordance rates were equal to 99.97 and 98.7%, respectively.
GWAS for DALS, PLCO and WHI
Genotyping was completed using Illumina HumanHap300 and HumanHap240S (PLCO), 550K (WHI, DALS) and 610K (DALS, PLCO) BeadChip Array System on the Infinium platform in accordance with the manufacturer’s protocol or as previously described for HumanHap300 and HumanHap240S (Yeager et al. 2007).
Sample exclusions: Samples were excluded if the average call rate was <97% (DALS: n = 110, PLCO: n = 63, WHI: n = 66) or there was a mismatch between called and phenotypic sex (DALS: n = 6, PLCO: n = 1). To search for unexpected duplicates and closely related individuals, we calculated identity-by-state values. We excluded unexpected duplicates (DALS: n = 2). Additionally, we excluded samples based on low concordance with prior genotyping (DALS: n = 10, WHI: n = 1) as well as samples that did not cluster with the CEU samples in principal component analysis including the three HapMap populations as a reference (DALS: n = 20, PLCO: n = 2, WHI: n = 6). The final analysis was based on 698 cases and 719 controls in DALS, 534 cases and 1,168 controls in PLCO, and 483 cases and 530 controls in WHI.
SNP exclusions: Because we combined data from different platforms, we took precautions to exclude SNPs that do not perform consistently across platforms. These included SNPs reported by Illumina as not performing consistently across platforms (n = 78), SNPs found to have more than one discordant call across the 550K and 610K platforms in HapMap data or our interplatform duplicates (n = 185), and SNPs with different MAF calls on the two platforms in our control populations (n = 9). We further filtered SNPs within each study (DALS, PLCO, WHI), excluding those with MAF < 0.05 or HWE p < 0.0001 in controls. We required a call rate of >98% per chip type per study. A total of 392,361 SNPs passed all QC checks for all three studies.
The average sample call rate was ≥98.8% in each of the three studies, and the concordance rate of blinded duplicates (n = 98 pairs) was >97%.
When we combined data across all scans, a total of 378,739 autosomal SNPs were successfully genotyped across all studies and used in our final GWAS meta-analysis of 2,906 cases and 3,416 controls.
Genotyping of 343 SNPs in DACHS, DALS, French, and the first subsets of HPFS, NHS and PHS was carried out using BeadXpress technology according to the manufacturer’s protocol. Problematic genotype clusters were visually inspected by lot number and the calling algorithm was adjusted, if indicated. Thirty-five SNPs were excluded from the analysis due to poor cluster quality and two SNPs were excluded for being out of HWE (p < 0.0001) in controls of at least one study. The 306 SNPs in the replication had call rates > 92% across studies (average call rate per SNP per study 97.8%). MECC and NFCCR samples were genotyped using Matrix-assisted Laser Desorption/Ionization Time-of-Flight on the Sequenom® MassARRAY 7K platform (Sequenom, Inc., San Diego, CA). A total of 23 and 30 SNPs were successfully genotyped in MECC and NFCCR, respectively. Additional samples from NHS, HPFS and PHS were genotyped on 29 SNPs using the TaqMan® OpenArray® Genotyping Instrument Platform Assays (Applied Biosystems, Carlsbad, CA). Overall, 32 SNPs had call rates > 98% across studies (average call rate per SNP per study 99.5%; Supplemental Table 7), indicating excellent quality.
Two GWAS data sets (ARCTIC and CCFR II) became available after the GWAS meta-analysis and were used only for replication as described above. ARCTIC has been previously published (Zanke et al. 2007). Because ARCTIC was genotyped on the Affymetrix platform with limited overlap of SNPs with the Illumina platforms, we made use of imputed data for this study. Imputation was done with BEAGLE, using the phased HapMap release 22 as the reference sample (http://ftp.hapmap.org/phasing/2007-08_rel22/) (Browning and Browning 2009). SNPs were removed if they were out of HWE (p < 0.0001) in the controls (n = 1) or had an imputation r2 < 0.3 (n = 11). For CCFR phase II, samples were genotyped using the Illumina 1M Omni. Inclusion/exclusion criteria for cases in phase II were consistent with those described for phase I.
Study-specific analysis of GWAS data
To estimate the association between each genetic marker and risk for colorectal cancer, we calculated ORs and 95% CIs using log-additive genetic models relating the genotype dose (0, 1 or 2 copies of the minor allele) to risk of colorectal cancer. We adjusted for age, sex (when appropriate), center and the first three principal components from EIGENSTRAT to account for population substructure. The CCFR calculated these estimates with Cochran–Mantel–Haenszel analysis with strata defined by age, sex, and center.
Quantile–quantile (Q–Q) plots were assessed to determine whether the distribution of the p values in each study population was consistent with the null distribution (except for the extreme tail; Supplemental Figure 1). To summarize the Q–Q plots, we calculated the inflation factor (λ), which measures the over-dispersion of the association test statistics, by dividing the mean of the test statistics by the mean of the expected values from a chi-square distribution with 1 degree of freedom.
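The mean-based inflation factor described above can be written out in a few lines. This is a minimal sketch: the function name and the four example test statistics are hypothetical, and the only fact relied on is that a chi-square distribution with 1 degree of freedom has mean 1.

```python
from statistics import mean


def inflation_factor(chi2_stats):
    """Mean-based lambda: mean of the observed 1-df chi-square statistics divided
    by the mean of the null chi-square distribution with 1 df (which is 1)."""
    expected_mean = 1.0
    return mean(chi2_stats) / expected_mean


# Hypothetical test statistics chosen so that their mean is 1.008.
example_stats = [0.5, 1.0, 1.5, 1.032]
example_lambda = inflation_factor(example_stats)
```

A value close to 1, such as the 1.008 reported for the combined GWAS, indicates that the bulk of the test statistics follow the null distribution. A common alternative, not used here, divides the median of the test statistics by the median of the 1-df chi-square distribution (about 0.455).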
Combined analysis of GWAS
We conducted inverse-variance weighted fixed-effects meta-analysis to combine OR estimates from log-additive models or multiplicative methods across individual studies as described above. In this approach, we weighted the beta estimates of each study by their inverse variance and calculated a combined estimate by summing the weighted betas and dividing by the summed weights. We chose to focus on fixed effects because we only had a small number of studies. When the number of studies is small, the between study variance may be poorly estimated, resulting in deflated test statistics for association. As such, fixed-effects analysis is better powered for discovery of novel variants (Kraft et al. 2009). We calculated I2, which is a measure of the percentage of total variation across studies due to heterogeneity beyond chance, and obtained the heterogeneity p values based on Cochran’s Q statistic (Ioannidis et al. 2007).
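The inverse-variance weighting and the heterogeneity measures described above can be sketched as follows. This is an illustrative implementation rather than the authors' code: the function name and the input values are hypothetical, the betas are log ORs, and the heterogeneity p value is omitted because it requires a chi-square distribution function with k − 1 degrees of freedom, where k is the number of studies.

```python
import math


def fixed_effects_meta(betas, ses):
    """Inverse-variance weighted fixed-effects meta-analysis of log odds ratios."""
    weights = [1.0 / se ** 2 for se in ses]
    weight_sum = sum(weights)
    # Combined estimate: weighted betas summed, divided by the summed weights.
    beta = sum(w * b for w, b in zip(weights, betas)) / weight_sum
    se = math.sqrt(1.0 / weight_sum)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p value
    # Cochran's Q: weighted squared deviations from the combined estimate.
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    # I2: percentage of total variation across studies beyond chance.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {
        "or": math.exp(beta),
        "ci_low": math.exp(beta - 1.96 * se),
        "ci_high": math.exp(beta + 1.96 * se),
        "p": p_value,
        "q": q,
        "df": df,
        "i2": i2,
    }


# Hypothetical log ORs and standard errors from two studies of equal precision.
homogeneous = fixed_effects_meta([0.1, 0.2], [0.1, 0.1])
heterogeneous = fixed_effects_meta([0.0, 0.4], [0.1, 0.1])
```

With equal standard errors the combined log OR is the simple average of the study estimates, and I2 is truncated at zero whenever Q falls below its degrees of freedom.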
Study-specific analysis of replication data
To estimate the association between each genetic marker and risk for colorectal cancer, we calculated ORs and 95% CIs using a log-additive genetic model relating the genotype dose (0, 1 or 2 copies of the minor allele) to risk of colorectal cancer and adjusting for age, sex, and study center (as appropriate) in logistic regression analysis.
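As an illustration of the log-additive model, the sketch below fits logit P(case) = a + b × dose by Newton-Raphson and reports the per-allele OR, exp(b), with a Wald 95% CI. It is a simplified example under assumptions: the covariates used in the study (age, sex, study center) are omitted, and the function names and genotype counts are hypothetical.

```python
import math


def _score_and_information(a, b, doses, status):
    """Gradient and information matrix entries for logit P(case) = a + b * dose."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for x, y in zip(doses, status):
        p = 1.0 / (1.0 + math.exp(-(a + b * x)))
        w = p * (1.0 - p)
        g0 += y - p
        g1 += (y - p) * x
        h00 += w
        h01 += w * x
        h11 += w * x * x
    return g0, g1, h00, h01, h11


def per_allele_or(doses, status, n_iter=25):
    """Newton-Raphson fit; returns (OR, lower, upper) for one extra minor allele."""
    a = b = 0.0
    for _ in range(n_iter):
        g0, g1, h00, h01, h11 = _score_and_information(a, b, doses, status)
        det = h00 * h11 - h01 * h01
        # Newton step: parameters += inverse(information) * gradient.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    _, _, h00, h01, h11 = _score_and_information(a, b, doses, status)
    se_b = math.sqrt(h00 / (h00 * h11 - h01 * h01))
    return math.exp(b), math.exp(b - 1.96 * se_b), math.exp(b + 1.96 * se_b)


def expand(counts):
    """Turn {dose: (n_controls, n_cases)} into per-subject dose and status lists."""
    doses, status = [], []
    for dose, (n_controls, n_cases) in counts.items():
        doses += [dose] * (n_controls + n_cases)
        status += [0] * n_controls + [1] * n_cases
    return doses, status


# Hypothetical counts in which the odds of being a case double with each allele
# (0.5, 1.0, 2.0), so the fitted per-allele OR is 2.
doses, status = expand({0: (100, 50), 1: (100, 100), 2: (50, 100)})
odds_ratio, ci_low, ci_high = per_allele_or(doses, status)
```

Because a single coefficient multiplies the allele count, the model constrains the OR for two copies to be the square of the OR for one copy, which is the defining feature of the log-additive assumption.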
Combined analysis of replication data
We conducted inverse-variance weighted fixed-effects meta-analysis to combine OR estimates from log-additive models across individual studies and measured heterogeneity using I2 and Cochran’s Q statistic, as discussed above.
Analysis of combined GWAS and replication data
We again combined across studies using inverse-variance weighted fixed-effects meta-analysis. For novel SNPs with p < 5 × 10−4 based on combined analysis of GWAS and replication, we also report random-effects estimates, which incorporate potential heterogeneity into the effect estimate. For these SNPs, we also examined dominant, recessive and unrestricted genetic models and compared models by calculating the Akaike information criterion (AIC). We performed stratified analyses and evaluated whether the effects differed by sex. For novel SNPs in regions identified by previous GWAS, we also performed a conditional analysis including both the newly and previously identified SNPs in the region in one model to examine whether the effect of the newly identified SNP can be explained by the existing one. To quantify the independence of the novel SNPs from prior GWAS hits in the same region, we calculated the variance–covariance matrix and reported the correlation between the two betas. Finally, we performed a sensitivity analysis in which we removed the studies one at a time and examined the results from the fixed-effects meta-analysis. We report any situations where removing one study resulted in a >5% change in the OR point estimate and/or attenuated the p value of the combined fixed-effects meta-analysis to >5 × 10−3, since that would indicate that the results might be driven by a single study.
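The leave-one-out sensitivity analysis can be sketched as below. This is an illustration under assumptions: the flagging rule is read as a relative change of more than 5% in the combined OR, or a leave-one-out p value above 5 × 10−3, and the study labels and effect estimates are hypothetical.

```python
import math


def fixed_effects(betas, ses):
    """Inverse-variance weighted combined log OR and its two-sided p value."""
    weights = [1.0 / se ** 2 for se in ses]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    p_value = math.erfc(abs(beta / se) / math.sqrt(2.0))
    return beta, p_value


def leave_one_out(names, betas, ses, max_or_change=0.05, p_limit=5e-3):
    """Return the studies whose removal changes the OR by more than max_or_change
    (relative) or pushes the combined p value above p_limit."""
    full_beta, _ = fixed_effects(betas, ses)
    full_or = math.exp(full_beta)
    flagged = []
    for i, name in enumerate(names):
        sub_betas = betas[:i] + betas[i + 1:]
        sub_ses = ses[:i] + ses[i + 1:]
        sub_beta, sub_p = fixed_effects(sub_betas, sub_ses)
        or_change = abs(math.exp(sub_beta) - full_or) / full_or
        if or_change > max_or_change or sub_p > p_limit:
            flagged.append(name)
    return flagged


# Hypothetical scenario 1: four concordant studies, so no single study drives the result.
stable = leave_one_out(["A", "B", "C", "D"], [0.10] * 4, [0.02] * 4)
# Hypothetical scenario 2: the combined signal comes entirely from study A.
driven = leave_one_out(["A", "B", "C", "D"], [0.30, 0.0, 0.0, 0.0], [0.05] * 4)
```

In the second scenario only study A is flagged: removing it drops the combined OR to 1, whereas removing any other study shifts the OR by less than 5% and leaves the p value below the limit.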
Criterion for genome-wide significance
Based on an increasing number of papers (Dudbridge and Gusnanto 2008; Hoggart et al. 2008; International HapMap Consortium 2005; Pe’er et al. 2008; Risch and Merikangas 1996; Wellcome Trust Case Control Consortium 2007) providing a detailed discussion on the appropriate genome-wide significance threshold, which all arrive at similar values in the range of 5 × 10−7 to 5 × 10−8 for White populations, we decided to use a p value of 5 × 10−8 as genome-wide significance threshold. To account for the two-stage approach (GWAS and replication), we calculated that an overall p value of 5 × 10−8 is equal to a combined two-stage p value of 1.6 × 10−7 given our sample sizes in the GWAS and replication and a threshold for selecting SNPs from the GWAS of 1.2 × 10−3 as used here.
We used PLINK (Purcell et al. 2007; Purcell 2011) and R (R Development Core Team 2011) to conduct the statistical analysis and summarized results graphically using STATA (StataCorp 2009), snp.plotter (Luna and Nicodemus 2007), and LocusZOOM (Pruim et al. 2010).
Acknowledgments
The authors thank Dr. Ian Tomlinson at the Wellcome Trust Centre for Human Genetics, Oxford, UK, Dr. Richard Houlston at the Section of Cancer Genetics, Institute of Cancer Research, Sutton, UK, and Dr. Malcolm Dunlop at the Colon Cancer Genetics Group, Institute of Genetics and Molecular Medicine, University of Edinburgh and Human Genetics Unit, Medical Research Council, Edinburgh, UK, for providing access to GWAS summary statistics of the Colorectal Tumour Gene Identification Consortium (CORGI) and for allowing us to use these results to inform the ranking of the SNP selection for the replication.
ARCTIC: This work was supported by the Cancer Risk Evaluation (CaRE) Program grant from the Canadian Cancer Society Research Institute. TJH and BWZ are recipients of Senior Investigator Awards from the Ontario Institute for Cancer Research, through generous support from the Ontario Ministry of Research.
CCFR: This work was supported by the National Cancer Institute, National Institutes of Health under RFA # CA-95-011 and through cooperative agreements with members of the Colon Cancer Family Registry and P.I.s. This genome-wide scan was supported by the National Cancer Institute, National Institutes of Health by U01 CA122839. The content of this manuscript does not necessarily reflect the views or policies of the National Cancer Institute or any of the collaborating centers in the CFRs, nor does mention of trade names, commercial products, or organizations imply endorsement by the US Government or the CFR. The following Colon CFR centers contributed data to this manuscript and were supported by the following sources: Australasian Colorectal Cancer Family Registry (U01 CA097735), Familial Colorectal Neoplasia Collaborative Group (U01 CA074799), Mayo Clinic Cooperative Family Registry for Colon Cancer Studies (U01 CA074800), Ontario Registry for Studies of Familial Colorectal Cancer (U01 CA074783), Seattle Colorectal Cancer Family Registry (U01 CA074794), University of Hawaii Colorectal Cancer Family Registry (U01 CA074806).
DACHS: This work was supported by grants from the German Research Council (Deutsche Forschungsgemeinschaft, BR 1704/6-1, BR 1704/6-3, BR 1704/6-4 and CH 117/1-1), and the German Federal Ministry of Education and Research (01KH0404 and 01ER0814). We thank all participants and cooperating clinicians, and Ute Handte-Daub, Belinda-Su Kaspereit and Ursula Eilber for excellent technical assistance.
DALS: This work was supported by the National Cancer Institute, National Institutes of Health, U.S. Department of Health and Human Services (R01 CA48998 to MLS).
DALS, PLCO and WHI GWAS: Funding for the genome-wide scan of DALS, PLCO, and WHI was provided by the National Cancer Institute, National Institutes of Health, U.S. Department of Health and Human Services (R01 CA059045 to UP). CMH was supported by a training grant from the National Cancer Institute, National Institutes of Health, U.S. Department of Health and Human Services (R25 CA094880).
FRENCH: This work was funded by a regional Hospital Clinical Research Program (PHRC) and supported by the Regional Council of Pays de la Loire, the Groupement des Entreprises Françaises dans la LUtte contre le Cancer (GEFLUC), the Association Anne de Bretagne Génétique and the Ligue Régionale Contre le Cancer (LRCC).
GECCO: Funding for GECCO infrastructure is provided by the National Cancer Institute, National Institutes of Health, U.S. Department of Health and Human Services (U01 CA137088 to UP).
HPFS: This work was supported by the National Institutes of Health (P01 CA 055075 to C.S.F., R01 137178 to A.T.C., and P50 CA 127003 to C.S.F.). We acknowledge Patrice Soule and Hardeep Ranu for genotyping at the Dana-Farber Harvard Cancer Center High Throughput Polymorphism Core under the supervision of David J. Hunter, and Carolyn Guo for programming assistance.
MECC: This work was supported by the National Institutes of Health, U.S. Department of Health and Human Services (R01 CA81488 to SBG and GR).
NFCCR: This work was supported by an Interdisciplinary Health Research Team award from the Canadian Institutes of Health Research (CRT 43821); the National Institutes of Health, U.S. Department of Health and Human Services (U01 CA74783); and National Cancer Institute of Canada grants (18223 and 18226). The authors wish to acknowledge the contribution of Alexandre Belisle and the genotyping team of the McGill University and Génome Québec Innovation Centre, Montréal, Canada, for genotyping the Sequenom panel in the NFCCR samples.
NHS: This work was supported by the National Institutes of Health (P01 CA 087969 to ELG, R01 137178 to ATC, and P50 CA 127003 to CSF). We acknowledge Patrice Soule and Hardeep Ranu for genotyping at the Dana-Farber Harvard Cancer Center High Throughput Polymorphism Core under the supervision of David J. Hunter, and Carolyn Guo for programming assistance.
PHS: We acknowledge Patrice Soule and Hardeep Ranu for genotyping at the Dana-Farber Harvard Cancer Center High Throughput Polymorphism Core under the supervision of David J. Hunter, and Haiyan Zhang for programming assistance.
PLCO: This research was supported by the Intramural Research Program of the Division of Cancer Epidemiology and Genetics and supported by contracts from the Division of Cancer Prevention, National Cancer Institute, National Institutes of Health, U.S. Department of Health and Human Services. The authors thank Drs. Christine Berg and Philip Prorok at the Division of Cancer Prevention at the National Cancer Institute, and investigators and staff from the screening centers of the Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial, Mr. Thomas Riley and staff at Information Management Services, Inc., Ms. Barbara O’Brien and staff at Westat, Inc., and Mr. Tim Sheehy and staff at SAIC-Frederick. Most importantly, we acknowledge the study participants for their contributions to making this study possible.
Control samples that were genotyped as part of the Cancer Genetic Markers of Susceptibility (CGEMS) prostate cancer scan were supported by the Intramural Research Program of the National Cancer Institute. The datasets used in this analysis were accessed with appropriate approval through the dbGaP online resource (http://cgems.cancer.gov/data_access.html) through dbGaP accession number 000207v.1p1.c1 (National Cancer Institute 2009; Yeager et al. 2007). Control samples were also genotyped as part of the GWAS of Lung Cancer and Smoking. Funding for this work was provided through the National Institutes of Health, Genes, Environment and Health Initiative [NIH GEI] (Z01 CP 010200). The human subjects participating in the GWAS are derived from the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial, and the study is supported by intramural resources of the National Cancer Institute. Assistance with genotype cleaning, as well as with general study coordination, was provided by the Gene Environment Association Studies, GENEVA Coordinating Center (U01 HG004446). Assistance with data cleaning was provided by the National Center for Biotechnology Information. Funding support for genotyping, which was performed at the Johns Hopkins University Center for Inherited Disease Research, was provided by the NIH GEI (U01 HG 004438). The datasets used for the analyses described in this manuscript were obtained from dbGaP at http://www.ncbi.nlm.nih.gov/gap through dbGaP accession number ph000093.v2.p2.c1.
WHI: The WHI program is funded by the National Heart, Lung, and Blood Institute, National Institutes of Health, U.S. Department of Health and Human Services through contracts N01WH22110, 24152, 32100-2, 32105-6, 32108-9, 32111-13, 32115, 32118-32119, 32122, 42107-26, 42129-32, 44221, and 268200764316C.
The authors wish to acknowledge Jacques Rossouw, Shari Ludlam, Joan McGowan, Leslie Ford, and Nancy Geller at the National Heart, Lung, and Blood Institute, Bethesda, Maryland; the following Clinical Coordinating Center investigators: Ross Prentice, Garnet Anderson, Andrea LaCroix, and Charles Kooperberg (Fred Hutchinson Cancer Research Center, Seattle, WA), Evan Stein (Medical Research Labs, Highland Heights, KY), and Steven Cummings (University of California at San Francisco, San Francisco, CA); and Sally Shumaker (Wake Forest University School of Medicine, Winston-Salem, NC) with the Women’s Health Initiative Memory Study.
In addition, we wish to acknowledge the following Clinical Center investigators: Sylvia Wassertheil-Smoller (Albert Einstein College of Medicine, Bronx, NY); Haleh Sangi-Haghpeykar (Baylor College of Medicine, Houston, TX); JoAnn E. Manson (Brigham and Women’s Hospital, Harvard Medical School, Boston, MA); Charles B. Eaton (Brown University, Providence, RI); Lawrence S. Phillips (Emory University, Atlanta, GA); Shirley Beresford (Fred Hutchinson Cancer Research Center, Seattle, WA); Lisa Martin (George Washington University Medical Center, Washington, DC); Rowan Chlebowski (Los Angeles Biomedical Research Institute at Harbor-UCLA Medical Center, Torrance, CA); Erin LeBlanc (Kaiser Permanente Center for Health Research, Portland, OR); Bette Caan (Kaiser Permanente Division of Research, Oakland, CA); Jane Morley Kotchen (Medical College of Wisconsin, Milwaukee, WI); Barbara V. Howard (MedStar Research Institute/Howard University, Washington, DC); Linda Van Horn (Northwestern University, Chicago/Evanston, IL); Henry Black (Rush Medical Center, Chicago, IL); Marcia L. Stefanick (Stanford Prevention Research Center, Stanford, CA); Dorothy Lane (State University of New York at Stony Brook, Stony Brook, NY); Rebecca Jackson (The Ohio State University, Columbus, OH); Cora E. Lewis (University of Alabama at Birmingham, Birmingham, AL); Cynthia A. Thomson (University of Arizona, Tucson/Phoenix, AZ); Jean Wactawski-Wende (University at Buffalo, Buffalo, NY); John Robbins (University of California at Davis, Sacramento, CA); F. Allan Hubbell (University of California at Irvine, CA); Lauren Nathan (University of California at Los Angeles, Los Angeles, CA); Robert D. Langer (University of California at San Diego, La Jolla/Chula Vista, CA); Margery Gass (University of Cincinnati, Cincinnati, OH); Marian Limacher (University of Florida, Gainesville/Jacksonville, FL); J. David Curb (University of Hawaii, Honolulu, HI); Robert Wallace (University of Iowa, Iowa City/Davenport, IA); Judith Ockene (University of Massachusetts/Fallon Clinic, Worcester, MA); Norman Lasser (University of Medicine and Dentistry of New Jersey, Newark, NJ); Mary Jo O’Sullivan (University of Miami, Miami, FL); Karen Margolis (University of Minnesota, Minneapolis, MN); Robert Brunner (University of Nevada, Reno, NV); Gerardo Heiss (University of North Carolina, Chapel Hill, NC); Lewis Kuller (University of Pittsburgh, Pittsburgh, PA); Karen C. Johnson (University of Tennessee Health Science Center, Memphis, TN); Robert Brzyski (University of Texas Health Science Center, San Antonio, TX); Gloria E. Sarto (University of Wisconsin, Madison, WI); Mara Vitolins (Wake Forest University School of Medicine, Winston-Salem, NC); Michael S. Simon (Wayne State University School of Medicine/Hutzel Hospital, Detroit, MI).