Introduction

The insulin-like growth factor (IGF) pathway is involved in normal growth and putatively colorectal tumorigenesis. In support of this, blood levels of IGF-related factors have been associated with CRC risk1. In addition, an increased IGF-1 level, the main growth factor in adult life, has been associated with hyperinsulinaemia, which may result in insulin resistance and, ultimately, type 2 diabetes mellitus2. Type 2 diabetics have been shown to be at an increased risk of CRC3. Furthermore, since hyperinsulinaemia can stimulate the production of IGF-1, adiponectin and peroxisome proliferator-activated receptor gamma may be considered, as these factors have been associated with glucose and lipid homeostasis, insulin resistance and compensatory hyperinsulinaemia2,4,5.

A genetic predisposition to CRC regarding IGF-related factors would substantiate a role for the IGF pathway in colorectal tumorigenesis. Indeed, genetic variants in genes encoding for IGF-related factors have been associated with CRC risk in several studies6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28, but integration of genetic information from across genes in the IGF pathway is lacking. This is important, because most single SNPs confer only minor risks and because gene-gene(-environment) interactions29 and functional compensation between genes may exist30. Therefore, we set out to add to the existing molecular epidemiological evidence by using a genetic sum score of unfavorable alleles, which also allows quantifying sex- and subsite-specific CRC risks with optimal power. Quantifying sex- and subsite-specific risks is important, because CRC is a heterogeneous disease, with risk factors differing between men and women and for different anatomical subsites31.

Previous studies have shown genetic sum scores to be successful for investigating multiple SNPs at once in relation to carcinogenic biomarkers [e.g.32,33,34], cancer risk [e.g.35,36,37,38,39] and cancer survival [e.g.40,41,42,43]. In contrast to many of these studies, we used the literature to define unfavorable alleles in terms of those potentially increasing CRC risk. We required that SNPs had been significantly associated with a selected endpoint at least twice or with more than one selected endpoint in the literature (an exception was made for missense variants). Unless SNPs were equivocally associated with selected endpoints across studies, unfavorable alleles were aggregated into a genetic sum score. Our literature-based strategy avoided overfitting of the cumulative model due to potential false-positive findings in a single dataset. We conservatively refrained from weighting SNPs in the score according to their associated effect size, because the effect of a single SNP is likely dependent on other SNPs and environmental factors [gene-gene(-environment) interactions].

Our hypothesis was that carrying more unfavorable alleles in genes related to the IGF pathway, as indicated by a higher genetic sum score, increases CRC risk. Our study population was the Netherlands Cohort Study, which includes 120,852 men and women. In addition to SNPs, we also studied an IGF1 19-CA repeat polymorphism that has been associated with CRC risk in several studies, though not consistently7,9,10,13,23,24,25,28.

Results

Baseline characteristics

Genotype and allele frequencies, the P-value for Hardy-Weinberg Equilibrium (HWE) and the unfavorable allele as indicated by literature are shown in Table 1. All SNPs adhered to HWE in subcohort members, except SNP rs1342387. We did not exclude SNP rs1342387, because all SNPs were genotyped simultaneously and there was no indication of genotyping errors. Unfavorable alleles for 18 SNPs in genes related to the IGF pathway were aggregated into a genetic sum score, as the literature was unequivocal about the direction of the association for these SNPs. The tertiles of the genetic sum score comprised 6–14, 15–18 and 19–29 unfavorable alleles; the theoretical maximum was 36. In total, 134 subcohort members and 120 CRC cases could not be categorized into one of the tertiles because of missing SNP data (one SNP was missing at most; this was the case in a total of 356 subcohort members and 311 CRC cases). The IGF1 19-CA repeat, for which we distinguished between individuals carrying 19/19, 19/non-19 and non-19/non-19 CA repeats, was not in HWE in the subcohort when taking into account the multiallelic character of this locus (P-value < 0.001), although deviations may arise due to the presence of rare alleles and genotypes, which is the case in our population.
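The HWE check for a biallelic SNP can be illustrated with a chi-square goodness-of-fit test. This is a minimal sketch only: the text does not state which HWE test was used, the function name `hwe_chi2_test` is hypothetical, and the genotype counts are invented for illustration rather than taken from Table 1.

```python
import math

def hwe_chi2_test(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test for HWE at a biallelic SNP (1 df).

    Assumes both alleles are observed (a monomorphic SNP would give
    expected counts of zero and cannot be tested this way).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 df: erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical counts: the first set is in exact HWE, the second is not.
chi2_ok, p_ok = hwe_chi2_test(1000, 1600, 640)
chi2_dev, p_dev = hwe_chi2_test(1100, 1400, 740)
print(f"in HWE:     chi2 = {chi2_ok:.3f}, P = {p_ok:.3f}")
print(f"deviating:  chi2 = {chi2_dev:.3f}, P = {p_dev:.2e}")
```

For a multiallelic locus such as the IGF1 19-CA repeat, a test over all allele pairs (or an exact test) would be needed; the sketch above does not cover that case.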

Table 1 Minor allele and genotype frequencies of single nucleotide polymorphisms in genes related to the IGF pathway in subcohort members from the Netherlands Cohort Study (1986–2002).

Lifestyle characteristics and dietary habits in men and women according to tertiles of the genetic sum score are shown in Table 2. Men in different tertiles of the genetic sum score did not differ on any of the baseline characteristics. Women in higher tertiles of the genetic score were more likely to drink more alcohol, consume more meat and have a higher total energy intake (P-values < 0.05); there were no differences in age, family history of CRC, diabetes after age 30, anthropometric measures, physical activity levels, smoking status and meat intake.

Table 2 Baseline characteristics of male and female subcohort members in the Netherlands Cohort Study (1986–2002), according to tertiles of the genetic sum score of unfavorable alleles in genes related to the IGF pathway.

Genetic sum score

Table 3 shows the associations of the genetic sum score in tertiles with CRC risk by subsite in men and women as derived from age-adjusted Cox models. The findings in Table 3 show that an accumulation of unfavorable alleles in the IGF pathway may increase CRC risk in men but not women. Specifically, we observed dose-response relationships between the genetic sum score and CRC risk at all subsites, except the rectum, in men. Men in the highest versus lowest tertile had an approximately 40% increased risk [hazard ratio (HR) for CRC = 1.36, 95% CI: 1.11, 1.65, P-trend = 0.002; HR for colon cancer = 1.39, 95% CI: 1.11, 1.74, P-trend = 0.004; HR for proximal colon cancer = 1.34, 95% CI: 1.00, 1.79, P-trend = 0.06; HR for distal colon cancer = 1.48, 95% CI: 1.12, 1.94, P-trend = 0.006]. The genetic sum score was not associated with CRC risk by subsite in women, although a trend towards an increased rectal cancer risk was observed (middle and highest versus lowest tertile: HR = 1.28, 95% CI: 0.85, 1.91 and HR = 1.50, 95% CI: 0.98, 2.28, respectively; P-trend = 0.06). Models in which we estimated the risks associated with the number of unfavorable alleles as a continuous variable showed that hazard ratios increased significantly by 3–4% for each additional unfavorable allele at all CRC subsites, except the rectum, in men but not women. For interpretability, this translates into 30–40% increased CRC risks per 10 additional unfavorable alleles under the assumption of linearity.
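The conversion from a per-allele hazard ratio to a per-10-allele hazard ratio can be sketched as follows. The text does not state how the 30–40% figure was derived; the sketch assumes per-allele HRs of exactly 1.03 and 1.04 (the rounded values quoted above) and shows both a simple additive approximation and compounding on the multiplicative hazard scale. The function name `scale_hazard_ratio` is illustrative.

```python
def scale_hazard_ratio(hr_per_allele, n_alleles):
    """HR for n additional alleles, assuming a log-linear (multiplicative) effect."""
    return hr_per_allele ** n_alleles

for hr in (1.03, 1.04):
    # Additive approximation: 10 times the per-allele excess risk
    additive = 1 + 10 * (hr - 1)
    # Compounding the per-allele HR over 10 alleles
    compounded = scale_hazard_ratio(hr, 10)
    print(f"per-allele HR {hr:.2f}: additive {additive:.2f}, compounded {compounded:.2f}")
```

Under these assumptions the additive approximation gives 1.30 and 1.40, whereas compounding gives approximately 1.34 and 1.48; the two agree closely only when the per-allele effect is small.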

Table 3 Age-adjusted hazard ratios (HRs) and 95% confidence intervals (CIs) for colorectal cancer endpoints in relation to the genetic sum score of unfavorable alleles in genes related to the insulin-like growth factor pathway in men and women in the Netherlands Cohort Study (1986–2002).

Single SNPs were not associated with CRC risk, except the IGF1 SNP rs5742694 in men (Table 4). Considering this finding and that IGF1 is central in the IGF pathway, we examined whether IGF1 SNPs were drivers of the associations observed with respect to the genetic sum score. We modeled a genetic sum score that did not include IGF1 SNPs. The model was adjusted for the excluded variants to make sure effects were independent of these SNPs. This simultaneously provided a check as to whether the LD between SNPs in the IGF1 gene influenced results, as there was higher LD between these SNPs (median: 0.650; range: 0.327–0.872) than between the other SNPs (median: 0.306; range: 0.090–0.753). Results showed no essential differences (data not shown).

Table 4 Age-adjusted Hazard Ratios (HR) and 95% Confidence Intervals (CI) for Colorectal Cancer in Relation to SNPs in Genes in the Insulin-like Growth Factor Pathway in Men and Women in the Netherlands Cohort Study (1986–2002).

Models including separate genetic sum scores for 1) SNPs in genes encoding for factors in or regulatory to the IGF pathway and 2) SNPs in genes encoding for adiponectin, adiponectin receptors and the peroxisome proliferator-activated receptor gamma, did not yield essentially different results as compared to results from models including the overall genetic sum score (data not shown). Additional adjustment for the number of SNPs and genes, respectively, underlying the genetic sum score attenuated the previously increased proximal colon cancer risk in men, rendering this risk nonsignificant (data not shown). Only when we adjusted for the number of SNPs underlying the genetic sum score did we observe changes in results in women, i.e. we observed a significantly increased colon cancer risk, particularly for the distal colon (data not shown). When we modelled the number of genes with unfavorable alleles in a continuous fashion, we observed a 42–48% increased CRC risk at all subsites in both men and women (Supplementary Table 2).

IGF1 19-CA repeat

Table 3 also shows the associations of the IGF1 19-CA repeat polymorphism using the categorization by Rosen et al.44 and using a sum of repeats on both chromosomes with CRC risk by subsite in men and women as derived from age-adjusted Cox models. The findings in Table 3 show that variant repeat alleles may decrease CRC risk in women but not men. Specifically, using the categorization by Rosen et al.44, the IGF1 19-CA repeat was not associated with CRC risk in men. In women, there was evidence of dose-response relationships with CRC risk at all subsites, except the rectum. CRC risk was about halved in women homozygous for variant (non-19-CA) repeat alleles versus women homozygous for the wild type (19-CA) repeat allele (HR for CRC = 0.54, 95% CI: 0.42, 0.70, P-trend < 0.001; HR for colon cancer = 0.50, 95% CI: 0.38, 0.65, P-trend < 0.001; HR for proximal colon cancer = 0.48, 95% CI: 0.34, 0.67, P-trend < 0.001 and HR for distal colon cancer = 0.52, 95% CI: 0.36, 0.75, P-trend = 0.001). A model including the sum of repeats on both chromosomes, using individuals carrying 38 CA repeats as the reference group, revealed that risk reductions, including a reduced rectal cancer risk, were present in women carrying fewer than 38 CA repeats, but not in women carrying more than 38 CA repeats. Exclusion of individuals not homozygous for the wild type allele (19 CA repeats) from the reference category slightly strengthened associations (data not shown).

Discussion

Current data showed that an accumulation of unfavorable alleles with respect to SNPs in genes related to the IGF pathway increased CRC risk at all subsites, except the rectum, in men. Single SNPs (except one) were not associated with CRC risk, underlining the importance of integrating SNP information across genes in a pathway. This study builds on a number of previous studies on SNPs in genes related to the IGF pathway and CRC risk6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,27,28 and provides further evidence for the involvement of the IGF pathway in colorectal tumorigenesis. Findings are difficult to compare between studies, because different sets of SNPs were studied and because mostly single-SNP effects were reported. However, few studies have shown such clear dose-response relationships as the present one.

With respect to the IGF1 19-CA repeat polymorphism, we observed a reduced CRC risk to be associated with variant alleles in women when using the classical categorization by Rosen et al.44 When we instead distinguished between individuals carrying fewer or more than 38 repeats on both chromosomes together, a risk reduction at all subsites was present in women with fewer than 38 repeats but not in women with more than 38 repeats. A total of 38 is the most common number of repeats, which in the majority of individuals corresponds to being homozygous for the wild type allele. We hypothesized in our methods section that the number of repeats may influence CRC risk differently. The categorization of individuals with fewer and more than 19 CA repeats (the wild type allele) in the same category, as is done in the classical categorization, may have yielded both increased and decreased risks in previous studies for variant alleles depending on the distribution of the number of repeats in a particular study population9,23,24,25. Our results support our hypothesis that the number of repeats may matter, and future studies are encouraged to model the total number of IGF1 19-CA repeats to elucidate this further.

Next, some methodological considerations associated with the use of a genetic sum score should be made. First, an advantage of a genetic sum score is that no explicit assumption on the inheritance mode of the SNPs is necessary (recessive/dominant/additive). That is, if one treats all SNPs according to an additive inheritance mode and one assumes that one is not more likely to include SNPs adhering to a recessive than a dominant inheritance mode, the misclassification associated with assuming an additive inheritance mode is likely to cancel out in the sum score. To explain this, consider the example of a genetic sum score for two SNPs, of which one SNP adheres to a recessive inheritance mode and the other adheres to a dominant inheritance mode. For both SNPs, we assume an additive inheritance mode. If an individual is heterozygous for both SNPs and we aggregate the number of unfavorable alleles, we arrive at a sum score of 2. Had we known that one SNP adhered to a recessive inheritance mode, meaning that one unfavorable allele does not influence CRC risk, we should have coded the heterozygous genotype the same as the homozygous genotype for the other allele, i.e. ‘0’ (instead of ‘1’). Had we known that the second SNP adhered to a dominant inheritance mode, meaning that carrying one or two unfavorable allele(s) influences CRC risk in the same way, we should have coded the heterozygous genotype the same as the homozygous genotype for the unfavorable allele, i.e. ‘2’ (instead of ‘1’). If we aggregate these two scores, we arrive at a sum score of 2, which is the same score as the score arrived at when assuming an additive inheritance mode.
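The two-SNP example above can be reproduced in a few lines. This is a minimal sketch of the coding schemes as described in the paragraph; the function name `code_genotype` and the SNP labels are hypothetical.

```python
def code_genotype(n_unfavorable, mode):
    """Code a genotype (0, 1 or 2 unfavorable alleles) under an inheritance mode."""
    if n_unfavorable not in (0, 1, 2):
        raise ValueError("genotype must carry 0, 1 or 2 unfavorable alleles")
    if mode == "additive":
        return n_unfavorable
    if mode == "recessive":
        # One unfavorable allele has no effect: heterozygotes are coded 0
        return 2 if n_unfavorable == 2 else 0
    if mode == "dominant":
        # One unfavorable allele has the full effect: heterozygotes are coded 2
        return 2 if n_unfavorable >= 1 else 0
    raise ValueError("unknown inheritance mode: " + mode)

# An individual heterozygous for both SNPs (hypothetical labels)
genotypes = {"snp_recessive": 1, "snp_dominant": 1}
true_modes = {"snp_recessive": "recessive", "snp_dominant": "dominant"}

additive_score = sum(code_genotype(g, "additive") for g in genotypes.values())
true_score = sum(code_genotype(genotypes[s], true_modes[s]) for s in genotypes)
print(additive_score, true_score)  # 2 2
```

The sketch covers only the double-heterozygote case used in the example; the argument in the text is that such over- and under-coding is likely to balance out across SNPs in the score.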

Second, a particular genetic sum score may influence CRC risk differently depending on the number of SNPs and genes, respectively, underlying the sum. For example, a score of 10 can be achieved by being heterozygous for 10 SNPs or by being homozygous for the unfavorable allele for five SNPs (or any combination in between). Likewise, a sum of 10 can correspond to carrying unfavorable alleles in five, six, seven, eight, nine, or ten different genes. Both types of adjustment in our dataset influenced results, indicating that the number of SNPs and genes underlying the genetic sum score may be relevant weighting factors. Furthermore, our finding that the number of genes with unfavorable alleles by itself increased CRC risk at all subsites by 42–48% in both men and women suggests that an accumulation of unfavorable alleles across genes in a pathway may be particularly important for influencing CRC risk.
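To make the distinction concrete, the sketch below derives the sum score together with the number of SNPs and the number of genes carrying at least one unfavorable allele. The SNP and gene labels are hypothetical, and one SNP per gene is assumed purely to mirror the example in the text.

```python
def describe_score(genotypes, snp_to_gene):
    """Return (sum score, number of SNPs, number of genes) with unfavorable alleles.

    genotypes maps a SNP label to its count of unfavorable alleles (0, 1 or 2).
    """
    score = sum(genotypes.values())
    carried = [snp for snp, n in genotypes.items() if n > 0]
    n_snps = len(carried)
    n_genes = len({snp_to_gene[snp] for snp in carried})
    return score, n_snps, n_genes

# Hypothetical design: ten SNPs, each in a different gene
snps = ["snp%d" % i for i in range(1, 11)]
snp_to_gene = {snp: "gene%d" % i for i, snp in enumerate(snps, start=1)}

# Individual A: heterozygous for all ten SNPs
person_a = {snp: 1 for snp in snps}
# Individual B: homozygous for the unfavorable allele at five SNPs
person_b = {snp: (2 if i < 5 else 0) for i, snp in enumerate(snps)}

print(describe_score(person_a, snp_to_gene))  # (10, 10, 10)
print(describe_score(person_b, snp_to_gene))  # (10, 5, 5)
```

Both individuals have a sum score of 10, but the number of SNPs and genes underlying that score differs, which is what the additional adjustments in the sensitivity analyses account for.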

Third, as mentioned in the introduction, SNP effects may differ, which is why the effect size for individual SNPs has been used as a weight in genetic sum scores. As explained, we refrained from doing so, because the strength of single SNP effects may depend on other SNPs and environmental factors. Our approach was conservative, because it remains to be seen whether a better estimation of risk is achieved when weighting a genetic sum score according to single SNP effects and/or the number of SNPs or genes underlying the score as suggested in the previous paragraph. This should ideally be investigated using simulated data in which true SNP effects and interaction patterns are known. However, it is important to realize that our conservative approach, in which SNPs are not weighted, only holds under the assumption that single SNPs have similar effects on risk. This may be a reasonable assumption in our data, considering that all included SNPs were common SNPs (MAF > 5%), which may be hypothesized to have modest effects on common cancers like CRC45. As mentioned in the methods section, we did not include the IGF1 19-CA repeat polymorphism, because it is a different type of variant than a SNP, for which this assumption is less likely to hold. Still, misclassification of individuals on the genetic sum score can never be excluded and, since this misclassification was likely independent of disease, it most probably attenuated results.

Finally, it may matter whether or not unfavorable alleles within a gene were located on the same parental chromosome. This predominantly applies to individuals who carry heterozygous genotypes. We were unable to explore a potential influence on risk of this, because we could not determine the chromosomal origin (paternal or maternal) of alleles.

Despite these considerations, the use of a genetic sum score for SNPs was successful in this study. Strengths of this study include the prospective design and long follow-up with large case numbers. A limitation may be that only ~75% of the cohort returned toenail clippings. However, comparison of subcohort members with and without available toenail material on baseline characteristics did not indicate this to be a selective group (data not shown).

Future directions include investigating whether an accumulation of unfavorable alleles in genes related to the IGF pathway modifies the increased CRC risk associated with overweight and a lack of physical activity. Overweight and physical activity have been associated with blood levels of IGF-related factors1 and are likely candidates to interact with genetic variants in genes related to the IGF pathway. Previous gene-environment interaction (GxE) studies on this, however, have been inconsistent25,46,47. In future GxE studies, the use of a genetic sum score will be advantageous, particularly because of the associated gain in power: the detection of a statistically significant interaction has been estimated to require a four times larger sample size than the detection of a marginal effect of similar magnitude48. The results of GxE studies can contribute to the evidence base underlying the development of targeted CRC prevention strategies aimed at modifying diet and lifestyle. Genetic sum scores, in this regard, might be useful variables for risk stratification.

In conclusion, an accumulation of unfavorable alleles increased CRC risk in men, whereas a decreased total number of IGF1 19-CA repeats reduced CRC risk in women. These findings provide further evidence for the involvement of the IGF pathway in colorectal tumorigenesis. That single SNPs were not associated with CRC risk underlines the importance of integrating SNP information across genes in a pathway.

Materials and Methods

Study population and design

The Netherlands Cohort Study (NLCS) includes 120,852 men and women who were aged 55–69 years in 1986, when they completed a self-administered baseline questionnaire on diet and cancer. Participants originate from the general population in the Netherlands and were sampled via the municipal population registries. The NLCS has been described in detail previously49. Along with the questionnaire, participants were asked to return toenail clippings by way of an enclosed envelope. Approximately 90,000 participants provided toenail clippings, which are a valid source of DNA for the genotyping of germline genetic variants50. Toenail DNA isolation was performed according to the DNA extraction protocol of Cline et al.51 with some adjustments50. The NLCS, the use of toenail DNA for genotyping and associated protocols were approved by the review boards of the TNO Nutrition and Food Research Institute (Zeist, the Netherlands) and Maastricht University (Maastricht, the Netherlands). All methods were carried out in accordance with the approved guidelines.

The NLCS is characterized by a case-cohort design, which entails that a random subcohort (n = 5,000), selected immediately after baseline, is followed up to estimate the accumulated person-time at risk, whereas incident cancer cases are enumerated for the entire cohort. Follow-up for vital status is performed through linkage to the Central Bureau of Genealogy and the municipal population registries (~100% completeness). Cancer follow-up is performed through linkage with the population-based cancer registry and PALGA (the Netherlands pathology database) (>96% completeness)52,53. After 16.3 years and exclusion of participants with a history of cancer other than skin cancer at baseline, there were 4,774 subcohort members and 3,440 incident CRC cases. Toenail DNA was available for 3,768 of these subcohort members (78.9%) and 2,580 of these CRC cases (75.0%); 114 subcohort members who developed CRC during follow-up were included in both counts, leaving a total of 6,234 unique toenail DNA samples.
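The sample counts in this paragraph can be checked with simple arithmetic. The sketch below uses only the figures quoted above; the variable names are illustrative.

```python
# Figures quoted in the text
subcohort_total = 4774   # subcohort members after exclusions
cases_total = 3440       # incident CRC cases after exclusions
subcohort_dna = 3768     # subcohort members with toenail DNA
cases_dna = 2580         # CRC cases with toenail DNA
overlap = 114            # subcohort members who became CRC cases

# Subcohort members who developed CRC appear in both counts,
# so they are subtracted once to obtain unique samples.
unique_samples = subcohort_dna + cases_dna - overlap

pct_subcohort = 100 * subcohort_dna / subcohort_total
pct_cases = 100 * cases_dna / cases_total

print(unique_samples)                  # 6234
print(f"{pct_subcohort:.1f}%")         # 78.9%
print(f"{pct_cases:.1f}%")             # 75.0%
```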

Gene and SNP selection

Our gene and SNP selection strategy was literature-based and ultimately intended for studying GxE interactions between genetic variants in genes related to the IGF pathway and body size, physical activity and energy restriction. We selected genes in or regulatory to the IGF pathway (i.e. IGF1, IGF2, IGF1R, IGF2R, IRS1, IRS2, IGFBP1-7, IGFALS, GH1, GHR, GHRH and GHRHR), genes related to adiponectin (i.e. ADIPOQ, ADIPOR1, ADIPOR2) and the peroxisome proliferator-activated receptor gamma gene (PPARG). We accepted that our literature-based SNP selection strategy may neglect false negative findings in the literature as we primarily aimed at replicating previous findings and quantifying sex- and subsite-specific CRC risks through the use of a genetic sum score.

Our SNP selection strategy consisted of four steps. In step 1, we searched PubMed for studies on SNPs in the selected genes in relation to the following endpoints: i) the risk of CRC; ii) traits of interest in the context of future GxE work, i.e. obesity, insulin resistance and blood levels of IGF pathway-related factors; iii) the risk of type 2 diabetes mellitus; and iv) the risk of other obesity-related cancers. We also searched for hits in these genes in genome-wide association studies on v) CRC, vi) type 2 diabetes mellitus and vii) traits of interest as described under ii). The search yielded 381 SNPs with a reported rs-number. We carried forward SNPs with a >10% prevalence of the rare homozygous and heterozygous genotypes as reported in CEU individuals from the HapMap project to step 2 (n = 275).

In step 2, SNPs were scored on points i through vii. If no association was reported in relation to a specific endpoint, SNPs were assigned a selection score of ‘0’ for that endpoint; if found associated once, SNPs were assigned a selection score of ‘1’; if found associated at least twice, SNPs were assigned a selection score of ‘2’. We chose not to assign selection scores of ‘3’ or higher when SNPs were found associated with a specific endpoint in more than two studies, because this might simply lead to the prioritization of SNPs that have often been investigated (see step 3).

In step 3, selection scores across points i–vii were summed and used to rank SNPs. In order to minimize the chance of selecting SNPs based on false-positive results, SNPs with a sum score of less than two were excluded (n = 222), with the exception of four missense SNPs. One SNP of the remaining 53 SNPs was in perfect LD with another SNP (r2 = 1). A second SNP failed in a pilot that preceded this study and could not be replaced with a perfect proxy (r2 = 1). These two SNPs were therefore excluded.

Step 4 concerned the assay design using the iPLEX™ assay for genotyping on the SEQUENOM® MassARRAY® platform (Sequenom Inc., Hamburg, Germany). This platform allows high-throughput genotyping of maximally 40 SNPs at once. Considering that multiplexing is often not 100% efficient due to sequence incompatibilities between the sequences flanking the SNPs, we designated the 20 highest ranked SNPs as high-priority SNPs (all had a total selection score of ≥3). The remaining 31 low-priority SNPs were used for ‘superplexing’, i.e. given a fixed, optimal design as based on the set of high-priority SNPs, these SNPs were added to the design if possible. In total, 25 SNPs in 9 genes could be included in the assay, comprising 15 high-priority and 10 low-priority SNPs.
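The scoring and ranking rules of steps 2 and 3 can be sketched as follows. This is an illustrative reading of the text, not the authors' actual procedure: the function names and the example report counts are hypothetical, and the missense exemption is modelled as a simple flag.

```python
ENDPOINTS = ("i", "ii", "iii", "iv", "v", "vi", "vii")

def endpoint_score(n_reports):
    """Step 2: 0 if never associated, 1 if associated once, 2 if at least twice."""
    return min(n_reports, 2)

def selection_score(reports):
    """Step 3: sum the capped endpoint scores across endpoints i to vii.

    reports maps an endpoint label to the number of studies reporting an association.
    """
    return sum(endpoint_score(reports.get(e, 0)) for e in ENDPOINTS)

def keep_snp(reports, is_missense=False):
    """Step 3: exclude SNPs with a sum score below two, unless exempt as missense."""
    return selection_score(reports) >= 2 or is_missense

# Hypothetical SNPs (report counts are invented for illustration)
often_studied = {"i": 5, "iii": 1}   # capped at 2 for endpoint i, so the total is 3
single_report = {"ii": 1}            # total of 1, excluded unless missense

print(selection_score(often_studied), keep_snp(often_studied))  # 3 True
print(selection_score(single_report), keep_snp(single_report))  # 1 False
print(keep_snp(single_report, is_missense=True))                # True
```

Capping each endpoint at 2 is what keeps frequently investigated SNPs from being prioritized merely because they have been studied more often.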

SNP genotyping

The protocol for genotyping on the SEQUENOM® MassARRAY® platform has been described previously54 and was carried out using 100 ng of toenail DNA, pipetted into 384-well plates. Included were duplicate samples for a random selection of 314 samples and 436 water controls. Twenty-four out of the 25 SNPs in the assay were successfully genotyped. Genotyping of SNP rs35767 failed as only the C-allele was found. The reproducibility of genotypes was 98.8% or higher for the different SNPs. Exclusion of possibly contaminated samples as indicated by our laboratory technicians (n = 4), samples with irreproducible results (n = 1) and samples with a call rate <95% (n = 532, 8.5%) resulted in 5,697 samples. All SNPs had call rates of 92.6% or higher, except SNP rs4773082, which had a call rate of 83.6%.
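The sample-level call-rate filter can be sketched as below. The sketch assumes that the call rate was computed per sample over the 24 successfully genotyped SNPs, which the text does not state explicitly; the function names and genotype strings are hypothetical.

```python
CALL_RATE_THRESHOLD = 0.95  # samples with a call rate below 95% were excluded

def sample_call_rate(genotypes):
    """Fraction of SNPs with a genotype call; None marks a failed call."""
    called = sum(1 for g in genotypes if g is not None)
    return called / len(genotypes)

def passes_qc(genotypes):
    return sample_call_rate(genotypes) >= CALL_RATE_THRESHOLD

# Hypothetical samples typed for 24 SNPs
one_missing = ["AG"] * 23 + [None]         # 23/24 calls, about 95.8%
two_missing = ["AG"] * 22 + [None, None]   # 22/24 calls, about 91.7%

print(passes_qc(one_missing))  # True
print(passes_qc(two_missing))  # False

# Sample bookkeeping using the counts quoted in the text
remaining_samples = 6234 - 4 - 1 - 532
print(remaining_samples)       # 5697
```

Under this assumption, a 95% threshold over 24 SNPs admits at most one missing SNP per sample.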

Genotyping of the IGF1 19-CA repeat polymorphism

The IGF1 19-CA repeat polymorphism was genotyped by PCR amplification and subsequent analysis of the PCR products’ length using the 96-capillary ABI 3730xl DNA Analyzer. The PCR was carried out using 100 ng of DNA, 10.75 μl MilliQ, 2.5 μl 10x PCR buffer, 0.875 μl of 50 mM MgCl2, 2 × 0.125 μl of Primer predilution-mix (10 times diluted), 0.5 μl of 10 mM dNTP mix and 0.125 μl of Platinum Taq Polymerase (Life Technologies, Bleiswijk, the Netherlands). The primers (forward: 5′-ACCACTCTGGGAGAAGGGTA-3′; reverse: 5′-GCTAGCCAGCTGGTGTTATT-3′) were fluorescently labelled with 6-FAM (blue), NED (yellow) and PET (red), which enabled the simultaneous analysis of three samples in a single run on the ABI 3730xl DNA Analyzer. The protocol was carried out in the dark because of the light-sensitivity of the fluorescent labels. The PCR reactions were performed using the following cycles: 94 °C for 10 min, followed by 35 cycles of 94 °C for 30 sec, 55 °C for 30 sec and 72 °C for 30 sec, followed by 72 °C for 10 min and 4 °C for 30 min. The analysis included 314 duplicate samples and 436 water controls. The reproducibility of the IGF1 19-CA repeat analysis was 93.6%. Genotyping was successful for 70.7% of samples.

Statistical analysis

Hazard ratios and 95% confidence intervals were estimated using age-adjusted Cox regression models. We conservatively refrained from including other CRC risk factors as covariates in the model, particularly indicators of body size which may have a genetic basis associated with the IGF pathway, because adjusting for covariates with a potential genetic basis may unintentionally introduce (collider) bias in genetic association studies55. Participants with inconsistent/incomplete baseline questionnaires were excluded in order to relate genotyping results to baseline characteristics and to keep numbers comparable with those in future GxE studies within the NLCS. This left 3,203 subcohort members and 2,274 CRC cases with SNP data and 2,134 subcohort members and 1,833 CRC cases with data on the IGF1 19-CA repeat. To calculate a genetic sum score, we aggregated unfavorable alleles for SNPs, unless the literature was equivocal about the direction of the association with selected endpoints. Alleles were considered ‘unfavorable’ if these increased the risk of selected diseases [CRC, type 2 diabetes mellitus and other obesity-related cancers (i.e. cancers of the oesophagus, pancreas, gallbladder, breast (in postmenopausal women), endometrium and kidney56)], or if these were associated with selected traits in a manner that may increase CRC risk (overweight, obesity, insulin resistance and blood levels of IGF pathway-related factors). In a two-SNP example, an individual heterozygous for one SNP and homozygous for the unfavorable allele on the other SNP would receive a sum score of 1 + 2 = 3. The genetic sum score was categorized into tertiles based on the distribution in the subcohort. Tertiles enabled tests for a linear trend, while maintaining optimal power within categories. Categorization of the genetic sum score allows for better interpretability of the results, considering that the human genome consists of millions of SNPs and considering that our inclusion of potentially relevant SNPs was not exhaustive. However, for completeness, we also modelled the genetic sum score in a continuous fashion. SNPs not included in the genetic sum score were evaluated when analyzing single SNPs.
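The construction of the genetic sum score and its categorization can be sketched as follows. The tertile boundaries (6–14, 15–18 and 19–29 unfavorable alleles) are those reported in the Results; the function names are hypothetical, and missing genotypes are deliberately not handled in this sketch.

```python
def genetic_sum_score(unfavorable_counts):
    """Unweighted sum of unfavorable alleles (0, 1 or 2 per SNP)."""
    if any(n not in (0, 1, 2) for n in unfavorable_counts):
        raise ValueError("each SNP must contribute 0, 1 or 2 unfavorable alleles")
    return sum(unfavorable_counts)

def tertile(score):
    """Assign a tertile using the subcohort-based cut points from the Results."""
    if score <= 14:
        return 1
    if score <= 18:
        return 2
    return 3

# Two-SNP example from the text: heterozygous (1) plus homozygous unfavorable (2)
print(genetic_sum_score([1, 2]))  # 3

# Hypothetical individual typed for 18 SNPs
counts = [2, 1, 0, 1, 1, 2, 0, 1, 1, 1, 2, 0, 1, 1, 0, 1, 1, 1]
score = genetic_sum_score(counts)
print(score, tertile(score))      # 17 2
```

With 18 SNPs and at most two unfavorable alleles per SNP, the score cannot exceed 36, which matches the theoretical maximum given in the Results.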

Single SNPs were analyzed assuming an additive inheritance mode and only in relation to the overall CRC risk in men and women, because of power considerations. We furthermore conducted four sensitivity analyses. First, we modeled a sum score only including SNPs in genes encoding for factors in or regulatory to the IGF pathway and a sum score only including SNPs in genes encoding for adiponectin, adiponectin receptors and peroxisome proliferator-activated receptor gamma, because these factors may be conceptually different. Second and third, we additionally adjusted our model that included the genetic sum score for the number of SNPs and genes, respectively, underlying an individual’s score; these variables might turn out to be relevant weighting factors. Fourth, we modeled the number of genes in which unfavorable alleles were present to explore whether an accumulation of unfavorable alleles across genes may be important for influencing risk.

The IGF1 19-CA repeat polymorphism was categorized according to Rosen et al.44, distinguishing individuals homozygous for the wild type allele (19/19 CA repeats), heterozygous individuals (19/non-19 CA repeats) and individuals homozygous for variant alleles (non-19/non-19 CA repeats). We did not include the IGF1 19-CA repeat polymorphism in the genetic sum score because it is a conceptually different type of variant than a SNP. This means that the assumption that all variants in the genetic sum score have a similar weight is less assured for this variant. Given that previous studies showed increased9,24,25 and decreased CRC risks23 for variant alleles, we hypothesized that the number of repeats may influence CRC risk differently: i.e., categorizing individuals with fewer and more than 19 CA repeats (the wild type allele) in the same category may have led to qualitatively different observations depending on the distribution of the variant alleles in a particular study population. To explore this, we considered a model in which the number of repeats on both chromosomes was aggregated. Essentially, this yielded another sum score, which was analyzed categorically. Individuals with 38 repeats, most of whom were homozygous for the wild type allele (19 CA repeats), were taken as the reference group.
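The two categorizations of the IGF1 19-CA repeat can be contrasted in a short sketch. The function names and category labels are hypothetical, and the three-level split around 38 repeats is an assumption based on the comparisons reported in the Results; the exact categories of the repeat-sum model are not specified in the text.

```python
def rosen_category(allele1, allele2):
    """Classical categorization: count how many alleles have 19 CA repeats."""
    n_wild_type = (allele1 == 19) + (allele2 == 19)
    return {2: "19/19", 1: "19/non-19", 0: "non-19/non-19"}[n_wild_type]

def repeat_sum_category(allele1, allele2, reference=38):
    """Alternative categorization: total repeats on both chromosomes vs. 38."""
    total = allele1 + allele2
    if total < reference:
        return "<38"
    if total > reference:
        return ">38"
    return "38 (reference)"

# Hypothetical repeat genotypes
for alleles in [(19, 19), (18, 20), (17, 19), (19, 21), (17, 18), (20, 21)]:
    print(alleles, rosen_category(*alleles), repeat_sum_category(*alleles))
```

In this sketch, (17, 18) and (20, 21) fall into the same classical category (non-19/non-19) but into opposite repeat-sum categories, and (18, 20) lands in the reference group without being homozygous for the wild type allele.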

In all analyses, standard errors were estimated using the robust Huber-White sandwich estimator to account for the additional variance introduced by sampling the subcohort from the entire cohort. The proportional hazards assumption was tested using the scaled Schoenfeld residuals and by visually inspecting the -log-log-transformed hazard curves (there were no apparent violations). All analyses were conducted using Stata version 12 (Stata Corp., College Station, TX). Statistical significance was indicated by a P-value < 0.05 for two-sided testing. We did not correct for multiple testing because our study was hypothesis-based and because our use of a genetic sum score of unfavorable alleles (the primary mode of analysis) greatly reduced the number of tests that had to be performed.

Additional Information

How to cite this article: Simons, C. C. J. M. et al. Genetic Variants in the Insulin-like Growth Factor Pathway and Colorectal Cancer Risk in the Netherlands Cohort Study. Sci. Rep. 5, 14126; doi: 10.1038/srep14126 (2015).