Introduction

Altered cell metabolism is considered a cancer hallmark associated with malignant cell growth. [1] Cell metabolism is influenced by the mammalian target of rapamycin (mTOR)-phosphatidylinositide 3-kinases (PI3K)-Akt pathway, which could therefore influence cancer development. In particular, signaling by mTOR complex 1 (mTORC1) influences cell growth and survival via control of protein synthesis, autophagy, lipid synthesis, and mitochondrial metabolism. [2] Cellular energy status itself regulates mTOR-PI3K-Akt signaling, as do growth factors, stress, and nutrients. [2]

Genetic variation in the mTOR-PI3K-Akt pathway, which reflects natural variation in this pathway in the population, has been associated with cancer risk across organ sites. [3,4,5,6,7,8,9,10,11,12] Associations may differ between cancers. For example, MTOR rs2295080, a promoter variant associated with transcription [10] and mRNA expression, [5] was associated with leukemia risk in the opposite direction to its association with the risk of other cancers. [7, 12, 13] To our knowledge, only one study has investigated a potential interaction of MTOR rs2295080 and other variants in the mTOR-PI3K-Akt pathway with a diet risk score, showing evidence of interaction. [14]

Our aim was to extend the existing evidence by studying mTOR-PI3K-Akt pathway genetic variation in relation to colorectal cancer (CRC) risk and by investigating potential effect modification by mTOR-PI3K-Akt pathway genetic variation of associations between energy balance-related factors (body mass index, trouser/skirt size, height, physical activity, and early life energy restriction) and CRC risk in the large, prospective Netherlands Cohort Study (NLCS). A higher body mass index, tallness, and a lack of physical activity are established CRC risk factors [15] which are thought to be associated with a positive energy balance and increased mTOR-PI3K-Akt signaling, stimulating malignant growth. Energy restriction during childhood and adolescence may favorably influence mTOR-PI3K-Akt signaling and could lower the potential for malignant growth. [2] Therefore, if we can show that the CRC risk conferred by these energy balance-related factors depends on genetic variation in the mTOR-PI3K-Akt pathway, which reflects core variation in the population, this would provide evidence that the mTOR-PI3K-Akt pathway is a mechanism underlying associations between energy balance-related factors and CRC risk.

To achieve our aim, we generated sex-specific polygenic risk scores, capturing multiple polymorphisms in one variable. We generated the scores by splitting the dataset into two halves and including only polymorphisms that showed the same direction of association with CRC risk in both halves, as effect alleles could not be defined based on literature or existing genome-wide association studies (GWAS). We weighted the polymorphisms in the scores of one half with the standard error weighted regression coefficients from the other half. The scores were then standardized and the two halves were merged back together, after which Cox hazard ratios for CRC were estimated for the scores (modeled in tertiles and continuously) and for the energy balance-related factors (modeled categorically) within tertiles of the scores.

Results

Baseline characteristics

A flow diagram leading up to the number of subcohort members and cases available in the NLCS for the current analyses is shown in Fig. 1. The polygenic risk score in men was made up of the following 12 SNPs out of a set of 24 genotyped SNPs in 10 top-ranked genes in the mTOR-PI3K-Akt pathway (Supplemental Tables 1 and 2): MTOR rs1057079, TSC2 rs1800720, TSC2 rs2516739, PDK1 rs6723872, EIF4EBP1 rs6605631, RPS6KB2 rs12787021, AKT3 rs14403, AKT3 rs3006939, AKT3 rs946824, AKT2 rs16974157, AKT2 rs874269, and INSR rs891088. The polygenic risk score in women was made up of the following 11 SNPs out of the 24 genotyped SNPs (Supplemental Tables 1 and 2): MTOR rs2295080, TSC2 rs12918803, PDK1 rs6723872, RPS6KB2 rs12787021, AKT3 rs1352162, AKT3 rs14403, AKT3 rs7523198, AKT3 rs7523742, AKT2 rs16974157, AKT2 rs874269, and INSR rs891088. Supplemental Table 3 shows the regression coefficients, SEs, and resulting weights in the two dataset halves that were used to generate the sex-specific polygenic risk scores in each set. Supplemental Fig. 1 shows that the subcohort distributions of the standardized polygenic risk scores were similar in both sets. Since the scores were standardized, the mean equaled 0 and the SD equaled 1. The subcohort distributions of the sex-specific polygenic risk scores in the total population are shown in Fig. 2. The standardized score specific for men ranged from -2.25 to 3.70 in male subcohort members and from -2.25 to 3.55 in male CRC cases. The standardized score specific for women ranged from -1.79 to 4.70 in female subcohort members and from -1.65 to 4.54 in female CRC cases. Table 1 shows that the mean scores within tertiles were comparable between subcohort members and CRC cases in both men and women. Table 1 furthermore shows baseline characteristics of the NLCS cohort, with no major differences in the distributions of most baseline characteristics between subcohort members and CRC cases in men and women. 
The most notable difference between subcohort members and CRC cases was in the percentage with a first-degree family history of CRC (men: 8.8% versus 5.3%, respectively; women: 9.4% versus 5.5%, respectively).

Fig. 1 Flow diagram of subcohort members and colorectal cancer cases

Table 1 Baseline characteristics of subcohort members and CRC cases within the Netherlands Cohort Study (20.3 years of follow-up)

Fig. 2 Histogram of the sex-specific polygenic risk scores in male and female subcohort members

Sex-specific polygenic risk scores of mTOR-PI3K-Akt polymorphisms and CRC risk

Positive associations were observed between the sex-specific polygenic risk scores and CRC risk when modeling these in tertiles and continuously (Table 2). Men had a 7% increase in CRC risk per unit increase on the polygenic risk score specific for men (HRcontinuous = 1.07, 95% CI: 1.00-1.15; HRtertile 2 vs. 1 = 1.07, 95% CI: 0.91-1.26; HRtertile 3 vs. 1 = 1.14, 95% CI: 0.97-1.35). Women had a 9% increase in CRC risk per unit increase on the polygenic risk score specific for women (HRcontinuous = 1.09, 95% CI: 1.01-1.17; HRtertile 2 vs. 1 = 0.97, 95% CI: 0.81-1.16; HRtertile 3 vs. 1 = 1.15, 95% CI: 0.97-1.38). Similar positive (borderline) statistically significant associations were observed for colon cancer risk and proximal and distal colon cancer risk in men and women. The associations between the polygenic risk scores and rectal cancer risk in men and women were positive in direction, but not statistically significant.

Table 2 Polygenic risk scores of mTOR-PI3K-Akt pathway polymorphisms in relation to CRC risk by sex and subsite in the Netherlands Cohort Study after 20.3 years of follow-up

Individual SNP-CRC risk associations are shown in Supplemental Table 4. Several statistically significant associations were observed between individual SNPs, predominantly AKT3 SNPs, and CRC risk in men and women after gene-based FDR adjustment.

Energy balance-related exposures and CRC risk: effect modification by sex-specific polygenic risk scores of mTOR-PI3K-Akt polymorphisms?

Table 3 shows the associations between BMI, trouser/skirt size, BMI at age 20, non-occupational physical activity, height, and energy restriction during childhood and adolescence and CRC risk in men and women, stratified by tertiles of the sex-specific polygenic risk scores. BMI was positively associated with CRC risk in men in the lowest tertile of the polygenic risk score specific for men; non-occupational physical activity was inversely associated with CRC risk in women in the lowest tertile of the polygenic risk score specific for women; and height was positively associated with CRC risk in men and women in the middle tertile of the polygenic risk score specific for each sex and in the lowest tertile of the polygenic risk score specific for women. No significant multiplicative interactions were observed. Analyses for subsite-specific CRC risks are shown in Supplemental Tables 5-8. In these stratified analyses for subsite-specific CRC risks, BMI was positively associated with proximal colon cancer risk in men, height was positively associated with colon, proximal colon, and distal colon cancer risk in both men and women and with rectal cancer risk in women, and non-occupational physical activity was inversely associated with colon, proximal colon, and distal colon cancer risk, with most associations observed in the lower tertiles of the polygenic risk score specific for each sex. Furthermore, one statistically significant interaction was observed between energy restriction during childhood and adolescence and the polygenic risk score specific for men in relation to distal colon cancer risk. Exposure to energy restriction during childhood and adolescence was inversely associated with distal colon cancer risk in men in the middle tertile of the polygenic risk score specific for men, while the association in the lowest tertile was positive in direction, though not statistically significant, nor was the association in the highest tertile statistically significant.

Table 3 Associations between exposures related to energy balance and CRC risk in men and women, stratified for tertiles of the sex-specific polygenic risk score of mTOR-PI3K-Akt pathway polymorphisms in the Netherlands Cohort Study (20.3 years of follow-up)

Discussion

The associations observed between the sex-specific polygenic risk scores and the risk of CRC overall, specifically colon cancer risk, suggest that the mTOR-PI3K-Akt pathway is involved in colon cancer development in both men and women. Involvement of the mTOR-PI3K-Akt pathway in rectal cancer development cannot be concluded based on the current data. There were no (multiplicative) interactions between the energy balance-related exposures studied and the polygenic risk scores specific for each sex in relation to CRC risk overall or by subsite, except for one, i.e. there was an interaction with energy restriction during childhood and adolescence in relation to distal colon cancer risk in men. However, associations within tertiles of the polygenic risk score did not provide a clear indication for a modifying effect. Overall, in the stratified analyses, we predominantly observed associations between energy balance-related exposures and CRC risk in the lower tertiles of the sex-specific polygenic risk scores, with the direction of the associations generally in line with what would be expected for these factors in relation to CRC risk based on literature [15] and with what was previously observed in the NLCS after 16.3 years of follow-up. [16, 17] However, these stratum-specific associations on their own, without (statistical) interaction present, do not form sufficient evidence for concluding that there was a modifying effect by mTOR-PI3K-Akt genetic variation on associations between energy balance-related factors and CRC risk. That said, if we allow for speculation and view these data in a broader sense, these data raise the question whether environmental factors predominate when genetic risk is low.

With regard to our findings for the polygenic risk scores and subsite-specific CRC risks, stronger involvement of the mTOR-PI3K-Akt pathway in the development of more proximally located colorectal tumors is plausible considering that higher (over)expression of Akt1, Akt2, and p-p70S6K(Thr389) has been reported in proximal colon tumors than in distal colon tumors. [18] PTEN gene expression was also found to show a positive expression gradient towards the proximal colon, starting at the rectum. [19] Furthermore, PTEN and PIK3CA mutations are more prevalent in tumors of the proximal colon. [20] These literature findings provide some confidence that the associations observed in this study, which suggest mTOR-PI3K-Akt involvement in colon cancer development, were not chance findings.

There are few data characterizing associations between energy balance-related exposures and CRC risk within genetic risk strata based on mTOR-PI3K-Akt pathway polymorphisms or vice versa. We specifically chose the former approach in light of future translation of the results towards prevention, because polymorphisms are static variables and energy balance-related exposures such as BMI are modifiable; that is, a healthy BMI and physical activity level could be especially important for specific genetic risk groups. Previous studies, however, investigated cancer risks associated with carrying more risk alleles within strata of energy balance-related factors, [21,22,23,24] under the hypothesis that a positive energy imbalance activates the mTOR-PI3K-Akt pathway. These studies did not uniformly suggest that activation of the mTOR-PI3K-Akt pathway by a positive energy imbalance influences cancer risk, as some observed associations between mTOR-PI3K-Akt pathway variants and cancer risk in normal weight instead of overweight/obese individuals. [23] Meanwhile, energy balance has been shown to modulate signaling through Akt and mTOR in multiple epithelial tissues in mice, with diet-induced obesity enhancing and calorie restriction inhibiting activation. [25] The mixed observational results in the literature might be explained by differences in effect on CRC risk of the specific variants included, perhaps suggesting the importance of capturing a sufficient and representative amount of the genetic variation present in the mTOR-PI3K-Akt pathway in the population. For example, one of the studies referenced above utilized both a polygenic risk score of mTOR-PI3K-Akt pathway polymorphisms and an energy balance index and found a joint effect of the two on bladder cancer risk. [21] This study, however, may have had limited power, leading to unstable (and extreme) risk estimates, judging by the case numbers and the wide confidence intervals reported. In addition, this study selected SNPs for inclusion in the risk score based on p-values for main effects and tested the risk score in the same population in which the single SNPs were tested, which might have led to overfitting of the risk score model to the underlying data and inflation of the results. Alternatively, the mixed results in the literature in relation to CRC risk and the absence of interaction in the present study could mean that an interaction between energy balance-related exposures and genetic variation in the mTOR-PI3K-Akt pathway in relation to CRC risk is absent or not strong enough to be detected given the average statistical power achieved in a large observational cohort.

Despite the absence of (statistical) interaction between energy balance-related exposures and the polygenic risk score of mTOR-PI3K-Akt pathway polymorphisms in relation to CRC risk, one particular finding in this study is noteworthy: height was a colon cancer risk factor in both men and women in the lowest and middle tertiles of the polygenic risk score. Previously, after 16.3 years of follow-up, height was observed to be a colon cancer risk factor in women but not men, [16] whereas accounting for genetic variation in the mTOR-PI3K-Akt pathway appeared to remove the sex difference observed overall in our cohort. We have observed the same phenomenon when accounting for genetic variation in the insulin-like growth factor pathway. [26] The absence of a sex difference is in accordance with the literature, which shows energy balance-related exposures such as BMI and height to be CRC risk factors regardless of sex. [15] Interestingly, BMI and height were colon but not rectal cancer risk factors in this study and in previous studies from the NLCS regardless of which other variables were taken into account, [16, 26] which may be a cohort-specific effect (e.g. residual confounding in this specific population), as the literature shows these factors to also be rectal cancer risk factors. [15]

The methodology used to select genes and polymorphisms in the mTOR-PI3K-Akt pathway and the methodology used to generate the sex-specific weighted polygenic risk scores of mTOR-PI3K-Akt polymorphisms deserve some further discussion. Firstly, the assumptions made to select key genes in the mTOR-PI3K-Akt pathway using the relative betweenness centrality measure may not accurately represent the biology of the mTOR-PI3K-Akt pathway. For example, it was assumed that the information flow (signals) between nodes (genes) in a pathway is indivisible and always takes the shortest path. In addition, we assumed an undirected graph (pathway), meaning that the information flow between connected nodes can go both ways. These assumptions were nevertheless necessary and resulted in a list of top-ranked genes that fit with prior knowledge of key players in the mTOR-PI3K-Akt pathway, reassuring us that no major bias occurred because of a potentially inaccurate representation of the biology of the pathway. Secondly, our method of SNP selection, i.e. selecting tagging variants in order to cover as much of the genetic variation in the top-ranked genes as possible, did not immediately allow us to consider correlations of SNPs with other biological levels, such as gene or protein expression. Many selected SNPs, however, turned out to be expression quantitative trait loci (eQTLs) for the gene that they were tagging and/or other genes according to the Genotype-Tissue Expression (GTEx) project (https://gtexportal.org/home/; National Institutes of Health, United States). Thirdly, we were limited in the number of SNPs that we could genotype, and thus the number of genes in the mTOR-PI3K-Akt pathway that we could cover, because of budgetary constraints that allowed us to genotype only one multiplex assay. Given the genes that we covered, this may have led to insufficient coverage of genes encoding proteins whose signaling is influenced by a negative energy imbalance. For example, we could not include SNPs in genes encoding adenosine monophosphate-activated protein kinase (AMPK), which phosphorylates TSC2 in the TSC1-2 complex [2] and stabilizes the mTOR-RAPTOR bond in mTORC1 under conditions of a negative energy balance, inhibiting mTORC1 signaling. [2, 27]

Strengths of this study include that it is a large, population-based prospective cohort with long follow-up, resulting in a large number of CRC cases and making selection and information bias unlikely. Limitations include the single baseline measurement of exposures. The NLCS population has been found stable in its dietary habits, [28] but diminishing physical activity levels and changes in body composition may be inevitable with increasing age, possibly having led to attenuation of associations over time.

Conclusions

The findings of this study suggest that the mTOR-PI3K-Akt pathway may be involved in the development of colon cancer, whereas involvement in rectal cancer development could not be concluded from the current data. Energy balance-related factors were associated with CRC risk as hypothesized, mostly within the lower tertiles of the polygenic risk score specific for each sex, but there was no clear modifying effect of the scores. The relevance of this study lies in its contribution to the evidence base on mechanisms involved in colon cancer development through use of a polygenic risk score, capturing natural variation in the mTOR-PI3K-Akt pathway in the population.

Methods

Population and design

The NLCS [29] includes 120,856 men and women who completed a questionnaire on diet and cancer at baseline in 1986 when 55-69 years old. The baseline questionnaire included a 150-item semi-quantitative food frequency questionnaire, which was found to rank individuals’ dietary intake adequately as compared to a 9-day dietary record [30] and was shown to be a good indicator of intake for at least 5 years. [28] Approximately 75% of the cohort returned toenail clippings, which are a valid and long-term DNA source. [31, 32] The NLCS is characterized by a case-cohort approach for reasons of efficiency related to questionnaire processing, follow-up, and genotyping. A random subcohort (n=5000), selected immediately after baseline and independent of any exposure, is followed up for vital status through record linkage to the Central Bureau of Genealogy and municipal population registries (>99.9% completeness) to estimate the accumulated person-time at risk. Participants were excluded if they reported a history of cancer other than skin cancer at baseline, leaving 4774 subcohort members for follow-up (Fig. 1). The whole cohort is followed up for incident cancer cases through record linkage to the population-based cancer registry and PALGA (the Netherlands pathology database) (>96% completeness). [33, 34] The case-cohort design allows for the estimation of hazard ratios as would be done in a full cohort under the assumption that the fraction of the accumulated person-time at risk observed for exposed and unexposed individuals is equal. This is reasonable considering that the subcohort was selected independent of any exposure. The extra variance introduced by sampling the subcohort from the total cohort can be adjusted for using the robust variance estimator. [35] A detailed description of the NLCS is available in [29]. After 20.3 years of follow-up from September 1986 until the end of 2006, there were 3144 incident colon cancer cases (ICD-O-3 code C18), among which 1623 incident proximal colon cancer cases (ICD-O-3 codes C18.0-C18.4) and 1430 incident distal colon cancer cases (ICD-O-3 codes C18.5-C18.7); 427 incident rectosigmoid cancer cases (ICD-O-3 code C19); and 1026 incident rectal cancer cases (ICD-O-3 code C20), totaling 4597 incident CRC cases (Fig. 1).

Baseline information

Baseline information included height (cm) and weight (kg), used to derive body mass index (BMI, kg/m²) (reflecting body fatness); trouser/skirt size (Dutch clothing sizes), used as a marker for waist circumference (reflecting abdominal fatness when adjusting for BMI); weight at age 20, used together with height to derive BMI at age 20 (kg/m²); and energy restriction during childhood and adolescence, based on place of residence during the Dutch Hunger Winter in 1944-45. Self-reports on weight and height have been shown to be valid measures in other cohort studies with >10 years of follow-up. [36, 37] Trouser/skirt size correlated with hip and waist circumferences in a subset of weight-stable NLCS men (r=0.63 and 0.64, respectively) and women (r=0.78 and 0.71, respectively) and was associated with endometrial and renal cancer risk in the manner that would be expected for waist circumference. [38] BMI and height measures were divided into sex-specific tertiles based on the distribution in the subcohort. Trouser/skirt size was split into two sex-specific categories based on the median in the subcohort. Information on non-occupational physical activity in categories of ≤30, 30-60, >60 min of physical activity per day was a sum measure of daily walking/cycling (min/day), weekly recreational walking/cycling, weekly gardening/doing odd jobs, and weekly sports/gymnastics (categories: never, 1, 1-2, >2 h/week). More details on energy restriction during childhood and adolescence as measured in the NLCS are available in [39]. Information on relevant diet and lifestyle covariates was also available from the baseline questionnaire.
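The derivation of BMI and the tertile categorization based on the subcohort distribution can be illustrated with a minimal Python sketch. This is not the code used for the analyses; the function names are hypothetical, and the use of `statistics.quantiles` with its default interpolation method is an assumption, since the exact cut-point rule is not specified above.

```python
from statistics import quantiles


def bmi(weight_kg, height_cm):
    """Body mass index in kg/m^2 from weight (kg) and height (cm)."""
    height_m = height_cm / 100.0
    return weight_kg / (height_m ** 2)


def tertile_cutpoints(subcohort_values):
    """Two cut-points that split the subcohort distribution into thirds.

    Assumption: Python's default ('exclusive') quantile interpolation.
    """
    lower, upper = quantiles(subcohort_values, n=3)
    return lower, upper


def assign_tertile(value, cutpoints):
    """Return 1, 2, or 3 for the lowest, middle, or highest tertile."""
    lower, upper = cutpoints
    if value <= lower:
        return 1
    if value <= upper:
        return 2
    return 3
```

In such a sketch, the cut-points would be computed separately for men and women from subcohort members only and then applied to subcohort members and cases alike.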

DNA isolation and genotyping

Toenail clippings were stored without further treatment or climate control of the storage room. The DNA isolation protocol has been described in [31] and [32]. DNA isolated from toenail material was stored at -30 °C at the BioBank Maastricht University Medical Center+ (Maastricht, the Netherlands). Toenail DNA is suitable for genotyping on the Agena BioScience MassARRAY® platform (Hamburg, Germany), allowing the genotyping of 36-40 SNPs at once, although, in practice, not all SNPs can be combined due to sequence incompatibilities between the sequences flanking the SNPs.

Gene and SNP selection

We identified 10 top-ranked genes in the mTOR-PI3K-Akt pathway according to their relative betweenness centrality, which provides an indication of the strength of node involvement in the information flow through a network: MTOR (alias: FRAP1), TSC2, PDPK1 (alias: PDK1), EIF4EBP1 (alias: 4EBP1), IRS1, RPS6KB1 (alias: S6K1), RPS6KB2 (alias: S6K2), AKT3, AKT1, and AKT2 (Supplemental Table 1). The Kyoto Encyclopedia of Genes and Genomes (KEGG) mTOR signaling pathway (map04150) was used as input (http://www.genome.jp/kegg/) (R software, version 3.2.2, KeggGraph package). Since there were no SNPs in these genes associated with colorectal cancer risk at a significance level of p < 1 × 10⁻⁵ in GWAS (https://www.ebi.ac.uk/gwas/), we selected tagging single nucleotide polymorphisms (tagSNPs) at a minor allele frequency of 5% or higher for the 10 top-ranked genes using aggressive tagging [40] (HaploView version 4.2, Broad Institute). Not all 10 genes could be included in the assay, because sequence incompatibilities between the sequences flanking the SNPs, on which the primer design was based, prevent some SNPs from being combined in one assay. We first fixed the replicated cancer risk-associated MTOR SNP rs2295080 in the assay design. The assay design next allowed for the inclusion of the following tagSNPs covering 7 of the 10 top-ranked genes: MTOR rs1057079; TSC2 rs2516739, rs1800720, rs2074969, rs9928737, and rs12918803; PDK1 rs6723872; EIF4EBP1 rs6605631; RPS6KB2 rs12787021; AKT3 rs3006939, rs14403, rs7523198, rs7523742, rs1352162, and rs946824; and AKT2 rs874269, rs16974157, and rs7250897. The assay was furthermore filled up as much as possible with single GWAS hits for anthropometric traits, physical activity, or CRC annotated to mTOR-PI3K-Akt pathway genes (https://www.ebi.ac.uk/gwas/). Included were RPTOR rs7503807 for its association with obesity, [41] RICTOR rs2043112 and S6K1 rs1051424 for their association with (childhood) obesity-associated traits, [42] and IGF1R rs2871865 [43, 44] and INSR rs891088 [43, 45] for their association with height (abbreviations are explained in Supplemental Table 1).
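To illustrate how genes can be ranked by betweenness centrality, the following Python sketch applies Brandes' algorithm to a small undirected, unweighted toy graph. It is an illustration only and not the KEGG-based R analysis described above: the node names G1 to G5 and the edges are hypothetical, and dividing by the number of node pairs excluding the node itself is just one possible way of expressing centrality on a relative scale, since the exact normalization used is not stated here.

```python
from collections import deque


def betweenness_centrality(adjacency):
    """Raw betweenness per node (Brandes' algorithm, undirected, unweighted).

    adjacency maps each node to the set of its neighbours.
    """
    centrality = {node: 0.0 for node in adjacency}
    for source in adjacency:
        # Breadth-first search from the source, counting shortest paths.
        order = []
        predecessors = {node: [] for node in adjacency}
        n_paths = {node: 0 for node in adjacency}
        distance = {node: -1 for node in adjacency}
        n_paths[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if distance[w] < 0:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    n_paths[w] += n_paths[v]
                    predecessors[w].append(v)
        # Accumulate pair dependencies, farthest nodes first.
        dependency = {node: 0.0 for node in adjacency}
        for w in reversed(order):
            for v in predecessors[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1.0 + dependency[w])
            if w != source:
                centrality[w] += dependency[w]
    # In an undirected graph every pair is visited from both endpoints.
    return {node: value / 2.0 for node, value in centrality.items()}


def build_adjacency(edges):
    """Turn a list of undirected edges into a node -> neighbours mapping."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


# Hypothetical toy pathway, not the KEGG map04150 topology.
toy_edges = [("G1", "G2"), ("G2", "G3"), ("G3", "G4"), ("G3", "G5")]
toy_graph = build_adjacency(toy_edges)
raw = betweenness_centrality(toy_graph)
n = len(toy_graph)
# Assumed normalization: share of the (n-1)(n-2)/2 pairs not involving the node.
relative = {node: value / ((n - 1) * (n - 2) / 2) for node, value in raw.items()}
ranking = sorted(relative, key=relative.get, reverse=True)
```

In this toy graph, G3 lies on the shortest path of five node pairs and G2 on three, so they rank first and second. The sketch uses only shortest paths in an undirected graph, which matches the assumptions named in the Discussion.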

Genotyping

Genotyping was performed for 3793 (79.5%) subcohort members and 3464 (75.3%) CRC cases with available toenail DNA (Fig. 1). Potentially contaminated samples as noted by the laboratory technicians were excluded (2.6%) to ensure the quality of the data. The mean sample call rate was 97.4% (median: 100%). SNP call rates were between 97 and 99%, except for one SNP (rs1051424), which had a call rate of 92%. A sample call rate of 95% or higher was present in samples from 3550 subcohort members (93.6%) and in samples from 3293 CRC cases (95.1%). (Two SNPs genotyped for use in another project were also counted when calculating the sample call rate.) Allele frequencies in the subcohort, which is representative of the whole cohort, are given in Supplemental Table 2. Hardy-Weinberg equilibrium was violated on five occasions, but we did not exclude these SNPs from further analysis, because we had no reason to suspect genotyping errors, since all SNPs were genotyped using a single assay, and because multiple testing increased the risk of a significant finding by chance.
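A check for Hardy-Weinberg equilibrium can be sketched in Python as a one-degree-of-freedom chi-square goodness-of-fit test on genotype counts. Which test was actually applied is not stated above, so the choice of the chi-square test, the function name, and the example counts are assumptions made for illustration.

```python
import math


def hwe_chi_square(n_major_hom, n_het, n_minor_hom):
    """Chi-square test (1 df) of Hardy-Weinberg equilibrium.

    Takes the observed counts of the three genotypes and returns the test
    statistic and P-value. Assumes both alleles are observed at least once.
    """
    n = n_major_hom + n_het + n_minor_hom
    p = (2 * n_major_hom + n_het) / (2 * n)  # major allele frequency
    q = 1.0 - p                              # minor allele frequency
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_major_hom, n_het, n_minor_hom)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: the first set is exactly in equilibrium,
# the second has no heterozygotes at all.
chi2_in_equilibrium, p_in_equilibrium = hwe_chi_square(25, 50, 25)
chi2_violated, p_violated = hwe_chi_square(50, 0, 50)
```

With 24 SNPs tested in several groups, a few P-values below 0.05 are expected by chance alone, which is the multiple-testing argument made above for retaining the SNPs.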

Statistical analysis

The main exposure variable used in the analyses was a sex-specific polygenic risk score. Since no GWAS summary statistics were available to generate a polygenic risk score, we generated this score using the data at hand. First, the dataset was divided into two random sets of approximately equal size (datasets A and B). In each set, each individual SNP was modelled continuously in relation to the risk of CRC in men and women separately, adjusting the model for age. We deemed it important to do this in a sex-specific manner, because energy balance-related risk factors for CRC have been shown to differ between men and women in the NLCS. [16, 17, 26, 39, 46,47,48] Specifically, a larger BMI and trouser size, used as a proxy for waist circumference, were previously shown to be risk factors in men but not women, whereas height was a risk factor in women but not men in the NLCS. Genotypes in individuals were coded ‘0’ when homozygote for the major allele, ‘1’ when heterozygote, and ‘2’ when homozygote for the minor allele. We used the standard error (SE) weighted regression coefficients (beta / SE) from set A to generate the polygenic risk scores in set B and vice versa (i.e. two polygenic risk scores were generated in each set, one for men and one for women). The polygenic risk scores were calculated by weighting the number of risk alleles carried by an individual in one set using the standard error weighted regression coefficient from the other set (SNP x: n risk alleles * (beta / SE), with n being 0, 1, or 2) and then summing the weighted number of risk alleles for all SNPs into a single score. In case of a negative standard error weighted regression coefficient, the coding of the SNP was reversed, as in these instances the major allele instead of the minor allele was considered the risk-conferring allele, and the absolute value of the weighted regression coefficient was used. 
We only included SNPs in the polygenic risk scores that showed the same direction of effect in both sets so that the risk scores in each dataset would include the same SNPs, though different weights were used to generate the scores. The scores were allowed to include different SNPs between men and women. We also only included SNPs in the polygenic risk scores that were in low linkage disequilibrium (LD), defined as r² ≤ 0.6, because SNPs in low LD are most likely to add new information to the score in terms of the variance explained in the outcome. LD was evaluated for the data under study using default settings in Haploview version 4.2 and defining CRC cases as affected individuals and subcohort members without CRC as unaffected individuals. There were two pairs of SNPs that were in LD with r² > 0.6 (AKT3 rs946824 with AKT3 rs7523742, and MTOR rs2295080 with MTOR rs1057079); when both SNPs of a pair had consistent betas in sets A and B, only one of the pair (chosen at random) was included in the polygenic risk score. To adjust for missing SNP data (one SNP was missing at most because of exclusion of samples with <95% call rate), we divided the polygenic risk scores in each set by the proportion of successfully genotyped risk alleles [(n successfully genotyped SNPs*2) / (n genotyped SNPs*2)]. We then standardized the set-specific polygenic risk scores by subtracting the mean and dividing by the standard deviation (SD) [(x-mean) / SD], with sex-specific means and SDs based on the subcohort. This allowed us to merge the scores and the datasets back together again, resulting in one dataset which included two polygenic risk scores, i.e. one for men and one for women.
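The score construction can be summarized in a short Python sketch covering SNP selection by consistent direction, weighting with beta / SE from the other half, reversal of coding for negative weights, adjustment for missing genotypes, and standardization. It is a simplified illustration rather than the analysis code: the function names, SNP labels, and numbers are hypothetical, the regression step that produces beta / SE and the LD filter are omitted, and the use of the sample SD is an assumption.

```python
from statistics import mean, stdev


def consistent_snps(z_set_a, z_set_b):
    """SNPs whose beta / SE has the same sign in both dataset halves."""
    return [snp for snp in z_set_a
            if snp in z_set_b and z_set_a[snp] * z_set_b[snp] > 0]


def polygenic_score(genotypes, z_other_half, snps):
    """Weighted risk allele count for one individual.

    genotypes maps SNP -> number of minor alleles (0, 1, 2) or None if the
    genotype is missing; z_other_half maps SNP -> beta / SE estimated in
    the other half of the data.
    """
    total = 0.0
    n_called = 0
    for snp in snps:
        minor_count = genotypes.get(snp)
        if minor_count is None:
            continue
        z = z_other_half[snp]
        # Negative weight: the major allele is treated as the risk allele.
        risk_alleles = minor_count if z >= 0 else 2 - minor_count
        total += risk_alleles * abs(z)
        n_called += 1
    if n_called == 0:
        raise ValueError("no genotyped SNPs for this individual")
    proportion_called = (n_called * 2) / (len(snps) * 2)
    return total / proportion_called


def standardize(scores, subcohort_scores):
    """(x - mean) / SD, with mean and SD taken from the subcohort."""
    centre = mean(subcohort_scores)
    spread = stdev(subcohort_scores)  # assumption: sample SD
    return [(x - centre) / spread for x in scores]


# Hypothetical beta / SE values from halves A and B for three SNPs.
z_a = {"snp1": 2.0, "snp2": -1.0, "snp3": 0.5}
z_b = {"snp1": 1.0, "snp2": -0.5, "snp3": -0.2}
selected = consistent_snps(z_a, z_b)  # snp3 changes sign and is dropped

# An individual from half B is scored with the weights from half A.
complete = polygenic_score({"snp1": 1, "snp2": 0}, z_a, selected)
one_missing = polygenic_score({"snp1": 1, "snp2": None}, z_a, selected)
```

In the full procedure this would be run separately for men and women, and individuals in half A would be scored with weights from half B in the same way before standardizing and merging.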

Cox regression was then used to study the (subsite-specific) CRC risks associated with the sex-specific polygenic risk scores in men and women separately using R (R software, version 3.2.2). Models were age-adjusted and the polygenic risk scores were modelled in tertiles (based on the distribution in the male and female subcohort, respectively) and continuously. We also analyzed individual SNPs in relation to CRC risks in men and women in the total dataset, assuming a codominant and additive inheritance mode, in order to facilitate potential future meta-analyses for any of these individual SNPs.

To investigate whether associations of BMI, trouser/skirt size, BMI at age 20, non-occupational physical activity, height, and energy restriction during childhood and adolescence with overall and subsite-specific CRC risks in men and women were modified by the polygenic risk scores, we stratified associations by tertiles of the sex-specific polygenic risk scores and tested multiplicative interactions using the Wald test in men and women (wald.test, aod package in R). Participants with incomplete or inconsistent baseline questionnaires were excluded from these analyses, leaving 2191 male and 2248 female subcohort members and 2409 male and 1870 female CRC cases, although the total number per analysis differed because of missing values on specific exposure variables and covariates (Fig. 1). Since BMI, trouser/skirt size, BMI at age 20, non-occupational physical activity, height, and energy restriction during childhood and adolescence have been studied as CRC risk factors after 16.3 years of follow-up in our cohort, we used the same confounder sets as before, which included established CRC risk factors and confounders derived using a backward procedure. [16, 17, 26, 39, 46,47,48]

Our approach of using polygenic risk scores was aimed at reducing the risk of overfitting, which can lead to inflated estimates or false positive findings. Since most of the SNPs used in this study were tagging polymorphisms that were selected to cover as much genetic variation as possible in the set of identified top-ranked genes in the mTOR-PI3K-Akt pathway, we did not have data from an independent population available on the risk-conferring allele for the majority of the SNPs. By generating a risk score in each half of the data and then merging the data back together again, we benefited from optimal power to carry out subsite-specific analyses and investigate potential effect modification by the polygenic risk scores of associations between energy balance-related factors and (subsite-specific) CRC risks in men and women separately.

All Cox models (coxph, survival package in R) were adjusted for the additional variance introduced by sampling the subcohort from the total cohort by entering the participant identification number as cluster term in the model (robust variance option). [35] We checked potential violations of the proportional hazards assumption by plotting the scaled Schoenfeld residuals against time (cox.zph, survival package in R); violations appeared negligible. Statistical significance was indicated by a P-value <0.05 for two-sided testing. Gene-based false discovery rate (FDR)-adjusted P-value thresholds across men and women were calculated according to the method of Benjamini and Hochberg for P-values for individual SNP-CRC associations. The FDR adjustment entailed ranking P-values in ascending order and multiplying a predefined FDR threshold (0.20 [49]) by the rank of each P-value divided by the total number of P-values considered to be part of the multiple testing. [50] If the original P-value was below both 0.05 and its FDR-adjusted threshold, we considered the result statistically significant.
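The FDR procedure can be illustrated with a minimal Python sketch of the Benjamini-Hochberg step-up rule combined with the additional requirement of P < 0.05. The function name and the example P-values are hypothetical, and it is an assumption that the step-up rule (rather than a rank-by-rank comparison) was applied; in a gene-based adjustment, the function would be called once per gene with the P-values of that gene's SNPs in men and women.

```python
def bh_significant(p_values, fdr=0.20, alpha=0.05):
    """Flag P-values that pass both the Benjamini-Hochberg rule and alpha.

    The critical value for the P-value with ascending rank k out of m
    is (k / m) * fdr.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank whose P-value meets its threshold.
    largest_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank / m * fdr:
            largest_rank = rank
    flags = [False] * m
    for rank, index in enumerate(order, start=1):
        flags[index] = rank <= largest_rank and p_values[index] < alpha
    return flags


# Hypothetical P-values for four SNP-CRC associations within one gene.
example_flags = bh_significant([0.01, 0.04, 0.03, 0.5])
```

With four tests and an FDR threshold of 0.20, the critical values are 0.05, 0.10, 0.15, and 0.20, so the three smallest P-values in this example are flagged and the largest is not.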