Introduction

Colorectal cancer (CRC) is the third most common cancer in the world, regardless of sex, with nearly 1.4 million cases diagnosed in 2012 [1]. The majority of these cancers (70–80%) are sporadic in nature [2], and if current trends continue, it is estimated that 2.2 million cases of CRC will be diagnosed annually worldwide by 2030 [1]. It is now well accepted that CRC risk is highly modifiable through diet and lifestyle; recent reports suggest that up to 47% of CRC cases could be prevented by staying physically active, maintaining a healthy body weight and eating a healthy diet [3].

The expert panel of the World Cancer Research Fund (WCRF), the organization responsible for publishing the most comprehensive review to date on risk factors related to diet and physical activity for cancer, has recently concluded that there is convincing evidence that body fatness, adult attained height, and consuming processed meat and alcoholic drinks increase the risk of developing CRC, while physical activity decreases the risk of developing CRC. Furthermore, they concluded that consuming whole grains, foods containing dietary fiber, dairy products and calcium supplements probably protects against CRC, and consuming red meat probably increases the risk of developing CRC [3].

CRC is not a single disease, but rather encompasses a heterogeneous complex of diseases characterized by numerous genetic and epigenetic abnormalities [4•]. Recently, several studies have used unsupervised clustering methods to develop genomic signatures to classify CRC into different subtypes, and have shown that each subtype has distinct molecular features and prognosis [5•]. As summarized by Song et al. [5•], the CRC Assigner (CRCA) classifier categorized CRC into 5 distinct subtypes: enterocyte, gobletlike, inflammatory, stemlike, and transit amplifying (TA) [6]; and the Colon Cancer Subtypes (CCS) classifier identified 3 groups: CCS1, CCS2, and CCS3 [7]. Several studies have shown that different classifiers are highly correlated; for example, for the CCS and CRCA classifiers, most CCS1 tumors are classified as TA or enterocyte, most CCS2 tumors are classified as inflammatory or gobletlike, and most CCS3 tumors are classified as stemlike [8•, 9]. Although these classifications may be significant in the advancement of CRC research, these subtypes will not be specifically addressed in this review, as they have not yet been investigated in MPE studies.

Generally, there are different (epi)genetic pathways to CRC development, and the cancers resulting from each pathway have specific molecular characteristics that are often associated with distinct prognosis trajectories. Therefore, it is also likely that these cancers have a distinct etiology. Diet and lifestyle factors may not only play a role in causing mutations and epigenetic changes, but also in enhancing tumor growth in tissues that have already acquired specific (epi)genetic aberrations. There may be direct causal associations between diet and lifestyle factors and molecular changes in CRC, and establishing this is important for prevention strategies and for increasing the ability to predict disease progression and prognosis.

Traditionally, epidemiological research has been used to investigate how an exposure may increase or decrease the risk of developing cancer, and pathological research has been used to explore molecular characteristics of tumors to predict prognosis and response to treatment. By combining these two disciplines, a relatively new field of scientific investigation has emerged: molecular pathological epidemiology (MPE) [10]. In this review, we describe the (epi)genetic molecular pathways leading to CRC; identify MPE studies from around the world that have studied molecular markers of these pathways in relation to diet and/or lifestyle factors; summarize the data published on such associations; and explore future perspectives in this realm of research. We focus on diet and lifestyle factors for which there is evidence for an association with CRC as identified by the WCRF reports. In addition, we review promising tumor markers and hypotheses that warrant consideration in future studies.

Studies on the importance of diet and lifestyle factors for CRC survival according to molecular subtype of CRC are not reviewed due to the current paucity of data. In addition, studies focused on downstream expression of genes in CRC as outcome are not reviewed.

(Epi)genetic Pathways to CRC

Although each individual CRC tumor is (epi)genetically complex, and arises and behaves in a unique manner, it is common to classify tumors according to a limited number of phenotypes, because it is assumed that tumors with similar molecular characteristics have arisen through common mechanisms [10].

There are two morphologic, multi-step pathways to CRC (the traditional adenoma-carcinoma pathway and the serrated neoplasia pathway), which are driven by three molecular carcinogenesis pathways: chromosomal instability (CIN), microsatellite instability (MSI), and epigenetic instability (primarily the CpG island methylator phenotype (CIMP)) [11•]. It is important to understand these pathways, because MPE studies have been used to identify disease subtypes that may benefit from certain behavioral interventions, and may be used to validate molecular markers for risk assessment, early detection, prognosis, and prediction [12••, 13].

The Traditional Adenoma-Carcinoma Pathway

Tumors arising via the traditional adenoma-carcinoma pathway begin as premalignant lesions consisting of conventional, tubular or tubulovillous adenomas [11•], and account for approximately 60–90% of sporadic CRCs [2]. They are characterized by CIN, which describes a condition of aneuploidy that is caused by an accelerated rate of gains and losses of entire chromosomes or large portions of chromosomes during cell division [14, 15]. CIN is associated with inactivating mutations or losses in the Adenomatous Polyposis Coli (APC) tumor suppressor gene, which occur as an early event in this sequence [16]. Mutations in the KRAS oncogene, as well as in the TP53, SMAD4, and PIK3CA genes, are also frequently observed [2]. With CIN, there is an increased rate of loss of heterozygosity, which may contribute to the inactivation of tumor suppressor genes or activation of oncogenes [17]. Descriptively, tumors that arise from this pathway are more often associated with male sex, and are more often observed in the distal colon [11•].

Serrated Neoplasia Pathway

Approximately 10–30% of sporadic CRC tumors arise via the serrated neoplasia pathway [11•] and have distinctly different histology compared to tumors derived from the traditional adenoma-carcinoma sequence. They are characterized by MSI, a form of genetic instability marked by length alterations within simple repeated microsatellite sequences of DNA. This is the result of strand slippage during DNA replication, which is not repaired due to a defective postreplication mismatch repair system [18]. An early event in these tumors is mutation of the BRAF proto-oncogene, which inhibits normal apoptosis of colonic epithelial cells [19]. The driving force of the serrated neoplasia pathway is the CpG island methylator phenotype (CIMP), a form of epigenetic instability responsible for silencing a range of tumor suppressor genes, including MLH1 [2]. Loss of MLH1 is thought to cause MSI, and once MLH1 is inactivated, progression to malignant transformation is rapid [19]. Descriptively, these tumors are more frequently associated with female sex, and are more often observed in the proximal colon [11•].

Insights from the Cancer Genome Atlas Study

The Cancer Genome Atlas study, a collaboration between the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI), has generated a comprehensive, multi-dimensional map of the key genomic changes in CRC [20]. As recently summarized by Bae et al. [11•], the Cancer Genome Atlas study reports that CIN and MSI are mutually exclusive. CIMP, on the other hand, overlaps with the MSI pathway because sporadic MSI-high CRCs are also usually CIMP-high, but it does not appear to be in an exclusive relationship with the CIN pathway [11•, 20]. CIMP-high tumors can exist in the absence of MSI-high, and these tumors show some copy number variations across the genome, but the degree of CIN is less pronounced than in CIMP-negative, MSI-low tumors. This suggests that CIMP alone may not be enough for the malignant transformation of serrated polyps, and that it requires collaboration with either CIN or MSI to promote successful malignant transformation [11•, 20].

In an MPE paradigm, a potential etiological factor, such as diet or lifestyle, is assessed in relation to risk of an outcome across strata of molecular characteristics for the disease of interest [12••]. For purposes of this review, the focus is on MPE studies that have considered diet and lifestyle factors in conjunction with primary molecular markers of (epi)genetic instability. For the traditional adenoma-carcinoma pathway, these include CIN, APC mutation, KRAS mutation, and TP53 mutation. For the serrated neoplasia pathway, these include BRAF mutation, MSI, hypermethylation of MLH1, and CIMP.
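As a minimal sketch of this paradigm, assuming a case-control design in which every tumor subtype is compared against the same control group, subtype-specific odds ratios can be computed from exposure counts. The counts and function names below are hypothetical and chosen only for illustration; they are not taken from any study in this review.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Crude odds ratio from a 2x2 table of exposure by case status."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


def subtype_odds_ratios(cases_by_subtype, controls):
    """Odds ratio for each tumor subtype against a shared control group.

    cases_by_subtype maps subtype name -> (exposed, unexposed) case counts.
    controls is an (exposed, unexposed) tuple of control counts.
    """
    exposed_controls, unexposed_controls = controls
    return {
        subtype: odds_ratio(exposed, unexposed, exposed_controls, unexposed_controls)
        for subtype, (exposed, unexposed) in cases_by_subtype.items()
    }


# Hypothetical counts (assumption): exposed vs. unexposed within each group.
cases = {"MSI-high": (60, 40), "MSS": (300, 400)}
controls = (500, 1000)

ors = subtype_odds_ratios(cases, controls)

# Case-case odds ratio: exposure odds in MSI-high cases vs. MSS cases.
case_case_or = odds_ratio(*cases["MSI-high"], *cases["MSS"])

for subtype, value in ors.items():
    print(f"{subtype}: OR = {value:.2f}")
print(f"Ratio of subtype ORs = {ors['MSI-high'] / ors['MSS']:.2f}")
print(f"Case-case OR = {case_case_or:.2f}")
```

With a shared control group, the case-case odds ratio equals the ratio of the two subtype-specific odds ratios, which is why heterogeneity between subtypes can also be examined by comparing cases of one subtype directly with cases of another.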

MPE Studies on Diet, Lifestyle, and CRC

Because MPE is an emerging research field, studies are usually drawn from existing cohort and case-control studies that have collected pathology specimens [12••]. In the realm of CRC, it is not uncommon for large, long-running, population-based studies to have thousands of CRC cases. However, obtaining tumor blocks and subsequently phenotyping molecular characteristics in sample numbers large enough for meaningful statistical analysis requires a significant investment of both time and money. Therefore, while many epidemiological studies have investigated associations between diet, lifestyle, and CRC, the number of studies that have embarked on MPE investigations of such associations is still limited.

The Current Review

We reviewed the literature by searching combinations of key words (molecular pathological epidemiology, prospective cohort study, case-control study, KRAS mutation, APC mutation, Microsatellite Instability, CpG Island Methylator Phenotype, CIMP, BRAF mutation) in the PubMed and EMBASE databases, as well as by analyzing proceedings and participants of the International Molecular Pathological Epidemiology Meeting Series. Eight prospective cohort studies, five case-control studies, and one cross-sectional study that explicitly presented data on molecular markers of (epi)genetic instability were identified (Table 1). However, one cohort study did not further consider associations with diet and lifestyle factors [71], so it was excluded from discussion in this review. Among the remaining studies, associations have been published on molecular endpoints of CRC in relation to smoking; alcohol consumption; body mass index (BMI); waist:hip ratio; adult attained height; physical activity; early life energy restriction; ethnicity; dietary acrylamide, fiber, fat, methyl donors, and omega-3 fatty acids; meat intake, including total protein, processed meat, and heme iron; and vegetable intake. For purposes of comparison and discussion, statistical associations are summarized in Tables 2 and 3, according to markers of the traditional adenoma-carcinoma and serrated neoplasia pathways, respectively, and the impact of these findings on advancing knowledge of CRC etiology is described in further detail below.

Table 1 Epidemiological studies that have collected molecular data according to (epi)genetic characteristics of colorectal cancer
Table 2 Associations between diet and lifestyle factors and markers of the traditional adenoma-carcinoma pathway to CRC
Table 3 Associations between diet and lifestyle factors and markers of the serrated neoplasia pathway to CRC

Smoking

Smoking has been studied in relation to both the traditional adenoma-carcinoma pathway [25, 41, 42, 58, 70, 72] and the serrated neoplasia pathway [30, 58, 60, 61, 62, 65]. As described in the proceedings of the third international MPE meeting, smoking provides one of the best examples of how MPE research can better predict CRC risk than epidemiological studies without molecular classification [12••]. Meta-analysis of traditional epidemiological studies showed only a modest link between smoking and CRC (i.e., a RR usually below 1.2) [73], which may lead one to believe that smoking is not a convincing risk factor for CRC. However, with the advent of MPE, it can be seen that once CRC cases are stratified by MSI or CIMP status, this risk increases up to two-fold for MSI-H and CIMP-H tumors in prospective cohort studies, while there are null associations for tumors not exhibiting these phenotypes (i.e., tumors of the traditional adenoma-carcinoma pathway). These data support the premise that traditional epidemiological studies may mask true associations between some risk factors and cancer, and that MPE studies can shed light on true patterns of association.
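The masking effect described above can be illustrated with simple arithmetic. When the disease is a mixture of subtypes, the overall relative risk is the average of the subtype-specific relative risks, weighted by each subtype's share of cases among the unexposed. The sketch below uses hypothetical inputs (a 15% share for MSI-high tumors, a two-fold risk for that subtype, and a null association otherwise); these numbers and the function name are assumptions for illustration and are not estimates from the reviewed studies.

```python
def overall_relative_risk(share_among_unexposed, subtype_rr):
    """Overall RR for a disease made up of several subtypes.

    share_among_unexposed maps subtype -> share of unexposed cases with that subtype.
    subtype_rr maps subtype -> relative risk for that subtype.
    """
    total = sum(share_among_unexposed.values())
    return sum(
        share_among_unexposed[subtype] / total * subtype_rr[subtype]
        for subtype in share_among_unexposed
    )


# Hypothetical inputs (assumptions, for illustration only).
share = {"MSI-high": 0.15, "non-MSI-high": 0.85}
rr = {"MSI-high": 2.0, "non-MSI-high": 1.0}

overall = overall_relative_risk(share, rr)
print(f"Overall RR = {overall:.2f}")
```

Under these assumptions, a two-fold risk confined to a minority subtype yields an overall relative risk of about 1.15, in the same modest range that unstratified analyses report.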

Alcohol Intake

The association between alcohol intake and CRC has been studied separately by tumor markers related to the traditional adenoma-carcinoma pathway [21, 38, 43, 66] and the serrated neoplasia pathway [22, 38, 44, 63, 67]. Although alcohol is considered by the WCRF to be a convincing risk factor for CRC in men and women, MPE data are conflicting. Acetaldehyde, which is present in alcoholic beverages and is formed during ethanol metabolism, is a highly toxic substance that is carcinogenic to humans. In one of the earliest case-control studies considering alcohol in relation to risk of APC mutations, Diergaarde et al. found that alcohol intake only increased the risk of APC wildtype tumors [66]. In 2006, Bongaerts et al. concluded that alcohol was not associated with tumors harboring mutations in the KRAS gene [43]; however, in 2016, Jayasekara et al. concluded that alcohol intake is associated with an increased risk of KRAS mutated and BRAF wildtype/KRAS wildtype tumors originating via the traditional adenoma-carcinoma pathway, but not with BRAF mutated tumors originating via the serrated pathway [38]. This is in contrast to case-control data from Slattery et al., who were the first to report that alcohol intake is associated with MSI [63]. Some reasons for these discrepancies may include heterogeneity in the way that alcohol intake was measured (i.e., lifetime exposure, highest vs. lowest intake, continuous intake), and the inability to consider men and women separately in data analysis due to limited sample size. Another layer of complexity in the association between alcohol and CRC risk is that susceptibility genes related to alcohol metabolism are not accounted for in MPE studies. This may also explain some of the observed heterogeneity.

Indicators of Energy Balance

Indicators of energy balance include lifestyle factors that play a role in body growth and the development of obesity. These include body mass index (BMI), waist and hip circumference, adult-attained height, caloric intake and physical activity. The majority of MPE research on these factors has been conducted with respect to markers of the serrated neoplasia pathway [26, 39•, 45, 46, 59, 62, 64]. Although associations with APC, KRAS, and CIN have not been directly considered, the fact that BMI and waist measurements are positively associated with both BRAF mutated and BRAF wildtype tumors, both MSI and microsatellite stable (MSS) tumors, and both CIMP-H and non-CIMP tumors is in accordance with WCRF evidence showing that overweight is a strong risk factor for CRC in general.

On the other hand, studies on adult-attained height and early life energy restriction suggest that timing of exposure may be important for influencing CRC risk. Height is a marker of aggregated fetal and childhood experience, and can be considered a proxy measure for important nutritional exposures, which affect several hormonal and metabolic axes [3]. Like body weight, adult-attained height is also an established risk factor for CRC in general; however, observations tend to be stronger for tumors demonstrating BRAF mutation and MSI [39•, 45]. One study on early life energy restriction showed that exposure to famine during childhood and adolescence decreased the risk of developing a tumor characterized by CIMP [46]. Taken together, this suggests that early life exposures may influence risk of epigenetic instability and CRC risk through the serrated neoplasia pathway, but data are scarce and more research is needed in this area.

Dietary Factors

Because the majority of MPE studies are derived from larger cohort and case-control studies that were designed to consider associations between diet and cancer, and therefore have validated food frequency questionnaires in place, it is not uncommon for multiple dietary exposures to be presented in the same publication.

Red meat intake was identified by the WCRF as a probable risk factor for CRC, and MPE research suggests that this may especially be true for tumors of the traditional adenoma-carcinoma pathway; dietary heme intake shows stronger associations with KRAS mutated tumors than with KRAS wildtype tumors. It has been hypothesized that heme can enhance the endogenous formation of carcinogenic N-nitroso compounds [51•]. The study by Gilsing et al. is important because it is the first human observational study providing evidence, as expected, for an association between heme and tumors with specific point mutations [51•].

Similarly, the first observational study showing that dietary acrylamide might be associated with CRC with specific somatic mutations, such as G > C or G > T mutations, was recently published [47], which supports the a priori hypothesis that metabolites of acrylamide are human carcinogens.

With respect to dietary fat, a high intake of polyunsaturated fat, in particular linoleic acid, has also been linked to KRAS mutations [49]. Intriguingly, and in contrast, it was recently reported that high marine omega-3 polyunsaturated fatty acid intake is associated with lower risk of MSI-high CRC but not MSS tumors, suggesting a potential role of omega-3 fatty acids in protection against CRC through DNA mismatch repair [31]. Calcium, milk, and garlic were not significantly associated with specific tumor subtypes in the reviewed publications [21, 22, 51•, 63, 64].

Alcohol is often considered in conjunction with dietary methyl donors such as folate, because folate may influence methylation at gene promoters, and is depleted with alcohol intake. It has been hypothesized that methyl donors such as folate and methionine influence CRC through the serrated neoplasia pathway because of their role in methyl transport (i.e., a deficient status may result in a decrease in promoter hypermethylation, as observed in CIMP). Folate intake is associated with BRAF mutations, suggesting that it does play a role in epigenetic aberrations [52]. However, high folate consumption also appears to reduce the risk of APC wildtype colon tumors, while being positively associated with APC mutated colon tumors in men [50], indicating that folate may also enhance colorectal carcinogenesis through a distinct APC mutated pathway. More research, with attention to sample size, is needed to replicate and clarify these associations.

Future Perspectives

In order to gain more insight into etiology and potential CRC interventions, it is important to continue investigating associations between diet, lifestyle factors and risk of different CRC subtypes. As mentioned previously, several studies clustering CRC into specific subtypes have recently been published [5•, 6, 8•, 9, 74]. The Cancer Genome Atlas study provides additional insights on how MPE studies in the realm of CRC should consider molecular markers and etiologic pathways [20].

As noted earlier, MPE studies are usually drawn from existing cohort and case-control studies. That means that in most cases, such studies have validated food-frequency and lifestyle questionnaires in place, and in the future may have more tumor tissues available for molecular subtyping as cases continue to be identified. This will improve interpretation of research findings, as one important limitation of MPE studies is limited sample size. Any MPE study conducted within a larger cohort will undergo multiple exclusions based on availability of tumor material and valid assay results. Therefore, the sample size for a study with molecular endpoints will always be smaller than that of the parent study. To analyze molecular data for associations with diet and lifestyle factors, a subset analysis is performed for the different subsets (e.g., CIMP-H vs. CIMP-0; MSI-H vs. MSS; BRAF mutated vs. BRAF wildtype tumors). The sample size for a subset, especially for the rarer subtype (e.g., CIMP-H, MSI-H, BRAF mutated), may be too small to provide adequate statistical power, or may limit the number of subtypes that can be distinguished, even though this may at least in part be offset by more refined risk estimates in these subtypes.
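The effect of subset size on precision can be sketched with the standard large-sample (Woolf) confidence interval for an odds ratio, in which the standard error of the log odds ratio is the square root of the summed reciprocals of the four cell counts. The counts below are hypothetical and were chosen so that both subsets have the same point estimate; the function name is likewise an assumption for illustration.

```python
import math


def odds_ratio_ci(exposed_cases, unexposed_cases, exposed_controls,
                  unexposed_controls, z=1.96):
    """Odds ratio with a Woolf (log-based) confidence interval.

    Returns (odds_ratio, lower, upper); z=1.96 gives an approximate 95% interval.
    """
    log_or = math.log(
        (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)
    )
    se = math.sqrt(
        1 / exposed_cases + 1 / unexposed_cases
        + 1 / exposed_controls + 1 / unexposed_controls
    )
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical counts (assumptions): same odds ratio, but the rare subtype
# has one tenth as many cases as the common subtype.
common_subtype = odds_ratio_ci(300, 400, 500, 1000)
rare_subtype = odds_ratio_ci(30, 40, 500, 1000)

for label, (or_, lower, upper) in [("common", common_subtype), ("rare", rare_subtype)]:
    print(f"{label}: OR = {or_:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

In this example the point estimate is identical in both subsets, but the interval for the rare subtype is much wider and includes 1, so an association of the same size would not be distinguishable from no association.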

Pooling data from independent studies may be a solution to this problem. To our knowledge, only one such MPE study, pooling data from the Netherlands Cohort Study (NLCS) and the Melbourne Collaborative Cohort Study (MCCS) to assess the association between body size and CRC by MSI and BRAF mutation, has been published so far. However, in that study, pooling CIMP data was not possible due to methodological differences [39•]. This study highlights a unique challenge of pooling molecular data: it is important that similar definitions and laboratory analyses be used to define the phenotype in each study. We have previously published on the need for a global consensus on how to analyze and define CIMP [75, 76], but this is important for all molecular endpoints.
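One common way to combine subtype-specific estimates from independent studies is fixed-effect, inverse-variance pooling on the log scale. The sketch below illustrates that general technique only; it is not a description of the approach used in the pooled study cited above, and the two study estimates are hypothetical. It also presupposes that both studies defined the molecular subtype in the same way, which is the harmonization issue raised in the text.

```python
import math


def pool_fixed_effect(estimates, z=1.96):
    """Fixed-effect inverse-variance pooling of relative risks.

    estimates is a list of (relative_risk, ci_lower, ci_upper) tuples with
    95% confidence intervals. Returns (pooled_rr, lower, upper).
    """
    weights = []
    weighted_logs = []
    for rr, lower, upper in estimates:
        # Recover the standard error of log(RR) from the reported interval.
        se = (math.log(upper) - math.log(lower)) / (2 * z)
        weight = 1 / se ** 2
        weights.append(weight)
        weighted_logs.append(weight * math.log(rr))
    pooled_log = sum(weighted_logs) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return (
        math.exp(pooled_log),
        math.exp(pooled_log - z * pooled_se),
        math.exp(pooled_log + z * pooled_se),
    )


# Hypothetical subtype-specific estimates from two cohorts (assumptions).
studies = [(1.30, 0.90, 1.88), (1.50, 1.05, 2.14)]
pooled = pool_fixed_effect(studies)
print(f"Pooled RR = {pooled[0]:.2f} (95% CI {pooled[1]:.2f} to {pooled[2]:.2f})")
```

The pooled interval is narrower than either study's own interval, which is the gain in precision that motivates pooling for rare subtypes.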

In a 2010 review on MPE of CRC, Ogino et al. identified that to overcome the unique challenges of this work, it would be necessary to coordinate research efforts around the world and to formulate a system where researchers could discover and validate new findings [4•]. Recently, the 3rd International Molecular Pathological Epidemiology (MPE) Meeting was held in Boston, and was attended by 150 scientists from 17 different countries [12••]. This meeting highlighted a new wave of research that is focused on increasing the understanding of the role that lifestyle/behavioral factors play in modifying prognosis of diseases (including CRC) by considering specific disease subtypes. Such organization and collaboration will only expedite the creation of new, high quality studies, research questions, and answers around CRC etiology.

Conclusion

Because CRC is a heterogeneous disease with several molecular subtypes, traditional epidemiological studies may completely mask or underestimate true associations between diet, lifestyle and disease risk. The WCRF has identified several convincing and probable risk factors for CRC, and utilizing MPE to study these factors can inform prevention and treatment strategies as well as help predict prognosis for CRC.

MPE studies have also suggested that timing of exposure may be important for establishing patterns of epigenetic instability (e.g., as suggested by associations of adult-attained height and early life energy restriction with tumors exhibiting specific (epi)genetic markers). Furthermore, MPE studies offer the possibility to test hypotheses with regard to mutagenic effects (e.g., as suggested by the associations of heme iron and acrylamide with tumors exhibiting specific somatic mutations related to the exposure).

In the future, continuing collaboration and pooling data from high quality studies, including data on other molecular endpoints, may improve the strength of individual MPE findings, overcome the challenges of small sample sizes, and further pinpoint carcinogenic mechanisms leading to CRC.