Background

Colorectal cancer is the third most common cancer globally, with around 1.9 million cases diagnosed in 2020, and the second most common cause of cancer-related death [1]. There is great potential to reduce this burden since most colorectal cancer cases are sporadic [2] and are associated with modifiable risk factors such as body fatness [3], alcohol intake [4], and diet [5]. Colorectal cancer development is also influenced by metabolic factors [6, 7]. For example, insulin and insulin-like growth factors are thought to play causal roles in colorectal tumorigenesis [8], likely through the promotion of cell proliferation and growth signaling pathways [9]. Broad metabolic dysfunction may lead to perturbed small-molecule metabolism, which in turn elicits bioactivity at the level of tissues and organs.

Amino acids are among the most abundant circulating metabolites and serve as building blocks of proteins, precursors of many signaling molecules, and an important energy source via the citric acid cycle. Certain amino acids may also fuel cancer development [10], and marked changes in blood amino acid concentrations have been extensively observed in colorectal cancer patients [11]. For example, levels of amino acids such as glutamine, citrulline, alanine, and histidine have been inversely associated with advancing disease stage [12, 13], while valine and leucine were among the metabolites that distinguished colorectal cancer cases using a discovery-replication strategy [14]. Similarly, the concentrations of several blood amino acids distinguished early-stage colorectal cancer cases from controls in Japanese patients, most notably aspartic acid [15], as well as ornithine and lysine [16]. Glutamine was a notable discriminant in patients newly diagnosed with colorectal cancer compared to controls in a Chinese hospital-based study [17]. Overall, amino acid levels were generally inversely associated with prevalent colorectal neoplasia, suggesting a depletion of serological concentrations in cases compared to healthy individuals. Amino acid profiling could therefore potentially help identify early-stage disease [18], as well as providing insights into mechanisms of carcinogenesis.

Despite these observations, few prospective studies have been conducted to test the hypothesis that pre-diagnostic amino acid concentrations are associated with colorectal cancer risk. Two such studies of nested case-control design that analyzed pre-diagnostic serum or plasma by untargeted metabolomics found limited dysregulation of lipophilic metabolites only [19, 20], while in a case-control study nested in the European Prospective Investigation into Cancer and Nutrition (EPIC) cohort that measured tryptophan and serotonin levels, tryptophan was inversely associated with colon cancer [21]. The aim of the current study was thus to test these associations in a larger and more comprehensive analysis. We first employed the EPIC nested case-control study as a discovery cohort, which measured between 13 and 21 amino acids in fasting plasma or serum in relation to colorectal cancer. In a replication step, we tested those amino acids associated with colorectal cancer risk in EPIC in the UK Biobank cohort, in which 9 overlapping compounds had been measured in over 111,000 participants. Together, the two cohorts allow for the largest and most detailed investigation of circulating amino acids and colorectal cancer risk performed to date.

Methods

The EPIC cohort

The EPIC cohort includes over 520,000 individuals who were recruited between 1992 and 2000 from 23 study centers across 10 European countries (Denmark, France, Germany, Greece, Italy, Norway, Spain, Sweden, the Netherlands, and the UK). Participants were 35-70 years of age at recruitment, and approximately 70% of the cohort are women. The study design has been previously described [22, 23]. In brief, extensive questionnaire data on dietary and lifestyle variables were collected at baseline, and approximately 75% of individuals provided non-fasting blood samples.

Incident cases of colorectal cancer were identified through record linkage with regional cancer registries or via a combination of methods, such as the use of health insurance records, contacts with cancer and pathology registries, and active follow-up through participants and their next of kin. Colorectal cancer was defined using the tenth edition of the International Classification of Disease (ICD-10) and the second edition of the International Classification of Disease for Oncology (ICD-O-2). Proximal colon cancers included those found within the cecum, ascending colon, hepatic flexure, transverse colon, and splenic flexure (C18.0 and C18.2–18.5). Distal colon cancers included those found within the descending (C18.6) and sigmoid (C18.7) colon. Overlapping (C18.8) and unspecified (C18.9) lesions of the colon were classed as colon cancers only. Cancer of the rectum included cancers occurring at the recto-sigmoid junction (C19) and rectum (C20).

The current study employed a fasted subset of EPIC data, obtained from two separate metabolomics studies on colorectal cancer, as a discovery cohort. Samples were analyzed using the Biocrates AbsoluteIDQ™ p180 kit (467 cases and 467 matched controls) and the p150 kit (1141 cases and 1141 controls). Combining these studies and then excluding non-fasting participants resulted in a final combined sample of 654 fasted cases and 654 controls, of which 354 case-control pairs were analyzed using the p180 kit. Controls were selected using incidence density sampling from all cohort members who were alive and free of cancer (except non-melanoma skin cancer) at the time of diagnosis of the colorectal cancer cases. Controls were matched to cases on age at recruitment (within 6 months), sex, study center, follow-up time since blood collection, time of day at blood collection (within 4 h), and fasting status. Women were further matched on menopausal status (pre-, peri-, and post-menopausal) and, in pre-menopausal women, phase of menstrual cycle at blood collection. Approval for the study was obtained from the International Agency for Research on Cancer (IARC) and local center review boards. All participants provided written informed consent.

The UK Biobank cohort

The UK Biobank aims to investigate the genetic, lifestyle, and environmental causes of a range of diseases [24]. Between 2006 and 2010, 502,656 adults aged between 40 and 69 years (229,182 men and 273,474 women) who were registered with the UK National Health Service were recruited at 22 study assessment centers. Ethical approval was obtained from the North West Multicentre Research Ethics Committee, the National Information Governance Board for Health and Social Care in England and Wales, and the Community Health Index Advisory Group in Scotland. All participants provided written informed consent. The present study was undertaken under application number 25897.

During the baseline recruitment visit, participants completed a self-administered questionnaire on socio-demographics (including age, sex, education, and Townsend deprivation score), health and medical history, lifestyle exposures (including smoking habits, dietary intakes, and alcohol consumption), early life exposures, and medication use. Physical measurements were taken, including weight, height, and waist circumference. Colorectal cancer cases were defined using the 10th Revision of the International Classification of Diseases (ICD-10). Colorectal cancers comprised those of the proximal colon (C18.0 and C18.2–18.5), distal colon (C18.6–C18.7), overlapping and unspecified lesions of the colon (C18.8–C18.9), and rectal cancers (C19–C20), as described above.

Blood samples, with data on time since last meal, were collected from all participants at recruitment and additionally from around 20,000 participants who attended a repeat assessment visit between 2012 and 2013. The current study included all participants for whom metabolite profiling had been performed at the time of the study, and thus had available amino acid measurements. From our supplied dataset that contained observations for 502,524 participants, exclusions were made for voluntary withdrawal from the study (n = 36) and prevalent cancer at recruitment (n = 27,240). Of the remainder, plasma amino acid measurements were available for 111,323 participants, and these were included as the replication cohort (Fig. 1).
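The participant flow described above can be retraced with simple arithmetic. The following is a minimal sketch, assuming the two exclusion counts are disjoint and applied in sequence; the variable names are illustrative and do not come from the study's code.

```python
# Counts as reported in the text
supplied = 502_524          # participants in the supplied dataset
withdrawn = 36              # voluntary withdrawal from the study
prevalent_cancer = 27_240   # prevalent cancer at recruitment

# Assumption: the two exclusion groups do not overlap
eligible = supplied - withdrawn - prevalent_cancer

# Participants with plasma amino acid measurements (replication cohort)
with_amino_acids = 111_323
share_profiled = with_amino_acids / eligible

print(f"Eligible after exclusions: {eligible:,}")
print(f"Share with amino acid data: {share_profiled:.1%}")
```

Under these assumptions, roughly a quarter of eligible participants had amino acid measurements available at the time of the study.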

Fig. 1 Flow chart showing discovery and replication study design. CRC, colorectal cancer; EPIC, European Prospective Investigation into Cancer and Nutrition

Laboratory methods

In EPIC, targeted metabolomics profiling was performed at the International Agency for Research on Cancer (Biocrates AbsoluteIDQ™ p180 kit) and the Helmholtz Centre in Munich (Biocrates AbsoluteIDQ™ p150 kit). The samples were prepared as per the Biocrates kit instructions [25, 26]. Assay preparation steps were carried out on 96-well plates and a volume of 10 μL plasma was prepared. The p150 kit allows the quantification of up to 13 amino acids and the p180 kit up to 21 amino acids (Additional file 1: Supplemental Methods) [25, 27]. Liquid chromatography–mass spectrometry (LC-MS) was used to quantify the levels of the amino acids in accordance with the kit manufacturer’s instructions. All 21 amino acids included were fully quantified in μmol/L. The amino acids quantified were arginine, glutamine, glycine, histidine, methionine, ornithine, phenylalanine, proline, serine, threonine, tryptophan, tyrosine, and valine (p150 and p180 kits); and alanine, asparagine, aspartate, citrulline, glutamate, isoleucine, leucine, and lysine (p180 kit only). See Additional file 1: Supplemental Methods for full details of sample preparation. Coefficients of variation for amino acids are given in Table S1.

Analysis of plasma from around 118,000 participants of the UK Biobank was performed using nuclear magnetic resonance (NMR) spectroscopy on the Nightingale metabolic biomarker platform (Nightingale Health Ltd, Finland), which comprises 249 metabolic measures, among which are concentrations of 9 amino acids. In brief, stored plasma samples prepared in 96-well plates were thawed, mixed gently, and centrifuged for 3 min at 3400 g to remove the precipitate. Aliquots of each sample were mixed with phosphate buffer, loaded onto a cooled sample changer, and analyzed by NMR spectroscopy. Metabolic biomarkers were identified and quantified from two separate spectra: a pre-saturated proton NMR spectrum and a T2-relaxation-filtered spectrum. Six identical Bruker AVANCE IIIHD instruments were employed in parallel. The amino acids quantified were alanine, glutamine, glycine, histidine, isoleucine, leucine, valine, phenylalanine, and tyrosine. See Additional file 1: Supplemental Methods for further details.

Statistical analysis

Pearson correlations between non-log transformed amino acid concentrations were first calculated in the 654 EPIC fasted controls only and in UK Biobank participants with a fasted time of >4 h (n = 56,688). Hierarchical clustering of concentration profiles using Ward’s method was used to visualize and identify notable clusters of correlated metabolites.
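The correlation step can be illustrated with a minimal sketch. It computes a Pearson correlation matrix on untransformed concentrations using only the Python standard library; the concentrations are made-up values for illustration, not study data, and the function names are hypothetical. The clustering step is not shown; in practice the resulting profiles would be handed to a hierarchical clustering routine that supports Ward's method.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def correlation_matrix(table):
    """Pairwise Pearson correlations for a dict of name -> concentrations."""
    names = list(table)
    return {a: {b: pearson(table[a], table[b]) for b in names} for a in names}

# Hypothetical fasting concentrations (umol/L) for five controls
toy = {
    "valine":  [210.0, 250.0, 230.0, 270.0, 190.0],
    "leucine": [110.0, 135.0, 120.0, 150.0, 100.0],
    "glycine": [260.0, 240.0, 300.0, 230.0, 280.0],
}
corr = correlation_matrix(toy)
for name, row in corr.items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```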

Analysis of discovery cohort

In EPIC, case-control status was modeled using conditional logistic regression, and odds ratios (OR) and 95% confidence intervals (CI) were estimated for each amino acid. Models were adjusted for an a priori determined set of potential confounders comprising smoking status (never, former, current, unknown), alcohol drinking history (never, former, current, lifetime, unknown), Cambridge physical activity index (inactive, moderately inactive, moderately active, active, unknown) and body mass index (BMI; <25, 25–30, and >30 kg/m²), all at baseline. The false discovery rate (FDR) procedure was used to adjust P-values and an FDR P-value threshold of 0.05 was used for statistical significance. Continuous models per SD concentration and categorical models by quartile were fit for each amino acid. For the categorical models, inner quartile cut points were determined by the metabolite concentrations among control participants. To test for trends across categories, quartile medians were additionally modeled as continuous variables.
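The multiple-testing adjustment can be sketched as follows. This is a minimal illustration that assumes the Benjamini-Hochberg step-up procedure, which the text does not name explicitly; the raw P-values are invented for the example and the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw P-values for five amino acids
raw = [0.002, 0.010, 0.020, 0.200, 0.700]
adj = benjamini_hochberg(raw)
# Flag compounds below the 0.05 FDR threshold used in the text
flags = [q < 0.05 for q in adj]
print([round(q, 4) for q in adj], flags)
```

Because each adjusted value depends on the total number of tests, the same raw P-value yields a larger adjusted value when more amino acids are tested.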

Analysis of replication cohort

Amino acids that were significantly associated with colorectal cancer per SD concentration in EPIC were carried forward for testing in the UK Biobank cohort. Here, time to colorectal cancer diagnosis was modeled using Cox proportional hazards regression, and hazard ratios (HR) and 95% CI were estimated for each amino acid. Time at study entry was age at recruitment, while exit time was age at incident cancer diagnosis, death, or the last date at which follow-up was considered complete. Multivariable models were kept as similar as possible to those fit in EPIC and were adjusted for BMI category (<25, 25–30, >30 kg/m²), total physical activity (<10, 10–20, 20–40, 40–60, >60 metabolic equivalent of task [MET] h/week), alcohol consumption frequency (never, special occasions only, 1–3 times/month, 1–2 times/week, 3–4 times/week, daily or almost daily, unknown/prefer not to answer), smoking status (smoker, former smoker, never smoker), time since last meal (hours), and family history of colorectal cancer (yes/no). Stratification variables were age at recruitment in 5-year intervals, Townsend deprivation index quintiles, and assessment center region. A raw P-value threshold of 0.05 was used for statistical significance.
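The use of age as the underlying time scale can be sketched as below. This is an illustration only: it assumes that exit occurs at the earliest of diagnosis, death, or end of follow-up, and the class and field names are hypothetical rather than taken from the UK Biobank data dictionary.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Participant:
    age_at_recruitment: float
    age_at_crc_diagnosis: Optional[float]  # None if never diagnosed
    age_at_death: Optional[float]          # None if alive at end of follow-up
    age_at_end_of_followup: float

def survival_record(p):
    """Return (entry_age, exit_age, event) with age as the time scale."""
    candidates = [p.age_at_end_of_followup]
    if p.age_at_crc_diagnosis is not None:
        candidates.append(p.age_at_crc_diagnosis)
    if p.age_at_death is not None:
        candidates.append(p.age_at_death)
    # Assumption: follow-up stops at whichever comes first
    exit_age = min(candidates)
    # The event is counted only if diagnosis is what ended follow-up
    event = int(p.age_at_crc_diagnosis is not None
                and p.age_at_crc_diagnosis == exit_age)
    return p.age_at_recruitment, exit_age, event

case = survival_record(Participant(55.0, 63.2, None, 66.0))
died = survival_record(Participant(60.0, None, 64.5, 71.0))
censored = survival_record(Participant(50.0, None, None, 61.0))
print(case, died, censored)
```

Records of this form, with delayed entry at the recruitment age, are what a Cox model with age as the time scale would take as input.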

Stratified and sensitivity analyses

The above analysis was repeated but excluding individuals diagnosed within the first 2, 5, and 10 years of the study in EPIC, and within the first 2 and 5 years in the UK Biobank. Sex-stratified models were also performed for all amino acids measured in both cohorts and, in UK Biobank, amino acid models were conducted for colon and rectal subsites separately. Heterogeneity by sex and by tumor subsite was tested for by fitting models with and without interaction terms and comparing these by likelihood ratio test. As sensitivity analyses, models for glutamine and histidine only were repeated additionally adjusting for major sources of animal proteins (red and processed meat, poultry, fish, and dairy product intake), and amino acid models were repeated in EPIC only using non-fasted participants as well as fasted participants.
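The heterogeneity test can be illustrated with a short sketch of a likelihood ratio test. It assumes a single interaction term, so that the test statistic has one degree of freedom; the log-likelihood values are placeholders, not results from either cohort.

```python
from math import erfc, sqrt

def lrt_one_df(loglik_without, loglik_with):
    """Likelihood ratio test comparing nested models differing by one term.

    Returns the test statistic and its P-value. For one degree of freedom,
    the chi-square upper tail equals erfc(sqrt(statistic / 2)).
    """
    statistic = 2.0 * (loglik_with - loglik_without)
    if statistic < 0:
        raise ValueError("the larger model cannot have a lower log-likelihood")
    return statistic, erfc(sqrt(statistic / 2.0))

# Placeholder log-likelihoods for models without and with the interaction
stat, p_het = lrt_one_df(-1000.0, -997.3)
print(f"LR statistic = {stat:.2f}, P-heterogeneity = {p_het:.3f}")
```

A test with more than one interaction term would need the chi-square tail for the corresponding degrees of freedom, which this one-line identity does not cover.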

Analyses were conducted either in the R open-source statistical programming language (version 3.6.3 in the RStudio environment) or STATA version 16.1 (StataCorp Inc).

Results

A median follow-up of 14.4 years was observed for the 654 colorectal cancer cases and 654 controls in EPIC while, during a median follow-up of 10.7 years in the UK Biobank, 1221 incident cases of colorectal cancer occurred among the 111,323 participants with available amino acid measurements. The EPIC and UK Biobank populations were of similar ages at baseline and at colorectal cancer diagnosis although, in EPIC, most participants (77.4%) were from Italian or Spanish centers. Full baseline characteristics are shown in Table 1. Glutamine, alanine, and glycine were at the highest circulating concentrations overall, as quantified in EPIC (Fig. 2). Fasting concentrations of amino acids in cancer-free participants were almost always positively correlated, with the following correlated clusters noted in EPIC: glycine and serine; arginine, methionine, and tryptophan; valine, isoleucine, and leucine; and histidine and phenylalanine (Fig. 3). In the UK Biobank, valine, isoleucine, and leucine concentrations (branched-chain amino acids) were strongly intercorrelated.

Table 1 Baseline characteristics of participants, by cohort
Fig. 2 Blood concentrations of amino acids as determined in fasted EPIC participants on the p150 or p180 Biocrates platform. Based on 654 and 354 cancer-free controls for p150 and p180 platforms, respectively

Fig. 3 Fasting amino acid concentrations and their intercorrelations in EPIC cancer-free controls. Compounds are ordered by the hierarchical cluster as determined by Ward’s method. Squares represent groups of highly correlated compounds

Associations of pre-diagnostic amino acid concentrations with colorectal cancer risk

In the EPIC discovery phase, histidine concentrations were inversely associated with colorectal cancer risk (OR 0.80 per SD concentration, 95% CI 0.69–0.92, FDR P-value = 0.03) (Table 2). A statistically significant trend was also observed by quartile of histidine concentration (P-trend = 0.002). Lysine was also inversely associated with colorectal cancer risk (OR 0.78 per SD concentration, 95% CI 0.66–0.93, FDR P-value = 0.05), and glutamine was borderline inversely associated with risk (OR 0.85 per SD concentration, 95% CI 0.75–0.97, FDR P-value = 0.08). For both lysine and glutamine, individuals in Q4 of concentrations had a lower risk compared to those in Q1, with an apparent decreasing trend across quartiles (P-trend for both amino acids = 0.01).

Table 2 Associations between concentrations of 21 plasma or serum amino acids and colorectal cancer risk in the EPIC nested case-control discovery and UK Biobank replication cohorts

Histidine and glutamine, but not lysine, were among the nine amino acids measured in UK Biobank and were thus carried forward to the replication stage. Histidine was also significantly inversely associated with colorectal cancer risk in UK Biobank (HR 0.93 per SD concentration, 95% CI 0.87–0.99, P-value = 0.03), with a significantly decreasing trend across quartiles of concentration. Glutamine was again borderline inversely associated with risk on a continuous scale (HR 0.95 per SD concentration, 95% CI 0.89–1.01, P-value = 0.09), and individuals in Q4 of concentrations were at a lower risk of colorectal cancer than those in Q1 (HR 0.85, 95% CI 0.72–1.00).
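As a rough consistency check, a reported ratio and its confidence interval can be converted back into an approximate P-value. The sketch below assumes a Wald-type interval that is symmetric on the log scale; because the published estimates are rounded, the result is only approximate, and the function name is illustrative.

```python
from math import erfc, log, sqrt

Z_975 = 1.959964  # standard normal quantile for a two-sided 95% interval

def p_from_ratio_ci(estimate, lower, upper):
    """Approximate z statistic and two-sided P-value from a ratio and 95% CI."""
    # Standard error of the log ratio, recovered from the interval width
    se = (log(upper) - log(lower)) / (2 * Z_975)
    z = log(estimate) / se
    # Two-sided tail probability of the standard normal distribution
    p = erfc(abs(z) / sqrt(2))
    return z, p

# Histidine in UK Biobank, values as reported in the text
z_his, p_his = p_from_ratio_ci(0.93, 0.87, 0.99)
print(f"z = {z_his:.2f}, approximate P = {p_his:.3f}")
```

For histidine this approximation lands close to the reported P-value of 0.03.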

Analysis by follow-up time

In EPIC, ORs for histidine and glutamine did not appreciably change when cases diagnosed within 2, 5, or 10 years were excluded (OR 0.82 per SD concentration, 95% CI 0.67–1.01 and OR 0.82, 95% CI 0.68–0.99 respectively for the two amino acids, exclusion of the first 10 years of follow-up) (Table 3). Similarly, minor changes in HR were observed for the exclusion of 2 and 5-year periods of follow-up for these amino acids in UK Biobank.

Table 3 Associations between concentrations of serum and plasma amino acids and colorectal cancer risk in the EPIC nested case-control discovery and UK Biobank replication cohorts by follow-up time to diagnosis, where available

Stratified and sensitivity analysis

In EPIC, most available colorectal samples were for cancers of the colon (625/654) and estimates for colon cancer mirrored those of colorectal cancer (Additional file 1: Table S2). In the UK Biobank, where 31.7% of colorectal cases (388/1221) were rectal cancers, HRs for colorectal and colon cancers were also similar. Here, glutamine concentrations were inversely associated with colon cancer (HR 0.92 per SD concentration, 95% CI 0.85–0.99), in line with the estimate for colorectal cancer overall, while no association was observed for rectal cancer (HR 1.02 per SD, 95% CI 0.91–1.13). Heterogeneity between colon and rectal tumor subsites did not reach statistical significance (P = 0.13). As regards histidine, hazard ratios were similar for colon cancer (HR 0.95, 95% CI 0.88–1.02), rectal cancer (HR 0.89, 95% CI 0.80–0.99), and colorectal cancer overall (HR 0.93, 95% CI 0.87–0.99). Inverse associations for amino acids were more pronounced in men than in women (Additional file 1: Table S3), and heterogeneity by sex was observed for histidine in UK Biobank (P-heterogeneity = 0.02). Heterogeneity by sex was not observed for any other amino acid measured in UK Biobank or for any amino acid measured in EPIC.

For the EPIC and UK Biobank participants included in the main study, adjustment for major sources of amino acid intake (red and processed meat, poultry, fish, eggs, and dairy products) did not change associations between circulating amino acids and colorectal cancer risk (Additional file 1: Table S4). Likewise, in sensitivity analyses including non-fasting as well as fasting participants in EPIC, associations did not change appreciably (Additional file 1: Table S5).

Discussion

In this analysis of pre-diagnostic circulating amino acid levels and colorectal cancer risk, histidine was found to be robustly inversely associated and glutamine borderline inversely associated with colorectal cancer risk via a discovery-replication strategy in two large prospective cohorts. In addition, odds ratios and hazard ratios for these amino acids were attenuated minimally by the exclusion of cases diagnosed within 10 years of follow-up. This study provides strong evidence that lower levels of histidine, and possibly glutamine, are associated with subsequent risk of colorectal cancer, even up to 10 years before a colorectal cancer diagnosis.

Circulating levels of several amino acids have previously been found to be inversely associated with colorectal neoplasia, but in studies of cross-sectional design only. Glutamine, for example, was one of several amino acids found to be lower in colorectal cancer patients compared to healthy controls [28], while histidine was lower among stage IV colorectal cancer cases than stage I cases [12] and even inversely correlated with tumor stage [29]. Untargeted metabolomics studies using discovery and validation cohorts demonstrated leucine and the dipeptide glutamine-leucine to be among those metabolites that distinguished cases from controls [14, 30]. Nevertheless, few studies have analyzed pre-diagnostic samples to investigate whether amino acid dysregulation precedes tumorigenesis. Two other prospective case-control studies on colorectal cancer with some amino acid measurements also found no significant associations [19, 31]. Our study is therefore the first to observe inverse associations of amino acid levels with colorectal cancer risk in a prospective setting and in independent studies. Although the above evidence suggests that tumor energy requirements give rise to the depletion of circulating glutamine and histidine, the levels of these amino acids among colorectal cancer cases in our prospective cohorts were associated with colorectal cancer risk at least 10 years prior to a colorectal cancer diagnosis, suggesting that alterations in the metabolism of these compounds may either reflect etiological pathways associated with the development of disease or metabolic changes linked to early events in colorectal tumorigenesis.

The most pronounced finding of the current study was a robust inverse association between circulating histidine and colorectal cancer risk in both EPIC and UK Biobank. Histidine, an essential amino acid derived from the diet, is converted by histidine decarboxylase to the biogenic amine histamine [32], a signaling molecule that mediates an acute inflammatory response by binding to specific receptors [33]. Histidine decarboxylase activity may be upregulated in tumor cells and is thought to accelerate cell proliferation and angiogenesis [34]. For instance, the enzyme was found to be more active in colon cancer cells, particularly metastatic tumor cells, than normal colonic cells [34]. Given that an inverse association between histidine and colorectal cancer risk was apparent as long as 10 years before diagnosis, perturbations to specific etiologic pathways may be hypothesized. Prior evidence suggests that higher histidine concentrations mitigate metabolic dysregulation; for example, dietary supplementation with histidine was found to improve insulin sensitivity, possibly via the suppression of pro-inflammatory cytokine expression, in women with metabolic syndrome [35]. These findings suggested that histidine may even hold potential as a therapeutic agent against metabolic disease. However, levels of histidine have also been found to be positively associated with breast cancer [36]. Therefore, additional laboratory studies are needed to elucidate the potential role of this amino acid in carcinogenesis.

Glutamine was borderline inversely associated with colorectal cancer risk in both cohorts. Glutamine is among the most abundant small-molecule metabolites in circulation and plays a central role in amino acid metabolism. It is used by proliferating cancer cells as an energy source [37] and is likely an important substrate throughout colorectal tumorigenesis [38]. The reasons for lower circulating glutamine in individuals who went on to develop colorectal cancer compared to those who remained cancer-free are uncertain. Firstly, given the slow development of colorectal cancer, lowered glutamine may reflect the undetected presence of polyps or early cancerous lesions in cases at baseline [28]. Secondly, regardless of the presence of such lesions and even controlling for major risk factors, abnormally low glutamine levels may reflect cancer-promoting metabolism. For instance, as well as being directly related to the tumor stage, glutamine levels have been inversely associated with serum C-reactive protein and inflammatory cytokines [13, 39]. Also, lowered glutamine and a lowered glutamine-glutamate ratio were reported to be associated with incident type 2 diabetes [40], an established risk factor for colorectal cancer [41]. Lowered glutamine may represent dysregulation of the glutamine-glutamate axis. Although our study measured glutamate in a subset of EPIC cases and controls only, some evidence for a positive association of glutamate with colorectal cancer risk was observed in the categorical analysis. Glutamine concentrations may influence multiple mechanisms related to cancer development which deserve further investigation in experimental models.

The inverse associations observed between amino acid concentrations and colorectal cancer risk may also reflect cancer-promoting dysbiosis of the gut microbiota. Similar to the protection against colorectal cancer afforded by short-chain fatty acids produced from dietary fiber by microbiota via the mitigation of an inflammatory microenvironment [42], certain components of the gut microbiota may act upon amino acids in the lumen to influence inflammation and tumorigenesis [43, 44]. For example, the production of histidine decarboxylase by gut microbes has been suggested to decrease intestinal inflammation via the binding of histamine to the receptor HR2 in the gut lumen [45]. Specific components of the microbiota are also believed to mediate the relationship between branched-chain amino acids and insulin resistance [46]. It is therefore plausible that the associations of histidine and glutamine with colorectal cancer risk in the current study may reflect variability in gut microbial activity and its interaction with host metabolism. Further mechanistic research is needed to investigate links between histidine and glutamine metabolism, the gut microbiota, and colorectal tumorigenesis.

The main strengths of this study include the prospective design, the use of large-scale cohorts with extensive participant data, and robust amino acid measurements in participants of well-characterized fasting status. We excluded non-fasted participants from the outset in the EPIC discovery cohort to minimize the effects of recent dietary intake upon amino acid levels which could have complicated interpretation of the results. As an additional safeguard against this bias, we performed sensitivity analyses for fasting status in EPIC and for major dietary sources of amino acids in both cohorts, which did not appreciably attenuate risk estimates. This is consistent with a recent study in EPIC that found weak or no correlations between amino acid intake and their blood concentrations [47].

In terms of limitations, only 9 of the 21 amino acids were measured in all EPIC and UK Biobank samples, while only 13 were measured in all EPIC samples, with limited statistical power for the remainder. It is plausible that levels of amino acids other than glutamine and histidine are associated with colorectal cancer and we note that HR or OR point estimates were lower than 1 for most compounds in both cohorts. With greater statistical power, particularly in UK Biobank, other amino acids would likely have been found inversely associated with colorectal cancer risk via the discovery-replication strategy. Also, measurements of amino acids were taken at the study baseline only, and the technical and biological reproducibility of measurement was therefore not accounted for. However, studies calculating intra-class correlation in blood samples suggest that polar metabolites such as amino acids are measured reproducibly, particularly in fasted participants [48]. Statistical power was limited for individual colorectal cancer subsites, particularly rectal cancer. Also, we were not able to consider amino acids in tissue samples, which may better represent the tumor microenvironment and provide deeper insight into the biological implications of our findings.

Conclusions

Circulating histidine levels were robustly inversely associated with colorectal cancer risk in two independent prospective cohorts with similar, albeit slightly weaker, evidence for glutamine. This knowledge should contribute to a better understanding of the underpinnings of colorectal cancer and metabolism and could potentially support new prevention or early detection strategies. Further research using experimental models to assess potential causality of the identified associations is now needed.