The simultaneous measurement of a large variety of small molecules in accessible tissues by metabolomics techniques may facilitate the identification of metabolic fingerprints of different cancers [1]. First prospective metabolomics studies have shown that elevations in circulating branched-chain amino acids 2–5 years prior to diagnosis are related to increased risk of pancreatic adenocarcinoma [2], that significant differences in serum levels of phosphatidylcholines and fatty acids between later prostate cancer cases and controls can be detected [3], and that pre-diagnostic serum concentrations of glycochenodeoxycholate are associated with colon cancer risk [4]. A complex pattern of pre-diagnostic metabolites representing several pathways has been identified in another prospective study on hepatocellular carcinoma [5]. For other cancers, differences in plasma metabolite concentrations between cases and controls have also been demonstrated, but prospective data are missing [1].

Here, we report results of the largest prospective metabolomics study so far on cancers of the breast, colorectum, and prostate. Our goal was the identification of metabolites that may give insight into metabolic alterations preceding the manifestation of cancer. In a case-cohort subset of the prospective population-based EPIC-Heidelberg study, we analyzed pre-diagnostic levels of 120 plasma metabolites by liquid chromatography-tandem mass spectrometry (LC-MS/MS) and flow injection analysis-tandem mass spectrometry (FIA-MS/MS), and evaluated associations between these metabolites and cancer risk over time.


Study population

EPIC-Heidelberg was established as part of the European Prospective Investigation into Cancer and Nutrition (EPIC) between 1994 and 1998 [6]. Overall, 25,540 adults (53.3 % females) aged 35–65 years from the local general population entered the study [7]. The vast majority of participants are white. Detailed information on habitual diet, smoking, alcohol consumption, physical activity, and socio-economic status was obtained by questionnaires and during interviews. In addition, anthropometric measurements were taken by trained personnel. Blood was drawn, processed, and aliquoted into 0.5 ml straws of serum (8 straws), plasma (12 straws), buffy coat (4 straws), and erythrocytes (4 straws), which were stored in liquid nitrogen at −196 °C. Plasma straws used for the present study were retrieved for the first time, i.e. they had not undergone freeze-and-thaw cycles.

Since baseline, participants have been followed up by active and passive procedures [8]. Self-reported and registry-derived incident cases of cancer were validated by study physicians who accessed diagnostic records provided by treating physicians and hospitals. A case-cohort design [9] was chosen for the present metabolomics study and metabolites were measured in pre-diagnostic blood samples of cancer cases and a random subcohort [10]. Individuals in the case-cohort selection were free of diabetes at baseline, as the subcohort was initially drawn for the EPIC-InterAct study on incident diabetes [10]. All incident primary cases of breast (ICD-10 C50), prostate (ICD-10 C61), and colorectal cancer (ICD-10 C18–20) as well as cases secondary to non-melanoma skin cancer that had occurred before 31 December 2006 were selected. After exclusion of participants with missing covariate information (n = 4) and prevalent cancer cases (n = 65), the subcohort comprised 774 individuals. The final numbers of incident cases of breast, prostate, and colorectal cancer were 362, 310, and 163, respectively, after exclusions due to missing covariate information (breast cancer: n = 3; colorectal cancer: n = 2).


The present study was approved by the Ethics Committee of the Medical Faculty of the University of Heidelberg (Heidelberg, Germany). The study participants provided written consent for the use of their blood samples and data.

Laboratory analyses

Metabolites, i.e. 40 acylcarnitines, 21 amino acids, 14 biogenic amines and creatinine, 76 phosphatidylcholines (PCs), 14 lysophosphatidylcholines (lysoPCs), 15 sphingomyelins, and overall hexose were quantified using the MetaDisIDQ™ Kit (Biocrates, Innsbruck, Austria). Sample preparation and measurements were carried out according to the supplier’s protocol and have been described in detail elsewhere [11]. In short, 10 μl of standards, quality controls, and plasma samples, which had been thawed for 30 minutes at 4 °C, were pipetted onto filter paper within each well of the 96-well plate that had been commercially prepared. Nitrogen gas was used to dry the samples followed by application of phenylisothiocyanate (PITC) and extraction of metabolites from the filter paper. The eluate was then dried under flowing nitrogen gas and the wells sealed. Kit plates were shipped to the partner laboratory at the Department of Proteomics, Helmholtz Centre for Environmental Research (Leipzig, Germany), where LC-MS/MS (amino acids and biogenic amines) and FIA-MS/MS (lipids and hexoses) analyses were carried out on each kit. Samples were blinded to the laboratory team. Metabolites were detected using an Agilent 1100 HPLC (Agilent Technologies, Böblingen, Germany) coupled to an API 5000 triple quadrupole mass spectrometer (AB Sciex, Darmstadt, Germany).

The MetaDisIDQ™ software (Biocrates) was used to randomize samples of the study population so that samples of cases and non-cases were stratified within and across batches. Each analytical batch contained three quality control (QC) samples of lyophilized human EDTA-plasma with standardized low, medium (spiked), and high (spiked) concentrations of internal standards to monitor the validity of measurements. Two blinded pooled QC plasma samples were included in duplicate to independently verify the consistency of measurements within and across batches. In addition to a commercially available pooled QC sample (NIST SRM 1950), we used a pooled sample from our own population. One blank without internal standard was included on each batch to calculate background levels and to check the system for contamination. Three additional zero samples with internal standards, but no analytes were integrated to calculate limits of detection.
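The consistency check on QC replicates described above can be illustrated with a coefficient of variation (CV). The following is a minimal sketch only: the function names are hypothetical, and averaging the per-batch CVs is an assumption, since the text does not state how within-batch CVs were aggregated.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV in percent: sample standard deviation divided by the mean."""
    return 100.0 * stdev(values) / mean(values)


def mean_within_batch_cv(qc_by_batch):
    """Average the per-batch CVs of QC replicates.

    qc_by_batch maps a batch label to the QC replicate values of one
    metabolite in that batch. The averaging rule is an assumption.
    """
    return mean(coefficient_of_variation(v) for v in qc_by_batch.values())


if __name__ == "__main__":
    # Made-up replicate values for one metabolite in two batches.
    example = {"batch_1": [9.0, 11.0], "batch_2": [10.0, 10.0]}
    print(round(mean_within_batch_cv(example), 2))
```

Under this sketch, a metabolite would be flagged when the resulting value exceeds the 25 % cut-off applied during data pre-processing.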

Data pre-processing and statistical analyses

Metabolites with more than 25 % missing values (n = 2), within-batch coefficients of variation greater than 25 % (n = 12), or more than 25 % of values below the limit of detection (n = 47) were excluded so that 120 out of 181 metabolites were used for statistical evaluation (Additional file 1). All metabolite values were then log2-transformed and normalized by metabolite-wise batch-standardization [12]. Multivariable outliers were identified by robust principal components analyses and excluded [13]. Spearman’s rho adjusted for age and sex was calculated to detect correlations between metabolites. Associations between metabolite concentrations and epidemiological covariates were assessed by calculating Spearman’s correlations (continuous covariates) and geometric mean values across categories (categorical covariates). Cox proportional hazards regression analyses were carried out to evaluate the relationships between circulating metabolites and cancer risks. Metabolite concentrations were divided into quartiles and linear trends were tested for using log2-transformed metabolite values on the continuous scale. The so-called Prentice method was used to account for the case-cohort design of the study [14]. Age was used as the underlying timescale, with age at recruitment as age at entry and age at the end of follow-up, death, or cancer diagnosis, whichever came first, as age at exit. The proportional hazards assumption was tested based on Schoenfeld residuals [15]. With respect to the vast majority of metabolites, and particularly those which showed significant associations with cancer risk, no strong violations of the proportional hazards assumption were noted. Adjustment factors for multivariable Cox regression models were selected based on literature review. The Bonferroni method was used to adjust for multiple testing, i.e. P values of single tests were multiplied by the total number of tests (n = 120).
All statistical tests were two-sided and P values below 0.05 were considered statistically significant. All analyses were performed using SAS 9.3 (SAS Institute, Cary, NC, USA).
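The pre-processing steps described above (exclusion filters, log2 transformation, batch-wise standardization) can be sketched as follows. This is an illustration under stated assumptions, not the SAS code used in the study: the function names are hypothetical, the fraction below the limit of detection is computed over all samples, and "batch-standardization" is assumed to mean centering and scaling each metabolite within each batch; the method in [12] may differ in detail.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev


def keep_metabolite(values, lod, max_fraction=0.25):
    """Return True if a metabolite passes the missing-value and LOD filters.

    values: concentrations of one metabolite; None marks a missing value.
    lod: limit of detection for this metabolite (hypothetical input).
    A metabolite is dropped only if MORE than max_fraction of values fail.
    """
    n = len(values)
    missing = sum(v is None for v in values)
    below_lod = sum(v is not None and v < lod for v in values)
    return missing / n <= max_fraction and below_lod / n <= max_fraction


def batch_standardize(values, batches):
    """log2-transform values, then z-standardize them within each batch.

    Assumes all values are positive and each batch shows some variation
    (a constant batch would give a zero standard deviation).
    """
    logged = [math.log2(v) for v in values]
    by_batch = defaultdict(list)
    for x, b in zip(logged, batches):
        by_batch[b].append(x)
    stats = {b: (mean(xs), pstdev(xs)) for b, xs in by_batch.items()}
    return [(x - stats[b][0]) / stats[b][1] for x, b in zip(logged, batches)]


if __name__ == "__main__":
    # Made-up concentrations measured in two batches.
    print(batch_standardize([1.0, 2.0, 4.0, 8.0], ["A", "A", "B", "B"]))
```

The exclusion counts reported above are consistent with the final panel size: 181 − (2 + 12 + 47) = 120 metabolites.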


Characteristics of the study population are shown in Table 1. The case-cohort sample comprised a random subcohort (n = 774) and cases of breast (n = 362), prostate (n = 310), and colorectal cancer (n = 163). Median follow-up durations were 8.34 years in the subcohort and 6.36 years (breast cancer), 6.57 years (colorectal cancer), 6.83 years (prostate cancer), and 6.48 years (overall cancer) among incident cases.

Table 1 Characteristics of the study population (EPIC-Heidelberg, case-cohort sample)

Associations between metabolite levels and cancer risk are depicted in Fig. 1. The strongest statistical relationship was found between levels of the lysoPC a C18:0 and breast cancer risk, at a P value of 0.00421 corrected for multiple testing by the Bonferroni method. The hazard ratio (HR) of breast cancer among women in the highest quartile of lysoPC a C18:0 concentrations was 0.29 (95 % CI, 0.18–0.47, Table 2). The second and third strongest associations were between PC ae C38:1 and breast cancer risk (HR = 0.53, 95 % CI, 0.34–0.83, P = 0.08), and PC ae C30:0 and prostate cancer risk (HR = 1.89, 95 % CI, 1.06–3.36, P = 0.23). Considering the consistency of associations between levels of lysoPC a C18:0 and PC ae C30:0 and individual cancer types, visualized in Fig. 1, we decided to expand our analyses to overall cancer risk. LysoPC a C18:0 concentrations were strongly associated with decreased overall cancer risk (HR = 0.37, 95 % CI, 0.27–0.51, P <0.0001, Table 2). Similarly, levels of lysoPC a C18:1 and further lysoPCs, which positively correlated with lysoPC a C18:0 levels (Fig. 2), showed inverse associations (Fig. 1, Table 2). Circulating PC ae C30:0 was directly associated with overall cancer risk (HR = 1.85, 95 % CI, 1.31–2.60, P = 0.0314, Table 2).
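To make the corrected P values above easier to interpret, the Bonferroni adjustment for 120 tests can be written out as a short sketch. The function names are hypothetical. Multiplying a raw P value by the number of tests (capped at 1) is equivalent to comparing the raw P value with 0.05/120, the corrected threshold drawn in Fig. 1.

```python
def bonferroni_adjust(p_raw, n_tests=120):
    """Bonferroni-adjusted P value: raw P times number of tests, capped at 1."""
    return min(1.0, p_raw * n_tests)


def bonferroni_threshold(alpha=0.05, n_tests=120):
    """Raw-P threshold that corresponds to alpha after Bonferroni correction."""
    return alpha / n_tests


if __name__ == "__main__":
    print(bonferroni_threshold())          # 0.05 / 120
    print(bonferroni_adjust(0.00421 / 120))  # back to roughly 0.00421
```

Under this reading, a corrected P value of 0.00421 would correspond to a raw P value of about 3.5 × 10^-5 (0.00421/120), assuming the cap at 1 was not reached.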

Fig. 1

Plasma metabolite concentrations and cancer risk. P values from Cox regression analyses on individual metabolite concentrations on the log-2 scale and cancer risk are represented by the needles. The blue dashed lines depict the significance threshold at an uncorrected P <0.05 and green dashed lines depict the significance threshold after Bonferroni correction (0.05 divided by 120). Unfilled circles indicate inverse associations and filled circles indicate direct associations. Metabolites are grouped by chemical properties: block 1, acylcarnitines; block 2, amino acids; block 3, biogenic amines; block 4, lysophosphatidylcholines (lysoPCs); block 5, diacylphosphatidylcholines; block 6, acyl-alkyl-phosphatidylcholines; block 7, sphingolipids; and block 8, overall hexoses. All multivariable Cox regression models were adjusted for age, smoking (never, former, current), lifetime alcohol intake (g/d), current aspirin use (yes/no), physical activity (Cambridge index), waist circumference (cm), BMI (continuous), height (cm), and education level (primary school, secondary school, university degree). Analyses on breast cancer risk were additionally adjusted for menopausal status, current HRT use (yes/no), current oral contraceptive use (yes/no), and at least one full term pregnancy (yes/no). Analyses on colorectal cancer risk were additionally adjusted for sex, fiber intake (g/d), and processed meat intake (g/d)

Table 2 Hazard ratios (95 % CIs) of cancer across quartiles of lysoPC a C18:0 and PC ae C30:0 concentrations
Fig. 2

Age- and sex-adjusted Spearman’s correlations between levels of different lysoPCs

There was no heterogeneity in the associations between lysoPC a C18:0 or PC ae C30:0 concentrations and cancer risk by lag time (Additional file 2: Table S2). No meaningful associations between metabolite levels and epidemiological covariates such as age, BMI, dietary factors, or smoking status were observed (Tables 3 and 4). Moderate to strong positive correlations were observed between a majority of metabolites within chemical groups (Fig. 2 and Additional file 2: Figure S1).

Table 3 Age- and sex-adjusted partial Spearman’s correlations between levels of lysoPC a C18:0 as well as PC ae C30:0 and covariates
Table 4 Geometric means (95 % CIs) of metabolite levels adjusted for age and sex across strata of covariates


Since the metabolite levels in our study were measured in blood samples taken years before diagnosis, the present associations of lysoPC and PC ae C30:0 concentrations with the risk of cancer may represent general metabolic alterations fostering the development and growth of cancer cells. In contrast to the relationship between PC ae C30:0 and cancer risk, which has not been described in the literature, our findings of associations between lysoPC levels and cancer risk are consistent with results of three case-control studies, in which lower blood concentrations of lysoPCs were found in patients with breast, prostate, and colorectal cancer than in controls [16–18].

While a potential shift of lysoPCs from blood to tumor tissue indicates a higher consumption of lysoPCs by cancer cells, specific signaling properties of lysoPCs in cancer remain to be established [19]. Alternatively, lysoPCs may act as carriers of fatty acids, and extracellular hydrolysis of lysoPC a C18:0 and lysoPC a C18:1, followed by a rapid uptake of the respective fatty acids, i.e. stearic (18:0) and oleic acid (18:1), appears to be a characteristic of solid tumors in mice [20]. This is in line with epidemiologic observations of inverse associations between stearic acid levels and breast [21], prostate [3], and colorectal cancer risk [4]. However, it remains unclear why lysoPC a C16:0, lysoPC a C18:1, and lysoPC a C20:4 concentrations also showed inverse associations with cancer risk in our study, whereas circulating palmitic (16:0), oleic (18:1), and arachidonic acid (20:4), which were not covered by our assay, were not related to cancer risk in previous epidemiological studies.

Possibly, other degradation products of lysoPCs than fatty acids drive tumorigenesis. Extracellular lysoPCs are converted into lysophosphatidic acid (LPA), which induces tumor growth, by autotaxin (ATX), a secreted lysophospholipase D. Overexpression of ATX and LPA receptors has been proposed to be a common feature of several cancers, and both ATX and LPA receptor knockout mice show lower cancer risk [22–24]. Moreover, lysophosphatidylcholine acyltransferase 1 (LPCAT1), which converts lysoPCs into PCs, is overexpressed in several cancers, and increased incorporation of PCs into cell membranes may facilitate proliferation, adhesion, and motility of cancer cells [25–27].

Less is known about the role of PC ae C30:0 in carcinogenesis. Elevated PC ae C30:0 concentrations were detected in plasma of patients with ovarian endometriosis [28]. Even though PC levels were inversely associated with prostate cancer risk in one previous study [3] and have been found to be higher in cancer cells than in non-malignant cells [29–31], a distinct biological function of PC ae C30:0 has not been described and ours was the first prospective study on PC ae C30:0 and cancer to our knowledge.

Levels of lysoPCs and PC ae C30:0 were not related to background factors such as BMI, physical activity, or smoking in our study. While PC ae C30:0 has not been shown to be associated with chronic diseases in previous studies, it may seem noteworthy that lysoPC a C18:2 has been proposed to be a potential pre-diagnostic biomarker of diabetes [32]. At the same time, associations between lysoPC a C18:0 and diabetes risk have not been observed in previous studies [32, 33], and potential mediation of associations between lysoPCs and cancer risk by a pre-diabetic state does therefore not appear to be a valid explanation for the present findings.

A limitation of our study is that only a single blood sample was available; however, reasonable mid-term reliability of metabolite levels over time has been demonstrated in reproducibility studies [34–37]. While we cannot provide data on the stability of the analyzed metabolites in long-term storage at −196 °C, it has been shown that freeze-and-thaw cycles do not substantially affect most metabolites covered by the kit we used [38, 39]. Longer exposure to room temperature, which may indeed lead to an increase of lysoPC levels [38], was avoided in our study, and the plasma samples of case patients and non-cases used for the present metabolomics project have been stored and prepared under exactly the same conditions. Many of the metabolites measured in our project are not part of metabolomics platforms previously used in other studies, which hampers comparisons across studies and underlines the need for standardization [1]. Undoubtedly, replication of the present associations is needed before metabolites such as lysoPC a C18:0 or PC ae C30:0 may eventually be used as cancer biomarkers. Moreover, our findings require further investigation in mechanistic studies, before more definite conclusions on lysoPCs and PC ae C30:0 in tumorigenesis can be drawn.


In summary, we observed consistent associations of lysoPC a C18:0 and PC ae C30:0 concentrations with the risk of three frequent cancer types independent of background factors. Intriguingly, associations in our study did not depend on lag time between blood draw and diagnosis, indicating that low levels of lysoPCs and high levels of PC ae C30:0 are cancer risk factors rather than early markers of disease. The consistency of findings across three cancer types points to global rather than cancer type-specific alterations underlying the observed associations. Further studies are needed to evaluate whether the top hits from our study have potential as biomarkers of cancer risk.