Background

Breast cancer is the most commonly diagnosed cancer and the second leading cause of cancer death among women in the USA [1]. Despite knowledge of several modifiable risk factors for this cancer [2], incidence rates for breast cancer have continued to rise over the past several years [3]. A better understanding of breast cancer etiology and the factors that affect this process could lead to the development of new prevention strategies and the identification of novel therapeutic targets for chemoprevention.

Metabolomics provides a comprehensive assessment of the small molecules in a blood sample that integrates the effects of endogenous metabolism, exogenous exposures, and genetic variation. Recently, this technology has been used in prospective cohort studies to identify metabolites associated with breast cancer risk. To date, ten studies from seven prospective cohorts have applied metabolomics to prediagnostic blood samples from breast cancer cases and controls [4,5,6,7,8,9,10,11,12,13]. These included both targeted metabolomics [4, 9, 12, 13], in which a defined set of metabolites was analyzed, and untargeted metabolomics [5,6,7,8, 10, 11], in which all metabolites that could be measured were analyzed. The studies used either nuclear magnetic resonance (NMR) [7, 11] or mass spectrometry (MS) [4,5,6, 8,9,10, 12, 13] for the metabolite measurements. The number of breast cancer cases in these studies ranged from 100 [13] to 1997 [12], and the criteria used to define statistical significance for associations varied. While all the studies identified at least one metabolite associated with breast cancer, the only metabolites whose associations were directly replicated were the sulfated derivatives of the androgenic steroids, dehydroepiandrosterone (DHEA), and 3β, 17β-androstenediol [5, 6, 10]. This lack of replication could indicate that robust associations for metabolites with breast cancer do not exist, or it could be due to the small size of most of the studies and the limited overlap in the metabolites analyzed in each study [14]. Larger studies using untargeted platforms that maximize the coverage of metabolites are needed to resolve this issue.

In this study, we conducted a large prospective case–cohort analysis among 1695 breast cancer cases and a randomly selected subcohort of 1983 participants drawn from women enrolled in the Cancer Prevention Study-3 (CPS-3). Relative levels of 868 known metabolites were measured using an untargeted, MS-based metabolomics platform to maximize the chance of our findings overlapping with those of other studies and to discover novel metabolites associated with breast cancer risk.

Methods

Study population

The women in this study were from the CPS-3, a prospective study of cancer incidence and mortality among approximately 300,000 adults. CPS-3 participants were cancer-free, between the ages of 30 and 65, and from 35 states, Puerto Rico, and Washington DC at the time of enrollment between 2006 and 2013. Details about enrollment and cohort characteristics are available elsewhere [15]. At enrollment, all participants provided informed consent and a non-fasting blood sample and completed a self-administered questionnaire requesting demographic, lifestyle, and medical information. Blood was collected in an EDTA-containing vacutainer and was processed into plasma, red blood cells, and buffy coat within 24 h of collection. Blood fractions were frozen and stored in a biorepository in liquid nitrogen vapor phase tanks. All aspects of the CPS-3 study are approved by the Emory University Institutional Review Board.

Of the 303,682 participants enrolled in CPS-3, we excluded those who were missing a blood sample (N = 9534), were not female (N = 70,596), had prevalent cancer other than nonmelanoma skin cancer (N = 2248), lived in a state not covered in our cancer registry linkage at the time of this analysis (N = 17,880), were missing birth date (N = 64), or had their enrollment revoked or otherwise compromised (N = 166). From the 205,595 women who remained, 1695 were identified as having been diagnosed with invasive breast cancer between enrollment and December 31, 2015, through linkage to 36 state cancer registries. We also selected a random subcohort of 1983 women from the women eligible for the analysis, of whom 14 developed invasive breast cancer after enrollment. Comparison of the basic characteristics of the subcohort with those of all the women in CPS-3 [15] indicates that it is representative of the women in the entire cohort.

Metabolomic analyses

Metabolomic analyses of plasma samples were done by Metabolon, Inc. (Morrisville, NC) as previously described [16]. Metabolites were identified by comparison of ion features to a library of over 3300 chemical standards. Compounds with the same features for which the exact placement of side groups could not be assigned were given the same chemical name followed by a number in parentheses to distinguish them from one another. Metabolite peaks were quantified using the area under the curve. Metabolite levels below the limit of detection were assigned the minimum observed value for that metabolite. Day-to-day variation was corrected by dividing each metabolite by its median for each run-day. The reliability of the analyses was assessed using replicate quality control samples analyzed with the study samples. For the measured metabolites, the median technical intraclass correlation coefficient (ICC) was 0.79 with an interquartile range of 0.69 to 0.89.
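The run-day correction and the handling of values below the limit of detection can be illustrated with a minimal sketch. It is not the vendor's pipeline: the function name, the use of None for non-detects, and the choice to impute after the run-day scaling are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median


def normalize_metabolite(values, run_days):
    """Scale one metabolite by its run-day median, then impute non-detects.

    values:   peak areas for one metabolite (None = below detection limit)
    run_days: run-day label for each sample, same length as values
    Assumption: imputation is applied after run-day scaling.
    """
    # Collect the detected values observed on each run-day.
    by_day = defaultdict(list)
    for value, day in zip(values, run_days):
        if value is not None:
            by_day[day].append(value)
    day_median = {day: median(vals) for day, vals in by_day.items()}

    # Divide each detected value by the median of its own run-day.
    scaled = [
        None if value is None else value / day_median[day]
        for value, day in zip(values, run_days)
    ]

    # Assign the minimum observed (scaled) value to the non-detects.
    floor = min(s for s in scaled if s is not None)
    return [floor if s is None else s for s in scaled]


# Hypothetical peak areas from two run-days, one sample below detection.
example = normalize_metabolite(
    [10.0, 20.0, 30.0, None, 4.0, 8.0],
    ["a", "a", "a", "a", "b", "b"],
)
```

After scaling, a value of 1.0 corresponds to the median of that run-day, so samples measured on different days become comparable.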

The metabolomic analyses provided data on 1053 named metabolites. Of these, metabolites were excluded if they had an ICC < 0.50 (N = 70), if no results were obtained for them from any of the quality control samples (N = 52), or if they were missing in > 90% of the samples (N = 63). Thus, 868 metabolites were included in the analyses.

Statistical analyses

Metabolite levels were log-transformed and auto-scaled (mean = 0, SD = 1) to approximate a normal distribution and to place all metabolites on the same scale [17]. With the case–cohort study design, multivariable-adjusted relative risks (RR) and 95% confidence intervals (CI) for the association of each metabolite (per one standard deviation increase) with breast cancer were estimated using Prentice-weighted Cox proportional hazards regression models with time-in-study as the time axis. In these models, cases outside the subcohort contributed person-time only on their diagnosis date [18]. The women in the subcohort contributed person-time from the date of blood draw or collection of the baseline questionnaire, whichever came last, to the date of breast cancer diagnosis, date of death, or December 31, 2015, whichever came first. Multivariable models were stratified on single year of age and adjusted for race, education, family history of breast cancer, age at menarche, oral contraceptive use, postmenopausal hormone use, and parity and age at first birth, all modeled as presented in Table 1. BMI was modeled as a continuous variable and, when missing, was imputed as the median of the entire study population. To account for multiple comparisons, a false discovery rate (FDR) < 0.05 was used to define statistical significance [19]. However, metabolites associated with breast cancer at FDR < 0.20 were also included in all analyses and tables to facilitate comparisons with results of previous studies that focused on metabolites in this range [5, 6, 8, 10] and because the expanded group of metabolites may provide more insight into the associations of the various metabolites.
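Two of the steps in this paragraph, auto-scaling and the FDR correction, are simple enough to sketch. The text does not name the FDR procedure, so the Benjamini-Hochberg step-up method is assumed here; the function names, the use of the sample standard deviation, and the example p values are likewise illustrative assumptions.

```python
import math
from statistics import mean, stdev


def log_autoscale(levels):
    """Natural-log transform, then center to mean 0 and scale to SD 1."""
    logs = [math.log(x) for x in levels]
    center, spread = mean(logs), stdev(logs)  # sample SD is an assumption
    return [(value - center) / spread for value in logs]


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values (q values), input order kept."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    qvalues = [0.0] * n
    running_min = 1.0
    # Walk from the largest p value to the smallest so q values stay monotone.
    for rank in range(n, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * n / rank)
        qvalues[index] = running_min
    return qvalues


# Hypothetical p values for four metabolites.
q = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [value < 0.05 for value in q]
z = log_autoscale([1.0, math.e, math.e ** 2])
```

With 868 metabolites tested, a metabolite would be reported at the primary threshold when its q value is below 0.05 and at the relaxed threshold when it is below 0.20.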

Table 1 Selected characteristics of the women in the study

Stratified analyses were run to determine if metabolite associations varied by several parameters. For estrogen receptor (ER) status, independent models were run for ER+ and ER− breast cancer. p values for heterogeneity were calculated using Cochran’s Q test [20], based on a meta-analysis of the results of the two models. For menopausal status and time since blood draw, an interaction term between the metabolite and the stratification variable was included in the model. A p value was calculated using the likelihood ratio test comparing the full model with a reduced model without the interaction term.
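For two strata, the heterogeneity test reduces to a short calculation. The sketch below assumes that only the RR and its 95% CI are available for each stratum, recovers the standard error from the CI width on the log scale, and uses fixed-effect inverse-variance weights; the function name and the example estimates are hypothetical.

```python
import math


def cochran_q_two_groups(rr1, ci1, rr2, ci2):
    """Cochran's Q and its p value (1 degree of freedom) for two estimates."""

    def log_estimate(rr, ci):
        lower, upper = ci
        # Standard error recovered from the 95% CI width on the log scale.
        se = (math.log(upper) - math.log(lower)) / (2 * 1.96)
        return math.log(rr), se

    b1, se1 = log_estimate(rr1, ci1)
    b2, se2 = log_estimate(rr2, ci2)
    w1, w2 = 1 / se1 ** 2, 1 / se2 ** 2  # inverse-variance weights
    pooled = (w1 * b1 + w2 * b2) / (w1 + w2)
    q_stat = w1 * (b1 - pooled) ** 2 + w2 * (b2 - pooled) ** 2
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(q_stat / 2))
    return q_stat, p_value


# Hypothetical stratum-specific estimates (not taken from the study).
q_stat, p_het = cochran_q_two_groups(1.5, (1.2, 1.875), 1.0, (0.8, 1.25))
q_same, p_same = cochran_q_two_groups(1.2, (1.0, 1.44), 1.2, (1.0, 1.44))
```

A small p value indicates that the two stratum-specific estimates differ by more than their sampling error would explain; identical estimates give Q = 0 and p = 1.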

Clustered block analyses, which defined groups of metabolites mutually associated with breast cancer risk at FDR < 0.20 that could be represented by a single lead metabolite, were done as described previously [21]. Briefly, hierarchical heat maps based on Pearson correlation coefficients, shown in Additional file 1: Fig. 1, were used to identify groups of metabolites with correlation coefficients ≥ 0.40. The metabolite most strongly associated with breast cancer in each group was defined as the lead metabolite for the group; whether it could represent the associations of all metabolites in the group was determined by rerunning the analyses controlling for that metabolite. If none of the associations of the other metabolites in the group remained statistically significant (uncorrected p < 0.05) after this adjustment, the group of metabolites was defined as a clustered block. Otherwise, the group of metabolites was split as suggested by the heatmap and the procedure was repeated until no significant associations remained.
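The first step of this procedure, forming candidate groups and choosing a lead metabolite, can be sketched as follows. This is a simplification and not the published method: it replaces inspection of a hierarchical heat map with linking any two metabolites whose Pearson correlation is at least 0.40, and it omits the rerun of the models adjusted for the lead metabolite. All names and data are hypothetical, and each metabolite is assumed to have non-constant levels.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def candidate_blocks(levels, pvalues, threshold=0.40):
    """Group metabolites linked by r >= threshold and pick a lead for each group.

    levels:  dict of metabolite name -> list of sample values
    pvalues: dict of metabolite name -> p value for the breast cancer association
    """
    names = list(levels)
    unvisited = set(names)
    blocks = []
    for start in names:
        if start not in unvisited:
            continue
        unvisited.discard(start)
        group, stack = [start], [start]
        # Depth-first search over the "correlated at or above threshold" links.
        while stack:
            current = stack.pop()
            linked = [
                m for m in names
                if m in unvisited
                and pearson(levels[current], levels[m]) >= threshold
            ]
            for m in linked:
                unvisited.discard(m)
                group.append(m)
                stack.append(m)
        # The lead is the member most strongly associated with the outcome.
        lead = min(group, key=lambda m: pvalues[m])
        blocks.append((lead, sorted(group)))
    return blocks


# Hypothetical metabolite levels and association p values.
blocks = candidate_blocks(
    {"a": [1, 2, 3, 4], "b": [2, 4, 6, 9], "c": [4, 1, 3, 2]},
    {"a": 0.01, "b": 0.001, "c": 0.2},
)
```

In the full procedure, each candidate group would then be tested by refitting the models with the lead metabolite as a covariate, and split further if any member remained associated at uncorrected p < 0.05.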

Fig. 1

Stratified analyses of breast cancer associations. Associations of the metabolites associated with breast cancer at FDR < 0.20 grouped in correlated blocks (A) in pre- and postmenopausal women, and (B) in women with ER+ or ER− breast cancer. Associations marked with † differed significantly (p < 0.05) between the two groups. An * next to a metabolite name indicates a level two (putative annotation) compound identification. Level one (definitive) identification requires comparing two or more properties of the metabolite, such as retention time, m/z, or fragmentation mass spectrum, to those for an authentic chemical standard, whereas level two (putative) identification requires comparison of only one of these properties

Results

The characteristics of the women in the study are given in Table 1. The breast cancer cases were somewhat older than the subcohort, with an average age of 52.1 versus 48.3 years. The cases were slightly heavier, with an average BMI of 28.2 versus 27.7 kg/m², and were more likely to be white or have a family history of breast cancer. The cases were also more likely to be parous, be ever users of postmenopausal hormones, and be less educated than the women in the subcohort.

Of the 868 metabolites in the analyses, 11 were associated with breast cancer with FDR < 0.05. These, along with 50 additional metabolites associated with FDR < 0.20, are listed in Table 2. Ten of the 11 metabolites with FDR < 0.05 were lipids and were inversely associated with breast cancer risk. The other significant metabolite was the xenobiotic 3-methyl catechol sulfate [2], which was associated with an increased risk of breast cancer. The associations for all 868 metabolites included in the analysis are shown in Additional file 1: Table 1.

Table 2 Metabolites associated with breast cancer at FDR < 0.20

As shown in Additional file 1: Table 2, 58 of the 61 metabolites associated with breast cancer at FDR < 0.20 clustered into 10 blocks of mutually associated metabolites. The largest block included 21 phospholipids, lysophospholipids, sphingomyelins, plasmalogens, and amino acids. The other clustered blocks ranged in size from 2 to 12 metabolites with members of each cluster mostly either structurally or functionally similar. Three metabolites were not clustered with any other metabolites.

Adjusting for BMI had no meaningful effect on the point estimates for the associations of the top metabolites with breast cancer (shown in Additional file 1: Table 3), although statistical significance was attenuated.

Table 3 Summary of metabolites previously replicated, newly replicated, or newly associated with breast cancer risk

Results stratified by menopausal or ER status for the 61 metabolites are presented in Additional file 1: Tables S4 and S5, respectively, and grouped into clustered blocks in Fig. 1. The associations were significantly different (p < 0.05) by menopausal status for 8 of the 9 steroids and the lipids octadecadienoate (C18:2-DC) and sphinganine (Fig. 1A). The associations were stronger in postmenopausal women for all the metabolites. The associations of two metabolites, androstenediol (3β, 17β) disulfate [2] and catechol glucuronide, were significantly higher among ER+ than ER− breast cancer cases (Fig. 1B).

To investigate whether the associations of the metabolites with breast cancer varied by the time between blood collection and diagnosis, estimates were calculated for three time strata (complete results are in Additional file 1: Table S6). As shown in Fig. 2, the associations of several metabolites varied by time between blood collection and breast cancer diagnosis. However, the difference was only significant for sphinganine-1-phosphate, for which the association was strongest in cases diagnosed within 1.5 years of blood collection and was attenuated in the later follow-up intervals, and octadecadienoate (C18:2-DC), for which the opposite trend was seen.

Fig. 2

Influence of time from blood draw to diagnosis on breast cancer associations. Association for the metabolites associated with breast cancer at FDR < 0.20 grouped in correlated blocks stratified by time between blood collection and breast cancer diagnosis. Associations marked with † differed significantly (p < 0.05) between the three strata

Finally, the use of exogenous hormones alters the association of some known risk factors with breast cancer [22]. Sensitivity analyses excluding current users of exogenous hormones resulted in only very small changes in the associations between the metabolites and breast cancer (data not shown).

Discussion

This prospective metabolomic analysis is among the largest done to date both in terms of the study population and the number of metabolites queried. Eleven metabolites were associated with breast cancer risk at FDR < 0.05 and an additional 50 metabolites were associated at a relaxed threshold of FDR < 0.20. These results replicated some previous studies and identified some novel associations.

The metabolites associated with breast cancer risk and that either replicate previous results or are novel findings are summarized in Table 3. The previously replicated metabolites which were associated with an increased risk of breast cancer were three androgenic steroids derived from DHEA [6, 10]. Two of these three steroids, androstenediol (3β,17β) disulfate [1] and 16α-hydroxy DHEA 3-sulfate, were associated with an increased risk of breast cancer in CPS-3. Four additional steroids, DHEA-S, androsteroid monosulfate [1], androstenediol (3β,17β) disulfate [2], and androstenediol (3β,17β) monosulfate [1], were also associated with an increased risk of breast cancer in CPS-3. These results, as well as the finding that the associations were only with postmenopausal breast cancer, are consistent with findings from other studies of circulating steroids [23,24,25]. Most studies of steroid metabolites in breast cancer have focused on androgens such as DHEA as the key metabolites influencing estrogen metabolism [26]. However, the correlated group of steroid metabolites we identified included two metabolites of pregnenolone (21-hydroxypregnenolone and pregnenolone sulfate), which is a precursor to the androgenic steroids. This suggests that the alteration in the rate of formation of pregnenolone from cholesterol, which is a highly regulated reaction and the rate-limiting step in steroid hormone biosynthesis [27], may play a role in breast cancer etiology.

One other metabolite that has potentially been replicated by previous studies [9, 10] is the plasmalogen phosphatidylcholine (PC) (O-16:0/18:2). Our findings for this metabolite directly replicate the finding from the CPS-II study [10]. In the European Prospective Investigation into Cancer (EPIC) study [9], which used the targeted Biocrates metabolomics platform, PC (O-16:0/18:2) was not specifically measured. However, all PC plasmalogens with 34 carbons and two double bonds, which include PC (O-16:0/18:2), were associated with breast cancer risk. Overall, the glycerophospholipids and sphingolipids we found to be associated with breast cancer clustered into two correlated blocks and included three lipids [PC (18:0/18:2), lyso-phosphatidylethanolamine (PE) (O-18:0) and lysoPC (18:2)] that replicated findings from previous studies [9, 10] for the first time. Why elevated levels of the lipids would be associated with reduced breast cancer risk is not clear. However, they are all common components of cellular membranes, and their altered levels could reflect the perturbation of pathways for membrane synthesis.

We found that glutamine was associated with a reduced risk of breast cancer, but previous studies have found conflicting results. Glutamine was associated with increased risk in the Supplémentation en Vitamines et Minéraux Antioxydants (SU.VI.MAX) cohort [8] where it was reported as glutamine/isoglutamine, and in the Etude Epidémiologique auprès de femmes de la MGEN (Mutuelle Générale de l’Education Nationale) (E3N) cohort [11], where the association was limited to premenopausal women. Glutamine was associated with a reduced risk of breast cancer in studies with both pre- and postmenopausal women in EPIC [9] and our study. Additional studies are needed to confirm the association of glutamine with breast cancer risk. However, the finding of an inverse association for asparagine, which is synthesized from glutamine, here and in the EPIC study [9] supports an inverse association for glutamine as higher levels of one of these amino acids should result in higher levels of the other. Neither of the studies that found a direct association for glutamine included asparagine among the metabolites analyzed.

We found associations between breast cancer risk and several metabolites that had not been included in previous studies. These metabolites are listed as novel associations in Table 3. Two metabolites, both dicarboxylic fatty acids (octadecadienoate and 2-hydroxysebacate), were associated with decreased risk, while the other three were associated with increased risk of breast cancer. One of these three, syringol sulfate, is a metabolite of syringol, which is a biomarker of smoked meat consumption [28]. A recent meta-analysis found that higher consumption of either red or processed meat was associated with a greater risk of breast cancer [29] but did not study smoked meat consumption specifically. Our findings for syringol sulfate argue that this issue should be investigated further.

The other two novel associations we observed were for the xenobiotics catechol glucuronide and 3-hydroxypyridine glucuronide, which were highly correlated (r = 0.76) and are metabolites of catechol and pyridine, respectively. While both compounds occur naturally at low levels, they are produced synthetically in large amounts. About half of the catechol and pyridine produced is used to make pesticides, while smaller amounts are used for pharmaceuticals and flavoring agents [30, 31]. Pyridine is also used in organic chemistry and in dyes [31], and both compounds have been found in cigarette smoke. The International Agency for Research on Cancer (IARC) evaluated the carcinogenicity of catechol in 1999 [32] and of pyridine in 2019 [31], using primarily animal data, and classified both as 2B, possibly carcinogenic to humans. Our findings suggest that further investigation into the carcinogenicity of these compounds is warranted.

In addition to several steroid metabolites, the associations of two additional metabolites [sphinganine and octadecadienoate (C18:2-DC)*] differed significantly (p < 0.05) in pre- and postmenopausal women. Two metabolites (androstenediol (3β,17β) disulfate [2] and catechol glucuronide) differed significantly between women with ER+ and ER− breast cancer. It is unclear why these associations differ by menopausal or ER status. These findings may be due to chance and require replication in future analyses.

A significant portion of the cases in this study were diagnosed with breast cancer within a few years after the blood collection, while others were diagnosed later in follow-up, allowing us to explore whether associations varied by time between blood collection and diagnosis. Only two metabolites had associations that varied significantly (p < 0.05) by time between blood draw and diagnosis, thus limiting any conclusions as to whether any of the metabolite levels might be affected by reverse causation.

Although all the risk estimates remained similar, adjustment for BMI attenuated the statistical significance of the associations of all the metabolites with breast cancer. This could indicate that BMI is a mediator of the associations. If so, then adjustment for BMI may be inappropriate. This possibility should be investigated further in future analyses.

A strength of this study is the large study population and the large number of identified metabolites measured. These factors likely contributed to our finding of 11 metabolites associated with breast cancer risk at FDR < 0.05, which is more than in previous studies, which identified one or two metabolites at most at this significance level [9, 10]. Limitations of our study include the fact that the results were based on a single blood sample for each study participant. However, evidence suggests that levels of most circulating metabolites are relatively stable for up to 2 years [33, 34], suggesting that a single sample may be sufficient. Other limitations include smaller numbers in the subgroups used in the stratified analyses. Finally, although Black and Hispanic women were included in our study, there were too few to determine if associations differed by race and/or ethnicity. Thus, our findings may not be generalizable to all groups.

Conclusions

This metabolomic study of breast cancer further replicated positive associations for several steroid metabolites that had been previously replicated and provided new replications for inverse associations for some lipids and amino acids. We also found novel associations for some metabolites, which suggest new avenues for investigation into potentially modifiable risk factors for breast cancer. The associations of metabolites of syringol, catechol, and pyridine with increased breast cancer risk suggest that future etiologic research should focus on smoked meat consumption and exposure to some chemicals found in our environment. Finally, the growing evidence that large sample sizes are required to identify robust associations indicates that additional studies and pooled analyses of existing results are needed.