Introduction

Diets have shifted from fresh, unprocessed, and minimally processed foods toward greater consumption of ultra-processed foods. These now contribute roughly 25 to 60% of the total daily energy intake of individuals across countries (Adams & White, 2015; Juul et al., 2022; Latasa et al., 2018; Levy et al., 2022; Madruga et al., 2022; Marrón-Ponce et al., 2019; Moubarac et al., 2014a; Wang et al., 2021). In the past decades, several food processing frameworks have been developed, such as the NOVA classification and those of Food Standards Australia New Zealand, the International Food Information Council, and the International Food Policy Research Institute (Bleiweiss-Sande et al., 2019; Crino et al., 2017; Moubarac et al., 2014b). Although most of these similarly classify basic foods as processed or unprocessed, the NOVA system is the most widely applied in scientific studies and may be more useful for monitoring changes in the food supply and evaluating associations with health outcomes (Crino et al., 2017; Monteiro et al., 2019). The NOVA classification categorizes foods into four groups based on the extent and purpose of the processing they undergo: unprocessed/minimally processed foods (NOVA 1), processed culinary ingredients (NOVA 2), processed foods (NOVA 3) and ultra-processed foods (NOVA 4) (Monteiro et al., 2019). Foods classified as NOVA 4, which undergo multiple physical, biological, and/or chemical processes, have been suggested to have detrimental health effects due to their poorer nutritional quality on average (e.g. they are often energy dense and/or rich in saturated and trans-fatty acids) or the presence of a wide range of additives and contaminants formed during processing (Lane et al., 2021; Pagliai et al., 2021; Srour et al., 2022). Epidemiological studies investigating the association between the consumption of foods classified by the NOVA scale and the risk of breast cancer are sparse.
Breast cancer risk was inversely associated with NOVA 1 (Fiolet et al., 2018; Jacobs et al., 2022), positively associated with NOVA 3 (Kliemann et al., 2023), and either not associated (Jacobs et al., 2022; Romaguera et al., 2021) or positively associated (Chang et al., 2023; Fiolet et al., 2018; Romieu et al., 2022) with NOVA 4. A recent analysis in the European Prospective Investigation into Cancer and Nutrition (EPIC), performed by our group, suggested that higher consumption of NOVA 1 was associated with lower breast cancer risk while higher consumption of NOVA 3 was associated with higher breast cancer risk (Kliemann et al., 2023). In that study, we did not stratify our analyses by breast cancer subtype. Because the etiologies of breast cancer subtypes differ, such analyses could shed light on the potential mechanisms underlying the associations observed between different degrees of food processing and breast cancer. Furthermore, few previous studies were able to stratify their analyses by breast cancer subtype, alcohol intake, or body mass index (BMI). Therefore, the aim of this study was to investigate the associations between diet according to the degree of food processing and breast cancer risk, overall and by breast cancer subtype, menopausal status, alcohol intake and BMI, within the EPIC cohort.

Methods

EPIC Cohort

Between 1992 and 2000, a total of 521,323 EPIC participants were recruited from 23 centers across 10 European countries (Riboli et al., 2002). At recruitment, socio-demographic, dietary, lifestyle, anthropometric and medical data were collected for all participants by administration of validated country-specific questionnaires. Ethical approval for the study was obtained from the relevant ethical review boards of the participating centers of EPIC as well as from the ethics committee of the International Agency for Research on Cancer (IARC).

Study population and follow-up

The selection of the study population is shown in Supplementary Figure S1. We used data from all participating countries apart from Greece due to a lack of data access. Participants were further excluded if they (i) had any cancer diagnosis before recruitment, (ii) had no follow-up, (iii) had no lifestyle or dietary information, (iv) had an energy intake-to-requirement ratio within the extreme ranking (top and bottom 1%, which are implausible dietary exposure values), or (v) were men. Women were followed from study inclusion until the date of their latest known contact, cancer diagnosis, death, emigration, or the end of the follow-up period (between 2008 and 2014 depending on the center), whichever occurred first. The analytical sample included 318,686 women who were free of cancer at recruitment.

Identification of incident breast cancer cases

In Italy, Spain, the United Kingdom, the Netherlands, Denmark, Norway, and Sweden, population-based cancer registries were used to identify breast cancer cases. In France and Germany, a combination of methods was used, including health insurance records, contacts with cancer and pathology registries, and active follow-up of participants and their next of kin. Almost all centers (except Malmö, Granada, and Murcia) had information on tumor characteristics, including invasiveness status (in situ/invasive/unknown), estrogen receptor (ER) status (ER-positive/ER-negative/unknown), progesterone receptor (PR) status (PR-positive/PR-negative/unknown), and human epidermal growth factor receptor 2 (HER2) status (HER2-positive/HER2-negative/unknown). The diagnosis of breast cancer cases was based on the second or third edition (depending on the year of diagnosis) of the International Classification of Diseases for Oncology (ICD-O-2 or ICD-O-3) (International Statistical Classification of Diseases and Related Health Problems 10th Revision, n.d.). In the present work, the first breast cancer diagnosis was considered the primary incident breast tumor. Vital status was collected from regional or national mortality registries.

Dietary data and NOVA classification

Country-specific dietary questionnaires were used in EPIC and validated at the center level (Huybrechts et al., 2022). Semi-quantitative food frequency questionnaires, extensive quantitative dietary questionnaires, and combined methods (i.e. a 7-day record on hot meals was combined with quantitative food-frequency questionnaires in Malmö, Sweden) were used to collect dietary data at baseline. These were center specific to account for local dietary habits and were either self-administered or administered in person by trained interviewers. These data were then harmonized to obtain a standardized food list with comparable detail across countries. The dietary questionnaires and their mode of administration were described in detail in previous publications (Huybrechts et al., 2022; Riboli et al., 2002). Then, more than 11,000 foods/ingredients/beverages were categorized into one of the four NOVA groups based on their degree of food processing. The different NOVA groups are defined in the Additional File (Text S1). The classification of EPIC foods into NOVA groups has been described in depth elsewhere (Huybrechts et al., 2022). To account for potential changes in industrialization over time, lower, middle, and upper bound scenarios were created. The “middle bound” scenario, deemed most likely in the past 25 years, was used for the primary analysis. In the “lower bound” scenario, foods with potential for less processing were assigned to a lower processed NOVA group, while in the “upper bound” scenario, foods with potential for more processing were assigned to a higher processed NOVA group.

For each study participant, we calculated the dietary intake from each NOVA group as expressed by (1) the total absolute intake in grams/day (g/d) and (2) the total absolute intake in kcal/day (kcal/d). We also calculated the relative contribution of each NOVA food group to the total daily dietary intake in grams (%g/d) and kcal (%kcal/d). The g/d unit was considered the primary exposure because it better captures industrial foods with zero calorie content (e.g., artificially sweetened drinks) and food processing factors (e.g., neoformed contaminants or food additives).
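The relative contribution described above is each group's share of the summed daily intake. The following is a minimal sketch of that calculation, not the study's actual code; the function name `nova_contributions` and the intake values are hypothetical and purely illustrative.

```python
def nova_contributions(intake_per_day):
    """Return each NOVA group's share (%) of the total daily intake.

    `intake_per_day` maps a NOVA group label to an absolute intake
    (g/d or kcal/d); the same function serves for %g/d and %kcal/d.
    """
    total = sum(intake_per_day.values())
    if total <= 0:
        raise ValueError("total intake must be positive")
    return {group: 100.0 * amount / total
            for group, amount in intake_per_day.items()}


# Hypothetical participant (illustrative values, not EPIC data), in g/d.
example_intake = {"NOVA 1": 2220.0, "NOVA 2": 30.0,
                  "NOVA 3": 330.0, "NOVA 4": 420.0}
example_shares = nova_contributions(example_intake)
```

Because the shares sum to 100%, a higher relative intake of one group necessarily implies a lower relative intake of the others, which matters when interpreting the %g/d and %kcal/d analyses.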

Covariates at recruitment

Information on lifestyle, reproductive/hormonal factors, and medical history was gathered using baseline questionnaires. All EPIC centers collected information on educational level, age at menarche, age at first full-term pregnancy and parity, breastfeeding, and use of oral contraceptives and menopausal hormone therapy (MHT). Menopausal status was determined by combining different baseline information. Women who reported having menstrual cycles, had at least nine menstrual periods over the previous 12 months, or were younger than 42 years were considered premenopausal. Women who reported fewer than four menses in the past year, had a bilateral ovariectomy, or were older than 55 years were considered postmenopausal. Otherwise, women were considered perimenopausal.
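The menopausal status rules above can be written as a small decision function. This is a sketch under stated assumptions: the text does not say which rule takes precedence when a woman meets criteria from both groups, so the order below (premenopausal criteria checked first) is an assumption, and the function and parameter names are hypothetical.

```python
def menopausal_status(age, periods_last_12_months=None,
                      has_menstrual_cycles=False,
                      bilateral_ovariectomy=False):
    """Classify baseline menopausal status from the criteria in the text.

    Assumption: premenopausal criteria are evaluated before
    postmenopausal ones; the source does not state the precedence.
    `periods_last_12_months` is None when not reported.
    """
    periods = periods_last_12_months
    if (has_menstrual_cycles
            or (periods is not None and periods >= 9)
            or age < 42):
        return "premenopausal"
    if ((periods is not None and periods < 4)
            or bilateral_ovariectomy
            or age > 55):
        return "postmenopausal"
    return "perimenopausal"
```

For example, under these assumptions a 50-year-old woman reporting six periods in the past year meets neither set of criteria and is classified as perimenopausal.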

Body weight and height were either measured by a health care professional or self-reported in each center. Weight and height were used to calculate BMI, defined as weight in kilograms divided by height in meters squared (kg/m²). Physical activity levels were estimated using a questionnaire focused on past-year physical activity in occupational, leisure, and household domains. The Cambridge physical activity index was then created by combining occupational physical activity with time spent in physical exercise (such as cycling, swimming, and jogging) (Wareham et al., 2003). Alcohol intake in grams per day was based on the number of standard glasses of wine, beer, cider, sweet liquor, distilled spirits, or fortified wines consumed daily or weekly during the 12 months before recruitment. The Mediterranean diet score was calculated using a methodology previously described (Couto et al., 2011).

Statistical analysis

In the main analyses, the middle-bound scenario for the NOVA classification and the absolute g/d of the four NOVA food groups were used.

Baseline characteristics were examined by quartiles of each NOVA food group intake. Multivariable Cox proportional hazards regression models were performed to estimate hazard ratios (HRs) and their corresponding 95% confidence intervals (CIs) for the associations between the intake of each NOVA food group [1 standard deviation (SD) increment] and breast cancer incidence, overall and by breast cancer subtypes. Age served as the primary time scale. All four NOVA groups were simultaneously included in the Cox model.
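To illustrate how a hazard ratio per 1 SD increment relates to a model coefficient, the sketch below rescales a Cox log-hazard coefficient estimated per unit of intake (e.g. per g/d) by the exposure's standard deviation. It is not the study's code (the analyses were run in SAS); the function name and the numeric inputs are hypothetical, and a normal-approximation 95% CI is assumed.

```python
import math


def hr_per_sd(beta_per_unit, se_per_unit, sd, z=1.96):
    """Convert a per-unit Cox coefficient into an HR per 1 SD with a 95% CI.

    beta_per_unit: log-hazard coefficient per unit of exposure
    se_per_unit:   its standard error
    sd:            standard deviation of the exposure in the same unit
    """
    log_hr = beta_per_unit * sd          # coefficient for a 1 SD increase
    half_width = z * se_per_unit * sd    # CI half-width on the log scale
    return (math.exp(log_hr),
            math.exp(log_hr - half_width),
            math.exp(log_hr + half_width))


# Hypothetical inputs for illustration only (not estimates from this study).
hr, lower, upper = hr_per_sd(beta_per_unit=0.0002, se_per_unit=0.00004, sd=250.0)
```

Equivalently, the exposure can be divided by its SD before model fitting, in which case the exponentiated coefficient is already the HR per 1 SD.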

All models were stratified by age at recruitment in 1-year categories and study center and adjusted for potential confounding factors including educational level (none, primary school, technical/professional school, higher education), physical activity (inactive, moderately inactive, moderately active, active), height in cm (continuous), age at menarche in years (≤ 13, > 13), oral contraceptive use (never, ever, unknown), pregnancies (nulliparous, 1 or 2 children, >3 children), age at first full-term pregnancy (continuous), breastfeeding (never, ever, unknown), menopausal status (pre, peri, post-menopause), and menopausal hormone therapy (MHT) use (never, ever, unknown). We investigated whether adding different dietary-related factors/components (BMI, total energy intake, total fat, sodium intake, carbohydrate intake, Mediterranean diet, and alcohol intake) to the model changed the HR associated with NOVA groups. Only alcohol consumption modified the HRs and was therefore included in an additional model (Supplementary Table S1). Oral contraceptive use, MHT use, and breastfeeding each had >5% missing values, which were accommodated by using a “missing” category in the models. All other covariates had <5% missing values, which were replaced with the mode for categorical variables or the median for continuous variables, as observed among the participants with complete data.
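The handling of missing covariate values described above can be sketched as follows. This is an illustrative simplification, not the study's code: the fill value is computed from the observed values of each variable separately rather than from participants with complete data on all covariates, and the function name, strategy labels, and example values are hypothetical.

```python
from statistics import median, mode


def fill_missing(values, strategy):
    """Replace None entries in a covariate according to `strategy`.

    "category": assign an explicit "unknown" level (used for >5% missing)
    "mode":     most frequent observed value (categorical, <5% missing)
    "median":   median of observed values (continuous, <5% missing)
    """
    observed = [v for v in values if v is not None]
    if strategy == "category":
        fill = "unknown"
    elif strategy == "mode":
        fill = mode(observed)
    elif strategy == "median":
        fill = median(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]


# Hypothetical covariate columns for illustration only.
contraceptive_use = fill_missing(["ever", None, "never"], "category")
activity = fill_missing(["active", None, "inactive", "active"], "mode")
height_cm = fill_missing([160.0, None, 170.0, 165.0], "median")
```

Keeping an explicit "unknown" level retains participants in the model without assuming a value for them, whereas mode or median replacement is a pragmatic choice when few values are missing.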

Heterogeneity according to invasiveness status or hormone receptor status was evaluated with competing risk analyses. In these analyses, cases with missing information on the studied subtype were excluded from the corresponding analysis and those who developed the competing breast cancer subtypes were censored at the time of occurrence (Lunn & McNeil, 1995). Heterogeneity was calculated as the deviations of logistic beta-coefficients observed in each subgroup relative to the overall beta-coefficient.

As subgroup and sensitivity analyses, we repeated the analyses (1) by using lower and upper bound scenarios for the NOVA classification, (2) by using consumption of NOVA groups measured as %g/d, kcal/d and %kcal/d instead of g/d, and (3) by removing alcoholic drinks (present in NOVA groups 3 and 4). When we analyzed the intake of NOVA food groups measured as the proportion of overall daily food intake (%g/d) or daily kcal intake (%kcal/d), Cox regression analyses were performed separately for each NOVA group. Finally, we explored whether associations between NOVA groups and breast cancer risk varied by alcohol intake, BMI categories, menopausal status and country. Effect modification was evaluated by using likelihood ratio tests to compare models with and without cross-product interaction terms. Statistical analyses were conducted using SAS software (version 9.4, Copyright © 2017, SAS Institute Inc.).
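A likelihood ratio test for effect modification compares the maximized log-likelihoods of the models with and without the cross-product terms. The sketch below shows the arithmetic only and is not the study's SAS code; the log-likelihood values are hypothetical, and for Cox models they would be partial log-likelihoods. The chi-square tail probability is computed with the standard library so the example has no external dependencies.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Closed forms for df = 1 and df = 2, then the recurrence
    # Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1), with a = df / 2.
    if df % 2 == 1:
        q, a = math.erfc(math.sqrt(y)), 0.5
    else:
        q, a = math.exp(-y), 1.0
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def likelihood_ratio_test(loglik_without, loglik_with, df):
    """Return (statistic, p-value); df = number of interaction terms added."""
    statistic = 2.0 * (loglik_with - loglik_without)
    return statistic, chi2_sf(statistic, df)


# Hypothetical log-likelihoods for illustration only.
lr_stat, lr_p = likelihood_ratio_test(-1000.0, -998.5, df=1)
```

A large p-value indicates that adding the interaction terms does not materially improve model fit, i.e. no evidence that the association differs across strata.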

Results

During a median follow-up time of 14.9 years (13.5-16.4), 14,933 breast cancer cases were diagnosed (1,603 in situ, 13,320 invasive, and 10 of unknown invasiveness status) among the 318,686 participants. Among the invasive breast cancer cases, 9,525 had information on ER status (7,789 ER-positive and 1,736 ER-negative), 7,994 on PR status (5,268 PR-positive and 2,726 PR-negative), and 4,577 on HER2 status (901 HER2-positive and 3,676 HER2-negative). There were 573 ER+PR±HER2+, 3,023 ER+PR±HER2-, 264 ER-PR-HER2-, and 419 ER-PR-HER2+ breast cancer cases.

Using the middle-bound scenario expressed as g/d, consumption of food classified as NOVA 1 contributed 74% of the total diet (Table 1). The main foods contributing to this group were coffee/tea (31%), water (19%), fruits (12%) and milk/plain yogurt (12%) (Table 2). The contribution of NOVA 2 to the total diet was 1%, with plant oils being the highest contributor to the group (37%), followed by table sugar (29%) and animal fats (28%). The contribution of processed foods (NOVA 3) to the total diet was 11%, with an important contribution of beer/wine (35%) and processed bread (26%). Overall, ultra-processed foods (NOVA 4) contributed 13% to the total diet, with dairy desserts and drinks the top contributors to the group (14%), followed by soft drinks (13%), ultra-processed breads (12%) and sweetened beverages (11%). The relative intake of food classified as NOVA 1 was highest in France (80%) and Denmark (79%). The relative intake of food classified as NOVA 2 was highest in Italy (3%) and Spain (2%). NOVA 3 foods were most highly consumed in Italy (23%), Spain (14%) and Germany (14%), while NOVA 4 foods were most highly consumed in Norway (23%) and the United Kingdom (19%).

Table 1 NOVA group intake and relative and absolute contributions to total diet overall and by country.
Table 2 Absolute intakes and relative contributions of food to total diet and to each NOVA group

The main baseline characteristics of participants by quartiles of intake of NOVA 1, NOVA 2, NOVA 3, and NOVA 4 are presented in supplementary Tables S2, S3, S4 and S5, respectively.

Table 3 shows the associations between the intake of each NOVA group (in g/d, middle-bound scenario) and breast cancer, overall and by breast cancer subtypes. Overall, intakes of NOVA 1 [HR per 1 SD = 0.99 (95% CI 0.97–1.01)], NOVA 2 [HR per 1 SD = 1.01 (0.98–1.03)], and NOVA 4 [HR per 1 SD = 1.01 (0.99–1.03)] were not associated with breast cancer risk. However, intake of processed foods (NOVA 3) was associated with a higher risk of breast cancer [HR per 1 SD = 1.05 (1.03–1.07)]. Estimates did not differ by invasiveness or hormone receptor status (Table 3, P for homogeneity ≥ 0.11). When the model was further adjusted for alcohol intake (Table 4), the positive association between NOVA 3 and breast cancer risk was attenuated and no longer statistically significant [HR per 1 SD = 1.01 (0.98–1.03)]. Furthermore, when alcoholic drinks were excluded from NOVA 3, the association with breast cancer risk was also null [HR per 1 SD = 0.99 (0.97–1.01), Table 5]. Associations were similar when models were stratified by alcohol intake, BMI at recruitment (P for interaction ≥ 0.17, Table 6) or menopausal status (Table S6).

Table 3 Associations between NOVA groups (in g/d) and breast cancer risk, overall and by breast cancer subtypes
Table 4 Associations between NOVA groups (in g/d) and breast cancer risk with further adjustment for alcohol consumption
Table 5 Associations between NOVA 3 intake after excluding alcoholic drinks (in g/d) and breast cancer risk
Table 6 Associations between NOVA intake (in g/d) and breast cancer risk, stratified by alcohol intake and BMI at recruitment

In secondary analyses using %g/d, kcal/d or %kcal/d as the exposure, the results were consistent with those obtained in the main analyses (Supplementary Table S7). However, when using %g/d as the exposure variable, a higher intake of NOVA 1 was associated with a slightly lower risk of breast cancer [HR per 1 SD = 0.96 (0.94–0.98)]. Nevertheless, this association was no longer statistically significant when the model was further adjusted for alcohol intake [HR per 1 SD = 0.98 (0.94–1.00), Supplementary Table S7]. In addition, the results were similar when we used lower and upper bound scenarios (data not shown). Finally, no heterogeneity was observed by country (P for homogeneity ≥ 0.09, Supplementary Table S8).

Discussion

In this large-scale prospective analysis, we found a positive association between the consumption of processed foods and breast cancer, which was likely driven by alcohol – an already established risk factor for breast cancer. The association between the degree of food processing and breast cancer risk did not differ by breast cancer subtype, menopausal status, alcohol intake or BMI.

In this study, no associations were found between the consumption of food included in the NOVA 1 group and breast cancer risk, overall or by breast cancer subtypes, when the absolute values of intake were evaluated (g/d or kcal/d). Although a slight inverse association was observed when the %g/d values were used, this association disappeared when models were further adjusted for alcohol intake. Furthermore, because we only observed an inverse association when the variable was expressed as %g/d, these results might arise because individuals who consumed more food from NOVA group 1 also consumed less food from NOVA group 3. Only one cohort study (Fiolet et al., 2018) and one case-control study (Jacobs et al., 2022) investigated associations between NOVA 1 and breast cancer risk and reported an inverse association (using %g/d and %kcal/d, respectively). Although NOVA 1 foods have low energy density and are rich in phytochemicals (carotenoids, flavonoids, dietary fiber), vitamins and minerals, known to be anticarcinogenic (Bakker et al., 2016), our study does not support the hypothesis of a lower breast cancer risk with higher intake of NOVA 1 foods. In addition, we found no evidence of an association between NOVA 2 and breast cancer risk, as also reported in a South African case-control study (Jacobs et al., 2022).

We observed a positive association between NOVA 3 intake and breast cancer risk. To our knowledge, apart from our previous analysis in EPIC (Kliemann et al., 2023), no previous population study has reported a positive association between NOVA 3 and breast cancer risk (Fiolet et al., 2018; Jacobs et al., 2022). Interestingly, in our study population, the positive association disappeared when models were adjusted for alcohol consumption or when alcoholic drinks were excluded from the NOVA groups. Indeed, alcohol is an established risk factor for breast cancer and, as such, could drive the positive association between processed foods and breast cancer risk. Of note, in this study we observed that, on average, beer and wine made up 35% of NOVA 3 intake in g/d.

Furthermore, we found no evidence of an association between the consumption of NOVA 4 and breast cancer risk. Other studies have also reported no association between NOVA 4 intake and breast cancer risk (Jacobs et al., 2022; Romaguera et al., 2021). However, our results differ from those from the NutriNet-Santé French cohort and two case-control studies, which reported a positive association between the consumption of NOVA 4 and breast cancer risk (Fiolet et al., 2018; Queiroz et al., 2018; Romieu et al., 2022). It has been suggested that NOVA 4 foods may increase breast cancer risk through several factors such as their high energy density due to added sugars and fats, the presence of a variety of additives, preservatives and processing contaminants (e.g. acrylamide, trans-fatty acids, endocrine disrupters, etc.) or a lack of fiber, proteins, and other components that are associated with fullness and satisfaction, leading individuals to eat more in an attempt to feel satiated (Friedman, 2015; Luiten et al., 2016; Moubarac et al., 2013; Pouzou et al., 2018). In addition, we might have expected to observe a positive association between NOVA 4 and breast cancer risk due to alcoholic distilled drinks; however, in this study population alcoholic distilled drinks made up only 2.2% of NOVA 4 intake in g/d. The lack of association between NOVA 4 and breast cancer risk in the current study might be explained by the fact that the consumption of NOVA 4 in EPIC was quite low, as it was based on dietary intakes at recruitment (during the 1990s); since then, the consumption of NOVA 4 has replaced the consumption of other NOVA groups (e.g. recipes that were made at home in the 1990s may currently be industrially processed), which may bias the associations with breast cancer risk.
Indeed, NOVA 4 has recently been suggested to represent up to 60% of total daily energy intake in some European countries such as the UK (Rauber et al., 2018), while in the current EPIC study population, NOVA 4 contributed 31% of total energy intake (data not shown). However, when we classified food products based on the modern/current food environment (upper bound scenario), the association between NOVA 4 and breast cancer risk was still null.

Finally, in our study, the associations between the degree of food processing and breast cancer risk did not differ by invasiveness and hormone receptor status. Our results are consistent with a previous study that investigated the association between NOVA 4 and breast cancer risk by molecular status, suggesting no differences in risk estimates [ER+ or PR+: OR per 10% increase = 1.04 (0.96–1.13); HER2+: OR per 10% increase = 0.96 (0.84–1.10); ER-PR-HER2-: OR per 10% increase = 0.93 (0.75–1.15)] (Romaguera et al., 2021). However, another study reported a positive association between NOVA 4 and ER+ breast cancer [OR T3 vs T1 = 2.44 (1.01–5.90)] and no clear association with ER- breast cancer [OR T3 vs T1 = 1.87 (0.43–8.13)] (Romieu et al., 2022). No other studies reported results for other NOVA groups by invasiveness and molecular status of breast cancer, which makes any comparison with our results challenging.

Our study has several strengths including the prospective design, the multicenter aspect, the long-term follow-up, and the availability of a comprehensive assessment of participant characteristics, as well as the large number of incident breast cancer cases. We had self-reported data on lifestyle, reproductive, and medical factors and were therefore able to consider a wide range of potential confounders. We also stratified the analyses according to menopausal status, alcohol intake, obesity, and breast cancer subtype. However, the statistical power for some of these stratified analyses was rather low (e.g. for some of the breast cancer subtypes), so these results should be interpreted with caution. The major limitation of the study is that dietary data were only collected at baseline (in the 1990s) while the food environment changed in the intervening years, exposing the EPIC participants to potentially different degrees of food processing over the course of their follow-up. However, three different scenarios were created to account for the possibility that the food environment changed over time compared to the baseline. The lower and upper bound scenarios were used in sensitivity analyses to explore the potential impact of further industrialization of food products and of shifts in consumer habits toward convenience foods over time, and results were virtually unchanged. The fact that dietary data were collected only once may cause random measurement error and may fail to reflect long-term habits; any such bias would likely lead to an underestimation of true associations (i.e. regression dilution bias) (Clarke et al., 1999; Hutcheon et al., 2010). However, it is noteworthy that recent analyses comparing dietary follow-up data in some of the EPIC countries demonstrate only minor changes in dietary intakes among the EPIC participants, potentially due to the relatively older age of the participants included in the cohort (unpublished data).
Finally, the dietary questionnaires used in EPIC were not designed to identify different food processing categories. Therefore, several assumptions had to be made when insufficient information about the processing of the food item was available, potentially contributing to measurement error. However, byproducts of processing (e.g. trans fatty acids or syringol metabolites) have been positively associated with NOVA group 4 in EPIC which is a sign of a good measurement of ultra-processed foods in EPIC (Huybrechts et al., 2022).

Conclusion

This large-scale prospective analysis among European women suggests that the positive association between processed food intake and breast cancer risk was likely driven by alcoholic beverage consumption. Other degrees of food processing were not associated with breast cancer risk.