Introduction

In recent years, there has been growing interest in the potential role of industrial food processing in disease aetiology. The NOVA (not an abbreviation) classification system developed by Monteiro et al. [1, 2] categorises foods into four groups according to their degree and purpose of processing: (1) unprocessed or minimally processed foods, (2) processed culinary ingredients, (3) processed foods and (4) ultra-processed foods (UPFs). UPFs are industrial formulations manufactured in a complex way using ingredients not usually found in kitchens (e.g. maltodextrin, hydrogenated oils, modified starches) and cosmetic additives (e.g. emulsifiers, flavourings, colourants, artificial sweeteners) [2]. They are typically cheap, highly palatable, and widely available ready-to-eat products which are often consumed in large quantities, replacing more nutritious, unprocessed/minimally processed foods in the diet [3, 4]. Examples of UPFs include soft drinks, sweet or savoury packaged snacks, confectionery, packaged breads and buns, reconstituted meat products and pre-prepared frozen or shelf-stable dishes.

Several studies have shown that the consumption of UPFs may be associated with an increased risk of cancer [5,6,7,8,9]. In the European Prospective Investigation into Cancer and Nutrition (EPIC) cohort, Kliemann et al. [9] found positive associations between higher UPF consumption and the risk of head and neck cancer (HNC; hazard ratio [HR] = 1.14 per one standard deviation [SD] higher UPF intake, 95% confidence interval [CI] 1.06–1.24) and oesophageal adenocarcinoma (OAC; HR = 1.21 per 1-SD higher UPF intake, 95% CI 1.05–1.39). They also found an inverse association between UPF consumption and oesophageal squamous cell carcinoma risk (HR = 0.79 per 1-SD higher UPF intake, 95% CI 0.64–0.96), although this did not withstand additional adjustments for alcohol intake, body mass index (BMI) and several dietary factors (HR = 0.90 per 1-SD higher UPF intake, 95% CI 0.72–1.11).

UPF consumption has also been positively associated with higher adiposity (i.e. BMI, fat mass, waist circumference and waist-to-hip ratio [WHR]) [10,11,12,13,14]. Since body fatness (measured by BMI, waist circumference and WHR) is an established modifiable risk factor for OAC [15,16,17,18,19,20], and visceral adiposity (i.e. waist circumference and WHR) has been positively associated with HNC risk [21,22,23], it is plausible that the positive associations between UPF consumption and these upper-aerodigestive tract cancers are mediated via adiposity. Although BMI has been inversely associated with HNC risk, this seems to be a consequence of residual confounding related to smoking (an established risk factor for HNC), as smokers tend to have lower BMIs than non-smokers [23]. In a meta-analysis of 20 prospective cohort studies, BMI was positively associated with HNC risk when the analysis was restricted to never smokers [22]. Although adiposity may be one of the mechanisms underlying the association between UPF consumption and upper-aerodigestive tract cancer, this has not been investigated using mediation analysis.

The aim of this study was to reassess and further investigate the associations between the consumption of UPFs and the risk of HNC and OAC in the EPIC study. As a complement to the study by Kliemann et al. [9] (described above), this study explored the associations between UPF consumption and the risk of HNC and its subtypes (i.e. oral cavity, oropharynx, hypopharynx, larynx and unspecified/overlapping cancers) as defined by the International Head and Neck Cancer Epidemiology (INHANCE) consortium. It also investigated effect modification by smoking status, alcohol intake, sex, physical activity, and education level in the associations between the consumption of UPFs and the risk of upper-aerodigestive tract cancers. Additionally, this study assessed the possibility of residual confounding using accidental death as a negative control outcome. Lastly, it examined the role of BMI and WHR in the associations between UPF consumption and the risk of HNC and OAC by means of a mediation analysis.

Methods

The EPIC cohort

The EPIC study has been fully described elsewhere [24,25,26]. Briefly, EPIC is one of the largest prospective cohort studies in Europe. It recruited 521,323 participants between 1992 and 2000. Participants were enrolled in 23 centres across 10 European countries, namely Denmark, France, Germany, Greece, Italy, the Netherlands, Norway, Spain, Sweden and the United Kingdom. Most were 35–69 years old at recruitment [24, 25]. They were either volunteers from the general population, blood donors, employees of local companies, teachers/school employees or individuals enrolled in local ongoing studies. All participants provided written informed consent before completing the dietary and lifestyle questionnaires. Anthropometric and blood pressure data were also obtained at baseline. EPIC was approved by the International Agency for Research on Cancer (IARC) Ethics Committee and the local ethical review boards of all EPIC centres.

Study sample

Participants who withdrew consent from the study were not included in this research. We excluded participants diagnosed with cancer before enrolment (n = 25,184) and those with a length of follow-up equal to zero (n = 4148). We also excluded participants who did not complete the dietary or lifestyle questionnaires (n = 6259). We additionally excluded participants with extreme energy intake versus energy requirement ratios (top and bottom 1%) (n = 9573) and participants recruited in Greece due to administrative issues (n = 26,048). After exclusions, 450,111 participants were included in the analyses (Supplementary Fig. 1).
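The participant flow described above can be tallied as a quick consistency check. The sketch below is illustrative arithmetic on the counts reported in this section only; it assumes the exclusions were applied sequentially without overlap, and the dictionary labels are hypothetical rather than EPIC variable names.

```python
# Consistency check on the participant flow reported above.
# Labels are illustrative and are not taken from the EPIC data dictionary.
recruited = 521_323

exclusions = {
    "cancer diagnosis before enrolment": 25_184,
    "length of follow-up equal to zero": 4_148,
    "dietary or lifestyle questionnaire not completed": 6_259,
    "extreme energy intake/requirement ratio (top and bottom 1%)": 9_573,
    "recruited in Greece": 26_048,
}

# Assumption: exclusions are sequential and mutually exclusive.
analytic_sample = recruited - sum(exclusions.values())
print(analytic_sample)  # 450111
```

The result matches the 450,111 participants included in the analyses.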

Dietary data and food processing variables

Semi-quantitative food frequency questionnaires (FFQs), extensive quantitative dietary questionnaires, and combined methods (i.e. semi-quantitative FFQs combined with 7-day records in the UK, and a non-quantitative FFQ combined with a 14-day record on hot meals in Malmö, Sweden) were used to obtain dietary data at baseline [25]. These were centre specific to account for local dietary habits and were either self-administered or administered in-person by trained interviewers. Furthermore, a standardised 24-h recall was used to obtain supplementary dietary data for a subsample of EPIC participants to calibrate baseline dietary measurements across EPIC centres [25, 27,28,29,30]. The dietary questionnaires and their mode of administration were described in detail in previous publications [25, 30].

The NOVA classification was used to categorise foods into four groups according to their extent and purpose of industrial processing [31]. Unprocessed/minimally processed foods (NOVA 1) are natural foods that may have undergone minimal processing for their preservation, storage, safety, or edibility. Processed culinary ingredients (NOVA 2) correspond to substances derived from unprocessed/minimally processed foods (e.g. oil, butter) or nature (e.g. salt) that are normally consumed in combination with unprocessed/minimally processed foods. Both processed foods (NOVA 3) and UPFs (NOVA 4) are industrial products. The former typically contain two or three common ingredients (i.e. a combination of unprocessed/minimally processed foods and processed culinary ingredients), while the latter contain many ingredients (most of which are rarely used in kitchens) and additives that make the final product tastier and more attractive to consumers.

Food preparations made (at home or elsewhere) using traditional methods were decomposed using standardised recipes. Individual food items were then classified according to their degree of processing. Food items were combined into broader food categories for simplicity. Of a total of 67 food categories in the dietary questionnaires, 19 were classified as unprocessed/minimally processed foods, 5 as culinary ingredients, 13 as processed foods and 30 as UPFs (see Supplementary Table 1 for details).

Here, we used the relative intake of each NOVA group in grams per day (%g/d). We also used the absolute intake in grams per day (g/d) and the absolute and relative intake in kilocalories per day in sensitivity analyses (kcal/d and %kcal/d, respectively).
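To make the %g/d metric concrete, the following minimal sketch computes each NOVA group's share of total daily intake by weight. It assumes that the denominator is the summed intake (in g/d) across all four NOVA groups; the function name and the example intakes are hypothetical and do not come from EPIC data.

```python
def relative_intake(grams_by_nova):
    """Return each NOVA group's share of total daily intake by weight (%g/d).

    Assumption: the denominator is the sum of all four NOVA groups in g/d.
    """
    total = sum(grams_by_nova.values())
    if total <= 0:
        raise ValueError("total intake must be positive")
    return {group: 100.0 * grams / total for group, grams in grams_by_nova.items()}


# Hypothetical participant: intake in g/d for NOVA groups 1 to 4.
example = {1: 1800.0, 2: 50.0, 3: 450.0, 4: 300.0}
shares = relative_intake(example)
print(round(shares[4], 1))  # 11.5
```

The same function could be applied to intakes in kcal/d to obtain %kcal/d for the sensitivity analyses.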

Ascertainment of cancer cases

Incident cancer cases were identified through population-based cancer registries in Denmark, Italy (except Naples), the Netherlands, Norway, Spain, Sweden and the United Kingdom. Participants in other centres (France, Germany, Greece and Naples) were actively followed up using health insurance records, pathology registries and direct contact with participants or their next of kin.

HNC and OAC were defined using the 2nd and 3rd Revision of the International Classification of Diseases for Oncology (ICDO-2 and ICDO-3). According to the INHANCE consortium [32], HNC cases include malignant neoplasms of the oral cavity (topography codes C00.3–C00.6, C00.8–C00.9, C02.0–C02.3, C03.0–C03.1, C03.9, C04.0–C04.1, C04.8–C04.9, C05.0, C06.0–C06.2, C06.8–C06.9), oropharynx (C01.9, C02.4, C05.1–C05.2, C09.0–C09.1, C09.8–C09.9, C10.0–C10.4, C10.8–C10.9), hypopharynx (C12.9–C13.2, C13.8–C13.9), larynx (C32.0–C32.3, C32.8–C32.9), and oral cavity and pharynx unspecified/overlapping regions (C02.8–C02.9, C05.8–C05.9, C14.0, C14.2, C14.8). We did not exclude any histological subtypes of HNC. Oesophageal cancer cases correspond to topography codes C15.0–C15.5 and C15.8–C15.9. Among these, OAC cases were identified with codes 8140/3, 8144/3, 8480/3, 8481/3 and 8490/3. Other oesophageal cancer subtypes (e.g. squamous cell carcinoma and small cell carcinoma) were not investigated as outcomes in this study.

Covariates

Data on age at recruitment, sub-centre (22 centres in total, split into 27 sub-centres as follows: Northeast of France, Northwest of France, South of France, South coast of France, Florence, Varese, Ragusa, Turin, Naples, Asturias, Granada, Murcia, Navarra, San Sebastian, Cambridge, Oxford health-conscious population, Oxford general population, Bilthoven, Utrecht, Heidelberg, Potsdam, Malmö, Umeå, Aarhus, Copenhagen, Southeast of Norway, Northwest of Norway), sex (male/female), education level (none, primary, technical/professional, secondary, further education), physical activity based on the Cambridge Physical Activity Index [33] (inactive, moderately inactive, moderately active, active), measured/self-reported height (continuous in cm) and smoking status (never, former, current, unknown) were obtained at baseline through anthropometric measurements and lifestyle questionnaires. Additionally, data on alcohol intake (continuous in g/d) were acquired using dietary questionnaires.

Potential mediators

BMI and WHR were investigated as potential mediators in mediation analyses. BMI (continuous in kg/m²) was calculated from measured height and weight (measured using comparable, standardised methods) [34]. WHR (continuous) was estimated from measured waist and hip circumferences. Waist circumference was measured midway between the iliac crest and the lower ribs or at the narrowest torso circumference. Hip circumference was measured over the buttocks or at the widest point. EPIC-Oxford health-conscious population self-reported data were also used to estimate BMI and WHR, after the application of measurement error corrections [34, 35].
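For reference, the two anthropometric indices can be computed as in the sketch below. It covers only the standard definitions; the measurement error corrections applied to self-reported EPIC-Oxford data are not reproduced, and the function names and example values are hypothetical.

```python
def body_mass_index(weight_kg, height_cm):
    """BMI in kg/m^2 from weight in kg and height in cm."""
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2


def waist_to_hip_ratio(waist_cm, hip_cm):
    """WHR as waist circumference divided by hip circumference (same units)."""
    return waist_cm / hip_cm


# Hypothetical participant.
print(round(body_mass_index(70.0, 175.0), 1))     # 22.9
print(round(waist_to_hip_ratio(80.0, 100.0), 2))  # 0.8
```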

Statistical analysis

Descriptive characteristics

The participants’ baseline characteristics were summarised by sex-specific quartiles of relative UPF consumption (in %g/d). Mean and SD estimates were obtained for continuous variables, while frequencies and percentages were obtained for binary/categorical variables. Furthermore, we plotted a histogram of the distribution of UPF consumption (in %g/d) in the EPIC cohort.
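Sex-specific quartiles mean that cut-points are derived separately within each sex, so the same UPF proportion can fall into different quartiles for males and females. The sketch below illustrates the idea with the Python standard library; the quantile method (the default "exclusive" method of `statistics.quantiles`), the handling of ties at cut-points and the data are all assumptions made for illustration.

```python
from bisect import bisect_right
from statistics import quantiles


def sex_specific_quartiles(records):
    """Assign quartiles 1-4 of UPF intake using cut-points computed within each sex.

    `records` is a list of (sex, upf_percent) tuples; output order matches input.
    """
    cutpoints = {}
    for sex in {s for s, _ in records}:
        values = [v for s, v in records if s == sex]
        cutpoints[sex] = quantiles(values, n=4)  # three cut-points per sex
    # Values equal to a cut-point are placed in the upper quartile (an assumption).
    return [bisect_right(cutpoints[sex], value) + 1 for sex, value in records]


# Hypothetical UPF proportions (%g/d).
males = [("male", v) for v in (5, 10, 15, 20, 25, 30, 35, 40)]
females = [("female", v) for v in (2, 4, 6, 8)]
assigned = sex_specific_quartiles(males + females)
print(assigned)  # [1, 1, 2, 2, 3, 3, 4, 4, 1, 2, 3, 4]
```

In this toy example, an intake of 8% g/d sits in the top quartile for females but would sit in the bottom quartile for males.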

Data imputation

We used single-value imputation to deal with missing data in the covariates used to control for potential confounding (i.e. height, physical activity, education level and smoking status). When measured/self-reported height values were not available, missing values were imputed with mean centre-, age- and sex-specific height values [34]. Mode imputation was used for baseline binary and categorical covariates missing less than 5% of their values (i.e. education level: “primary school completed”, physical activity: “moderately inactive”, smoking status: “never”). Multiple imputation was used in sensitivity analyses (details in the “sensitivity analyses” subsection below).
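Mode imputation for a categorical covariate can be sketched as follows. This is an illustrative implementation rather than the code used in the study; the function name, the use of `None` to mark missing values and the example data are assumptions, while the 5% threshold follows the description above.

```python
from collections import Counter


def impute_mode(values, max_missing_fraction=0.05):
    """Replace missing entries (None) with the most frequent observed category.

    Raises ValueError if the share of missing values is not below the threshold.
    """
    missing = sum(v is None for v in values)
    if missing / len(values) >= max_missing_fraction:
        raise ValueError("too many missing values for single-value imputation")
    observed = [v for v in values if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]


# Hypothetical smoking status data: 25 participants, one missing (4%).
smoking = ["never"] * 12 + ["former"] * 8 + ["current"] * 4 + [None]
imputed = impute_mode(smoking)
print(imputed[-1])  # never
```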

Main association analysis

Cox proportional hazards models with age as the underlying timescale were used to investigate the association between the intake of UPFs and the risk of HNC and OAC. We estimated HRs and 95% CIs per 10% g/d higher consumption of UPFs. Time of entry was defined as age at recruitment, while time of exit was defined as age at first cancer diagnosis (excluding non-melanoma skin cancer) or age at last follow-up (i.e. death, emigration, loss to follow-up or end of follow-up [i.e. between June 2008 and December 2013, depending on the centre]), whichever came first. Model 1 was stratified by age at recruitment in 1-year categories, sex and sub-centre. Model 2 was additionally adjusted for education, physical activity, height and smoking status. Model 3 was additionally adjusted for alcohol intake in g/d to reflect the association between the consumption of UPFs and cancer, regardless of alcohol intake (a well-known cancer risk factor [36,37,38,39,40,41,42,43,44] that forms part of some processed foods and UPFs).
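Reporting HRs per 10% g/d amounts to rescaling the Cox model coefficient. As a minimal sketch, if a model were fitted with UPF intake in single percentage points, the HR and 95% CI for a 10-point increase could be obtained as below. The coefficient and standard error are hypothetical and are not estimates from this study.

```python
import math


def scale_hazard_ratio(beta_per_unit, se_per_unit, units=10.0, z=1.96):
    """Return (HR, lower, upper) for a `units`-unit increase in the exposure.

    The log hazard ratio and its standard error both scale linearly with `units`.
    """
    log_hr = units * beta_per_unit
    half_width = z * units * se_per_unit
    return (
        math.exp(log_hr),
        math.exp(log_hr - half_width),
        math.exp(log_hr + half_width),
    )


# Hypothetical log-HR and standard error per 1 percentage point of UPF intake.
hr, lower, upper = scale_hazard_ratio(beta_per_unit=0.02, se_per_unit=0.004)
print(f"HR per 10% g/d: {hr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```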

We graphically assessed the proportional hazards assumption using log–log survival plots. Additionally, we tested proportionality using Schoenfeld residuals. We also used correlation matrices and variance inflation factors to assess the presence of multicollinearity. Non-linearity was assessed using likelihood ratio tests comparing UPF consumption (in %g/d) modelled with and without natural cubic splines.

We undertook additional analyses to investigate the associations between the consumption of UPFs and the risk of HNC subtypes (i.e. oral cavity, oropharynx, hypopharynx, larynx, and oral cavity and pharynx unspecified/overlapping cancers). Heterogeneity tests were used to assess differences between HNC subtype estimates.

Furthermore, we stratified Model 3 (for every exposure–outcome combination) by alcohol intake (as defined by Wozniak et al. [45], i.e. no/light alcohol intake [0.1–6 g/d (men); 0.1–3 g/d (women)], moderate alcohol intake [6.1–24 g/d (men); 3.1–24 g/d (women)], heavy alcohol intake [> 24 g/d]), sex (i.e. male, female), physical activity (i.e. inactive, moderately inactive, moderately active, active), smoking status (i.e. never smoker, former smoker, current smoker) and education level (i.e. primary school or less, secondary or technical/professional school, higher education) and performed likelihood ratio tests to explore interactions. Models were not adjusted for the stratification variable.
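The sex-specific alcohol categories used for stratification can be expressed as a small function. This is a sketch based on the thresholds listed above; the treatment of boundary values (upper limits inclusive), the placement of zero intake in the no/light category and the string coding of sex are assumptions.

```python
def alcohol_category(grams_per_day, sex):
    """Classify alcohol intake (g/d) using sex-specific thresholds.

    Assumptions: upper limits are inclusive and zero intake counts as no/light.
    """
    if sex not in ("male", "female"):
        raise ValueError("sex must be 'male' or 'female'")
    light_upper = 6.0 if sex == "male" else 3.0
    if grams_per_day <= light_upper:
        return "no/light"
    if grams_per_day <= 24.0:
        return "moderate"
    return "heavy"


print(alcohol_category(5.0, "male"))     # no/light
print(alcohol_category(5.0, "female"))   # moderate
print(alcohol_category(30.0, "female"))  # heavy
```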

Mediation analysis

Under the strong assumption that there is no residual confounding or measurement error in our study, we conducted a mediation analysis using the counterfactual framework [46] to further explore the mediating role of BMI and WHR in the associations between UPF consumption and the risk of HNC and OAC (Fig. 1).

Fig. 1

Mediation analysis diagram of the counterfactual two-way decomposition of the total effect of UPF consumption on the risk of head and neck cancer and oesophageal adenocarcinoma. All mediation models accounted for potential exposure–mediator interactions and were adjusted for age at recruitment in 1-year categories, sex, sub-centre, education, physical activity, height, smoking status and alcohol intake. On the ratio scale, the total effect (TE) corresponds to the product of the pure natural direct effect (PNDE) and the total natural indirect effect (TNIE), which is equivalent to their sum on the log scale. Point estimates were obtained by direct counterfactual imputation estimation and confidence intervals were obtained using 1000 bootstrap repetitions. Abbreviations: BMI, body mass index; WHR, waist-to-hip ratio; UPF, ultra-processed food; HNC, head and neck cancer; OAC, oesophageal adenocarcinoma

In exploratory analyses, we ran linear regressions to study the associations between UPF consumption (i.e. the exposure) and both WHR and BMI (i.e. the potential mediators). We also ran exposure-adjusted Cox regressions to analyse the associations between the potential mediators and the risk of both HNC and OAC (i.e. the outcomes). Where there was evidence of an association between the potential mediator and both the exposure and the outcome, we used the “cmest” function in the “CMAverse” R package [47] to decompose the Total Effect (TE) of UPF consumption on the corresponding upper-aerodigestive tract cancer into a Pure Natural Direct Effect (PNDE) and a Total Natural Indirect Effect (TNIE) (on the ratio scale TE = PNDE × TNIE). The proportion mediated was also calculated (i.e. 100 × (PNDE × (TNIE – 1))/(TE − 1)) for each exposure–mediator–outcome combination [48]. All mediation models accounted for potential exposure–mediator interactions and were adjusted for age at recruitment in 1-year categories, sex, sub-centre, education, physical activity, height, smoking status and alcohol intake. Point estimates were obtained by direct counterfactual imputation estimation and 95% CIs were obtained using 1000 bootstrap repetitions. The results were scaled to reflect a 10% g/d higher consumption of UPFs.
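The two-way decomposition and the proportion mediated described above can be illustrated numerically. The sketch below implements only the ratio-scale formulas given in the text (TE = PNDE × TNIE and the proportion mediated); it does not reproduce the estimation performed by "cmest", and the input values are hypothetical.

```python
def total_effect(pnde, tnie):
    """Total effect on the ratio scale: TE = PNDE x TNIE."""
    return pnde * tnie


def proportion_mediated(pnde, tnie):
    """Proportion mediated (%) on the ratio scale: 100 x PNDE x (TNIE - 1) / (TE - 1)."""
    te = total_effect(pnde, tnie)
    if te == 1.0:
        raise ValueError("proportion mediated is undefined when TE equals 1")
    return 100.0 * pnde * (tnie - 1.0) / (te - 1.0)


# Hypothetical hazard ratios for the direct and indirect effects.
pnde, tnie = 1.20, 1.02
print(round(total_effect(pnde, tnie), 3))         # 1.224
print(round(proportion_mediated(pnde, tnie), 1))  # 10.7
```

Because the decomposition is multiplicative, even a sizeable total effect yields a small proportion mediated when the TNIE is close to 1.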

Sensitivity analyses

As a sensitivity analysis, we explored adjusting our Cox models for total water intake (i.e. water content from foods and drinks, in addition to drinking water and water used as an ingredient in preparations). This was to see whether differences in water content across NOVA groups may influence the associations between the relative intake of UPFs and the risk of HNC and OAC. Similarly, we explored adjustments for total energy intake.

We also reran our Cox models after excluding participants who were censored during the first two years of follow-up to reduce the potential for reverse causation due to undiagnosed cancer at recruitment.

Additionally, we repeated the analyses using the absolute intake of UPFs in grams per day (g/d) and the absolute and relative intake in kilocalories per day (kcal/d and %kcal/d, respectively) as the exposure.

Moreover, we conducted a complete case analysis excluding participants with missing data for at least one lifestyle covariate (i.e. smoking status, physical activity and education level). In addition, we used the ‘mice’ R package to perform multivariate imputation by chained equations (MICE) [49], whereby smoking status, physical activity and education level were imputed five times by predictive mean matching. We fit our models using the MICE imputed data sets and then pooled the results according to Rubin’s rules [50] to obtain average HR estimates and standard errors for each model. For the complete case analysis and the MICE analysis, we still used centre-, age- and sex-specific imputed height as a covariate, as this is standard practice when dealing with anthropometric variables as confounders in EPIC [34].
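Pooling under Rubin's rules combines the within-imputation and between-imputation variances of the estimates. The sketch below applies the standard formulas on the log-HR scale; it is not the pooling code from the "mice" package, and the five estimates and standard errors are hypothetical.

```python
import math
from statistics import fmean, variance


def pool_rubin(estimates, standard_errors):
    """Pool estimates from m imputed data sets using Rubin's rules.

    Returns (pooled_estimate, pooled_standard_error).
    """
    m = len(estimates)
    pooled = fmean(estimates)
    within = fmean(se ** 2 for se in standard_errors)  # mean within-imputation variance
    between = variance(estimates)                      # between-imputation variance (m - 1 denominator)
    total = within + (1.0 + 1.0 / m) * between
    return pooled, math.sqrt(total)


# Hypothetical log-HRs and standard errors from five imputed data sets.
log_hrs = [0.20, 0.21, 0.19, 0.22, 0.18]
ses = [0.04, 0.04, 0.04, 0.04, 0.04]
pooled_log_hr, pooled_se = pool_rubin(log_hrs, ses)
print(f"Pooled HR: {math.exp(pooled_log_hr):.2f}, pooled SE (log scale): {pooled_se:.4f}")
```

The pooled standard error exceeds the individual standard errors because it also reflects the uncertainty introduced by imputation.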

Finally, we performed a negative control outcome analysis (i.e. where the outcome is not plausibly linked to the exposure of interest) to help identify any residual confounding that could be biasing our results [51]. We considered accidental deaths as the outcome (instead of upper-aerodigestive tract cancers) since the consumption of foods by their degree of processing is unlikely to be associated with the risk of being involved in a deadly accident (e.g. falls, transport accidents, accidental drowning). Any evidence of an association between UPF consumption and accidental deaths would suggest that our main results may be biased by the same factors that biased the negative control outcome results. Accidental deaths were defined as deaths due to events linked to codes V01–X59 in the 10th Revision of the International Classification of Diseases (ICD-10). For the negative control analysis, time of exit was defined as age at the time of death, emigration, loss to follow-up or end of follow-up, whichever came first. Participants were not censored at the time of cancer diagnosis, whereas they were in all other analyses in this study. The accidental death models accounted for the same covariates as the main analysis. BMI and type 2 diabetes mellitus would not normally be adjusted for in this analysis, as they are potential mediators and adjusting for them could induce collider bias (i.e. open backdoor paths from UPF consumption to accidental deaths through unobserved factors) [52]. Here, we additionally adjusted for them in an exploratory manner, assuming the absence of unobserved confounders of BMI, type 2 diabetes mellitus and accidental deaths.

Statistical software

All statistical analyses and visualisations were performed using R version 4.2.3. We used version 3.2.10 of the “survival” R package for the Cox regressions and version 0.1.0 of the “CMAverse” R package [47] for the mediation analysis. We also used version 0.1.0 of the “ggforestplot” R package to create forest plots. To create tables, we used version 1.3.0 of the “tidyverse” R package and version 0.7.0 of the “flextable” R package. P-values for heterogeneity between HNC subtype estimates were obtained using version 4.18–0 of the “meta” R package. Non-linearity was assessed using version 4.2.3 of the “splines” R package. MICE was performed using version 3.16 of the “mice” R package. Two-sided p-values < 0.05 were considered statistically significant.

Results

Descriptive characteristics

In total, we included 450,111 participants, of whom 70.8% were female. The mean age at recruitment was 51.1 years (SD 9.8, range 17.8–98.5 years). The mean consumption of UPFs in the cohort was 13.7% g/d (364 g/d), ranging from a mean intake of 8% g/d (156.9 g/d) in Spain to 18.6% g/d (520.5 g/d) in the United Kingdom. A histogram of the proportion of UPFs in the diet (in %g/d) is available in Supplementary Fig. 2. On average, males consumed a higher proportion of UPFs than females (14.7% vs 13.3%, p < 0.001). Participants with technical education were among the highest consumers of UPFs (Table 1). UPFs contributed more to the diet of younger, taller, and more physically active participants. Participants who did not provide data on their physical activity and education also tended to consume more UPFs. In terms of diet quality, UPF consumption was higher among participants who consumed less alcohol and more calories, carbohydrates, fat and sodium.

Table 1 Baseline characteristics of study participants by sex-specific quartiles of relative ultra-processed food consumption (%g/d)

The UPF group was mainly composed of fizzy drinks (14.1% of absolute UPF consumption in g/d), non-carbonated sweetened beverages (12.1%), ultra-processed dairy products (12.0%), ultra-processed breads (12.0%) and ultra-processed meats (9.9%) (Supplementary Table 1). Beer and wine (46.2%) were the main contributors to the processed foods group, followed by processed breads (22.7%) and cheese (10.2%). The unprocessed/minimally processed foods group primarily comprised tea and coffee (32.2%), water (17.3%), milk and plain yoghurt (12.4%), fruit (11.2%) and vegetables (8.9%).

Associations between the consumption of ultra-processed foods and upper-aerodigestive tract cancers

During a mean follow-up of 14.13 ± 3.98 years (6,358,569 person-years, median follow-up = 14.95 years; range 1 day–22.79 years), 1125 incident cases of HNC and OAC were documented. Of these, 910 had HNC (i.e. 234 oral cavity, 235 oropharynx, 66 hypopharynx, 310 larynx and 65 oral cavity and pharynx unspecified/overlapping regions) and 215 had OAC.

The proportional hazards assumption was met in all models (Supplementary Figs. 3, 4, 5, 6), and we did not find evidence of multicollinearity between covariates (Supplementary Fig. 7 and Supplementary Table 2). Furthermore, there was no evidence of non-linearity between UPF consumption and HNC (p = 0.54) (Supplementary Fig. 8). The non-linearity test for the association between UPF consumption and the risk of OAC was not informative due to the limited number of OAC cases in the dataset (results not shown).

Head and neck cancer (HNC)

A higher proportion of UPF in the diet (in %g/d) was associated with a higher risk of HNC, even after accounting for alcohol intake (HR = 1.23 per 10% g/d higher UPF intake, 95% CI 1.14–1.34) (Fig. 2 and Supplementary Table 3). We did not find evidence of heterogeneity between HNC subtypes (p-value for heterogeneity = 0.11) (Fig. 3 and Supplementary Table 4).

Fig. 2

Associations between the consumption of ultra-processed foods (in %g/d) and the risk of head and neck cancer and oesophageal adenocarcinoma. Hazard ratios per 10% g/d higher ultra-processed food intake. Time of entry was defined as age at recruitment, while time of exit was defined as age at first cancer diagnosis (excluding non-melanoma skin cancer) or age at last follow-up (i.e. death, emigration, loss to follow-up or end of follow-up), whichever came first. Model 1 was stratified by age at recruitment in 1-year categories, sex, and sub-centre. Model 2 was additionally adjusted for education, physical activity, height, and smoking status. Model 3 was additionally adjusted for alcohol intake. N = 450,111, of which 910 and 215 had head and neck cancer and oesophageal adenocarcinoma, respectively. Abbreviations: CI, confidence interval; UPF, ultra-processed food

Fig. 3

Associations between the consumption of ultra-processed foods (in %g/d) and head and neck cancer subtypes. Hazard ratios per 10% g/d higher ultra-processed food intake. Time of entry was defined as age at recruitment, while time of exit was defined as age at first cancer diagnosis (excluding non-melanoma skin cancer) or age at last follow-up (i.e. death, emigration, loss to follow-up or end of follow-up), whichever came first. Model 1 was stratified by age at recruitment in 1-year categories, sex, and sub-centre. Model 2 was additionally adjusted for education, physical activity, height and smoking status. Model 3 was additionally adjusted for alcohol intake. N = 450,111, of which 234, 235, 66, 310 and 65 had cancer of the oral cavity, oropharynx, hypopharynx, larynx, and oral cavity and pharynx unspecified/overlapping regions, respectively. Abbreviations: CI, confidence interval; UPF, ultra-processed food

In stratified analyses for the association between UPF consumption (in %g/d) and HNC risk, we did not find evidence of effect modification by alcohol intake (p-value for interaction = 0.46), physical activity level (p-value for interaction = 0.48), smoking status (p-value for interaction = 0.46) or education level (p-value for interaction = 0.31) (Supplementary Table 5). There was some evidence of an interaction by sex (p-value for interaction = 0.006), with a positive association between UPF consumption and HNC risk among males (HR = 1.34 per 10% g/d higher UPF intake, 95% CI 1.22–1.48, N = 131,425, events = 603) but not among females (HR = 1.03, 95% CI 0.87–1.21, N = 318,686, events = 307).

Oesophageal adenocarcinoma (OAC)

After accounting for alcohol intake, UPF consumption (in %g/d) was positively associated with OAC risk (HR = 1.24 per 10% g/d higher UPF intake, 95% CI 1.05–1.47) (Fig. 2 and Supplementary Table 3).

When we stratified the association between UPF consumption (in %g/d) and OAC risk, we did not find evidence of differing estimates across levels of alcohol intake (p-value for interaction = 0.18), physical activity (p-value for interaction = 0.94), smoking status (p-value for interaction = 0.99), sex (p-value for interaction = 0.44) or education (p-value for interaction = 0.83) (Supplementary Table 5).

Mediating role of adiposity in the associations between ultra-processed food consumption and upper-aerodigestive tract cancers

In exploratory analyses (Supplementary Table 6), we found positive associations between UPF consumption and both BMI (mean change = 0.24 kg/m² per 10% g/d higher UPF intake, 95% CI 0.22–0.26) and WHR (mean change = 0.41 per 10% g/d higher UPF intake, 95% CI 0.38–0.43). Moreover, BMI was positively associated with OAC risk (HR = 1.08 per 1 kg/m² higher BMI, 95% CI 1.04–1.11), but inversely associated with HNC (HR = 0.98 per 1 kg/m² higher BMI, 95% CI 0.96–0.99). WHR was positively associated with the risk of both HNC (HR = 1.02 per 0.01 higher WHR, 95% CI 1.01–1.03) and OAC (HR = 1.06 per 0.01 higher WHR, 95% CI 1.04–1.08).

WHR as a mediator between UPF consumption and HNC risk

Only a small part of the positive association between UPF consumption (in %g/d) and HNC risk was mediated via WHR (5%, 95% CI 3–10%, p < 0.001; TNIE HR = 1.01 per 10% g/d higher UPF intake, 95% CI 1.01–1.01) (Table 2). Most of the association was not explained by WHR (PNDE HR = 1.22 per 10% g/d higher UPF intake, 95% CI 1.11–1.32). Furthermore, there was some evidence of an interaction between UPF consumption and WHR (p-value for interaction = 0.03).

Table 2 Mediation analysis for the associations between ultra-processed food consumption (in %g/d) and head and neck cancer and oesophageal adenocarcinoma, where the potential mediators (i.e. waist-to-hip ratio and body mass index) were measured at baseline

WHR as a mediator between UPF consumption and OAC risk

The TE of UPF consumption (in %g/d) on OAC risk was decomposed into a PNDE of 1.17 (95% CI 0.99–1.33) and a TNIE via WHR of 1.03 (95% CI 1.02–1.03) (Table 2). Hence, the proportion mediated by WHR was 15% (95% CI 8–72%, p = 0.03) in the association with OAC.

BMI as a mediator between UPF consumption and OAC risk

Most of the association between UPF consumption (in %g/d) and higher OAC risk was not mediated via BMI (PNDE HR = 1.18 per 10% g/d higher UPF intake, 95% CI 1.00–1.34) (Table 2). BMI mediated 13% (95% CI 6–53%, p = 0.04) of the association (TNIE HR = 1.02 per 10% g/d higher UPF intake, 95% CI 1.01–1.03).

Sensitivity analyses for the associations between ultra-processed food consumption and upper-aerodigestive tract cancers

Further adjusting for total water intake (including water in foods) or total energy intake produced similar results to those obtained in the main analyses (Supplementary Tables 7 and 8). Excluding participants censored in the first two years of follow-up (N = 442,536, since 7575 were excluded) did not substantially affect our results either (Supplementary Table 9). Likewise, the complete case analysis results (N = 419,590, of which 851 had HNC and 191 had OAC) and the results obtained using multiple imputation were similar to those obtained in our main analysis (Supplementary Tables 10 and 11).

Repeating the analyses using either the absolute intake of UPFs in grams per day (g/d) or the relative intake of UPFs in kilocalories per day (%kcal/d) as the exposure produced comparable results to those obtained in the main analysis (Supplementary Table 12 and Supplementary Fig. 9). Nevertheless, using the absolute intake of UPFs in kilocalories per day (kcal/d) as the exposure produced slightly different results to those in the main analysis, in that UPF consumption was no longer associated with HNC risk (HR = 1.01 per 100 kcal/d higher UPF intake, 95% CI 0.99–1.03) (Supplementary Table 12 and Supplementary Fig. 9).

Lastly, we found a positive association between UPF consumption (%g/d) and accidental deaths (HR = 1.12 per 10% g/d higher UPF intake, 95% CI 1.02–1.23) in the negative control outcome analysis, after accounting for all the covariates included in the upper-aerodigestive tract cancer models (Supplementary Table 13). This association also withstood additional adjustments for BMI and type 2 diabetes mellitus (i.e. factors that may influence recovery after an accident) (results not shown).

Discussion

In this large prospective cohort, UPF consumption (in %g/d) was associated with an increased risk of HNC and OAC. We did not find evidence of heterogeneity between HNC subtype association estimates. Furthermore, the positive association between UPF intake and HNC may be stronger in males than in females. Our negative control analysis suggests that at least part of the observed associations between the consumption of UPFs and the risk of upper-aerodigestive tract cancers is likely due to the influence of residual confounding. However, this does not necessarily mean that the associations are entirely non-causal; only that any causal estimate is likely smaller than we observed. In our mediation analysis, adiposity (i.e. BMI and WHR) only mediated a small proportion of the positive associations between UPF consumption and HNC and OAC.

Apart from the study conducted by Kliemann et al. [9] (which motivated our research), only one other study has investigated the associations between UPF consumption and HNC and OAC risk. Chang et al. [8] did not find an association between UPF consumption and the risk of HNC and OAC in the UK Biobank, in contrast to the findings in EPIC. A possible explanation for the null results in the UK Biobank could be limited power (197,426 UK Biobank participants, of whom 342 and 186, respectively, developed HNC and OAC over a median follow-up of 10 years, versus 450,111 EPIC participants, of whom 910 and 250, respectively, developed HNC and OAC over a median follow-up of 15 years). Residual confounding and the fact that the FFQs used were not designed to capture the extent and purpose of food processing may also partly explain the inconsistencies between studies.
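As a rough illustration of the power argument, the precision of a log hazard ratio depends mainly on the number of events rather than on the number of participants. The sketch below uses the common approximation that the standard error of a log HR scales with one over the square root of the number of events. This is an assumption made here for illustration, not an analysis from either study, and it ignores differences in exposure distribution, exposure measurement, follow-up and covariate adjustment. The helper name `relative_se` is hypothetical.

```python
import math

def relative_se(events_a, events_b):
    """Approximate ratio SE_b / SE_a of log-HR standard errors.

    Assumes SE is proportional to 1 / sqrt(number of events), with
    everything else held equal between the two cohorts.
    """
    return math.sqrt(events_a / events_b)

# Case counts quoted in the text: (EPIC, UK Biobank).
cases = {"HNC": (910, 342), "OAC": (250, 186)}

for outcome, (epic, ukb) in cases.items():
    ratio = relative_se(epic, ukb)
    print(f"{outcome}: UK Biobank SE is about {ratio:.2f} times the EPIC SE")
```

Under this approximation, the gap in precision between the cohorts is larger for HNC than for OAC, because the case counts differ more for HNC.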

Our mediation results are in line with existing findings. First, UPFs have been associated with excess weight (i.e. obesity and BMI) and central adiposity (i.e. WHR and waist circumference) in several observational studies [10,11,12,13]. UPFs are highly palatable energy-dense foods with low nutritional quality. They are convenient, cheap, and often sold in large portions [4, 54]. This, in addition to their reduced satiety potential [55], favours the consumption of large portions and an excessive amount of calories. Some studies even suggest that the consumption of UPFs may disrupt the gut microbiota, induce inflammation, and cause endocrine changes that disturb energy balance and increase the risk of obesity [11, 56, 57]. Second, multiple studies suggest that excess weight and abdominal obesity are positively associated with OAC risk [15,16,17,18,19,20, 58], and that central adiposity may be a risk factor for HNC [21, 22]. Hence, it is plausible that BMI and WHR mediate the association between UPF consumption and OAC risk, and that WHR mediates the association between UPF consumption and HNC risk. Nevertheless, our findings indicate that the mediated effects via BMI and WHR are small and that other mechanisms are likely involved.

A review of prospective cohort studies suggested that diet quality did not play an important role in the positive associations between UPF consumption, obesity and obesity-related outcomes (e.g. cancer), since the adjustment for several dietary factors did not substantially attenuate the associations [59]. The authors argue that ultra-processing itself may be associated with disease risk, independent of nutritional quality. This has profound implications for the food industry, as it could mean that UPF reformulation would not be sufficient to tackle the risks associated with UPF consumption. We acknowledge that this may just be another case of ‘highly consequential but misleading findings’ [60], where the observed associations are not causal but rather an artefact of residual confounding [61, 62] (the association of UPF consumption with accidental death in EPIC provides some support for this interpretation). However, if these associations truly reflect causality, the presence of carcinogenic compounds in UPFs, such as neo-formed contaminants produced during heat treatment, contaminants transferred from packaging materials and additives used to preserve and improve the organoleptic properties of food [5], may partly explain the relation between UPF intake and upper-aerodigestive tract cancer risk.

In sensitivity analyses, we did not find an association between a 100 kcal/d higher UPF consumption and HNC risk. This could be because the consumption of artificially sweetened UPFs (which may contain potentially carcinogenic compounds like aspartame and 4-methylimidazole [63]) is likely disregarded or underestimated when using kcal/d as a measure of UPF intake. We acknowledge that this is an arguable hypothesis since Chazelas et al. [64] did not find an association between artificially sweetened drinks and cancer in the NutriNet-Santé cohort. Nevertheless, a study in the same cohort found a positive association between higher artificial sweetener intake from all food sources and cancer risk, suggesting other artificially sweetened UPFs (e.g. yoghurts, breakfast cereals, gelatine desserts) may play a role in cancer incidence [65].

Some of the strengths of this study include the large sample size of the EPIC cohort and its long follow-up time. The prospective nature of EPIC and the availability of measured rather than self-reported BMI and WHR were also advantages. Moreover, EPIC’s multi-centre design increases the diversity of our study sample. Another advantage is that cancer cases were detected through registries (which provide detailed information on cancer subtypes) and active follow-up methods, both of which are unlikely to be affected by measurement error. Finally, the use of several measures of UPF intake (%g/d, g/d, %kcal/d and kcal/d) as the exposure makes our results more comparable to previous studies investigating the associations between the consumption of UPFs and cancer [5,6,7,8,9].

We acknowledge that this study has several limitations. For instance, in our mediation analysis we assumed that the associations between UPF consumption and upper-aerodigestive tract cancers were not influenced by residual confounding or measurement error. These are strong assumptions, since residual confounding due to unmeasured (e.g. human papillomavirus infection) or imprecisely measured (e.g. smoking and alcohol intake) confounders inevitably biased our estimates to some extent (estimates changed when the models were adjusted for potential confounders for which some values were missing or likely measured with error). Indeed, the fact that our negative control outcome analysis suggested that UPF consumption may be associated with accidental deaths points to the possibility of residual confounding.

For our mediation analysis, we used data on BMI and WHR collected at baseline. Admittedly, a limitation of this approach is that the exposure data were not gathered prior to the mediator data, so we cannot be certain that the exposure temporally precedes the mediator. Follow-up data on measured BMI and WHR were only available for 5% and 27% of the participants who answered the lifestyle follow-up questionnaire (N = 349,283), respectively. Unfortunately, cancer cases among participants with complete follow-up data were insufficient for us to conduct any sensitivity analyses using follow-up data.

An additional limitation is that we assumed that BMI and WHR mediated the association between UPF consumption and OAC risk through separate pathways. Since BMI and WHR are correlated, it would be incorrect to assume that the proportion mediated via adiposity equals the sum of the proportions mediated through both BMI and WHR. Therefore, in this association, the proportion mediated via adiposity is likely less than 28% (13% via BMI plus 15% via WHR).

Another issue is that the FFQs used at baseline were not designed to distinguish between NOVA groups [66], potentially leading to random misclassification bias and the weakening of our association estimates. Nonetheless, Huybrechts et al. [31] found positive correlations between UPF consumption and food processing biomarkers (i.e. plasma elaidic acid, an unsaturated trans-fatty acid, and urinary 4-methylsyringol sulphate) in EPIC, suggesting that UPFs were likely correctly identified in the dataset. Also, the dietary data used in our analyses were collected only once, at baseline in the 1990s, when the availability and consumption of UPFs were lower than today. Hence, this study relies on the somewhat unrealistic assumption that UPF intake was rather low and did not increase over time [67]. When Kliemann et al. [9] explored the association between the consumption of UPFs and cancer in EPIC under a hypothetical “upper bound scenario” (where foods were classified into NOVA groups based on their highest degree of processing possible), results were similar to those obtained under the more conservative “middle bound scenario” used in our study (where foods were classified into NOVA groups based on their most likely degree of processing at the time of dietary data collection). Although the use of a hypothetical “upper bound scenario” may account for some of the changes in dietary intake during follow-up, regression dilution bias was still an issue Kliemann et al. [9] could not account for in their analyses. Consequently, dietary questionnaire-related biases may have led to the underestimation of the association between UPF consumption and cancer in the EPIC cohort.
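The direction of regression dilution bias can be seen in the classical measurement error model, sketched here for a single exposure. The symbols are introduced for illustration and are not taken from our analyses: X is the true long-term UPF intake, W is the intake measured once at baseline, and U is random error independent of X. With several correlated covariates, or with non-classical error, attenuation towards the null is no longer guaranteed.

```latex
\begin{align*}
W &= X + U, \qquad U \perp X, \quad \mathbb{E}[U] = 0 \\
\lambda &= \frac{\sigma_X^2}{\sigma_X^2 + \sigma_U^2} \in (0, 1] \\
\hat{\beta}_{\text{obs}} &\approx \lambda \, \beta_{\text{true}}
\quad \Longrightarrow \quad
\mathrm{HR}_{\text{obs}} \approx \mathrm{HR}_{\text{true}}^{\lambda}
\end{align*}
```

Under these assumptions, the larger the error variance relative to the true between-person variance in intake, the smaller the reliability ratio and the closer the observed HR lies to 1.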

In conclusion, we reaffirmed that UPF intake is associated with an increased risk of HNC and OAC in the EPIC study. Since BMI and WHR explain little of the associations between UPF consumption and upper-aerodigestive tract cancers, further research is required to investigate other mechanisms that may be at play (if there is indeed any causal effect of UPF consumption on these cancers). Our results are likely influenced by residual confounding, as indicated by the negative control analysis. Therefore, our findings should be regarded with caution until they are replicated in other settings (i.e. in populations with different underlying confounding structures) or triangulated with evidence obtained using other methodological approaches.