Background

Non-alcoholic fatty liver disease (NAFLD) has emerged as the most common chronic liver disorder globally [1]. Approximately 32.4% of adults worldwide currently have NAFLD [2], and around 1 in 3 people in the UK has an early stage of the disease [3]. Diet is an important modifiable risk factor for NAFLD, and dietary patterns, which capture the combinations in which nutrients and foods are consumed, reflect real-world dietary practice [4].

Plant-based dietary patterns are gaining attention for their environmental sustainability benefits [5], and dietary patterns characterized by high consumption of healthy plant-based foods have been associated with lower NAFLD prevalence and liver fat content [6, 7]. However, not all plant-based foods are beneficial with respect to NAFLD: less nutrient-dense plant foods, including refined grains, fruit juices, and sugar-sweetened beverages, are associated with higher NAFLD risk [8,9,10]. To distinguish plant-based diets of different quality, three plant-based diet indices (PDIs) have been developed: an overall plant-based diet index (PDI), which emphasizes the intake of all plant foods; a healthful plant-based diet index (hPDI), which emphasizes higher consumption of healthful plant-based foods such as whole grains, vegetables, nuts, legumes, coffee, and tea; and an unhealthful plant-based diet index (uPDI), which emphasizes the consumption of less healthful plant-based foods associated with increased risks of several chronic diseases [11]. Lower overall PDI and hPDI and higher uPDI have been associated with higher liver fat content and a higher prevalence of fatty liver [12, 13], whereas other studies found no significant associations [14, 15]. Given the limited sample sizes and inconsistent findings of existing studies, evidence from large population-based studies with a prospective design is warranted.

The development of NAFLD results from interactions between environmental and genetic factors [16]. To date, several NAFLD-associated loci have been identified in genome-wide association studies [17]. However, no study has examined whether plant-based dietary patterns interact with genetic predisposition in the primary prevention of NAFLD; only one study has examined overall diet quality in this context, showing that improved adherence to the Mediterranean diet or the Alternative Healthy Eating Index (AHEI) was associated with reduced liver fat content, particularly among individuals with high genetic risk [18]. Therefore, our study aimed to longitudinally investigate the association between PDIs and NAFLD risk and to explore whether this association is modified by the genetic risk of NAFLD.

Methods

Study design and setting

UK Biobank recruited more than 0.5 million participants aged 37–73 years from the general population between 2006 and 2010. Participants attended one of 22 assessment centers across England, Scotland, and Wales, where they completed a touch-screen questionnaire and a face-to-face interview with a nurse, underwent a series of physical measurements, and provided biological samples. The dates and causes of hospital admissions were obtained through record linkage to Hospital Episode Statistics (England and Wales) and the Scottish Morbidity Records (Scotland). The UK Biobank study was approved by the North West Multi-centre Research Ethics Committee (REC reference 11/NW/0382), and all participants provided written informed consent before participation. Data from the UK Biobank are available to all researchers upon application (https://www.ukbiobank.ac.uk/).

Study population

We included participants with at least one dietary assessment (n = 210,673) and excluded those diagnosed with NAFLD, cirrhosis, or other liver diseases by the time their dietary assessments were completed (n = 2920). We further excluded participants diagnosed with alcohol-related diseases by the end of follow-up (n = 266). Participants with implausible energy intake (< 800 or > 4200 kcal/day in males and < 600 or > 3500 kcal/day in females; n = 2994) and those without complete genetic data or not of European descent (n = 13,111) were also excluded. We excluded participants with cardiovascular disease or cancer at baseline, as they may have changed their eating habits after diagnosis (n = 32,160). Finally, 159,222 participants were included in the NAFLD risk analyses and 20,692 in the liver fat content analyses (Additional file 1: Figure S1).
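The exclusion cascade can be checked arithmetically, and the sex-specific energy criterion written as an explicit rule. The sketch below is illustrative only: it assumes the exclusions were applied sequentially in the order listed and that the stated cut-offs are inclusive bounds of the plausible range (values strictly below the lower or above the upper limit are excluded). The function and variable names are hypothetical.

```python
# Sex-specific plausible energy range (kcal/day); values outside are excluded.
ENERGY_LIMITS = {"male": (800, 4200), "female": (600, 3500)}


def plausible_energy(kcal_per_day: float, sex: str) -> bool:
    """True if intake lies within the sex-specific range (bounds inclusive)."""
    low, high = ENERGY_LIMITS[sex]
    return low <= kcal_per_day <= high


# Exclusion steps in the order reported in the text (counts taken from the text).
EXCLUSIONS = [
    ("liver disease before the last dietary assessment", 2920),
    ("alcohol-related disease by end of follow-up", 266),
    ("implausible energy intake", 2994),
    ("incomplete genetic data or non-European descent", 13_111),
    ("cardiovascular disease or cancer at baseline", 32_160),
]


def flow(start: int, steps) -> int:
    """Subtract each exclusion in turn and report the remaining sample size."""
    remaining = start
    for reason, n in steps:
        remaining -= n
        print(f"excluded {n:>6,} ({reason}); remaining {remaining:,}")
    return remaining


final_n = flow(210_673, EXCLUSIONS)
```

Running the flow reproduces the analytic sample of 159,222 reported above.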

Dietary assessment

Dietary information in UK Biobank was collected using the Oxford WebQ, a web-based 24-h dietary recall questionnaire. The Oxford WebQ has been validated against an interviewer-administered 24-h recall [19] and against biomarkers [20]. It records the consumption of more than 200 commonly consumed foods and more than 30 beverages over the previous 24 h. The first dietary assessment was conducted at the assessment centers for the last 70,000 participants recruited (April 2009 to September 2010), and four subsequent online cycles were conducted via e-mail invitations on separate occasions between February 2011 and April 2012. For participants who completed two or more assessments, the intake of each food item was calculated as the mean across all completed assessments.
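Averaging intake across repeated recalls can be sketched as follows. This is a minimal illustration, not the study's processing code: the input layout (one dictionary of food items per completed recall) is hypothetical, and it assumes that a food item not reported on a given day counts as zero intake for that day.

```python
from collections import defaultdict
from statistics import fmean


def mean_intake(recalls):
    """recalls: iterable of (participant_id, {food_item: amount}) pairs,
    one pair per completed 24-h recall.
    Returns {participant_id: {food_item: mean amount across that person's recalls}}."""
    per_person = defaultdict(list)
    for pid, items in recalls:
        per_person[pid].append(items)
    result = {}
    for pid, days in per_person.items():
        foods = set().union(*days)  # every item reported on at least one day
        # Assumption: an item missing from a recall means zero intake that day.
        result[pid] = {f: fmean(day.get(f, 0.0) for day in days) for f in foods}
    return result
```

For a participant who reported 2 and 4 cups of tea on two recalls and bread only on the first, the averaged tea intake is 3 cups and the averaged bread intake is half the reported amount.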

We calculated the overall PDI, hPDI, and uPDI using the established method of Satija et al. [11, 21]. We categorized foods into 17 groups (whole grains, fruits, vegetables, nuts, legumes, tea and coffee, refined grains, potatoes, sugary drinks, fruit juices, sweets and desserts, animal fat, dairy, eggs, fish or seafood, meat, and miscellaneous animal-based foods) and classified them into three larger categories: healthy plant-based foods, less healthy plant-based foods, and animal foods (Additional file 1: Table S1). Given the controversial health effects of alcohol, we did not include alcoholic beverages in the plant-based indices but adjusted for alcohol consumption in the regression models [22]. The intake of each food group was ranked into quintiles and assigned either positive scores (Q1 to Q5 received 1 to 5 points) or reverse scores (Q1 to Q5 received 5 to 1 points). For the overall PDI, healthy and less healthy plant-based food groups received positive scores, and animal food groups received reverse scores. For hPDI, healthy plant-based food groups received positive scores, and less healthy plant-based food groups and animal food groups received reverse scores. For uPDI, less healthy plant-based food groups received positive scores, and healthy plant-based food groups and animal food groups received reverse scores. The scores of the 17 food groups were summed for each participant to obtain the PDIs, with a theoretical range of 17 to 85.
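The scoring rules above can be expressed compactly. The sketch below is a hypothetical illustration of the Satija-style scoring, not the authors' SAS/R code: the food-group identifiers are made up, and quintile cut points come from Python's `statistics.quantiles` (default "exclusive" method). With real intake data, many zero intakes produce ties, so quintiles may be uneven and the study software may rank ties differently.

```python
from bisect import bisect_right
from statistics import quantiles

# The 17 food groups keyed by the three larger categories (identifiers are illustrative).
GROUPS = {
    "healthy": ["whole_grains", "fruits", "vegetables", "nuts", "legumes", "tea_coffee"],
    "less_healthy": ["refined_grains", "potatoes", "sugary_drinks", "fruit_juices",
                     "sweets_desserts"],
    "animal": ["animal_fat", "dairy", "eggs", "fish_seafood", "meat", "misc_animal"],
}

# +1 = positive score (Q1..Q5 -> 1..5); -1 = reverse score (Q1..Q5 -> 5..1)
DIRECTIONS = {
    "PDI": {"healthy": +1, "less_healthy": +1, "animal": -1},
    "hPDI": {"healthy": +1, "less_healthy": -1, "animal": -1},
    "uPDI": {"healthy": -1, "less_healthy": +1, "animal": -1},
}


def plant_based_indices(cohort):
    """cohort: list of dicts mapping each of the 17 group names to mean daily intake.
    Returns one dict per participant with PDI, hPDI and uPDI (each between 17 and 85)."""
    # Quintile cut points are computed once per food group across the whole cohort.
    cuts = {
        g: quantiles([person[g] for person in cohort], n=5)
        for names in GROUPS.values()
        for g in names
    }
    results = []
    for person in cohort:
        scores = {}
        for index, directions in DIRECTIONS.items():
            total = 0
            for category, names in GROUPS.items():
                for g in names:
                    q = bisect_right(cuts[g], person[g]) + 1  # quintile 1..5
                    total += q if directions[category] > 0 else 6 - q
            scores[index] = total
        results.append(scores)
    return results
```

A participant in the top quintile of every healthy plant group and the bottom quintile of every other group receives the maximum hPDI of 85 (6 × 5 + 5 × 5 + 6 × 5).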

Ascertainment of NAFLD

NAFLD cases were identified from hospital inpatient records, death registry data, and primary care data linked to the UK Biobank, based on the 9th and 10th revisions of the International Classification of Diseases (ICD-9 and ICD-10) [23]. The ICD and primary care Read codes used are listed in Additional file 1: Table S2. Follow-up time was calculated from the date of the last dietary assessment to the date of NAFLD diagnosis, death, loss to follow-up, or the end of follow-up (30 September 2021 for England, 31 July 2021 for Scotland, and 28 February 2018 for Wales), whichever came first.
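The "whichever came first" rule for follow-up time can be sketched per participant. This is an illustrative helper (hypothetical name and arguments) that assumes dates are available as `datetime.date` objects and converts days to years using 365.25 days per year; the end-of-follow-up dates are those stated in the text.

```python
from datetime import date

# Region-specific end of follow-up, as stated in the text.
END_OF_FOLLOW_UP = {
    "England": date(2021, 9, 30),
    "Scotland": date(2021, 7, 31),
    "Wales": date(2018, 2, 28),
}


def follow_up(last_diet_date, region, nafld_date=None, death_date=None, lost_date=None):
    """Return (years of follow-up, NAFLD event indicator) for one participant.
    Follow-up ends at the earliest of NAFLD diagnosis, death, loss to follow-up,
    or the region-specific end of follow-up."""
    candidates = [END_OF_FOLLOW_UP[region]]
    candidates += [d for d in (nafld_date, death_date, lost_date) if d is not None]
    end = min(candidates)
    # An event is counted only if NAFLD diagnosis is what ended follow-up.
    event = nafld_date is not None and nafld_date == end
    years = (end - last_diet_date).days / 365.25
    return years, event
```

A NAFLD diagnosis recorded in Wales after 28 February 2018, for example, is censored rather than counted as an event.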

Magnetic resonance imaging (MRI) scan of the liver

The MRI protocol and the analysis of liver fat content in the UK Biobank have been published previously [24]. Briefly, liver MRI was performed on a Siemens 1.5T MAGNETOM Aera scanner with LiverMultiScan©, which is part of the UK Biobank abdominal imaging protocol. MRI-derived proton density fat fraction (MRI-PDFF), which has the highest accuracy for quantifying intrahepatic fat among non-invasive imaging modalities and correlates positively with histopathological hepatic triglyceride content [25, 26], was derived as previously described [24, 27]. Fat-referenced PDFF was calculated as the average PDFF of nine regions of interest in the liver, placed to avoid inhomogeneities, major vessels, and bile ducts.

Polygenic risk score (PRS) for NAFLD

Details of genotyping, imputation, and quality control of the genetic data in UK Biobank have been described elsewhere [28]. We calculated the NAFLD PRS from 5 single-nucleotide polymorphisms (SNPs) significantly associated with NAFLD in participants of European descent (rs738409, rs58542926, rs641738, rs1260326, and rs72613567) [17]. The effect size (β-coefficient) of each SNP and related information are shown in Additional file 1: Table S3. The PRS was calculated by summing the number of risk alleles at each SNP weighted by its effect size on NAFLD: PRS = β1 × SNP1 + β2 × SNP2 + … + βn × SNPn, where SNPi is the number of risk alleles carried at the ith SNP, βi is its effect size, and n = 5.
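The weighted sum above translates directly into code. The rsIDs are those listed in the text, but the effect sizes used in the example are hypothetical placeholders, not the Table S3 estimates. The sketch also assumes risk-allele counts (or imputed dosages) lie between 0 and 2.

```python
# rsIDs from the text; the real effect sizes are in Additional file 1: Table S3.
SNPS = ("rs738409", "rs58542926", "rs641738", "rs1260326", "rs72613567")


def polygenic_risk_score(risk_alleles, betas):
    """Weighted PRS: sum over SNPs of beta_i x (number of risk alleles at SNP i).
    risk_alleles and betas are dicts keyed by rsID; counts (or imputed dosages)
    must lie between 0 and 2."""
    score = 0.0
    for snp in SNPS:
        count = risk_alleles[snp]
        if not 0 <= count <= 2:
            raise ValueError(f"{snp}: risk allele count {count} is outside 0-2")
        score += betas[snp] * count
    return score


# Hypothetical effect sizes for illustration only (NOT the study's Table S3 values).
example_betas = {"rs738409": 0.5, "rs58542926": 0.25, "rs641738": 0.25,
                 "rs1260326": 0.25, "rs72613567": 0.25}
example_carrier = {"rs738409": 2, "rs58542926": 1, "rs641738": 0,
                   "rs1260326": 0, "rs72613567": 0}
print(polygenic_risk_score(example_carrier, example_betas))
```

Weighting by β rather than simply counting alleles lets variants with larger effects, such as a strongly associated missense variant, contribute more to the score.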

Covariates

Age at dietary assessment was calculated from the date of birth to the date the last dietary assessment was completed. Sex (male or female), education (lower secondary, upper secondary, vocational, college or university, or other), and household income (< £18,000, £18,000–30,999, £31,000–51,999, £52,000–100,000, or > £100,000 per year) were self-reported. Socioeconomic status was assessed using the Townsend deprivation index (quintiles), derived from the postcode of residence [29]. Smoking status was categorized as current, former, or never. Physical activity was estimated in metabolic equivalent minutes per week (MET-mins/week; categorized as < 600, 600–1199, ≥ 1200, or unknown). Body mass index (BMI) was calculated as weight in kilograms divided by height in meters squared (categorized as < 25.0, 25.0–29.9, ≥ 30.0 kg/m2, or unknown). Alcohol consumption (0, 0.1–10.0, 10.1–20.0, 20.1–35.0, ≥ 35.1 g/day, or unknown) and total energy intake (kcal/day, quintiles) were estimated from the 24-h dietary recall data.
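The printed category bands leave small gaps between adjacent cut-offs (for example, 29.9 vs 30.0 kg/m2 and 10.0 vs 10.1 g/day). The sketch below assumes half-open intervals so that every continuous value falls in exactly one band and that missing values map to "unknown"; how the study binned boundary values is not stated, so this is an assumption, and the function names are hypothetical.

```python
def bmi_category(bmi):
    """BMI (kg/m2) -> '<25.0', '25.0-29.9', '>=30.0', or 'unknown' if missing."""
    if bmi is None:
        return "unknown"
    if bmi < 25.0:
        return "<25.0"
    return "25.0-29.9" if bmi < 30.0 else ">=30.0"


def activity_category(met_min_week):
    """Physical activity (MET-mins/week) -> '<600', '600-1199', '>=1200', or 'unknown'."""
    if met_min_week is None:
        return "unknown"
    if met_min_week < 600:
        return "<600"
    return "600-1199" if met_min_week < 1200 else ">=1200"


def alcohol_category(g_per_day):
    """Alcohol (g/day) -> one of the six bands; assumes upper bounds are inclusive."""
    if g_per_day is None:
        return "unknown"
    if g_per_day == 0:
        return "0"
    for upper, label in ((10.0, "0.1-10.0"), (20.0, "10.1-20.0"), (35.0, "20.1-35.0")):
        if g_per_day <= upper:
            return label
    return ">=35.1"
```

Keeping "unknown" as its own category, rather than dropping participants with missing covariates, retains them in the Cox models.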

Statistical analysis

The analysis plan was preregistered on the Open Science Framework (https://osf.io/z9u5m/). All analyses were conducted using SAS 9.4 (SAS Institute Inc) and R (version 4.2.1). All statistical tests were two-tailed, and p < 0.05 was considered statistically significant.

Associations between PDIs and NAFLD were estimated using Cox proportional hazards regression models, with PDIs categorized into quintiles and follow-up time as the timescale. Results are presented as hazard ratios (HRs) and 95% confidence intervals (CIs). The proportional hazards assumption was tested using Schoenfeld residuals and was satisfied. Potential confounders were selected based on an a priori directed acyclic graph (Additional file 1: Figure S2). The minimally adjusted model included age and sex. The multivariable-adjusted model additionally included education, household income, Townsend deprivation index, assessment center, smoking status, alcohol consumption, physical activity, total energy intake, BMI, NAFLD-PRS, the first 10 principal components of ancestry, and genotype measurement batch. Because BMI could lie on the causal pathway between PDIs and NAFLD risk, we also fitted the multivariable-adjusted model without BMI. PDIs were also analyzed as continuous variables, and HRs per 10-point increment were reported. P for trend was estimated by assigning each participant the median value of their quintile and entering this variable as a continuous term in the model. Cumulative risk curves of NAFLD by PDI quintiles were plotted using the Kaplan–Meier method.

To investigate dose-response associations between PDIs and NAFLD risk, we fitted Cox regression models with restricted cubic splines (RCS) with four knots (at the 5th, 35th, 65th, and 95th percentiles), after trimming PDIs at the 2.5th and 97.5th percentiles of their distributions. To explore the associations of PDIs with intrahepatic fat content, we estimated β-coefficients (95% CIs) for PDI quintiles with MRI-PDFF using generalized linear models and modeled the relationships of continuous PDIs with MRI-PDFF using generalized additive models.
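Two of the covariate constructions above, the median-of-quintile trend variable and the spline terms, are sketched below. The spline follows Harrell's restricted cubic spline form, with the common normalization by the squared distance between the outer knots (the form used by R's `rms` package); the 5th/35th/65th/95th percentiles are Harrell's usual default placement for four knots. This is an assumed illustration, not the study code: percentiles come from `statistics.quantiles`, trimming at the 2.5th and 97.5th percentiles is assumed to be done beforehand, and fitting the Cox model on these columns is not shown.

```python
from bisect import bisect_right
from statistics import median, quantiles


def median_trend_variable(scores):
    """Assign each participant the median score of their quintile (for p for trend)."""
    cuts = quantiles(scores, n=5)
    group = [bisect_right(cuts, s) for s in scores]  # quintile index 0..4
    medians = {
        g: median(s for s, gs in zip(scores, group) if gs == g) for g in set(group)
    }
    return [medians[g] for g in group]


def rcs_knots(scores, percentiles=(5, 35, 65, 95)):
    """Knot locations at the given percentiles of the (already trimmed) scores."""
    pct = quantiles(scores, n=100)  # pct[p - 1] is the p-th percentile
    return [pct[p - 1] for p in percentiles]


def _cube(u):
    """Truncated power function (u)_+^3."""
    return max(u, 0.0) ** 3


def rcs_basis(x, knots):
    """Restricted cubic spline basis (Harrell's form): [x, s_1(x), ..., s_{k-2}(x)].
    Each s_j is cubic between knots and constrained to be linear below the
    first knot and above the last knot."""
    t, k = knots, len(knots)
    span = t[k - 1] - t[k - 2]
    scale = (t[k - 1] - t[0]) ** 2  # normalization so terms are on the scale of x
    terms = [x]
    for j in range(k - 2):
        s = (
            _cube(x - t[j])
            - _cube(x - t[k - 2]) * (t[k - 1] - t[j]) / span
            + _cube(x - t[k - 1]) * (t[k - 2] - t[j]) / span
        )
        terms.append(s / scale)
    return terms
```

With four knots the basis adds two non-linear columns to the linear term. A test of non-linearity then amounts to testing whether the coefficients of those two columns are jointly zero.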

We conducted analyses stratified by tertiles of NAFLD genetic risk, and multiplicative interactions were tested by including a PDI × PRS product term in the fully adjusted model. We further conducted sex-specific analyses of the interaction between PRS and PDIs on NAFLD risk to determine whether the interactions differed by sex. We also estimated the joint associations of PDIs and genetic risk with NAFLD risk and MRI-PDFF by defining a combined variable from tertiles of genetic risk and of each PDI (9 categories). The reference group was the combination expected to carry the highest risk (lowest overall PDI or hPDI and highest PRS) for overall PDI and hPDI, and the combination expected to carry the lowest risk (lowest uPDI and lowest PRS) for uPDI.
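The nine-category joint variable can be sketched as a cross-classification of two tertile variables. This is a hypothetical illustration: tertile cut points come from `statistics.quantiles`, and the label strings are made up. The reference categories follow the text.

```python
from bisect import bisect_right
from statistics import quantiles


def tertiles(values):
    """Tertile (1, 2, 3) of each value, using cut points from statistics.quantiles."""
    cuts = quantiles(values, n=3)
    return [bisect_right(cuts, v) + 1 for v in values]


def joint_categories(pdi_scores, prs_scores):
    """Cross PDI tertiles with PRS tertiles into 9 labeled categories."""
    return [f"PDI T{a}/PRS T{b}"
            for a, b in zip(tertiles(pdi_scores), tertiles(prs_scores))]


# Reference categories as described in the text ("PDI" here stands for whichever index is analyzed).
REFERENCE = {
    "PDI": "PDI T1/PRS T3",   # lowest overall PDI, highest genetic risk
    "hPDI": "PDI T1/PRS T3",  # lowest hPDI, highest genetic risk
    "uPDI": "PDI T1/PRS T1",  # lowest uPDI, lowest genetic risk
}
```

The joint variable would then enter the Cox model as a categorical covariate with the index-specific reference level.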

As secondary analyses, we (1) reran the main analysis using sex-specific quintiles of PDIs; (2) conducted analyses of NAFLD risk and MRI-PDFF stratified by age, sex, obesity, energy intake, alcohol consumption, and physical activity; (3) estimated the mediating effect of BMI measured at the second assessment on the associations between PDIs and NAFLD risk; (4) excluded each of the 17 food groups from the PDIs in turn and assessed the associations between these modified indices and NAFLD risk, additionally adjusting for intake of the excluded food group; (5) further adjusted for diagnosed depression, dyslipidemia, hypertension, and diabetes at baseline (i.e., the date the last dietary assessment was completed) to limit potential confounding by chronic disorders; (6) further adjusted for liver function to control for confounding by baseline liver function; (7) further adjusted for glucose, glycated hemoglobin, and triglycerides to minimize confounding by metabolic factors; (8) further adjusted for waist circumference to limit confounding by central obesity; (9) excluded participants with fewer than two dietary assessments; and (10) excluded participants with less than 2 years of follow-up to minimize reverse causality.
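The text does not state how the proportion mediated by BMI was estimated. One common approach, shown here only as an assumed illustration, is the difference method on the log-hazard scale: compare the log HR from a model without the mediator (total effect) with the log HR after adding the mediator (direct effect). With Cox models this is an approximation that works best when the outcome is rare. The HRs in the example are hypothetical, not study results.

```python
import math


def proportion_mediated(hr_total, hr_direct):
    """Difference method on the log-hazard scale:
    (log HR_total - log HR_direct) / log HR_total.
    hr_total: HR from the model without the mediator;
    hr_direct: HR after adding the mediator (e.g. BMI) to the model."""
    b_total, b_direct = math.log(hr_total), math.log(hr_direct)
    if b_total == 0:
        raise ValueError("total effect is null; proportion mediated is undefined")
    return (b_total - b_direct) / b_total


# Hypothetical HRs for illustration only.
print(f"{proportion_mediated(0.80, 0.89):.1%}")
```

Confidence intervals for this proportion are usually obtained by bootstrapping, which is not shown.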

Results

The baseline characteristics of the 159,222 participants are shown in Table 1. The mean age was 58.0 ± 8.0 years, and 55.7% were female. Participants completed a mean (SD) of 2.2 (1.2) 24-h dietary assessments. The overall PDI ranged from 25 to 74, hPDI from 27 to 82, and uPDI from 27 to 78. Participants with higher overall PDI and hPDI but lower uPDI were more likely to be female, well educated, and non-current smokers, and had lower BMI. Total energy intake was higher among participants with higher overall PDI but lower among those with higher hPDI and uPDI. Baseline characteristics and PDIs were generally similar between the total population and the subset with MRI-PDFF data (n = 20,692; Additional file 1: Table S4). Baseline characteristics by NAFLD status are shown in Additional file 1: Table S5.

Table 1 Baseline characteristics of 159,222 participants by three plant-based diet indices

During a median follow-up of 9.5 years, 1541 NAFLD cases were documented. RCS analyses showed no significant departure from linearity for the associations of overall PDI, hPDI, and uPDI with NAFLD risk (Fig. 1; p for non-linearity > 0.05 for all PDIs). The cumulative risks of NAFLD by PDI quintiles are shown in Additional file 1: Figure S3. Compared with the lowest quintile, the multivariable-adjusted HRs of NAFLD in the highest quintile were 0.78 (95% CI, 0.66–0.93; p-trend = 0.02), 0.74 (95% CI, 0.62–0.87; p-trend < 0.0001), and 1.24 (95% CI, 1.05–1.46; p-trend = 0.02) for overall PDI, hPDI, and uPDI, respectively. These associations were stronger without adjustment for BMI (Table 2). Each 10-point increment in overall PDI, hPDI, and uPDI was associated with an 11% lower, 20% lower, and 14% higher risk of NAFLD, respectively (HRs 0.89 [95% CI, 0.81–0.97], 0.80 [95% CI, 0.73–0.88], and 1.14 [95% CI, 1.05–1.24]; Table 2). When liver fat content was assessed by MRI-PDFF, with additional adjustment for age at MRI in the final model, higher overall PDI and hPDI were associated with lower intrahepatic fat content (β [95% CI] per 10-point increment, −0.34 [−0.44, −0.25] and −0.45 [−0.54, −0.36], respectively), whereas higher uPDI was associated with higher liver fat content (β [95% CI], 0.41 [0.32, 0.49]; Fig. 2; Additional file 1: Table S6).
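The percentage changes quoted above follow directly from the hazard ratios, and a per-point Cox coefficient can be rescaled to a 10-point increment under the log-linear association that the spline analysis supports. A minimal sketch (function names are hypothetical):

```python
import math


def percent_change(hr):
    """Percent change in the hazard implied by a hazard ratio (negative = lower risk)."""
    return (hr - 1.0) * 100.0


def hr_for_increment(beta_per_point, points=10):
    """HR for a `points`-point increment from a per-point log-HR (Cox beta),
    assuming the log-hazard is linear in the index."""
    return math.exp(points * beta_per_point)


for label, hr in (("overall PDI", 0.89), ("hPDI", 0.80), ("uPDI", 1.14)):
    print(f"{label}: HR {hr:.2f} per 10 points -> {percent_change(hr):+.0f}%")
```

The same conversion applies to the quintile contrasts: an HR of 0.78 corresponds to a 22% lower risk, and 1.24 to a 24% higher risk.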

Fig. 1

Restricted cubic splines for the associations of the overall plant-based diet index, healthful plant-based diet index, and unhealthful plant-based diet index with NAFLD risk. Adjusted for age at the last dietary assessment, sex, education, household income, Townsend deprivation index, assessment center, smoking, alcohol consumption, physical activity, total energy intake, BMI, NAFLD-PRS, first 10 principal components of ancestry, and genotype measurement batch. Abbreviations: BMI, body mass index; CI, confidence interval; HR, hazard ratio; NAFLD, non-alcoholic fatty liver disease; PRS, polygenic risk score

Table 2 Hazard ratios (95% confidence intervals) of NAFLD according to quintiles of overall plant-based diet index, healthful plant-based diet index, and unhealthful plant-based diet index

Fig. 2

Associations of overall plant-based diet index, healthful plant-based diet index, and unhealthful plant-based diet index with MRI-PDFF. Adjusted for age at the last dietary assessment, age at MRI scan, sex, education, household income, Townsend deprivation index, assessment centers, smoking, alcohol consumption, physical activity, total energy, BMI, NAFLD-PRS, first 10 principal components of ancestry, and genotype measurement batch. Abbreviations: MRI, magnetic resonance imaging; NAFLD, non-alcoholic fatty liver disease; PDFF, proton density fat fraction; PRS, polygenic risk score

In the joint analyses of PDIs and PRS with NAFLD risk, compared with the highest-risk combination, most other combinations for overall PDI and hPDI had significantly lower NAFLD risk, with the lowest risk among participants with low genetic risk and the highest PDI or hPDI scores. Conversely, compared with participants with the lowest PRS and lowest uPDI, NAFLD risk increased with higher PRS and uPDI (Additional file 1: Figure S4). In the joint analyses of MRI-PDFF, the β-coefficients increased gradually with increasing PDIs and PRS (Additional file 1: Figure S5). In analyses stratified by genetic risk, the associations of higher overall PDI and lower uPDI with lower NAFLD risk were not modified by genetic susceptibility to NAFLD (P-interaction > 0.05 for both), whereas the association of hPDI with NAFLD risk was significantly modified by genetic risk (P-interaction = 0.03; Additional file 1: Tables S7–S9).

Associations of PDIs with NAFLD risk and liver fat content persisted when using sex-specific quintiles of PDIs, and no significant interaction between sex and PDIs was observed (Additional file 1: Tables S10–S11, Figure S6). We observed no significant effect modification of the associations with NAFLD risk or MRI-PDFF by other major risk factors (age, obesity, energy intake, alcohol consumption, or physical activity), except that the association between uPDI and NAFLD risk was modified by obesity (p for interaction = 0.0009, below the Bonferroni-corrected threshold of 0.05/[3 exposures × 5 effect modifiers] ≈ 0.003; Additional file 1: Tables S12–S13). Associations between PDIs and NAFLD remained largely unchanged when we further adjusted for chronic disorders and liver function (Additional file 1: Table S14, sensitivity analyses 1 and 2) or for metabolic indicators and waist circumference (sensitivity analyses 3 and 4), and when we excluded participants with fewer than two dietary assessments (sensitivity analysis 5) or with less than 2 years of follow-up (sensitivity analysis 6).

BMI significantly mediated the associations between PDIs and NAFLD risk, with proportions mediated of 51.8%, 47.0%, and 46.5% for overall PDI, hPDI, and uPDI, respectively (Additional file 1: Table S15). When each of the 17 food groups was excluded in turn from the PDIs, with adjustment for intake of the excluded food group, the adjusted HRs per 10-point increment in PDIs were not substantially altered. However, higher intakes of nuts, tea, and coffee were associated with lower NAFLD risk, whereas higher intakes of sugar-sweetened beverages and of fish and seafood were associated with higher NAFLD risk (Additional file 1: Table S16). These results suggest that the associations between PDIs and NAFLD risk might be largely driven by higher intakes of coffee and tea and lower intakes of sugar-sweetened beverages and of fish and seafood.
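The Bonferroni threshold quoted for the obesity interaction is a simple division of the nominal α by the number of interaction tests: 3 indices × 5 effect modifiers (age, obesity, energy intake, alcohol consumption, and physical activity, as listed in the text). A minimal sketch with hypothetical names:

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate at most alpha."""
    return alpha / n_tests


n_tests = 3 * 5  # 3 PDIs x 5 effect modifiers
threshold = bonferroni_threshold(0.05, n_tests)
p_interaction = 0.0009  # uPDI x obesity, as reported in the text
print(f"threshold = {threshold:.4f}; significant: {p_interaction < threshold}")
```

The exact threshold is 0.05/15 ≈ 0.0033, so the reported p of 0.0009 remains significant after correction.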

Discussion

In this longitudinal study, we found that greater adherence to plant-based diets, particularly healthful plant-based diets, was associated with lower NAFLD risk and lower liver fat content, whereas a higher uPDI was associated with increased NAFLD risk and higher liver fat content. Compared with the lowest quintile, participants in the highest quintile of overall PDI, hPDI, and uPDI had a 22% lower, 26% lower, and 24% higher risk of NAFLD, and 0.51% lower, 0.71% lower, and 0.72% higher liver fat content, respectively.

Longitudinal evidence on the associations between plant-based diets and NAFLD risk is scarce, and only a few cross-sectional studies have examined PDIs and NAFLD, with inconsistent results [12,13,14,15]. A study of 18,345 participants in the National Health and Nutrition Examination Survey (NHANES), in which NAFLD was defined using the fatty liver index (FLI), showed 21% lower, 24% lower, and 34% higher odds of NAFLD comparing the highest with the lowest tertile of overall PDI, hPDI, and uPDI, respectively [13]. However, in another NHANES study of 3900 participants, only hPDI was significantly associated with a lower prevalence of NAFLD diagnosed by transient elastography, and the positive association between uPDI and NAFLD was substantially attenuated and became non-significant after further adjustment for BMI [15]. Besides sample size, the method of diagnosing NAFLD might also account for these inconsistent findings. Although FLI has frequently been used to define fatty liver disease (FLD) in large population-based studies, MRI-PDFF is a more accurate non-invasive measure of liver steatosis [26]. In another study of 578 participants, the associations of the three PDIs with the likelihood of MRI-diagnosed FLD were not significant, which might be attributable to the limited sample size [14]. In our preregistered analysis, the associations of higher overall PDI and hPDI and lower uPDI with lower NAFLD risk remained significant in the fully adjusted model and in sensitivity analyses; the longitudinal design, large sample size, and more accurate assessment of intrahepatic fat content increase confidence in our findings.

The interplay between genetic risk and PDIs has not been reported previously, and only one study has examined the interaction between overall diet quality and the overall genetic risk of NAFLD [18]. In the Framingham Heart Study, Ma et al. reported that improved diet quality (represented by the Mediterranean diet score and AHEI) modified the association between genetic risk of NAFLD and increases in liver fat content. Although these indices are not PDIs, the abundance of plant-based foods in the Mediterranean dietary pattern and the close correlation between AHEI and hPDI [30] suggest that PDIs might also interact with NAFLD genetic risk. In our analysis, a significant multiplicative interaction between PDIs and NAFLD-PRS on NAFLD risk was observed for hPDI in a sex-specific manner, which might be partly due to the higher NAFLD risk in men than in women [31].

Several food groups might account for the observed associations, including whole grains, tea, coffee, sugar-sweetened beverages, and red meat. The associations of nut, tea, and coffee consumption with lower NAFLD risk in our study are in line with previous findings [32, 33]. These associations might be attributable to the higher intake of dietary fiber, flavonoids, caffeine, phytosterols, and plant proteins that accompanies a plant-based diet rich in healthy plant-based foods [34], all of which have been shown to improve insulin resistance, reduce central obesity, and improve the gut microbiome, and hence to reduce NAFLD risk [35,36,37,38,39]. The significant mediating effects of BMI on the PDI–NAFLD associations observed in our study also partly support these mechanisms. Beyond beneficial foods and nutrients, the positive association between a higher intake of sugar-sweetened beverages and NAFLD risk agrees with a previous umbrella review of meta-analyses [40], and the accompanying fructose intake has been shown to promote liver fat accumulation [41]. In addition, previous evidence showed that red meat consumption was associated with increased NAFLD risk [42], whereas white meat consumption was not [43]. In the PDIs, red meat and white meat are grouped together as meat, and the differing associations of these two types of meat might explain the marginally significant association between meat and NAFLD risk in our analyses.

Our study was based on a relatively large sample with a long follow-up period. We used a validated dietary recall method and all available dietary assessments to quantify participants' dietary intake, which reduced measurement error. Several limitations should be mentioned. First, dietary assessment was based on 24-h recalls, which might be subject to recall bias and lead to misclassification. However, because diet was assessed before NAFLD diagnosis, such misclassification is likely to be non-differential and would tend to bias our results toward the null. In addition, 24-h recalls may not fully represent long-term dietary habits, although the results were not substantially changed when we restricted our analyses to participants who completed at least two dietary assessments. Second, NAFLD cases were ascertained from primary care, hospital inpatient, and death registry data, which might underestimate the true NAFLD incidence. However, it is unlikely that underdiagnosis of NAFLD would differ by diet. Assuming that the specificity of outcome detection is perfect and that sensitivity is below 100% but similar across exposure groups, outcome misclassification would produce little bias in the estimated hazard ratio [44]. Third, although we controlled for the majority of known confounders, residual confounding remains possible. Fourth, insulin resistance status was not available in the UK Biobank, which limited our ability to interpret its role in the associations of PDIs with NAFLD risk and liver fat content. However, further adjustment for waist circumference, a marker of central obesity that is closely associated with insulin resistance, did not materially change the results. Last, our analyses were conducted among participants of European descent, limiting the generalizability of our findings to other ethnic groups.

Conclusions

Our results suggest that greater adherence to overall and healthful plant-based diets is associated with lower NAFLD risk regardless of genetic susceptibility. Conversely, an unhealthful plant-based diet is associated with increased NAFLD risk. Our findings highlight the importance of the quality of plant-based foods when adhering to a plant-based dietary pattern to prevent NAFLD in the general population.