Nonalcoholic fatty liver disease (NAFLD) has become the most common form of chronic liver disease, affecting ~25% of the global population [1]. NAFLD has emerged as the second leading indication for liver transplantation in the United States [2]. Owing to the obesity and diabetes epidemics, the disease burden of NAFLD is projected to increase two- to threefold in Western countries as well as in several Asian areas by 2030 [3]. Although obesity is an important risk factor for NAFLD development, non-obese NAFLD patients have been identified as a large cluster making up ~20% of the worldwide NAFLD population [4], which implies the etiological complexity of this disease as well as the possibility of prevention strategies targeting other modifiable factors. Previous observational studies have identified modifiable factors for NAFLD, including metabolic traits, smoking, alcohol drinking, coffee consumption, and physical activity [5,6,7]. However, certain modifiable exposures, such as cigarette smoking [8] and alcohol drinking [9], have been inconsistently associated with NAFLD risk. The relationship between metabolic syndrome and NAFLD is intertwined, especially in cross-sectional and case–control studies [10]. In addition, whether the associations of the above factors with NAFLD risk are causal remains undetermined due to potential residual confounding and reverse causality in observational studies.

Utilizing genetic variants as instrumental variables, Mendelian randomization (MR) is an epidemiological technique aimed at strengthening causal inference [11]. The approach carries two merits of minimizing confounding and diminishing reverse causality because genetic variants are randomly allocated at conception (thus unrelated to self-adopted and environmental factors) and cannot be modified by the development and progression of the disease [11]. Here, we conducted an MR study to investigate the associations of metabolic and lifestyle factors with risk of NAFLD. We also examined the obesity-independent effects of metabolic features on NAFLD as well as explored the mediators in the association between obesity and NAFLD.


Study design

Figure 1 shows the study design. The present MR study included 14 modifiable factors (5 lifestyle and 9 metabolic factors). We first examined the associations of these factors with NAFLD in a large discovery dataset and then performed a replication analysis in an independent population. To increase the power of the analysis, we combined the estimates from the two data sources. Multivariable MR and mediation analyses were also performed. The analysis was conducted using summary-level data from published genome-wide association studies (GWASs) and the analytic process was in accordance with the STROBE-MR guidelines [12]. All studies included in the cited GWASs had been approved by a relevant review board and all participants had provided informed consent. The present MR analyses were approved by the Swedish Ethical Review Authority (2019‐02793).

Fig. 1 Study design overview
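The step of combining the discovery and replication estimates can be illustrated with a short sketch. The text does not state which pooling method was used, so the fixed-effect inverse-variance approach below is an assumption, and the function name `combine_fixed_effect` and the input numbers are hypothetical.

```python
import math

def combine_fixed_effect(estimates):
    """Pool (log_or, se) pairs with inverse-variance fixed-effect weights."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se

# Hypothetical log odds ratios and standard errors (not taken from the paper).
discovery = (0.30, 0.05)
replication = (0.20, 0.10)

log_or, se = combine_fixed_effect([discovery, replication])
combined_or = math.exp(log_or)
ci = (math.exp(log_or - 1.96 * se), math.exp(log_or + 1.96 * se))
print(f"OR {combined_or:.2f} (95% CI {ci[0]:.2f}, {ci[1]:.2f})")
```

Because the weights are the inverse variances, the larger discovery dataset dominates the pooled estimate, which is why the combined result mainly gains precision rather than changing direction.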

Genetic instrument selection

Single-nucleotide polymorphisms (SNPs) associated with the 14 modifiable factors at the genome-wide significance level (p ≤ 5 × 10⁻⁸) were obtained from the corresponding GWASs (Table 1). We estimated linkage disequilibrium among these SNPs based on the 1000 Genomes European reference panel [13]. SNPs in linkage disequilibrium (r² ≥ 0.01) were excluded and, within each correlated set, the SNP with the smallest p value for the genome-wide association was retained. Genetic instrument selection for the multivariable MR analysis followed the same criteria. For smoking behaviors, two sets of instruments (SNPs for smoking initiation and for the lifetime smoking index) were used for validation. Detailed information on the GWASs of the studied exposures, including the number of participants and adjusted covariates, is presented in Table 1.
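The selection rule above (keep genome-wide significant SNPs, then retain the lowest-p SNP among correlated ones) can be sketched as a greedy clumping loop. This is a simplified illustration, not the authors' pipeline: the function `clump`, the lookup `r2_lookup`, the SNP identifiers, and all numbers are hypothetical, and real clumping tools also restrict comparisons to a genomic window.

```python
def clump(snps, r2, threshold=0.01, p_max=5e-8):
    """Greedy LD clumping.

    snps: dict mapping SNP id -> p value for the exposure association.
    r2:   function (snp_a, snp_b) -> pairwise r^2 from a reference panel.
    """
    # Visit genome-wide significant SNPs from the smallest p value upwards.
    candidates = sorted((s for s, p in snps.items() if p <= p_max),
                        key=lambda s: snps[s])
    kept = []
    for snp in candidates:
        # Keep a SNP only if it is independent of every SNP already kept.
        if all(r2(snp, other) < threshold for other in kept):
            kept.append(snp)
    return kept

# Hypothetical inputs for illustration only.
p_values = {"rsA": 1e-12, "rsB": 1e-10, "rsC": 3e-9, "rsD": 1e-6}
ld = {frozenset(("rsA", "rsB")): 0.6}  # rsA and rsB are correlated

def r2_lookup(a, b):
    return ld.get(frozenset((a, b)), 0.0)

kept = clump(p_values, r2_lookup)
print(kept)
```

Here `rsB` is dropped because it is in linkage disequilibrium with the stronger `rsA`, and `rsD` is dropped because it does not reach genome-wide significance.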

Table 1 Detailed information on used studies

Data sources for NAFLD

Summary-level data (i.e., beta coefficient and corresponding standard error) for the associations of exposure-associated SNPs with NAFLD were extracted from a GWAS meta-analysis including 8434 NAFLD cases and 770,180 non-cases (discovery stage) [14] and another GWAS including 1483 NAFLD cases and 17,781 non-cases (replication stage) [15] (Supplementary Table 1). Four GWASs (the Electronic Medical Records and Genomics, UK Biobank, FinnGen, and Estonian Biobank) were included in the discovery dataset [14]. The replication GWAS comprised data from 11 leading European tertiary liver centers [15]. Case definitions and exclusion criteria in the included NAFLD GWASs are shown in Supplementary Table 1. Detailed information on quality control is available in the cited GWAS papers [14, 15].

Statistical analysis

SNPs in the exposure and outcome datasets were harmonized by coded and reference alleles, and ambiguous SNPs with non-concordant alleles were omitted. We defined ambiguous palindromic SNPs as those with a minor allele frequency > 0.45 and < 0.55, and all possible palindromic SNPs were excluded in a sensitivity analysis. The few missing instruments were not replaced by proxy SNPs, given that a small proportion of missing instruments has limited influence on the results.
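A minimal sketch of the harmonization rules described above is given below. It assumes each SNP record is a dictionary with hypothetical keys `effect_allele`, `other_allele`, and `beta`; the helper names are illustrative, and strand flipping of non-palindromic SNPs is deliberately left out.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def is_palindromic(effect_allele, other_allele):
    """A/T and C/G SNPs read the same on both strands."""
    return COMPLEMENT[effect_allele.upper()] == other_allele.upper()

def is_ambiguous(effect_allele, other_allele, maf, low=0.45, high=0.55):
    """Palindromic SNP whose allele frequency is too close to 0.5 to infer the strand."""
    return is_palindromic(effect_allele, other_allele) and low < maf < high

def harmonize(exposure, outcome):
    """Return the outcome beta aligned to the exposure effect allele, or None."""
    ea, oa = exposure["effect_allele"], exposure["other_allele"]
    out_pair = (outcome["effect_allele"], outcome["other_allele"])
    if out_pair == (ea, oa):
        return outcome["beta"]       # already aligned
    if out_pair == (oa, ea):
        return -outcome["beta"]      # alleles swapped: flip the sign
    return None                      # non-concordant alleles: omit the SNP

# Hypothetical records for illustration only.
exp_snp = {"effect_allele": "A", "other_allele": "G", "beta": 0.05}
out_snp = {"effect_allele": "G", "other_allele": "A", "beta": 0.02}
aligned = harmonize(exp_snp, out_snp)
print(aligned, is_ambiguous("A", "T", 0.48))
```

Flipping the sign when the alleles are swapped keeps both associations expressed per copy of the same allele, which is what the Wald ratio requires.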

The inverse variance weighted (IVW) method was used as the main statistical analysis method and was supplemented by four sensitivity analyses: the weighted median [16], MR-Egger [17], MR-PRESSO [18], and contamination mixture [19] methods. The assumptions and strengths of these methods are summarized in Supplementary Table 2. Given the previously identified shared genetic risk between obesity and metabolic traits [20], we used multivariable MR analysis with adjustment for genetically predicted BMI to assess the independent effects of metabolic traits on NAFLD. Likewise, we performed multivariable MR analysis to disentangle the effects of different blood lipid fractions on NAFLD. The analysis of mediation by metabolic traits of the association between obesity and NAFLD was performed under the network MR framework, where multivariable MR analysis was applied to adjust for the genetic association of the instruments with BMI. The mediation effect was calculated using the formula (total effect − direct effect)/total effect, and the standard error of the mediation estimate was calculated using the propagation of error method [21]. Reverse MR analyses were performed for type 2 diabetes, BMI, and blood lipids based on 6 SNPs as genetic instruments for NAFLD [22]. Cochran’s Q statistic was used to assess the heterogeneity of SNP estimates in each MR association. The p value of the intercept test from MR-Egger regression was used to assess horizontal pleiotropy [17]. Associations with p < 0.004 (0.05/14 exposures) were deemed significant, and associations with p ≥ 0.004 and ≤ 0.05 were regarded as suggestive. The F statistic was calculated to measure the strength of the instruments and power was estimated using an online tool [23]. All tests were two-sided and performed using the TwoSampleMR [24], MR-PRESSO [18] and MendelianRandomization [25] packages in R (version 4.0.2).
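To make the main estimator concrete, the following sketch computes a fixed-effect IVW estimate from per-SNP Wald ratios together with Cochran's Q. It is an illustration under stated assumptions, not the authors' code: the analysis itself used the cited R packages, which also offer random-effects variants that are omitted here, and the function name `ivw` and the summary statistics are hypothetical.

```python
import math

def ivw(beta_exp, beta_out, se_out):
    """Fixed-effect IVW estimate, its standard error, and Cochran's Q.

    Uses first-order weights, i.e. ignores uncertainty in the SNP-exposure betas.
    """
    ratios = [by / bx for bx, by in zip(beta_exp, beta_out)]     # Wald ratios
    weights = [(bx / sy) ** 2 for bx, sy in zip(beta_exp, se_out)]
    estimate = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    # Cochran's Q: weighted squared deviations of each ratio from the pooled
    # estimate; compared against chi-square with (number of SNPs - 1) df.
    q = sum(w * (r - estimate) ** 2 for w, r in zip(weights, ratios))
    return estimate, se, q

# Hypothetical summary statistics for three SNPs (illustration only).
beta_exp = [0.10, 0.20, 0.30]
beta_out = [0.02, 0.04, 0.06]
se_out = [0.01, 0.01, 0.01]

estimate, se, q = ivw(beta_exp, beta_out, se_out)
print(f"log OR per unit exposure {estimate:.3f} (SE {se:.4f}), Q = {q:.3f}")
```

In this toy input every SNP implies the same ratio, so Q is zero; a Q well above its degrees of freedom is what signals the heterogeneity that motivates the sensitivity analyses.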


The F statistic for instruments and estimated power for all analyses are shown in Table 2. All F statistics for the overall instruments were over 10, indicating a good strength of used genetic instruments. The power was low in the analysis of alcohol, coffee, and caffeine consumption, but adequate for the other studied exposures.

Table 2 F statistic and power estimation
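The text does not state which formula was used for the F statistic, so the sketch below shows two commonly used approximations as an assumption: a per-SNP F from the exposure association, and an overall F from the variance explained. The function names and all input values are hypothetical.

```python
def f_per_snp(beta, se):
    """Approximate F statistic for a single instrument: (beta / se)^2."""
    return (beta / se) ** 2

def f_overall(r2, n, k):
    """F statistic for k instruments explaining a fraction r2 of variance in n people."""
    return (r2 * (n - k - 1)) / (k * (1.0 - r2))

# Hypothetical inputs for illustration only.
single = f_per_snp(0.05, 0.01)
overall = f_overall(r2=0.02, n=100_000, k=50)
print(f"per-SNP F = {single:.1f}, overall F = {overall:.1f}")
```

Both values exceed the conventional threshold of 10 referred to in the text, which is the usual rule of thumb for ruling out substantial weak instrument bias.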

Genetic predisposition to smoking was significantly associated with an increased risk of NAFLD in the discovery dataset and the association remained directionally consistent in the replication dataset (Fig. 2). The odds ratios (ORs) of NAFLD were 1.09 (95% confidence interval (CI) 1.02, 1.16; p = 0.008) for a genetically predicted one-standard deviation (SD) increase in the prevalence of smoking initiation and 1.59 (95% CI 1.31, 1.93; p = 3.42 × 10⁻³) for a genetically predicted one-SD change in the lifetime smoking index in the combined analysis. Other studied lifestyle factors, including alcohol drinking, coffee and caffeine consumption, and vigorous physical activity, were suggestively inversely associated with the risk of NAFLD in the combined dataset (Fig. 2). The ORs of NAFLD were 0.61 (95% CI 0.38, 0.96; p = 0.032) for a genetically predicted one-SD increase in log-transformed alcoholic drinks/week, 0.74 (95% CI 0.55, 1.00; p = 0.047) for a genetically predicted 50% increase in coffee consumption, 0.87 (95% CI 0.75, 1.00; p = 0.050) for a genetically predicted 80 mg increase in caffeine consumption, and 0.77 (95% CI 0.61, 0.97; p = 0.026) for genetic predisposition to vigorous physical activity.

Fig. 2 Associations of genetically predicted lifestyle and metabolic factors with risk of nonalcoholic fatty liver disease in discovery, replication, and combined datasets. CI confidence interval, OR odds ratio
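As a reading aid for the estimates above, an odds ratio and its 95% CI can be converted back to a log odds ratio, standard error, and approximate two-sided p value. The sketch below assumes a normal approximation on the log scale and a symmetric CI; the function name `p_from_or_ci` is hypothetical, and because the published CIs are rounded the result only approximates the reported p value.

```python
import math

def p_from_or_ci(odds_ratio, lower, upper, z_crit=1.96):
    """Recover log OR, SE, z, and a two-sided normal p value from an OR and its CI."""
    log_or = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = log_or / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return log_or, se, z, p

# Smoking initiation estimate from the text: OR 1.09 (95% CI 1.02, 1.16).
log_or, se, z, p = p_from_or_ci(1.09, 1.02, 1.16)
print(f"z = {z:.2f}, p = {p:.4f}")
```

For the smoking initiation estimate this gives a p value of roughly 0.009, close to the reported 0.008 once rounding of the CI limits is taken into account.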

Six out of nine metabolic factors were significantly associated with NAFLD risk in the combined dataset (Fig. 2). There were associations for BMI (OR 1.33, 95% CI 1.23, 1.43; p = 1.75 × 10⁻¹² per genetically predicted one-SD increase), waist circumference (OR 1.82, 95% CI 1.48, 2.24; p = 1.28 × 10⁻⁸ per genetically predicted one-SD increase), type 2 diabetes (OR 1.21, 95% CI 1.15, 1.27; p = 7.15 × 10⁻¹⁵ per one-unit increase in log-transformed odds), systolic blood pressure (OR 1.17, 95% CI 1.07, 1.26; p = 2.29 × 10⁻⁴ per genetically predicted 10 mm Hg increase), high-density lipoprotein cholesterol (OR 0.84, 95% CI 0.77, 0.90; p = 6.88 × 10⁻⁶ per genetically predicted one-SD increase), and triglycerides (OR 1.23, 95% CI 1.15, 1.33; p = 3.08 × 10⁻⁸ per genetically predicted one-SD increase). In the reverse MR analysis, genetic liability to NAFLD was associated with lower levels of high-density lipoprotein cholesterol, but not associated with other blood lipids, BMI, or risk of type 2 diabetes (Supplementary Table 3). High heterogeneity was observed in these analyses.

The observed associations were overall consistent across sensitivity analyses and between the discovery and replication datasets (Supplementary Table 4). Moderate-to-high heterogeneity was observed in the analyses for alcohol drinking, type 2 diabetes, and lipid traits (Table 3 and Supplementary Table 4). We detected possible pleiotropy in the analyses for lifetime smoking index, type 2 diabetes, and low-density lipoprotein cholesterol in the discovery dataset and in the analysis for body mass index in the replication dataset (Table 3 and Supplementary Table 4). However, these associations remained consistent after removal of outlier variants in the MR-PRESSO analysis (Table 3 and Supplementary Table 4).

Table 3 Heterogeneity and pleiotropy assessment
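The pleiotropy assessment in Table 3 relies on the MR-Egger intercept. A minimal sketch of that test follows, written as a weighted regression of SNP-outcome on SNP-exposure associations with an intercept. Several choices here are assumptions rather than the cited packages' exact behavior: the function name `mr_egger` is hypothetical, the standard error is inflated only when the residual variance exceeds 1, and the p value uses a normal rather than a t distribution.

```python
import math

def mr_egger(beta_exp, beta_out, se_out):
    """Return MR-Egger slope, intercept, intercept SE, and intercept p value."""
    # Orient every SNP so that its exposure association is positive.
    x = [abs(bx) for bx in beta_exp]
    y = [by if bx >= 0 else -by for bx, by in zip(beta_exp, beta_out)]
    w = [1.0 / s ** 2 for s in se_out]
    n = len(x)
    sw = sum(w)
    sx = sum(wi * xi for wi, xi in zip(w, x))
    sy = sum(wi * yi for wi, yi in zip(w, y))
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    delta = sw * sxx - sx ** 2
    slope = (sw * sxy - sx * sy) / delta
    intercept = (sxx * sy - sx * sxy) / delta
    # Weighted residual variance; values above 1 indicate overdispersion.
    resid_var = sum(wi * (yi - intercept - slope * xi) ** 2
                    for wi, xi, yi in zip(w, x, y)) / (n - 2)
    se_intercept = math.sqrt(max(1.0, resid_var) * sxx / delta)
    p = math.erfc(abs(intercept / se_intercept) / math.sqrt(2))
    return slope, intercept, se_intercept, p

# Hypothetical summary statistics for four SNPs (illustration only).
beta_exp = [0.10, -0.20, 0.30, 0.40]
beta_out = [0.03, -0.05, 0.07, 0.09]
se_out = [0.02, 0.02, 0.02, 0.02]

slope, intercept, se_intercept, p = mr_egger(beta_exp, beta_out, se_out)
print(f"intercept {intercept:.3f} (SE {se_intercept:.3f}), p = {p:.2f}")
```

An intercept that differs from zero suggests directional pleiotropy; in this toy input the intercept is small relative to its standard error, so the test gives no such indication.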

The associations for type 2 diabetes, systolic blood pressure, and triglycerides, but not for high-density lipoprotein cholesterol, remained similar in the MR analysis with adjustment for genetically predicted BMI (Figs. 2 and 3). The inverse association for genetically predicted levels of high-density lipoprotein cholesterol and the positive association for genetically predicted levels of triglycerides became stronger in multivariable MR analyses with adjustment for genetically predicted levels of other related lipid traits (Supplementary Table 5). The association between BMI and NAFLD was attenuated after adjusting for genetic liability to type 2 diabetes and for genetically predicted levels of high-density lipoprotein cholesterol and triglycerides (Table 4). Genetic liability to type 2 diabetes mediated 51.4% (95% CI 13.4%, 89.3%) of the effect of BMI on NAFLD risk (Table 4).

Fig. 3 Genetically predicted BMI-adjusted associations of genetically predicted waist circumference, diabetes, systolic blood pressure, and lipids with risk of nonalcoholic fatty liver disease in combined dataset. CI confidence interval, MVMR multivariable Mendelian randomization, OR odds ratio, UVMR univariable Mendelian randomization

Table 4 Mediation of genetically predicted diabetes, systolic blood pressure, and lipids in the Mendelian randomization association between body mass index and nonalcoholic fatty liver disease
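The mediated proportions in Table 4 follow the formula (total effect − direct effect)/total effect, with the standard error obtained by propagation of error. The sketch below shows one way to carry out that calculation on the log odds scale. It assumes the total and direct estimates are independent (their covariance is ignored), which may differ from the authors' implementation; the function name `mediated_proportion` and the input values are hypothetical.

```python
import math

def mediated_proportion(total, se_total, direct, se_direct):
    """Proportion mediated with a first-order (delta method) standard error."""
    prop = (total - direct) / total
    # Propagation of error for 1 - direct/total, ignoring covariance
    # between the two estimates (an assumption of this sketch).
    var = ((direct / total ** 2) ** 2) * se_total ** 2 \
        + ((1.0 / total) ** 2) * se_direct ** 2
    se = math.sqrt(var)
    return prop, se, (prop - 1.96 * se, prop + 1.96 * se)

# Hypothetical log odds ratios for BMI on NAFLD (illustration only):
# total effect from univariable MR, direct effect after adjusting for a mediator.
prop, se, ci = mediated_proportion(total=0.30, se_total=0.04,
                                   direct=0.15, se_direct=0.05)
print(f"mediated {prop:.1%} (95% CI {ci[0]:.1%}, {ci[1]:.1%})")
```

The wide interval in this toy example mirrors the reported result: a ratio of two uncertain estimates is itself imprecise, so mediated proportions tend to carry broad confidence limits.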


This MR study found that genetic predisposition to smoking, obesity, type 2 diabetes, high blood pressure, and dyslipidemia (low levels of high-density lipoprotein cholesterol and high levels of triglycerides) was associated with an increased risk of NAFLD. The associations for type 2 diabetes, systolic blood pressure, and triglycerides were independent of genetically predicted BMI. High-density lipoprotein cholesterol and triglycerides appeared to be the robust lipid biomarkers associated with NAFLD risk. Type 2 diabetes mediated a large proportion of the effect of BMI on NAFLD risk. There were suggestive inverse associations of genetically predicted moderate alcohol drinking, coffee and caffeine consumption, and vigorous physical activity with NAFLD risk.

An animal study found that four weeks of cigarette smoking worsened liver injury in obese rats with histological features of NAFLD via increased oxidative stress [26]. Observational studies of smoking and NAFLD have been inconsistent, with no association in a cross-sectional study of 90 NAFLD patients [8]. In another cross-sectional study including 2811 participants, individuals with one more pack of cigarettes smoked per day had a 1% higher risk of NAFLD [7]. This positive association was confirmed in a subsequent cohort study of 199,468 Korean adults, where current smoking, pack-years, and urinary cotinine levels (a marker of tobacco smoke exposure) were positively associated with the risk of incident NAFLD [27]. In line with this cohort, our study found robust MR associations of smoking initiation and lifetime smoking index with NAFLD risk in two independent datasets, which strengthens the evidence that this association is causal. In addition, passive smoking in childhood and early adulthood has been associated with an increased later-life risk of developing NAFLD [28]. The mechanisms underlying the association between smoking and NAFLD may be related to insulin resistance, hyperinsulinemia, dyslipidemia, hepatic steatosis, inflammation, and increased levels of catecholamine and glucagon, which can be induced by long-term smoking and nicotine use [27, 29].

Epidemiological evidence on the association between alcohol drinking and NAFLD risk is conflicting. A three-year longitudinal study with repeated measurements across 4 waves, including up to 3773 Japanese adults, found that light to moderate alcohol consumption was associated with a reduced risk of NAFLD in both sexes [30]. This inverse association was also identified in a recent meta-analysis of 14 observational studies [31]. However, moderate alcohol use compared to nondrinking was associated with less improvement in steatosis and in the level of aspartate transaminase in 285 NAFLD patients after a mean follow-up period of 47 months [9]. An MR study including 266 NAFLD cases and 200 non-cases found that lifetime moderate alcohol consumption, proxied by one genetic variant located in the alcohol dehydrogenase (ADH1B) gene, had no beneficial effects on NAFLD disease severity [32]. Nevertheless, animal data revealed that aldehyde dehydrogenase 2 deficiency ameliorated alcoholic fatty liver but worsened liver inflammation and fibrosis in mice [33]. Our MR analysis found a borderline inverse association between moderate alcohol drinking and NAFLD. Additional studies with a large sample size, the ability to assess nonlinear associations, and a comprehensive consideration of confounders, especially healthy lifestyle factors, will be needed to verify our findings and elaborate on the underlying mechanisms.

Habitual coffee consumption has been consistently associated with a reduced risk of NAFLD and the association appears to be in a dose–response way in observational studies [34]. A recent MR study found no statistically significant association of coffee consumption instrumented by 4 SNPs (OR 0.76; 95% CI 0.51, 1.14) and 6 SNPs (OR 0.77; 95% CI 0.48, 1.25) with NAFLD risk using data from the UK Biobank with 1122 cases (one sub-dataset of our analysis) [35]. Compared with this study, our MR analysis with larger power (12 SNPs explaining larger variance in coffee consumption and ~ 9 times more cases) revealed a possible inverse association between coffee consumption and NAFLD risk. Besides, we detected a possible inverse association for caffeine consumption.

An inverse association between physical activity and NAFLD risk was observed in a meta-analysis including 6 cohort and 4 case–control studies [36] and subsequent studies [37, 38]. Compared with individuals with lowest physical activity levels, those with highest levels (similar to vigorous physical activity) had a lower odds of developing NAFLD (risk ratio, 0.79; 95% CI 0.71, 0.89) [36]. This association was in line with our MR finding. Future well-powered MR analysis is warranted to verify our finding.

Metabolic disorders, such as obesity [39], type 2 diabetes [10], increased systolic blood pressure [10], and dyslipidemia [40], have been associated with NAFLD in observational studies. However, whether these metabolic traits are causally associated with NAFLD risk has been unclear, given that most associations were based on observational data. A previous MR study with 1122 NAFLD cases found harmful causal effects of overall and central obesity (represented by BMI and waist-to-hip ratio adjusted for BMI, respectively), type 2 diabetes, and triglyceride levels on NAFLD [41]. These associations were replicated in this updated MR study with more cases. Notably, increased systolic blood pressure and decreased levels of high-density lipoprotein cholesterol were identified as two new causal risk factors for NAFLD in our analysis. We found limited evidence in support of a reverse effect of NAFLD on type 2 diabetes, BMI, and blood lipids other than high-density lipoprotein cholesterol, which is not in line with observational studies [42]. These null findings should be interpreted cautiously given the high heterogeneity in these analyses and the small number of genetic instruments for NAFLD [22]. Thus, the effects of NAFLD on metabolic profiles need to be further explored.

Metabolic traits are usually tightly correlated with overweight and obesity. Estimating the BMI-independent effects of these metabolic traits is of clinical importance for identifying high-risk groups, especially given the large number of lean NAFLD patients [4]. After adjustment for the effects of BMI, our multivariable MR analysis found independent roles of type 2 diabetes, elevated systolic blood pressure, and increased levels of triglycerides in the development of NAFLD. These findings are in line with associations observed in the normal-weight population [43] and suggest that it may be beneficial to promote NAFLD screening as well as lifestyle intervention among individuals with an abnormal profile of glycemic traits, blood pressure, or triglycerides. We also performed multivariable MR analysis among lipid fractions to pinpoint the robust lipid biomarkers associated with NAFLD [44]. We found that high-density lipoprotein cholesterol and triglycerides appeared to be better indicators than the other studied lipid traits. Additionally, mediation analysis found that type 2 diabetes and dyslipidemia might be mediators in the pathway from BMI to NAFLD. This finding highlights that good management of glucose and lipid levels in obese individuals may be an effective way to partly offset the detrimental effects of overweight and obesity on NAFLD.

MR analysis has three important assumptions [11]. First, the selected instrumental variables should be strongly associated with the exposure of interest (assumption 1, relevance). In the present study, we selected SNPs that were associated with the exposures at the genome-wide significance level (p < 5 × 10⁻⁸) as instrumental variables from genome-wide association studies with large sample sizes. Second, used instrumental variables should not be associated with any confounders in the association between the studied exposure and outcome (assumption 2, independence). Given the study was based on summary-level data, a thorough examination of the associations between exposures and possible confounders was not possible. However, these instruments were widely used in previous MR studies on metabolic diseases [45, 46]. Third, the genetic instruments should influence the outcome only via the exposure, not via other alternative pathways (assumption 3, exclusion restriction). Although we could not completely rule out the possibility that our findings might be biased by horizontal pleiotropy, our results remained consistent across several sensitivity analyses and the MR-Egger and MR-PRESSO analyses detected limited evidence in support of strong pleiotropic effects.

There are several strengths of this MR study. The major merit is the MR design, which can minimize confounding and reverse causality to a large extent [11]. We explored the associations in two independent datasets to examine their consistency and then combined the associations from the two data sources to increase the number of cases. Together with the larger number of SNPs used as instrumental variables compared to previous MR studies [32, 35, 41], our analysis should be better powered, even though we might still have overlooked weak associations. The results remained overall consistent across several sensitivity analyses. In addition, we confined our analysis to populations of European descent, which reduces the bias caused by population structure. However, restricting the study population to a single ancestry may limit the generalizability of our findings to other populations.

Limitations need consideration when interpreting our results. The major issue for any MR study is horizontal pleiotropy, which means that the selected genetic instruments influence the risk of the outcome not via the exposure but via alternative pathways. However, pleiotropy is unlikely to have substantially biased our results, for two reasons. First, we detected limited evidence of pleiotropy from the MR-Egger intercept test for most associations in the present MR analysis. For the associations with a significant indication of horizontal pleiotropy, few outliers were detected by the MR-PRESSO analysis and the associations remained consistent or became even stronger after removal of outlying SNPs. Second, we performed multivariable MR for traits with strong phenotypic and genetic correlations and the associations remained stable after adjustment. Sample overlap between the exposure and outcome data sources might exist and thus bias our causal estimates towards the observational associations by inflating weak instrument bias [47]. Nonetheless, our SNPs were selected at the genome-wide threshold (strongly associated with the exposure) and all estimated F statistics were over 10, which indicates that the bias introduced by partial sample overlap should be minimal. Associations for certain exposures differed between the discovery and replication datasets, which might be caused by population differences in certain features, such as the prevalence of obesity and vigorous physical activity. In addition, differences in NAFLD definition might cause heterogeneity in the meta-analysis of associations across data sources. For certain exposures, such as alcohol consumption, nonlinear associations could not be estimated in the present MR analysis based on summary-level genetic statistics. Likewise, gene-environment interactions could not be assessed in summary-level data. The prevalence of different lifestyle and metabolic factors as well as of NAFLD differs by age and sex [1, 48]. Whether the observed associations differ by age and sex could not be examined in the current MR study based on summary-level data and needs further investigation.

In conclusion, the present study provides MR evidence in support of causal roles of smoking, obesity, high systolic blood pressure, and dyslipidemia, characterized by low levels of high-density lipoprotein cholesterol and high levels of triglycerides, in the development of NAFLD. The mediation effects of type 2 diabetes and dyslipidemia in the association between BMI and NAFLD suggest that management of blood glucose and lipids in individuals with obesity plays an important role in NAFLD prevention. The inverse associations for moderate alcohol drinking, coffee and caffeine consumption, and vigorous physical activity need confirmation in well-powered studies.