Causal factors underlying diabetes risk informed by Mendelian randomisation analysis: evidence, opportunities and challenges

Diabetes and its complications cause a heavy disease burden globally. Identifying exposures, risk factors and molecular processes causally associated with the development of diabetes can provide important evidence bases for disease prevention and spur novel therapeutic strategies. Mendelian randomisation (MR), an epidemiological approach that uses genetic instruments to infer causal associations between an exposure and an outcome, can be leveraged to complement evidence from observational and clinical studies. This narrative review aims to summarise the evidence on potential causal risk factors for diabetes by integrating published MR studies on type 1 and 2 diabetes, and to reflect on future perspectives of MR studies on diabetes. Despite the genetic influence on type 1 diabetes, few MR studies have been conducted to identify causal exposures or molecular processes leading to increased disease risk. In type 2 diabetes, MR analyses support causal associations of somatic, mental and lifestyle factors with development of the disease. These studies have also identified biomarkers, some of them derived from the gut microbiota, and molecular processes leading to increased disease risk. These studies provide valuable data to better understand disease pathophysiology and explore potential therapeutic targets. Because genetic association studies have mostly been restricted to participants of European descent, multi-ancestry cohorts are needed to examine the role of different types of physical activity, dietary components, metabolites, protein biomarkers and gut microbiome in diabetes development.


Introduction
Diabetes is a leading health issue that causes severe disease and has a huge economic burden worldwide [1,2]. Many epidemiological studies have assessed the causes of diabetes to provide an evidence base for disease prevention. For example, in type 2 diabetes, an exposure-wide umbrella review including 142 factors identified a wide range of biomarkers, medical conditions and dietary, lifestyle, environmental and psychosocial factors that were associated with the risk of disease [3]. The picture is somewhat different for type 1 diabetes owing to the strong genetic contribution and less influence of external factors. In addition to genetic factors, only a few environmental factors, including birthweight and childhood obesity, have been linked to type 1 diabetes [4]. While results from observational studies have provided initial evidence of potential exposures associated with diabetes, residual confounding and reverse causation limit our understanding of the complex set of factors underlying the development of diabetes. Thus, whether the factors observed in previous observational studies are causally associated with the risk of diabetes remains unconfirmed. A clear appraisal of the causal risk factors for diabetes is of great importance for disease prevention.
Mendelian randomisation (MR) is an epidemiological method that can strengthen causal inference by using genetic variants as instrumental variables [5]. An instrumental variable is a variable that satisfies three main conditions: (1) it is associated with the exposure (relevance assumption); (2) it does not share a common cause with the outcome (independence assumption); and (3) it is related to the outcome only through the exposure (exclusion restriction assumption) (Fig. 1). The text box summarises the common terms used in MR studies and their key concepts and limitations. As genetic variants are randomly assorted at conception and thus are generally unassociated with environmental and self-adopted factors, MR is believed to be less affected by measured and unmeasured confounding factors. This narrative review aims to summarise the evidence on potential causal risk factors for diabetes by integrating published MR studies on type 1 and 2 diabetes, and to reflect on future perspectives of MR studies on diabetes.

Text box: Standard terms and key concepts in MR studies

Collider bias
A bias caused by genetic associations obtained from genome-wide association analysis with adjustment for certain covariates. This can also be caused by stratifying the population based on a collider or studying the association between a risk factor for the disease and disease progression.

Confounding and reverse causality
These two biases can usually be minimised but may be introduced by using invalid genetic instrumental variables, such as genetic variants with pleiotropic associations with confounders.

Horizontal pleiotropy
Instrumental variables influence the outcome not only through the exposure but also through alternative pathways. Horizontal pleiotropy can be balanced or imbalanced. Imbalanced horizontal pleiotropy generates bias in MR estimates. For example, if the genetic variants for smoking are associated with a risk factor for type 2 diabetes, such as physical activity, in an imbalanced way, the MR association between smoking and type 2 diabetes will be biased by horizontal pleiotropy. Whether the association is biased by positive (leading to an exaggerated estimate) or negative (leading to an attenuated estimate) horizontal pleiotropy can be assessed by comparing the associations in the inverse variance weighted and MR-Egger regression methods.

Instrumental variables
Genetic variants (i.e. SNPs) that are strongly associated with the exposure and not associated with confounders and that affect the outcome merely via the exposure, such as cis-SNPs in the gene that encodes a protein.

Linkage disequilibrium
The non-random assortment of genetic variants, which can be understood as the correlation between genetic variants. Linkage disequilibrium can be used as a criterion for genetic instrument selection and for identifying proxy genetic variants.

MR
An epidemiological method for causal inference based on observational genetic data.

MR sensitivity analyses
MR statistical analysis methods supplementing the main analysis (usually the inverse variance weighted method). These include a wide range of methods, such as the frequently used weighted median, weighted mode, MR-Egger, MR-PRESSO and contamination mixture methods, and newly developed methods based on summary-level data, such as the MR-Cause method. These analyses can examine the robustness of MR results and provide indications about outliers, heterogeneity and pleiotropy.

Multiple instruments
A genetic score based on more than one genetic instrumental variable.

Multivariable MR analysis
An analysis that includes at least two traits proxied by instrumental variables and provides MR estimates after mutual adjustment for the included traits. This method helps to minimise pleiotropy and to estimate mediation.

Non-linear MR
An MR design for examining the non-linear association between the exposure and the outcome. The design requires individual-level data.

One-sample MR
Genetic associations for the exposure and the outcome are obtained from one dataset.

Population structure bias
This bias can be generated by using genetic association data from populations of different ancestries, in which ancestry is correlated with both phenotype and genotype. For example, the results are likely to be biased if MR analysis used genetic instruments from a European population and outcome data based on a non-European population and vice versa. This bias can be minimised by using data from populations of the same ancestry and adjusting for top-ranked genetic principal components.
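As a rough illustration of how genetic instruments yield a causal estimate from summary-level data, the sketch below computes per-variant Wald ratios and combines them with a fixed-effect inverse variance weighted (IVW) estimate. It is a minimal sketch and not the procedure of any study cited here: the function names and the three sets of summary statistics are hypothetical, the first-order standard error ignores uncertainty in the variant-exposure association, and the variants are assumed to be independent and to satisfy the three instrumental variable assumptions.

```python
import math


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Per-variant causal estimate (outcome effect / exposure effect)
    with a first-order standard error."""
    estimate = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)
    return estimate, se


def ivw_estimate(variants):
    """Fixed-effect inverse variance weighted mean of the Wald ratios."""
    weighted_sum = 0.0
    weight_total = 0.0
    for beta_x, beta_y, se_y in variants:
        ratio, se = wald_ratio(beta_x, beta_y, se_y)
        weight = 1.0 / se ** 2
        weighted_sum += weight * ratio
        weight_total += weight
    return weighted_sum / weight_total, math.sqrt(1.0 / weight_total)


# Hypothetical summary statistics, one tuple per independent SNP:
# (variant-exposure beta, variant-outcome log odds ratio, SE of the log odds ratio)
toy_variants = [
    (0.10, 0.030, 0.010),
    (0.20, 0.050, 0.012),
    (0.15, 0.045, 0.011),
]

beta_ivw, se_ivw = ivw_estimate(toy_variants)
odds_ratio = math.exp(beta_ivw)
ci_low = math.exp(beta_ivw - 1.96 * se_ivw)
ci_high = math.exp(beta_ivw + 1.96 * se_ivw)
print(f"IVW OR per unit increase in exposure: {odds_ratio:.2f} "
      f"(95% CI {ci_low:.2f}, {ci_high:.2f})")
```

Because each Wald ratio is weighted by the inverse of its variance, variants with stronger exposure associations and more precise outcome associations dominate the pooled estimate.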

Causal exposures and risk factors for type 1 diabetes
Because there is a strong genetic component in type 1 diabetes, MR studies of type 1 diabetes are limited and only a few potentially modifiable risk factors have been identified (Table 1). Low birthweight [6], childhood obesity [6,7] and a higher abundance of the Bifidobacterium genus [8] have been associated with an increased risk of type 1 diabetes.
MR studies have found no associations of adult body size [6], features of the liver or pancreas [9] or serum 25-hydroxyvitamin D levels [10] with type 1 diabetes.
A protein-wide MR study examined the associations of 1611 circulating protein biomarkers with the risk of type 1 diabetes and identified associations for signal regulatory protein gamma, IL-27 Epstein-Barr virus-induced 3 and chymotrypsinogen B1 [11]. These findings linking certain viral infections, particularly by enteroviruses (e.g. coxsackievirus), with the risk of type 1 diabetes are consistent with recent observational studies [12], thus providing an avenue to better understand and prevent this disease.

Causal exposures and risk factors for type 2 diabetes
Most MR studies on glycaemic outcomes have focused on type 2 diabetes. Our previous exposure-wide MR study examined the associations of 97 exposures with risk of type 2 diabetes using data from the DIAbetes Genetics Replication And Meta-analysis (DIAGRAM) consortium (74,124 cases and 824,006 controls). In total, 34 factors that were possibly causally associated with the risk of type 2 diabetes were identified [13]. In Table 2 we summarise and update the associations of a wide range of exposures with type 2 diabetes from MR studies on diabetes.
Text box (continued): Standard terms and key concepts in MR studies

Two-sample MR
Genetic associations for the exposure and the outcome are obtained from two independent datasets. This design can incorporate summary-level data from multiple sources and thus increase the power. However, the results may be influenced by population features of different studies.

Vertical pleiotropy
Instrumental variables are associated with traits in the same pathway from the exposure to the outcome. In detail, the exposure proxied by genetic instruments may influence a downstream factor and therefore influence the risk of the outcome. This type of pleiotropy does not bias MR estimates. For example, the association between genetically predicted alcohol consumption and type 2 diabetes risk is mediated by blood pressure, which does not violate the MR assumptions.

Weak instrument bias
A bias caused by weak genetic instrumental variables that explain a small phenotypic variance, coupled with a small sample size in relation to the outcome. The strength of the instrumental variable can be assessed by the F statistic. The power of the analysis can be estimated.

Fig. 1 Study design and assumptions of MR analysis. The process of MR analysis is shown from top to bottom. In detail, MR analysis is based on genome-wide association analyses of the exposure and outcome. Genetic instruments for the exposure are independent SNPs that are strongly associated with the exposure of interest in a genome-wide association analysis in an unselected sample, such as a general population. Likewise, summary-level data on the outcome are obtained from a genome-wide association analysis of a binary phenotype that divides the population into cases and controls. The directed acyclic graph represents the study design and assumptions of MR analysis; G indicates the genetic instruments, X indicates the exposure of interest, Y indicates the outcome of interest, and U indicates the confounders. There are three important assumptions in MR analysis. Assumption 1 indicates that the genetic variants used as the instrumental variable should be robustly associated with the exposure. Assumption 2 indicates that the instrumental variable should not be associated with any confounders. Assumption 3 indicates that the instrumental variable used should affect the risk of the outcome only through the risk factor, not through alternative pathways. Regarding causal inference, the MR design resembles that of an RCT; specifically, the random allocation of genetic variants in MR mimics the randomisation process of RCTs, which minimises confounding effects. Source: Manhattan plot reproduced from Ikram et al [75], available under a CC BY 2.5 licence (https://creativecommons.org/licenses/by/2.5/). This figure is available as a downloadable slide.

Table footnotes: Robustness in sensitivity analyses was defined as the association remaining stable in any of the following three sensitivity analyses: weighted median, MR-Egger and MR-PRESSO. For studies on metabolites and proteins, we defined the sensitivity analyses as robust if the associations remained stable in analyses based on different data sources. No sensitivity analysis means that the sensitivity analysis could not be performed because of a limited number of genetic variants. (b) Study based on summary-level data. Abbreviations: 25OHD, 25-hydroxyvitamin D; AA, African American; ALP, alkaline phosphatase; ALT, alanine aminotransferase; AST, aspartate aminotransferase; EA, East Asian; EUR, European; GGT, gamma-glutamyl transferase; NA, not available; SHBG, sex hormone-binding globulin.

Somatic and psychological health status
The results of MR studies of somatic and psychological health status in relation to type 2 diabetes are summarised in Table 2. Contradictory associations were reported for LDL-cholesterol and type 2 diabetes, with an inverse association observed in a European population and a positive association in an African population [13-15]. A recent study further identified that the diabetogenic effect of low levels of LDL-cholesterol might be mediated by increased BMI [16]. Lower levels of bilirubin (a marker of liver function) [17], testosterone [18] and thyrotropin [19] were associated with an increased risk of type 2 diabetes in some MR studies, but not all [13,20-22]. Sex-specific associations were observed for testosterone [23,24], with an increased risk of type 2 diabetes in women but a decreased risk in men with higher testosterone levels [23]. Insomnia, but no other sleep-related traits, was associated with type 2 diabetes [13].
Adiposity-related factors
Similar to the large body of evidence from prospective observational studies, childhood obesity, adulthood overall obesity and central obesity, excessive liver fat and whole-body and visceral fat mass were all associated with an increased risk of type 2 diabetes [9,25-29]. Plasma levels of adiponectin, an adipocyte-secreted hormone, are decreased in individuals with obesity, and lower adiponectin levels were associated with an increased risk of type 2 diabetes [30]. However, this association was inconsistent in MR sensitivity analyses [30], suggesting that the association may be biased by pleiotropy (e.g. from fat mass). Several MR studies have found that lower birthweight, independent of adult body weight, is associated with a higher risk of type 2 diabetes [31], which may suggest a role of the uterine environment and fetal development in the development of type 2 diabetes.
Lifestyle and nutritional factors
MR studies have strengthened the evidence for a causal role of cigarette smoking in type 2 diabetes but have failed to convincingly confirm the effects of physical activity and alcohol and coffee consumption on type 2 diabetes risk [13,32,33]. Although alcohol consumption instrumented by 83 SNPs was not associated with type 2 diabetes, the main SNP that associates with higher alcohol consumption and alcohol abuse in European populations (i.e. rs1229984 in the ADH1B gene) was significantly associated with an increased risk of disease [13]. A robust inverse association between coffee consumption and type 2 diabetes risk has been reported in many observational studies [34]. However, genetically predicted higher coffee consumption was not associated with a decreased risk of type 2 diabetes in MR studies [13,35]. This lack of association may be explained by pleiotropic effects of the SNPs used (e.g. from fat mass or other hot beverages or caffeine-containing drinks) and by the inverse relationship between genetically proxied coffee consumption and plasma caffeine levels (i.e. the genetic variants with the strongest association with higher coffee consumption are associated with lower plasma caffeine levels) [36]. An MR study found an inverse association between circulating 25-hydroxyvitamin D levels and type 2 diabetes risk [37], and this association might be driven by the vitamin D synthesis pathway [37-39]. Lower levels of vitamin K1 (phylloquinone) [40] and higher levels of iron [41] were associated with an increased risk of type 2 diabetes. Eight out of ten plasma fatty acids were found to be associated with type 2 diabetes; however, the associations, with the exception of palmitoleic acid, were driven by SNPs in the FADS1/2 genes [42]. Thus, whether these associations were biased by this pleiotropic gene, which encodes a key enzyme in fatty acid metabolism, remains unknown [43].
Despite the popularity of MR studies for investigating dietary and lifestyle exposures in diabetes and cardiometabolic diseases, there are unique challenges in such studies of these time-varying, compositional and intercorrelated exposures [44]. For example, MR analyses of nutritional exposures based on genetic instruments for a single measure of diet collected in midlife bear an underlying assumption that, on average, the dietary assessment tool is representative of long-term habitual intake. Furthermore, like many behavioural exposures, nutrition is intercorrelated with numerous other lifestyle and environmental factors. Recent studies have documented that confounding and reverse causation affecting traditional epidemiological studies may also impact genetic associations [45]. A recent study has shown that half of the genetic variants associated with diet are the consequence of increased BMI and that it is possible to use genetics to correct for confounding and reverse causation to strengthen genetic correlations and causal inference [45].

IGF-1 and inflammatory biomarkers
Genetically predicted elevated levels of IGF-1, a peptide hormone similar in molecular structure to insulin, were positively associated with the risk of type 2 diabetes [46]. Given the heterogeneous effects of IGF-1-associated SNPs on type 2 diabetes, a recent MR analysis examined several clusters of IGF-1-associated SNPs in relation to type 2 diabetes and specified that this overall positive association might be explained by pathways related to amino acid metabolism and genomic integrity [47]. However, the main cluster of IGF-1-associated SNPs that were associated with a decreased risk of type 2 diabetes mapped to the growth hormone signalling pathway [47], possibly mediated by pleiotropic effects from fat mass, as growth hormone secretion is decreased in obesity [48].
As for inflammatory biomarkers, the IL-1 and IL-6 pathways may be involved in the development of type 2 diabetes [13,49], even though the evidence is weak. One additional minor allele of the IL6R SNP rs7529229 (corresponding to the effect of taking tocilizumab 4-8 mg/kg every 4 weeks) was suggestively associated with a reduced risk of type 2 diabetes (OR 0.97, 95% CI 0.94, 1.00), which implied a possible role of IL-6 receptor blockade in type 2 diabetes prevention.
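To make the term "suggestively associated" concrete, the reported OR and CI can be converted back to the log odds scale. The short worked example below is a sketch under stated assumptions: the function name is hypothetical, the CI is assumed to be a symmetric 95% Wald interval on the log odds scale, and the published limits are rounded to two decimals, so the recovered standard error and p value are only approximate.

```python
import math


def z_and_p_from_or(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Recover the log OR, its standard error, the z score and a two-sided
    p value from an OR and its 95% CI (assumes a Wald-type interval)."""
    log_or = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_or / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return log_or, se, z, p


# Values quoted in the text for one additional minor allele of rs7529229
log_or, se, z, p_value = z_and_p_from_or(0.97, 0.94, 1.00)
print(f"log OR = {log_or:.4f}, SE = {se:.4f}, z = {z:.2f}, p = {p_value:.3f}")
```

With an upper confidence limit of 1.00, the recovered p value sits close to the conventional 0.05 threshold, which is consistent with describing the association as suggestive rather than robust.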
Circulating metabolites and proteins
One of the first demonstrations of the use of MR to study circulating metabolites was in relation to the previously reported epidemiological association between plasma levels of branched-chain amino acids (BCAAs) and the risk of type 2 diabetes [50]. In an MR analysis using genetic variation at the PPM1K locus (which encodes a mitochondrial phosphatase that activates branched-chain α-ketoacid dehydrogenase [BCKD]), an increase in leucine, isoleucine and valine levels was associated with an increased odds of type 2 diabetes [50]. However, given that BCKD has a range of substrates besides leucine, isoleucine and valine, untangling which of these substrates causes type 2 diabetes is challenging. A separate MR analysis of BCAAs showed that higher BCAA levels have no causal effects on insulin resistance but, rather, genetically raised insulin resistance drives higher circulating fasting BCAA levels [51]. A metabolome-wide MR approach confirmed evidence of the strong reverse causal effect, indicating that the genetic predisposition to type 2 diabetes may trigger early changes in valine and leucine [52]. Other products of amino acid catabolism, such as 2-aminoadipic acid (2-AAA) or α-hydroxybutyrate, are strongly associated with incident type 2 diabetes in observational studies [53], but MR studies have failed to demonstrate evidence of causality [54]. There are many reasons for the discrepancies between observational studies and MR studies, but the fact that observational studies have been conducted in a mixture of individuals with normoglycaemia and impaired glucose tolerance could explain these differences. A study in the Framingham cohort restricted to individuals with strict normoglycaemia at baseline (fasting glucose <5.6 mmol/l) provided evidence of a subset of 19 metabolites associated with the risk of diabetes among apparently healthy individuals [55]. Pathway enrichment analyses and MR showed that metabolites in the nitrogen metabolism pathway are causally related to the development of diabetes [55].
Integration of genomic and small molecule data across platforms enables the discovery of regulators of human metabolism and translation into clinical insights. A recent genome-wide meta-analysis of 174 metabolite levels across six cohorts, including up to 86,507 participants, identified ~500 genetic loci influencing metabolite levels [56]. Among many relevant findings for dysglycaemia, the study provided evidence that a missense p.Asp470Asn (rs17681684) variant in the GLP2R gene, which encodes the receptor for glucagon-like peptide 2, was associated with a 4% higher type 2 diabetes risk. Findings from a metabolome-wide MR analysis further identified new metabolites that potentially play a causal role in type 2 diabetes, including betaine, glutamic acid, lysine, alanine and mannose [52].
High-throughput detection and quantification of serum proteins in a large human population can provide insight into the molecular processes underlying diabetes risk. A protein-wide MR study examined the associations of 164 proteins with genome-wide association summary statistics available from the independent INTERVAL study and identified 16 proteins as potentially having a causal effect on the development of type 2 diabetes [57]. A recent protein-wide MR study examined the associations of 1089 circulating protein biomarkers with the risk of type 2 diabetes [58]. The analyses identified 20 proteins that might be causally associated with type 2 diabetes. These findings may provide evidence to support therapeutic development in type 2 diabetes.
MR studies on circulating metabolites and proteins usually employ a cis-variant located in an encoding gene region as the instrumental variable, which satisfies the three key assumptions of MR. However, these MR associations can still be influenced by the genome-wide association analyses of metabolites and proteins, as well as by the corresponding profiling process (possible bias caused by batch effects) [59] and the use of different high-throughput platforms [60]. Of note, using cis-variants as instrumental variables may not always completely rule out horizontal pleiotropy, especially when one gene regulates several metabolites and proteins that are not in a common pathway. In this case, multivariable MR analysis or removing the pleiotropic SNPs may help reduce this bias.
Gut microbiota and related metabolites
With increasing evidence suggesting that the human gut microbiome plays a role in immune function and metabolic disease, there is a need to discriminate between microbiome features that are causal for disease and those that are a consequence of disease or its treatment. A study including genome-wide genetic data, gut metagenomic sequencing and measurements of faecal short-chain fatty acids showed that a host genetic-driven increase in gut production of butyrate was associated with improved insulin response following an oral glucose test. In contrast, abnormalities in the production or absorption of propionate were causally related to an increased risk of type 2 diabetes [61]. Another two-sample MR study identified seven genera of gut microbiota nominally associated with type 2 diabetes [62]. For gut microbiota-related metabolites, a separate study found that genetically predicted higher trimethylamine N-oxide and carnitine levels were not associated with higher odds of type 2 diabetes. However, the study found possible associations of high choline and low betaine levels with an increased risk of type 2 diabetes [63]. Of note, although many genome-wide association analyses of the gut microbiome have been carried out, high-quality MR studies on the gut microbiome in relation to diabetes are limited [8]. This may raise doubt over the applicability of host genetic variants as an instrumental variable to mimic the function of the gut microbiome.

Assessment of included MR studies on diabetes
The overall quality of the MR studies included was satisfactory, with careful genetic instrument selection criteria, comparatively large sample sizes and different approaches to testing the robustness of the findings. As for the examination of the assumptions of MR, assumption 1 was usually found to be satisfied by using genetic variants associated with the exposure of interest at the genome-wide significance level. However, there was no unified threshold for linkage disequilibrium of SNPs. Using a high or low threshold of linkage disequilibrium could lead to an inflated rate of type I and type II errors, respectively. As MR analysis can minimise confounding, the associations are less likely to be biased by confounding but cannot be completely immune to this bias, especially when genetic instruments have large pleiotropic effects. Except for studies using individual-level data, whether genetic instruments were primarily associated with other phenotypes or were associated with confounders was rarely examined in these MR studies. The most common bias in MR analysis is horizontal pleiotropy caused by violation of assumption 3, the exclusion restriction assumption, which means that genetic variants affect the outcome through alternative pathways, not only through the exposure of interest. The associations with type 1 and 2 diabetes summarised in this review were robust in sensitivity analyses, and most studies used MR-Egger or MR pleiotropy residual sum and outlier (MR-PRESSO) to detect potential horizontal pleiotropy. Of note, even though statistical methods can detect and minimise the influence of horizontal pleiotropy, instrumental variable selection is a crucial process for reducing the bias. Using genetic variants in genes with well-understood biological functions as instrumental variables usually satisfies the assumptions of MR analysis and thus generates precise and correct associations. However, it is difficult to identify specific genetic variants for certain exposures, especially for health behaviours and complex phenotypes. Therefore, a thoughtful examination of pleiotropy should be conducted in analyses using multiple genetic instruments. Evidence from observational studies and clinical trials should be used in interpreting MR findings. Robust MR findings, in turn, should be examined in clinical trials. In addition, it is tricky to interpret MR results, especially for binary exposures. Given that the exposure in MR analysis is not an exact phenotype but is proxied by the effects of genetic variants on a certain trait, this genetically proxied exposure usually mimics a lifetime chronic effect, which hinders the exploration of time-specific associations.
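The idea behind using MR-Egger to detect horizontal pleiotropy can be sketched in a few lines. The example below is an illustrative sketch only, not the implementation used in the studies reviewed: the function names and the toy data are hypothetical, standard errors and the significance test for the intercept are omitted, and the toy outcome associations are deliberately constructed with a constant pleiotropic offset of 0.02 on top of a true slope of 0.2.

```python
def orient(beta_x, beta_y):
    """Flip signs so that every variant-exposure association is positive."""
    pairs = [(abs(x), y if x >= 0 else -y) for x, y in zip(beta_x, beta_y)]
    return [p[0] for p in pairs], [p[1] for p in pairs]


def ivw_slope(beta_x, beta_y, se_y):
    """Weighted regression of outcome on exposure associations,
    constrained to pass through the origin (the IVW estimate)."""
    w = [1.0 / s ** 2 for s in se_y]
    num = sum(wi * x * y for wi, x, y in zip(w, beta_x, beta_y))
    den = sum(wi * x * x for wi, x in zip(w, beta_x))
    return num / den


def egger_fit(beta_x, beta_y, se_y):
    """Same weighted regression with a free intercept (MR-Egger).
    Returns (intercept, slope); a non-zero intercept suggests
    imbalanced (directional) horizontal pleiotropy."""
    beta_x, beta_y = orient(beta_x, beta_y)
    w = [1.0 / s ** 2 for s in se_y]
    total = sum(w)
    mean_x = sum(wi * x for wi, x in zip(w, beta_x)) / total
    mean_y = sum(wi * y for wi, y in zip(w, beta_y)) / total
    sxx = sum(wi * (x - mean_x) ** 2 for wi, x in zip(w, beta_x))
    sxy = sum(wi * (x - mean_x) * (y - mean_y)
              for wi, x, y in zip(w, beta_x, beta_y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical data: beta_y = 0.02 + 0.2 * beta_x for every variant
toy_bx = [0.05, 0.10, 0.15, 0.20]
toy_by = [0.03, 0.04, 0.05, 0.06]
toy_se = [0.01, 0.01, 0.01, 0.01]

ivw = ivw_slope(toy_bx, toy_by, toy_se)
intercept, egger_slope = egger_fit(toy_bx, toy_by, toy_se)
print(f"IVW slope = {ivw:.3f}, Egger slope = {egger_slope:.3f}, "
      f"Egger intercept = {intercept:.3f}")
```

In this constructed case the free intercept absorbs the pleiotropic offset and the MR-Egger slope recovers the underlying value, whereas the IVW slope is pulled upwards, which mirrors the comparison of the two methods described in the text box under horizontal pleiotropy.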

Future perspectives
• The null findings in previous MR studies may have been caused by inadequate power, particularly for weak associations of exposures proxied by a few SNPs that explained a small phenotypic variance. For exposures with robust associations in traditional observational studies, the neutral associations in MR studies deserve to be re-examined in well-powered studies with robust genetic instruments for the exposures and large sample sizes for the diabetes outcomes.
• Most previous MR studies were based on summary-level data, which do not allow the exploration of potential non-linear associations (e.g. J- or U-shaped); rather, it can only be assumed that the association is linear without a threshold effect. MR analysis using individual-level data from large-scale biobanks and studies is needed to examine the non-linearity of the associations.
• More effort should be put into MR studies on non-heritable exposures or exposures without genetic association information. For example, MR analyses of the association of diet and physical activity with diabetes risk are warranted.
• Most MR studies have been based on data from European populations. With more and more data available from other populations, such as Asian and African populations, future MR studies are encouraged to include data from multi-ancestry cohorts.
• Even though the associations between protein biomarkers and diabetes risk were examined in a few MR studies [11,58], more independent verification is needed to confirm these findings. In addition, the intermediate roles of blood proteins and metabolites in the pathways from environmental exposure to diabetes should be investigated to provide evidence for treatment and intervention.
• Even though many statistical approaches, such as the weighted median, MR-Egger, MR-PRESSO, MR-Cluster and contamination mixture methods, have been developed to detect pleiotropy and verify the association with different assumptions, more efforts are needed to generate new statistical approaches to handle pleiotropy and other limitations.
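The point about instruments that explain only a small phenotypic variance can be illustrated with the F statistic mentioned in the text box. The sketch below uses the standard approximation F = (R² / (1 − R²)) × ((N − k − 1) / k); the function name and the input values are hypothetical, and the threshold of F > 10 is a commonly used rule of thumb that is assumed here rather than taken from this review.

```python
def f_statistic(r_squared, n_samples, n_variants):
    """Approximate F statistic for a set of genetic instruments.

    r_squared  -- proportion of exposure variance explained by the instruments
    n_samples  -- sample size of the exposure dataset
    n_variants -- number of genetic variants used as instruments
    """
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must be in [0, 1)")
    if n_samples <= n_variants + 1:
        raise ValueError("n_samples must exceed n_variants + 1")
    return (r_squared / (1.0 - r_squared)) * (
        (n_samples - n_variants - 1) / n_variants
    )


# Hypothetical scenarios: 10 SNPs in 10,000 participants
f_strong = f_statistic(r_squared=0.01, n_samples=10_000, n_variants=10)
f_weak = f_statistic(r_squared=0.001, n_samples=10_000, n_variants=10)

for label, f_value in (("1% variance explained", f_strong),
                       ("0.1% variance explained", f_weak)):
    verdict = "adequate" if f_value > 10 else "weak"
    print(f"{label}: F = {f_value:.1f} ({verdict} by the F > 10 rule of thumb)")
```

Under these assumed inputs, a tenfold drop in explained variance moves the instruments from just above the rule-of-thumb threshold to well below it, which is one reason null MR findings based on a few weak SNPs deserve re-examination in larger datasets.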

Conclusion
This review has integrated data from published MR studies on type 1 and 2 diabetes to highlight the many possible causal risk factors for dysglycaemia. While few studies have been conducted for type 1 diabetes, most MR analyses support that social, demographic, metabolic and lifestyle factors are causally associated with the development of type 2 diabetes. More MR studies in multi-ancestry cohorts are needed to examine the role of diet in the development of diabetes. MR investigations based on data on metabolites, protein biomarkers and the gut microbiome may help to illustrate the pathological molecular basis of diabetes.

Supplementary Information
The online version contains a slide of the figure for download available at https://doi.org/10.1007/s00125-023-05879-7.
Funding Open access funding provided by Karolinska Institute.
Authors' relationships and activities JM is an Associate Editor for Diabetologia but played no role in evaluating this manuscript.
Contribution statement All authors were responsible for drafting the article and revising it critically for important intellectual content. All authors approved the version to be published.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .