Recent Developments in Mendelian Randomization Studies

Purpose of Review Mendelian randomization (MR) is a strategy for evaluating causality in observational epidemiological studies. MR exploits the fact that genotypes are not generally susceptible to reverse causation and confounding, due to their fixed nature and Mendel’s First and Second Laws of Inheritance. MR has the potential to provide information on causality in many situations where randomized controlled trials are not possible, but the results of MR studies must be interpreted carefully to avoid drawing erroneous conclusions. Recent Findings In this review, we outline the principles behind MR, as well as assumptions and limitations of the method. Extensions to the basic approach are discussed, including two-sample MR, bidirectional MR, two-step MR, multivariable MR, and factorial MR. We also consider some new applications and recent developments in the methodology, including its ability to inform drug development, automation of the method using tools such as MR-Base, and phenome-wide and hypothesis-free MR. Summary In conjunction with the growing availability of large-scale genomic databases, higher level of automation and increased robustness of the methods, MR promises to be a valuable strategy to examine causality in complex biological/omics networks, inform drug development and prioritize intervention targets for disease prevention in the future.


Introduction
Causal inference in traditional observational epidemiological studies is hampered by the possibility of confounding and reserve causation [1]. Mendelian randomization (MR) is a method that can be used to uncover casual relationships between an exposure and outcome in the presence of such limitations. MR is a form of instrumental variable analysis, where genetic variants are used as proxies for the exposure of interest [2]. As Mendel's Laws of Inheritance dictate, alleles segregate randomly from parents to offspring. Thus, offspring genotypes are unlikely to be associated with confounders in the population. In addition, germ-line genotypes are fixed at conception, and therefore, temporally precede the variables under observation, avoiding issues of reverse causation. The MR method involves finding genetic variants which are associated with an exposure, and then testing the association between these variants and the outcome. The causal "de-confounded" relationship between exposure and outcome can then be estimated when the necessary conditions are satisfied (Fig. 1a).
In understanding how MR works, it can be useful to think of an MR study as being analogous to a randomized controlled trial (RCT), except that genotypes are used to randomize participants into different levels of the exposure/treatment. However, it is important to realize that this analogy is not perfect e.g., RCTs typically involve treatments over a short duration, whereas an individual's genetics influences their biology from conception, meaning that many causal estimates from MR studies might reflect life-long exposures as well as developmental compensation that may arise from inheriting these mutations [4••].
Although initial applications of MR mostly focused on estimating the causal effect of environmental exposures on medically relevant outcomes, in recent years MR has found utility across a wide range of domains including the development of pharmaceutical agents (i.e., drug target validation, drug target repurposing, and side effect identification) and in the interpretation of high-dimensional omics studies. Table 1 lists several recent studies illustrating how MR has been used successfully across a wide variety of different contexts [5-37•, 106].

Core Assumptions Underlying Causal Inference in Mendelian Randomization Studies
In order for a genetic variant to qualify as a valid instrument for causal inference in a MR study, it must satisfy three core assumptions: Assumption 1: The genetic variant must be truly associated with the exposure (NB the SNP need not be the functional variant responsible for the SNP-exposure association). Typically, SNPs which pass genomewide significance (P < 5 × 10 −8 ) and have been replicated in an independent sample are used as instruments in MR studies. The use of weak instruments can bias MR estimates towards the confounded observational estimate in one-sample MR settings and towards the null in two-sample MR settings (with nonoverlapping samples). As common genetic variants The causal relationship between an exposure variable (X) and an outcome (Y) is estimated using genetic variants (Z) as an instrument, regardless of the presence of variables (C) that may confound the observational association between the exposure and outcome. One method of estimation involves calculation of the Wald Ratio, [see Burgess review paper for description of the various instrumental variable (IV) estimators available] [3], where the causal estimate (β IV ) is derived by dividing the estimated regression coefficient of the outcome on the single nucleotide polymorphism (SNP) (β YZ ) by the estimated regression coefficient of the exposure on the SNP (β XZ ). b Two-sample MR. c Bidirectional MR. d Mediation and two-step MR. e Multivariable MR. f Factorial MR frequently explain a small proportion of a trait's variance, it may be useful to combine the effects of many SNPs together in an allelic score and use this as an instrument in MR studies. Assumption 2: The genetic variant should not be associated with confounders of the exposure-outcome relationship. Although it is technically impossible to prove that this assumption holds in a MR study, it may be possible to disprove it by examining the association between the variant and known confounders of the exposure-outcome relationship. Assumption 3: The genetic variant should only be related to the outcome of interest through the exposure under study. This is commonly referred to as the "no pleiotropy" assumption or the exclusion restriction criterion. Horizontal pleiotropy, where a SNP is associated with multiple traits independently of the exposure of interest, potentially violates this assumption. While it is not possible to prove that this assumption holds in an MR study, various extensions of the basic MR design can be used to detect its presence, and estimate the causal effect of the exposure even in the presence of such violation of the assumption (see below).   Conduct MR on a similar exposure (proxy phenotype) for which GWAS data is available, for example, BMI is often used as a proxy of overall adiposity [18,19]. Polygenic score (PGS) approaches can be used when there is a lack of reliable instruments [38•, 39-41], although the results of these analyses must be interpreted cautiously due to concerns about their lack of specificity and possible reintroduction of pleiotropy Population stratification Spurious associations may arise in MR where the genetic variant and the outcome are associated with ancestral background in an admixed or stratified sample Use genetic associations derived from within homogenous populations only. Use summary results statistics from GWAS that have adequately controlled for population substructure through e.g. principal components analysis or linear mixed models Low power Causal estimates are imprecise (wide confidence intervals) and the MR analysis lacks power to detect a causal effect (1probability (type II error)). Power is a function of sample size, variance explained in the exposure by the SNP, causal effect size, strength of confounding, and type 1 error rate. Approximate power can be determined using a freely available web application [33] Same as above for weak instruments

Horizontal pleiotropy
The genetic instrument is associated with the outcome via pathway that does not pass through the exposure of interest [20] Better understanding of the underlying biological function of genetic variants/genes (for example using selected candidate loci). Utilizing variants which directly code for the exposure of interest (e.g., variants in the C-reactive protein gene to proxy serum levels of C-reactive protein). Specialized methods for providing estimates robust to horizontal pleiotropy with some relaxation of the IV assumptions (e.g., MR-Egger regression [42••]; Weighted Median approach [43•]), and for detecting pleiotropy, are described in Table 4 Linkage disequilibrium (LD) Confounding can be re-introduced in the analysis through a variant in LD with the instrument that exerts an effect on the outcome through a pathway other than through the exposure of interest Even when these core assumptions have been met, MR has a number of limitations which need to be considered (summarized in Table 2), and which have been discussed at length elsewhere [2,[45][46][47][48][49]50•, 51•, 52•].

Design of Mendelian Randomization Studies
The term MR covers a variety of approaches that use genetic variants to make inferences about the causal relationship between traits of interest [45, 52•]. Figure 1 illustrates some extensions to the basic MR design which are described in more detail in the paragraphs below.

Two-Sample Mendelian Randomization
Prior to 2011, most MR analyses were conducted using genetic instruments, exposure, and outcome of interest from individuals measured in the same sample (this is termed one-sample MR or single-sample MR). In such a scenario, the causal effect of the exposure on the outcome was typically estimated using 2-stage least-squares (2SLS) regression [53] (Fig. 1a). However, it is also possible to use MR to estimate causal effects where data on the exposure and outcome have been measured in different (or only partially overlapping) samples. This is known as two-sample MR [54•] (Fig. 1b). There are many advantages of using two-sample MR including in situations where it is difficult and/or expensive to measure the exposure and outcome in the same set of individuals (e.g., studies involving molecular gene expression data). Two-sample MR greatly increases the scope of MR analysis and continues to grow in popularity. For example, two-sample MR analyses can be performed on publicly available genomewide association study (GWAS) summary data, a fact that has been taken advantage of by web software (and R packages) like MR-Base [55••]. Two-sample MR is understandably becoming increasingly popular in the research community. The percentage of all MR studies that used the two-sample design framework rose from close to 0% in 2011 to around 40% in 2016 [56].

Bidirectional Mendelian Randomization
In bidirectional MR, instruments for both exposure and outcome are used to evaluate whether the "exposure" variable causes the "outcome" or whether the "outcome" variable Description Solution instruments is not a good idea because estimates of the SNP-exposure association will be biased upwards. In the case of two-sample MR, genetic associations published in discovery GWAS may overestimate the SNP-trait association, particularly if the GWAS is underpowered to detect the particular loci. In the case of GWAS of exposures, this will overestimate the effect of the genetic instrument relative to the exposure and result in bias of the causal estimate towards the null. Likewise, in the case of GWAS of outcomes, winner's curse will overestimate the association between the genetic instrument and the outcome and lead to a bias in causal estimates away from the null In the case of single-sample MR, using an unweighted allelic score of several variants may provide a sensitivity analysis. In the case of two-sample MR, utilize estimates from the replication analysis if these are appropriately precise Trait heterogeneity Genetic instruments are sometimes associated with multiple aspects of traits (exposures). Such heterogeneity does not preclude causal inference but it does undermine the ability to infer causality for particular dimensions of heterogeneous exposures and makes interpretation of MR analyses more difficult Only biological knowledge was able to resolve the particular dimension of the biological pathway causally relevant to certain outcomes (e.g., diseases). Further research is required in this area

Collider bias
Collider bias occurs when the exposure and outcome of interest independently influence a third risk factor, and this third risk factor is conditioned upon. Collider bias is mostly likely to occur in MR studies which are influenced by high attrition rates (loss to follow up), in case only studies (disease progression), and in MR studies where the instruments are generated in a GWAS which conditions one phenotype on another (e.g. waist circumference on CHD adjusted for BMI) [44•] Given the influence of selection and attrition on a study is known, this could lead to biased estimates of both phenotype and genetic association. For example, having DNA available on most participants in a birth cohort study offers the possibility of investigating the extent to which polygenic scores predict subsequent participation, which in turn would enable sensitivity analyses of the extent to which bias might distort estimates causes the "exposure" (Fig. 1c) [57•]. For example, in explaining the observational relationship between low levels of LDL cholesterol and risk of cancer, it may not be clear whether low levels of LDL cholesterol are causal for cancer, whether the presence of (undetected) cancer has a negative effect on LDL cholesterol, or whether the correlation between the two is due to latent confounding [58]. Bidirectional MR can help tease apart these relationships. MR analysis is first performed in one direction (i.e., "exposure" to "outcome"), and then performed in the opposite direction (i.e., "outcome" to "exposure") using the SNPs robustly associated with each trait in the separate GWASs. The approach assumes that the causal association works through an underlying mechanism where it is possible to determine a single causal temporal direction. However, the complexity of biological systems, such as the existence of feedback loops between exposure and outcome variables, may make interpretation of the results of such analyses difficult [52•]. In these situations, it may be possible to use structural equation modeling to estimate feedback loops, although the properties of such approaches have yet to be examined thoroughly [51•].

Two-Step Mendelian Randomization
Two-step MR is used to assess whether an intermediate trait acts as a causal mediator between an exposure and an outcome [59•]. As shown in Fig. 1d, in the first step of the procedure, genetic instruments for the exposure are used to estimate the causal effect of the exposure variable on the potential mediator. In the second step of the procedure, genetic instruments for the potential mediator are used to assess the causal effect of the mediator on the outcome. Evidence of association in both steps implies some degree of mediation of the association between the exposure and the outcome by the intermediate variable.

Multivariable Mendelian Randomization
In some situations, genetic variants are pleiotropically associated with multiple correlated phenotypes. For example, genetic variants associated with lipoprotein metabolism rarely correlate with only one specific lipid fraction [61, 62•]. Single variable MR is likely to result in misleading conclusions regarding causality due to the presence of this horizontal pleiotropy. Multivariable MR is able to overcome this problem by using instruments associated with multiple exposures to jointly estimate the independent causal effect of each of the risk factors on the outcome (Fig. 1e) [63•, 64, 65]. For example, multivariable MR has recently been successfully employed in examining the relationship between high-density lipoprotein cholesterol and coronary heart disease. Univariate MR analyses, which ignore potential pleiotropic effects from other lipid fractions, suggest that increasing HDL levels lowers the risk of coronary heart disease. However, multivariable MR, which is able to account for SNPs' pleiotropic effects through lowdensity lipoprotein and triglyceride levels [21•, 22], indicates that HDL is not causal for coronary heart disease, consistent with much of the evidence from randomized controlled trials [66][67][68].

Factorial Mendelian Randomization
The manner by which causes of disease act together to increase disease risk can have important public health implications, as above-additive effects act together to generate a greater burden of disease in the population [69]. Factorial MR can be used to determine the combined causal effects of the co-occurrence of two or more risk factors for disease [6•, 45] (Fig. 1f). In order to conduct factorial MR, individual level genotype data are required. For example, Ference et al. conducted a factorial MR study in order to investigate the effects of HMGCR and PCSK9 inhibition on CHD risk. In this study, a weighted genetic score for PCKS9 inhibition was constructed (with the weighting based on each SNP's effect on LDL cholesterol levels) and participants were allocated into either a high or low inhibition group based on the median value of the PCSK9 score. The genetic score for HMGCR inhibition was constructed and the individuals were further allocated into groups based on the median value of the HMGCR score (Fig. 1f). The causal estimates for PCSK9 and HMGCR inhibition, and the combined effect of the two on CHD could then be determined. Results from this factorial analysis suggested that HMGCR and PCSK9 inhibition have independent effects on CHD, and act together in an additive manner to reduce CHD risk [16]. Another example of a factorial MR suggested that CETP inhibitors and statins were associated with decreased LDL-C and apoB levels and reduced risk of cardiovascular events. The reduction in CVD risk was proportional to the apoB reduction but less than expected for the LDL-C reduction [70].

Resources for Performing Mendelian Randomization Analyses
MR, and in particular two-sample MR, provides a powerful, cost-efficient, and simple way to investigate potential causal relationships between many different human traits.
Usefully, many GWAS consortia have made the results of their meta-analyses publicly available, greatly facilitating the running of such analyses [71][72][73]. For example, as a centralized GWAS data resource, Phenoscanner [74], can be used to search for genetic association across a large number of phenotypes. In addition, Ben Neale's group have recently provided GWAS results of more than 2400 human traits based on up to 337,000 individuals from the latest UK Biobank release enabling two-sample MR analyses on a very large number of individuals (data can be downloaded from http://www.nealelab.is/blog/2017/7/19/ rapid-gwas-of-thousands-of-phenotypes-for-337000samples-in-the-uk-biobank). Several large-scale biobanks, such as the UK Biobank [75], the China Kadoorie biobank [76], and the HUNT study [77] allow researchers to apply for (a certain level of) genotype and phenotype information on large numbers of participants. These data sources can be used in one-or two-sample MR analyses when combined with other datasets. This idea led to the development of MR-Base [55••], which retrospectively collected, harmonized, and centralized complete GWAS summary datasets from the public domain. The curated summary data corresponds to 135 diseases, almost 2000 phenotypes in 1.5 million individuals and up to 4 billion SNP-trait associations, which is integrated with a software infrastructure (web interface, R package and API) for automating MR analyses. Therefore, MR-Base greatly increases the accessibility of GWAS summary results to other researchers, accelerates identification (discovery strand), prioritization (evidence synthesis strand), and evaluation (translational strand) of intervention targets.

Hypothesis-Free Investigations and "Mining the Phenome"
While there is obvious value in using MR to investigate the relationship between phenotypes for which causality has already been hypothesized, there is also an interest in detecting novel causal relationships. Hypothesis-free study designs such as genome-wide association studies (GWAS) and epigenome-wide association studies (EWAS) have shown tremendous success in recent years, and there are some instances where this strategy has shown promise in detecting putative causal relationships between phenotypes [38•, 51•, 78]. In a recent "one exposure to many outcomes" MR application, Haycock et al. systematically examined the association between telomere length and 22 cancers and 32 primary nonneoplastic diseases. The results suggested that longer telomeres were generally associated with increased risk for sitespecific cancers but reduced risk for some non-neoplastic diseases, including cardiovascular diseases. This study highlighted the power of hypothesis-free MR in building a phenome-wide picture of traits of interest as opposed to the traditional "one exposure to one outcome" MR approach [28•].
Automation and data repositories provide solutions to some of the challenges involved in hypothesis-free MR. They trivialize the process of performing the analysis itself, and go some way towards improving reliability by (a) reducing human error [56] and (b) promoting the use of appropriate sensitivity analyses [3, 42••, 43•, 79, 80•, 81•]. However, many challenges still remain. Statistical power in MR is an issue even in the hypothesis-driven case, but hypothesis-free MR comes with a multiple testing burden that may be highly problematic. The nature of the data used in hypothesis-free MR is quite different from other hypothesis-driven study designs. There is often only a single consortium providing summary data for any one disease or trait which means that replication of a putative association in independent samples can be impossible. The emergence of large biobanks [75] may go some way to avoid this problem for many complex traits, but specific diseases for which cases need to be ascertained will still pose a challenge.
Another practical issue surrounds selecting those results from a hypothesis-free scan that are worthy of follow-up. Horizontal pleiotropy can manifest in many different patterns, which means that knowing the appropriate MR method to use for any particular pair of traits is difficult. Relying on a single method could lead to missed associations through of being overly conservative when there is no pleiotropy, or result in too many false positives because of miss-specifying the pleiotropic model. One method that has been developed recently to address this issue is MR-MoE (MR mixture of experts), which seeks to predict the most appropriate model based on the characteristics of the summary data [82•].
Another potential analytical strategy to mine the phenome would be to screen large publicly available disease and multiomic GWAS summary results for evidence of genetic correlation using LD score regression via LD hub [83••, 84•]. Here, if traits are causally related and have non-zero heritability then there should be non-zero genetic correlations. However, genetic correlations can arise due to genetic confounding and horizontal pleiotropy and do not provide evidence on the direction of causality. Those disease-omic pairs showing evidence of genetic correlation could be followed up by conducting formal MR analyses [85]. One potential drawback of this approach is that the statistical efficiency of LD score regression may not be as high as that of MR in many cases, so selecting the appropriate scenarios in which to apply this as a screening method is important and warrants a priori power calculations.

The Role of MR in Disease Progression and Treatment
To date, the large majority of GWAS identify genetic variants (SNPs) associated with incidence or risk of disease. Such variants are informative for disease prevention, but not necessarily for treatment aimed at influencing disease progression [86,   Provides an unbiased causal estimate in the presence of pleiotropy, by subtracting the pleiotropic effect (of an instrument) estimated in a subgroup of the population for whom the instrument is not associated with the exposure. It assumes that such a sub-population exists, and the measure of genetic pleiotropy in this sub-population is the same as the general population [27,95] Generalized gene-environment interaction models  Table 3 and 4) [44•, 104].

The Development of Approaches to Detect and Correct for Horizontal Pleiotropy in MR Analysis
The possibility of horizontal pleiotropy and the consequent violation of the exclusion restriction criterion are widely seen Assumes a linear relationship between the exposure and outcome and a bivariate normal distribution for the genetic estimates used. Naturally incorporate uncertainty and correlation between SNP-exposure and SNP-outcome parameter estimates. Yield efficient estimates when its specific modelling assumptions are met (and is therefore robust to weak instrument bias in this case), but is sensitive to model misspecification (for example due to invalid instruments). [3,79] Bayesian model averaging Assumes prior probabilities for three pleiotropy models: (1) no pleiotropy; (2)  as the greatest threat to the validity of MR studies. Over the last few years, investigators have developed a suite of approaches that relax the strict requirement that genetic instruments exhibit no horizontal pleiotropy yet still produce causal effect estimates that are asymptotically consistent. These approaches often rely on different sets of assumptions to each other, meaning that if the results from all of these different analyses are largely consistent, then the investigator can be more confident in drawing conclusions regarding causality. MR-Egger regression [42••] is one such approach where given a set of genetic variants that proxy an exposure variable of interest, estimates of the SNP-outcome association are regressed on estimates of the SNP-exposure association (this can be done in a one or two-sample MR framework), where each data point is weighted by the precision of the SNPoutcome coefficients. The slope of the weighted regression is an estimate of the causal effect of the exposure on the outcome. The intercept in this regression is free to vary, and the degree to which it departs from zero, is a function of the degree of directional pleiotropy present in the data.
The MR-Egger approach relaxes the requirement of no horizontal pleiotropy among the SNPs. Instead it assumes that there is no correlation between the gene-exposure association and the direct effect of the genetic variants on the outcome. This is referred to as the InSIDE assumption (Instrument Strength Independent of Direct Effect) and is a weaker requirement than the stricter exclusion restriction criterion. A drawback of the MR-Egger method is that it tends to suffer from low statistical power and is particularly susceptible to bias from weak instruments.
The weighted median estimator [94•] is a complementary method that permits up to 50% of the information in the MR analysis to come from SNPs that are invalid instruments. The mode-based estimate (MBE) further relaxes the assumption required for the weighted median approach and can estimate the causal effect when the most common pleiotropy value across instruments is zero [80•]. In addition, Bayesian modeling alternatives, such as Bayesian model averaging [81•], are under development, and may provide a framework to model pleiotropic effects and further relax MR assumptions, extending the scope of MR analysis.
In some circumstances, effect estimates are not consistent across independent instruments (e.g., with some genetic instruments showing unexpectedly large or small effects on the outcome, given the magnitude of their exposure effect), which could be indicative of horizontal pleiotropy. Formal statistical tests for heterogeneity can be used to assess this, such as Cochran's Q statistic (for IVW) and the Rucker's Q (for MR-Egger) [98,103].
In addition to the above-mentioned methods, visual inspection can be helpful to identify pleiotropic variants (e.g., outlier detection). For example, funnel plots are used to display the MR estimate of individual genetic variants against their precision. Asymmetry in the funnel plot may arise due to some genetic variants having unusually strong effects on the outcome, which is indicative of directional pleiotropy [42••]. In addition, heterogeneous effects can be visualized by scatterplots of the gene-outcome and gene-exposure associations [94•] and forest plots of Wald ratios for each independent genetic instrument. In the leave-one-out plot, one SNP is removed at a time and the overall effect estimate is recalculated so that influential individual SNPs can be identified. As sensitivity analyses, all the above visualization methods are implemented in the MR-Base R package [55••]. Other graphical approaches have been proposed recently, such as the radial plot [99] and Q-contribution plots [103], which can further help to assess heterogeneity across genetic variants and detection of pleiotropic variants.

The Development of Approaches to Assess Instrument Strength
It is important to assess the instrument strength in order to avoid weak instrument bias in MR analysis. When weak instruments are estimated in GWAS with small sample sizes, MR approaches can violate the "NO Measurement Error" (NOME) assumption, which assume that the SNP-exposure associations (weights of the regression) are estimated without measurement error [43•]. For IVW, weak instruments that violate the NOME assumption can be reliably detected using the mean F-statistic [102]. For MR-Egger, the degree of violation of the NOME assumption can be quantified using the I 2 statistic (I GX   2 ), a number ranging between 0 and 1, with higher values indicating less dilution of the causal effect estimate [43•].

Conclusion
MR is a flexible and robust statistical method which uses genetic variants as instrumental variables to detect and quantify causal relationships in observational epidemiological studies. In this review, we have endeavored to illustrate promising new findings and potential pitfalls of MR. The design strategies, assumptions, limitations, and potential of MR have been discussed. Given the growing availability of large-scale genetic resources and automated toolkits for implementing these methods, such as MR-Base and LD hub, we are now able to analyze all pairwise relationships within large multidimensional data sets in a hypothesis-free manner, producing evidence that can then be followed up in subsequent in-depth investigations. Human and Animal Rights and Informed Consent This article contains no studies with human or animal subjects performed by any of the authors.
Open Access This article is distributed under the terms of the Creative Comm ons Attribution 4.0 International License (http:// creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

References
Papers of particular interest, published recently, have been highlighted as: • Of importance •• Of major importance