# Interpreting findings from Mendelian randomization using the MR-Egger method

## Abstract

Mendelian randomization-Egger (MR-Egger) is an analysis method for Mendelian randomization using summarized genetic data. MR-Egger consists of three parts: (1) a test for directional pleiotropy, (2) a test for a causal effect, and (3) an estimate of the causal effect. While conventional analysis methods for Mendelian randomization assume that all genetic variants satisfy the instrumental variable assumptions, the MR-Egger method is able to assess whether genetic variants have pleiotropic effects on the outcome that differ on average from zero (directional pleiotropy), as well as to provide a consistent estimate of the causal effect, under a weaker assumption—the InSIDE (INstrument Strength Independent of Direct Effect) assumption. In this paper, we provide a critical assessment of the MR-Egger method with regard to its implementation and interpretation. While the MR-Egger method is a worthwhile sensitivity analysis for detecting violations of the instrumental variable assumptions, there are several reasons why causal estimates from the MR-Egger method may be biased and have inflated Type 1 error rates in practice, including violations of the InSIDE assumption and the influence of outlying variants. The issues raised in this paper have potentially serious consequences for causal inferences from the MR-Egger approach. We give examples of scenarios in which the estimates from conventional Mendelian randomization methods and MR-Egger differ, and discuss how to interpret findings in such cases.

## Keywords

Mendelian randomization; Instrumental variable; Robust methods; MR-Egger; Summarized data

## Introduction

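A genetic variant is a valid instrumental variable if it is:

- 1. associated with the risk factor;
- 2. not associated with any confounder of the risk factor–outcome association;
- 3. conditionally independent of the outcome given the risk factor and confounders.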

Mendelian randomization-Egger (MR-Egger) is a statistical method that can be employed when the instrumental variable assumptions do not hold, but a weaker assumption is satisfied [9]. The method is being increasingly used in practice, with applications including analyses of the causal effects of plasma urate on coronary heart disease risk [10], of height on income [11], of sleep patterns on type 2 diabetes [12], and of pubertal development on prostate cancer risk [13]. However, critical assessment of the method is lacking. In this paper, we first discuss implementation of the MR-Egger method. We then provide guidance to the applied practitioner on the circumstances in which the method will give reasonable estimates, and on how to interpret these estimates, particularly for cases where the MR-Egger and conventional methods for Mendelian randomization give different results.

## Implementation of the MR-Egger method

MR-Egger consists of three parts: (1) a test that indicates both violations of the instrumental variable assumptions and bias in conventional instrumental variable analysis methods; (2) a test for a causal effect; and (3) an estimate of the causal effect. Software code in R for implementing all of the analyses in this paper is provided in “Appendix A.1” in supplementary material.

### Assumed framework of data and genetic associations

We assume that all relationships between variables (in particular, the genetic associations with the risk factor and with the outcome, and the causal effect of the risk factor on the outcome) are linear with no effect modification. We also assume that all genetic variants are uncorrelated (that is, not in linkage disequilibrium), although conventional instrumental variable methods for analysing summarized data from correlated variants have been developed [14], and similar extensions to the MR-Egger method are discussed later in this paper. The association between genetic variant \(G_j\) (\(j = 1, 2, \ldots , J\)) and the outcome is denoted \(\beta _{Yj}\), and the association between genetic variant \(G_j\) and the risk factor is denoted \(\beta _{Xj}\).

### Inverse-variance weighted method

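The ratio estimate of the causal effect from the *j*th genetic variant is \(\hat{\theta }_j = \frac{\hat{\beta }_{Yj}}{\hat{\beta }_{Xj}}\), where \({\hat{\beta }_{Yj}}\) is estimated from univariable regression of the outcome on the *j*th genetic variant, and likewise \({\hat{\beta }_{Xj}}\) from univariable regression of the risk factor on the *j*th genetic variant. With multiple genetic variants, the ratio estimates from each genetic variant can be averaged using an inverse-variance weighted formula taken from the meta-analysis literature to provide an overall causal estimate known as the inverse-variance weighted (IVW) estimate [17]. This assumes that the ratio estimates all provide independent evidence on the causal effect; this occurs when the genetic variants are uncorrelated. If the variance terms are taken as \(\frac{{{\mathrm{se}}}({\hat{\beta }_{Yj}})^2}{\hat{\beta }_{Xj}^2}\) (this is the first term from a delta method expansion for the ratio estimate [18]), then the pooled estimate (assuming a fixed-effect model) is [19]:

\[
\hat{\theta }_{IVW} = \frac{\sum _j \hat{\beta }_{Yj} \hat{\beta }_{Xj} {{\mathrm{se}}}(\hat{\beta }_{Yj})^{-2}}{\sum _j \hat{\beta }_{Xj}^2 {{\mathrm{se}}}(\hat{\beta }_{Yj})^{-2}}.
\]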

If the pleiotropic effects of the genetic variants (denoted \(\alpha _j\), the effect of the *j*th genetic variant on the outcome that does not operate via the risk factor) are all zero (\(\alpha _j = 0\) for all *j*; in other words, if all genetic variants are valid instrumental variables), then each of the \(\hat{\theta }_j\) will be a consistent estimate of the causal effect, and the overall estimate \(\hat{\theta }_{IVW}\) (a weighted mean of the \(\hat{\theta }_j\)) will be a consistent estimate of the causal effect.
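The IVW calculation can be sketched in a few lines of Python. This is a minimal illustration rather than the code from the paper's appendix: the function name `ivw_estimate`, the argument names and the three-variant dataset are all made up for the example, and the standard error shown is the usual fixed-effect one.

```python
def ivw_estimate(bx, by, se_by):
    """Fixed-effect IVW estimate from summarized data (illustrative sketch).

    bx    -- estimated genetic associations with the risk factor
    by    -- estimated genetic associations with the outcome
    se_by -- standard errors of the associations with the outcome
    """
    # Ratio estimate from each variant: by_j / bx_j
    ratios = [y / x for x, y in zip(bx, by)]
    # Inverse of the first-order delta-method variance: bx_j^2 / se(by_j)^2
    weights = [x ** 2 / s ** 2 for x, s in zip(bx, se_by)]
    estimate = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    # Fixed-effect standard error of the pooled estimate
    std_error = (1.0 / sum(weights)) ** 0.5
    return estimate, std_error


# Made-up summarized data: every variant has ratio estimate 0.5
bx = [0.10, 0.20, 0.30]
by = [0.05, 0.10, 0.15]
se_by = [0.02, 0.02, 0.03]
theta_ivw, se_ivw = ivw_estimate(bx, by, se_by)
```

Because all three ratio estimates equal 0.5 here, the weighted mean is also 0.5 whatever the weights; with heterogeneous ratios, variants with larger \(\hat{\beta }_{Xj}^2 / {{\mathrm{se}}}(\hat{\beta }_{Yj})^2\) dominate the pooled estimate.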

### MR-Egger method

Under the InSIDE assumption, the intercept from the MR-Egger analysis can be interpreted as the average pleiotropic effect of a genetic variant included in the analysis (the weighted mean of the \(\alpha _j\) using the inverse-variance weights \({{\mathrm{se}}}(\hat{\beta }_{Yj})^{-2}\)). If the average pleiotropic effect is zero (known as balanced pleiotropy), then the IVW method gives a consistent estimate of the causal effect (under the InSIDE assumption). Conversely, if the intercept from the MR-Egger analysis is not equal to zero, then either the average pleiotropic effect differs from zero (known as directional pleiotropy) or the InSIDE assumption is violated (or both). Hence, testing the intercept from the MR-Egger analysis provides an assessment of the validity of the instrumental variable assumptions, with a non-zero intercept indicating that the IVW estimate is biased. The test of whether the intercept differs from zero is referred to as the MR-Egger intercept test.
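As a rough sketch of the point estimates involved, MR-Egger can be computed as a weighted linear regression of the genetic associations with the outcome on the genetic associations with the risk factor, using the weights \({{\mathrm{se}}}(\hat{\beta }_{Yj})^{-2}\) and an unconstrained intercept; fixing the intercept at zero recovers the IVW estimate. The function names and data below are illustrative assumptions, the variants are assumed to be already orientated so that all associations with the risk factor are positive, and standard errors and the intercept test itself are omitted.

```python
def mr_egger(bx, by, se_by):
    """Weighted regression of by on bx with a free intercept (sketch)."""
    weights = [1.0 / s ** 2 for s in se_by]  # inverse-variance weights
    total = sum(weights)
    x_mean = sum(w * x for w, x in zip(weights, bx)) / total
    y_mean = sum(w * y for w, y in zip(weights, by)) / total
    sxy = sum(w * (x - x_mean) * (y - y_mean)
              for w, x, y in zip(weights, bx, by))
    sxx = sum(w * (x - x_mean) ** 2 for w, x in zip(weights, bx))
    slope = sxy / sxx                    # MR-Egger causal estimate
    intercept = y_mean - slope * x_mean  # average pleiotropic effect under InSIDE
    return intercept, slope


def ivw_slope(bx, by, se_by):
    """Same weighted regression with the intercept fixed at zero (IVW)."""
    weights = [1.0 / s ** 2 for s in se_by]
    numerator = sum(w * x * y for w, x, y in zip(weights, bx, by))
    return numerator / sum(w * x ** 2 for w, x in zip(weights, bx))


# Made-up data: causal effect 0.5, identical pleiotropic effect 0.02 per variant
bx = [0.10, 0.20, 0.30, 0.40]
by = [0.02 + 0.5 * x for x in bx]
se_by = [0.01, 0.01, 0.02, 0.02]
intercept, slope = mr_egger(bx, by, se_by)
ivw = ivw_slope(bx, by, se_by)
```

In this constructed example the pleiotropic effects are constant and so trivially uncorrelated with the instrument strengths: the MR-Egger intercept recovers 0.02 and the slope recovers 0.5, whereas the IVW slope is pulled above 0.5 by the directional pleiotropy.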

### Intuitive motivation for MR-Egger analysis

Figure 2 provides two examples where the estimates from the IVW and MR-Egger methods differ substantially. The left panel of Fig. 2 is a simulated illustration, whereas the right panel is a real-data example where the risk factor is plasma urate and the outcome is coronary heart disease (CHD) risk [10] (the choice of genetic variants and the associations with plasma urate are taken from White et al. [10]; associations with CHD risk are taken from the CARDIoGRAMplusC4D consortium 2015 data release [23], see Web Table A1 in supplementary material). The horizontal axis of the graph displays the estimated genetic associations with the risk factor (\(\hat{\beta }_{Xj}\)); the vertical axis displays estimated genetic associations with the outcome (\(\hat{\beta }_{Yj}\)). Each point on the graph represents a single genetic variant; lines represent 95% confidence intervals for the genetic associations. For any individual genetic variant, the ratio estimate \(\hat{\theta }_j\) is the gradient of the line connecting the relevant datapoint to the origin. The IVW estimate (solid line) is a weighted mean of these ratio estimates.

A conventional Mendelian randomization analysis—defined as an analysis in which the instrumental variable assumptions are assumed to hold for all of the genetic variants—assesses whether genetic variants that are associated with the risk factor also associate with the outcome. Median-based methods assess whether the majority (or weighted majority) of genetic variants are associated with the outcome. In comparison, MR-Egger assesses whether there is a dose–response relationship between the genetic associations with the risk factor and those with the outcome.

### Orientation of the genetic variants

Genetic associations are usually the per allele associations of the genetic variants with the risk factor and with the outcome. Associations of genetic variants (assumed here to be single nucleotide polymorphisms, SNPs, although other polymorphisms could also be considered) can be quoted with respect to either the major or the minor allele. For example, if a genetic variant has a C allele and a T allele, the association could equivalently be given as (say) 0.243 units per additional copy of the C allele, or as \(-0.243\) units per additional copy of the T allele. There is no prior reason why one orientation should be preferred over the other.

Because the MR-Egger regression includes an intercept term, its results depend on which allele is taken as the reference for each variant. To address this issue, we orientate the genetic variants so that the associations with the risk factor all have the same sign. This means that directional pleiotropy is defined with respect to the risk factor-increasing allele (or equivalently, the risk factor-decreasing allele). Orientating the genetic variants in this way means that the MR-Egger analysis does not depend on the original coding of the genetic variants, and directional pleiotropy is perhaps more likely to be detected. It will be detected if pleiotropic effects tend to act in a consistent direction that corresponds to increases (or decreases) in the risk factor (particularly if the InSIDE assumption is additionally violated and genetic variants having greater associations with the risk factor also have larger pleiotropic effects).
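The orientation step amounts to flipping the sign of both associations for any variant whose association with the risk factor is negative. A minimal sketch, with a hypothetical function name and made-up numbers:

```python
def orientate(bx, by):
    """Recode variants so every association with the risk factor is positive.

    Switching the reference allele changes the sign of both the association
    with the risk factor and the association with the outcome; standard
    errors are unaffected and are therefore not passed in.
    """
    oriented_bx, oriented_by = [], []
    for x, y in zip(bx, by):
        sign = -1.0 if x < 0 else 1.0
        oriented_bx.append(sign * x)
        oriented_by.append(sign * y)
    return oriented_bx, oriented_by


# Second variant is quoted per copy of the risk factor-decreasing allele
bx_oriented, by_oriented = orientate([0.243, -0.1], [0.05, 0.02])
```

The ratio estimate for each variant is unchanged by this recoding, which is why the IVW estimate does not depend on orientation while the MR-Egger fit does.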

As genetic variants included in a Mendelian randomization analysis are usually chosen as those having statistically robust associations with the risk factor, it is unlikely that the identity of the risk factor-increasing allele for a genetic variant is uncertain. However, if a genetic variant has a weak association with the risk factor, then a small change in its association with the risk factor from positive to negative will change its orientation in the MR-Egger analysis, thus potentially having a large impact on the MR-Egger causal estimate and intercept terms. This situation may arise if the genetic variants are chosen with respect to one variable, but associations are estimated with respect to a related variable (for example, genetic variants are chosen on the basis of their association with body mass index, but the risk factor of interest is a site-specific measure of adiposity) or are estimated in another population (for example, genetic variants are chosen on the basis of their association in European-descent individuals, but the associations used in the analysis are estimated in African-descent individuals). It may be prudent in such a situation to orientate variants according to their associations in the larger dataset.

## Interpretation of results from the MR-Egger method

In this section, we present issues relating to the interpretation of results from the MR-Egger method, including the precision of estimates, influence of outlying variants, violations of the InSIDE assumption, and situations where the MR-Egger and conventional methods give differing results.

### Precision of the MR-Egger estimate

While the precision of the IVW estimate depends on the proportion of variance in the risk factor explained by the genetic variants (typically measured by the \(R^2\) statistic) [28], the precision of the MR-Egger estimate additionally depends on the variability between the genetic associations with the risk factor [29]. In a hypothetical case where several genetic variants have almost equal associations with the risk factor, the IVW estimate may be very precise, particularly if the associations with the outcome are similar to each other (Fig. 4; left panel—grey area represents 95% confidence interval for the IVW estimate). However, in the MR-Egger analysis, the precisions of the intercept and causal estimates will be low (Fig. 4; right panel—grey area represents 95% confidence interval for the MR-Egger intercept and causal estimate). This behaviour can be diagnosed using the \(I^2\) statistic from the meta-analysis literature as proposed by Bowden et al. [29]. The Bowden \(I^2\) statistic is a measure of instrument strength for the MR-Egger method; values close to one indicate that the MR-Egger estimate does not suffer from ‘weak instrument bias’. In fact, if the genetic associations with the risk factor are exactly equal, then neither parameter in the MR-Egger regression model is formally identified, and the \(I^2\) statistic is zero.
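A sketch of how such an \(I^2\) statistic might be computed is given below. It assumes the standard meta-analysis definition, \(I^2 = (Q - (J - 1))/Q\) truncated at zero, with Cochran's \(Q\) calculated from the genetic associations with the risk factor and their standard errors; the function name and the `se_bx` argument are illustrative, and Bowden et al. [29] should be consulted for the exact definition and its weighted variants.

```python
def i_squared_gx(bx, se_bx):
    """I^2 for the genetic associations with the risk factor (sketch).

    Assumes the usual meta-analysis form (Q - df) / Q, truncated at zero.
    """
    weights = [1.0 / s ** 2 for s in se_bx]
    mean_bx = sum(w * x for w, x in zip(weights, bx)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the weighted mean
    q = sum(w * (x - mean_bx) ** 2 for w, x in zip(weights, bx))
    df = len(bx) - 1
    if q <= df:
        return 0.0  # no more spread than expected from sampling error alone
    return (q - df) / q


# Made-up examples: well-separated versus identical associations
i2_spread = i_squared_gx([0.1, 0.2, 0.3], [0.01, 0.01, 0.01])
i2_equal = i_squared_gx([0.1, 0.1, 0.1], [0.01, 0.01, 0.01])
```

Under this definition, associations that are well separated relative to their standard errors give a value near one, and identical associations give zero, matching the limiting case described above.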

The standard error of the causal estimate from the MR-Egger method will typically be larger than that from the IVW method; this will always be the case for fixed-effect analyses. A precise MR-Egger estimate requires genetic variants having a wide range of associations with the risk factor. However, as we discuss next, if one genetic variant has a much stronger association with the risk factor than others, then this variant will have a large influence on the coefficients in the MR-Egger regression.

### Influence of outlying variants on MR-Egger estimates

*FTO* gene region) has a much stronger association with the risk factor than other variants [30]. Influential points can be detected by standard regression diagnostic tools, such as calculating Cook’s distances and/or Studentized residuals for all the datapoints [31], and performing a leave-one-out analysis [32]. Cook’s distance measures the influence of a datapoint on the regression estimates (larger values indicate greater influence). A Studentized residual is a residual from the regression model divided by an estimate of its standard error, indicating the goodness-of-fit in the regression model for that point (larger values indicate more outlying points). A leave-one-out analysis is conducted by leaving each genetic variant out of the Mendelian randomization analysis in turn, conducting *J* analyses each with \(J-1\) datapoints.
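A leave-one-out analysis of the MR-Egger point estimates can be sketched as follows. The helper `egger_fit` is a plain weighted regression with weights \({{\mathrm{se}}}(\hat{\beta }_{Yj})^{-2}\); both function names and the five-variant dataset are made up for illustration, with the last variant constructed to have a much stronger association with the risk factor than the others.

```python
def egger_fit(bx, by, se_by):
    """Intercept and slope from weighted regression of by on bx (sketch)."""
    weights = [1.0 / s ** 2 for s in se_by]
    total = sum(weights)
    x_mean = sum(w * x for w, x in zip(weights, bx)) / total
    y_mean = sum(w * y for w, y in zip(weights, by)) / total
    sxy = sum(w * (x - x_mean) * (y - y_mean)
              for w, x, y in zip(weights, bx, by))
    sxx = sum(w * (x - x_mean) ** 2 for w, x in zip(weights, bx))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


def leave_one_out(bx, by, se_by):
    """Refit the model J times, omitting one variant each time."""
    results = []
    for omit in range(len(bx)):
        keep = [j for j in range(len(bx)) if j != omit]
        intercept, slope = egger_fit([bx[j] for j in keep],
                                     [by[j] for j in keep],
                                     [se_by[j] for j in keep])
        results.append((omit, intercept, slope))
    return results


# Four variants lie on a line of slope 0.5; the fifth is a strong outlier
bx = [0.05, 0.06, 0.07, 0.08, 0.30]
by = [0.025, 0.030, 0.035, 0.040, 0.060]
se_by = [0.01] * 5
loo = leave_one_out(bx, by, se_by)
```

Comparing the entries of `loo` shows whether any single variant drives the fit: here the slope returns to 0.5 and the intercept to zero only when the fifth variant is omitted.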

We calculated Cook’s distances and Studentized residuals for all the variants included in the MR-Egger analysis of plasma urate and CHD risk presented in Fig. 2 (right panel). The genetic variant with both the largest Cook’s distance and Studentized residual was not one of the two variants having the greatest association with plasma urate, but the variant having the strongest association with CHD risk (rs653178, nearest gene *ATXN2*). However, the omission of this variant did not substantially affect the MR-Egger analysis (neither the rejection of the intercept test, nor the failure to detect a causal effect).

### Plausibility and violations of the InSIDE assumption

While the MR-Egger intercept test does not require the InSIDE assumption to be satisfied to detect violations of the instrumental variable assumptions, the interpretation of the intercept as an average pleiotropic effect, as well as the assessment and estimation of a causal effect using MR-Egger, do rely on the InSIDE assumption. Equally, although the primary assumption for the IVW method is that all variants are valid instruments, it also provides consistent estimates when the average pleiotropic effect is zero and the InSIDE assumption is satisfied. Although simulations in the initial presentation of the MR-Egger method [9] showed biased estimates with inflated Type 1 error rates when the InSIDE assumption was not satisfied, the bias and Type 1 error inflation were both less than those for the IVW method. However, subsequent simulations have shown that estimates from the MR-Egger method can be more biased and have greater Type 1 error rates than those from the IVW method in settings where pleiotropic effects of multiple genetic variants act through the same confounder [25]. Hence, the InSIDE assumption is crucial to the interpretation of causal inferences from the MR-Egger method in the case of pleiotropy.

Some general plausibility of the InSIDE assumption can be inferred from the observation that genetic associations with different measured variables tend to be uncorrelated with each other, as demonstrated in empirical studies [33]. If all the genetic variants in a Mendelian randomization analysis have pleiotropic effects, but the pleiotropic effects act via unrelated variables that are not confounders of the risk factor–outcome associations, then the InSIDE assumption seems likely to hold. However, if the pleiotropic effects of several variants all act via the same confounder, the pleiotropic effects and instrument strengths will be strongly correlated, as both depend on the magnitude of the associations of the genetic variants with the confounder. Similarly, if a genetic variant has a pleiotropic effect via a confounder, then this will lead to an association with the risk factor (contributing to the instrument strength) and an association with the outcome (contributing to the pleiotropic effect). Crucially, if the effect of the genetic variant on the confounder increases, then its association with both the risk factor and with the outcome will increase. This means that genetic variants with larger effects on confounders will tend to have both larger instrument strengths and larger pleiotropic effects—leading to violation of the InSIDE assumption. It is difficult to imagine how the InSIDE assumption could be satisfied if several genetic variants have pleiotropic effects acting via confounders.
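The argument can be made concrete under a simple linear model. The symbols here are illustrative assumptions rather than notation from the rest of the paper: \(\gamma _j\) is the effect of variant \(j\) on a confounder \(U\), \(\kappa _X\) and \(\kappa _Y\) are the effects of \(U\) on the risk factor and on the outcome, and \(\delta _j\) is the effect of variant \(j\) on the risk factor that does not pass through \(U\). Then

\[
\beta _{Xj} = \delta _j + \kappa _X \gamma _j, \qquad \alpha _j = \kappa _Y \gamma _j ,
\]

and, if \(\delta _j\) and \(\gamma _j\) are uncorrelated across variants,

\[
{\mathrm{Cov}}(\beta _{Xj}, \alpha _j) = \kappa _X \kappa _Y {\mathrm{Var}}(\gamma _j).
\]

This covariance is non-zero whenever \(U\) is a genuine confounder (\(\kappa _X \ne 0\) and \(\kappa _Y \ne 0\)) and the \(\gamma _j\) vary between variants, so in this sketch the InSIDE assumption fails.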

It has been claimed that the InSIDE assumption can be empirically tested by assessing the correlation between the ratio estimates for the individual variants and their associations with the risk factor [34]. However, the ratio estimate includes the association with the risk factor as its denominator, so a correlation between the ratio estimates and the associations with the risk factor would be expected even if all genetic variants were valid instruments.

### Comparing results between MR-Egger and conventional Mendelian randomization analyses

Additionally, if the MR-Egger intercept is larger in magnitude than the association with the outcome of any individual genetic variant (as in Fig. 5, right panel), then this implies (under the InSIDE assumption) that the average pleiotropic effect of a genetic variant on the outcome is larger in magnitude than the observed association with the outcome of every individual genetic variant. This seems implausible, and suggests that the InSIDE assumption is likely to be violated. The test for directional pleiotropy indicates that the genetic variants are not all valid instruments, but the negative MR-Egger estimate is highly dubious as the causal estimates from each variant in turn are all positive.

Finally, if a conventional Mendelian randomization analysis suggests no causal effect, then we would be reluctant to consider evidence from the MR-Egger method, as the method was proposed as a sensitivity analysis for a conventional Mendelian randomization analysis. Although it is possible for pleiotropic effects to bias the conventional Mendelian randomization estimate towards the null, it would seem at least as likely for the MR-Egger estimate to be biased due to violations of the InSIDE assumption or due to the influence of strong variants.

## Discussion

In this paper, we have described the problem of pleiotropy in Mendelian randomization, and the potential solution to this problem represented by the MR-Egger method. We have described how to implement the method, its assumptions, and various issues that may bias estimates. Finally, we have discussed how to interpret discordancies between results from the MR-Egger method and those from conventional Mendelian randomization methods.

While the MR-Egger method is a worthwhile sensitivity analysis for Mendelian randomization, it is by no means a panacea for all violations of the instrumental variable assumptions. Several of the issues raised in this paper have potentially serious consequences for MR-Egger estimates. These include violations of the InSIDE assumption, that is, the assumption that the pleiotropic effects of the genetic variants in the analysis are uncorrelated with the associations of the variants with the risk factor. Violations of this assumption have been shown to lead to increased bias and Type 1 error rate inflation in the MR-Egger method compared with conventional methods in realistic simulations [25]. Another serious issue is that of the influence of outlying variants on MR-Egger estimates. We have shown how even a single genetic variant can have a substantial influence on an MR-Egger analysis, leading to rejection of the MR-Egger intercept test and reversal of the sign of the MR-Egger estimate (Fig. 5). A corollary of this is that Mendelian randomization analyses using the MR-Egger method should still seek to use genetic variants that are valid instrumental variables as far as possible.

### Alternative approaches for sensitivity analysis in Mendelian randomization

MR-Egger is far from the only method for sensitivity analysis in Mendelian randomization. Several reviews of such methods exist in the literature [8, 32, 35]. Approaches divide into those for assessing the validity of the instrumental variable assumptions, and robust methods that give consistent estimates of a causal effect under weaker assumptions than those of a conventional Mendelian randomization analysis (such as the MR-Egger method) [32]. Robust methods generally fall into two categories: (1) methods such as MR-Egger, that replace the instrumental variable assumptions with an alternative assumption or assumptions that are assumed to hold for the set of genetic variants (a similar approach for individual-level data was proposed by Kolesár et al. [22]); and (2) overidentification methods that assume the instrumental variable assumptions hold for some of the genetic variants, but not necessarily for all genetic variants. Individual-level data methods based on this approach have been proposed by Kang et al. [36], and Windmeijer et al. [37].

A simple summarized data robust method that falls into the second category is the weighted median method proposed by Bowden et al. [25]. An unweighted median-based analysis proceeds by calculating the causal estimate from each genetic variant individually (\(\hat{\theta }_j = \frac{\hat{\beta }_{Yj}}{\hat{\beta }_{Xj}}\)), and then calculating the median of these causal estimates. This estimate is consistent for the causal effect provided that at least 50% of the genetic variants are valid instrumental variables, and is unaffected by a few genetic variants with outlying causal estimates. As the sample size increases, the causal estimates from all valid instrumental variables will tend towards the same value, which will equal the median estimate provided that at least 50% of the genetic variants are valid instrumental variables [38]. A weighted median method has also been proposed, in which genetic variants with more precise causal estimates contribute more weight to the analysis [25]. The median-based methods may be more appropriate than the MR-Egger method in scenarios like those in Figs. 4 and 5 if the majority of variants are valid instruments. However, in the scenario in Fig. 2 (left panel), the median-based methods would still suggest a positive causal effect despite evidence for directional pleiotropy.
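The median-based estimates can be sketched as below. The unweighted version follows the description above directly. The weighted version is one possible implementation, assuming weights equal to the inverse of the delta-method variances used for the IVW estimate and linear interpolation between the two ordered ratio estimates that bracket the 50% point of the cumulative weight; the function names and data are illustrative, and Bowden et al. [25] give the authoritative definition.

```python
from statistics import median


def simple_median(bx, by):
    """Unweighted median of the variant-specific ratio estimates."""
    return median(y / x for x, y in zip(bx, by))


def weighted_median(bx, by, se_by):
    """Weighted median by interpolation (sketch; needs at least 2 variants)."""
    ratios = [y / x for x, y in zip(bx, by)]
    weights = [x ** 2 / s ** 2 for x, s in zip(bx, se_by)]
    order = sorted(range(len(ratios)), key=lambda j: ratios[j])
    ratios = [ratios[j] for j in order]
    weights = [weights[j] for j in order]
    total = sum(weights)
    # Cumulative weight up to the midpoint of each variant's own weight
    positions, running = [], 0.0
    for w in weights:
        positions.append((running + 0.5 * w) / total)
        running += w
    # Last ordered estimate whose position lies below 50%
    below = max(j for j, p in enumerate(positions) if p < 0.5)
    fraction = (0.5 - positions[below]) / (positions[below + 1] - positions[below])
    return ratios[below] + fraction * (ratios[below + 1] - ratios[below])


# Made-up data: three valid variants (ratio 0.5) and two with outlying ratios
bx = [0.1, 0.1, 0.1, 0.1, 0.1]
by = [0.05, 0.05, 0.05, 0.20, 0.30]
se_by = [0.01] * 5
theta_median = simple_median(bx, by)
theta_wmedian = weighted_median(bx, by, se_by)
```

With three of the five variants valid, both median estimates equal 0.5, whereas the equally weighted mean of the five ratio estimates is 1.3.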

Another summarized data method that has robustness to outlying variants may be a simple variation of the IVW method using robust regression rather than standard linear regression. For example, regression using MM-estimation with Tukey’s bisquare objective function limits the contribution to the analysis from any single genetic variant [39, 40, 41].

No single method should be relied on for causal inference. A causal finding is more reliable if it is corroborated by multiple methods, particularly if the methods make different assumptions [32]. Methods such as MR-Egger are desirable as sensitivity analyses as they allow all genetic variants to violate the instrumental variable assumptions; however, they require all genetic variants to satisfy an alternative assumption. In contrast, overidentification methods such as the median-based method allow some genetic variants to violate the instrumental variable assumptions in an arbitrary way, although the majority of variants are assumed to satisfy the assumptions. As such, in applied practice a range of sensitivity analyses should ideally be presented, as well as assessments as to whether the instrumental variable assumptions are satisfied for the genetic variants in the analysis.

### Other violations of the instrumental variable assumptions

Violations of the instrumental variable assumptions in the MR-Egger method are expressed here as pleiotropic effects. However, while all violations of the exclusion restriction assumption (the assumption that the effect of a genetic variant on the outcome only operates via the risk factor; this is equivalent to the third instrumental variable assumption as stated in this paper [6]) can be expressed in terms of pleiotropy [15], other violations cannot be. For example, population stratification is the presence of multiple subpopulations within the sample population [3]. If genetic associations with the risk factor, with the outcome, or the frequency of genetic variants differ between these subpopulations, then there may be a spurious association between the genetic variant and the outcome in the overall population. Such population effects, as well as selection effects (for example, the sample under analysis was ascertained conditional on the risk factor, or else the sample somehow is not representative of the population as a whole), are likely to lead to all genetic variants violating the instrumental variable assumptions, and hence consistency conditions for the robust methods presented above would be unlikely to hold.

### Linearity and homogeneity assumptions

Two assumptions that we have made in the specification of the analysis models for both conventional and MR-Egger methods are those of linearity and homogeneity of the causal effect. These assumptions are not necessary to estimate a causal effect; weaker assumptions (such as monotonicity [42] or a weaker version of the homogeneity assumption [43, 44]) can be made [45]. However, the assumptions of linearity and homogeneity ensure that the same causal effect is identified by all genetic variants that are valid instrumental variables. If the linearity and homogeneity assumptions are violated, then the causal estimate from a single variant still provides a valid test of the causal null hypothesis that the risk factor has no causal effect on the outcome [4]; as does the causal estimate from the IVW method, as this is a linear combination of the causal estimates from the individual variants [14].

We view violations of assumptions that lead to inappropriate inferences (inflated Type 1 error rate for tests of the null hypothesis of no causal effect) as first-order concerns, while violations of assumptions that lead only to inappropriate causal estimates (but appropriate causal inferences both with a null and a non-null causal effect) are viewed as second-order concerns (and questions about the causal estimand, such as those arising due to non-collapsibility with a binary outcome [46], as third-order concerns). Violations of the assumptions of linearity and homogeneity of the causal effect are important, as they affect the interpretation of results from MR-Egger and conventional Mendelian randomization methods, and the applicability of causal estimates in practice. However, they will not lead to inappropriate inferences, and as such are less troublesome than violations of the three core instrumental variable assumptions. There are many reasons why Mendelian randomization estimates may differ from the result of intervening on the risk factor in practice (for example, the mechanism of the intervention, the duration of the intervention, and the timing of the intervention) [47], and so an overly literal interpretation of Mendelian randomization estimates is rarely justified, even when the instrumental variable assumptions are satisfied. An important situation under which the assumptions of linearity and homogeneity are satisfied for the risk factor–outcome relationship is when the causal effect is null.

### Extension to correlated variants

The IVW estimate has previously been extended to account for correlated variants, by fitting the regression model of Eq. (3) using generalized weighted linear regression [14]. Rather than the simple weights \({{\mathrm{se}}}(\hat{\beta }_{Yj})^{-2}\), we use a weighting matrix \(\Omega ^{-1}\), where \(\Omega \) has elements \(\Omega _{j_1, j_2} = {{\mathrm{se}}}(\hat{\beta }_{Yj_1}) {{\mathrm{se}}}(\hat{\beta }_{Yj_2}) \rho _{j_1, j_2}\) and \(\rho _{j_1, j_2}\) is the correlation between the \(j_1\)th and \(j_2\)th genetic variants. The IVW estimate accounting for correlation can be calculated either by matrix algebra using the weighting matrix, or by multiplying the genetic associations with the risk factor and outcome by the Cholesky decomposition of the weighting matrix, and then implementing a standard linear regression model with no weighting. A natural extension of the MR-Egger method with correlated variants can be constructed by allowing an intercept term in the generalized weighted linear regression.
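In matrix form, the point estimates take the standard generalized least squares shape. The vector notation is an assumption made here for illustration: \(\hat{\varvec{\beta }}_X\) and \(\hat{\varvec{\beta }}_Y\) are the \(J\)-vectors of genetic associations with the risk factor and with the outcome, and \({\mathbf {1}}\) is a \(J\)-vector of ones. The IVW estimate accounting for correlation is then

\[
\hat{\theta }_{IVW} = \left( \hat{\varvec{\beta }}_X^{T} \Omega ^{-1} \hat{\varvec{\beta }}_X \right) ^{-1} \hat{\varvec{\beta }}_X^{T} \Omega ^{-1} \hat{\varvec{\beta }}_Y ,
\]

and the MR-Egger extension replaces the single regressor by the design matrix \(Z = ({\mathbf {1}}, \hat{\varvec{\beta }}_X)\), giving the intercept and slope jointly as

\[
\left( Z^{T} \Omega ^{-1} Z \right) ^{-1} Z^{T} \Omega ^{-1} \hat{\varvec{\beta }}_Y .
\]

When the variants are uncorrelated, \(\Omega \) is diagonal and both expressions reduce to the weighted regressions with weights \({{\mathrm{se}}}(\hat{\beta }_{Yj})^{-2}\).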

With a fixed number of uncorrelated variants, the MR-Egger estimate is consistent when the weighted covariance between the genetic associations with the risk factor and the pleiotropic effects is zero. The analogous result for consistency in the MR-Egger method with correlated variants is provided in "Appendix A.4" in supplementary material. It is unlikely this criterion will be satisfied if all variants are mutually correlated, as correlations between the variants are likely to lead to correlations between the associations with the risk factor and the pleiotropic effects. However, including more than one variant in each gene region can improve precision of the causal estimate [48].

### Conclusion

A typical frustration for statisticians is that their methodological developments are ignored by the applied field. In the case of the MR-Egger method, the opposite situation is true – MR-Egger has been taken up by the field perhaps too rapidly, and often without understanding of the intricacies of the method and its interpretation. While some of the cautions expressed in this paper are also present in the original paper on MR-Egger, others have only come to light following the application of the method, and trying to understand its results. Similar concerns have been raised elsewhere [25, 29, 31, 49].

While we welcome the widespread adoption of MR-Egger, we hope that this paper aids practitioners in its appropriate use and interpretation, and that the method becomes seen rightly as a sensitivity analysis (and a fallible one) for Mendelian randomization, and one of many sensitivity analyses that can (and should) be used to assess the plausibility of any finding from an applied Mendelian randomization investigation.

## Notes

### Acknowledgements

Stephen Burgess is supported by a Sir Henry Dale Fellowship jointly funded by the Wellcome Trust and the Royal Society (Grant Number 204623/Z/16/Z). Simon G. Thompson is supported by the British Heart Foundation (Grant Number CH/12/2/29428).

## References

- 1. Davey Smith G, Ebrahim S. ‘Mendelian randomization’: can genetic epidemiology contribute to understanding environmental determinants of disease? Int J Epidemiol. 2003;32(1):1–22. doi: 10.1093/ije/dyg070.
- 2. Burgess S, Thompson SG. Mendelian randomization: methods for using genetic variants in causal estimation. London: Chapman & Hall; 2015.
- 3. Davey Smith G, Ebrahim S. Mendelian randomization: prospects, potentials, and limitations. Int J Epidemiol. 2004;33(1):30–42. doi: 10.1093/ije/dyh132.
- 4. Didelez V, Sheehan N. Mendelian randomization as an instrumental variable approach to causal inference. Stat Methods Med Res. 2007;16(4):309–30. doi: 10.1177/0962280206077743.
- 5. Greenland S. An introduction to instrumental variables for epidemiologists. Int J Epidemiol. 2000;29(4):722–9. doi: 10.1093/ije/29.4.722.
- 6. Clarke PS, Windmeijer F. Instrumental variable estimators for binary outcomes. J Am Stat Assoc. 2012;107(500):1638–52. doi: 10.1080/01621459.2012.734171.
- 7. Burgess S, Butterworth AS, Thompson JR. Beyond Mendelian randomization: how to interpret evidence of shared genetic predictors. J Clin Epidemiol. 2015. doi: 10.1016/j.jclinepi.2015.08.001.
- 8. VanderWeele T, Tchetgen Tchetgen E, Cornelis M, Kraft P. Methodological challenges in Mendelian randomization. Epidemiology. 2014;25(3):427–35. doi: 10.1097/ede.0000000000000081.
- 9. Bowden J, Davey Smith G, Burgess S. Mendelian randomization with invalid instruments: effect estimation and bias detection through Egger regression. Int J Epidemiol. 2015;44(2):512–25.
- 10. White J, Sofat R, Hemani G, et al. Plasma urate and coronary heart disease: Mendelian randomisation analysis. Lancet Diabetes Endocrinol. 2015;4:327–36. doi: 10.1016/s2213-8587(15)00386-1.
- 11. Tyrrell J, Jones SE, Beaumont R, Astley CM, Lovell R, Yaghootkar H, Tuke M, Ruth KS, Freathy RM, Hirschhorn JN, et al. Height, body mass index, and socioeconomic status: Mendelian randomisation study in UK Biobank. Br Med J. 2016;352:i582. doi: 10.1136/bmj.i582.
- 12. Jones SE, Tyrrell J, Wood AR, Beaumont RN, Ruth KS, Tuke MA, Yaghootkar H, Hu Y, Teder-Laving M, Hayward C, et al. Genome-wide association analyses in \(\ge\) 119,000 individuals identifies thirteen morningness and two sleep duration loci. bioRxiv. 2016. doi: 10.1101/031369.
- 13. Bonilla C, Lewis SJ, Martin RM, Donovan JL, Hamdy FC, Neal DE, Eeles R, Easton D, Kote-Jarai Z, Al Olama AA, et al. Pubertal development and prostate cancer risk: Mendelian randomization study in a population-based cohort. BMC Med. 2016;14:66. doi: 10.1186/s12916-016-0602-x.
- 14. Burgess S, Dudbridge F, Thompson SG. Combining information on multiple instrumental variables in Mendelian randomization: comparison of allele score and summarized data methods. Stat Med. 2016;35(11):1880–906. doi: 10.1002/sim.6835.
- 15. Kang H, Zhang A, Cai T, Small D. Instrumental variables estimation with some invalid instruments, and its application to Mendelian randomisation. J Am Stat Assoc. 2015. doi: 10.1080/01621459.2014.994705.
- 16. Lawlor D, Harbord R, Sterne J, Timpson N, Davey Smith G. Mendelian randomization: using genes as instruments for making causal inferences in epidemiology. Stat Med. 2008;27(8):1133–63. doi: 10.1002/sim.3034.
- 17. Burgess S, Butterworth AS, Thompson SG. Mendelian randomization analysis with multiple genetic variants using summarized data. Genet Epidemiol. 2013;37(7):658–65. doi: 10.1002/gepi.21758.
- 18. Thomas D, Lawlor D, Thompson J. Re: Estimation of bias in nongenetic observational studies using “Mendelian triangulation” by Bautista, et al. Ann Epidemiol. 2007;17(7):511–3. doi: 10.1016/j.annepidem.2006.12.005.
- 19. Johnson T. Efficient calculation for multi-SNP genetic risk scores. Technical Report, The Comprehensive R Archive Network 2013. http://cran.r-project.org/web/packages/gtx/vignettes/ashg2012.pdf. Accessed 19 Nov 2014.
- 20. Burgess S, Dudbridge F, Thompson SG. Re: “Multivariable Mendelian randomization: the use of pleiotropic genetic variants to estimate causal effects”. Am J Epidemiol. 2015;181(4):290–1.
- 21. Thompson S, Sharp S. Explaining heterogeneity in meta-analysis: a comparison of methods. Stat Med. 1999;18(20):2693–708.
- 22. Kolesár M, Chetty R, Friedman J, Glaeser E, Imbens G. Identification and inference with many invalid instruments. J Bus Econ Stat. 2014. doi: 10.1080/07350015.2014.978175.
- 23. CARDIoGRAMplusC4D Consortium. A comprehensive 1000 genomes-based genome-wide association meta-analysis of coronary artery disease. Nat Genet. 2015;47:1121–30. doi: 10.1038/ng.3396.
- 24. Burgess S. Plasma urate and coronary heart disease: fingerprint match, but no smoking gun. Lancet Diabetes Endocrinol. 2016. doi: 10.1016/S2213-8587(15)00425-8.
- 25. Bowden J, Davey Smith G, Haycock PC, Burgess S. Consistent estimation in Mendelian randomization with some invalid instruments using a weighted median estimator. Genet Epidemiol. 2016;40(4):304–14. doi: 10.1002/gepi.21965.
- 26. Borenstein M, Hedges L, Higgins J, Rothstein H. Introduction to meta-analysis. Chapter 34: generality of the basic inverse-variance method. Chichester: Wiley; 2009.
- 27. Dobson A. An introduction to generalized linear models. London: Chapman & Hall; 2001. doi: 10.1201/9781420057683.
- 28. Burgess S. Sample size and power calculations in Mendelian randomization with a single instrumental variable and a binary outcome. Int J Epidemiol. 2014;43(3):922–9. doi: 10.1093/ije/dyu005.
- 29. Bowden J, Del Greco F, Minelli C, Davey Smith G, Sheehan NA, Thompson JR. Assessing the suitability of summary data for Mendelian randomization analyses using MR-Egger regression: the role of the \(I^2\) statistic. Int J Epidemiol. 2016. doi: 10.1093/ije/dyw220.
- 30. Locke AE, Kahali B, Berndt SI, Justice AE, Pers TH, Day FR, Powell C, Vedantam S, Buchkovich ML, Yang J, et al. Genetic studies of body mass index yield new insights for obesity biology. Nature. 2015;518(7538):197–206. doi: 10.1038/nature14177.
- 31. Corbin LJ, Richmond RC, Wade KH, Burgess S, Bowden J, Smith GD, Timpson NJ. Body mass index as a modifiable risk factor for type 2 diabetes: refining and understanding causal estimates using Mendelian randomisation. Diabetes. 2016. doi: 10.2337/db16-0418.
- 32. Burgess S, Bowden J, Fall T, Ingelsson E, Thompson S. Sensitivity analyses for robust causal inference from Mendelian randomization analyses with multiple genetic variants. Epidemiology. 2017;28(1):30–42. doi: 10.1097/EDE.0000000000000559.
- 33. Pickrell J, Berisa T, Segurel L, Tung JY, Hinds D. Detection and interpretation of shared genetic influences on 40 human traits. Nat Genet. 2016. doi: 10.1038/ng.3570.
- 34. White J, Swerdlow DI, Preiss D, Fairhurst-Hunter Z, Keating BJ, Asselbergs FW, Sattar N, Humphries SE, Hingorani AD, Holmes MV. Association of lipid fractions with risks for coronary artery disease and diabetes. JAMA Cardiol. 2016;1(6):692–9. doi: 10.1001/jamacardio.2016.1884.
- 35. Glymour M, Tchetgen Tchetgen E, Robins J. Credible Mendelian randomization studies: approaches for evaluating the instrumental variable assumptions. Am J Epidemiol. 2012;175(4):332–9. doi: 10.1093/aje/kwr323.
- 36. Kang H, Kreuels B, Adjei O, Krumkamp R, May J, Small DS. The causal effect of malaria on stunting: a Mendelian randomization and matching approach. Int J Epidemiol. 2013;42(5):1390–8. doi: 10.1093/ije/dyt116.
- 37. Windmeijer F, Farbmacher H, Davies N, Davey Smith G, White I. Selecting (in)valid instruments for instrumental variables estimation. 2015. http://www.hec.unil.ch/documents/seminars/iems/1849.pdf.
- 38. Han C. Detecting invalid instruments using L1-GMM. Econ Lett. 2008;101:285–7.
- 39. Mosteller F, Tukey JW. Data analysis and regression: a second course in statistics. Boston, MA, USA: Addison–Wesley; 1977.
- 40. Huber PJ. Robust statistics. Berlin: Springer; 2011.
- 41. Burgess S, Bowden J, Dudbridge F, Thompson SG. Robust instrumental variable methods using multiple candidate instruments with application to Mendelian randomization. 2016. arXiv:1606.03729.
- 42. Imbens GW, Angrist JD. Identification and estimation of local average treatment effects. Econometrica. 1994;62(2):467–75. doi: 10.2307/2951620.
- 43. Robins JM. The analysis of randomized and nonrandomized AIDS treatment trials using a new approach to causal inference in longitudinal studies. In: Health service research methodology: a focus on AIDS. Washington, DC, USA: National Center for Health Services Research; 1989. p. 113–159.
- 44. Hernán M, Robins J. Instruments for causal inference: an epidemiologist’s dream? Epidemiology. 2006;17(4):360–72. doi: 10.1097/01.ede.0000222409.00878.37.
- 45. Swanson S, Hernán M. Commentary: how to report instrumental variable analyses (suggestions welcome). Epidemiology. 2013;24(3):370–4. doi: 10.1097/ede.0b013e31828d0590.
- 46. Burgess S, CHD CRP Genetics Collaboration. Identifying the odds ratio estimated by a two-stage instrumental variable analysis with a logistic regression model. Stat Med. 2013;32(27):4726–47. doi: 10.1002/sim.5871.
- 47. Burgess S, Butterworth A, Malarstig A, Thompson S. Use of Mendelian randomisation to assess potential benefit of clinical intervention. Br Med J. 2012;345:e7325. doi: 10.1136/bmj.e7325.
- 48. Burgess S, Scott R, Timpson N, Davey Smith G, Thompson SG, EPIC-InterAct Consortium. Using published data in Mendelian randomization: a blueprint for efficient identification of causal risk factors. Eur J Epidemiol. 2015;30(7):543–52. doi: 10.1007/s10654-015-0011-z.
- 49. Bowden J, Del Greco MF, Minelli C, Davey Smith G, Sheehan N, Thompson J. A framework for the investigation of pleiotropy in two-sample summary data Mendelian randomization. Stat Med. 2017. doi: 10.1002/sim.7221.
- 50. The Global Lipids Genetics Consortium. Discovery and refinement of loci associated with lipid levels. Nat Genet. 2013;45:1274–83. doi: 10.1038/ng.2797.
- 51. Schunkert H, König I, Kathiresan S, Reilly M, Assimes T, Holm H, Preuss M, Stewart A, Barbalic M, Gieger C, et al. Large-scale association analysis identifies 13 new susceptibility loci for coronary artery disease. Nat Genet. 2011;43(4):333–8. doi: 10.1038/ng.784.
- 52. Do R, Willer CJ, Schmidt EM, Sengupta S, Gao C, Peloso GM, Gustafsson S, Kanoni S, Ganna A, Chen J, et al. Common variants associated with plasma triglycerides and risk for coronary artery disease. Nat Genet. 2013;45:1345–52. doi: 10.1038/ng.2795.

## Copyright information

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.