A Primer in Mendelian Randomization Methodology with a Focus on Utilizing Published Summary Association Data

  • Niki L. Dimou
  • Konstantinos K. Tsilidis
Part of the Methods in Molecular Biology book series (MIMB, volume 1793)


Mendelian randomization (MR) is becoming a popular approach for estimating the causal effect of an exposure on an outcome while overcoming limitations of observational epidemiology. The advent of genome-wide association studies and the growing accumulation of summarized data from large genetic consortia make MR a powerful technique. In this review, we give a primer in MR methodology, describe efficient MR designs and analytical strategies, and focus on methods and practical guidance for conducting an MR study using summary association data. We show that the analysis is straightforward using either the MR-Base platform or available R packages. However, further research is required to develop specialized methodology for assessing MR assumptions.

Key words

Mendelian randomization, Summarized data, Instrumental variable, Causal inference



NLD was supported by the IKY scholarship programme, which is co-financed by the European Union (European Social Fund, ESF) and Greek national funds through the action entitled “Reinforcement of Postdoctoral Researchers” in the framework of the Operational Programme “Human Resources Development Program, Education and Lifelong Learning” of the National Strategic Reference Framework (NSRF) 2014–2020. KKT was supported by the World Cancer Research Fund International Regular Grant Programme (WCRF 2014/1180).



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Hygiene and Epidemiology, University of Ioannina School of Medicine, Ioannina, Greece
  2. Department of Epidemiology and Biostatistics, School of Public Health, Imperial College London, London, UK
