Large-Scale Genomic Biobanks and Cardiovascular Disease

  • Aeron M. Small
  • Christopher J. O’Donnell
  • Scott M. Damrauer
Cardiovascular Genomics (TL Assimes, Section Editor)


Purpose of review

Cardiovascular disease is a leading cause of morbidity and mortality worldwide and is the focus of extensive biomedical research. Large genetic consortia that combine data from many traditional prospective cohorts and ascertained case-control studies have facilitated the discovery of genetic associations for a variety of cardiovascular diseases and risk factors, including diabetes, coronary artery disease, and hypertension. Biobank-based genetic studies offer an alternative in which large populations are genotyped and linked to electronic health records.

Recent findings

Biobank sample sizes worldwide have surpassed even the largest genetic consortia and have yielded key insights into the genetic determinants of both common and rare cardiovascular phenotypes.


Summary

Herein, we provide an overview of the largest genomic biobanks and discuss the relevant advantages and challenges inherent to the biobank model of cohort generation and genomic study design.


Keywords

Biobanks, Cardiovascular disease, Genetics


Compliance with Ethical Standards

Conflict of Interest

Aeron M. Small, Christopher J. O’Donnell, and Scott M. Damrauer declare that they have no conflict of interest.

Human and Animal Rights and Informed Consent

This article does not contain any studies with human or animal subjects performed by any of the authors.



Copyright information

© This is a U.S. Government work and not under copyright protection in the US; foreign copyright protection may apply 2018

Authors and Affiliations

  • Aeron M. Small (1)
  • Christopher J. O’Donnell (2, 3, 4)
  • Scott M. Damrauer (5, 6, 7)
  1. Department of Medicine, Yale New Haven Hospital, Yale University School of Medicine, New Haven, USA
  2. Cardiology Section, Department of Medicine, Veterans Affairs Boston Healthcare System, Boston, USA
  3. Cardiovascular Medicine Division, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, USA
  4. Million Veteran Program, Department of Veterans Affairs, Washington, USA
  5. Corporal Michael Crescenz VA Medical Center, Philadelphia, USA
  6. Perelman School of Medicine, University of Pennsylvania, Philadelphia, USA
  7. Hospital of the University of Pennsylvania, Philadelphia, USA
