Data Integration, Imputation, and Meta-analysis for Genome-Wide Association Studies

  • Protocol
Genome-Wide Association Studies

Part of the book series: Methods in Molecular Biology ((MIMB,volume 2481))

Abstract

As genomic and phenotypic datasets grow, research groups around the world need to collaborate and integrate these valuable resources to maximize their benefit and to increase reference population sizes for genomic prediction and genome-wide association studies (GWAS). However, different studies often use different genotyping techniques, so the genotyped variants must be synchronized through a step called “imputation” before the datasets can be combined. Optimally, different GWAS datasets can be analysed within a meta-analysis, which uses summary statistics rather than the underlying individual-level data. This chapter describes the general principles of genotype imputation and meta-GWAS analysis, together with the study designs and command lines required for such analyses.
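To make the idea of combining summary statistics concrete, the sketch below shows one common approach, a fixed-effect inverse-variance weighted meta-analysis for a single variant. It is a generic illustration and not necessarily the procedure used in the chapter; the function name `fixed_effect_meta` and the example numbers are hypothetical, and it assumes every study reports the effect for the same allele on the same scale.

```python
import math


def fixed_effect_meta(betas, ses):
    """Combine per-study effect estimates for one variant.

    betas: effect estimates from each study (same effect allele assumed).
    ses:   matching standard errors.
    Returns (pooled_beta, pooled_se, z_score, two_sided_p).
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("betas and ses must be non-empty and of equal length")
    if any(se <= 0 for se in ses):
        raise ValueError("standard errors must be positive")

    # Each study is weighted by the inverse of its sampling variance.
    weights = [1.0 / (se * se) for se in ses]
    total_weight = sum(weights)

    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z_score = pooled_beta / pooled_se

    # Two-sided p-value under a standard normal null distribution.
    p_value = math.erfc(abs(z_score) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z_score, p_value


# Hypothetical summary statistics for one variant in two studies.
beta, se, z, p = fixed_effect_meta([0.2, 0.4], [0.1, 0.1])
print(f"beta={beta:.3f} se={se:.4f} z={z:.2f} p={p:.2e}")
```

Only the effect estimates and their standard errors are needed, which is why studies can contribute to a meta-analysis without sharing genotypes or phenotypes. A fixed-effect model assumes the true effect is the same in every study; heterogeneous studies may call for a different model.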

Author information

Corresponding author

Correspondence to Hans D. Daetwyler.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Joukhadar, R., Daetwyler, H.D. (2022). Data Integration, Imputation, and Meta-analysis for Genome-Wide Association Studies. In: Torkamaneh, D., Belzile, F. (eds) Genome-Wide Association Studies. Methods in Molecular Biology, vol 2481. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2237-7_11

  • DOI: https://doi.org/10.1007/978-1-0716-2237-7_11

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-2236-0

  • Online ISBN: 978-1-0716-2237-7

  • eBook Packages: Springer Protocols
