Data Integration, Imputation , and Meta-analysis for Genome-Wide Association Studies

Joukhadar, Reem; Daetwyler, Hans D.

doi:10.1007/978-1-0716-2237-7_11

Reem Joukhadar⁴ &
Hans D. Daetwyler^4,5

Part of the book series: Methods in Molecular Biology ((MIMB,volume 2481))

2557 Accesses
2 Citations
5 Altmetric

Abstract

Growing genomic and phenotypic datasets require different groups around the world to collaborate and integrate these valuable resources to maximize their benefit and increase reference population sizes for genomic prediction and genome-wide association studies (GWAS). However, different studies use different genotyping techniques which requires a synchronizing step for the genotyped variants called “imputation” before combining them. Optimally, different GWAS datasets can be analysed within a meta-analysis, which recruits summary statistics instead of actual data. This chapter describes the general principles for genotypic imputation and meta-GWAS analysis with a description of study designs and command lines required for such analyses.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Protocol: USD 49.95; Price excludes VAT (USA)

eBook: USD 129.00; Price excludes VAT (USA)

Softcover Book: USD 169.99; Price excludes VAT (USA)

Hardcover Book: USD 249.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

Korte A, Farlow A (2013) The advantages and limitations of trait analysis with GWAS: a review. Plant Methods 9(1):29
Article CAS PubMed PubMed Central Google Scholar
Visscher PM, Brown MA, McCarthy MI, Yang J (2012) Five years of GWAS discovery. Am J Hum Genet 90(1):7–24
Article CAS PubMed PubMed Central Google Scholar
Battenfield SD, Sheridan JL, Silva LD, Miclaus KJ, Dreisigacker S, Wolfinger RD et al (2018) Breeding-assisted genomics: applying meta-GWAS for milling and baking quality in CIMMYT wheat breeding program. PLoS One 13(11):e0204757
Article PubMed PubMed Central CAS Google Scholar
Evangelou E, Ioannidis JPA (2013) Meta-analysis methods for genome-wide association studies and beyond. Nat Rev Genet 14:379–389
Article CAS PubMed Google Scholar
Bolormaa S, Pryce JE, Reverter A, Zhang Y, Barendse W, Kemper K et al (2014) A multi-trait, meta-analysis for detecting pleiotropic polymorphisms for stature, fatness and reproduction in beef cattle. PLoS Genet 10(3):e1004198
Article PubMed PubMed Central CAS Google Scholar
Swarts K, Li H, Romero Navarro JA, An D, Romay MC, Hearne S et al (2014) Novel methods to optimize genotypic imputation for low-coverage, next-generation sequence data in crop plants. The plant. Genome 7(3). https://doi.org/10.3835/plantgenome2014.05.0023
Whalen A, Gorjanc G, Ros-Freixedes R, Hickey JM (2018) Assessment of the performance of hidden Markov models for imputation in animal breeding. Genet Sel Evol 50(1):1–10
Article Google Scholar
Torkamaneh D, Boyle B, Belzile F (2018) Efficient genome-wide genotyping strategies and data integration in crop plants. Theor Appl Genet 131(3):499–511
Article CAS PubMed Google Scholar
Spiliopoulou A, Colombo M, Orchard P, Agakov F, McKeigue P (2017) GeneImp: fast imputation to large reference panels using genotype likelihoods from ultralow coverage sequencing. Genetics 206(1):91–104
Article PubMed PubMed Central Google Scholar
Das S, Abecasis GR, Browning BL (2018) Genotype imputation from large reference panels. Annu Rev Genomics Hum Genet 19:73–96
Article CAS PubMed Google Scholar
Pe'er I, de Bakker PI, Maller J, Yelensky R, Altshuler D, Daly MJ (2006) Evaluating and improving power in whole-genome association studies using fixed marker sets. Nat Genet 38(6):663–667
Article CAS PubMed Google Scholar
Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D et al (2007) PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 81(3):559–575
Article CAS PubMed PubMed Central Google Scholar
Daetwyler HD, Wiggans GR, Hayes BJ, Woolliams JA, Goddard ME (2011) Imputation of missing genotypes from sparse to high density using long-range phasing. Genetics 189(1):317–327
Article CAS PubMed PubMed Central Google Scholar
Hickey JM, Kinghorn BP, Tier B, van der Werf JHJ, Cleveland MA (2012) A phasing and imputation method for pedigreed populations that results in a single-stage genomic evaluation. Genet Sel Evol 44:9
Article PubMed PubMed Central Google Scholar
Abecasis GR, Cherny SS, Cookson WO, Cardon LR (2002) Merlin—rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet 30(1):97–101
Article CAS PubMed Google Scholar
Browning SR, Browning BL (2011) Haplotype phasing: existing methods and new developments. Nat Rev Genet 12(10):703–714
Article CAS PubMed PubMed Central Google Scholar
Li N, Stephens M (2003) Modeling linkage disequilibrium and identifying recombination hotspots using single-nucleotide polymorphism data. Genetics 165(4):2213–2233
Article CAS PubMed PubMed Central Google Scholar
Browning SR, Browning BL (2007) Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am J Hum Genet 81(5):1084–1097
Article CAS PubMed PubMed Central Google Scholar
Li Y, Willer CJ, Ding J, Scheet P, Abecasis GR (2010) MaCH: using sequence and genotype data to estimate haplotypes and unobserved genotypes. Genet Epidemiol 34(8):816–834
Article PubMed PubMed Central Google Scholar
Rubinacci S, Delaneau O, Marchini J (2020) Genotype imputation using the positional burrows wheeler transform. PLoS Genet 16(11):e1009049
Article CAS PubMed PubMed Central Google Scholar
Durbin R (2014) Efficient haplotype matching and storage using the positional Burrows–Wheeler transform (PBWT). Bioinformatics 30(9):1266–1272
Article CAS PubMed PubMed Central Google Scholar
Meuwissen T, Goddard M (2010) The use of family relationships and linkage disequilibrium to impute phase and missing genotypes in up to whole-genome sequence density genotypic data. Genetics 185(4):1441–1449
Article PubMed PubMed Central Google Scholar
Whalen A, Hickey JM (2020) AlphaImpute2: Fast and accurate pedigree and population based imputation for hundreds of thousands of individuals in livestock populations. bioRxiv 2020.09.16.299677; https://doi.org/10.1101/2020.09.16.299677
Sargolzaei M, Chesnais JP, Schenkel FS (2014) A new approach for efficient genotype imputation using information from relatives. BMC Genomics 15(1):1–12
Article Google Scholar
Rutkoski JE, Poland J, Jannink JL, Sorrells ME (2013) Imputation of unordered markers and the impact on genomic selection accuracy. G3 (Bethesda) 3(3):427–439
Article Google Scholar
Troyanskaya O, Cantor M, Sherlock G, Brown P, Hastie T, Tibshirani R et al (2001) Missing value estimation methods for DNA microarrays. Bioinformatics 17(6):520–525
Article CAS PubMed Google Scholar
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Series B Stat Methodol 39(1):1–22
Google Scholar
Stekhoven DJ, Bühlmann P (2012) MissForest—non-parametric missing value imputation for mixed-type data. Bioinformatics 28(1):112–118
Article CAS PubMed Google Scholar
Money D, Gardner K, Migicovsky Z, Schwaninger H, Zhong GY, Myles S (2015) LinkImpute: fast and accurate genotype imputation for nonmodel organisms. G3: genes, genomes. Genetics 5(11):2383–2390
Google Scholar
Rubinacci S, Ribeiro DM, Hofmeister RJ, Delaneau O (2021) Efficient phasing and imputation of low-coverage sequencing data using large reference panels. Nat Genet 53(1):120–126
Article CAS PubMed Google Scholar
Snelling WM, Hoff JL, Li JH, Kuehn LA, Keel BN, Lindholm-Perry AK, Pickrell JK (2020) Assessment of imputation from low-pass sequencing to predict merit of beef steers. Genes 11(11):1312
Article CAS PubMed Central Google Scholar
Huang Y, Hickey JM, Cleveland MA, Maltecca C (2012) Assessment of alternative genotyping strategies to maximize imputation accuracy at minimal cost. Genet Sel Evol 44:25
Article PubMed PubMed Central Google Scholar
Shi F, Tibbits J, Pasam RK, Kay P, Wong D, Petkowski J et al (2017) Exome sequence genotype imputation in globally diverse hexaploid wheat accessions. Theor Appl Genet 130(7):1393–1404
Article CAS PubMed Google Scholar
de Oliveira AA, Guimarães LJM, Guimarães CT, Guimarães PEDO, Pinto MDO, Pastina MM, Margarido GRA (2020) Single nucleotide polymorphism calling and imputation strategies for cost-effective genotyping in a tropical maize breeding program. Crop Sci 60(6):3066–3082
Article CAS Google Scholar
Wang DR, Agosto-Pérez FJ, Chebotarov D, Shi Y, Marchini J, Fitzgerald M et al (2018) An imputation platform to enhance integration of rice genetic resources. Nat Commun 9(1):3519
Article PubMed PubMed Central CAS Google Scholar
Iwata H, Jannink JL (2010) Marker genotype imputation in a low-marker-density panel with a high-marker-density reference panel: accuracy evaluation in barley breeding lines. Crop Sci 50(4):1269–1278
Article Google Scholar
Fikere M, Barbulescu DM, Malmberg MM, Spangenberg GC, Cogan NO, Daetwyler HD (2020) Meta-analysis of GWAS in canola blackleg (Leptosphaeria maculans) disease traits demonstrates increased power from imputed whole-genome sequence. Sci Rep 10:14300
Article CAS PubMed PubMed Central Google Scholar
Happ MM, Wang H, Graef GL, Hyten DL (2019) Generating high density, low cost genotype data in soybean [Glycine max (L.) Merr.]. G3 (Bethesda) 9(7):2153–2160
Article CAS Google Scholar
Torkamaneh D, Belzile F (2015) Scanning and filling: ultra-dense SNP genotyping combining genotyping-by-sequencing, SNP array and whole-genome resequencing data. PLoS One 10(7):e0131533
Article PubMed PubMed Central CAS Google Scholar
Jensen SE, Charles JR, Muleta K, Bradbury PJ, Casstevens T, Deshpande SP et al (2020) A sorghum practical haplotype graph facilitates genome-wide imputation and cost-effective genomic prediction. Plant Genome 13(1):e20009
Article CAS PubMed Google Scholar
Joukhadar R, Thistlethwaite R, Trethowan R, Keeble-Gagnère G, Hayden MJ, Ullah S, Daetwyler HD (2021) Meta-analysis of genome-wide association studies reveal common loci controlling agronomic and quality traits in a wide range of normal and heat stressed environments. Theor Appl Genet 134(7):2113–2127. https://doi.org/10.1007/s00122-021-03809-y
Article CAS PubMed Google Scholar
Gao Y, Yang Z, Yang W, Yang Y, Gong J, Yang QY, Niu X (2021) Plant-ImputeDB: an integrated multiple plant reference panel database for genotype imputation. Nucleic Acids Res. Jan 8;49(D1):D1480-D1488. https://doi.org/10.1093/nar/gkaa953. PMID: 33137192; PMCID: PMC7779032
Zeggini E, Ioannidis JP (2009) Meta-analysis in genome-wide association studies. Pharmacogenomics 10:191–201
Article PubMed Google Scholar
Pereira TV, Patsopoulos NA, Salanti G, Ioannidis JP (2009) Discovery properties of genome-wide association signals from cumulatively combined data sets. Am J Epidemiol 170(10):1197–1206
Article PubMed PubMed Central Google Scholar
Willer CJ, Li Y, Abecasis GR (2010) METAL: fast and efficient meta-analysis of genome-wide association scans. Bioinformatics 26(17):2190–2191
Article CAS PubMed PubMed Central Google Scholar

Download references

Author information

Authors and Affiliations

Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, Australia
Reem Joukhadar & Hans D. Daetwyler
School of Applied Systems Biology, La Trobe University, Bundoora, VIC, Australia
Hans D. Daetwyler

Authors

Reem Joukhadar
View author publications
You can also search for this author in PubMed Google Scholar
Hans D. Daetwyler
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Hans D. Daetwyler .

Editor information

Editors and Affiliations

Département de Phytologie, Université Laval, Quebec City, QC, Canada
Davoud Torkamaneh
Institut de biologie intégrative et des systems, Université Laval, Québec, QC, Canada
François Belzile

Rights and permissions

Reprints and permissions

Copyright information

About this protocol

Cite this protocol

Joukhadar, R., Daetwyler, H.D. (2022). Data Integration, Imputation, and Meta-analysis for Genome-Wide Association Studies. In: Torkamaneh, D., Belzile, F. (eds) Genome-Wide Association Studies. Methods in Molecular Biology, vol 2481. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2237-7_11

Download citation

DOI: https://doi.org/10.1007/978-1-0716-2237-7_11
Published: 01 June 2022
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-2236-0
Online ISBN: 978-1-0716-2237-7
eBook Packages: Springer Protocols

Publish with us

Policies and ethics

Data Integration, Imputation, and Meta-analysis for Genome-Wide Association Studies