Producing High-Quality Single Nucleotide Polymorphism Data for Genome-Wide Association Studies

Bayer, Philipp E.; Gill, Mitchell; Danilevicz, Monica F.; Edwards, David

doi:10.1007/978-1-0716-2237-7_9

Part of the book series: Methods in Molecular Biology (MIMB, volume 2481)

Abstract

Single-nucleotide polymorphisms (SNPs) have become the primary type of molecular genetic marker used in a diverse range of genetic and genomic studies. SNPs can be used to identify genomic regions linked to traits such as disease in genome-wide association studies, to understand population structure and diversity, or to understand mechanisms of genome evolution. One of the first steps of any SNP-based workflow, following SNP discovery, is quality control of SNP data. The protocol described here details how to perform quality control on SNP data to minimise errors in downstream analysis.
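
As an illustration of what quality control on SNP data can involve, here is a minimal sketch of two commonly used per-SNP filters: call rate (the fraction of samples with a genotype call) and minor allele frequency. It is not the protocol described in the chapter. The function names, the genotype encoding (alternate-allele counts 0, 1, 2 for diploid samples, with None for a missing call), the example data, and the thresholds are all assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Assumed encoding: alternate-allele count per diploid sample (0, 1 or 2);
# None marks a missing genotype call.
Genotypes = List[Optional[int]]


def call_rate(genotypes: Genotypes) -> float:
    """Fraction of samples with a non-missing genotype call."""
    if not genotypes:
        return 0.0
    called = [g for g in genotypes if g is not None]
    return len(called) / len(genotypes)


def minor_allele_frequency(genotypes: Genotypes) -> float:
    """Frequency of the rarer allele among the called genotypes."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def filter_snps(
    snps: Dict[str, Genotypes],
    min_call_rate: float = 0.9,  # hypothetical threshold
    min_maf: float = 0.05,       # hypothetical threshold
) -> Dict[str, Genotypes]:
    """Keep only SNPs that pass both the call-rate and the MAF filter."""
    return {
        name: genotypes
        for name, genotypes in snps.items()
        if call_rate(genotypes) >= min_call_rate
        and minor_allele_frequency(genotypes) >= min_maf
    }


# Made-up example data: three SNPs genotyped in ten samples.
example = {
    "snp_a": [0, 1, 2, 1, 0, 1, 2, 0, 1, 1],           # fully called, common
    "snp_b": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],           # monomorphic
    "snp_c": [0, None, None, 1, 2, None, 1, 0, 2, 1],  # 3 of 10 calls missing
}
kept = filter_snps(example)
print(sorted(kept))
```

Suitable thresholds depend on the species, the genotyping platform, and the study design, so the defaults above should be treated as placeholders rather than recommendations.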

Author information

Authors and Affiliations

Applied Bioinformatics Group, School of Biological Sciences, The University of Western Australia, Perth, WA, Australia
Philipp E. Bayer, Mitchell Gill, Monica F. Danilevicz & David Edwards

Corresponding author

Correspondence to David Edwards.

Editor information

Editors and Affiliations

Département de Phytologie, Université Laval, Quebec City, QC, Canada
Davoud Torkamaneh
Institut de biologie intégrative et des systèmes, Université Laval, Québec, QC, Canada
François Belzile

About this protocol

Cite this protocol

Bayer, P.E., Gill, M., Danilevicz, M.F., Edwards, D. (2022). Producing High-Quality Single Nucleotide Polymorphism Data for Genome-Wide Association Studies. In: Torkamaneh, D., Belzile, F. (eds) Genome-Wide Association Studies. Methods in Molecular Biology, vol 2481. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2237-7_9

DOI: https://doi.org/10.1007/978-1-0716-2237-7_9
Published: 01 June 2022
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-2236-0
Online ISBN: 978-1-0716-2237-7
eBook Packages: Springer Protocols
