Abstract
Single-nucleotide polymorphisms (SNPs) have become the primary type of molecular genetic marker used in a diverse range of genetic and genomic studies. SNPs can be used to identify genomic regions linked to traits such as disease in genome-wide association studies, to understand population structure and diversity, or to understand mechanisms of genome evolution. One of the first steps of any SNP-based workflow, following SNP discovery, is quality control of SNP data. The protocol described here details how to perform quality control on SNP data to minimise errors in downstream analysis.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Bayer, P.E., Gill, M., Danilevicz, M.F., Edwards, D. (2022). Producing High-Quality Single Nucleotide Polymorphism Data for Genome-Wide Association Studies. In: Torkamaneh, D., Belzile, F. (eds) Genome-Wide Association Studies. Methods in Molecular Biology, vol 2481. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2237-7_9
DOI: https://doi.org/10.1007/978-1-0716-2237-7_9
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-2236-0
Online ISBN: 978-1-0716-2237-7
eBook Packages: Springer Protocols