Producing High-Quality Single Nucleotide Polymorphism Data for Genome-Wide Association Studies

  • Protocol
Genome-Wide Association Studies

Part of the book series: Methods in Molecular Biology ((MIMB,volume 2481))

Abstract

Single-nucleotide polymorphisms (SNPs) have become the primary type of molecular genetic marker used in a diverse range of genetic and genomic studies. SNPs can be used to identify genomic regions linked to traits such as disease in genome-wide association studies, to understand population structure and diversity, or to understand mechanisms of genome evolution. One of the first steps of any SNP-based workflow, following SNP discovery, is quality control of SNP data. The protocol described here details how to perform quality control on SNP data to minimise errors in downstream analysis.

Corresponding author

Correspondence to David Edwards.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Bayer, P.E., Gill, M., Danilevicz, M.F., Edwards, D. (2022). Producing High-Quality Single Nucleotide Polymorphism Data for Genome-Wide Association Studies. In: Torkamaneh, D., Belzile, F. (eds) Genome-Wide Association Studies. Methods in Molecular Biology, vol 2481. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2237-7_9

  • DOI: https://doi.org/10.1007/978-1-0716-2237-7_9

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-2236-0

  • Online ISBN: 978-1-0716-2237-7

  • eBook Packages: Springer Protocols
