Overview of Statistical Methods for Genome-Wide Association Studies (GWAS)

  • Ben Hayes
Part of the Methods in Molecular Biology book series (MIMB, volume 1019)


This chapter provides an overview of statistical methods for genome-wide association studies (GWAS) in animals, plants, and humans. The simplest form of GWAS, a marker-by-marker analysis, is illustrated with an example. The chapter then discusses how to choose a significance threshold that accounts for the very large number of tests performed in a GWAS. Population structure causes false-positive associations if it is not accounted for, and methods to deal with it are presented. More complex GWAS models are also described and illustrated with examples, including haplotype-based approaches, models that distinguish identity by descent from identity by state, and models that fit all markers simultaneously.
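
As a rough illustration of the marker-by-marker analysis and the multiple-testing problem described above, the following Python sketch regresses a phenotype on allele count at each marker and flags markers that pass a Bonferroni-corrected threshold. It is not taken from the chapter: the simulated data, the function names (`single_marker_test`, `bonferroni_threshold`, `simulate`, `scan`), the normal approximation used for the p-value, and the choice of Bonferroni as the correction are all assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist


def single_marker_test(genotypes, phenotypes):
    """Regress phenotype on allele count (0/1/2) at one marker.

    Returns (slope, two-sided p-value). The p-value uses a normal
    approximation to the t distribution, an assumption that is only
    reasonable when the number of individuals is large.
    """
    n = len(phenotypes)
    mean_x = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in genotypes)
    if sxx == 0:
        return 0.0, 1.0  # monomorphic marker: no information
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(genotypes, phenotypes))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - intercept - slope * x) ** 2
              for x, y in zip(genotypes, phenotypes))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:
        return slope, 0.0  # perfect fit
    z = slope / se
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return slope, p_value


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level for a given experiment-wise alpha."""
    return alpha / n_tests


def simulate(n_individuals=500, n_markers=200, causal=0, effect=1.0, seed=1):
    """Hypothetical data: independent markers, one causal marker."""
    rng = random.Random(seed)
    genotypes = [
        [rng.randint(0, 1) + rng.randint(0, 1) for _ in range(n_individuals)]
        for _ in range(n_markers)
    ]
    phenotypes = [effect * g + rng.gauss(0.0, 1.0) for g in genotypes[causal]]
    return genotypes, phenotypes


def scan(genotypes, phenotypes, alpha=0.05):
    """Test every marker and return indices passing the corrected threshold."""
    threshold = bonferroni_threshold(alpha, len(genotypes))
    hits = []
    for index, marker in enumerate(genotypes):
        _, p_value = single_marker_test(marker, phenotypes)
        if p_value < threshold:
            hits.append(index)
    return hits


if __name__ == "__main__":
    geno, pheno = simulate()
    print("markers passing the threshold:", scan(geno, pheno))
```

The simulated markers are independent, so the sketch ignores both linkage disequilibrium between markers and population structure. Bonferroni correction is the simplest choice here and tends to be conservative when tests are correlated.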

Key words

GWAS, Population structure, Multiple testing



Copyright information

© Springer Science+Business Media, LLC 2013

Authors and Affiliations

  • Ben Hayes (1, 2)
  1. Biosciences Research Division, Department of Primary Industries, Bundoora, Australia
  2. La Trobe University, Bundoora, Australia
