Testing Departure from Hardy-Weinberg Proportions

Protocol
Part of the Methods in Molecular Biology book series (MIMB, volume 1666)

Abstract

The Hardy-Weinberg principle, one of the most important principles in population genetics, was originally developed to study how allele frequencies change in a population over generations. It is now widely used in studies of human disease to detect inbreeding, population stratification, and genotyping errors. The most popular approaches for assessing deviation from Hardy-Weinberg proportions are the asymptotic Pearson’s chi-squared goodness-of-fit test and the exact test. Pearson’s chi-squared test is simple and straightforward, but its asymptotic p-values are very sensitive to a small sample size or a rare allele frequency; the exact test of Hardy-Weinberg proportions is preferable in these situations. The exact test can be performed either by complete enumeration of the possible heterozygote counts or by a Markov chain Monte Carlo procedure. In this chapter, we describe the Hardy-Weinberg principle, the commonly used tests of Hardy-Weinberg proportions, and their applications. Using numerical examples, we demonstrate step by step how to perform the chi-squared test and the exact test of Hardy-Weinberg proportions in SAS, R, and PLINK, software programs widely used in genetic association studies. We also discuss approaches for testing Hardy-Weinberg proportions in case–control study designs that are better than the traditional approach of testing controls only. Finally, we note that deviation from Hardy-Weinberg proportions in affected individuals can provide evidence of an association between genetic variants and disease.
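The two tests named above can be sketched for a single biallelic autosomal marker using only the Python standard library. This is an illustrative sketch, not the chapter's SAS, R, or PLINK code; the function names `hwe_chisq`, `hwe_exact`, and `_log_prob_het` are hypothetical. The chi-squared test compares observed genotype counts with those expected from the sample allele frequency (1 degree of freedom). The exact test enumerates every heterozygote count compatible with the observed allele counts, computes its probability under Hardy-Weinberg proportions conditional on those allele counts, and sums the probabilities of all configurations no more likely than the observed one.

```python
import math


def hwe_chisq(n_AA: int, n_AB: int, n_BB: int) -> tuple[float, float]:
    """Pearson chi-squared goodness-of-fit test of Hardy-Weinberg proportions
    for a biallelic locus; returns (statistic, p-value) on 1 degree of freedom."""
    n = n_AA + n_AB + n_BB
    p = (2 * n_AA + n_AB) / (2 * n)  # sample frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("monomorphic locus: chi-squared test is undefined")
    observed = (n_AA, n_AB, n_BB)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of chi-squared with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return stat, math.erfc(math.sqrt(stat / 2))


def _log_prob_het(h: int, n: int, alleles_A: int, alleles_B: int) -> float:
    """Log probability of h heterozygotes among n individuals, conditional on
    the allele counts, under Hardy-Weinberg proportions:
    n! / (n_AA! h! n_BB!) * 2^h * alleles_A! alleles_B! / (2n)!"""
    h_AA = (alleles_A - h) // 2  # implied AA homozygotes
    h_BB = (alleles_B - h) // 2  # implied BB homozygotes
    return (math.lgamma(n + 1) - math.lgamma(h_AA + 1) - math.lgamma(h + 1)
            - math.lgamma(h_BB + 1) + h * math.log(2)
            + math.lgamma(alleles_A + 1) + math.lgamma(alleles_B + 1)
            - math.lgamma(2 * n + 1))


def hwe_exact(n_AA: int, n_AB: int, n_BB: int) -> float:
    """Exact test p-value by complete enumeration of heterozygote counts."""
    n = n_AA + n_AB + n_BB
    alleles_A = 2 * n_AA + n_AB
    alleles_B = 2 * n_BB + n_AB
    rare = min(alleles_A, alleles_B)
    # Possible heterozygote counts share the parity of the rare-allele count.
    hets = range(rare % 2, rare + 1, 2)
    log_p = {h: _log_prob_het(h, n, alleles_A, alleles_B) for h in hets}
    top = max(log_p.values())
    # Rescale by the largest probability to avoid underflow, then renormalize.
    probs = {h: math.exp(lp - top) for h, lp in log_p.items()}
    threshold = probs[n_AB] * (1 + 1e-7)  # tolerance for floating-point ties
    total = sum(probs.values())
    return sum(v for v in probs.values() if v <= threshold) / total
```

For example, `hwe_chisq(50, 0, 50)` gives a statistic of 100.0 (a complete absence of heterozygotes), and `hwe_exact(1, 0, 1)` returns approximately 1/3, because with two A and two B alleles among two individuals the configurations with 0 and 2 heterozygotes have probabilities 1/3 and 2/3. Working on the log scale with `math.lgamma` keeps the enumeration numerically stable for large samples; a Markov chain Monte Carlo version would replace the full enumeration when it becomes too expensive, for example with many alleles.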

Key words

Hardy-Weinberg proportions; Exact test; Pearson’s chi-squared goodness-of-fit test; Genetic association study; Quality control; Genotyping error; SAS/Genetics; PLINK; Case–control genetic association study; Population stratification


Copyright information

© Springer Science+Business Media LLC 2017

Authors and Affiliations

  1. Department of Biostatistics, University of Texas MD Anderson Cancer Center, Houston, USA
  2. Department of Epidemiology, University of Texas MD Anderson Cancer Center, Houston, USA
