Estimating Allele Frequencies

Part of the Methods in Molecular Biology book series (MIMB, volume 1666)


Methods of estimating allele frequencies from data on unrelated and related individuals are described in this chapter. For samples of unrelated individuals with simple codominant markers, the natural estimators of allele frequencies can be used. For genetic data on related individuals, maximum likelihood estimation (MLE) can be applied to compute allele frequencies. Factors that influence allele frequencies in populations are also explained.
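For unrelated individuals typed at a codominant marker, the natural estimator is commonly taken to be the sample proportion of each allele among all observed allele copies. The minimal sketch below assumes diploid genotypes with no missing data. The function name `allele_frequencies` and the toy sample are hypothetical illustrations, not taken from the chapter.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Estimate allele frequencies by direct allele counting.

    `genotypes` is an iterable of (allele, allele) pairs, one pair per
    unrelated diploid individual at a single codominant marker.
    Assumption: every genotype is fully observed (no missing alleles).
    """
    counts = Counter()
    for first, second in genotypes:
        # Each individual contributes two allele copies to the sample.
        counts[first] += 1
        counts[second] += 1

    total = sum(counts.values())
    if total == 0:
        raise ValueError("no genotypes supplied")

    # Frequency of an allele = its copies / all allele copies (2n).
    return {allele: n / total for allele, n in counts.items()}


# Hypothetical sample of four unrelated individuals.
sample = [("A", "A"), ("A", "A"), ("A", "a"), ("a", "a")]
freqs = allele_frequencies(sample)
print(freqs)
```

With this toy sample, allele `A` appears in 5 of the 8 allele copies and `a` in 3 of 8. This counting approach does not apply directly to related individuals, whose genotypes are not independent, which is why the chapter turns to maximum likelihood estimation for pedigree data.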

Key words

Allele, Genotype, Phenotype, Natural estimator, Unrelated individuals, Related individuals, Relatives, Families, Pedigree, Founder, Nonfounder, Population genetics, Disease research, Hardy–Weinberg equilibrium, Maximum likelihood estimation, Log-likelihood, Expectation-maximization algorithm, ABO blood group, Natural selection, Mutation, Migration, Genetic drift, Nonrandom mating



Copyright information

© Springer Science+Business Media LLC 2017

Authors and Affiliations

  1. Arthritis and Clinical Immunology Research Program, Oklahoma Medical Research Foundation, Oklahoma City, USA
