Imputation-Based Local Ancestry Inference in Admixed Populations

  • Bogdan Paşaniuc
  • Justin Kennedy
  • Ion Măndoiu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5542)


Accurate inference of local ancestry from whole-genome genetic variation data is critical for understanding the history of admixed human populations and for detecting disease-associated SNPs via admixture mapping. Although several existing methods achieve high accuracy when inferring local ancestry for individuals resulting from the admixture of genetically distant ancestral populations (e.g., African-Americans), ancestry inference remains challenging when the ancestral populations are closely related. Surprisingly, methods based on the analysis of allele frequencies at unlinked SNP loci currently outperform methods based on haplotype analysis, even though the latter seemingly have access to more detailed information about the genetic makeup of the ancestral populations.

In this paper we propose a novel method for imputation-based local ancestry inference that exploits ancestral haplotype information more effectively than previous haplotype-based methods. Our method uses the ancestral haplotypes to impute genotypes at all typed SNP loci (temporarily treating each typed SNP genotype as missing) under each possible local ancestry. We then assign to each locus the local ancestry that yields the highest imputation accuracy, as estimated within a neighborhood of the locus. Experiments on simulated data show that imputation-based ancestry assignment is competitive with the best existing methods when the ancestral populations are distant and yields a significant improvement when they are closely related. Further demonstrating the synergy between imputation and ancestry inference, we also show that the accuracy of untyped SNP genotype imputation in admixed individuals improves significantly when estimates of local ancestry are used. The open source C++ code of our method, released under the GNU General Public License, is available for download at
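The assignment step described above (leave-one-out imputation under every candidate local ancestry, then picking the ancestry with the best accuracy in a neighborhood) can be illustrated with a toy sketch. This is not the authors' C++ implementation: the paper imputes using hidden Markov models of haplotype diversity, whereas the hypothetical helper `impute_loo` below simply picks the pair of reference haplotypes that best matches the individual's genotypes in a flanking window. The function names, the `flank` and `window` parameters, and the 0/1/2 genotype encoding are all assumptions made for illustration.

```python
from itertools import combinations_with_replacement


def impute_loo(genotypes, panel_a, panel_b, i, flank):
    """Predict the genotype (0/1/2) at typed locus i under local ancestry (a, b),
    ignoring the observed genotypes[i] (leave-one-out).

    Hypothetical stand-in for HMM-based imputation: choose one haplotype from
    each ancestral panel whose allele sum best matches the observed genotypes
    in a flanking window around i, then read off their alleles at locus i.
    """
    lo, hi = max(0, i - flank), min(len(genotypes), i + flank + 1)
    context = [j for j in range(lo, hi) if j != i]  # window minus the masked locus
    best_pred, best_cost = None, None
    for h1 in panel_a:
        for h2 in panel_b:
            cost = sum(abs(h1[j] + h2[j] - genotypes[j]) for j in context)
            if best_cost is None or cost < best_cost:
                best_pred, best_cost = h1[i] + h2[i], cost
    return best_pred


def infer_local_ancestry(genotypes, panels, flank=3, window=5):
    """Assign to each typed locus the ancestry pair with the highest
    leave-one-out imputation accuracy within +/- `window` loci.

    genotypes: list of 0/1/2 allele counts for one admixed individual.
    panels: dict mapping population name -> list of 0/1 haplotypes.
    """
    n = len(genotypes)
    # Unordered ancestry pairs, e.g. ("A","A"), ("A","B"), ("B","B").
    pairs = list(combinations_with_replacement(sorted(panels), 2))
    # correct[p][i] is 1 if imputation under ancestry pair p recovers genotypes[i].
    correct = {
        p: [int(impute_loo(genotypes, panels[p[0]], panels[p[1]], i, flank) == genotypes[i])
            for i in range(n)]
        for p in pairs
    }
    calls = []
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        # max() keeps the first pair on ties, i.e. the earliest in `pairs`.
        calls.append(max(pairs, key=lambda p: sum(correct[p][lo:hi])))
    return calls


if __name__ == "__main__":
    panels = {"A": [[0] * 12], "B": [[1] * 12]}
    g = [0] * 4 + [1] * 4 + [2] * 4
    print(infer_local_ancestry(g, panels, flank=2, window=1))
```

With these toy panels, the call returns `("A", "A")` for the first four loci, `("A", "B")` for the next four, and `("B", "B")` for the last four. Ancestry pairs are treated as unordered because an unphased genotype only constrains the sum of the two alleles; for a same-population pair the sketch also allows both haplotypes to be the same reference haplotype, which is a simplification.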







Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Bogdan Paşaniuc, International Computer Science Institute, Berkeley, USA
  • Justin Kennedy, CSE Department, University of Connecticut, Storrs, USA
  • Ion Măndoiu, CSE Department, University of Connecticut, Storrs, USA
