
Imputation-Based Local Ancestry Inference in Admixed Populations

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 5542)

Abstract

Accurate inference of local ancestry from whole-genome genetic variation data is critical for understanding the history of admixed human populations and for detecting disease-associated SNPs via admixture mapping. Several existing methods achieve high accuracy when inferring local ancestry in individuals descended from genetically distant ancestral populations (e.g., African Americans). Ancestry inference remains challenging, however, when the ancestral populations are closely related. Surprisingly, methods based on allele frequencies at unlinked SNP loci currently outperform haplotype-based methods, even though the latter seemingly draw on more detailed information about the genetic makeup of the ancestral populations.

In this paper we propose a novel imputation-based method for local ancestry inference that exploits ancestral haplotype information more effectively than previous haplotype-based methods. Our method uses the ancestral haplotypes to impute genotypes at every typed SNP locus under each possible local ancestry, temporarily treating each SNP genotype as missing. We then assign to each locus the local ancestry that yields the highest imputation accuracy, as estimated within a neighborhood of that locus. Experiments on simulated data show that imputation-based ancestry assignment is competitive with the best existing methods when the ancestral populations are distant, and yields a significant improvement when they are closely related. Further demonstrating the synergy between imputation and ancestry inference, we also show that the accuracy of imputing untyped SNP genotypes in admixed individuals improves significantly when estimates of local ancestry are used. The open-source C++ code of our method, released under the GNU General Public License, is available for download at http://dna.engr.uconn.edu/software/GEDI-ADMX/.
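The mask-impute-score loop described above can be illustrated with a toy sketch. It is not the authors' implementation, which relies on a proper haplotype-based imputation model. The sketch makes several simplifying assumptions. Genotypes are coded 0/1/2. A local ancestry is an unordered pair of ancestral populations. The imputation step is replaced by a brute-force stand-in: find the pair of reference haplotypes (one from each population) that best explains the genotypes in a flanking window around the masked SNP, then read off their sum at that SNP. The function names `infer_local_ancestry` and `impute_masked`, and the parameters `flank` and `window`, are hypothetical.

```python
from itertools import combinations_with_replacement


def impute_masked(genotypes, hap_panel_a, hap_panel_b, j, flank):
    """Toy stand-in for haplotype-based imputation (hypothetical helper).

    Treats the genotype at locus j as missing, picks the haplotype pair
    (h1 from population a, h2 from population b) whose summed alleles best
    match the observed genotypes in a flanking window that excludes j,
    and returns that pair's predicted genotype at j.
    """
    n = len(genotypes)
    cols = [c for c in range(max(0, j - flank), min(n, j + flank + 1)) if c != j]
    best_cost, best_pred = None, None
    for h1 in hap_panel_a:
        for h2 in hap_panel_b:
            cost = sum(abs(h1[c] + h2[c] - genotypes[c]) for c in cols)
            if best_cost is None or cost < best_cost:
                best_cost, best_pred = cost, h1[j] + h2[j]
    return best_pred


def infer_local_ancestry(genotypes, panels, flank=5, window=5):
    """Assign each typed SNP the ancestry pair with the best local imputation accuracy.

    genotypes: list of 0/1/2 values for one admixed individual.
    panels: dict mapping population name -> list of 0/1 reference haplotypes.
    """
    n = len(genotypes)
    ancestries = list(combinations_with_replacement(sorted(panels), 2))

    # correct[anc][j] is 1 when masked imputation under `anc` recovers genotype j.
    correct = {}
    for a, b in ancestries:
        correct[(a, b)] = [
            int(impute_masked(genotypes, panels[a], panels[b], j, flank) == genotypes[j])
            for j in range(n)
        ]

    # Score each candidate by its number of correct imputations in a window
    # around j (all candidates share the same window, so raw counts suffice);
    # ties go to the earliest candidate in `ancestries`.
    result = []
    for j in range(n):
        lo, hi = max(0, j - window), min(n, j + window + 1)
        result.append(max(ancestries, key=lambda anc: sum(correct[anc][lo:hi])))
    return result
```

For example, with two populations whose reference haplotypes carry only allele 0 (population A) or only allele 1 (population B), an individual with genotype 0 on the first half of a region and 2 on the second half is assigned ("A", "A") and then ("B", "B"). The window-based vote mirrors the abstract's idea of estimating imputation accuracy within a neighborhood of each locus rather than trusting a single masked SNP.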




Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Paşaniuc, B., Kennedy, J., Măndoiu, I. (2009). Imputation-Based Local Ancestry Inference in Admixed Populations. In: Măndoiu, I., Narasimhan, G., Zhang, Y. (eds) Bioinformatics Research and Applications. ISBRA 2009. Lecture Notes in Computer Science, vol 5542. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01551-9_22


  • DOI: https://doi.org/10.1007/978-3-642-01551-9_22

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-01550-2

  • Online ISBN: 978-3-642-01551-9

  • eBook Packages: Computer Science, Computer Science (R0)
