Abstract
Accurate inference of local ancestry from whole-genome genetic variation data is critical for understanding the history of admixed human populations and for detecting SNPs associated with disease via admixture mapping. Although several existing methods achieve high accuracy when inferring local ancestry for individuals resulting from the admixture of genetically distant ancestral populations (e.g., African-Americans), ancestry inference remains challenging when the ancestral populations are closely related. Surprisingly, methods based on the analysis of allele frequencies at unlinked SNP loci currently outperform methods based on haplotype analysis, even though the latter seemingly receive more detailed information about the genetic makeup of the ancestral populations.
In this paper we propose a novel method for imputation-based local ancestry inference that exploits ancestral haplotype information more effectively than previous haplotype-based methods. Our method uses the ancestral haplotypes to impute genotypes at all typed SNP loci (temporarily marking each SNP genotype as missing) under each possible local ancestry. We then assign to each locus the local ancestry that yields the highest imputation accuracy, as estimated within a neighborhood of the locus. Experiments on simulated data show that imputation-based ancestry assignment is competitive with the best existing methods in the case of distant ancestral populations and yields a significant improvement for closely related ancestral populations. Further demonstrating the synergy between imputation and ancestry inference, we also show that the accuracy of untyped SNP genotype imputation in admixed individuals improves significantly when estimates of local ancestry are used. The open-source C++ code of our method, released under the GNU General Public License, is available for download at http://dna.engr.uconn.edu/software/GEDI-ADMX/.
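The assignment step described above (pick, at each locus, the local ancestry whose imputations are most accurate in a neighborhood of that locus) can be sketched as follows. This is a minimal illustration under stated assumptions, not the GEDI-ADMX implementation: it assumes the leave-one-out imputations under each candidate local ancestry have already been computed (the paper does this with hidden Markov models of haplotype diversity, which are not reproduced here) and summarized as 0/1 correctness indicators per typed SNP. The function name `assign_local_ancestry`, the labels such as `"A/B"`, the symmetric window of `half_window` SNPs on each side, and tie-breaking by label order are all hypothetical choices made for this sketch.

```python
from typing import Dict, List, Sequence


def assign_local_ancestry(
    correct: Dict[str, Sequence[int]],
    half_window: int,
) -> List[str]:
    """Hypothetical sketch of window-based ancestry assignment.

    correct[a][i] is 1 if the genotype at typed SNP i, masked and then
    imputed under candidate local ancestry a, matched the observed genotype,
    and 0 otherwise (assumed precomputed). Returns, for every SNP i, the
    ancestry with the most correct imputations in the window
    [i - half_window, i + half_window], clipped to the chromosome ends.
    """
    if not correct:
        return []
    if half_window < 0:
        raise ValueError("half_window must be non-negative")
    labels = sorted(correct)  # ties go to the alphabetically first label
    n = len(correct[labels[0]])
    if any(len(correct[a]) != n for a in labels):
        raise ValueError("all candidate ancestries must cover the same SNPs")

    # Prefix sums make each window count an O(1) lookup.
    prefix: Dict[str, List[int]] = {}
    for a in labels:
        sums = [0]
        for ok in correct[a]:
            sums.append(sums[-1] + int(ok))
        prefix[a] = sums

    assigned = []
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        # Every candidate shares the same window, so comparing counts of
        # correct imputations is equivalent to comparing accuracies.
        best = max(labels, key=lambda a: prefix[a][hi] - prefix[a][lo])
        assigned.append(best)
    return assigned


# Toy usage: three diploid ancestry states over six typed SNPs.
example = {
    "A/A": [1, 1, 1, 0, 0, 0],
    "A/B": [0, 1, 1, 1, 0, 0],
    "B/B": [0, 0, 0, 1, 1, 1],
}
print(assign_local_ancestry(example, half_window=1))
```

Averaging over a window rather than using the single left-out SNP smooths the per-locus decisions, reflecting the abstract's point that imputation accuracy is "estimated within a neighborhood of the locus"; the window width here is a free parameter of the sketch.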
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Paşaniuc, B., Kennedy, J., Măndoiu, I. (2009). Imputation-Based Local Ancestry Inference in Admixed Populations. In: Măndoiu, I., Narasimhan, G., Zhang, Y. (eds) Bioinformatics Research and Applications. ISBRA 2009. Lecture Notes in Computer Science, vol 5542. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01551-9_22
DOI: https://doi.org/10.1007/978-3-642-01551-9_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01550-2
Online ISBN: 978-3-642-01551-9
eBook Packages: Computer Science, Computer Science (R0)