Imputation-Based Local Ancestry Inference in Admixed Populations
Accurate inference of local ancestry from whole-genome genetic variation data is critical for understanding the history of admixed human populations and for detecting SNPs associated with disease via admixture mapping. Although several existing methods achieve high accuracy when inferring local ancestry for individuals resulting from the admixture of genetically distant ancestral populations (e.g., African-Americans), inference remains challenging when the ancestral populations are closely related. Surprisingly, methods based on the analysis of allele frequencies at unlinked SNP loci currently outperform methods based on haplotype analysis, even though the latter seemingly receive more detailed information about the genetic makeup of the ancestral populations.
In this paper we propose a novel method for imputation-based local ancestry inference that exploits ancestral haplotype information more effectively than previous haplotype-based methods. Our method uses the ancestral haplotypes to impute genotypes at all typed SNP loci (temporarily marking each SNP genotype as missing) under each possible local ancestry. We then assign to each locus the local ancestry that yields the highest imputation accuracy, as estimated within a neighborhood of the locus. Experiments on simulated data show that imputation-based ancestry assignment is competitive with the best existing methods in the case of distant ancestral populations, and yields a significant improvement for closely related ancestral populations. Further demonstrating the synergy between imputation and ancestry inference, we also give results showing that the accuracy of untyped SNP genotype imputation in admixed individuals improves significantly when estimates of local ancestry are used. The open source C++ code of our method, released under the GNU General Public License, is available for download at http://dna.engr.uconn.edu/software/GEDI-ADMX/.
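The assignment step described above can be illustrated with a minimal Python sketch. It is not the released C++ implementation, and it does not perform the imputation itself: it assumes the genotypes imputed under each candidate local ancestry are already available. The function name `assign_local_ancestry`, the ancestry labels, the genotype coding (0, 1, 2 copies of an allele), the symmetric window, and the tie-breaking rule are all hypothetical choices made for illustration, and accuracy is approximated here by a simple count of matching genotypes.

```python
from typing import Dict, List, Sequence


def assign_local_ancestry(observed: Sequence[int],
                          imputed: Dict[str, Sequence[int]],
                          half_window: int) -> List[str]:
    """Pick, per SNP, the ancestry whose imputed genotypes best match the
    observed ones within a window of +/- half_window SNPs.

    observed : typed genotypes, one integer per SNP (hypothetical 0/1/2 coding)
    imputed  : for each candidate ancestry label, the genotypes obtained by
               masking each SNP and imputing it under that ancestry
               (assumed to be computed elsewhere)
    """
    n = len(observed)
    labels = sorted(imputed)  # fixed order, so ties resolve deterministically

    # Prefix sums of per-SNP match indicators allow O(1) window queries.
    prefix: Dict[str, List[int]] = {}
    for label in labels:
        calls = imputed[label]
        if len(calls) != n:
            raise ValueError(f"length mismatch for ancestry {label!r}")
        running = [0]
        for obs, imp in zip(observed, calls):
            running.append(running[-1] + (1 if obs == imp else 0))
        prefix[label] = running

    assignment: List[str] = []
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        # Number of correctly imputed SNPs in the window, per ancestry.
        # max() keeps the first label in sorted order when counts are tied.
        best = max(labels, key=lambda lab: prefix[lab][hi] - prefix[lab][lo])
        assignment.append(best)
    return assignment


if __name__ == "__main__":
    observed = [0, 1, 2, 1, 0, 2, 1, 0]
    imputed = {
        "AA": [0, 1, 2, 1, 1, 0, 0, 2],  # toy values, not real data
        "BB": [2, 0, 0, 1, 0, 2, 1, 0],
    }
    print(assign_local_ancestry(observed, imputed, half_window=1))
```

In this toy input the "AA" imputations agree with the observed genotypes on the left half and the "BB" imputations on the right half, so the assignment switches ancestry near the middle. A larger window smooths the assignment but blurs ancestry breakpoints, which is the trade-off implied by estimating accuracy "within a neighborhood of the locus".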