A Spatial-Aware Haplotype Copying Model with Applications to Genotype Imputation

  • Wen-Yun Yang
  • Farhad Hormozdiari
  • Eleazar Eskin
  • Bogdan Pasaniuc
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8394)


Since its introduction, the haplotype copy model has proven to be one of the most successful approaches for modeling genetic variation in human populations, with applications ranging from ancestry inference to genotype phasing and imputation. Motivated by coalescent theory, this approach assumes that any chromosome (haplotype) can be modeled as a mosaic of segments copied from a set of chromosomes sampled from the same population. At the core of the model is the assumption that any chromosome from the sample is a priori equally likely to contribute to the copying process. Motivated by recent work that models genetic variation in a geographic continuum, we propose a new spatial-aware haplotype copy model that jointly models geography and the haplotype copying process. We extend hidden Markov models of haplotype diversity such that, at any given location, haplotypes that are closest in the genetic-geographic continuum map are a priori more likely to contribute to the copying process than distant ones. Through simulations starting from the 1000 Genomes data, we show that our model achieves superior accuracy in genotype imputation over the standard spatial-unaware haplotype copy model. In addition, we show the utility of our model in selecting a small personalized reference panel for imputation that leads to both improved accuracy and a lower computational runtime than the standard approach. Finally, we show that our proposed model can be used to localize individuals on the genetic-geographic map on the basis of their genotype data.
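To make the idea in the abstract concrete, the following is a minimal sketch of a haplotype copying hidden Markov model in which the prior over reference haplotypes depends on geographic distance. It is an illustration under stated assumptions, not the authors' implementation: the exponential distance weighting, the function names `spatial_prior` and `forward_likelihood`, and the parameters `scale`, `switch`, and `miscopy` are all hypothetical choices made for this example. Passing a uniform prior recovers the standard spatial-unaware copying model.

```python
import math


def spatial_prior(target_xy, panel_xy, scale=1.0):
    """Prior probability of copying from each reference haplotype.

    ASSUMPTION: weight is proportional to exp(-distance / scale), so
    geographically closer haplotypes are a priori more likely to be copied.
    The functional form used in the paper is not given in the abstract.
    """
    weights = [math.exp(-math.dist(target_xy, xy) / scale) for xy in panel_xy]
    total = sum(weights)
    return [w / total for w in weights]


def forward_likelihood(target, panel, prior, switch=0.1, miscopy=0.01):
    """Likelihood of a target haplotype under a copying HMM (forward algorithm).

    Hidden state at each site: index of the reference haplotype being copied.
    With probability (1 - switch) the copied haplotype stays the same;
    with probability `switch` a new one is drawn from `prior`.
    The copied allele is observed correctly with probability (1 - miscopy).
    """
    def emit(k, site):
        return 1.0 - miscopy if panel[k][site] == target[site] else miscopy

    n = len(panel)
    alpha = [prior[k] * emit(k, 0) for k in range(n)]
    for site in range(1, len(target)):
        total = sum(alpha)
        alpha = [
            ((1.0 - switch) * alpha[k] + switch * prior[k] * total) * emit(k, site)
            for k in range(n)
        ]
    return sum(alpha)


# Toy data (hypothetical): three reference haplotypes with map coordinates.
panel = [[0, 0, 1, 1], [0, 1, 0, 1], [1, 1, 0, 0]]
panel_xy = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
target = [1, 1, 0, 0]

uniform = [1.0 / len(panel)] * len(panel)
spatial = spatial_prior((5.0, 5.2), panel_xy)

print("uniform prior likelihood:", forward_likelihood(target, panel, uniform))
print("spatial prior likelihood:", forward_likelihood(target, panel, spatial))
```

In this toy setting the target sits next to the one reference haplotype it matches, so the spatial prior concentrates on that haplotype and the likelihood is higher than under the uniform prior. The sketch omits numerical scaling of the forward variables, which a practical implementation would need for long haplotypes.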


Keywords (machine-generated, not supplied by the authors): Hidden Markov Model; Nature Genetics; Reference Panel; Imputation Accuracy; Genotype Imputation





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Wen-Yun Yang (1)
  • Farhad Hormozdiari (1)
  • Eleazar Eskin (1, 2)
  • Bogdan Pasaniuc (2, 3)
  1. Department of Computer Science, University of California Los Angeles, USA
  2. Department of Human Genetics, University of California Los Angeles, USA
  3. Department of Pathology and Laboratory Medicine, University of California Los Angeles, USA
