
A Spatial-Aware Haplotype Copying Model with Applications to Genotype Imputation

  • Conference paper
Research in Computational Molecular Biology (RECOMB 2014)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 8394)

Abstract

Ever since its introduction, the haplotype copy model has proven to be one of the most successful approaches for modeling genetic variation in human populations, with applications ranging from ancestry inference to genotype phasing and imputation. Motivated by coalescent theory, this approach assumes that any chromosome (haplotype) can be modeled as a mosaic of segments copied from a set of chromosomes sampled from the same population. At the core of the model is the assumption that every chromosome in the sample is a priori equally likely to contribute to the copying process. Motivated by recent work that models genetic variation in a geographic continuum, we propose a new spatial-aware haplotype copy model that jointly models geography and the haplotype copying process. We extend hidden Markov models of haplotype diversity such that, at any given location, haplotypes that are closest in the genetic-geographic continuum map are a priori more likely to contribute to the copying process than distant ones. Through simulations starting from the 1000 Genomes data, we show that our model achieves higher genotype imputation accuracy than the standard spatial-unaware haplotype copy model. In addition, we show the utility of our model in selecting a small personalized reference panel for imputation that leads to both improved accuracy and lower computational runtime than the standard approach. Finally, we show that our proposed model can be used to localize individuals on the genetic-geographic map on the basis of their genotype data.
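The abstract describes the model only at a high level. As an illustration, here is a minimal sketch of a haplotype copying HMM in the style of Li and Stephens in which the uniform prior over reference haplotypes is replaced by a spatial weight. The Gaussian distance kernel, the constant switch probability `rho`, the mismatch probability `theta`, the two-dimensional coordinates, and all function names (`spatial_prior`, `forward_loglik`, `select_panel`, `localize`) are hypothetical choices made for illustration; this is not the authors' parameterization or code. The same likelihood is reused for the two applications the abstract mentions: choosing a small personalized panel (here simply the haplotypes with the largest prior weight, an assumed heuristic) and localizing a query haplotype by grid search over candidate locations.

```python
import math
from typing import List, Sequence, Tuple

Loc = Tuple[float, float]


def spatial_prior(query: Loc, panel_locs: Sequence[Loc], bandwidth: float) -> List[float]:
    """Hypothetical Gaussian-kernel prior: nearby reference haplotypes get more weight."""
    d2 = [(query[0] - x) ** 2 + (query[1] - y) ** 2 for x, y in panel_locs]
    d_min = min(d2)  # shift so the closest haplotype has weight 1 (avoids underflow to 0)
    w = [math.exp(-(d - d_min) / (2.0 * bandwidth ** 2)) for d in d2]
    total = sum(w)
    return [v / total for v in w]


def forward_loglik(hap: Sequence[int], panel: Sequence[Sequence[int]],
                   prior: Sequence[float], rho: float, theta: float) -> float:
    """Scaled forward algorithm for a copying HMM with a non-uniform prior.

    Hidden state = index of the reference haplotype being copied.
    Transition (assumed): stay with prob (1 - rho), else jump to k with prob prior[k].
    Emission (assumed): the copied allele matches the query with prob (1 - theta).
    """
    K = len(panel)

    def emit(k: int, m: int) -> float:
        return 1.0 - theta if panel[k][m] == hap[m] else theta

    alpha = [prior[k] * emit(k, 0) for k in range(K)]
    scale = sum(alpha)
    loglik = math.log(scale)
    alpha = [a / scale for a in alpha]
    for m in range(1, len(hap)):
        # alpha sums to 1 after scaling, so the total jump mass into k is rho * prior[k]
        alpha = [((1.0 - rho) * alpha[k] + rho * prior[k]) * emit(k, m) for k in range(K)]
        scale = sum(alpha)
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return loglik


def select_panel(prior: Sequence[float], size: int) -> List[int]:
    """Hypothetical heuristic: keep the `size` haplotypes with the largest spatial prior."""
    return sorted(range(len(prior)), key=lambda k: prior[k], reverse=True)[:size]


def localize(hap, panel, panel_locs, candidates, bandwidth, rho, theta) -> Loc:
    """Grid search: return the candidate location that maximizes the likelihood."""
    return max(candidates, key=lambda loc: forward_loglik(
        hap, panel, spatial_prior(loc, panel_locs, bandwidth), rho, theta))


if __name__ == "__main__":
    # Toy panel: two haplotypes sampled at (0, 0), two at (10, 0).
    panel = [[0, 0, 0, 0], [0, 0, 0, 0], [1, 1, 1, 1], [1, 1, 1, 1]]
    panel_locs = [(0.0, 0.0), (0.0, 0.0), (10.0, 0.0), (10.0, 0.0)]
    query = [1, 1, 1, 1]
    prior = spatial_prior((10.0, 0.0), panel_locs, bandwidth=2.0)
    print(select_panel(prior, 2))  # [2, 3]
    print(localize(query, panel, panel_locs, [(0.0, 0.0), (10.0, 0.0)], 2.0, 0.1, 0.01))  # (10.0, 0.0)
```

The query carries only the alleles of the haplotypes sampled at (10, 0), so that candidate location yields the higher likelihood. Setting every prior weight to 1/K recovers the standard spatial-unaware copying model, which is the baseline the abstract compares against.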

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Yang, WY., Hormozdiari, F., Eskin, E., Pasaniuc, B. (2014). A Spatial-Aware Haplotype Copying Model with Applications to Genotype Imputation. In: Sharan, R. (ed.) Research in Computational Molecular Biology. RECOMB 2014. Lecture Notes in Computer Science, vol 8394. Springer, Cham. https://doi.org/10.1007/978-3-319-05269-4_30

  • DOI: https://doi.org/10.1007/978-3-319-05269-4_30

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-05268-7

  • Online ISBN: 978-3-319-05269-4

  • eBook Packages: Computer Science, Computer Science (R0)
