
A Guided Tour to Computational Haplotyping

  • Gunnar W. Klau
  • Tobias Marschall
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10307)

Abstract

Human genomes come in pairs: every individual inherits one version of the genome from the mother and another version from the father. Hence, every chromosome exists in two similar yet distinct “copies”, called haplotypes. The problem of determining the full sequences of both haplotypes is known as phasing or haplotyping. In this paper, we review different approaches for haplotyping and point out how they are formalized as optimization problems. We survey different sequencing technologies and, in this way, provide guidance on the characteristics of problem instances resulting from present-day technologies. Furthermore, we highlight open algorithmic challenges.
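To make "formalized as optimization problems" concrete, here is a minimal sketch of one standard formalization of read-based haplotyping, the Minimum Error Correction (MEC) objective. It assumes the reads have already been reduced to a fragment matrix over heterozygous sites, written as equal-length strings of `0`, `1` and `-` (site not covered by the read). Each read is assigned to whichever of the two complementary haplotypes it fits better, and the cost counts the entries that would have to be flipped. The function names are hypothetical, and the exhaustive search is exponential in the number of sites, so it only illustrates the objective, not a practical algorithm.

```python
from itertools import product


def mismatches(read: str, hap: tuple[int, ...]) -> int:
    """Count covered sites ('0'/'1', not '-') where the read disagrees with hap."""
    return sum(1 for c, h in zip(read, hap) if c != "-" and int(c) != h)


def mec_cost(reads: list[str], hap: tuple[int, ...]) -> int:
    """MEC cost: each read is charged against the better-fitting of hap and its complement."""
    comp = tuple(1 - h for h in hap)
    return sum(min(mismatches(r, hap), mismatches(r, comp)) for r in reads)


def solve_mec_brute_force(reads: list[str]) -> tuple[tuple[int, ...], int]:
    """Enumerate all haplotypes (assumes a non-empty list of equal-length reads)."""
    n = len(reads[0])
    best_hap, best_cost = None, None
    # Fix the first site to 0: a haplotype and its complement have the same cost.
    for tail in product((0, 1), repeat=n - 1):
        hap = (0,) + tail
        cost = mec_cost(reads, hap)
        if best_cost is None or cost < best_cost:
            best_hap, best_cost = hap, cost
    return best_hap, best_cost


# Toy fragment matrix: reads from haplotypes 010 and 101, with one sequencing
# error in the last read (011 instead of 010).
reads = ["01-", "-10", "10-", "-01", "011"]
print(solve_mec_brute_force(reads))  # ((0, 1, 0), 1)
```

Here the optimal haplotype pair is 010/101, and the single flipped entry corresponds to the erroneous base in the last read.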

Keywords

Illumina sequencing, Dynamic programming algorithm, Large pedigree, Heterozygous site, Fragment matrix
These keywords were added by machine and not by the authors.


Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Heinrich Heine University, Düsseldorf, Germany
  2. Center for Bioinformatics, Saarland University, Saarbrücken, Germany
  3. Max Planck Institute for Informatics, Saarbrücken, Germany
