A Survey of Computational Methods for Determining Haplotypes

  • Bjarni V. Halldórsson
  • Vineet Bafna
  • Nathan Edwards
  • Ross Lippert
  • Shibu Yooseph
  • Sorin Istrail
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2983)


It is widely anticipated that the study of variation in the human genome will provide a means of predicting risk of a variety of complex diseases. Single nucleotide polymorphisms (SNPs) are the most common form of genomic variation. Haplotypes have been suggested as one means for reducing the complexity of studying SNPs. In this paper we review some of the computational approaches that have been taken for determining haplotypes and suggest new approaches.


Keywords (added by machine, not by the authors): Integer Programming Formulation; Haplotype Inference; Haplotype Phase; Heterozygous Site; Haplotype Pair





Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Bjarni V. Halldórsson (1)
  • Vineet Bafna (1)
  • Nathan Edwards (1)
  • Ross Lippert (1)
  • Shibu Yooseph (1)
  • Sorin Istrail (1)

  1. Informatics Research, Celera Genomics/Applied Biosystems, Rockville
