COCOON 2003: Computing and Combinatorics, pp 5-19

Empirical Exploration of Perfect Phylogeny Haplotyping and Haplotypers

  • Ren Hua Chung
  • Dan Gusfield
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2697)

Abstract

The next high-priority phase of human genomics will involve the development of a full Haplotype Map of the human genome [15]. It will be used in large-scale screens of populations to associate specific haplotypes with specific complex genetic-influenced diseases. A key problem, and perhaps the bottleneck, is to computationally determine haplotype pairs from genotype data. An approach to this problem based on viewing it in the context of perfect phylogeny was introduced in [14], along with an efficient solution. A variation of that method that is slower in the worst case was implemented in [3]. Two simpler methods for the perfect phylogeny approach, also slower in the worst case than the first algorithm, were later developed [1,7]. We have implemented and tested all three of these approaches in order to compare and explain their practical efficiencies. We discuss two other empirical observations: a strong phase transition in the frequency of obtaining a unique solution as a function of the number of individuals in the input; and the results of using the method to find non-overlapping intervals where the haplotyping solution is highly reliable, as a function of the level of recombination in the data. Finally, we discuss the biological basis for the size of these tests.
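
To make the problem statement concrete, here is a minimal brute-force sketch of perfect phylogeny haplotyping on tiny inputs. It is an illustration only and is not any of the algorithms from [14], [3], [1], or [7]. It assumes genotypes coded per site as 0 or 1 (homozygous) and 2 (heterozygous), and it assumes the unrooted form of the problem, where a set of binary haplotypes admits a perfect phylogeny exactly when no pair of sites shows all four gametes 00, 01, 10, 11. The function names `expand`, `passes_four_gamete_test`, and `pph_solutions` are hypothetical.

```python
from itertools import combinations, product


def expand(genotype):
    """Yield every unordered haplotype pair that explains one genotype.

    Assumed coding: 0 or 1 = homozygous site, 2 = heterozygous site.
    """
    het = [i for i, g in enumerate(genotype) if g == 2]
    if not het:
        h = tuple(genotype)
        yield (h, h)
        return
    # Fix the first heterozygous site to 0 in h1 so mirrored pairs
    # (h1, h2) and (h2, h1) are not both generated.
    for bits in product((0, 1), repeat=len(het) - 1):
        h1, h2 = list(genotype), list(genotype)
        for site, b in zip(het, (0,) + bits):
            h1[site], h2[site] = b, 1 - b
        yield (tuple(h1), tuple(h2))


def passes_four_gamete_test(haplotypes):
    """Return True if no pair of sites exhibits all four gametes."""
    n_sites = len(haplotypes[0])
    for a, b in combinations(range(n_sites), 2):
        if len({(h[a], h[b]) for h in haplotypes}) == 4:
            return False
    return True


def pph_solutions(genotypes):
    """Enumerate all haplotype-pair assignments that pass the test.

    Exponential in the number of heterozygous sites; for tiny inputs only.
    """
    solutions = []
    for choice in product(*(list(expand(g)) for g in genotypes)):
        haps = [h for pair in choice for h in pair]
        if passes_four_gamete_test(haps):
            solutions.append(choice)
    return solutions


# One doubly heterozygous individual alone is ambiguous ...
print(len(pph_solutions([(2, 2)])))
# ... but adding two homozygous individuals leaves a single solution.
print(len(pph_solutions([(2, 2), (0, 1), (1, 0)])))
```

In this toy input, adding individuals removes the ambiguity, which is the kind of effect the abstract's observation about unique solutions refers to; the sketch says nothing about how sharply that happens on realistic data.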

Keywords

High Linkage Disequilibrium; Maximal Interval; Empirical Exploration; Haplotype Inference; Haplotype Pair

References

  1. V. Bafna, D. Gusfield, G. Lancia, and S. Yooseph. Haplotyping as perfect phylogeny: A direct approach. Technical report, UC Davis, Department of Computer Science, July 17, 2002.
  2. R. E. Bixby and D. K. Wagner. An almost linear-time algorithm for graph realization. Mathematics of Operations Research, 13:99–123, 1988.
  3. R. H. Chung and D. Gusfield. Perfect phylogeny haplotyper: Haplotype inferral using a tree model. Bioinformatics, 19(6):780–781, 2003.
  4. A. Clark. Inference of haplotypes from PCR-amplified samples of diploid populations. Mol. Biol. Evol., 7:111–122, 1990.
  5. A. Clark, K. Weiss, D. Nickerson, et al. Haplotype structure and population genetic inferences from nucleotide-sequence variation in human lipoprotein lipase. Am. J. Human Genetics, 63:595–612, 1998.
  6. M. Daly, J. Rioux, S. Schaffner, T. Hudson, and E. Lander. High-resolution haplotype structure in the human genome. Nature Genetics, 29:229–232, 2001.
  7. E. Eskin, E. Halperin, and R. Karp. Efficient reconstruction of haplotype structure via perfect phylogeny. Technical report, UC Berkeley, Computer Science Division (EECS), August 2002.
  8. M. Fullerton, A. Clark, C. Sing, et al. Apolipoprotein E variation at the sequence haplotype level: implications for the origin and maintenance of a major human polymorphism. Am. J. of Human Genetics, pages 881–900, 2000.
  9. S. Cleary and K. St. John. Analysis of Haplotype Inference Data Requirements. Preprint, 2003.
  10. F. Gavril and R. Tamari. An algorithm for constructing edge-trees from hypergraphs. Networks, 13:377–388, 1983.
  11. D. Gusfield. Efficient algorithms for inferring evolutionary history. Networks, 21:19–28, 1991.
  12. D. Gusfield. Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology. Cambridge University Press, 1997.
  13. D. Gusfield. Inference of haplotypes from samples of diploid populations: complexity and algorithms. Journal of Computational Biology, 8(3), 2001.
  14. D. Gusfield. Haplotyping as Perfect Phylogeny: Conceptual Framework and Efficient Solutions (Extended Abstract). In Proceedings of RECOMB 2002: The Sixth Annual International Conference on Computational Biology, pages 166–175, 2002.
  15. L. Helmuth. Genome research: Map of the human genome 3.0. Science, 293(5530):583–585, 2001.
  16. R. Hudson. Gene genealogies and the coalescent process. Oxford Survey of Evolutionary Biology, 7:1–44, 1990.
  17. R. Hudson. Generating samples under the Wright-Fisher neutral model of genetic variation. Bioinformatics, 18(2):337–338, 2002.
  18. C. Langley. U.C. Davis Dept. of Evolution and Ecology. Personal Communication, 2003.
  19. J. Z. Lin, A. Brown, and M. T. Clegg. Heterogeneous geographic patterns of nucleotide sequence diversity between two alcohol dehydrogenase genes in wild barley (Hordeum vulgare subspecies spontaneum). PNAS, 98:531–536, 2001.
  20. S. Lin, D. Cutler, M. Zwick, and A. Chakravarti. Haplotype inference in random population samples. Am. J. of Hum. Genet., 71:1129–1137, 2003.
  21. T. Niu, Z. Qin, X. Xu, and J. S. Liu. Bayesian haplotype inference for multiple linked single-nucleotide polymorphisms. Am. J. Hum. Genet., 70:157–169, 2002.
  22. S. Orzack, D. Gusfield, and V. Stanton. The absolute and relative accuracy of haplotype inferral methods and a consensus approach to haplotype inferral. Abstract Nr 115 in Am. Society of Human Genetics, Supplement 2001.
  23. M. Stephens, N. Smith, and P. Donnelly. A new statistical method for haplotype reconstruction from population data. Am. J. Human Genetics, 68:978–989, 2001.
  24. S. Tavare. Calibrating the clock: Using stochastic processes to measure the rate of evolution. In E. Lander and M. Waterman, editors, Calculating the Secrets of Life. National Academy Press, 1995.
  25. W. T. Tutte. An algorithm for determining whether a given binary matroid is graphic. Proc. of Amer. Math. Soc., 11:905–917, 1960.
  26. C. Wade, M. Daly, et al. The mosaic structure of variation in the laboratory mouse genome. Nature, 420:574–578, 2002.
  27. S. Yooseph. Personal Communication, 2003.

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Ren Hua Chung (1)
  • Dan Gusfield (1)
  1. Computer Science Department, University of California, Davis, Davis, USA
