Practical Algorithms and Fixed-Parameter Tractability for the Single Individual SNP Haplotyping Problem

  • Conference paper
Algorithms in Bioinformatics (WABI 2002)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2452)

Abstract

Single nucleotide polymorphisms (SNPs) are the most frequent form of human genetic variation, and are of foremost importance for a variety of applications, including medical diagnostics, phylogenies, and drug design.

The complete SNPs sequence information from each of the two copies of a given chromosome in a diploid genome is called a haplotype. The Haplotyping Problem for a single individual is as follows: Given a set of fragments from one individual’s DNA, find a maximally consistent pair of SNPs haplotypes (one per chromosome copy) by removing data “errors” related to sequencing errors, repeats, and paralogous recruitment. Two versions of the problem, i.e. the Minimum Fragment Removal (MFR) and the Minimum SNP Removal (MSR), are considered.
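
To make the two problem versions concrete, here is a minimal brute-force sketch in Python. It is not the dynamic programming approach of the paper; it enumerates removals exhaustively and is exponential in the input size. It assumes a hypothetical encoding in which each fragment is a string over 'A' and 'B' (the two alleles at each SNP) with '-' marking a SNP the fragment does not cover. Under that encoding, two fragments conflict when they disagree at a SNP both cover, and a fragment set is consistent with a pair of haplotypes exactly when its conflict graph is bipartite.

```python
from collections import deque
from itertools import combinations


def conflict(f, g):
    """True if f and g disagree at a SNP that both cover ('-' marks a hole)."""
    return any(a != b and a != '-' and b != '-' for a, b in zip(f, g))


def is_feasible(frags):
    """True if the conflict graph of frags is bipartite (BFS 2-coloring)."""
    n = len(frags)
    color = [None] * n
    for start in range(n):
        if color[start] is not None:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if v == u or not conflict(frags[u], frags[v]):
                    continue
                if color[v] is None:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:
                    return False  # odd cycle: no split into two haplotypes
    return True


def min_fragment_removal(frags):
    """MFR by exhaustive search: fewest fragments (rows) to delete."""
    n = len(frags)
    for r in range(n + 1):
        for removed in combinations(range(n), r):
            kept = [f for i, f in enumerate(frags) if i not in removed]
            if is_feasible(kept):
                return r, removed


def min_snp_removal(frags):
    """MSR by exhaustive search: fewest SNPs (columns) to delete."""
    m = len(frags[0]) if frags else 0
    for r in range(m + 1):
        for removed in combinations(range(m), r):
            kept = [''.join(c for j, c in enumerate(f) if j not in removed)
                    for f in frags]
            if is_feasible(kept):
                return r, removed


# Illustrative data: the first three fragments conflict pairwise.
fragments = ["AAA", "BBB", "AB-", "AAA"]
print(min_fragment_removal(fragments)[0])  # number of fragments removed
print(min_snp_removal(fragments)[0])       # number of SNPs removed
```

On this toy instance the first three fragments form an odd cycle of conflicts, so at least one fragment or one SNP must be removed before the remaining data can be split between two haplotypes.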

The Haplotyping Problem was introduced in [8], where it was proved that both MSR and MFR are polynomially solvable when each fragment covers a set of consecutive SNPs (i.e., it is a gapless fragment), and NP-hard in general. The original algorithms of [8] are of theoretical interest, but by no means practical: one relies on finding the maximum stable set in a perfect graph, and the other is a reduction to a network flow problem. Furthermore, the reduction does not work when some fragments are completely included in others, and neither algorithm can be generalized to deal with a bounded total number of holes in the data. In this paper, we give the first practical algorithms for the Haplotyping Problem, based on Dynamic Programming. Our algorithms allow fragments to be included in one another, and are polynomial for each constant k bounding the total number of holes in the data. For m SNPs and n fragments, we give an $O(mn^{2k+2})$ algorithm for the MSR problem, and an $O(2^{2k} m^2 n + 2^{3k} m^3)$ algorithm for the MFR problem, when each fragment has at most k holes. In particular, we obtain an $O(mn^2)$ algorithm for MSR and an $O(m^2 n + m^3)$ algorithm for MFR on gapless fragments.
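
As a quick check of the last claim, setting k = 0 (gapless fragments) in the general bounds recovers the gapless ones. This is only the arithmetic of the substitution, with m the number of SNPs and n the number of fragments as above:

```latex
\begin{align*}
\text{MSR:}\quad & O\!\left(m\,n^{2k+2}\right)\Big|_{k=0} = O\!\left(m\,n^{2}\right),\\
\text{MFR:}\quad & O\!\left(2^{2k} m^{2} n + 2^{3k} m^{3}\right)\Big|_{k=0}
                   = O\!\left(m^{2} n + m^{3}\right).
\end{align*}
```

For any fixed k, the factors $2^{2k}$ and $2^{3k}$ are constants, so the MFR bound stays polynomial in m and n, while the MSR bound has k in the exponent of n.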

Finally, we prove that both MFR and MSR are APX-hard in general.

Research partially done while enjoying hospitality at BRICS, Department of Computer Science, University of Aarhus, Denmark.


References

  1. K. S. Booth and G. S. Lueker. Testing for the consecutive ones property, interval graphs and graph planarity testing using PQ-tree algorithms. J. Comput. System Sci., 13:335–379, 1976.

  2. A. Chakravarti. It’s raining SNP, hallelujah? Nature Genetics, 19:216–217, 1998.

  3. A. Clark. Inference of haplotypes from PCR-amplified samples of diploid populations. Molecular Biology and Evolution, 7:111–122, 1990.

  4. D. Gusfield. A practical algorithm for optimal inference of haplotypes from diploid populations. In R. Altman, T.L. Bailey, P. Bourne, M. Gribskov, T. Lengauer, I.N. Shindyalov, L.F. Ten Eyck, and H. Weissig, editors, Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology, pages 183–189, Menlo Park, CA, 2000. AAAI Press.

  5. D. Gusfield. Haplotyping as perfect phylogeny: Conceptual framework and efficient solutions. In G. Myers, S. Hannenhalli, S. Istrail, P. Pevzner, and M. Waterman, editors, Proceedings of the Sixth Annual International Conference on Computational Biology, pages 166–175, New York, NY, 2002. ACM Press.

  6. L. Helmuth. Genome research: Map of the human genome 3.0. Science, 293(5530):583–585, 2001.

  7. International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature, 409:860–921, 2001.

  8. G. Lancia, V. Bafna, S. Istrail, R. Lippert, and R. Schwartz. SNPs problems, complexity and algorithms. In Proceedings of Annual European Symposium on Algorithms (ESA), volume 2161 of Lecture Notes in Computer Science, pages 182–193. Springer, 2001.

  9. R. Lippert, R. Schwartz, G. Lancia, and S. Istrail. Algorithmic strategies for the SNPs haplotype assembly problem. Briefings in Bioinformatics, 3(1):23–31, 2002.

  10. C. Lund and M. Yannakakis. The approximation of maximum subgraph problems. In Proceedings of 20th Int. Colloquium on Automata, Languages and Programming, pages 40–51. Springer-Verlag, 1994.

  11. E. Marshall. Drug firms to create public database of genetic mutations. Science Magazine, 284(5413):406–407, 1999.

  12. J.C. Venter et al. The sequence of the human genome. Science, 291:1304–1351, 2001.

  13. J. Weber and E. Myers. Human whole genome shotgun sequencing. Genome Research, 7:401–409, 1997.

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Rizzi, R., Bafna, V., Istrail, S., Lancia, G. (2002). Practical Algorithms and Fixed-Parameter Tractability for the Single Individual SNP Haplotyping Problem. In: Guigó, R., Gusfield, D. (eds) Algorithms in Bioinformatics. WABI 2002. Lecture Notes in Computer Science, vol 2452. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45784-4_3

  • DOI: https://doi.org/10.1007/3-540-45784-4_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-44211-0

  • Online ISBN: 978-3-540-45784-8

  • eBook Packages: Springer Book Archive
