Efficient and Tight Upper Bounds for Haplotype Inference by Pure Parsimony Using Delayed Haplotype Selection

  • Conference paper
Progress in Artificial Intelligence (EPIA 2007)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4874)

Abstract

Haplotype inference from genotype data is a key step towards a better understanding of the role played by genetic variations in inherited diseases. One of the most promising approaches uses the pure parsimony criterion. This approach, called Haplotype Inference by Pure Parsimony (HIPP), is NP-hard, as it aims at minimising the number of haplotypes required to explain a given set of genotypes. The HIPP problem is often solved using constraint satisfaction techniques, for which the upper bound on the number of required haplotypes is a key issue. Another very well-known approach is Clark’s method, which resolves genotypes by greedily selecting an explaining pair of haplotypes. In this work, we combine the basic idea of Clark’s method with a more sophisticated method for selecting explaining haplotypes, in order to explicitly introduce a bias towards parsimonious explanations. The new algorithm can be used either to obtain an approximate solution to the HIPP problem or to obtain an upper bound on the size of the pure parsimony solution. This upper bound can then be used to efficiently encode the problem as a constraint satisfaction problem. The experimental evaluation, conducted on a large set of real and artificially generated examples, shows that the new method is much more effective than Clark’s method at obtaining parsimonious solutions, while keeping the simplicity and speed of Clark’s method.
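
To make the notions of "explaining pair" and greedy resolution concrete, the following is a minimal sketch of a Clark-style pass. It is not the delayed haplotype selection algorithm of the paper, whose details the abstract does not give. The encoding is an assumption: genotypes are strings over 0, 1 and 2 (2 marks a heterozygous site), and haplotypes are strings over 0 and 1. The function names `complement`, `explains` and `clark_greedy` are hypothetical.

```python
def complement(genotype, haplotype):
    """Return the haplotype that, paired with `haplotype`, explains `genotype`.

    Returns None when `haplotype` disagrees with a homozygous site.
    Assumed encoding: '0'/'1' homozygous, '2' heterozygous.
    """
    mate = []
    for g, h in zip(genotype, haplotype):
        if g == "2":
            # Heterozygous site: the two haplotypes must differ here.
            mate.append("1" if h == "0" else "0")
        elif g == h:
            # Homozygous site: both haplotypes carry the genotype's value.
            mate.append(h)
        else:
            return None
    return "".join(mate)


def explains(h1, h2, genotype):
    """True if the pair (h1, h2) explains `genotype`."""
    same_length = len(h1) == len(h2) == len(genotype)
    return same_length and complement(genotype, h1) == h2


def clark_greedy(genotypes):
    """Greedy Clark-style resolution (illustrative sketch only).

    Seeds the haplotype list with genotypes that have no heterozygous
    site, then repeatedly resolves an unresolved genotype with the first
    known haplotype that is compatible with it.
    """
    known = []      # haplotypes found so far, in discovery order
    resolved = {}   # genotype -> explaining pair
    for g in genotypes:
        if "2" not in g:
            resolved[g] = (g, g)
            if g not in known:
                known.append(g)
    progress = True
    while progress:
        progress = False
        for g in genotypes:
            if g in resolved:
                continue
            for h in known:
                mate = complement(g, h)
                if mate is not None:
                    resolved[g] = (h, mate)
                    if mate not in known:
                        known.append(mate)
                    progress = True
                    break
    return resolved, known


resolved, haplotypes = clark_greedy(["0101", "0121", "2221"])
print(resolved)
print(len(haplotypes))
```

When every genotype ends up resolved, the number of haplotypes returned is an upper bound on the pure parsimony optimum, since the returned set explains all genotypes. A pass like this can leave genotypes unresolved, and its result depends on the order in which genotypes and haplotypes are tried.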

Editor information

José Neves, Manuel Filipe Santos, José Manuel Machado

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Marques-Silva, J., Lynce, I., Graça, A., Oliveira, A.L. (2007). Efficient and Tight Upper Bounds for Haplotype Inference by Pure Parsimony Using Delayed Haplotype Selection. In: Neves, J., Santos, M.F., Machado, J.M. (eds) Progress in Artificial Intelligence. EPIA 2007. Lecture Notes in Computer Science, vol 4874. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77002-2_52

  • DOI: https://doi.org/10.1007/978-3-540-77002-2_52

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-77000-8

  • Online ISBN: 978-3-540-77002-2

  • eBook Packages: Computer Science, Computer Science (R0)
