Annals of Operations Research, Volume 184, Issue 1, pp 137–162

Haplotype inference with pseudo-Boolean optimization

  • Ana Graça
  • João Marques-Silva
  • Inês Lynce
  • Arlindo L. Oliveira


The rapid development of sequencing techniques in recent years has created an urgent need for efficient and accurate haplotype inference tools. Besides being a crucial issue in genetics, haplotype inference is also a challenging computational problem. Pure parsimony is one viable modeling approach for haplotype inference, and it is an interesting NP-hard problem in its own right. Recently, the introduction of SAT-based methods, including pseudo-Boolean optimization (PBO) methods, has produced very efficient solvers. This paper provides a detailed description of RPoly, a PBO approach for the haplotype inference by pure parsimony (HIPP) problem. Moreover, an extensive evaluation of existing HIPP solvers on a comprehensive set of instances confirms that RPoly is currently the most efficient and robust HIPP approach.

Keywords: Haplotype inference, Pure parsimony, Pseudo-Boolean optimization





Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Ana Graça (1, corresponding author)
  • João Marques-Silva (2)
  • Inês Lynce (1)
  • Arlindo L. Oliveira (1)

  1. Instituto Superior Técnico (IST), Technical University of Lisbon and INESC-ID Lisboa, Lisbon, Portugal
  2. Complex and Adaptive Systems Lab, School of Computer Science and Informatics, University College Dublin, Dublin, Ireland
