Comparing Integer Linear Programming to SAT-Solving for Hard Problems in Computational and Systems Biology

  • Hannah Brown
  • Lei Zuo
  • Dan Gusfield
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12099)


It is useful to have general-purpose solution methods that can be applied to a wide range of problems, rather than relying on the development of clever, intricate algorithms for each specific problem. Integer Linear Programming (ILP) is the most widely used such general-purpose solution method, and it is successful on a wide range of problems. However, there are some problems in computational biology where ILP has had only limited success. In this paper, we explore an alternative general-purpose solution method: SAT-solving, i.e., constructing a Boolean formula in conjunctive normal form (CNF) that encodes a problem instance, and using a SAT-solver to determine whether the CNF formula is satisfiable. On the three hard problems we examined, we were very surprised to find that the SAT-solving approach was dramatically better than the ILP approach on two problems, and a little slower, but more robust, on the third. We also re-examined and confirmed an earlier result on a fourth problem, using current ILP and SAT-solvers. These results should encourage further efforts to exploit SAT-solving in computational biology.
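To make the SAT-solving workflow described above concrete, here is a minimal sketch in Python. It is an illustration only and does not reproduce the encodings or the solvers used in the paper. It assumes the common convention of writing a literal as a signed integer (3 means variable 3 is true, -3 means it is false) and a CNF formula as a list of clauses. The helper names `exactly_one`, `simplify`, and `dpll` are hypothetical, and the tiny backtracking search stands in for a real SAT-solver.

```python
from itertools import combinations


def exactly_one(variables):
    """CNF clauses forcing exactly one of the given variables to be true."""
    at_least_one = [list(variables)]
    at_most_one = [[-a, -b] for a, b in combinations(variables, 2)]
    return at_least_one + at_most_one


def simplify(clauses, literal):
    """Assume `literal` is true: drop satisfied clauses, shorten the others."""
    result = []
    for clause in clauses:
        if literal in clause:
            continue  # clause is already satisfied
        reduced = [lit for lit in clause if lit != -literal]
        if not reduced:
            return None  # empty clause: this assumption leads to a conflict
        result.append(reduced)
    return result


def dpll(clauses, assignment=None):
    """Return a set of literals satisfying the formula, or None if there is none."""
    assignment = set() if assignment is None else assignment
    if not clauses:
        return assignment  # every clause is satisfied
    # Unit propagation: a one-literal clause forces that literal.
    unit = next((c[0] for c in clauses if len(c) == 1), None)
    if unit is not None:
        candidates = [unit]
    else:
        # Otherwise branch on the first literal, trying both polarities.
        candidates = [clauses[0][0], -clauses[0][0]]
    for literal in candidates:
        reduced = simplify(clauses, literal)
        if reduced is not None:
            model = dpll(reduced, assignment | {literal})
            if model is not None:
                return model
    return None


# Toy instance: exactly one of x1, x2, x3 holds, and x1 and x2 are ruled out.
formula = exactly_one([1, 2, 3]) + [[-1], [-2]]
model = dpll(formula)
```

A SAT-solver answers a yes/no question, so an optimization problem of the kind usually given to an ILP solver would typically be handled by asking a series of such questions, for example "is there a solution of cost at most k?" for different values of k.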


Integer programming, SAT-solving, Computational biology



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Department of Computer Science, University of California, Davis, USA
  2. Department of Computer Science, The University of Hong Kong, Pok Fu Lam, China
