Integer Linear Programming in Computational Biology: Overview of ILP, and New Results for Traveling Salesman Problems in Biology

  • Dan Gusfield
Part of the Computational Biology book series (COBO, volume 29)


Integer linear programming (ILP) is a powerful and versatile technique for framing and solving hard optimization problems of many types. In the last several years, ILP has become widely used in computational biology, although predominantly by computationally and mathematically trained researchers, such as Bernard Moret. In an effort to reach a broader set of researchers, this chapter begins with an introduction to ILP, illustrated by the problems of finding cliques and independent sets in biological graphs. The focus then shifts to new research results on solving traveling salesman problems with compact ILP formulations. Such formulations have largely been declared useless in the optimization literature. In this chapter, however, I argue that the correct compact formulation can be very effective for problems of the size and structure that arise in computational biology. These empirical results, together with some additional arguments, call into question the relevance of the strength of an ILP formulation as a predictor of the speed with which it will be solved.


Integer programming, Biological networks, Clique finding, Independent set, Traveling salesman problem, Strength, Beauty, Efficiency



This research was supported by NSF grant 1528234. The research was done partly while on sabbatical at the Simons Institute for the Theory of Computing, UC Berkeley. I would also like to thank Thong Le for help in understanding proofs about strength, and Jim Orlin, T. L. Magnanti, and David Shmoys for helpful communications. Finally, I thank Tandy Warnow, Mohammed El-Kebir, and the anonymous reviewers, who provided many helpful suggestions.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. University of California, Davis, USA
