Mathematical Programming

, Volume 105, Issue 2–3, pp 387–425

A branch-and-cut algorithm for multiple sequence alignment

  • Ernst Althaus
  • Alberto Caprara
  • Hans-Peter Lenhof
  • Knut Reinert
Article

Abstract

We consider a branch-and-cut approach for solving the multiple sequence alignment problem, which is a central problem in computational biology. We propose a general model for this problem in which arbitrary gap costs are allowed. An interesting aspect of our approach is that the three (exponentially large) classes of natural valid inequalities that we consider turn out to be both facet-defining for the convex hull of integer solutions and separable in polynomial time. Both the proofs that these classes of valid inequalities are facet-defining and the description of the separation algorithms are far from trivial. Experimental results on several benchmark instances show that our method outperforms the best tools developed so far, in that it produces alignments that are better from a biological point of view. A noteworthy outcome of the results is the effectiveness of using branch-and-cut with only a carefully-selected subset of the variables as a heuristic.

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. 1.
    Achterberg, T.: SCIP - a framework to integrate constraint and mixed integer programming. Technical Report 04-19, Zuse Institute Berlin, 2004. http://www.zib.de/bib/pub/pw
  2. 2.
    Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J.: Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990)CrossRefGoogle Scholar
  3. 3.
    Bienstock, D.: Potential function methods for approximately solving linear programming problems, Theory and Practice. Kluwer Academic Publishers, Boston, 2002Google Scholar
  4. 4.
    Carr, R.D., Lancia, G.: Compact vs exponential-size lp relaxations. Operations Research Letters 30, 57–65 (2002)CrossRefMATHMathSciNetGoogle Scholar
  5. 5.
    Carr, R.D., Lancia, G.: Compact optimization can outperform separation: A case study in structural proteomics. 4OR 2, 221–233 (2004)Google Scholar
  6. 6.
    Carrillo, H., Lipman, D.J.: The multiple sequence alignment problem in biology. SIAM J. Appl. Math. 48 (5), 1073–1082 (1988)MathSciNetGoogle Scholar
  7. 7.
    Dayhoff, M., Schwartz, R., Orcut, B.: A model of evolutionary change in proteins. In: M. Dayhoff (ed.) Atlas of Protein Sequence and Structure, vol 5, National Biomedical Research Foundation, Washington, D.C., 1979, pp 345–352Google Scholar
  8. 8.
    Delcher, A., Kasif, S., Fleischmann, R., J. Peterson, W. O., Salzberg, S.: Alignment of whole genomes. Nucleic Acids Res 27, 2369–2376 (1999)Google Scholar
  9. 9.
    Eppstein, D.: Sequence comparison with mixed convex and concave costs. J Algorithms (11), 85–101 (1990)Google Scholar
  10. 10.
    Fischetti, M., Toth, P.: A polyhedral approach to the asymmetric traveling salesman problem. Management Sci 43 (11), 1520–1536 (1997)Google Scholar
  11. 11.
    Garey, M., Johnson, D.: Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman, 1979Google Scholar
  12. 12.
    Golumbic, M.C.: Algorithmic graph theory and perfect graphs. Academic Press, New York, 1980Google Scholar
  13. 13.
    Gotoh, O.: An improved algorithm for matching biological sequences. J. Mol. Biol. 162, 705–708 (1982)CrossRefGoogle Scholar
  14. 14.
    Gupta, S., Kececioglu, J., Schaeffer, A.: Improving the practical space and time efficiency of the shortest-paths approach to sum-of-pairs multiple sequence alignment. J. Comput. Biol. 2, 459–472 (1995)CrossRefGoogle Scholar
  15. 15.
    Gusfield, D.: Algorithms on strings, trees and sequences: computer science and computational biology. Cambridge University Press, Cambridge, 1997Google Scholar
  16. 16.
    Henikoff, S., Henikoff, J.: Amino acid substitution matrices from protein blocks. Proceedings of the National Academy of Science 89, 10915–10919 (1992)Google Scholar
  17. 17.
    Kipp Martin, R.: Using separation algorithms to generate mixed integer model reformulations. Oper. Res. Lett. 10, 119–128 (1991)MATHMathSciNetGoogle Scholar
  18. 18.
    Larmore, L., Schieber, B.: Online dynamic programming with applications to the prediction of rna secondary structure. In: Proceedings of the First Symposium on Discrete Algorithms 1990, pp 503–512Google Scholar
  19. 19.
    LEDA (Library of Efficient Data Types and Algorithms), 2004. http://www.algorithmic-solutions.com
  20. 20.
    Lenhof, H.-P., Morgenstern, B., Reinert, K.: An exact solution for the segment-to-segment multiple sequence alignment problem. Bioinformatics 15 (3), 203–210 (1999)Google Scholar
  21. 21.
    Lermen, M., Reinert, K.: The practical use of the Open image in new window algorithm for exact multiple sequence alignment. J. Comput. Biol. 7(5), 655–673 (2000)Google Scholar
  22. 22.
    Notredame, C., Higgins, D.G., Heringa, J.: T-coffee : A novel method for fast and accurate multiple sequence alignment. J. Mol. Biol. 302, 205–217 (2000)CrossRefGoogle Scholar
  23. 23.
    Reinert, K.: A Polyhedral Approach to Sequence Alignment Problems. PhD thesis, Universität des Saarlandes, 1999Google Scholar
  24. 24.
    Reinert, K., Lenhof, H.-P., Mutzel, P., Mehlhorn, K., Kececioglu, J.: A branch-and-cut algorithm for multiple sequence alignment. In: Proceedings of the First Annual International Conference on Computational Molecular Biology (RECOMB-97), 1997, pp 241–249Google Scholar
  25. 25.
    K. Reinert, J. Stoye, and T. Will. An iterative methods for faster sum-of-pairs multiple sequence alignment. BIOINFORMATICS 16(9):808–814, 2000.Google Scholar
  26. 26.
    SCIL–Symbolic Constraints for Integer Linear programming, 2002. http://www.mpi-sb.mpg.de/SCIL
  27. 27.
    Thompson, J.D., Plewniak, F., Poch, O.: BAliBASE: A benchmark alignment database for the evaluation of multiple alignment programs. Bioinformatics 15 (1), 87–88 (1999) http://www-igbmc.u-strasbg.fr/BioInfo/BAliBASE/prog_scores.html Google Scholar
  28. 28.
    Wang, L., Jiang, T.: On the complexity of multiple sequence alignment. J. Comput. Biol. 1, 337–348 (1994)Google Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Ernst Althaus
    • 1
  • Alberto Caprara
    • 2
  • Hans-Peter Lenhof
    • 1
  • Knut Reinert
    • 3
  1. 1.Universität des SaarlandesSaarbrückenGermany
  2. 2.DEISUniversitá di BolognaBolognaItaly
  3. 3.Institute of Computer ScienceFreie Universität BerlinBerlinGermany

Personalised recommendations