A Lagrangian relaxation approach for the multiple sequence alignment problem

Abstract

We present a branch-and-bound (bb) algorithm for the multiple sequence alignment problem (MSA), one of the most important problems in computational biology. The upper bound at each bb node is based on a Lagrangian relaxation of an integer linear programming formulation for MSA. Once certain inequalities are dualized, the Lagrangian subproblem becomes a pairwise alignment problem, which can be solved efficiently by dynamic programming. By reformulating the problem in terms of additionally introduced variables before relaxing it, we improve the convergence rate dramatically while still being able to solve the Lagrangian problem efficiently. Our experiments show that our implementation, although preliminary, outperforms all exact algorithms for the multiple sequence alignment problem. Furthermore, the quality of the alignments is among the best computed so far.
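The dynamic programming step for pairwise alignment can be illustrated with a minimal sketch. This is the textbook global alignment recurrence with a linear gap cost, not the authors' implementation: the function name `pairwise_alignment_score` and the scoring values `match`, `mismatch`, and `gap` are assumptions chosen for illustration, and the Lagrangian subproblem in the paper would use weights adjusted by the multipliers rather than these fixed scores.

```python
def pairwise_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the best global alignment score of strings a and b.

    Illustrative scoring only (assumed values): +1 for a match,
    -1 for a mismatch, -2 for every gap character.
    """
    n, m = len(a), len(b)
    # prev[j] is the best score for aligning the processed prefix of a
    # with b[:j]; initially the prefix is empty, so b[:j] faces j gaps.
    prev = [j * gap for j in range(m + 1)]
    for i in range(1, n + 1):
        # cur[0]: a[:i] aligned against the empty prefix of b, i.e. i gaps.
        cur = [i * gap] + [0] * m
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(
                prev[j - 1] + sub,  # align a[i-1] with b[j-1]
                prev[j] + gap,      # align a[i-1] with a gap
                cur[j - 1] + gap,   # align b[j-1] with a gap
            )
        prev = cur
    return prev[m]
```

The sketch runs in time proportional to the product of the two sequence lengths and keeps only two rows of the table, which is enough when only the score (the bound) is needed rather than the alignment itself.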

Author information

Corresponding author

Correspondence to Stefan Canzar.

Rights and permissions

Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

About this article

Cite this article

Althaus, E., Canzar, S. A Lagrangian relaxation approach for the multiple sequence alignment problem. J Comb Optim 16, 127–154 (2008). https://doi.org/10.1007/s10878-008-9139-z

Keywords

  • Sequence comparison
  • Lagrangian relaxation
  • Branch and bound