
LASA: A Tool for Non-heuristic Alignment of Multiple Sequences

  • Ernst Althaus
  • Stefan Canzar
Part of the Communications in Computer and Information Science book series (CCIS, volume 13)

Abstract

We have developed LASA, a non-heuristic tool for the multiple sequence alignment (MSA) problem, one of the most important problems in computational molecular biology. It is based on a dynamic programming algorithm that solves a Lagrangian relaxation of an integer linear programming (ILP) formulation of MSA. The objective function optimized by LASA models the sum-of-pairs scoring scheme with “truly” affine gap costs. By reformulating the ILP with respect to additionally introduced variables before relaxing it, we improve the convergence rate dramatically while still being able to solve the Lagrangian problem efficiently. Our experiments show that LASA outperforms all exact algorithms for the multiple sequence alignment problem. Furthermore, the quality of its alignments ranks among the best computed so far.
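To make the objective concrete, the following minimal Python sketch evaluates a sum-of-pairs score with affine gap costs for an alignment that is already given. It is an illustration only, not LASA's algorithm: it scores an alignment and does not search for one. The function names, the match/mismatch scores, and the gap parameters are all hypothetical. The sketch also assumes one common reading of “truly” affine costs, namely that gaps are counted in the pairwise alignment induced by each pair of rows, after columns in which both rows carry a gap have been dropped.

```python
from itertools import combinations

GAP = "-"


def pairwise_score(a, b, match=1, mismatch=-1, gap_open=2, gap_extend=1):
    """Score the pairwise alignment induced by two rows of a multiple alignment.

    All scoring parameters are illustrative assumptions, not values from the paper.
    A gap run of length k costs gap_open + k * gap_extend.
    """
    score = 0
    gap_side = None  # row that carries the currently open gap run: "a", "b" or None
    for x, y in zip(a, b):
        if x == GAP and y == GAP:
            # Column disappears in the induced pairwise alignment,
            # so it neither costs anything nor interrupts a gap run.
            continue
        if x == GAP or y == GAP:
            side = "a" if x == GAP else "b"
            if side != gap_side:
                score -= gap_open  # a new gap run starts here
                gap_side = side
            score -= gap_extend
        else:
            score += match if x == y else mismatch
            gap_side = None  # an aligned pair of characters closes any gap run
    return score


def sum_of_pairs(rows, **params):
    """Sum the induced pairwise scores over all pairs of alignment rows."""
    if len({len(r) for r in rows}) > 1:
        raise ValueError("all alignment rows must have the same length")
    return sum(pairwise_score(a, b, **params) for a, b in combinations(rows, 2))
```

With the default parameters, `sum_of_pairs(["AC-GT", "A--GT", "ACCGT"])` evaluates to 0: the three induced pairwise alignments score 0, 1 and -1. Skipping shared gap columns is what lets a gap run that spans such a column be charged a single opening cost.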

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Ernst Althaus (1)
  • Stefan Canzar (1)
  1. Max-Planck Institut für Informatik, Saarbrücken, Germany
