Speeding Up the DIALIGN Multiple Alignment Program by Using the ‘Greedy Alignment of BIOlogical Sequences LIBrary’ (GABIOS-LIB)

  • Saïd Abdeddaïm
  • Burkhard Morgenstern
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2066)

Abstract

A sensitive method for multiple sequence alignment should be able to align local motifs that are contained in some, but not necessarily all, of the input sequences. In addition, it should be possible to integrate several such partial local alignments into a single multiple output alignment. This raises the question of the consistency of partial alignments. Based on a new set-theoretical definition of sequence alignment, the consistency problem is discussed theoretically, and a recently developed library of C functions for consistency calculation (GABIOS-LIB) is described. GABIOS-LIB has been integrated into the DIALIGN alignment program to carry out consistency tests during the multiple alignment procedure. While the resulting alignments are exactly the same as with the previous version of DIALIGN, the running time of the program has been substantially reduced. For large data sets, the new version of DIALIGN is up to 120 times faster than the old version. Availability: http://bibiserv.TechFak.Uni-Bielefeld.DE/dialign/
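To make the notion of consistency concrete, the following is a minimal sketch in Python, not the GABIOS-LIB implementation (which is written in C and is not reproduced here). It assumes that a partial alignment is given as a set of links between single positions, each written as a hypothetical `(sequence, position)` tuple, whereas DIALIGN itself operates on whole segment pairs. Under that assumption, a set of links is treated as consistent if, after taking the transitive closure, no class contains two positions of the same sequence and the order of the classes implied by the sequences contains no cycle. The function names `is_consistent` and `greedy_select` are illustrative only.

```python
from collections import defaultdict, deque


def is_consistent(pairs):
    """Return True if the position links in `pairs` fit into one alignment.

    Each element of `pairs` is ((seq, pos), (seq, pos)). This is a naive
    check that rebuilds everything from scratch on every call.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Transitive closure: merge linked positions into classes.
    for a, b in pairs:
        parent[find(a)] = find(b)

    # A class may hold at most one position per sequence.
    seen = set()
    by_seq = defaultdict(list)
    for site in list(parent):
        root = find(site)
        seq, pos = site
        if (root, seq) in seen:
            return False
        seen.add((root, seq))
        by_seq[seq].append((pos, root))

    # Build order constraints between classes along each sequence.
    nodes = {root for root, _ in seen}
    succ = defaultdict(set)
    indeg = {n: 0 for n in nodes}
    for sites in by_seq.values():
        sites.sort()
        for (_, r1), (_, r2) in zip(sites, sites[1:]):
            if r2 not in succ[r1]:
                succ[r1].add(r2)
                indeg[r2] += 1

    # Kahn's algorithm: the classes can be ordered iff there is no cycle.
    queue = deque(n for n in nodes if indeg[n] == 0)
    visited = 0
    while queue:
        n = queue.popleft()
        visited += 1
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                queue.append(m)
    return visited == len(nodes)


def greedy_select(weighted_pairs):
    """Accept links in order of decreasing weight if they stay consistent."""
    accepted = []
    for _, a, b in sorted(weighted_pairs, key=lambda t: -t[0]):
        if is_consistent(accepted + [(a, b)]):
            accepted.append((a, b))
    return accepted
```

Because this sketch re-checks the whole accepted set for every candidate link, it illustrates only what a consistency test decides, not how such tests can be carried out quickly during the multiple alignment procedure.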

Keywords

multiple sequence alignment; partial alignment; consistency; consistent equivalence relation; greedy algorithm

Copyright information

© Springer-Verlag Berlin Heidelberg 2001

Authors and Affiliations

  • Saïd Abdeddaïm (1)
  • Burkhard Morgenstern (2, 3)
  1. LIFAR - ABISS, Faculté des Sciences et Techniques, Université de Rouen, Mont-Saint-Aignan Cedex, France
  2. AVENTIS Pharma, Essex, UK
  3. MIPS, Max-Planck-Institut für Biochemie, Martinsried, Germany
