
Aligning alignments

  • John D. Kececioglu
  • Weiqing Zhang
Session IV
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1448)

Abstract

While the area of sequence comparison has a rich collection of results on the alignment of two sequences, and even on the alignment of multiple sequences, little is known about the alignment of two alignments. The problem becomes interesting when the alignment objective function counts gaps, as is common when aligning biological sequences, and has the form of the sum-of-pairs objective. We begin a thorough investigation of aligning two alignments under the sum-of-pairs objective with general linear gap costs, where each of the two alignments is given in the form of a sequence (a degenerate alignment containing a single sequence), a multiple alignment (containing two or more sequences), or a profile (a representation of a multiple alignment often used in computational biology). This leads to five problem variations, some of which arise in widely used heuristics for multiple sequence alignment and in assessing the relatedness of a sequence to a sequence family. For variations in which exact gap counts are computationally difficult to determine, we offer a framework in terms of optimistic and pessimistic gap counts. For optimistic and pessimistic gap counts we give efficient algorithms for the sequence vs. alignment, sequence vs. profile, alignment vs. alignment, and profile vs. profile variations, all of which run in essentially O(mn) time for two input alignments of lengths m and n. For exact gap counts, we give the first provably efficient algorithm for the sequence vs. alignment variation, which runs in essentially O(mn log n) time using the candidate-list technique developed for convex gap costs, and we conjecture that the alignment vs. alignment variation is NP-complete.
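To make the sum-of-pairs objective concrete, here is a minimal Python sketch of the sequence vs. alignment variation in its simplest setting. It assumes a per-character gap cost with no gap-open charge, so gap counts never enter the score; the unit costs `MISMATCH` and `GAP_COST` and the function names are hypothetical choices for illustration. This is a textbook dynamic program, not the authors' algorithm, which handles general linear gap costs.

```python
from typing import List

GAP = "-"
MISMATCH = 1  # hypothetical substitution cost
GAP_COST = 2  # hypothetical per-character gap cost (assumption: no gap-open charge)


def pair_cost(a: str, b: str) -> int:
    """Cost of one aligned pair under the hypothetical unit scheme above."""
    if a == GAP and b == GAP:
        return 0  # gap against gap is free in sum-of-pairs
    if a == GAP or b == GAP:
        return GAP_COST
    return 0 if a == b else MISMATCH


def sequence_vs_alignment(seq: str, rows: List[str]) -> int:
    """Minimum added sum-of-pairs cost of merging `seq` into the alignment `rows`.

    Pairs among the existing rows are unchanged (inserting an all-gap column
    adds only gap-gap pairs), so only pairs involving `seq` are scored.
    """
    if not rows or len({len(r) for r in rows}) != 1:
        raise ValueError("rows must be a non-empty list of equal-length strings")
    k, m, n = len(rows), len(rows[0]), len(seq)
    columns = [[r[j] for r in rows] for j in range(m)]

    def match(c: str, j: int) -> int:
        # seq character c placed in existing column j
        return sum(pair_cost(c, x) for x in columns[j])

    def skip_column(j: int) -> int:
        # a gap in seq opposite existing column j
        return sum(pair_cost(GAP, x) for x in columns[j])

    new_column = k * GAP_COST  # seq character against a fresh all-gap column

    # D[i][j]: best cost of aligning seq[:i] with the first j columns
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = D[i - 1][0] + new_column
    for j in range(1, m + 1):
        D[0][j] = D[0][j - 1] + skip_column(j - 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(
                D[i - 1][j - 1] + match(seq[i - 1], j - 1),
                D[i - 1][j] + new_column,
                D[i][j - 1] + skip_column(j - 1),
            )
    return D[n][m]
```

For example, `sequence_vs_alignment("A", ["AC", "A-"])` returns 2: the `A` joins the first column, and the gap opposite the second column pairs with the single `C`. Each of the (n+1)(m+1) cells scans the k rows, so this sketch takes O(mnk) time; precomputing per-column symbol counts (a profile) would reduce the per-cell work to the alphabet size. Once a gap-open charge is added, the cost of a cell depends on whether each pairwise gap was already open, which this table does not record; that bookkeeping is what the exact, optimistic, and pessimistic gap counts in the abstract address.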

Keywords

Sequence comparison; sum-of-pairs alignment; affine gap costs; quasi-natural gap costs; profiles

Copyright information

© Springer-Verlag Berlin Heidelberg 1998

Authors and Affiliations

  • John D. Kececioglu (1)
  • Weiqing Zhang (2)
  1. Department of Computer Science, The University of Georgia, Athens
  2. Department of Medicinal Chemistry, The University of Georgia, Athens
