Parametric optimization of sequence alignment

Published in Algorithmica

Abstract

The optimal alignment or the weighted minimum edit distance between two DNA or amino acid sequences for a given set of weights is computed by classical dynamic programming techniques, and is widely used in molecular biology. However, in DNA and amino acid sequences there is considerable disagreement about how to weight matches, mismatches, insertions/deletions (indels or spaces), and gaps. Parametric sequence alignment is the problem of computing the optimal-valued alignment between two sequences as a function of variable weights for matches, mismatches, spaces, and gaps. The goal is to partition the parameter space into regions (which are necessarily convex) such that in each region one alignment is optimal throughout and such that the regions are maximal for this property. In this paper we are primarily concerned with the structure of this convex decomposition, and secondarily with the complexity of computing the decomposition. The most striking results are the following: For the special case where only matches, mismatches, and spaces are counted, and where spaces are counted throughout the alignment, we show that the decomposition is surprisingly simple: all regions are infinite; there are at most n^{2/3} regions; the lines that bound the regions are all of the form β = c + (c + 0.5)α; and the entire decomposition can be found in O(knm) time, where k is the actual number of regions, and n < m are the lengths of the two strings. These results were found while implementing a large software package for parametric sequence analysis, and in turn have led to faster algorithms for those tasks. A conference version of this paper first appeared in [10].
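
As a rough illustration of the fixed-weight problem that the parametric version generalizes, the following Python sketch computes one optimal global alignment for a single choice of weights. It assumes an objective of the form matches − α·mismatches − β·spaces; the abstract does not spell out the objective, so this form, the function name align, and the tie-breaking rule are assumptions made for the example, and gap weights are left out. Because the value of any fixed alignment is linear in (α, β) under this objective, the optimal value is an upper envelope of linear functions, which is consistent with the convex regions described above.

```python
def align(s, t, alpha, beta):
    """Global alignment maximizing matches - alpha*mismatches - beta*spaces.

    Assumed objective (not taken from the paper's text). Returns
    (value, matches, mismatches, spaces) for one optimal alignment.
    """
    n, m = len(s), len(t)
    # best[i][j] = (value, matches, mismatches, spaces) for s[:i] vs t[:j]
    best = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        best[i][0] = (-beta * i, 0, 0, i)   # s[:i] aligned against i spaces
    for j in range(1, m + 1):
        best[0][j] = (-beta * j, 0, 0, j)   # t[:j] aligned against j spaces

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            candidates = []
            # Diagonal move: align s[i-1] with t[j-1] (match or mismatch).
            _, mt, ms, sp = best[i - 1][j - 1]
            if s[i - 1] == t[j - 1]:
                candidates.append((mt + 1, ms, sp))
            else:
                candidates.append((mt, ms + 1, sp))
            # Vertical move: s[i-1] opposite a space.
            _, mt, ms, sp = best[i - 1][j]
            candidates.append((mt, ms, sp + 1))
            # Horizontal move: t[j-1] opposite a space.
            _, mt, ms, sp = best[i][j - 1]
            candidates.append((mt, ms, sp + 1))
            # Keep the candidate with the highest value; ties are broken
            # by tuple order, an arbitrary choice for this sketch.
            best[i][j] = max(
                (mt - alpha * ms - beta * sp, mt, ms, sp)
                for mt, ms, sp in candidates
            )
    return best[n][m]


if __name__ == "__main__":
    # Two parameter points that select different optimal alignments.
    print(align("AC", "AG", alpha=0.1, beta=5.0))
    print(align("AC", "AG", alpha=5.0, beta=0.1))
```

Calling align at two different (α, β) points and comparing the returned counts is a simple way to see that the points lie in different regions of the decomposition: cheap mismatches favor aligning C against G, while cheap spaces favor placing each of them opposite a space.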

References

  1. P. Argos and M. Vingron, Sensitivity comparison of protein amino acid sequences, in Methods in Enzymology, Vol. 183 (R. Doolittle, ed.), Academic Press, San Diego, CA, pp. 352–365.

  2. R. F. Doolittle, Of Urfs and Orfs: A Primer on How To Analyze Derived Amino Acid Sequences, University Science Books, 1986.

  3. R. F. Doolittle (ed.), Methods in Enzymology, Vol. 183, Academic Press, San Diego, CA.

  4. M. Eisner and D. Severance, Mathematical techniques for efficient record segmentation in large shared databases, J. Assoc. Comput. Mach., 23 (1976), 619–635.

  5. W. M. Fitch and T. F. Smith, Optimal sequence alignments, Proc. Nat. Acad. Sci. USA, 80 (1983), 1382–1386.

  6. O. Gotoh, Optimal sequence alignment allowing for long gaps, Bull. Math. Biol., 52(3) (1990), 359–373.

  7. M. Gribskov and J. Devereux, Sequence Analysis Primer, Stockton Press, 1991.

  8. D. Gusfield, Parametric combinatorial computing and a problem of program module distribution, J. Assoc. Comput. Mach., 30 (1983), 551–563.

  9. D. Gusfield, K. Balasubramanian, J. Bronder, D. Mayfield, D. Naor, and P. Stelling, PARAL: An efficient program to optimally align strings with variable match, mismatch, space and gap weights, in preparation.

  10. D. Gusfield, K. Balasubramanian, and D. Naor, Parametric optimization of sequence alignment, Proceedings of the Third Annual ACM-SIAM Symposium on Discrete Algorithms, 1992, pp. 432–439.

  11. N. Megiddo, Combinatorial optimization with rational objective functions, Math. Oper. Res., 4 (1979), 414–424.

  12. W. R. Pearson and D. J. Lipman, Improved tools for biological sequence comparison, Proc. Nat. Acad. Sci. USA, 85 (1988), 2444–2448.

  13. D. Sankoff and J. Kruskal (eds.), Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison, Addison-Wesley, Reading, MA, 1983.

  14. G. D. Schuler, S. F. Altschul, and D. J. Lipman, A workbench for multiple alignment construction and analysis, in Proteins: Structure Function and Genetics, 9(3), 180–190, in press.

  15. R. Schwarz and M. Dayhoff, Matrices for detecting distant relationships, in Atlas of Protein Sequences, National Biomedical Research Foundation, Washington, DC, 1979, pp. 353–358.

  16. T. F. Smith and M. S. Waterman, Identification of common molecular subsequences, J. Molecular Biol., 147 (1981), 195–197.

  17. H. Stone, Critical load factors in distributed systems, IEEE Trans. Software Engrg., 4(3) (1978), 254–258.

  18. G. von Heijne, Sequence Analysis in Molecular Biology, Academic Press, New York, 1987.

  19. M. S. Waterman, Sequence alignments, in Mathematical Methods for DNA Sequences (M. S. Waterman, ed.), CRC Press, Boca Raton, FL, 1989, pp. 53–92.

  20. M. S. Waterman, M. Eggert, and E. Lander, Parametric sequence comparisons, Proc. Nat. Acad. Sci. USA, 89 (1992), 6090–6093.

Additional information

Communicated by Alberto Apostolico.

This research was partially supported by Grant DE-FG03-90ER60999 from the Department of Energy, and Grants CCR-8803704 and CCR-9103937 from the National Science Foundation.

About this article

Cite this article

Gusfield, D., Balasubramanian, K. & Naor, D. Parametric optimization of sequence alignment. Algorithmica 12, 312–326 (1994). https://doi.org/10.1007/BF01185430

  • DOI: https://doi.org/10.1007/BF01185430
