Linear-space algorithms that build local alignments from fragments

Algorithmica

Abstract

This paper presents practical algorithms for building an alignment of two long sequences from a collection of “alignment fragments,” such as all occurrences of identical 5-tuples in each of two DNA sequences. We first combine a time-efficient algorithm developed by Galil and coworkers with a space-saving approach of Hirschberg to obtain a local alignment algorithm that uses O((M + N + F log N) log M) time and O(M + N) space to align sequences of lengths M and N from a pool of F alignment fragments. Ideas of Huang and Miller are then employed to develop a time- and space-efficient algorithm that computes n best nonintersecting alignments for any n > 1. An example illustrates the utility of these methods.
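To make the notion of building a local alignment from fragments concrete, the following is a minimal sketch rather than the algorithm of the paper. It collects identical k-tuples as fragments and chains them with a straightforward quadratic-time, non-linear-space dynamic program, not the sparse, linear-space method summarized above. The function names `find_fragments` and `chain_fragments`, and the scoring scheme (k points per fragment, minus a penalty proportional to the diagonal shift between consecutive fragments), are assumptions made for illustration only.

```python
from collections import defaultdict


def find_fragments(a, b, k=5):
    """Return all (i, j) such that a[i:i+k] == b[j:j+k] (identical k-tuples)."""
    index = defaultdict(list)
    for j in range(len(b) - k + 1):
        index[b[j:j + k]].append(j)
    return [(i, j)
            for i in range(len(a) - k + 1)
            for j in index.get(a[i:i + k], [])]


def chain_fragments(fragments, k=5, gap_penalty=1):
    """Best-scoring local chain of non-overlapping fragments.

    Hypothetical scoring: each fragment contributes k, and joining two
    fragments costs gap_penalty times the change in diagonal (j - i).
    Runs in O(F^2) time and O(F) space for F fragments.
    """
    frags = sorted(fragments)
    if not frags:
        return 0, []
    n = len(frags)
    score = [k] * n   # a chain may start at any fragment (local alignment)
    prev = [-1] * n   # back-pointers used to recover the chain
    for t in range(n):
        i, j = frags[t]
        for s in range(t):
            pi, pj = frags[s]
            # The predecessor must end before this fragment starts in both sequences.
            if pi + k <= i and pj + k <= j:
                shift = abs((i - pi) - (j - pj))
                cand = score[s] + k - gap_penalty * shift
                if cand > score[t]:
                    score[t], prev[t] = cand, s
    best = max(range(n), key=score.__getitem__)
    chain = []
    t = best
    while t != -1:
        chain.append(frags[t])
        t = prev[t]
    return score[best], chain[::-1]


if __name__ == "__main__":
    a, b = "ACGTTAGCCA", "ACGTTGAGCCA"
    frags = find_fragments(a, b)
    print(frags)
    print(chain_fragments(frags))
```

The inner loop over all earlier fragments is what makes this sketch quadratic in F; the bounds quoted in the abstract come from replacing that scan with sparse dynamic programming and from recomputing rather than storing back-pointers.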

References

  • Altschul, S., W. Gish, W. Miller, E. Myers, and D. Lipman (1990). A basic local alignment search tool. J. Mol. Biol., 215, 403–410.

  • Boguski, M., R. C. Hardison, S. Schwartz, and W. Miller (1992). Analysis of conserved domains and sequence motifs in cellular regulatory proteins and locus control regions using new software tools for multiple alignment and visualization. The New Biologist, 4, 247–260.

  • Chao, K.-M., W. R. Pearson, and W. Miller (1992). Aligning two sequences within a specified diagonal band. CABIOS, 8, 481–487.

  • Chao, K.-M., R. C. Hardison, and W. Miller (1993). Constrained sequence alignment. Bull. Math. Biol., 55, 503–524.

  • Doolittle, R. F., ed. (1990). Molecular Evolution: Computer Analysis of Protein and Nucleic Acid Sequences. Methods in Enzymology, Vol. 183. Academic Press, New York.

  • Eppstein, D., Z. Galil, R. Giancarlo, and G. F. Italiano (1992a). Sparse dynamic programming. I: Linear cost functions. J. Assoc. Comput. Mach., 39, 519–545.

  • Eppstein, D., Z. Galil, R. Giancarlo, and G. F. Italiano (1992b). Sparse dynamic programming. II: Convex and concave cost functions. J. Assoc. Comput. Mach., 39, 546–567.

  • Feng, D. F., M. S. Johnson, and R. F. Doolittle (1985). Aligning amino acid sequences: comparison of commonly used methods. J. Mol. Evol., 21, 112–125.

  • Fitch, W. M., and T. F. Smith (1983). Optimal sequence alignments. Proc. Nat. Acad. Sci. USA, 80, 1382–1386.

  • Galil, Z., and R. Giancarlo (1989). Speeding up dynamic programming with applications to molecular biology. Theoret. Comput. Sci., 64, 107–118.

  • Galil, Z., and K. Park (1992). Dynamic programming with convexity, concavity, and sparsity. Theoret. Comput. Sci., 92, 49–76.

  • Goad, W. B., and M. I. Kanehisa (1982). Pattern recognition in nucleic acid sequences. I: A general method for finding local homologies and symmetries. Nucleic Acids Res., 10, 247–263.

  • Gotoh, O. (1982). An improved algorithm for matching biological sequences. J. Mol. Biol., 162, 705–708.

  • Gribskov, M., R. Luthy, and D. Eisenberg (1990). Profile analysis. In R. F. Doolittle (ed.), Molecular Evolution: Computer Analysis of Protein and Nucleic Acid Sequences. Methods in Enzymology, Vol. 183. Academic Press, New York, pp. 146–159.

  • Hardison, R. C., and W. Miller (1993). Use of long sequence alignments to study the evolution and regulation of mammalian globin gene clusters. Mol. Biol. Evol., 10, 73–102.

  • Hardison, R. C., J. Xu, J. Jackson, J. Mansberger, O. Selifonova, B. Grotch, H. Petrykowska, J. Biesecker, and W. Miller (1993a). Comparative analysis of the locus control region of the rabbit β-like globin gene cluster. HS3 increases transient expression of an embryonic ε-globin gene. Nucleic Acids Res., 21, 1265–1272.

  • Hardison, R. C., K.-M. Chao, M. Adamkiewicz, D. Price, J. Jackson, T. Zeigler, N. Stojanovic, and W. Miller (1993b). Positive and negative regulatory elements of the rabbit ε-globin gene revealed by an improved multiple alignment program and functional analysis. DNA Sequence - J. DNA Sequencing and Mapping, 4, 163–176.

  • Hirschberg, D. S. (1975). A linear space algorithm for computing maximal common subsequences. Comm. ACM, 18, 341–343.

  • Huang, X., and W. Miller (1991). A time-efficient, linear-space local similarity algorithm. Adv. in Appl. Math., 12, 337–357.

  • Huang, X., R. C. Hardison, and W. Miller (1990). A space-efficient algorithm for local similarities. CABIOS, 6, 373–381.

  • Miller, W., and E. Myers (1988). Sequence comparison with concave weighting functions. Bull. Math. Biol., 50, 97–120.

  • Myers, E., and X. Huang (1992). An O(N² log N) restriction map comparison and search algorithm. Bull. Math. Biol., 54, 599–618.

  • Myers, E., and W. Miller (1988). Optimal alignments in linear space. CABIOS, 4, 11–17.

  • Needleman, S. B., and C. D. Wunsch (1970). A general method applicable to the search for similarities in the amino acid sequences of two proteins. J. Mol. Biol., 48, 443–453.

  • Pascarella, S., and P. Argos (1992). Analysis of insertions/deletions in protein structures. J. Mol. Biol., 224, 461–471.

  • Pearson, W. R. (1990). Rapid and sensitive sequence comparison with FASTP and FASTA. In R. F. Doolittle (ed.), Molecular Evolution: Computer Analysis of Protein and Nucleic Acid Sequences. Methods in Enzymology, Vol. 183. Academic Press, New York, pp. 63–95.

  • Pearson, W. R., and D. Lipman (1988). Improved tools for biological sequence comparison. Proc. Nat. Acad. Sci. USA, 85, 2444–2448.

  • Pugh, W. (1990). Skip lists: a probabilistic alternative to balanced trees. Comm. ACM, 33, 668–676.

  • Sankoff, D., and J. B. Kruskal (eds.) (1983). Time Warps, String Edits, and Macromolecules: the Theory and Practice of Sequence Comparisons. Addison-Wesley, Reading, MA.

  • Schwartz, S., W. Miller, C.-M. Yang, and R. C. Hardison (1991). Software tools for analyzing pairwise sequence alignments. Nucleic Acids Res., 19, 4663–4667.

  • Sellers, P. H. (1984). Pattern recognition in genetic sequences by mismatch density. Bull. Math. Biol., 46, 501–514.

  • Smith, T. F., and M. S. Waterman (1981). Identification of common molecular subsequences. J. Mol. Biol., 147, 195–197.

  • Smith, T. F., M. S. Waterman, and W. M. Fitch (1981). Comparative biosequence metrics. J. Mol. Evol., 18, 38–46.

  • Wagner, R. A., and M. J. Fischer (1974). The string-to-string correction problem. J. Assoc. Comput. Mach., 21, 168–173.

  • Waterman, M. S. (1984). Efficient sequence alignment algorithms. J. Theoret. Biol., 108, 333–337.

  • Waterman, M. S. (1989). Sequence alignments. In M. S. Waterman, ed., Mathematical Methods for DNA Sequences. CRC Press, Boca Raton, FL, pp. 53–92.

  • Waterman, M. S., and M. Eggert (1987). A new algorithm for best subsequence alignments with application to tRNA-rRNA comparisons. J. Mol. Biol., 197, 723–728.

  • Wilbur, W., and D. Lipman (1983). Rapid similarity searches of nucleic acid and protein data banks. Proc. Nat. Acad. Sci. USA, 80, 726–730.

  • Wilbur, W., and D. Lipman (1984). The context dependent comparison of biological sequences. SIAM J. Appl. Math., 44, 557–567.


Additional information

Communicated by E. W. Myers.

This work was supported in part by Grant RO1 LM05110 from the National Library of Medicine.

About this article

Cite this article

Chao, K.M., Miller, W. Linear-space algorithms that build local alignments from fragments. Algorithmica 13, 106–134 (1995). https://doi.org/10.1007/BF01188583

  • DOI: https://doi.org/10.1007/BF01188583
