
Fast identification of approximately matching substrings

  • Archie L. Cobbs
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 807)

Abstract

Let two strings S, T over a finite alphabet Σ be given, and let M be an arbitrary relation on Σ×Σ. Define an approximate match (x, y) of two length-m subwords (substrings) x of S and y of T when M(x_i, y_i) holds for all 1 ≤ i ≤ m. A match implies all the local alignments (without insertions and deletions) which are pairings of specific occurrences of x and y. A match (x, y) is maximal if there exists no longer match (u, v) such that all of the local alignments implied by (x, y) are contained in a local alignment implied by (u, v). We give an efficient algorithm for finding all maximal matches between S and T. The algorithm runs in time bounded by the sum of the lengths of the maximal matches, at worst O(|Σ|² n²). The main application is identifying homologous regions of protein sequences.
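
The definition of an approximate match can be illustrated with a small brute-force sketch. It is not the algorithm of the paper: it scans every diagonal of the S×T comparison grid directly, taking time proportional to |S|·|T|, and it reports gap-free local alignments that cannot be extended at either end, which is a simpler notion of maximality than the one defined on matches (x, y) above. The names `maximal_alignments`, `related`, and `min_len`, as well as the example relation, are assumptions made for illustration.

```python
from typing import Callable, Iterator, Tuple


def maximal_alignments(
    s: str,
    t: str,
    related: Callable[[str, str], bool],
    min_len: int = 1,
) -> Iterator[Tuple[int, int, int]]:
    """Yield (i, j, length) for each gap-free run s[i:i+length] vs
    t[j:j+length] in which every aligned pair of symbols satisfies
    `related`, and which cannot be extended at either end.

    Brute-force illustration only (hypothetical helper, not the
    paper's method).
    """
    # Each diagonal d = i - j holds all alignments with a fixed offset.
    for d in range(-(len(t) - 1), len(s)):
        i, j = max(d, 0), max(-d, 0)
        run = 0  # length of the current run of related pairs
        while i < len(s) and j < len(t):
            if related(s[i], t[j]):
                run += 1
            else:
                if run >= min_len:
                    yield (i - run, j - run, run)
                run = 0
            i += 1
            j += 1
        # A run may reach the end of the diagonal.
        if run >= min_len:
            yield (i - run, j - run, run)


# Example relation M (an assumption): symbols match if they are equal
# or both belong to one illustrative group of residues.
GROUP = {"I", "L", "V"}


def same_or_grouped(a: str, b: str) -> bool:
    return a == b or (a in GROUP and b in GROUP)


if __name__ == "__main__":
    for hit in maximal_alignments("AILK", "GVVK", same_or_grouped, min_len=2):
        print(hit)
```

Because M is passed in as an arbitrary predicate, the same scan works for exact matching (`lambda a, b: a == b`) or for any other relation on Σ×Σ; only the running time distinguishes it from an efficient method.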

Keywords

Linear Time, Local Alignment, Maximal Match, Suffix Tree, Finite Alphabet
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 1994

Authors and Affiliations

  • Archie L. Cobbs
    • Computer Science Division, University of California Berkeley, Berkeley
