Fast identification of approximately matching substrings

  • Conference paper
Combinatorial Pattern Matching (CPM 1994)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 807)

Abstract

Let two strings S, T over a finite alphabet Σ be given, and let M be an arbitrary relation on Σ×Σ. Define an approximate match (x, y) of two length-m subwords (substrings), x of S and y of T, to hold when M(x_i, y_i) for all 1 ≤ i ≤ m. A match implies all the local alignments (without insertions and deletions) which are pairings of specific occurrences of x and y. A match (x, y) is maximal if there exists no longer match (u, v) such that all of the local alignments implied by (x, y) are contained in a local alignment implied by (u, v). We give an efficient algorithm for finding all maximal matches between S and T. The algorithm runs in time bounded by the sum of the lengths of the maximal matches, at worst O(|Σ|² n²). The main application is identifying homologous regions of protein sequences.
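
To make the definitions concrete, the following is a minimal brute-force sketch, not the algorithm of the paper. It scans every diagonal of the S×T comparison grid and reports each ungapped run of positions whose symbols are related under M and that cannot be extended in either direction. This is a simpler notion than the abstract's maximality over all occurrences of (x, y), and the sketch costs one relation test per grid cell. The names `maximal_runs`, `related`, and `min_len` are illustrative assumptions, as is the toy relation in the usage example.

```python
from typing import Callable, List, Tuple


def maximal_runs(
    s: str,
    t: str,
    related: Callable[[str, str], bool],
    min_len: int = 1,
) -> List[Tuple[int, int, int]]:
    """Return (i, j, length) for each ungapped run that cannot be extended.

    A run starting at s[i], t[j] satisfies related(s[i + k], t[j + k])
    for every 0 <= k < length. Brute force: every cell is tested once.
    """
    runs: List[Tuple[int, int, int]] = []
    n, m = len(s), len(t)
    # A diagonal is identified by the offset d = j - i.
    for d in range(-(n - 1), m):
        i = max(0, -d)
        j = i + d
        run_start = None  # index in s where the current run began
        while i < n and j < m:
            if related(s[i], t[j]):
                if run_start is None:
                    run_start = i
            elif run_start is not None:
                # The run ended at a mismatch; record it if long enough.
                if i - run_start >= min_len:
                    runs.append((run_start, run_start + d, i - run_start))
                run_start = None
            i += 1
            j += 1
        # A run may also end at the edge of the grid.
        if run_start is not None and i - run_start >= min_len:
            runs.append((run_start, run_start + d, i - run_start))
    return runs


# Toy relation (an assumption for illustration): identical symbols match,
# and 'A' and 'B' are treated as interchangeable.
def toy_relation(a: str, b: str) -> bool:
    return a == b or {a, b} == {"A", "B"}


example = maximal_runs("ACD", "BCE", toy_relation)
```

Here `example` holds a single run of length 2 starting at position 0 in both strings: "AC" is paired with "BC", and the run stops because 'D' and 'E' are unrelated. For protein sequences, `related` could be derived from a substitution matrix by thresholding its scores, which is again an assumption rather than something the abstract specifies.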

Supported by DOE Grant 442427-22446

Editor information

Maxime Crochemore, Dan Gusfield

Copyright information

© 1994 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Cobbs, A.L. (1994). Fast identification of approximately matching substrings. In: Crochemore, M., Gusfield, D. (eds) Combinatorial Pattern Matching. CPM 1994. Lecture Notes in Computer Science, vol 807. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-58094-8_6

  • DOI: https://doi.org/10.1007/3-540-58094-8_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-58094-2

  • Online ISBN: 978-3-540-48450-9

  • eBook Packages: Springer Book Archive
