Abstract
Let two strings S, T over a finite alphabet Σ be given, and let M be an arbitrary relation on Σ×Σ. A pair (x, y) of length-m subwords (substrings) x ⊑ S, y ⊑ T is an approximate match when M(x_i, y_i) holds for all 1 ≤ i ≤ m. A match implies all the local alignments (without insertions and deletions) that pair specific occurrences of x and y. A match (x, y) is maximal if there exists no longer match (u, v) such that every local alignment implied by (x, y) is contained in a local alignment implied by (u, v). We give an efficient algorithm for finding all maximal matches between S and T. The algorithm runs in time bounded by the sum of the lengths of the maximal matches, at worst O(|Σ|² n²). The main application is identifying homologous regions of protein sequences.
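To make the definitions concrete, the following is a minimal sketch of a naive diagonal scan that lists every ungapped local alignment between S and T that cannot be extended in either direction under a relation M. It is not the algorithm of the paper: it works on individual occurrences rather than on subword pairs (x, y), and it always takes time proportional to |S|·|T|. The function name, the `min_len` parameter, and the example relation (two made-up residue groups) are assumptions introduced only for illustration.

```python
from typing import Callable, List, Tuple


def maximal_ungapped_alignments(
    s: str,
    t: str,
    related: Callable[[str, str], bool],
    min_len: int = 1,
) -> List[Tuple[int, int, int]]:
    """Return (i, j, length) triples such that s[i:i+length] and
    t[j:j+length] are related position by position, and the run cannot
    be extended left or right along its diagonal."""
    runs: List[Tuple[int, int, int]] = []
    n, m = len(s), len(t)
    # Each diagonal d = i - j corresponds to one relative shift of t against s.
    for d in range(-(m - 1), n):
        i = max(d, 0)
        j = i - d
        run_start = None  # index in s where the current run began
        while i < n and j < m:
            if related(s[i], t[j]):
                if run_start is None:
                    run_start = i
            elif run_start is not None:
                # The run ended at position i - 1; record it if long enough.
                if i - run_start >= min_len:
                    runs.append((run_start, run_start - d, i - run_start))
                run_start = None
            i += 1
            j += 1
        # A run may reach the end of the diagonal.
        if run_start is not None and i - run_start >= min_len:
            runs.append((run_start, run_start - d, i - run_start))
    return runs


# Hypothetical relation M: symbols are related if equal or in the same group.
GROUPS = ["ILV", "KR"]


def related(a: str, b: str) -> bool:
    return a == b or any(a in g and b in g for g in GROUPS)


result = maximal_ungapped_alignments("KIL", "RLV", related, min_len=2)
print(result)  # [(0, 0, 3)]
```

Here "KIL" and "RLV" share no identical position, yet under the assumed relation they align along their full length, which is the kind of approximate match the abstract describes.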
Supported by DOE Grant 442427-22446
© 1994 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Cobbs, A.L. (1994). Fast identification of approximately matching substrings. In: Crochemore, M., Gusfield, D. (eds) Combinatorial Pattern Matching. CPM 1994. Lecture Notes in Computer Science, vol 807. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-58094-8_6
DOI: https://doi.org/10.1007/3-540-58094-8_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-58094-2
Online ISBN: 978-3-540-48450-9
eBook Packages: Springer Book Archive