The Max-Shift Algorithm for Approximate String Matching
The approximate string matching problem is to find all locations at which a pattern of length m matches a substring of a text of length n with at most k differences. The program agrep is a simple and practical bit-vector algorithm for this problem. In this paper we consider the following incremental version of the problem: given an appropriate encoding of a comparison between A and bB, can one compute the answer for A and B, and the answer for A and Bc, with equal efficiency, where b and c are additional symbols? Here we present an elegant and easy-to-implement bit-vector algorithm for answering these questions that requires only O(n⌈m/w⌉) time, where n is the length of A, m is the length of B and w is the number of bits in a machine word. We also present an O(nm⌈h/w⌉) algorithm for the fixed-length approximate string matching problem: given a text t, a pattern p and an integer h, compute the optimal alignment of all substrings of p of length h and a substring of t.
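To make the bit-vector idea concrete, the following is a minimal sketch of the classic agrep-style (Wu and Manber) shift-and search with at most k differences. It is not the Max-Shift algorithm of this paper; the function name `agrep_search` and its return convention (0-based end positions) are assumptions made for illustration. Python integers are arbitrary-precision, so the ⌈m/w⌉ machine words needed for a long pattern are handled implicitly here.

```python
def agrep_search(text, pattern, k):
    """Return the 0-based end positions in `text` at which `pattern`
    matches a substring with at most `k` differences (edit distance).

    Illustrative agrep-style sketch; the name and interface are hypothetical.
    """
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")

    # mask[ch] has bit i set iff pattern[i] == ch.
    mask = {}
    for i, ch in enumerate(pattern):
        mask[ch] = mask.get(ch, 0) | (1 << i)

    full = (1 << m) - 1      # keep only the low m bits
    accept = 1 << (m - 1)    # bit that signals a whole-pattern match

    # R[d] has bit i set iff pattern[:i+1] matches some suffix of the text
    # read so far with at most d differences. Initially, a prefix of length
    # i+1 <= d can be matched against the empty text by d deletions.
    R = [((1 << d) - 1) & full for d in range(k + 1)]

    ends = []
    for j, ch in enumerate(text):
        cm = mask.get(ch, 0)
        prev_old = R[0]                      # R[d-1] before this character
        R[0] = ((R[0] << 1) | 1) & cm        # exact shift-and step
        prev_new = R[0]                      # R[d-1] after this character
        for d in range(1, k + 1):
            old = R[d]
            R[d] = (
                (((old << 1) | 1) & cm)      # characters match
                | prev_old                   # extra character in the text
                | (prev_old << 1)            # substitution
                | (prev_new << 1)            # pattern character skipped
                | 1                          # length-1 prefix always fits when d >= 1
            ) & full
            prev_old = old
            prev_new = R[d]
        if R[k] & accept:
            ends.append(j)
    return ends
```

For example, `agrep_search("abxd", "abcd", 1)` reports a match ending at index 3 (one substitution), while the same call with `k = 0` reports none. Each text character costs O(k) word operations per machine word of the pattern, which is the kind of bit-parallel cost model the bounds above refer to.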
Keywords: String algorithms, approximate string matching, dynamic programming, edit distance