Approximate string-matching and the q-gram distance
Some results are summarized on approximate string matching with a string distance function that is computable in linear time and is based on the so-called q-grams (also known as n-grams). An algorithm is given for the associated string matching problem that finds the locally best approximate occurrences of a pattern P, |P| = m, in a text T, |T| = n, in time O(n log(m - q)). The occurrences with distance ≤ k can be found in time O(n log k). This should be compared to the edit-distance-based k-differences problem, for which the best algorithm currently known needs time O(kn). The q-gram distance yields a lower bound for the unit-cost edit distance, which leads to a fast hybrid algorithm for the k-differences problem.
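To make the distance concrete, here is a minimal Python sketch. It assumes the usual definition, which the abstract itself does not spell out: the q-gram distance of two strings is the L1 distance between their q-gram profiles, that is, the counts of every length-q substring. The function names `qgram_profile` and `qgram_distance` are illustrative only, and the sketch is not the O(n log(m - q)) matching algorithm described above.

```python
from collections import Counter


def qgram_profile(s: str, q: int) -> Counter:
    """Count every length-q substring of s (empty profile if len(s) < q)."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def qgram_distance(x: str, y: str, q: int) -> int:
    """L1 distance between the q-gram profiles of x and y (assumed definition)."""
    px, py = qgram_profile(x, q), qgram_profile(y, q)
    # A Counter returns 0 for a q-gram that does not occur in the string.
    return sum(abs(px[g] - py[g]) for g in px.keys() | py.keys())


if __name__ == "__main__":
    # One substitution (c -> x) changes two 2-grams on each side.
    print(qgram_distance("abcd", "abxd", 2))
```

Both profiles are built in a single pass over the strings, which is consistent with the linear-time claim when q is treated as a constant. In the form in which the lower bound is usually stated, a single edit operation changes the q-gram distance by at most 2q, so the q-gram distance divided by 2q does not exceed the unit-cost edit distance; the abstract states only that a lower bound exists.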
Keywords: Edit Distance, String Match, Suffix Tree, Difference Problem, Approximate String Match