Boyer-Moore strategy to efficient approximate string matching
We propose a simple but efficient algorithm for finding all occurrences of a pattern or a class of patterns (of length m) in a text (of length n) with at most k mismatches.
This algorithm relies on the Shift-Add algorithm of Baeza-Yates and Gonnet, which represents the current state of the search as a bit vector and exploits the ability of programming languages to operate on machine words. The state representation must therefore not exceed the word size ω, that is, m(⌈log₂(k+1)⌉+1) ≤ ω. The algorithm consists of a preprocessing step and a searching step. It is linear and performs 3n operations during the searching step.
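As an illustration of the Shift-Add idea this work builds on, the following is a minimal Python sketch of k-mismatch search with one counter field per pattern position. It is not the authors' implementation: the function name `shift_add_search`, the separate overflow vector, and the use of Python's unbounded integers in place of a fixed machine word are assumptions made for the example. Each field has b = ⌈log₂(k+1)⌉+1 bits, so the whole state occupies m·b bits, which is the quantity the word-size condition bounds.

```python
def shift_add_search(pattern, text, k):
    """Return start positions where pattern matches text with at most k mismatches.

    Illustrative Shift-Add style sketch (assumed layout, not the paper's code):
    field i of `state` counts mismatches between pattern[0..i] and the text
    ending at the current position.
    """
    m = len(pattern)
    # k.bit_length() equals ceil(log2(k + 1)); one extra bit detects overflow.
    b = k.bit_length() + 1
    full = (1 << (m * b)) - 1                      # keeps the state to m*b bits
    ones = sum(1 << (i * b) for i in range(m))     # a 1 in every field
    high = ones << (b - 1)                         # overflow bit of every field
    low_mask = (1 << (b - 1)) - 1                  # counter bits of one field

    # Preprocessing: table[c] has a 1 in field i iff pattern[i] != c.
    table = {
        c: sum(1 << (i * b) for i in range(m) if pattern[i] != c)
        for c in set(pattern)
    }

    state = 0
    overflow = 0
    last = (m - 1) * b                             # bit offset of field m-1
    matches = []
    for j, c in enumerate(text):
        # Searching step: shift, add the mismatch vector, record overflows.
        state = ((state << b) + table.get(c, ones)) & full
        overflow = ((overflow << b) | (state & high)) & full
        state &= ~high
        if j >= m - 1:
            count = (state >> last) & low_mask
            overflowed = (overflow >> (last + b - 1)) & 1
            if not overflowed and count <= k:
                matches.append(j - m + 1)
    return matches
```

For example, `shift_add_search("abc", "abcxbcabd", 1)` reports the windows "abc", "xbc" and "abd". With m = 3 and k = 1 the state needs 3·2 = 6 bits, well within a 32- or 64-bit word.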
The notions of shift and character skip from the Boyer-Moore (BM) approach are introduced into this algorithm. Provided that the alphabet is large enough compared to the pattern length, the average number of operations performed by our algorithm during the searching step becomes n(2 + (k+4)/(m−k)).
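To get a feel for the size of this improvement, the short Python sketch below compares the two operation counts quoted in the abstract. It assumes the average-case expression is read as n(2 + (k+4)/(m−k)), with (k+4)/(m−k) as a single fraction; the function names are hypothetical and the numbers are plain arithmetic, not measurements.

```python
def shift_add_ops(n):
    """Searching-step operation count quoted for the plain algorithm: 3n."""
    return 3 * n


def bm_average_ops(n, m, k):
    """Average count under the assumed reading n * (2 + (k + 4) / (m - k))."""
    if m <= k:
        raise ValueError("the expression requires m > k")
    return n * (2 + (k + 4) / (m - k))


# Example: a text of one million characters, pattern length 20, 2 mismatches.
n, m, k = 1_000_000, 20, 2
print(shift_add_ops(n))          # 3n
print(bm_average_ops(n, m, k))   # n * (2 + 6/18)
```

Under this reading the average count drops below 3n exactly when (k+4)/(m−k) < 1, that is, when m > 2k + 4, so the gain grows with longer patterns and fewer allowed mismatches.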
Keywords: Partial State, Annual Symposium, String Match, Suffix Tree, Word Size
- 2. A. V. Aho and M. J. Corasick. Efficient string matching: an aid to bibliographic search. Commun. ACM, 18:333–340, 1975.
- 3. T. Akutsu. Approximate string matching with don't care characters. In M. Crochemore and D. Gusfield, editors, Lecture Notes in Computer Science, volume 807 of Combinatorial Pattern Matching (5th Annual Symposium, CPM94), pages 229–242. Springer-Verlag, 1994.
- 4. R. Baeza-Yates and G. H. Gonnet. Fast string matching with k mismatches. Technical Report CS-88-36, Data Structuring Group, September 1988.
- 5. R. Baeza-Yates and G. H. Gonnet. Efficient text searching of regular expressions. 16th International Colloquium on Automata, Languages and Programming. Stresa, Italy, July 1989.
- 6. R. Baeza-Yates and G. H. Gonnet. A new approach to text searching. Commun. ACM, 35(10):74–82, October 1992.
- 7. R. Baeza-Yates and C. H. Perleberg. Fast and practical approximate string matching. In Lecture Notes in Computer Science, volume 644 of Combinatorial Pattern Matching (3rd Annual Symposium, CPM92), pages 185–191. Springer-Verlag, 1992.
- 9. R. S. Boyer and J. S. Moore. A fast string searching algorithm. Commun. ACM, 20(10):762–772, October 1977.
- 10. M. J. Fischer and M. S. Paterson. String-matching and other products. In R. Karp, editor, Complexity of Computation (SIAM-AMS Proceedings 7), volume 7, pages 113–125. American Mathematical Society, Providence, R.I., 1974.
- 12. R. Grossi and F. Luccio. Simple and efficient string matching with k mismatches. Inf. Proc. Letters, 33(3):113–120, November 1989.
- 14. G. Kucherov and M. Rusinowitch. Matching a set of strings with variable length don't cares. In Z. Galil and E. Ukkonen, editors, Lecture Notes in Computer Science, volume 937 of 6th Annual Symposium, CPM95, pages 230–247. Espoo, Finland, Springer, July 1995.
- 15. G. M. Landau and U. Vishkin. Efficient string matching with k mismatches. Theoret. Comput. Sci., 43:239–249, 1986.
- 16. U. Manber and R. Baeza-Yates. An algorithm for string matching with a sequence of don't cares. Information Processing Letters, 37:133–136, 1991.
- 17. R. Y. Pinter. Efficient string matching with don't-care patterns. In A. Apostolico and Z. Galil, editors, Combinatorial Algorithms on Words, volume F12, pages 11–29. Springer-Verlag, 1985.
- 18. J. Tarhio and E. Ukkonen. Boyer-Moore approach to approximate string matching. In J. R. Gilbert and R. G. Karlsson, editors, Lecture Notes in Computer Science, volume 447 of 2nd Scandinavian Workshop in Algorithmic Theory, SWAT'90, pages 348–359. Bergen, Norway, Springer-Verlag, July 1990.
- 19. E. Ukkonen. Approximate string-matching over suffix trees. In A. Apostolico, M. Crochemore, Z. Galil, and U. Manber, editors, Lecture Notes in Computer Science, volume 684 of Combinatorial Pattern Matching (4th Annual Symposium, CPM93), pages 240–249. Springer-Verlag, 1993.
- 20. S. Wu and U. Manber. Fast text searching allowing errors. Commun. ACM, 35(10):83–91, October 1992.