Boyer-Moore strategy to efficient approximate string matching

  • Nadia El-Mabrouk
  • Maxime Crochemore
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1075)

Abstract

We propose a simple but efficient algorithm for searching all occurrences of a pattern or a class of patterns (length m) in a text (length n) with at most k mismatches.

This algorithm relies on the Shift-Add algorithm of Baeza-Yates and Gonnet [6], which represents the current state of the search as a bit-encoded number and exploits the ability of programming languages to operate on machine words. The state representation should therefore not exceed the word size ω, that is, m(⌈log₂(k+1)⌉+1) ≤ ω. The algorithm consists of a preprocessing step and a searching step. It is linear and performs 3n operations during the searching step.
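The bit-state idea can be illustrated with a minimal Python sketch of Shift-Add-style counting for k mismatches. This is an illustration under stated assumptions, not the authors' implementation: the names `fits_in_word` and `shift_add_search`, the default word size of 64, and the explicit overflow bookkeeping are choices made here. Each pattern position gets a field of ⌈log₂(k+1)⌉+1 bits; the extra bit of each field records overflow. Python integers are unbounded, so the word-size condition is checked separately rather than enforced by the hardware.

```python
def fits_in_word(m, k, word_size=64):
    """Check m * (ceil(log2(k+1)) + 1) <= word_size (word_size is an assumed default)."""
    bits_per_field = k.bit_length() + 1  # k.bit_length() equals ceil(log2(k+1)) for k >= 0
    return m * bits_per_field <= word_size


def shift_add_search(pattern, text, k):
    """Return start positions where pattern matches text with at most k mismatches."""
    m = len(pattern)
    if m == 0 or k < 0:
        raise ValueError("pattern must be non-empty and k non-negative")
    b = k.bit_length() + 1                 # bits per field, including the overflow bit
    full = (1 << (m * b)) - 1              # keeps the state within m fields
    high = sum(1 << (i * b + b - 1) for i in range(m))  # overflow bit of every field
    ones = sum(1 << (i * b) for i in range(m))          # a mismatch in every field

    # Preprocessing: for each pattern character, field i holds 1 if pattern[i] differs.
    table = {
        c: sum(1 << (i * b) for i in range(m) if pattern[i] != c)
        for c in set(pattern)
    }

    top = (m - 1) * b                      # bit offset of the last field
    state = overflow = 0
    hits = []
    for t, c in enumerate(text):
        # Shift every counter to the next pattern position and add the new mismatches.
        state = ((state << b) + table.get(c, ones)) & full
        # Remember which counters overflowed, then clear their overflow bits.
        overflow = ((overflow << b) | (state & high)) & full
        state &= ~high
        # The last field counts mismatches for the window ending at position t.
        if t >= m - 1 and (overflow >> top) == 0 and (state >> top) <= k:
            hits.append(t - m + 1)
    return hits


print(shift_add_search("abc", "xabcabdabx", 1))  # [1, 4, 7]
```

With a 64-bit word and k = 1, each field takes 2 bits, so `fits_in_word` accepts patterns of length up to 32.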

The notions of shift and character skip found in the Boyer-Moore (BM) approach [9] are introduced into this algorithm. Provided that the alphabet is large enough (compared to the pattern length), the average number of operations performed by our algorithm during the searching step becomes n(2 + (k+4)/(m−k)).
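To get a feel for the two operation counts, the following small Python sketch evaluates them side by side. It assumes the average-case expression is read as n(2 + (k+4)/(m−k)), with (k+4) over (m−k) as a single fraction; the function names are hypothetical and the numbers are plain arithmetic, not measurements.

```python
def linear_ops(n):
    """Operation count of the plain searching step stated in the abstract: 3n."""
    return 3 * n


def average_ops(n, m, k):
    """Average count under the assumed reading n * (2 + (k + 4) / (m - k))."""
    if not 0 <= k < m:
        raise ValueError("requires 0 <= k < m")
    return n * (2 + (k + 4) / (m - k))


n = 1000
for m, k in [(8, 2), (20, 2), (50, 2)]:
    print(m, k, linear_ops(n), round(average_ops(n, m, k), 1))
```

Under this reading, the average count drops below 3n exactly when (k+4)/(m−k) < 1, that is, when m > 2k + 4; for k = 2 the break-even pattern length is m = 8.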

Keywords

Partial State, Annual Symposium, String Match, Suffix Tree, Word Size
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. K. Abrahamson. Generalized string matching. SIAM J. Comput., 16(6):1039–1051, December 1987.
  2. A. V. Aho and M. J. Corasick. Efficient string matching: an aid to bibliographic search. Commun. ACM, 18:333–340, 1975.
  3. T. Akutsu. Approximate string matching with don't care characters. In M. Crochemore and D. Gusfield, editors, Lecture Notes in Computer Science, volume 807 of Combinatorial Pattern Matching (5th Annual Symposium, CPM94), pages 229–242. Springer-Verlag, 1994.
  4. R. Baeza-Yates and G. H. Gonnet. Fast string matching with k mismatches. Technical Report CS-88-36, Data Structuring Group, September 1988.
  5. R. Baeza-Yates and G. H. Gonnet. Efficient text searching of regular expressions. 16th International Colloquium on Automata, Languages and Programming, Stresa, Italy, July 1989.
  6. R. Baeza-Yates and G. H. Gonnet. A new approach to text searching. Commun. ACM, 35(10):74–82, October 1992.
  7. R. Baeza-Yates and C. H. Perleberg. Fast and practical approximate string matching. In Lecture Notes in Computer Science, volume 644 of Combinatorial Pattern Matching (3rd Annual Symposium, CPM92), pages 185–191. Springer-Verlag, 1992.
  8. A. A. Bertossi and F. Logi. Parallel string matching with variable length don't cares. Journal of Parallel and Distributed Computing, 22:229–234, 1994.
  9. R. S. Boyer and J. S. Moore. A fast string searching algorithm. Commun. ACM, 20(10):762–772, October 1977.
  10. M. J. Fischer and M. S. Paterson. String-matching and other products. In R. Karp, editor, Complexity of Computation (SIAM-AMS Proceedings 7), volume 7, pages 113–125. American Mathematical Society, Providence, R.I., 1974.
  11. Z. Galil and R. Giancarlo. Improved string matching with k mismatches. SIGACT News, 17:52–54, 1986.
  12. R. Grossi and F. Luccio. Simple and efficient string matching with k mismatches. Inf. Proc. Letters, 33(3):113–120, November 1989.
  13. D. E. Knuth, J. H. Morris, and V. Pratt. Fast pattern matching in strings. SIAM J. Comput., 6:323–350, June 1977.
  14. G. Kucherov and M. Rusinowitch. Matching a set of strings with variable length don't cares. In Z. Galil and E. Ukkonen, editors, Lecture Notes in Computer Science, volume 937 of 6th Annual Symposium, CPM95, pages 230–247. Espoo, Finland, Springer, July 1995.
  15. G. M. Landau and U. Vishkin. Efficient string matching with k mismatches. Theoret. Comput. Sci., (43):239–249, 1986.
  16. U. Manber and R. Baeza-Yates. An algorithm for string matching with a sequence of don't cares. Information Processing Letters, 37:133–136, 1991.
  17. R. Y. Pinter. Efficient string matching with don't-care patterns. In A. Apostolico and Z. Galil, editors, Combinatorial Algorithms on Words, volume F12, pages 11–29. Springer-Verlag, 1985.
  18. J. Tarhio and E. Ukkonen. Boyer-Moore approach to approximate string matching. In J. R. Gilbert and R. G. Karlsson, editors, Lecture Notes in Computer Science, volume 447 of 2nd Scandinavian Workshop in Algorithmic Theory, SWAT'90, pages 348–359. Bergen, Norway, Springer-Verlag, July 1990.
  19. E. Ukkonen. Approximate string-matching over suffix trees. In A. Apostolico, M. Crochemore, Z. Galil, and U. Manber, editors, Lecture Notes in Computer Science, volume 684 of Combinatorial Pattern Matching (4th Annual Symposium, CPM93), pages 240–249. Springer-Verlag, 1993.
  20. S. Wu and U. Manber. Fast text searching allowing errors. Commun. ACM, 35(10):83–91, October 1992.

Copyright information

© Springer-Verlag Berlin Heidelberg 1996

Authors and Affiliations

  • Nadia El-Mabrouk (1)
  • Maxime Crochemore (1)
  1. IGM, Université Marne la Vallée, Noisy Le Grand Cedex
