Algorithmica, Volume 12, Issue 4–5, pp 247–267

Speeding up two string-matching algorithms

  • M. Crochemore
  • A. Czumaj
  • L. Gasieniec
  • S. Jarominek
  • T. Lecroq
  • W. Plandowski
  • W. Rytter
Abstract

We show how to speed up two string-matching algorithms: the Boyer-Moore algorithm (BM algorithm), and its version called here the reverse factor algorithm (RF algorithm). The RF algorithm is based on factor graphs for the reverse of the pattern. The main feature of both algorithms is that they scan the text right-to-left from the supposed right position of the pattern. The BM algorithm keeps scanning as long as the scanned segment (factor) is a suffix of the pattern. The RF algorithm keeps scanning as long as the segment is a factor of the pattern. Both algorithms then shift the pattern, forget the history, and start again. The RF algorithm usually makes bigger shifts than BM, but is quadratic in the worst case. We show that it is enough to remember the last matched segment (represented by two pointers to the text) to speed up the RF algorithm considerably (to make a linear number of inspections of text symbols, with a small coefficient), and to speed up the BM algorithm (to make at most 2n comparisons). Only constant additional memory is needed for the search phase. We give alternative versions of an accelerated RF algorithm: the first one is based on combinatorial properties of primitive words, and the other two use the power of suffix trees extensively. The paper demonstrates techniques for transforming algorithms, and also shows interesting new applications of data structures representing all subwords of the pattern in compact form.
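The scanning rule of the basic (unaccelerated) RF algorithm described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `rf_search` is hypothetical, and a plain set of all factors of the pattern stands in for the factor graph or suffix automaton, which keeps the sketch short but makes preprocessing quadratic in the pattern length. The "remember the last matched segment" speed-up is not included.

```python
def rf_search(pattern, text):
    """Return start positions of all occurrences of pattern in text.

    Basic reverse-factor scan: each window is read right-to-left
    while the scanned segment is still a factor of the pattern.
    """
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []

    # Assumption: naive sets replace the factor graph of the paper.
    factors = {pattern[i:j] for i in range(m) for j in range(i + 1, m + 1)}
    prefixes = {pattern[:k] for k in range(1, m + 1)}

    occurrences = []
    pos = 0  # current window is text[pos:pos + m]
    while pos <= n - m:
        j = m  # scanned segment is text[pos + j:pos + m]
        longest_prefix = 0  # longest proper prefix of pattern ending the window
        # Extend the segment one symbol to the left while it stays a factor.
        while j > 0 and text[pos + j - 1:pos + m] in factors:
            j -= 1
            segment = text[pos + j:pos + m]
            if segment in prefixes:
                if len(segment) == m:
                    occurrences.append(pos)  # whole window equals the pattern
                else:
                    longest_prefix = len(segment)
        # Align the longest recognised prefix with the start of the pattern.
        pos += m - longest_prefix
    return occurrences


if __name__ == "__main__":
    print(rf_search("abc", "xxabcxabc"))
```

The shift is safe because any occurrence starting inside the current window must begin with a prefix of the pattern that is also a suffix of the scanned segment; the sketch forgets everything after each shift, which is exactly the behaviour the paper improves on.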

Key words

Analysis of algorithms Pattern matching String matching Suffix tree Suffix automaton Combinatorial problems Periods Text processing Data retrieval 

Copyright information

© Springer-Verlag New York Inc. 1994

Authors and Affiliations

  • M. Crochemore (1)
  • A. Czumaj (2)
  • L. Gasieniec (2)
  • S. Jarominek (2)
  • T. Lecroq (1)
  • W. Plandowski (2)
  • W. Rytter (2)

  1. LITP, Institut Blaise Pascal, Université Paris 7, Paris Cedex 05, France
  2. Institute of Informatics, Warsaw University, Warsaw 59, Poland