Complexity of Sequential Pattern Matching Algorithms

  • Mireille Régnier
  • Wojciech Szpankowski
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1518)

Abstract

We formally define a class of sequential pattern matching algorithms that includes all variations of the Morris-Pratt algorithm. For the last twenty years it has been known that the complexity of such algorithms is bounded by a linear function of the text length. Recently, substantial progress has been made in identifying lower bounds. We now prove that a linearity constant exists asymptotically for both the worst and the average cases. Using the Subadditive Ergodic Theorem, we prove almost sure convergence. Our results hold for any given pattern and text, and for stationary ergodic pattern and text. In the course of the proof, we establish a structural property, namely the existence of “unavoidable positions” where the algorithm must stop to compare. This property seems to be unique to Morris-Pratt type algorithms (e.g., the Boyer-Moore algorithm does not possess it).
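To make the linear bound concrete, here is a minimal Python sketch of the classic Morris-Pratt algorithm, instrumented to count text-pattern character comparisons. It is an illustration, not the authors' formal class of algorithms. The function names `mp_failure`, `mp_search` and `comparisons_per_char` are hypothetical, and the i.i.d. uniform text model used in `comparisons_per_char` is an assumption (a simple special case of the stationary ergodic setting). Every comparison either advances the text pointer or increases the current shift of the pattern, and neither can exceed the text length n, so the count C_n satisfies n ≤ C_n ≤ 2n. The ratio C_n / n is the quantity whose asymptotic constant the paper studies.

```python
import random


def mp_failure(pattern: str) -> list[int]:
    """Morris-Pratt failure table: fail[q] is the length of the longest proper
    border of pattern[:q]; fail[0] = -1 by convention."""
    m = len(pattern)
    fail = [-1] * (m + 1)
    k = -1
    for q in range(m):
        # Fall back through shorter borders until pattern[k] can extend one.
        while k >= 0 and pattern[k] != pattern[q]:
            k = fail[k]
        k += 1
        fail[q + 1] = k
    return fail


def mp_search(pattern: str, text: str) -> tuple[list[int], int]:
    """Return (start positions of all occurrences, number of character comparisons).
    Assumes a non-empty pattern."""
    fail = mp_failure(pattern)
    m = len(pattern)
    matches: list[int] = []
    comparisons = 0
    q = 0  # number of pattern characters currently matched
    for i, c in enumerate(text):
        while True:
            comparisons += 1
            if pattern[q] == c:
                q += 1  # success: advance in both pattern and text
                break
            q = fail[q]  # mismatch: shift the pattern using the failure table
            if q < 0:  # mismatch at pattern position 0: move to next text char
                q = 0
                break
        if q == m:
            matches.append(i - m + 1)
            q = fail[m]  # continue searching for overlapping occurrences
    return matches, comparisons


def comparisons_per_char(pattern: str, n: int, alphabet: str = "ab", seed: int = 0) -> float:
    """Hypothetical helper: empirical C_n / n on one i.i.d. uniform random text
    of length n (an illustrative assumption about the text source)."""
    rng = random.Random(seed)
    text = "".join(rng.choice(alphabet) for _ in range(n))
    _, comps = mp_search(pattern, text)
    return comps / n
```

For example, `mp_search("aba", "abababa")` finds the overlapping occurrences at positions 0, 2 and 4. Calling `comparisons_per_char("aab", 100000)` for increasing `n` gives a rough empirical feel for the linearity constant under this particular text model. It is not a substitute for the almost sure convergence result proved in the paper.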

Copyright information

© Springer-Verlag Berlin Heidelberg 1998

Authors and Affiliations

  • Mireille Régnier (1)
  • Wojciech Szpankowski (2)
  1. INRIA, Rocquencourt, Le Chesnay, France
  2. Dept. Computer Science, Purdue University, W. Lafayette, USA