Abstract
We formally define a class of sequential pattern matching algorithms that includes all variations of the Morris-Pratt algorithm. For the last twenty years it has been known that the complexity of such algorithms is bounded by a linear function of the text length. Recently, substantial progress has been made in identifying lower bounds. We now prove that, asymptotically, a linearity constant exists for both the worst and the average cases. We use the Subadditive Ergodic Theorem and prove almost sure convergence. Our results hold for any given pattern and text, and for a stationary ergodic pattern and text. In the course of the proof, we establish a structural property, namely the existence of “unavoidable positions” where the algorithm must stop to compare. This property seems to be unique to Morris-Pratt type algorithms (e.g., the Boyer-Moore algorithm does not possess it).
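To make the quantities in the abstract concrete, the following Python sketch counts the character comparisons C_n made by the plain Morris-Pratt algorithm (one member of the class studied here) on a text of length n, and reports the empirical ratio C_n/n on an i.i.d. uniform random text, a simple special case of a stationary ergodic text. The function names `failure_function`, `morris_pratt_count`, and `comparison_rate`, the binary alphabet, and the pattern `abab` are illustrative assumptions, not taken from the paper. The simulation only suggests that C_n/n settles near a constant; it proves nothing about the linearity constant.

```python
import random


def failure_function(pattern: str) -> list[int]:
    """Morris-Pratt failure (border) table.

    For i >= 1, fail[i] is the length of the longest proper border of
    pattern[:i]; fail[0] = -1 is a sentinel meaning "restart".
    """
    m = len(pattern)
    fail = [-1] * (m + 1)
    k = -1
    for i in range(m):
        # Shrink the current border until it can be extended by pattern[i].
        while k >= 0 and pattern[k] != pattern[i]:
            k = fail[k]
        k += 1
        fail[i + 1] = k
    return fail


def morris_pratt_count(text: str, pattern: str) -> tuple[list[int], int]:
    """Return (match start positions, number of text-pattern comparisons).

    Assumes a non-empty pattern.
    """
    fail = failure_function(pattern)
    m = len(pattern)
    matches: list[int] = []
    comparisons = 0
    j = 0  # number of pattern characters currently matched
    for i, c in enumerate(text):
        # Compare c against pattern[j], following failure links on mismatch.
        while j >= 0:
            comparisons += 1
            if pattern[j] == c:
                break
            j = fail[j]
        j += 1
        if j == m:
            matches.append(i - m + 1)
            j = fail[m]  # continue searching for overlapping occurrences
    return matches, comparisons


def comparison_rate(pattern: str, n: int, alphabet: str = "ab", seed: int = 0) -> float:
    """Empirical C_n / n on one i.i.d. uniform random text of length n."""
    rng = random.Random(seed)
    text = "".join(rng.choice(alphabet) for _ in range(n))
    _, comparisons = morris_pratt_count(text, pattern)
    return comparisons / n


if __name__ == "__main__":
    for n in (10**3, 10**4, 10**5):
        print(n, comparison_rate("abab", n))
```

Each text character costs at least one comparison, and every failed comparison shortens the current match, which grows by at most one per text character, so the count stays between n and 2n. This is consistent with the linear upper bound mentioned above; the paper's contribution is the existence of the limiting constant, which this sketch only illustrates numerically.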
The project was supported by NATO Collaborative Grant CRG.950060, the ESPRIT III Program No. 7141 ALCOM II, and NSF Grants NCR-9415491, NCR-9804760.
Copyright information
© 1998 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Régnier, M., Szpankowski, W. (1998). Complexity of Sequential Pattern Matching Algorithms. In: Luby, M., Rolim, J.D.P., Serna, M. (eds) Randomization and Approximation Techniques in Computer Science. RANDOM 1998. Lecture Notes in Computer Science, vol 1518. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-49543-6_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65142-0
Online ISBN: 978-3-540-49543-7
eBook Packages: Springer Book Archive