Abstract
We introduce a new notion of weak factor recognition that is the foundation of new data structures and on-line string matching algorithms. We define a new automaton built on a string p = p_1 p_2 ... p_m that acts like an oracle on the set of factors p_i ... p_j. If a string is recognized by this automaton, it may be a factor of p. But, if it is rejected, it is surely not a factor. We call it factor oracle. More precisely, this automaton is acyclic, recognizes at least the factors of p, has m + 1 states and a linear number of transitions. We give a very simple sequential construction algorithm to build it. Using this automaton, we design an efficient experimental on-line string matching algorithm (we conjecture its optimality in regard to the experimental results) that is really simple to implement. We also extend the factor oracle to predict that a string could be a suffix (i.e. in the set p_i ... p_m) of p. We obtain the suffix oracle, that enables in some cases a tricky improvement of the previous string matching algorithm.
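The abstract does not spell out the construction or the search procedure, so the following is only a minimal sketch of how a sequential factor oracle construction and a backward window search are commonly presented. The function names (`build_factor_oracle`, `accepts`, `backward_oracle_search`) and the `supply` array are illustrative assumptions, not identifiers taken from the paper.

```python
def build_factor_oracle(p):
    """Build a factor oracle for p, one character at a time.

    Returns (trans, supply): trans[s] maps a character to a target state,
    supply[s] is a fallback state used only during construction.
    Names and layout are assumptions made for this sketch.
    """
    m = len(p)
    trans = [dict() for _ in range(m + 1)]  # m + 1 states: 0 .. m
    supply = [-1] * (m + 1)
    for i in range(1, m + 1):
        c = p[i - 1]
        trans[i - 1][c] = i                 # transition along p itself
        k = supply[i - 1]
        # Add shortcut transitions from fallback states that lack c.
        while k > -1 and c not in trans[k]:
            trans[k][c] = i
            k = supply[k]
        supply[i] = 0 if k == -1 else trans[k][c]
    return trans, supply


def accepts(trans, w):
    """Every state is accepting: w is rejected only if a transition is missing."""
    state = 0
    for c in w:
        state = trans[state].get(c)
        if state is None:
            return False
    return True


def backward_oracle_search(p, t):
    """Report start positions of p in t by reading each window right to left
    through the oracle of the reversed pattern (an assumed search scheme)."""
    m, n = len(p), len(t)
    if m == 0 or m > n:
        return []
    trans, _ = build_factor_oracle(p[::-1])
    hits, pos = [], 0
    while pos <= n - m:
        state, j = 0, m
        while j > 0 and state is not None:
            state = trans[state].get(t[pos + j - 1])
            j -= 1
        if state is not None:
            # Transitions only go to higher-numbered states, so a path of
            # length m must spell the reversed pattern itself: a true match.
            hits.append(pos)
        pos += j + 1  # on rejection, skip past the character that failed
    return hits


trans, _ = build_factor_oracle("abbbaab")
print(accepts(trans, "abba"))   # True, although "abba" is not a factor
print(accepts(trans, "abab"))   # False, so "abab" is surely not a factor
print(backward_oracle_search("aba", "ababa"))  # [0, 2]
```

The `"abba"` case illustrates the weak recognition described above: acceptance only means the string may be a factor, while rejection is conclusive. That one-sided guarantee is what lets the search shift the window safely whenever the oracle rejects.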
© 2001 Springer-Verlag Berlin Heidelberg
Allauzen, C., Crochemore, M., Raffinot, M. (2001). Efficient Experimental String Matching by Weak Factor Recognition. In: Amir, A. (eds) Combinatorial Pattern Matching. CPM 2001. Lecture Notes in Computer Science, vol 2089. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48194-X_5
DOI: https://doi.org/10.1007/3-540-48194-X_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42271-6
Online ISBN: 978-3-540-48194-2
eBook Packages: Springer Book Archive