
Efficient Experimental String Matching by Weak Factor Recognition

  • Conference paper
Combinatorial Pattern Matching (CPM 2001)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2089)


Abstract

We introduce a new notion of weak factor recognition that is the foundation of new data structures and on-line string matching algorithms. We define a new automaton built on a string p = p_1 p_2 ... p_m that acts like an oracle on the set of factors p_i ... p_j. If a string is recognized by this automaton, it may be a factor of p; if it is rejected, it is surely not a factor. We call it the factor oracle. More precisely, this automaton is acyclic, recognizes at least the factors of p, has m + 1 states and a linear number of transitions. We give a very simple sequential construction algorithm to build it. Using this automaton, we design an efficient experimental on-line string matching algorithm (we conjecture its optimality in regard to the experimental results) that is really simple to implement. We also extend the factor oracle to predict that a string could be a suffix of p (i.e. in the set p_i ... p_m). We obtain the suffix oracle, which enables in some cases a tricky improvement of the previous string matching algorithm.
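The abstract gives no pseudocode, so the following is only a minimal Python sketch of how a factor oracle and a backward search built on it are commonly described. The function names (`build_factor_oracle`, `accepts`, `bom_search`), the dictionary-based transition table, and the internal `supply` array are illustrative assumptions, not the authors' implementation.

```python
def build_factor_oracle(p):
    """Sequentially build an oracle with len(p) + 1 states (0 .. m).

    trans[s] maps a character to a target state. `supply` is an
    assumed helper array of fallback links used only during construction.
    """
    m = len(p)
    trans = [dict() for _ in range(m + 1)]
    supply = [-1] * (m + 1)
    for i, c in enumerate(p):
        trans[i][c] = i + 1              # transition along the string itself
        k = supply[i]
        # Add external transitions along the chain of fallback links.
        while k > -1 and c not in trans[k]:
            trans[k][c] = i + 1
            k = supply[k]
        supply[i + 1] = 0 if k == -1 else trans[k][c]
    return trans


def accepts(trans, w):
    """True if w can be read from state 0 (every state is accepting)."""
    state = 0
    for c in w:
        if c not in trans[state]:
            return False                 # rejected: surely not a factor
        state = trans[state][c]
    return True                          # accepted: possibly a factor


def bom_search(text, p):
    """Sketch of a backward search using the oracle of the reversed pattern."""
    m, n = len(p), len(text)
    if m == 0 or m > n:
        return []
    trans = build_factor_oracle(p[::-1])
    hits, pos = [], 0
    while pos <= n - m:
        state, j = 0, m - 1
        # Read the current window from right to left.
        while j >= 0 and text[pos + j] in trans[state]:
            state = trans[state][text[pos + j]]
            j -= 1
        if j < 0:
            # Whole window was read; compare explicitly to stay on the safe side.
            if text[pos:pos + m] == p:
                hits.append(pos)
            pos += 1
        else:
            # text[pos+j : pos+m] is not a factor of p, so skip past position j.
            pos += j + 1
    return hits


oracle = build_factor_oracle("abbbaab")
print(len(oracle))                       # 8 states, i.e. m + 1
print(accepts(oracle, "aba"))            # True, although "aba" is not a factor
print(accepts(oracle, "abab"))           # False
print(bom_search("abbbaabxabbbaab", "abbbaab"))  # [0, 8]
```

For p = abbbaab, the sketch accepts aba even though it is not a factor of p, which illustrates the "weak" recognition: acceptance is only a hint, whereas rejection is conclusive. The search relies on the conclusive side only, shifting the window past the position where reading failed.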




Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Allauzen, C., Crochemore, M., Raffinot, M. (2001). Efficient Experimental String Matching by Weak Factor Recognition. In: Amir, A. (eds) Combinatorial Pattern Matching. CPM 2001. Lecture Notes in Computer Science, vol 2089. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48194-X_5


  • DOI: https://doi.org/10.1007/3-540-48194-X_5


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42271-6

  • Online ISBN: 978-3-540-48194-2

  • eBook Packages: Springer Book Archive
