Efficient String Matching Based on a Two-Step Simulation of the Suffix Automaton

  • Conference paper
Implementation and Application of Automata (CIAA 2021)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12803)

Abstract

Automata play a very important role in the design of string matching algorithms, as their use has always led to elegant and very efficient solutions in practice. In this paper, we present a new general approach to the exact string matching problem based on a non-standard efficient simulation of the suffix automaton of the pattern, and we give a specific efficient implementation of it. To show the effectiveness of our algorithm, we perform an extensive comparison against the most effective alternatives known in the literature in terms of search speed and shift advancements. Our experimental results show that the new algorithm is very efficient in practical cases and scales much better as the length of the pattern increases, improving the search speed by nearly 10 times under suitable conditions.

This work has been supported by G.N.C.S., Istituto Nazionale di Alta Matematica “Francesco Severi” and by Programma Ricerca di Ateneo UNICT 2020-22 linea 2.

Notes

  1. We note that the EPSM algorithm is designed simply to count the number of matching occurrences, without reporting the corresponding positions.

  2. Source code is available at: https://github.com/ostafen/unique-factor-matcher.

Author information

Corresponding author

Correspondence to Stefano Scafiti.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Faro, S., Scafiti, S. (2021). Efficient String Matching Based on a Two-Step Simulation of the Suffix Automaton. In: Maneth, S. (ed.) Implementation and Application of Automata. CIAA 2021. Lecture Notes in Computer Science, vol 12803. Springer, Cham. https://doi.org/10.1007/978-3-030-79121-6_14

  • DOI: https://doi.org/10.1007/978-3-030-79121-6_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-79120-9

  • Online ISBN: 978-3-030-79121-6

  • eBook Packages: Computer Science, Computer Science (R0)
