Abstract
Automata play a very important role in the design of string matching algorithms, as their use has always led to elegant solutions that are very efficient in practice. In this paper, we present a new general approach to the exact string matching problem based on a non-standard efficient simulation of the suffix automaton of the pattern, and we give a specific efficient implementation of it. To show the effectiveness of our algorithm, we perform an extensive comparison against the most effective alternatives known in the literature, in terms of both search speed and shift advancements. Our experimental results show that the new algorithm is very efficient in practical cases and scales much better as the length of the pattern increases, improving the search speed by nearly 10 times under suitable conditions.
This work has been supported by G.N.C.S., Istituto Nazionale di Alta Matematica “Francesco Severi” and by Programma Ricerca di Ateneo UNICT 2020-22 linea 2.
Notes
1. We note that the EPSM algorithm is designed only to count the number of matching occurrences, without reporting the corresponding positions.
2. Source code is available at: https://github.com/ostafen/unique-factor-matcher.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Faro, S., Scafiti, S. (2021). Efficient String Matching Based on a Two-Step Simulation of the Suffix Automaton. In: Maneth, S. (ed.) Implementation and Application of Automata. CIAA 2021. Lecture Notes in Computer Science, vol 12803. Springer, Cham. https://doi.org/10.1007/978-3-030-79121-6_14
DOI: https://doi.org/10.1007/978-3-030-79121-6_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-79120-9
Online ISBN: 978-3-030-79121-6
eBook Packages: Computer Science, Computer Science (R0)